A Novel Technique for Segmenting Data for Training an Ensemble Regressor
DOI:
https://doi.org/10.37934/araset.55.2.113
Keywords:
Ensembles, ensemble regression, meta-unit, clustering, difference metrics
Abstract
This work presents a novel method for training an ensemble regressor. The main idea is to expose the meta-unit, as well as the individual regressors, to every category of data in the training set, so that any input the ensemble receives resembles something it has already seen during training. To do this, we first need to categorize the training data into classes. We perform the categorization by fitting a random forest and analysing its output to find similarities in the data, and we compare several different techniques for processing that output to determine the best way to cluster the data. We then train the ensemble using a subset of each category, which reduces the chance that it encounters an unfamiliar kind of input when it works on the actual data. Our results indicate that the proposed technique can achieve an R² of 99.9%, compared with an R² of 98.7% when the training data of the meta-unit is not categorized, which supports the claim that the concept results in better performance.
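The sampling step described above can be illustrated with a minimal sketch. It assumes the category label of each training sample is already available (for example, from clustering the random forest output, which is not shown here). The function name `split_by_category`, the `meta_fraction` parameter, and the choice to give the individual regressors and the meta-unit disjoint subsets are all illustrative assumptions, not details taken from the paper.

```python
import random
from collections import defaultdict


def split_by_category(labels, meta_fraction=0.3, seed=0):
    """Split sample indices so that both parts cover every category.

    labels: one category id per training sample (assumed given).
    Returns (base_idx, meta_idx): indices for the individual
    regressors and for the meta-unit (hypothetical naming).
    """
    rng = random.Random(seed)

    # Group sample indices by their category label.
    by_category = defaultdict(list)
    for i, label in enumerate(labels):
        by_category[label].append(i)

    base_idx, meta_idx = [], []
    for indices in by_category.values():
        rng.shuffle(indices)
        # Send at least one sample of each category to the meta-unit.
        n_meta = max(1, round(len(indices) * meta_fraction))
        # Keep at least one sample for the individual regressors
        # whenever the category has more than one sample.
        if len(indices) > 1:
            n_meta = min(n_meta, len(indices) - 1)
        meta_idx.extend(indices[:n_meta])
        base_idx.extend(indices[n_meta:])
    return sorted(base_idx), sorted(meta_idx)


# Toy example: 12 samples spread over three categories.
labels = [0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 2, 2]
base_idx, meta_idx = split_by_category(labels)
```

With this split, every category appears in both index lists, so neither the individual regressors nor the meta-unit is trained without having seen a given category. A category with a single sample is a corner case: in this sketch that sample goes to the meta-unit only.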