Random Subspace Learning (RASSEL) with data driven weighting schemes
We present a novel adaptation of the random subspace learning approach to regression analysis and classification of high-dimension, low-sample-size data, in which the individual strength of each explanatory variable is harnessed to achieve a consistent selection of a predictively optimal collection of base learners. In the context of random subspace learning, random forest (RF) occupies a prominent place, as can be seen from the vast number of extensions of the random forest idea and the multiplicity of its machine learning applications. The adaptation of random subspace learning presented in this paper differs from random forest in the following ways: (a) instead of using trees as RF does, we use multiple linear regression (MLR) as our regression base learner and the generalized linear model (GLM) as our classification base learner, and (b) rather than selecting the subset of variables uniformly as RF does, we present the new concept of sampling variables based on a multinomial distribution with weights (success "probabilities") driven by p independent one-way analysis of variance (ANOVA) tests on the predictor variables. The proposed framework achieves two substantial benefits, namely, (1) the avoidance of the extra computational burden brought by the permutations needed by RF to de-correlate the predictor variables, and (2) the substantial reduction in the average test error gained with the base learners used.
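The weighted variable sampling described in (b) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the ANOVA results are turned into weights, so the sketch assumes weights proportional to each predictor's one-way ANOVA F statistic, and it draws each subspace without replacement so that no variable is repeated within a base learner. The function names `anova_f_scores`, `subspace_weights` and `draw_subspaces` are hypothetical.

```python
import numpy as np


def anova_f_scores(X, y):
    """One-way ANOVA F statistic of each column of X across the classes in y.

    Assumes every predictor has non-zero within-class variance.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    n, p = X.shape
    k = classes.size
    grand_mean = X.mean(axis=0)
    ss_between = np.zeros(p)
    ss_within = np.zeros(p)
    for c in classes:
        Xc = X[y == c]
        mean_c = Xc.mean(axis=0)
        ss_between += Xc.shape[0] * (mean_c - grand_mean) ** 2
        ss_within += ((Xc - mean_c) ** 2).sum(axis=0)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def subspace_weights(f_scores):
    """Assumption: sampling weights proportional to the F statistics."""
    f_scores = np.asarray(f_scores, dtype=float)
    return f_scores / f_scores.sum()


def draw_subspaces(weights, n_learners, subspace_size, rng):
    """Draw one index set per base learner, favouring high-weight variables.

    Sampling is without replacement within a subspace (an assumption).
    """
    p = weights.size
    return [
        rng.choice(p, size=subspace_size, replace=False, p=weights)
        for _ in range(n_learners)
    ]


# Toy high-dimension, low-sample-size data: n = 40, p = 200,
# only the first five predictors differ between the two classes.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 20)
X = rng.normal(size=(40, 200))
X[:, :5] += 2.0 * y[:, None]

weights = subspace_weights(anova_f_scores(X, y))
subspaces = draw_subspaces(weights, n_learners=25, subspace_size=10, rng=rng)
```

Each index set in `subspaces` would then be passed to one base learner (MLR for regression, a GLM for classification), and the ensemble prediction formed by aggregating the base learners' outputs.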
Document type: Peer reviewed
Document version: Final PDF
Source: Mathematics for Applications. 2018, vol. 7, no. 1, pp. 11-30. ISSN 1805-3629