Random Forest Project
Design weighted random forest
Manipulate the random forest at the tree level using the importance score and multi-match information of each feature
For each tree, compute weight as
For each subject, compute the prediction from the weighted OOB (out-of-bag) predictions
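A minimal sketch of how per-tree weights could enter the OOB prediction for each subject. The weight formula itself is not given in these notes, so the weights are simply passed in as numbers; the function name `weighted_oob_vote` and its arguments are hypothetical, and a weighted majority vote is assumed as the aggregation rule.

```python
def weighted_oob_vote(tree_preds, oob_mask, tree_weights):
    """Weighted majority vote per subject over the trees where it is out-of-bag.

    tree_preds[t][i] : class label predicted by tree t for subject i
    oob_mask[t][i]   : True if subject i was NOT in tree t's bootstrap sample
    tree_weights[t]  : non-negative weight of tree t (formula left to the caller)
    Returns one label per subject, or None if the subject was never out-of-bag.
    """
    n_subjects = len(tree_preds[0])
    predictions = []
    for i in range(n_subjects):
        votes = {}
        for preds, mask, weight in zip(tree_preds, oob_mask, tree_weights):
            if mask[i]:
                votes[preds[i]] = votes.get(preds[i], 0.0) + weight
        predictions.append(max(votes, key=votes.get) if votes else None)
    return predictions


# Toy example (made-up values): 3 trees, 4 subjects, binary labels.
tree_preds = [
    [1, 0, 1, 1],
    [0, 0, 1, 0],
    [0, 1, 1, 0],
]
oob_mask = [
    [True, True, False, False],
    [True, False, True, False],
    [True, True, True, False],
]
tree_weights = [1.0, 0.3, 0.3]  # assumed: trees 2 and 3 are down-weighted

result = weighted_oob_vote(tree_preds, oob_mask, tree_weights)
print(result)
```

With equal weights the first subject would be voted 0 by two trees against one; down-weighting the last two trees flips it to 1, which is the kind of effect the weighting is meant to have.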
Manipulate info at tree level
Extract the sample predictions at each tree
Get the feature usage information at each tree
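The two tree-level steps above can be sketched as follows, assuming scikit-learn's `RandomForestClassifier` (the project may well use a different implementation). The toy data from `make_classification` and the helper name `features_used` are illustrative assumptions; `estimators_`, `classes_` and `tree_.feature` are existing scikit-learn attributes.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Toy data, for illustration only.
X, y = make_classification(n_samples=200, n_features=20, n_informative=5,
                           random_state=0)
rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Sample predictions at each tree: array of shape (n_trees, n_samples).
# The individual trees predict encoded class indices, so map them back
# to the original labels through rf.classes_.
tree_preds = np.array([
    rf.classes_[tree.predict(X).astype(int)] for tree in rf.estimators_
])


def features_used(tree):
    """Return the sorted indices of the features a fitted tree splits on."""
    split_features = tree.tree_.feature  # negative entries mark leaf nodes
    return np.unique(split_features[split_features >= 0])


# Feature usage at each tree: boolean matrix of shape (n_trees, n_features).
usage = np.zeros((len(rf.estimators_), X.shape[1]), dtype=bool)
for t, tree in enumerate(rf.estimators_):
    usage[t, features_used(tree)] = True

print(tree_preds.shape, usage.shape)
```

The `usage` matrix can then be combined with per-feature importance or multi-match information to derive a score per tree.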
Generate simulation
Generate a dataset with features and predictors; the predictors are used to create y. Pathways are small datasets composed of randomly sampled features and predictors. We create an artificial true pathway by fixing the number of predictors it contains.
This simulates metabolomics data, in which features match multiple pathways owing to the LC-MS m/z matching bias.
By lowering the weight of trees that use multi-match features, we hope to suppress false pathways whose accuracy is artificially high owing to multi-matching. The true pathways related to disease can then be distinguished from the false ones.
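A minimal sketch of such a simulation with NumPy. All sizes, the logistic model for y, and the names `make_pathway` and `multi_match` are assumptions made for illustration; the notes do not specify them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions): 200 noise features, 10 true predictors.
n_samples, n_noise, n_pred = 100, 200, 10
X = rng.normal(size=(n_samples, n_noise + n_pred))
noise_idx = np.arange(n_noise)
pred_idx = np.arange(n_noise, n_noise + n_pred)  # last columns are predictors

# y depends only on the predictors (logistic model, unit coefficients: assumed).
logit = X[:, pred_idx].sum(axis=1)
y = (rng.random(n_samples) < 1.0 / (1.0 + np.exp(-logit))).astype(int)


def make_pathway(n_true, size=15):
    """Column indices of a pathway: n_true predictors plus noise features."""
    true_part = rng.choice(pred_idx, size=n_true, replace=False)
    noise_part = rng.choice(noise_idx, size=size - n_true, replace=False)
    return np.concatenate([true_part, noise_part])


# One artificial true pathway with a fixed, high number of predictors,
# and several false pathways that contain a single predictor each.
true_pathway = make_pathway(n_true=8)
false_pathways = [make_pathway(n_true=1) for _ in range(20)]

# Multi-match count: in how many pathways each column appears.
# Indices within one pathway are unique, so the in-place increment is safe.
multi_match = np.zeros(X.shape[1], dtype=int)
for pathway in [true_pathway] + false_pathways:
    multi_match[pathway] += 1

print(X.shape, y.mean(), multi_match.max())
```

Columns with `multi_match` greater than 1 play the role of features that match several pathways.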
Test the effectiveness
Plot the number of true predictors in each tree vs. accuracy, adjusted by the number of features in each tree
The plot shows that the more true predictors a tree contains, the higher its accuracy.
figure 1
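One possible reading of "adjusted by the number of features" is to remove the linear effect of tree size from both quantities before plotting them. The sketch below assumes that reading; the function name `residualize` is hypothetical and the per-tree numbers are made up for illustration, not results from this project.

```python
import numpy as np


def residualize(values, covariate):
    """Remove the linear effect of `covariate` from `values` (least squares)."""
    values = np.asarray(values, dtype=float)
    covariate = np.asarray(covariate, dtype=float)
    design = np.column_stack([np.ones_like(covariate), covariate])
    coef, *_ = np.linalg.lstsq(design, values, rcond=None)
    return values - design @ coef


# Made-up per-tree summaries, for illustration only.
n_features_in_tree = np.array([5, 8, 6, 10, 7, 9])
n_true_predictors = np.array([1, 3, 1, 4, 3, 2])
tree_accuracy = np.array([0.55, 0.70, 0.58, 0.74, 0.69, 0.66])

# Adjusted quantities: these would go on the two axes of the scatter plot.
x_adj = residualize(n_true_predictors, n_features_in_tree)
y_adj = residualize(tree_accuracy, n_features_in_tree)

print(np.round(x_adj, 3))
print(np.round(y_adj, 3))
```

Plotting `x_adj` against `y_adj` shows whether trees with more true predictors are more accurate than trees of the same size with fewer.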