Random Forest Project


Design weighted random forest

Manipulate the random forest at the tree level, using the importance score and multi-match information of each feature.

For each tree, compute a weight from the importance scores and multi-match information of the features it uses.

For each subject, compute the prediction based on the out-of-bag (OOB) predictions.
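A minimal sketch of a weighted out-of-bag vote, in plain Python. The tree weights are taken as given here, and the function name, the input layout (`tree_preds`, `in_bag`, `weights`) and the toy numbers are all assumptions made for illustration, not the project's actual code.

```python
def weighted_oob_vote(tree_preds, in_bag, weights):
    """Weighted majority vote per subject, using only out-of-bag trees.

    tree_preds[t][i]: class label predicted by tree t for subject i
    in_bag[t][i]:     True if subject i was in tree t's bootstrap sample
    weights[t]:       non-negative weight of tree t (assumed precomputed)
    """
    n_subjects = len(tree_preds[0])
    predictions = []
    for i in range(n_subjects):
        votes = {}
        for t, preds in enumerate(tree_preds):
            if in_bag[t][i]:
                continue  # tree t saw subject i during training, so skip it
            votes[preds[i]] = votes.get(preds[i], 0.0) + weights[t]
        # None when the subject was in-bag for every tree
        predictions.append(max(votes, key=votes.get) if votes else None)
    return predictions


# Toy inputs: 3 trees, 3 subjects (illustrative values only)
tree_preds = [[1, 0, 1], [0, 0, 1], [0, 1, 1]]
in_bag = [[False, True, True], [False, False, True], [False, False, True]]
weights = [0.7, 0.3, 0.2]
print(weighted_oob_vote(tree_preds, in_bag, weights))  # [1, 0, None]
```

For the first subject an unweighted vote would pick class 0 (two trees against one), while the weighted vote picks class 1 because the single dissenting tree carries more weight.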

Manipulate info at tree level

Extract the prediction for each sample at each tree.

Get the feature usage information at each tree.
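The two tree-level steps can be sketched on a toy forest. The nested-dict tree layout below is a hypothetical stand-in so the example runs without dependencies; a real random forest library would expose its own structures for the fitted trees, and the helper names `predict_one` and `features_used` are assumptions.

```python
def predict_one(node, x):
    """Walk a toy tree: internal nodes are dicts, leaves are class labels."""
    while isinstance(node, dict):
        if x[node["feature"]] <= node["threshold"]:
            node = node["left"]
        else:
            node = node["right"]
    return node


def features_used(node):
    """Collect the set of feature indices that a tree splits on."""
    if not isinstance(node, dict):
        return set()
    return ({node["feature"]}
            | features_used(node["left"])
            | features_used(node["right"]))


# Hypothetical two-tree forest and two samples with three features each
forest = [
    {"feature": 0, "threshold": 0.5, "left": 0,
     "right": {"feature": 2, "threshold": 1.0, "left": 0, "right": 1}},
    {"feature": 1, "threshold": 0.0, "left": 1, "right": 0},
]
X = [[0.2, -1.0, 3.0], [0.9, 0.5, 2.0]]

# per_tree_preds[t][i]: prediction of tree t for sample i
per_tree_preds = [[predict_one(tree, x) for x in X] for tree in forest]
# per_tree_features[t]: sorted feature indices used by tree t
per_tree_features = [sorted(features_used(tree)) for tree in forest]

print(per_tree_preds)     # [[0, 1], [1, 0]]
print(per_tree_features)  # [[0, 2], [1]]
```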


Generate simulation

Generate a dataset with features and predictors; the predictors are used to create y. Pathways are small datasets composed of randomly sampled features and predictors. We create an artificial true pathway by fixing the number of predictors it contains.

This simulates metabolomics data, in which features match multiple pathways owing to the LC-MS m/z matching bias.
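A rough sketch of such a simulation, using only the Python standard library. All sizes, the normal feature distribution, and the rule that turns predictors into y (sign of a noisy sum) are assumptions chosen for illustration; the original setup may differ.

```python
import random


def simulate(n_subjects=200, n_features=50, n_predictors=5,
             n_pathways=10, pathway_size=8, n_true_in_pathway=3, seed=0):
    rng = random.Random(seed)
    # Feature matrix of standard normal draws (assumed distribution)
    X = [[rng.gauss(0, 1) for _ in range(n_features)]
         for _ in range(n_subjects)]
    # Predictors: the subset of features that actually drives y
    predictors = rng.sample(range(n_features), n_predictors)
    # Binary outcome: sign of the noisy sum of the predictors (assumed rule)
    y = [int(sum(row[j] for j in predictors) + rng.gauss(0, 1) > 0)
         for row in X]
    # Random pathways: feature subsets, so one feature can fall in several
    pathways = [rng.sample(range(n_features), pathway_size)
                for _ in range(n_pathways)]
    # Artificial true pathway: a fixed number of predictors plus non-predictors
    non_predictors = [j for j in range(n_features) if j not in predictors]
    true_pathway = (rng.sample(predictors, n_true_in_pathway)
                    + rng.sample(non_predictors,
                                 pathway_size - n_true_in_pathway))
    pathways.append(true_pathway)
    # Multi-match count: number of pathways each feature belongs to
    match_count = [sum(j in p for p in pathways) for j in range(n_features)]
    return X, y, predictors, pathways, match_count


X, y, predictors, pathways, match_count = simulate()
multi_match = [j for j, m in enumerate(match_count) if m > 1]
print(len(pathways), "pathways;", len(multi_match), "multi-match features")
```

Features with a count above 1 in `match_count` play the role of multi-match features.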

By lowering the weight of trees that contain multi-match features, we hope to suppress false pathways whose accuracy is artificially high owing to multi-matching. True pathways related to disease can then be distinguished from the false ones.
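One way such a penalty could look is sketched below. The formula is purely hypothetical (the mean of each feature's importance divided by its number of matching pathways); it is not the weight used in the project, only an illustration of how multi-match features can pull a tree's weight down. The feature names and values are invented.

```python
def tree_weight(features, importance, match_count):
    """Hypothetical weight: mean importance of the tree's features,
    each divided by the number of pathways that feature matches."""
    if not features:
        return 0.0
    scaled = [importance[f] / max(match_count[f], 1) for f in features]
    return sum(scaled) / len(scaled)


# Invented importance scores and multi-match counts
importance = {"f1": 0.4, "f2": 0.4, "f3": 0.2}
match_count = {"f1": 1, "f2": 4, "f3": 1}

w_unique = tree_weight(["f1", "f3"], importance, match_count)  # no multi-match
w_multi = tree_weight(["f2", "f3"], importance, match_count)   # f2 matches 4
print(w_unique > w_multi)  # True
```

The two trees use features of equal raw importance, but the one relying on the multi-match feature `f2` ends up with half the weight.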


Test the effectiveness

Plot the number of true predictors in each tree vs. accuracy, adjusted by the number of features in each tree.

The plot shows that the more true predictors a tree contains, the higher its accuracy.

Figure 1
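One possible reading of "adjusted by the number of features" is a partial correlation: regress both quantities on the number of features per tree and correlate the residuals. The sketch below takes that reading as an assumption; the per-tree numbers are made up to show the call and are not results from the project.

```python
def residuals(y, x):
    """Residuals of a simple least-squares regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return [b - (my + slope * (a - mx)) for a, b in zip(x, y)]


def pearson(u, v):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sum((a - mu) ** 2 for a in u) ** 0.5
    sv = sum((b - mv) ** 2 for b in v) ** 0.5
    return cov / (su * sv)


def partial_corr(n_true, accuracy, n_features):
    """Correlation of n_true and accuracy after removing the linear
    effect of n_features from both."""
    return pearson(residuals(n_true, n_features),
                   residuals(accuracy, n_features))


# Made-up per-tree values, for illustration only
n_features = [3, 3, 5, 5, 7, 7]
n_true = [0, 2, 1, 3, 1, 4]
accuracy = [0.50, 0.62, 0.55, 0.68, 0.54, 0.72]

r = partial_corr(n_true, accuracy, n_features)
print(round(r, 3))
```

A value of `r` near 1 would correspond to the pattern described above: among trees with the same number of features, those with more true predictors are more accurate.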