
"Assessing features performance on different datasets"

User: "ollestrat"
Updated by Jocelyn
Hello,

My question is:
How can I identify the features that work best across several different datasets? In other words, the features should be robust, transferable, and independent of the specific characteristics of any individual dataset.

My data:
- two-class problem
- 7 datasets sharing the same set of about 50 numerical features (value ranges can differ significantly between datasets, but the goal is not to find robust thresholds; it is to identify the key features that perform well across all datasets)
- each dataset has about 5000 instances for training and testing

My ideas so far:
- for each of the 7 datasets, select an optimal feature subset (e.g. by a wrapper feature selection) and simply count how often each feature occurs across all 7 results
- also calculate the "information gain" of each feature on the individual datasets; the average over all 7 datasets will hopefully reveal the robust features (?)
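
To make the two ideas above concrete, here is a minimal sketch in plain Python (not RapidMiner operators). It assumes each numeric feature is binarized with a median split before information gain is computed, which is one simple way to make the score independent of each dataset's value range; RapidMiner's own information gain weighting may discretize differently. The function names, the `(columns, labels)` layout, and the toy data are all made up for illustration.

```python
import math
from collections import Counter
from statistics import mean, median


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """Information gain of a numeric feature after a median split.

    The median split is an assumption of this sketch; it keeps the
    score independent of the value range of each dataset.
    """
    cut = median(values)
    low = [y for v, y in zip(values, labels) if v <= cut]
    high = [y for v, y in zip(values, labels) if v > cut]
    n = len(labels)
    remainder = sum(len(part) / n * entropy(part) for part in (low, high) if part)
    return entropy(labels) - remainder


def average_gain(datasets):
    """Mean information gain per feature over all datasets.

    `datasets` is a list of (columns, labels) pairs, where `columns`
    maps a feature name to its list of values.
    """
    names = datasets[0][0].keys()
    return {
        name: mean(information_gain(cols[name], labels) for cols, labels in datasets)
        for name in names
    }


def selection_counts(selected_subsets):
    """How often each feature appears in the per-dataset selected subsets."""
    return Counter(name for subset in selected_subsets for name in subset)


# Toy data (made up): "f_good" separates the classes in both datasets
# despite very different ranges, "f_noise" does not.
labels = [0, 0, 0, 1, 1, 1]
toy_datasets = [
    ({"f_good": [1, 2, 3, 7, 8, 9], "f_noise": [5, 1, 9, 5, 1, 9]}, labels),
    ({"f_good": [100, 110, 120, 500, 510, 520], "f_noise": [3, 8, 1, 8, 3, 1]}, labels),
]
gains = average_gain(toy_datasets)
ranking = sorted(gains, key=gains.get, reverse=True)

# Hypothetical wrapper-selection results, one subset per dataset.
counts = selection_counts([{"f_good", "f_noise"}, {"f_good"}])
```

With 7 real datasets, `ranking` would order the roughly 50 features by their average gain, and `counts` would show how many of the 7 wrapper runs picked each feature. A feature that scores well on both is a plausible candidate for being robust.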


Do you think these ideas are worth pursuing? Can you point out possible problems, improvements, suitable RapidMiner algorithms, etc.? I'm relatively new to RM and data mining.

Thanks and Greetings
ollestrat
