Discretization before or after Feature Selection?

green_tea, New Altair Community Member
edited November 5 in Community Q&A
Hello Rapidminer community,
I posted this question yesterday evening as well, but it somehow disappeared after I edited it. I'm not sure if it will come back, so I thought I would ask again.

I have the following situation: I have a labelled dataset with 80+ features and ~3 million rows. I want to perform feature selection to get the ~10 most relevant features. The resulting features then have to be discretized, because I can only allow a limited number of distinct values. For example, if a feature has values between 0 and 100, I have to discretize it into 2-5 bins. I am now unsure whether I have to discretize all 80 features first and then do the feature selection, or whether I can discretize only the 10 most relevant features afterwards. How would this affect my result? I greatly appreciate your answers and explanations!
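
For readers who want to try the two orderings side by side outside of RapidMiner, here is a minimal sketch in Python with scikit-learn (an assumption, since the thread itself is about RapidMiner processes); the synthetic data, the five bins, and mutual information as the selection criterion are illustrative choices only.

```python
# Sketch of the two orderings, with scikit-learn standing in for RapidMiner.
# Data, bin counts, and the selection criterion are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import KBinsDiscretizer

X, y = make_classification(n_samples=5000, n_features=80, n_informative=10,
                           random_state=42)

# Option A: discretize all 80 features first, then select the 10 best.
pipe_a = Pipeline([
    ("discretize", KBinsDiscretizer(n_bins=5, encode="ordinal", strategy="quantile")),
    ("select", SelectKBest(mutual_info_classif, k=10)),
])
pipe_a.fit(X, y)
selected_a = np.flatnonzero(pipe_a.named_steps["select"].get_support())

# Option B: select the 10 best raw features, then discretize only those.
selector_b = SelectKBest(mutual_info_classif, k=10).fit(X, y)
selected_b = np.flatnonzero(selector_b.get_support())
X_b = KBinsDiscretizer(n_bins=5, encode="ordinal",
                       strategy="quantile").fit_transform(X[:, selected_b])

print("Selected after discretizing first:", selected_a)
print("Selected on raw features:         ", selected_b)
```

The point of the comparison is only that the two orderings can select different feature sets, which is what the answers below discuss.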

Best Answers

  • Telcontar120, New Altair Community Member
    Answer ✓
    The issue is that features can behave quite differently after discretization than in their raw form. Discretization both masks information and transforms the input space. While it is "allowable" to do it either way, I think you would need to be quite careful if you did feature selection first, because the features you selected will not necessarily have the same relationship to your label after you transform them subsequently.

    It also matters what types of models you are using for both feature selection and your subsequent work. Some modeling algorithms inherently discretize their continuous inputs (think tree-based algorithms), in which case your selection can probably be done afterwards, based on what was used in the initial screening; in that case you will be better off reusing the splits those trees find when doing your discretization. Other approaches fit functional relationships (think linear regression or neural networks), in which case a discretized input can behave very differently from its raw form.
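
As a concrete illustration of that last point, here is a small sketch that pulls the cut points a shallow tree finds on a single feature and reuses them as bin edges; scikit-learn's DecisionTreeClassifier is an assumption here (RapidMiner's Decision Tree exposes its splits in the model view rather than via code), and the data and tree depth are made up.

```python
# Sketch: reuse the split thresholds a shallow tree finds as bin boundaries.
# scikit-learn is an assumption here; the data and tree depth are illustrative.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=5000, n_features=5, n_informative=3,
                           random_state=0)
feature_idx = 0

# A shallow tree on a single feature yields only a handful of cut points.
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(X[:, [feature_idx]], y)

# Internal nodes have feature index >= 0; their thresholds are the cut points.
internal = tree.tree_.feature >= 0
cut_points = np.sort(tree.tree_.threshold[internal])

# Discretize the feature using those cut points as bin edges.
binned = np.digitize(X[:, feature_idx], cut_points)
print("Cut points found by the tree:", cut_points)
print("Bin counts:", np.bincount(binned))
```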

Answers

  • MartinLiebig, Altair Employee
    I would agree with @lionelderkrikor, but with a bit less "force". I think it's statistically legal to do both. But I don't see any reason to do FS on a different feature representation than the one you use for learning?

    BR,
    Martin
  • Maerkli, New Altair Community Member
    Hello Green_Tea,

    Martin and Lionel are two RapidMiner authorities, so I can't contradict them. However, I would recommend watching this training given by Markus Hofmann, another senior RM person: https://www.youtube.com/watch?v=Nmo5puHRBwE
    Maerkli



  • green_tea, New Altair Community Member
    First of all, thanks for the very fast replies and explanations!
    As @mschmitz asked: "But I don't see any reason to do FS on a different feature representation than the one you use for learning?"
    I will actually not use the resulting dataset for learning; instead, I will combine the selected features into an "activity key" that I have to use in another tool. That is also the reason why I have to discretize the features: too many distinct values would limit the usability of that key. By doing the discretization afterwards, I would save a lot of work.
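
A minimal sketch of such a key, assuming hypothetical column names and bin labels and using pandas purely for illustration:

```python
# Sketch: combining a few discretized features into a single "activity key".
# Column names and bin labels are hypothetical; pandas is used only to show
# why fewer bins keep the number of distinct keys manageable.
import pandas as pd

df = pd.DataFrame({
    "duration_bin": ["low", "high", "mid"],
    "amount_bin":   ["0-50", "50-100", "0-50"],
    "channel":      ["web", "app", "web"],
})

# Concatenate the discretized values row by row into one key.
df["activity_key"] = df[["duration_bin", "amount_bin", "channel"]].agg("_".join, axis=1)
print(df["activity_key"].tolist())      # e.g. ['low_0-50_web', 'high_50-100_app', ...]
print("Distinct keys:", df["activity_key"].nunique())
```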


  • MartinLiebig, Altair Employee

    For the Dec-Tree example: if you discretize first, you enforce specific splits (your bin boundaries). This changes what the tree can do and reduces its ability to find its own splits; it is a kind of quasi-pre-pruning. So it makes a big difference for a tree whether you discretize beforehand or not (see the small sketch after this post).

    BR,
    Martin
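
A small sketch of that pre-pruning effect, with scikit-learn standing in for RapidMiner's Decision Tree operator and purely illustrative data, depth, and bin count (so the exact numbers will vary):

```python
# Sketch: binning before a tree acts like pre-pruning, because the tree can
# then only split at the bin boundaries. Data, depth, and bin count are
# illustrative; scikit-learn stands in for RapidMiner's Decision Tree.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import KBinsDiscretizer
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=5000, n_features=10, n_informative=5,
                           random_state=1)
tree = DecisionTreeClassifier(max_depth=5, random_state=1)

# Raw inputs: the tree may place its thresholds anywhere in the data.
raw_score = cross_val_score(tree, X, y, cv=5).mean()

# Coarsely binned inputs: only 3 candidate cut points per feature remain.
X_binned = KBinsDiscretizer(n_bins=4, encode="ordinal",
                            strategy="quantile").fit_transform(X)
binned_score = cross_val_score(tree, X_binned, y, cv=5).mean()

print(f"CV accuracy on raw features:   {raw_score:.3f}")
print(f"CV accuracy on 4-bin features: {binned_score:.3f}")
```
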
  • Telcontar120, New Altair Community Member
    @mschmitz Agreed, that is why I said that if you are using a tree method, it would be better to do the modeling first and then use the splits found by the tree for discretization. Sorry if that was not clear.

  • green_tea, New Altair Community Member
    Thanks for the input!
    I decided to discretize first and am doing the feature selection right now. I will probably also do the same evaluation without discretization to see how much of a difference it makes.