How to find the most important features in a dataset?

Christos_Karapapas New Altair Community Member
edited November 5 in Community Q&A
I have a dataset in CSV format with more than 500 columns. I imported it into a database, marking every column as polynominal since they all hold different types of information, and now I want to find which of them are the most important.

So far, I have managed to get a table of features and their weights using the Weight by "X" operator, but the problem is that in the results every feature value appears separately on its own row. What I want instead is to aggregate by feature and get a single weight for each one. I tried using the Aggregate operator, but with no luck.

As an example, this is what I get:
feature01-value05, weight:0,71
feature01-value13, weight:0,69
feature09-value03, weight:0,55

Instead, I want something like this:
feature01, weight:0,7
feature09, weight:0,55
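If the per-value weights have already been exported, collapsing them to one weight per feature is a simple group-by. The sketch below is a minimal Python illustration, not a RapidMiner process. It assumes the names follow the "feature-value" pattern shown above, with the feature name being everything before the first separator, and it averages the weights (which matches the 0,7 in the desired output). `aggregate_weights` is a hypothetical helper, and `max` is an equally plausible choice of aggregation.

```python
from collections import defaultdict
from statistics import mean


def aggregate_weights(value_weights, sep="-", agg=mean):
    """Collapse per-value weights into one weight per original feature.

    Assumes (hypothetically) that each name looks like "feature-value" and
    that the feature name is everything before the first `sep`.
    """
    grouped = defaultdict(list)
    for name, weight in value_weights.items():
        feature = name.split(sep, 1)[0]  # "feature01-value05" -> "feature01"
        grouped[feature].append(weight)
    # Apply the aggregation function (mean by default) to each feature's weights.
    return {feature: agg(ws) for feature, ws in grouped.items()}


# Per-value weights from the example above.
weights = {
    "feature01-value05": 0.71,
    "feature01-value13": 0.69,
    "feature09-value03": 0.55,
}

result = aggregate_weights(weights)
# Print features from most to least important.
for feature, w in sorted(result.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{feature}, weight:{w:.2f}")
# prints:
# feature01, weight:0.70
# feature09, weight:0.55
```

Splitting on the first separator breaks if a feature name itself contains that character, so the separator should be one that never appears in the original column names.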

Answers

  • lionelderkrikor
    lionelderkrikor New Altair Community Member
    Hi @chris_skg,

    I'm not able to reproduce the results you obtained.
    Here are the results I get by applying the Weight by Information Gain operator to the Golf dataset:

    [Screenshot: Weight by Information Gain results on the Golf dataset]

    So that we can reproduce what you observe and understand what's going on, could you please share:
     - the XML of your process or the process file (.rmp file)
     - your data

    Regards,

    Lionel


  • Christos_Karapapas
    Christos_Karapapas New Altair Community Member
    Answer ✓
    Thank you so much, Lionel!

    I finally managed to figure it out. I was getting an ArrayIndexOutOfBoundsException from the Weight by Information Gain operator because of some missing values in my dataset, so I tried various (wrong) operators to get around the problem. One of them was Nominal to Numerical, which apparently caused this behavior, presumably because its encoding turns each feature value into a separate attribute that then gets its own weight. Once I replaced it with the Replace Missing Values operator (obviously the right one for this job), everything worked as expected.
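The fix can be illustrated outside RapidMiner. Scoring each nominal attribute as a whole, after filling its missing values, yields one weight per feature rather than one per feature value. The minimal Python sketch below assumes a small hypothetical table stored as a dict of columns, with None marking a missing value. It fills gaps with the column's mode (an assumption; it is not necessarily what Replace Missing Values does by default) and computes plain, unnormalized information gain. It is not RapidMiner's implementation of Weight by Information Gain.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def impute_mode(column):
    """Replace None entries with the most frequent observed value (mode).

    Assumes the column has at least one non-missing value.
    """
    observed = [v for v in column if v is not None]
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v is None else v for v in column]


def information_gain(column, labels):
    """H(label) minus the weighted entropy of the label within each value."""
    n = len(labels)
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def weight_by_information_gain(table, label_name):
    """One weight per nominal attribute: impute missing values, then score."""
    labels = table[label_name]
    return {
        name: information_gain(impute_mode(column), labels)
        for name, column in table.items()
        if name != label_name
    }


# Hypothetical toy table: columns as lists, None marks a missing value.
table = {
    "feature01": ["a", "a", "b", "b"],
    "feature09": ["x", None, "x", "y"],
    "label": ["yes", "yes", "no", "no"],
}

for name, weight in weight_by_information_gain(table, "label").items():
    print(f"{name}, weight:{weight:.2f}")
# prints:
# feature01, weight:1.00
# feature09, weight:0.31
```

By contrast, turning a nominal attribute into one indicator column per value before weighting would produce one score per feature value, which is the per-row output described in the question.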
  • lionelderkrikor
    lionelderkrikor New Altair Community Member
    OK, @chris_skg,

    Glad you finally found a solution!

    Regards,

    Lionel