"Filtering the Results"
Hey guys,
I want to compare RapidMiner with IBM OmniFind for a university project. For that I'd like to run the same scenario in both applications. Don't worry, it's a really simple one. I'll give you the description and then explain what my problem is.
Scenario:
I use the NHTSA database, which contains a large number of problem reports about cars in America. I split every report into a separate file. Now I want to compare the problem reports in a Correlation Matrix and filter it for the keyword fire. What I can see now is that there is a strong correlation between a car brand and a car part.
How to do this in RapidMiner:
I split the main file so that I have 1000 files, each containing one problem report. Then I load the files via:
Textinput->StringTokenizer->English Stopwordfilter->TokenLengthFilter->Porterstemmer.
After that I use the Correlation Matrix. The problem is that I get too much data. I want to filter the results so that only the files containing my keyword are used. In my case that is "fire". Is that possible? At the moment I get a very large Correlation Matrix but can't really use it. Plotting the results is not possible because there is too much data.
I hope that you can help me.
Cheers
Benjamin
Answers
-
OK, let me specify my wish. I'd like to filter my dataset for some keywords and then compute a CorrelationMatrix, so that when I filter for my keyword Fire I can see that there is a strong correlation between Ford and door. Maybe I have to use AttributeWeightSelection.
Please help.
-
Hi,
the solution is quite simple: just use the operator "ExampleFilter" before applying the correlation matrix and filter out all examples where the TFIDF value for the keyword (here: Fire) or its corresponding word stem is 0. After that, you should apply a "RemoveUselessAttributes" operator to filter out all attributes that are now constant. Then apply the correlation matrix.
Cheers,
Ingo
-
About the example filter: I set the parameter string to fire, but I don't really know how to set the condition class. Can you tell me what I need to set here? If I set the parameter string, every configuration I try tells me that it doesn't work with a parameter.
-
Hi,
you have to use the attribute_value_filter option of the condition_class parameter. As parameter_string you have to specify a condition. Whenever an example does not fulfill the condition, it is filtered from the example set. The following code should work for your example.
<operator name="ExampleFilter" class="ExampleFilter">
<!-- keep only the examples whose value for the attribute Fire is not 0 -->
<parameter key="condition_class" value="attribute_value_filter"/>
<parameter key="parameter_string" value="Fire&lt;&gt;0"/>
</operator>
Hope that helps,
Tobias
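For anyone reading along, the steps from this thread can be chained roughly as in the sketch below. ExampleFilter and RemoveUselessAttributes are the operators named above. The class name CorrelationMatrix and the use of default parameters for the last two operators are assumptions, so check them against your RapidMiner version. The attribute name Fire must match the column your text processing actually generates (after stemming it may be the word stem), and the < in the condition is written as &lt; so the fragment stays valid XML.

```xml
<!-- Sketch only: place these operators after the text-processing operators in your process. -->

<!-- Step 1: keep only the documents whose TFIDF value for the keyword is not 0 -->
<operator name="ExampleFilter" class="ExampleFilter">
    <parameter key="condition_class" value="attribute_value_filter"/>
    <parameter key="parameter_string" value="Fire&lt;&gt;0"/>
</operator>

<!-- Step 2: drop attributes that became constant after filtering (default parameters assumed) -->
<operator name="RemoveUselessAttributes" class="RemoveUselessAttributes"/>

<!-- Step 3: compute the correlations on the reduced example set (class name assumed) -->
<operator name="CorrelationMatrix" class="CorrelationMatrix"/>
```

With fewer examples and fewer attributes left, the resulting matrix should be small enough to inspect or plot.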