"Filtering the Results"

User: "Legacy User"
New Altair Community Member
Updated by Jocelyn
Hey guys,

For a university project I want to compare RapidMiner with IBM OmniFind. To do that, I'd like to run the same scenario in both applications. Don't worry, it's a really simple one. I'll give you the description first and then explain my problem.

Scenario:
I use the NHTSA database, which contains a large number of problem reports for cars in America. I split every report into a separate file. Now I want to compare the problem reports in a Correlation Matrix and filter it for the keyword "fire". What I can see now is that there is a strong correlation between a car brand and a car part.

How to do this in RapidMiner:

I split the main file so that I have 1000 files, each containing one problem report. Then I load the files via:
Textinput->StringTokenizer->English Stopwordfilter->TokenLengthFilter->Porterstemmer.

After that I use the Correlation Matrix. The problem is that I get far too much data. I want to filter the input so that only the files containing a given keyword are used, in my case "fire". Is that possible? At the moment I get a very large Correlation Matrix that I can't really use, and plotting the results is not possible because there is too much data.
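
To make clear what I am after, here is a minimal sketch in Python rather than RapidMiner: keep only the documents that contain the keyword, then compute pairwise correlations between term occurrences on that subset. The documents below are made-up toy data, and the function names (`filter_by_keyword`, `term_correlations`, `pearson`) are hypothetical. It uses binary term occurrence and Pearson correlation, which is an assumption about what the Correlation Matrix should compute here.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists (NaN if one is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / math.sqrt(vx * vy)


def filter_by_keyword(docs, keyword):
    """Keep only the documents whose token list contains the keyword."""
    return {name: toks for name, toks in docs.items() if keyword in toks}


def term_correlations(docs, terms):
    """Correlate binary term-occurrence vectors for every unordered pair of terms."""
    names = sorted(docs)
    vectors = {t: [1 if t in docs[n] else 0 for n in names] for t in terms}
    return {(a, b): pearson(vectors[a], vectors[b])
            for a in terms for b in terms if a < b}


# Made-up, already tokenized toy reports (not real NHTSA data).
docs = {
    "r1.txt": ["ford", "engine", "fire"],
    "r2.txt": ["ford", "engine", "fire", "smoke"],
    "r3.txt": ["toyota", "brake", "fail"],
    "r4.txt": ["toyota", "wiring", "fire"],
    "r5.txt": ["honda", "airbag"],
}

fire_docs = filter_by_keyword(docs, "fire")
corr = term_correlations(fire_docs, ["ford", "engine", "toyota", "wiring"])
for pair, value in sorted(corr.items()):
    print(pair, round(value, 3))
```

Filtering before building the matrix keeps it small, since only terms from the "fire" reports need to be considered.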

I hope that you can help me.

Cheers
Benjamin
