Search a word list in a database

keskinov New Altair Community Member
edited November 5 in Community Q&A
Hi everyone,

I have been trying for some time to find out how to search for a word list in a database with RapidMiner, but I cannot find a solution. I would be extremely thankful if somebody could help me.

I'm working with a database of 25,000 rows containing a relatively large amount of text data. I have extracted a word list with the top 10% most frequent words in this database. The total count of unique words is around 35,000. The most frequent words (those occurring more than 100 times) number around 3,500. I need to find out which rows contain each word from the most-frequent list. In other words, I need to create a binary matrix with 25,000 rows and 3,500 columns, where each entry says whether a given word appears in a given row. Can I do that with RapidMiner?
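To make the desired output concrete, here is a minimal sketch in plain Python (not RapidMiner) of the kind of matrix I mean. The tokenizer, the function names `tokenize`, `top_words` and `binary_matrix`, and the three sample rows are all assumptions made up for illustration; my real data has 25,000 rows.

```python
import re
from collections import Counter

def tokenize(text):
    # Assumption: a deliberately simple tokenizer (lowercase, ASCII letters only).
    return re.findall(r"[a-z]+", text.lower())

def top_words(rows, fraction=0.10):
    # Count total occurrences of every word across all rows,
    # then keep the given fraction of the unique words, most frequent first.
    counts = Counter(tok for row in rows for tok in tokenize(row))
    keep = max(1, int(len(counts) * fraction))
    return [word for word, _ in counts.most_common(keep)]

def binary_matrix(rows, vocabulary):
    # One list per row, one column per vocabulary word:
    # 1 if the word occurs in the row at least once, otherwise 0.
    matrix = []
    for row in rows:
        present = set(tokenize(row))
        matrix.append([1 if word in present else 0 for word in vocabulary])
    return matrix

# Hypothetical sample data standing in for the database rows.
rows = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "a bird sang",
]
vocab = top_words(rows, fraction=0.25)   # 10 unique words, so 2 are kept
matrix = binary_matrix(rows, vocab)
print(vocab)
print(matrix)
```

Each row of `matrix` corresponds to one database row and each column to one word from the frequent-word list, which is the structure I would like RapidMiner to produce.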

Thank you very much in advance!

Answers

  • keskinov
    keskinov New Altair Community Member
    Hi everyone,

    I have one more question. Does anybody know of a rule, or an example from an article, that states what percentage of the most frequent words in a database should be analyzed in order to get the most reliable results (e.g. the top 10%)?

    Thank you!