"Text mining ( creat a bag of word)"

nabilophone11 New Altair Community Member
edited November 5 in Community Q&A
Hi everybody,

Could you please tell me how to insert a list of words (about 600) automatically as attributes in RapidMiner 5.1.11? The only way I have found is a manual one, with "Generate Attributes".

Thanks for your help.


N

Answers

  • colo
    colo New Altair Community Member
    Hi,

    This all depends on how your data is available. You probably have the words in some kind of list (Excel, CSV, database)? You can use the import wizard or import operators like "Read Database" to import the data into RapidMiner. If you have the words as plain text, you need operators from the Text Processing extension like "Read Document" or "Process Documents from Data". But you will have to provide more details for further help.

    Regards
    Matthias
  • nabilophone11
    nabilophone11 New Altair Community Member
    Thanks Matthias,

    Actually, I have an Excel file with 30,000 rows and 3 attributes: id, a text attribute (address), and a label (yes/no). I have about 600 words that help me decide "yes" for an address, so I want to create 600 attributes automatically. I have no problem importing the data; my problem is how to create the 600 attributes (I didn't find a way to create a list of words) :'(

    Thanks

    N
  • nabilophone11
    nabilophone11 New Altair Community Member
    Please, I need help :'(
  • colo
    colo New Altair Community Member
    Hi,

    I'm not really sure about your intention. Do you need the RapidMiner wordlist format? If so, the only way I know to create one is by creating word vectors with one of the "Process Documents" operators. What do you mean by "help me to say yes"? Which classifier do you want to use?

    Regards
    Matthias
  • nabilophone11
    nabilophone11 New Altair Community Member
    Hi,

    I want to create a matrix with these 600 attributes: if one of them is true, my class (label) is positive, otherwise negative. Sorry about my English. I didn't find a way to insert all these attributes automatically. I think the RapidMiner wordlist format can help, so I tried to install WVTool, but it doesn't work with RapidMiner 5.1.11?
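    To make the rule above concrete, here is a minimal Python sketch (outside RapidMiner) that labels an address as positive when it contains at least one word from the list. The function names and the sample keywords are hypothetical, and the tokenization (lowercase, split on word characters) is an assumption that may differ from RapidMiner's tokenizer.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into runs of word characters."""
    return re.findall(r"\w+", text.lower())


def rule_label(address: str, keywords: set[str]) -> str:
    """Return "yes" if at least one keyword occurs in the address, otherwise "no"."""
    # Keywords are expected in lowercase, matching the tokenizer output.
    return "yes" if any(token in keywords for token in tokenize(address)) else "no"


# Hypothetical keywords standing in for the ~600-word list.
keywords = {"street", "avenue"}
print(rule_label("12 Main Street", keywords))  # yes
print(rule_label("PO Box 4", keywords))        # no
```

    If the label really follows this rule exactly, it can be computed directly without training a classifier.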

    Regards,

    N

  • colo
    colo New Altair Community Member
    Hi,

    Creating a wordlist for these words should be possible by writing them into a single document (e.g. one word per line, or separated by some other whitespace), importing it into RapidMiner, and creating a word vector using "Process Documents" (with tokenization inside). The "Process Documents" operator should deliver the desired wordlist. But I doubt this will really help you, since your classifier seems to depend only on the word lookup. I'm not sure which approach would make sense, and my time is limited at the moment... sorry.
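    The wordlist idea above amounts to splitting one document on whitespace and keeping each distinct word once. A minimal Python sketch of that step, assuming lowercase normalization and a hypothetical input string; it illustrates the idea only and is not how the "Process Documents" operator is implemented.

```python
def build_wordlist(document: str) -> list[str]:
    """Split a document on any whitespace, lowercase each word, and deduplicate."""
    # str.split() without arguments splits on runs of spaces, tabs and newlines.
    return sorted({word.lower() for word in document.split()})


# One word per line, or separated by other whitespace (hypothetical content).
words_file = "rue\navenue boulevard\nRue\n"
print(build_wordlist(words_file))  # ['avenue', 'boulevard', 'rue']
```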

    Perhaps someone else may help?

    Regards
    Matthias
  • nabilophone11
    nabilophone11 New Altair Community Member
    Thanks, Matthias.

    After creating the word list, I'm thinking about using an SVM model for learning. I will let you know about the result.


    Best,

    N
  • nabilophone11
    nabilophone11 New Altair Community Member
    Now I have my 800 attributes (bag of words)... success! But I'm not finished yet, because I have to find a way to get a matrix with 0/1 for every attribute of my bag of words.

    Do you have an idea of the best way to get this result?
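    The 0/1 matrix asked for here is usually called a binary term-occurrence matrix: one column per word in the bag of words, with 1 if the word appears in the row's text and 0 otherwise. In the Text Processing extension, the "Process Documents" operators should offer this through their vector creation setting ("Binary Term Occurrences"). The minimal Python sketch below only shows what such a matrix contains; the function name, vocabulary and addresses are hypothetical.

```python
import re


def binary_matrix(texts: list[str], vocabulary: list[str]) -> list[list[int]]:
    """Return one row per text with 1 where the vocabulary word occurs in it, else 0."""
    rows = []
    for text in texts:
        # Lowercased word tokens; vocabulary words are expected in lowercase too.
        tokens = set(re.findall(r"\w+", text.lower()))
        rows.append([1 if word in tokens else 0 for word in vocabulary])
    return rows


vocabulary = ["avenue", "rue", "street"]
addresses = ["10 rue de la Paix", "5th Avenue", "PO Box 12"]
for row in binary_matrix(addresses, vocabulary):
    print(row)
```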

    Best,


    N
  • nabilophone11
    nabilophone11 New Altair Community Member
    Hi everybody,

    I got a result with 10% error. I'm trying to improve my model; do you have any suggestions?

    I want to know how to get a new attribute that gives me, for each row, all the attributes with value = true. Is that possible?

    Thank you for your help

    Best,

    N
  • nabilophone11
    nabilophone11 New Altair Community Member
    Hi,

    Is it possible in RapidMiner to create a result attribute that groups all the attributes with value = Yes?


    Example:
                ID  Label  AT1  AT2  AT3  AT4  ...  (what I need)
        row1:   1   Yes    Yes  No   Yes  No   ...  AT1, AT3
        row2:   2   Yes    No   No   Yes  No   ...  AT3
        ...
    Please, I need your help! Thank you very much!
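    For reference, the logic of the requested "(what I need)" column is simple: for each row, join the names of the attributes whose value is Yes. A minimal Python sketch using the two example rows from this post; the function name is hypothetical, and the comparison ignores case so that YES and Yes are treated alike.

```python
def true_attributes(row: dict, attributes: list[str], true_value: str = "yes") -> str:
    """Join the names of the attributes whose value equals true_value, ignoring case."""
    target = true_value.lower()
    return ", ".join(a for a in attributes if str(row[a]).lower() == target)


rows = [
    {"ID": 1, "Label": "Yes", "AT1": "YES", "AT2": "NO", "AT3": "Yes", "AT4": "NO"},
    {"ID": 2, "Label": "Yes", "AT1": "NO", "AT2": "NO", "AT3": "Yes", "AT4": "NO"},
]
attributes = ["AT1", "AT2", "AT3", "AT4"]
for row in rows:
    print(row["ID"], true_attributes(row, attributes))
# 1 AT1, AT3
# 2 AT3
```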