General Results
naxiota
New Altair Community Member
Hi, I am new to RapidMiner and I am learning how to use it correctly. I built a simple tokenizing process, but the results look quite different from the videos and instructions I have seen. In more detail: I used two operators (Read Document and Process Documents) with Tokenize in the sub-process, but I am receiving these results, with an awkward layout. I am attaching a screenshot. Do you know what I am doing wrong? Why am I getting this result?
Thanks!
Best Answer
-
Hello @naxiota,
Ok, let's see how to help you. Since you are learning, I'd prefer that you follow along and repeat the steps with me:
You have one file. Here I used a version of the Holy Bible (please see my disclaimer at the end), so I used the Open File operator to read it, extracted the content with the Read Document operator, and then passed it as a document to Process Documents.
Inside the Process Documents super-operator (the kind of operator that lets you nest other operators inside it), I did this:
Well...
- Tokenizing on its own is boring: it is mostly an internal representation of the words, so you won't get anything meaningful from that step alone.
- Also, it is a good idea to Transform Cases, because it brings together words written with mixed cases (those are not considered the same under normal comparison algorithms).
- Now on to something useful: let's extract all the nouns from the content. You may be able to do many other things, but this is a simple exercise. You can look up the POS tags to filter on in this list: https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html
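If it helps to see the three steps above outside of RapidMiner, here is a rough Python sketch of the same idea. It is only an illustration under assumptions, not what the operators do internally: the function names (`tokenize`, `transform_cases`, `filter_by_pos`) are made up for this example, tokens are split on non-letter characters, and the POS tags are assigned by hand rather than by a real tagger. The noun tags (NN, NNS, NNP, NNPS) come from the Penn Treebank list linked above.

```python
import re
from collections import Counter

# Penn Treebank noun tags: singular, plural, proper singular, proper plural.
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}


def tokenize(text):
    """Split text into tokens on any run of non-letter characters (assumed mode)."""
    return [tok for tok in re.split(r"[^A-Za-z]+", text) if tok]


def transform_cases(tokens):
    """Lowercase every token so that 'Cat' and 'cat' are counted together."""
    return [tok.lower() for tok in tokens]


def filter_by_pos(tagged_tokens, keep_tags=NOUN_TAGS):
    """Keep only the tokens whose POS tag is in keep_tags."""
    return [tok for tok, tag in tagged_tokens if tag in keep_tags]


text = "The Cat sat on the mat. The cat saw a Dog."

raw_tokens = tokenize(text)
tokens = transform_cases(raw_tokens)

# Without case transformation 'Cat' and 'cat' are two different words.
raw_counts = Counter(raw_tokens)
counts = Counter(tokens)

# Hand-tagged tokens for the first sentence (a real process would use a tagger).
tagged = [
    ("the", "DT"), ("cat", "NN"), ("sat", "VBD"),
    ("on", "IN"), ("the", "DT"), ("mat", "NN"),
]
nouns = filter_by_pos(tagged)

print(counts.most_common(2))
print(nouns)
```

Comparing `raw_counts` with `counts` shows why the case transformation matters, and `nouns` shows what is left once everything except the noun tags is filtered out.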
I added the file, but please try figuring things out with the pictures I sent and the explanations first; you will learn more that way.
Hope this helps,
Rodrigo.
Answers
-
Thank you, Rodrigo! Do you have any tips on which documents or guides I could use to improve my use of RapidMiner? Many thanks!
-
Hello,
It all depends on what you are trying to accomplish. The best way is to check the tutorials available in your Repository tab. Since RapidMiner is huge, I would recommend taking on a project of your own and learning how to transform data.
I can provide you with a few things, but currently I'm at the office. Ping me later!