General Results

Hi, I am new to RapidMiner and I am learning how to use it correctly. I built a simple tokenizing process, but the results look quite different from the videos and instructions I have seen. In more detail: I used two operators (Read Document and Process Documents), with Tokenize in the sub-process, but the results come out in an awkward layout. I am attaching a screenshot. Do you know what I am doing wrong? Why am I getting this result?

Thanks!

Accepted answers

rfuentealba

Hello @naxiota,

OK, let's see how I can help you. Since you are learning, I would like you to follow along and repeat the steps with me:

You have one file. Here I used a version of the Holy Bible (please see my disclaimer at the end), so I used the Open File operator to read it, extracted the content with the Read Document operator, and then passed it as a document to Process Documents.

Inside the Process Documents super-operator (the kind of operator that lets you put other operators inside it), I did this:

Well...

Tokenizing on its own is boring: it is more of an internal representation of words, so you won't get anything meaningful from it alone.
Also, it is a good idea to use Transform Cases, because it brings together words that differ only in case (those are not considered the same under normal comparison algorithms).
Now on to something useful: let's extract all the nouns from the content. You can do many other things, but this is a simple exercise. You can look up the POS tags to filter on in this list: https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html
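
If it helps to see the first two steps outside RapidMiner, here is a minimal Python sketch of the idea. It is not RapidMiner code and not the process from the attached file: the function names `tokenize` and `to_lower` are made up for illustration, the sample sentence is just a placeholder text, and splitting on non-letter characters is an assumption about how the tokenizer is configured.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text on runs of non-letter characters and drop empty pieces.

    Assumption: this mimics a tokenizer configured to split on non-letters.
    """
    return [token for token in re.split(r"[^A-Za-z]+", text) if token]


def to_lower(tokens):
    """Lowercase every token, the same idea as a case transformation step."""
    return [token.lower() for token in tokens]


# Placeholder sample text, used only for illustration.
text = (
    "In the beginning God created the heaven and the earth. "
    "The earth was without form."
)

tokens = tokenize(text)

# Without case folding, "the" and "The" are counted as different words.
raw_counts = Counter(tokens)

# After case folding, both spellings land in the same bucket.
folded_counts = Counter(to_lower(tokens))

print(raw_counts["the"], raw_counts["The"])
print(folded_counts["the"])
```

The first `print` shows two separate counts for "the" and "The", and the second shows the single merged count after lowercasing. That merge is the reason for putting a case transformation right after tokenizing.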

Now you can do whatever you want with natural language processing.

I added the file, but please try figuring things out from the pictures I sent and the explanations first; you will learn more that way.

Hope this helps,

Rodrigo.

example.rmp

All comments

naxiota

Thank you Rodrigo! Do you have any tips on which documents or guides I could use to get better at RapidMiner?

Many thanks!

rfuentealba

Hello,

It all depends on what you are trying to accomplish. The best way is to check the tutorials available in your Repository tab. Since RapidMiner is huge, I would recommend that you take on a project of your own and learn how to transform data.

I can provide you with a few things, but currently I’m at the office. Ping me later!