Using Word2Vec with LSTM
mhm
New Altair Community Member
Hi everyone!
I am new to RapidMiner; my background is in Python. I will explain my problem, but unfortunately I can't provide any images right now. I followed some tutorials on creating a word2vec model and saving it (or, as another option, downloading a pre-trained model). However, I have a huge corpus of around 100,000 records, so I am sure it contains a very large number of distinct words. Yet the model shows me only around 2,000 words, even when I lower the window size and the minimum word frequency. That is the first problem.
Now for the second problem. I used the word2vec model I built with those 2,000 words. I then saw some tutorials on how to use embedding layers and Text to Embedding ID. They used a format with four columns (ID, batch, word, label): they tokenized each sentence and put each token in a new row. I did my best to reproduce that format, but I still ended up with two issues. First, this format takes up a huge amount of space when the data is large. Second, when I use word2vec with Text to Embedding ID, all the words are replaced with -2, and I don't know why, or what -2 means here.
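For comparison, here is roughly how I would build the model in Python with gensim (a minimal sketch only; the parameter names and values are gensim's and are illustrative, not RapidMiner's, and `corpus` is a placeholder for my 100,000 records):

from gensim.models import Word2Vec

# corpus: my ~100,000 text records (placeholder name)
sentences = [doc.lower().split() for doc in corpus]

w2v = Word2Vec(
    sentences,
    vector_size=100,   # embedding dimension
    window=5,          # context window size
    min_count=5,       # words rarer than this are dropped from the vocabulary
    workers=4,
)

# The vocabulary only keeps words that survive the min_count filter,
# which is my guess for why I see far fewer words than expected.
print(len(w2v.wv.key_to_index))

In gensim, the min_count filter alone can cut the vocabulary down drastically, so I suspect something similar is happening in RapidMiner, but I don't know its exact rules.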
If anyone has done text classification with deep learning and word2vec, I would appreciate their support. I really need a solution for these problems, or at least an example of how to do it in RapidMiner. I am using RapidMiner version 9.10.4.
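To make it concrete, this is the kind of Python pipeline I am trying to reproduce in RapidMiner (again a rough sketch under my own assumptions: `corpus`, `labels`, and the reserved PAD/OOV ids are placeholders of mine, not RapidMiner behaviour; it reuses the `w2v` model from the sketch above):

import numpy as np
import tensorflow as tf

PAD_ID, OOV_ID = 0, 1  # reserved ids (my own convention)
vocab = {w: i + 2 for i, w in enumerate(w2v.wv.key_to_index)}  # shift past reserved ids
emb_dim = w2v.wv.vector_size

# Embedding matrix: row i holds the word2vec vector for word id i.
emb = np.zeros((len(vocab) + 2, emb_dim))
for word, idx in vocab.items():
    emb[idx] = w2v.wv[word]

def encode(tokens, max_len=50):
    # Unknown words map to OOV_ID; short texts are padded with PAD_ID.
    ids = [vocab.get(t, OOV_ID) for t in tokens][:max_len]
    return ids + [PAD_ID] * (max_len - len(ids))

net = tf.keras.Sequential([
    tf.keras.layers.Embedding(
        input_dim=emb.shape[0],
        output_dim=emb_dim,
        embeddings_initializer=tf.keras.initializers.Constant(emb),
        trainable=False,  # keep the pretrained word2vec vectors frozen
    ),
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # binary label; adjust as needed
])
net.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

X = np.array([encode(doc.lower().split()) for doc in corpus])
y = np.array(labels)  # placeholder for my labels
net.fit(X, y, epochs=3, batch_size=32)

In pipelines like this, special tokens (padding, unknown words) get reserved ids, so I wonder whether the -2 from Text to Embedding ID is something similar, but I don't know RapidMiner's convention.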
Thanks in advance!
Answers
-
This is the error I mentioned above: when the dataset is huge, this error appears. But in this image the data is small, and I don't know why the error still appeared.