
Using Word2Vec with LSTM

User: "mhm"
New Altair Community Member
Updated by Jocelyn
Hi everyone! 

I am new to RapidMiner; all my background is in Python. I will explain my problem, but unfortunately I can't provide any images right now. I followed some tutorials on creating and saving a Word2Vec model (or, as another option, downloading a pre-trained model). However, I have a huge corpus of around 100,000 records, so I am sure it contains a very large number of distinct words. Yet the model shows me only around 2,000 words, even when I lower the window size and the minimum word frequency. This is the first problem.

Now the second problem. I used the Word2Vec model I built with those 2,000 words. Then I followed some tutorials on how to use embedding layers and Text to Embedding ID. They used a format with four columns (ID, batch, word, label): they tokenized each sentence and put each token in a new row. I did my best to reproduce that format, but I still ran into two issues. First, this format takes up huge space when the data is large. Second, when I use my Word2Vec model with Text to Embedding ID, it replaces all the words with -2. I don't know why, or what -2 means here.
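Since your background is in Python, here is a minimal sketch of how Word2Vec-style vocabulary building typically works, which may explain both symptoms. This is an illustration, not RapidMiner's actual implementation: words below a minimum frequency are dropped from the vocabulary (so a 100,000-record corpus can still yield only ~2,000 kept words), and any token not in the vocabulary is mapped to a sentinel ID. The assumption that -2 is RapidMiner's out-of-vocabulary sentinel is a guess; if *every* word maps to -2, the likely cause is that the tokens fed to Text to Embedding ID do not match the model's vocabulary at all (e.g. case or tokenization differences).

```python
from collections import Counter

def build_vocab(sentences, min_count=5):
    """Build a vocabulary the way Word2Vec does: words that occur
    fewer than min_count times are dropped entirely."""
    counts = Counter(w for s in sentences for w in s)
    kept = sorted(w for w, c in counts.items() if c >= min_count)
    return {w: i for i, w in enumerate(kept)}

UNK_ID = -2  # assumption: sentinel ID for out-of-vocabulary tokens

def to_ids(tokens, vocab):
    """Replace each token with its vocabulary ID, or the sentinel
    if the token is not in the vocabulary."""
    return [vocab.get(t, UNK_ID) for t in tokens]

corpus = [
    ["the", "cat", "sat"],
    ["the", "dog", "sat"],
    ["the", "cat", "ran"],
]
vocab = build_vocab(corpus, min_count=2)
print(sorted(vocab))                       # 'dog' and 'ran' were seen once -> dropped
print(to_ids(["the", "dog", "sat"], vocab))  # 'dog' becomes the sentinel
print(to_ids(["THE", "CAT"], vocab))         # case mismatch -> everything is the sentinel
```

Note the last line: a simple case or tokenization mismatch makes *all* tokens out-of-vocabulary, which would produce exactly the "everything is -2" behaviour you describe.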

If anyone has done text classification with deep learning and Word2Vec, I would appreciate their support. I really need a solution for these problems, or at least an example of how to do it in RapidMiner. I am on RapidMiner version 9.10.4.
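For reference, the one-token-per-row layout from the tutorials can be sketched as below. The column semantics are assumptions reconstructed from your description (ID = document index, batch = a grouping of documents, word = one token, label = the document's class); the exact meaning of "batch" in the RapidMiner operator may differ. The sketch makes the space problem concrete: the table has one row per *token*, so its size is the total token count of the corpus, not the record count.

```python
def to_long_format(docs, labels, batch_size=2):
    """Flatten tokenized, labeled documents into a 4-column
    (id, batch, word, label) table, one row per token."""
    rows = []
    for doc_id, (tokens, label) in enumerate(zip(docs, labels)):
        for tok in tokens:
            # every token repeats the document id, batch, and label,
            # which is why this layout grows so quickly
            rows.append((doc_id, doc_id // batch_size, tok, label))
    return rows

docs = [["good", "movie"], ["bad", "plot", "overall"]]
labels = ["pos", "neg"]
for row in to_long_format(docs, labels):
    print(row)  # 5 rows for 2 documents: one per token
```

With 100,000 records averaging even a few dozen tokens each, this table runs into millions of rows, which matches the blow-up you observed.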

Thanks in advance!
User: "mhm" (OP)
This is the error I mentioned above: it appears when the dataset is huge. But in this image the data is small, and I don't know why the error still appeared.