Using Word2Vec with LSTM

mhm New Altair Community Member
edited November 2024 in Community Q&A
Hi everyone! 

I am new to RapidMiner; all my background is in Python. I will explain my problem, but unfortunately I can't provide any images right now.

I followed some tutorials for creating a word2vec model and saving it (the other option is to download a pre-trained model). My corpus is large, around 100,000 records, so I expect it to contain a very large number of distinct words. However, the model shows only about 2,000 words, even when I set the window size and the minimum word frequency to low values. This is the first problem.

Now the second problem. I went ahead with the 2,000-word word2vec model that I built. After that, I looked at some tutorials on how to use embedding layers and Text to Embedding ID. They used a format with 4 columns (ID, batch, word, label): each sentence is tokenized and each token is put in its own row. I did my best to reproduce that format, but I still end up with two issues. First, this format takes up a huge amount of space when the dataset is large. Second, when I use my word2vec model with Text to Embedding ID, every word is replaced with -2. I don't know why this happens or what -2 means here.
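Since the background here is Python, both symptoms can be sanity-checked outside RapidMiner. The following is a minimal sketch using only the standard library; it is not RapidMiner's or any word2vec library's API. It counts how many distinct words survive a minimum-frequency cutoff, builds a one-token-per-row table, and measures how many tokens are found in the vocabulary. All names (`tokenize`, `build_vocab`, `to_long_format`, `coverage`, `UNKNOWN_ID`) are hypothetical, the `position` column is an assumption standing in for the tutorial's "batch" column, and `-1` is just a placeholder sentinel: whether RapidMiner's -2 has the same meaning is not something this sketch establishes.

```python
import re
from collections import Counter

UNKNOWN_ID = -1  # hypothetical sentinel for "token not in vocabulary"


def tokenize(text, lowercase=True):
    """Split text into word tokens, optionally lowercasing first."""
    if lowercase:
        text = text.lower()
    return re.findall(r"[A-Za-z0-9']+", text)


def build_vocab(docs, min_count=1):
    """Map every token seen at least min_count times to an integer ID."""
    counts = Counter(tok for doc in docs for tok in tokenize(doc))
    kept = sorted(tok for tok, n in counts.items() if n >= min_count)
    return {tok: idx for idx, tok in enumerate(kept)}


def to_long_format(docs, labels, vocab, lowercase=True):
    """Return one row per token: (doc_id, position, word, embedding_id, label)."""
    rows = []
    for doc_id, (doc, label) in enumerate(zip(docs, labels)):
        for position, tok in enumerate(tokenize(doc, lowercase)):
            rows.append((doc_id, position, tok, vocab.get(tok, UNKNOWN_ID), label))
    return rows


def coverage(rows):
    """Fraction of rows whose token was found in the vocabulary."""
    if not rows:
        return 0.0
    return sum(1 for row in rows if row[3] != UNKNOWN_ID) / len(rows)


# Tiny made-up corpus, for illustration only.
docs = ["The movie was great", "The plot was weak", "Great acting, weak plot"]
labels = ["pos", "neg", "neg"]

# Vocabulary size shrinks as the minimum frequency rises.
print(len(build_vocab(docs, min_count=1)), len(build_vocab(docs, min_count=2)))

vocab = build_vocab(docs, min_count=2)
rows = to_long_format(docs, labels, vocab)
print(len(rows), coverage(rows))

# If the lookup tokens are prepared differently from the vocabulary
# (here: not lowercased), every single token misses.
mismatched = to_long_format([d.upper() for d in docs], labels, vocab, lowercase=False)
print(coverage(mismatched))
```

Running the same counts on the real corpus would show whether a vocabulary of about 2,000 words is plausible for the chosen minimum frequency. A coverage of zero in the last step is the kind of result that would make every token receive the same sentinel ID, so comparing how the model's vocabulary and the tokenized rows were preprocessed (case, tokenization) may be a useful first check.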

If anyone has done text classification with deep learning and word2vec, I would appreciate their help. I really need a solution to these problems, or at least an example of how to do this in RapidMiner. I am using RapidMiner version 9.10.4.

Thanks in advance!

Answers

  • mhm New Altair Community Member
    This is the error that I mentioned above. It appears when the dataset is huge, but in this image the data is small, and I don't know why the error appeared here too.
