"Text processing operators on example set"

User: "laurahajnalka"
New Altair Community Member
Updated by Jocelyn

Hello Everyone,

 

I have several CSV files that all look the same: each has two attributes, a word list (extracted from a document) and the number of occurrences of each word. First, I have to filter them; for that, I made a stopword dictionary. Then I have to build one large matrix out of them, with the remaining words in the header and one row per document.

The "Process Documents from Files" operator works almost perfectly, but the occurrences are lost. The operator counts occurrences itself, so each value ends up as 1 or 0, depending on whether the given word is present in a document or not. How can I use the previously counted numbers instead?

I also tried the "Read CSV", "Nominal to Text" and "Process Documents from Data" operators, but this way I can't even filter the words.

I'll also need the file names in the final matrix, at the beginning of each row. I already found out how to use an existing macro, but I do not know how to create one. I would like to create a file_name macro.
I am a newbie, so if you know the answer to any of these questions, please explain it in as much detail as possible, because what is obvious to you may not be obvious to me.
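
To make the target concrete, the desired result can be sketched outside RapidMiner in plain Python. This is only an illustration of the expected output, not a RapidMiner solution. The function name `build_matrix`, the assumption that each CSV has exactly two columns (word, count) with no header row, and the assumption that the stopword set is lowercase are all hypothetical choices made for the sketch.

```python
import csv
from pathlib import Path


def build_matrix(csv_paths, stopwords):
    """Return (vocabulary, rows); each row is [file_name, count, count, ...].

    Assumes every CSV has two columns (word, count) and no header row,
    and that `stopwords` is a set of lowercase words.
    """
    per_file = {}
    for path in csv_paths:
        path = Path(path)
        counts = {}
        with path.open(newline="", encoding="utf-8") as handle:
            for record in csv.reader(handle):
                if len(record) < 2:
                    continue  # skip blank or malformed lines
                word, count = record[0].strip(), record[1].strip()
                if word.lower() in stopwords:
                    continue  # filtered out by the stopword dictionary
                # keep the count already stored in the file
                counts[word] = counts.get(word, 0) + int(count)
        per_file[path.name] = counts

    # header: every remaining word across all files, sorted for stable columns
    vocabulary = sorted({word for counts in per_file.values() for word in counts})
    # one row per document, starting with the file name; absent words get 0
    rows = [
        [name] + [counts.get(word, 0) for word in vocabulary]
        for name, counts in per_file.items()
    ]
    return vocabulary, rows
```

Each row starts with the file name, followed by the precomputed counts, so the values are the original occurrences rather than 1 or 0.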

 

Thank you in advance!

Laura
