
Transforming output from Process Docs to create a word list/document

User: "DataLib"
New Altair Community Member
Updated by Jocelyn
Hi there...

We have a challenge to create word/tag clouds from a database system...

Easy, I thought: create a table with the Document ID in the first column, the word in a second column, and the count of that word in the document in a third (we probably wouldn't use the third column, but just in case). That way we could build a word cloud very quickly, no matter which subset of documents the user selects.
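To make the intended table concrete, here is a minimal Python sketch of how a word cloud's weights could be computed from that three-column layout for any selected subset of documents. It is only an illustration, not part of the RapidMiner job: the sample rows, the function name `word_totals`, and the document IDs are all made up.

```python
from collections import Counter

# Hypothetical rows in the proposed layout: (document_id, word, count).
rows = [
    (1, "engine", 3),
    (1, "valve", 1),
    (2, "engine", 2),
    (2, "pump", 4),
    (3, "valve", 5),
]

def word_totals(rows, selected_ids):
    """Sum the per-document counts of each word over the selected documents."""
    totals = Counter()
    for doc_id, word, count in rows:
        if doc_id in selected_ids:
            totals[word] += count
    return totals

# Weights for a word cloud covering documents 1 and 2 only.
totals = word_totals(rows, {1, 2})
print(totals.most_common())
```

With a table in this shape, the same aggregation could equally be done as a grouped sum in the database, so the third column may turn out to be useful after all.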

So I set up the job in RapidMiner: it reads the records from the database, keeping only the Document ID and the full-text field, and passes them through the Process Documents element (tokenise, transform case, filter stop words, filter tokens, stem)... Job done...

Unfortunately no... and here is my problem. 

The data that comes out of the Process Documents element has the Document ID as the first column, but every word that is found then becomes the name of one of the remaining columns... I have looked at Transpose and Pivot, but neither of these does what I need...
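The reshaping I am after is a wide-to-long (unpivot) transformation. As a rough sketch of the logic only, written in plain Python rather than RapidMiner, it looks like the code below. The column name `doc_id`, the sample words, and the function name `wide_to_long` are hypothetical, and the sketch assumes the cell values are raw term occurrences rather than some other weighting.

```python
# Hypothetical wide output: one row per document, one column per word.
wide_rows = [
    {"doc_id": 1, "engine": 3, "valve": 1, "pump": 0},
    {"doc_id": 2, "engine": 2, "valve": 0, "pump": 4},
]

def wide_to_long(wide_rows, id_column="doc_id"):
    """Turn each non-zero word column into its own (id, word, count) row."""
    long_rows = []
    for row in wide_rows:
        for column, value in row.items():
            # Skip the ID column itself and words absent from this document.
            if column == id_column or not value:
                continue
            long_rows.append((row[id_column], column, value))
    return long_rows

long_rows = wide_to_long(wide_rows)
for doc_id, word, count in long_rows:
    print(doc_id, word, count)
```

Dropping the zero cells keeps the long table small, since most words do not occur in most documents.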

We did think about saving the output as CSV and then doing something outside of RapidMiner, but that would make it a manual process rather than something I can automate to run hourly on new records.

Any thoughts or ideas will be most appreciated.
