[SOLVED] Transform Document-Term matrix to flat table?

Question

A newbie question.

I have a simple process which uses "Data to Documents", "Process Documents" and "Tokenize" to turn a list of strings into a word list. The second result is my ExampleSet turned into a document-term matrix.

My question is: How can I transform the Document-Term matrix (Document_ID x Term) to a flat table with three attributes (Document_ID, Term, occurrences)?

Regards,

Roland

RWingerter · Answer

Hi Marcin,

thank you very much, it works like a charm.

Marcin wrote:
The operator you are looking for is the "De-Pivot" which is indeed not easy to use (in my opinion).

I had looked at the "De-Pivot" operator, but I had no idea how to address the attribute names. I am not saying I understand your code (that will certainly take a while), but for now I am just happy to have a solution. Thanks again.

Kind regards

Roland

Skirzynski · Answer

The operator you are looking for is the "De-Pivot" which is indeed not easy to use (in my opinion). Unfortunately, it cannot handle special attributes, so you have to explicitly exclude Query_ID in the "attribute_name" setting with a regular-expression negative lookahead. That is not optimal, but it works.
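For readers who want to see the idea outside RapidMiner, here is a minimal Python sketch of the same de-pivot step. It is not the process from this thread and not RapidMiner's implementation. The matrix values, the depivot function, and the keep_zeros flag are made up for illustration. The regular expression shows the kind of negative lookahead described above: it matches every column name except Query_ID.

```python
import re

# Hypothetical document-term matrix: one row per query, one column per term.
# The weights are invented for illustration, not real TF*IDF output.
matrix = [
    {"Query_ID": 1, "hautarzt": 1.0, "arzt": 0.0, "hno": 0.0},
    {"Query_ID": 5, "hautarzt": 0.0, "arzt": 1.0, "hno": 0.0},
    {"Query_ID": 14, "hautarzt": 0.0, "arzt": 0.6, "hno": 0.8},
]

# Negative lookahead: match any column name that is not exactly "Query_ID".
TERM_COLUMN = re.compile(r"^(?!Query_ID$).*")


def depivot(rows, id_column="Query_ID", pattern=TERM_COLUMN, keep_zeros=False):
    """Turn a wide matrix into flat (term, id, value) tuples."""
    flat = []
    for row in rows:
        for name, value in row.items():
            if not pattern.match(name):
                continue  # skip the ID column itself
            if value == 0 and not keep_zeros:
                continue  # drop terms that do not occur in this document
            flat.append((name, row[id_column], value))
    return flat


for term, query_id, weight in depivot(matrix):
    print(term, query_id, weight)
```

Dropping the zero entries keeps the flat table small, since most cells of a document-term matrix are empty. Pass keep_zeros=True if you need every Query_ID and term combination.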

RWingerter · Answer

Hi Marcin,

thanks for your reply. Here is my example data and my simple process.

The input is a list of user queries (query_id, query, frequency), which is processed with "Process Documents from Data". The result is a word list and a document-term matrix. In addition, I would like to get a term-document table with Term, Query_ID, and TF*IDF, e.g.

Term    Query_ID  TF*IDF
---------------------------------
Term1   1         0.34
Term1   2         0.23
Term2   3         1.00

I tried various things without success. Maybe it's not difficult to do, but I didn't manage.

Sample data:

Query_ID;Query;frequency
1;hautarzt;103921
2;zahnarzt;101684
3;augenarzt;89233
4;frauenarzt;75116
5;arzt;70755
6;ärzte;65176
7;zahnärzte;57836
8;allgemeinarzt;54111
9;tierarzt;52387
10;augenärzte;49855
11;hautärzte;33141
12;kinderarzt;32989
13;kinderärzte;26377
14;hno arzt;22984
15;tierärzte;22090
16;frauenärzte;20694
17;lungenfacharzt;16468
18;praktische ärzte;14175
19;hno-ärzte;13290
20;hausarzt;12595
21;hautarztpraxen;12262
22;allgemeinärzte;11906
23;ärzte allgemeinmedizin und praktische ärzte;11781
24;ärzte orthopädie;10833
25;hals nasen ohrenärzte;5457
26;hno ärzte;4607
27;hals nasen ohren arzt;4319
28;ärzte innere medizin;4053
29;ärzte urologie;3886
30;ärzte frauenheilkunde und geburtshilfe;3837

Code:

Any and all help welcome.

Thank you

Roland