Any Ideas?
Scotty
New Altair Community Member
Hi All,
I am trying to convert the following output from
Link cluster able adsl adsl_faceplate alarms
http://test1 cluster_2 .0 .0 .0 .0
http://test2 cluster_2 .0 .0 .0 .0
http://test3 cluster_0 .1 .0 .0 .0
http://test4 cluster_2 .0 .0 .0 .0
http://test5 cluster_1 .0 .1 .0 .0
http://test6 cluster_1 .0 .0 .0 .0
http://test7 cluster_0 .0 .0 .0 .0
http://test8 cluster_2 .0 .0 .0 .0
http://test9 cluster_1 .0 .0 .0 .0
http://test10 cluster_0 .1 .0 .0 .0
to
Link Cluster Word Score
http://test1 cluster_2 able .0
http://test2 cluster_2 able .0
http://test3 cluster_0 able .1
http://test4 cluster_2 able .0
http://test5 cluster_1 able .0
http://test6 cluster_1 able .0
http://test7 cluster_0 able .0
http://test8 cluster_2 able .0
http://test9 cluster_1 able .0
http://test10 cluster_0 able .1
http://test1 cluster_2 adsl .0
http://test2 cluster_2 adsl .0
http://test3 cluster_0 adsl .0
http://test4 cluster_2 adsl .0
http://test5 cluster_1 adsl .1
http://test6 cluster_1 adsl .0
http://test7 cluster_0 adsl .0
http://test8 cluster_2 adsl .0
http://test9 cluster_1 adsl .0
Any ideas how this could be done?
There are thousands of rows and columns
Thanks
S
Answers
Hi,
maybe if you describe the rules used for the conversion, it will be easier to help you, because I don't see any. Look at the operators for generating attributes (Generate Attributes, Generate Aggregation, ...).
Cheers,
Vaclav
Hi Vaclav,
Sorry, I will explain a bit more.
I use the k-means clustering operator to cluster text from a webcrawl that has been pre-processed (split into tokens, stop words removed, etc.).
The cluster set result consists of 3500 examples, each detailing the URL, the cluster result and the 8500 attributes from the text. It looks like this:
Link cluster able adsl adsl_faceplate alarms .......................(8500)...............z
http://test1 cluster_2 .0 .0 .0 .0 .....................................0
http://test2 cluster_2 .0 .0 .0 .0 .......................................0
http://test3 cluster_0 .1 .0 .0 .0 ...................................0
http://test4 cluster_2 .0 .0 .0 .0 ......................................0
http://test5 cluster_1 .0 .1 .0 .0 ......................................0
http://test6 cluster_1 .0 .0 .0 .0 ......................................0
http://test7 cluster_0 .0 .0 .0 .0 ......................................0
http://test8 cluster_2 .0 .0 .0 .0 ......................................0
http://test9 cluster_1 .0 .0 .0 .0 ......................................0
http://test10 cluster_0 .1 .0 .0 .0 ......................................0
....
....
....
(3500)
...
...
http://test3500 cluster_0 .1 .0 .0 .0 ......................................0
I am looking to try and get the data into the following format.
Link Cluster Word TF-IDF Score
http://test1 cluster_2 able .0
http://test1 cluster_2 adsl .0
http://test1 cluster_2 adsl_faceplate .0
http://test1 cluster_2 alarms .0
http://test1 cluster_2 ....... .0
http://test1 cluster_2 z .0
http://test2 cluster_2 able .0
http://test2 cluster_2 adsl .0
http://test2 cluster_2 adsl_faceplate .0
http://test2 cluster_2 alarms .0
http://test2 cluster_2 ....... .0
http://test2 cluster_2 z .0
http://test3 cluster_0 able .0
http://test3 cluster_0 adsl .0
http://test3 cluster_0 adsl_faceplate .0
http://test3 cluster_0 alarms .0
http://test3 cluster_0 ....... .0
http://test3 cluster_0 z .0
....
....
http://test3500 cluster_0 able .0
http://test3500 cluster_0 adsl .0
http://test3500 cluster_0 adsl_faceplate .0
http://test3500 cluster_0 alarms .0
http://test3500 cluster_0 ....... .0
http://test3500 cluster_0 z .0
Does this make a bit more sense?
Thanks
Scott
Hi,
you can use the operators "Pivot" and "De-Pivot" for tasks like this. You can find examples on myexperiment.org:
http://www.myexperiment.org/search?filter=TYPE_ID%28%2262%22%29&query=pivoting
Simply install the Community Extension for RapidMiner to access and directly download the processes uploaded there (search the forum for more information about the Community Extension).
Cheers,
Ingo
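For readers who want to see what a de-pivot does to data shaped like the table above, here is a minimal plain-Python sketch. It is not a RapidMiner process and does not use any RapidMiner API; `de_pivot` and its `id_count` parameter are hypothetical names made up for illustration, and the three sample rows are copied from the original post. It assumes the first two columns (Link, cluster) identify the example and every remaining column is a word score.

```python
def de_pivot(header, rows, id_count=2):
    """Turn one wide row per link into one long row per (link, word) pair.

    header   -- column names; the first `id_count` are identifier columns
    rows     -- list of rows, each the same length as `header`
    id_count -- assumed number of leading identifier columns (Link, cluster)
    """
    id_names = header[:id_count]
    words = header[id_count:]
    long_rows = []
    for row in rows:
        ids = row[:id_count]
        # Emit one output row per word column, repeating the identifiers.
        for word, score in zip(words, row[id_count:]):
            long_rows.append((*ids, word, score))
    return id_names + ["Word", "Score"], long_rows


# Sample values taken from the post; a real export would have ~8500 word columns.
header = ["Link", "cluster", "able", "adsl", "adsl_faceplate", "alarms"]
rows = [
    ["http://test1", "cluster_2", 0.0, 0.0, 0.0, 0.0],
    ["http://test3", "cluster_0", 0.1, 0.0, 0.0, 0.0],
    ["http://test5", "cluster_1", 0.0, 0.1, 0.0, 0.0],
]

long_header, long_rows = de_pivot(header, rows)
print(*long_header)
for r in long_rows:
    print(*r)
```

Note that with 3500 examples and 8500 word attributes the long format has 3500 × 8500 = 29,750,000 rows, so the result is far larger than the wide table.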
Hi Ingo,
Thanks for the advice. Maybe you could point me to the example that is closest to what I am trying to do. Although the examples are similar, I think the output I am after is very different.
I suspect de-pivot is somehow involved.
Many Thanks
Scott