Any Ideas?

Scotty
Scotty New Altair Community Member
edited November 5 in Community Q&A
Hi All,

I am trying to convert the following output from

Link cluster able adsl adsl_faceplate alarms
http://test1 cluster_2 .0 .0 .0 .0
http://test2 cluster_2 .0 .0 .0 .0
http://test3 cluster_0 .1 .0 .0 .0
http://test4 cluster_2 .0 .0 .0 .0
http://test5 cluster_1 .0 .1 .0 .0
http://test6 cluster_1 .0 .0 .0 .0
http://test7 cluster_0 .0 .0 .0 .0
http://test8 cluster_2 .0 .0 .0 .0
http://test9 cluster_1 .0 .0 .0 .0
http://test10 cluster_0 .1 .0 .0 .0

to 


Link Cluster Word Score
http://test1 cluster_2 able .0
http://test2 cluster_2 able .0
http://test3 cluster_0 able .1
http://test4 cluster_2 able .0
http://test5 cluster_1 able .0
http://test6 cluster_1 able .0
http://test7 cluster_0 able .0
http://test8 cluster_2 able .0
http://test9 cluster_1 able .0
http://test10 cluster_0 able .1
http://test1 cluster_2 adsl .0
http://test2 cluster_2 adsl .0
http://test3 cluster_0 adsl .0
http://test4 cluster_2 adsl .0
http://test5 cluster_1 adsl .1
http://test6 cluster_1 adsl .0
http://test7 cluster_0 adsl .0
http://test8 cluster_2 adsl .0
http://test9 cluster_1 adsl .0

Any ideas how this could be done?
There are thousands of rows and columns.
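For anyone reading along, this kind of reshaping is usually called de-pivoting (also "unpivoting" or "melting"), because every word column becomes its own row. Below is a minimal sketch in plain Python. It assumes the table has been exported as whitespace-separated text with a header row. The function name `depivot_word_major` and the three-row sample are hypothetical. The output is ordered by word first and then by link, as in the example above, and scores are kept as the original strings.

```python
def depivot_word_major(lines):
    """Turn a wide table (Link, cluster, word1, word2, ...) into long rows.

    Returns (link, cluster, word, score) tuples ordered by word, then link.
    """
    header = lines[0].split()
    rows = [line.split() for line in lines[1:] if line.strip()]
    words = header[2:]  # every column after Link and cluster is a word attribute
    result = []
    for i, word in enumerate(words, start=2):
        for row in rows:
            result.append((row[0], row[1], word, row[i]))
    return result


# Hypothetical sample in the same layout as the table above.
sample = """\
Link cluster able adsl
http://test1 cluster_2 .0 .0
http://test3 cluster_0 .1 .0
http://test5 cluster_1 .0 .1
""".splitlines()

print("Link", "Cluster", "Word", "Score")
for rec in depivot_word_major(sample):
    print(*rec)
```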

Thanks
S

Answers

  • StaryVena
    StaryVena New Altair Community Member
    Hi,
    maybe if you describe the rules you use for the conversion, it will be easier to help you, because I don't see any. Look at the operators for generating attributes (Generate Attributes, Generate Aggregation, ...).

    Cheers,
    Vaclav
  • Scotty
    Scotty New Altair Community Member
    Hi Vaclav,

    Sorry, I will explain a bit more.

    I use the k-means clustering operator to cluster text from a web crawl that has been pre-processed (split into tokens, stop words removed, etc.).

    The clustered result set, which consists of 3500 examples detailing the URL, the cluster assignment and the 8500 attributes from the text, looks like this:


    Link            cluster    able  adsl  adsl_faceplate  alarms  ... (8500 attributes) ...  z
    http://test1 cluster_2  .0  .0  .0  .0  ...  0
    http://test2 cluster_2  .0  .0  .0  .0  ...  0
    http://test3 cluster_0  .1  .0  .0  .0  ...  0
    http://test4 cluster_2  .0  .0  .0  .0  ...  0
    http://test5 cluster_1  .0  .1  .0  .0  ...  0
    http://test6 cluster_1  .0  .0  .0  .0  ...  0
    http://test7 cluster_0  .0  .0  .0  .0  ...  0
    http://test8 cluster_2  .0  .0  .0  .0  ...  0
    http://test9 cluster_1  .0  .0  .0  .0  ...  0
    http://test10 cluster_0  .1  .0  .0  .0  ...  0
    ...
    (3500 examples)
    ...
    http://test3500 cluster_0  .1  .0  .0  .0  ...  0

    I am trying to get the data into the following format:

    Link            Cluster      Word  TF-IDF Score
    http://test1 cluster_2  able  .0
    http://test1 cluster_2  adsl  .0
    http://test1 cluster_2  adsl_faceplate  .0
    http://test1 cluster_2  alarms  .0
    http://test1 cluster_2  .......  .0
    http://test1 cluster_2  z  .0
    http://test2 cluster_2  able  .0
    http://test2 cluster_2  adsl  .0
    http://test2 cluster_2  adsl_faceplate  .0
    http://test2 cluster_2  alarms  .0
    http://test2 cluster_2  .......  .0
    http://test2 cluster_2  z  .0
    http://test3 cluster_0  able  .0
    http://test3 cluster_0  adsl  .0
    http://test3 cluster_0  adsl_faceplate  .0
    http://test3 cluster_0  alarms  .0
    http://test3 cluster_0  .......  .0
    http://test3 cluster_0  z  .0
    ....
    ....
    http://test3500 cluster_0  able  .0
    http://test3500 cluster_0  adsl  .0
    http://test3500 cluster_0  adsl_faceplate  .0
    http://test3500 cluster_0  alarms  .0
    http://test3500 cluster_0  .......  .0
    http://test3500 cluster_0  z  .0
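With the layout clarified (one row per link and word, link first), the output has one row per example-word pair: 3500 × 8500 = 29,750,000 rows for this data set. Below is a sketch in Python that streams rows instead of building them all in memory. It assumes a comma-separated export of the clustered example set. The function `depivot_link_major` and its `skip_zeros` option are hypothetical illustrations, not RapidMiner features. Dropping zero scores is optional, but because most entries in a TF-IDF word vector are typically zero, it can shrink the output considerably.

```python
import csv
import io


def depivot_link_major(reader, skip_zeros=False):
    """Yield (link, cluster, word, score) one row at a time, link first.

    `reader` is any iterable of already-split rows whose first row is the header.
    With skip_zeros=True (hypothetical option), scores equal to 0 are dropped.
    """
    rows = iter(reader)
    header = next(rows, None)
    if header is None:  # empty input: nothing to de-pivot
        return
    words = header[2:]  # columns after Link and cluster are word attributes
    for row in rows:
        link, cluster = row[0], row[1]
        for word, score in zip(words, row[2:]):
            if skip_zeros and float(score) == 0.0:
                continue
            yield link, cluster, word, score


# Hypothetical comma-separated export with three word columns.
data = io.StringIO(
    "Link,cluster,able,adsl,alarms\n"
    "http://test1,cluster_2,.0,.0,.0\n"
    "http://test3,cluster_0,.1,.0,.0\n"
)
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(["Link", "Cluster", "Word", "TF-IDF Score"])
writer.writerows(depivot_link_major(csv.reader(data)))
print(out.getvalue())

# Full size without skipping zeros: one output row per (example, word) pair.
print(3500 * 8500)  # 29750000
```

Because the function is a generator, it can write straight to a file via `csv.writer(...).writerows(...)` without ever holding all ~30 million rows at once.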

    Does this make a bit more sense?

    Thanks
    Scott
  • IngoRM
    IngoRM New Altair Community Member
    Hi,

    you can use the operators "Pivot" and "De-Pivot" for tasks like this. You can find examples on myexperiment.org:

    http://www.myexperiment.org/search?filter=TYPE_ID%28%2262%22%29&query=pivoting

    Simply install the Community Extension for RapidMiner to access and directly download the processes uploaded there (search the forum for more information about the Community Extension).
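De-Pivot covers the wide-to-long direction asked about in this thread, and Pivot is its inverse. To illustrate that relationship only (this is plain Python, not the RapidMiner operators or their parameters), here is a hypothetical `pivot` function that folds (link, cluster, word, score) rows back into one wide row per link. Filling absent link-word pairs with ".0" is an assumption.

```python
def pivot(long_rows, fill=".0"):
    """Fold (link, cluster, word, score) rows back into one wide row per link.

    Returns (header, rows). Word columns appear in first-seen order, and a
    (link, word) pair that never occurs is filled with `fill` (an assumption:
    absent words are treated as a zero score).
    """
    words = {}  # used as an ordered set of word names
    table = {}  # link -> (cluster, {word: score}); dicts keep insertion order
    for link, cluster, word, score in long_rows:
        words.setdefault(word, None)
        table.setdefault(link, (cluster, {}))[1][word] = score
    header = ["Link", "cluster"] + list(words)
    rows = [
        [link, cluster] + [scores.get(w, fill) for w in words]
        for link, (cluster, scores) in table.items()
    ]
    return header, rows


long_rows = [
    ("http://test1", "cluster_2", "able", ".0"),
    ("http://test1", "cluster_2", "adsl", ".0"),
    ("http://test5", "cluster_1", "able", ".0"),
    ("http://test5", "cluster_1", "adsl", ".1"),
]
header, rows = pivot(long_rows)
print(*header)
for row in rows:
    print(*row)
```

Running the de-pivoted output through this function should reproduce the original wide table, which is a handy sanity check on the reshaping.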

    Cheers,
    Ingo
  • Scotty
    Scotty New Altair Community Member
    Hi Ingo,

    Thanks for the advice. Could you point me to the example that is closest to what I am trying to do? Although they look similar, I think the output I am after is quite different.

    I suspect de-pivot is somehow involved.

    Many Thanks

    Scott