Log the value of one specific data row?
T-Unit
Hi everyone,
I'm trying to do some prediction on a data set of labeled data. The prediction works fine, but now I want to log the value of one attribute for a single row (each row lists the name of a movie), so that I can quickly see which cluster one particular movie was sorted into. With k (k = 2...20) as the number of clusters, the log file should look like this:
k | cluster the single movie was sorted into |
2 | cluster_3 |
3 | cluster_0 |
... | ... |
20 | cluster_6 |
I think the solution to my problem is very easy, but I can't work it out right now.
Thanks for any help.
Regards,
Thomas
Answers
Hey Thomas,
I am sorry, I don't understand what you want to do. Please give some example rows of the data from which you want to create the log.
By the way, why do you want the values as a log? It will surely be possible, but from what I understand you could more easily create an example set (and write it to disk via Write CSV, if desired). But before going into the details, please give us some more background.
Best, Marius
Hi Marius,
thanks for your fast reply.
I have 19 files, each containing the data of 1700 clustered movies. Every file lists the same movies, but each file uses a different number of clusters.
For example:
In the first file the 1700 movies are separated into 2 clusters (cluster_0, cluster_1).
In the second file the 1700 movies are separated into 3 clusters (cluster_0, cluster_1, cluster_2).
...
In the 19th file the 1700 movies are separated into 20 clusters (cluster_0, cluster_1, ... , cluster_19).
So, depending on the total number of clusters, a movie may be assigned to different clusters (maybe cluster_0 if there are only 2 clusters, or maybe cluster_17 if there are 20 clusters in total).
My RapidMiner process does the following:
Step 1. Open the file with the clustered data for k=2 clusters.
Step 2. Train a decision tree on the clustered data and validate the model.
Step 3. Use the model to classify an unclustered movie ("The Dark Knight Rises"). The movie was not included in the data set when the model was trained!
Step 4. Close the file, increase k by 1 and go back to step 1.
This process runs from k=2 to k=20.
As a result, I get the cluster into which "The Dark Knight Rises" was classified as the total number of clusters changes from k=2 to k=20. This is the data I want to log, so that I can easily see where the movie ended up. Sure, I could export this information into an Excel file or something similar for each value of k (resulting in 19 different files), but I thought it should be possible to collect this information and save it into a single file.
Maybe you (or other users) have another idea for a smart solution?
Regards,
Thomas
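The loop described in steps 1 to 4 can also be sketched outside RapidMiner to show how one row per k ends up in a single file. This is only a minimal Python sketch, not the RapidMiner process itself: `collect_log`, `write_log` and `fake_predict` are hypothetical names, and `fake_predict` returns placeholder labels instead of a real decision tree prediction.

```python
import csv
import io


def collect_log(predict_cluster, ks=range(2, 21)):
    """Run the prediction once per k and keep one (k, cluster) row each."""
    return [(k, predict_cluster(k)) for k in ks]


def write_log(rows, handle):
    """Write all rows to one CSV target instead of one file per k."""
    writer = csv.writer(handle)
    writer.writerow(["k", "cluster"])
    writer.writerows(rows)


def fake_predict(k):
    # Hypothetical stand-in for steps 1 to 3 (load the file for k, train
    # the tree, classify the new movie). The label is a placeholder only.
    return f"cluster_{k % 2}"


rows = collect_log(fake_predict)

# An in-memory buffer keeps the sketch self-contained; a real run would
# pass an open file handle instead.
buffer = io.StringIO()
write_log(rows, buffer)
```

Replacing `fake_predict` with a function that really loads the data for a given k, trains the model and classifies the movie would yield the two-column log shown in the first post.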
Hi,
the attached process creates clusterings with different values of k on the training data and trains a Decision Tree model on each clustering. In the second loop, the models are iterated over and the new data is classified. From the output of that loop onward, it's up to you how to use the classification results.
You will need the Append operator, and then probably the Pivot operator. The pivot operation is discussed, for example, in this thread: http://rapid-i.com/rapidforum/index.php/topic,5678.msg20117.html
Best, Marius
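To illustrate what appending and then pivoting does to the per-k results, here is a rough Python analogue. It does not use RapidMiner's Append or Pivot operators; `append_tables` and `pivot` are hypothetical helpers, and the cluster labels in `per_k` are made-up placeholders rather than real results.

```python
from collections import defaultdict


def append_tables(tables):
    """Concatenate the per-k result tables (lists of dict rows) into one."""
    return [row for table in tables for row in table]


def pivot(rows, index, column, value):
    """Turn long rows into one entry per index with one field per column value."""
    wide = defaultdict(dict)
    for row in rows:
        wide[row[index]][f"{value}_k{row[column]}"] = row[value]
    return dict(wide)


# Placeholder classification results, one small table per k.
per_k = [
    [{"movie": "The Dark Knight Rises", "k": 2, "cluster": "cluster_1"}],
    [{"movie": "The Dark Knight Rises", "k": 3, "cluster": "cluster_0"}],
]

long_table = append_tables(per_k)  # one row per (movie, k)
wide_table = pivot(long_table, index="movie", column="k", value="cluster")
```

After the pivot, each movie has a single entry with one field per k, which is the layout that makes it easy to follow one movie across all clusterings.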