Splitting output into multiple (many) CSV files
Hi
Question from a newbie. I have a process built in RapidMiner Studio that creates an output containing anywhere between 100 and 5000 rows (depending on the starting input). I want to write the output as one CSV per row. At the moment I can get the full data set using the Write CSV operator, but that gives me a single file with everything, when I want one CSV per record. I've tried doing this in post-processing by adding a new section to the Python script that handles the data after it has been through the process, but the formatting of the CSV is causing problems. I really want it to come out of RapidMiner in separate files to maintain the integrity of the results.
Any thoughts appreciated.
Thanks
Best Answer
-
Hi @MichaelWall
Welcome to the RapidMiner community.
See if the attached process helps you. You can open it via File > Import Process.
You may need to change the path of the CSV output location.
Here is what it does:
It loops over the examples (rows), one row at a time.
Inside the loop, it filters to the current row number and then writes that single row to its own CSV file.
The file name is the row number followed by .csv.
If you need to name the files differently, that should be possible with an additional operator, but hopefully this will get you started.
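For comparison, the same loop logic can be sketched outside RapidMiner in Python, in case the post-processing route is still of interest. This is only a rough sketch, not the attached process: the function name split_csv_per_row and the file names in the usage note are made up for illustration, and it assumes the combined file is comma-separated with a header row that should be repeated in every output file. Using the csv module for both reading and writing keeps quoted fields intact, which may help with the formatting problems mentioned in the question.

```python
import csv
from pathlib import Path


def split_csv_per_row(source_path, output_dir):
    """Write each data row of source_path to its own CSV file.

    Hypothetical helper: files are named 1.csv, 2.csv, ... after the
    1-based row number, and every file repeats the header row.
    Assumes a comma-separated input with a header line.
    """
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    # newline="" lets the csv module handle line endings and quoted newlines
    with open(source_path, newline="", encoding="utf-8") as src:
        reader = csv.reader(src)
        header = next(reader, None)
        if header is None:
            return written  # empty input, nothing to write
        for row_number, row in enumerate(reader, start=1):
            target = output_dir / f"{row_number}.csv"
            with open(target, "w", newline="", encoding="utf-8") as dst:
                writer = csv.writer(dst)
                writer.writerow(header)  # repeat the header in each file
                writer.writerow(row)     # the single record for this file
            written.append(target)
    return written
```

Calling split_csv_per_row("all_rows.csv", "per_row") would then produce per_row/1.csv, per_row/2.csv and so on, one per record. If the combined file uses a different column separator, the delimiter argument of csv.reader and csv.writer would need to be set to match.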
Answers
-
Thanks for this, works really well, much faster than the existing process I am replicating. The key thing was to set the iteration macro on the Loop Examples operator to row_number so it indexed through each row.