How to handle a CSV file that has JSON columns in RapidMiner Studio
chenx
New Altair Community Member
Hello everyone,
I am new to RapidMiner Studio. I want to create a prediction model using the TMDB-Box-Office dataset. The dataset is provided as a CSV file, but some of its columns contain JSON data. Could you suggest a process that reads this file correctly and prepares it for building a prediction model? The dataset is attached to the post.
Your help is much appreciated!
Thanks,
xc
Best Answer
-
Hi xc,
It definitely looks like it. I must admit I wasn't aware of that functionality! I will run a few tests on my side over the weekend to confirm, but you should be fine with this.
Best,
Roland
Answers
-
Hi @chenx,
I ran a quick test and it's definitely possible to read the file in. However, what format do you have planned for the data? Do the JSON entries need to be turned into additional columns?
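To illustrate what turning the JSON entries into additional columns could look like, here is a minimal Python sketch that runs outside RapidMiner Studio. The sample rows, the helper names `parse_cell` and `add_indicator_columns`, and the assumption that every entry carries a `name` key are illustrative only and are not taken from the attached file.

```python
import ast
import json


def parse_cell(cell):
    """Parse one embedded cell into a list; empty or unparseable cells give []."""
    if not cell or not cell.strip():
        return []
    try:
        value = json.loads(cell)
    except ValueError:
        # Assumption: some exports use Python-style single quotes, which is
        # not strict JSON, so fall back to a literal parse.
        try:
            value = ast.literal_eval(cell)
        except (ValueError, SyntaxError):
            return []
    return value if isinstance(value, list) else []


def add_indicator_columns(rows, column, key="name"):
    """Replace `column` with one 0/1 column per distinct value of `key`."""
    parsed = [
        [item[key] for item in parse_cell(row.get(column, "")) if key in item]
        for row in rows
    ]
    labels = sorted({label for names in parsed for label in names})
    result = []
    for row, names in zip(rows, parsed):
        flat = {k: v for k, v in row.items() if k != column}
        for label in labels:
            flat[f"{column}_{label}"] = 1 if label in names else 0
        result.append(flat)
    return result


# Hypothetical sample rows, not taken from the attached dataset.
rows = [
    {"title": "Movie A", "genres": '[{"id": 1, "name": "Comedy"}, {"id": 2, "name": "Drama"}]'},
    {"title": "Movie B", "genres": "[{'id': 2, 'name': 'Drama'}]"},
    {"title": "Movie C", "genres": ""},
]
flat_rows = add_indicator_columns(rows, "genres")
```

With this layout every cell holds a single value, at the cost of one extra column per distinct entry, which may get wide for columns such as cast or crew.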
Best,
Roland
-
Hi Roland,
Thanks for the quick response. Most datasets I have used to build models in RapidMiner Studio have regular rows and columns, with exactly one value per cell. This dataset embeds JSON data in several columns. I wonder whether I need to transform it into a regular dataset, with one value per cell, before I supply it to an algorithm, or whether RapidMiner Studio can read this format and process it correctly as is. If I have to transform it manually, I would probably need to do some pivoting on the dataset.
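As a rough illustration of the "one value per cell" transformation described above, the following Python sketch (run outside RapidMiner Studio) unpacks an embedded column into a long table with one row per movie and entry. The two-row sample CSV, the `explode_column` helper, and the `id` and `name` keys are assumptions made for the example, not details confirmed from the attached file.

```python
import ast
import csv
import io


def explode_column(rows, column, id_column, key="name"):
    """Yield one {id, value} row per entry embedded in `column`."""
    for row in rows:
        cell = row.get(column) or ""
        if not cell.strip():
            continue
        # Assumption: cells are list-of-dict literals. ast.literal_eval accepts
        # single- or double-quoted strings, but not JSON's true/false/null.
        for item in ast.literal_eval(cell):
            yield {id_column: row[id_column], column: item[key]}


# Hypothetical two-row CSV standing in for the real file.
sample_csv = '''id,title,genres
1,Movie A,"[{'id': 35, 'name': 'Comedy'}, {'id': 18, 'name': 'Drama'}]"
2,Movie B,"[{'id': 18, 'name': 'Drama'}]"
'''

reader = csv.DictReader(io.StringIO(sample_csv))
long_rows = list(explode_column(reader, "genres", "id"))
```

The resulting long table could then be pivoted back into one column per value, or joined to the main table on `id`, depending on what the modelling step expects.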
Thanks,
xc
-
I did a quick experiment to create a decision tree model using Auto Model. It seems that Auto Model can handle the CSV file with JSON columns: it created a model that uses the values in genres, crew, cast, and production_companies, all of which are JSON columns. Does this mean that at least Auto Model in RapidMiner Studio can read a CSV file with JSON columns correctly?
-
Hi xc,
This surprises me slightly, as I'm not seeing the same behaviour. Would you be able to send a screenshot of the Select Task stage of Auto Model, like I've shown here:
Best,
Roland
-
Hi Roland,
Please see the attached image for the Select Task stage and the result. I didn't see the genre values in the decision tree, but the cast and crew values are there.
Thanks,
xc