How to handle a CSV file that has JSON columns in RapidMiner Studio
chenx
Hello everyone,
I am new to RapidMiner Studio. I want to create a prediction model using the TMDB-Box-Office dataset. The dataset is provided as a CSV file, but some of its columns contain JSON data. Could you advise on a process that reads this file correctly and prepares it for building a prediction model? The dataset is attached to the post.
Your help is much appreciated!
Thanks,
xc
All comments
RolandJones
Hi @chenx,
I ran a quick test and it is definitely possible to read the file in. What format do you have planned, though? Do the JSON entries need to turn into additional columns?
Best,
Roland
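For readers wondering what "turning JSON entries into additional columns" could look like outside RapidMiner Studio, here is a minimal Python sketch using only the standard library. It is not the method used in this thread. The two-row sample data, the helper names `names_in_cell` and `add_indicator_columns`, and the assumption that each cell holds a JSON array of objects with a `name` key are all illustrative assumptions.

```python
import csv
import io
import json

# Hypothetical two-row sample; only the column name "genres" comes from the thread.
SAMPLE_CSV = '''title,genres
Movie A,"[{""id"": 35, ""name"": ""Comedy""}, {""id"": 18, ""name"": ""Drama""}]"
Movie B,"[{""id"": 18, ""name"": ""Drama""}]"
'''


def names_in_cell(cell):
    """Return the 'name' values from a JSON array cell; empty cells give []."""
    if not cell.strip():
        return []
    return [entry["name"] for entry in json.loads(cell)]


def add_indicator_columns(rows, column):
    """Replace `column` with one 0/1 column per distinct name found in it."""
    parsed = [names_in_cell(row[column]) for row in rows]
    all_names = sorted({name for names in parsed for name in names})
    result = []
    for row, names in zip(rows, parsed):
        # Copy every column except the JSON one, then append the indicators.
        flat = {k: v for k, v in row.items() if k != column}
        for name in all_names:
            flat[f"{column}_{name}"] = 1 if name in names else 0
        result.append(flat)
    return result


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
wide_rows = add_indicator_columns(rows, "genres")
for row in wide_rows:
    print(row)
```

Each cell in the resulting rows holds a single value, at the cost of one new column per distinct entry. For columns with many distinct values, such as cast or crew, this can produce a very wide table.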
chenx
Hi Roland,
Thanks for the quick response. Most datasets I have used to build models in RapidMiner Studio have regular rows and columns, with a single value in each cell. This dataset embeds JSON data in several columns. I wonder whether RapidMiner Studio can read this format and process it correctly to build models, or whether I first need to transform it into a regular dataset in which each cell has only one value. If I have to transform it manually, I will probably need to do some pivoting on the dataset.
Thanks,
xc
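The "one value per cell" transformation described above can be sketched in plain Python as a long-format expansion, with one output row per JSON entry. This is an illustration only, not a RapidMiner process. The sample data and the helper names `parse_cell` and `to_long_rows` are hypothetical. The fallback to `ast.literal_eval` rests on the assumption that some cells may be written as Python-style literals (single quotes) rather than strict JSON.

```python
import ast
import csv
import io
import json

# Hypothetical sample; only the column name "cast" comes from the thread.
# The cell uses single quotes to exercise the non-strict-JSON fallback.
SAMPLE_CSV = '''title,cast
Movie A,"[{'id': 1, 'name': 'Actor One'}, {'id': 2, 'name': 'Actor Two'}]"
Movie B,
'''


def parse_cell(cell):
    """Parse a cell as JSON, falling back to a Python literal; empty gives []."""
    if not cell or not cell.strip():
        return []
    try:
        return json.loads(cell)
    except json.JSONDecodeError:
        # Assumption: the cell is a Python-style literal such as [{'id': 1}].
        return ast.literal_eval(cell)


def to_long_rows(rows, column, key="name"):
    """Emit one row per entry in `column`, keeping only the entry's `key`."""
    long_rows = []
    for row in rows:
        base = {k: v for k, v in row.items() if k != column}
        for entry in parse_cell(row[column]):
            long_rows.append({**base, column: entry[key]})
    return long_rows


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
long_rows = to_long_rows(rows, "cast")
for row in long_rows:
    print(row)
```

Note that a row whose cell is empty (Movie B here) produces no output rows in this sketch, so such rows would need separate handling before modelling.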
chenx
I did a quick experiment and created a decision tree model using Auto Model. It seems that Auto Model can handle the CSV file with JSON columns: it created a model that uses the values in genres, crew, cast, and production_companies, all of which are JSON columns. Does this mean that at least Auto Model in RapidMiner Studio can correctly read a CSV file with some JSON columns?
RolandJones
Hi xc,
This surprises me slightly, as I'm not seeing the same behaviour. Would you be able to send a screenshot of the Select Task stage of Auto Model, like I've shown here:
Best,
Roland
chenx
Hi Roland,
Please see the attached image for the Select Task stage and the result. I didn't see the genre values in the decision tree, but the cast and crew values are there.
Thanks,
xc
RolandJones
Hi xc,
It definitely looks like it. I must admit I wasn't aware of that functionality! I will run a few tests on my side over the weekend to confirm, but you should be fine with this.
Best,
Roland