How to handle a CSV file that has JSON columns in RapidMiner Studio

chenx
New Altair Community Member
edited November 5 in Community Q&A
Hello everyone,

I am new to RapidMiner Studio. I want to create a prediction model using the TMDB-Box-Office dataset. The dataset is a CSV file, but some of its columns contain JSON data. Could you suggest a process that reads this file correctly and prepares it for building a prediction model? The dataset is attached to the post.

Your help is much appreciated!

Thanks,
xc
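For anyone with the same question: the JSON cells can also be parsed with a short Python script before the data goes into RapidMiner. This is a minimal sketch, not a RapidMiner feature, and it assumes each JSON cell holds a list of objects with a `name` field (the sample value is hypothetical). It tries strict JSON first and falls back to `ast.literal_eval` in case the cells are written with single quotes, which strict JSON rejects:

```python
import ast
import json


def parse_json_cell(cell):
    """Parse one JSON-like cell into a Python object (usually a list of dicts).

    Missing or empty cells (None, NaN from pandas, "") become an empty list.
    Strict JSON is tried first; single-quoted cells fall back to ast.literal_eval.
    """
    if not isinstance(cell, str) or not cell.strip():
        return []
    try:
        return json.loads(cell)
    except json.JSONDecodeError:
        return ast.literal_eval(cell)


def names(cell):
    """Return the 'name' field of every entry in a parsed cell."""
    return [entry["name"] for entry in parse_json_cell(cell) if "name" in entry]


# Hypothetical cell value in the style of a 'genres' column.
sample = "[{'id': 35, 'name': 'Comedy'}, {'id': 18, 'name': 'Drama'}]"
print(names(sample))  # ['Comedy', 'Drama']
print(names(""))      # []
```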

Answers

  • Roland Jones_21245
    Altair Employee
    Hi @chenx,

    I had a quick test, and it's definitely possible to read it in. However, what format do you have in mind? Do the JSON entries need to be turned into additional columns?

    Best,

    Roland
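One way to read "additional columns" is to replace each JSON column with a few derived single-value columns. Below is a minimal Python sketch of that idea, done outside RapidMiner. It assumes each cell holds a list of objects with a `name` field. The `_count`/`_first` column names, the helper functions and the sample row are hypothetical and were chosen only for illustration:

```python
import ast
import json


def parse_json_cell(cell):
    """Parse a JSON-like cell into a list; missing or empty cells become []."""
    if not isinstance(cell, str) or not cell.strip():
        return []
    try:
        return json.loads(cell)
    except json.JSONDecodeError:
        # Fall back for single-quoted, Python-literal-style cells.
        return ast.literal_eval(cell)


def expand_row(row, json_columns):
    """Return a copy of `row` in which every JSON column is replaced by
    <col>_count (number of entries) and <col>_first (name of the first entry)."""
    out = {key: value for key, value in row.items() if key not in json_columns}
    for col in json_columns:
        entries = parse_json_cell(row.get(col))
        out[f"{col}_count"] = len(entries)
        out[f"{col}_first"] = entries[0].get("name", "") if entries else ""
    return out


# Hypothetical row, as csv.DictReader would return it (all values are strings).
row = {
    "id": "1",
    "genres": "[{'id': 35, 'name': 'Comedy'}, {'id': 18, 'name': 'Drama'}]",
    "cast": "",
}
print(expand_row(row, ["genres", "cast"]))
# {'id': '1', 'genres_count': 2, 'genres_first': 'Comedy', 'cast_count': 0, 'cast_first': ''}
```

This keeps one value per cell but throws away most of the list. The indicator-column approach (one column per distinct name) keeps more of the information, at the cost of many more columns.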
  • chenx
    New Altair Community Member
    Hi Roland,

    Thanks for the quick response. Most datasets I have used to build models in RapidMiner Studio have regular rows and columns, where each cell holds a single value. This dataset embeds JSON data in several columns. I'm wondering whether I need to transform it into a regular dataset, with one value per cell, before supplying it to an algorithm, or whether RapidMiner Studio can read this format and process it correctly when building models. If I do need to transform it manually first, I would probably have to pivot some of the columns.

    Thanks,
    xc
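The pivoting described above can be sketched as a multi-hot encoding. Each JSON list column becomes one 0/1 indicator column per distinct name, so every cell holds a single value. This is a minimal Python sketch done outside RapidMiner, assuming list-of-objects cells with a `name` field. The function names, the `prefix` and `top_k` parameters and the sample rows are hypothetical:

```python
import ast
import json
from collections import Counter


def parse_names(cell):
    """Return the 'name' of every entry in a JSON-like cell ([] if empty)."""
    if not isinstance(cell, str) or not cell.strip():
        return []
    try:
        entries = json.loads(cell)
    except json.JSONDecodeError:
        # Fall back for single-quoted, Python-literal-style cells.
        entries = ast.literal_eval(cell)
    return [entry["name"] for entry in entries if "name" in entry]


def multi_hot(rows, column, prefix, top_k=None):
    """Pivot a JSON list column into 0/1 indicator columns, one per distinct
    name. If top_k is set, only the top_k most frequent names get a column."""
    parsed = [parse_names(row.get(column)) for row in rows]
    # Count each name at most once per row, so counts mean "rows containing it".
    counts = Counter(name for names in parsed for name in set(names))
    vocabulary = [name for name, _ in counts.most_common(top_k)]
    result = []
    for row, names in zip(rows, parsed):
        present = set(names)
        new_row = {key: value for key, value in row.items() if key != column}
        for name in vocabulary:
            new_row[f"{prefix}_{name}"] = 1 if name in present else 0
        result.append(new_row)
    return result


# Hypothetical rows for a 'genres' column.
rows = [
    {"id": "1", "genres": "[{'id': 35, 'name': 'Comedy'}]"},
    {"id": "2", "genres": "[{'id': 35, 'name': 'Comedy'}, {'id': 18, 'name': 'Drama'}]"},
    {"id": "3", "genres": ""},
]
for r in multi_hot(rows, "genres", "genre"):
    print(r)
# {'id': '1', 'genre_Comedy': 1, 'genre_Drama': 0}
# {'id': '2', 'genre_Comedy': 1, 'genre_Drama': 1}
# {'id': '3', 'genre_Comedy': 0, 'genre_Drama': 0}
```

For cast and crew, which can contain a very large number of distinct names, passing a `top_k` value keeps the table manageable. The cutoff is a judgment call, not something taken from this thread.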
  • chenx
    New Altair Community Member
    I ran a quick experiment, creating a decision tree model with Auto Model. It seems that Auto Model can handle the CSV file with JSON columns: it created a model that uses the values in genres, crew, cast, and production_companies, all of which are JSON columns. Does this mean that at least Auto Model in RapidMiner Studio can read a CSV file with JSON columns correctly?
  • Roland Jones_21245
    Altair Employee
    Hi xc,

    This surprises me slightly, as I'm not seeing the same behaviour. Would you be able to send a screenshot of the Select Task stage of Auto Model, like I've shown here:


    Best,

    Roland
  • chenx
    New Altair Community Member
    Hi Roland,

    Please see the attached image of the Select Task stage and the result. The genre values don't appear in the decision tree, but the cast and crew values do.

    Thanks,
    xc
  • Roland Jones_21245
    Altair Employee
    Answer ✓
    Hi xc,

    It definitely looks like it; I must admit I wasn't aware of that functionality! I'll run a few tests on my side over the weekend to confirm, but you should be fine with this.

    Best,

    Roland