How to handle a CSV file that has JSON columns in RapidMiner Studio

chenx New Altair Community Member
edited November 2024 in Community Q&A
Hello everyone,

I am new to RapidMiner Studio. I want to create a prediction model using the TMDB-Box-Office dataset. The dataset is provided as a CSV file, but some of its columns contain JSON data. Could you advise on a process that can read this file correctly and prepare it for building a prediction model? The dataset is attached to the post.

Your help is much appreciated!

Thanks,
xc

Answers

  • RolandJones
    Altair Employee
    Hi @chenx,

    I ran a quick test, and it's definitely possible to read the file in. However, what format do you have in mind for the result? Do the JSON entries need to be turned into additional columns?

    Best,

    Roland
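
To make "turning the JSON entries into additional columns" concrete, here is a minimal sketch in plain Python rather than a RapidMiner Studio process. It assumes each JSON cell holds a list of objects with a `name` key; the sample rows, the `genres` column layout, and the helper names `parse_names` and `expand_json_column` are made up for illustration and may not match the attached file.

```python
import csv
import io
import json

# Hypothetical sample standing in for the real file; the actual columns may differ.
SAMPLE_CSV = '''id,title,genres,revenue
1,Movie A,"[{""id"": 35, ""name"": ""Comedy""}, {""id"": 18, ""name"": ""Drama""}]",1000
2,Movie B,"[{""id"": 18, ""name"": ""Drama""}]",2500
3,Movie C,,400
'''


def parse_names(cell):
    """Return the 'name' values from a JSON list cell; an empty cell gives []."""
    if not cell.strip():
        return []
    return [entry["name"] for entry in json.loads(cell)]


def expand_json_column(rows, column):
    """Replace `column` with one 0/1 indicator column per distinct name."""
    parsed = [parse_names(row[column]) for row in rows]
    categories = sorted({name for names in parsed for name in names})
    result = []
    for row, names in zip(rows, parsed):
        # Copy every column except the JSON one, then add the indicators.
        flat = {key: value for key, value in row.items() if key != column}
        for category in categories:
            flat[f"{column}_{category}"] = 1 if category in names else 0
        result.append(flat)
    return result


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
flat_rows = expand_json_column(rows, "genres")
for flat in flat_rows:
    print(flat)
```

Every cell in `flat_rows` holds a single value, so the result could be written back out with `csv.DictWriter` and read in as an ordinary table. If the real cells use single quotes instead of valid JSON, `json.loads` will raise an error and a different parser would be needed.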
  • chenx
    chenx New Altair Community Member
    Hi Roland,

    Thanks for the quick response. Most datasets I have used to build models in RapidMiner Studio have regular rows and columns, where each cell holds only one value. This dataset embeds JSON data in several columns. I wonder whether I need to transform it into a regular dataset, with one value per cell, before I supply it to an algorithm, or whether RapidMiner Studio can read this format and process it correctly as is. If I need to transform it manually, I will probably need to do some pivoting on the dataset.

    Thanks,
    xc
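
One way to picture the "one value per cell" transformation is a long format with one row per movie and list entry, which can then be pivoted or aggregated as needed. The sketch below is plain Python, not a RapidMiner Studio process. The sample data, the `cast` layout with a `name` key, and the helper `to_long_format` are assumptions for illustration only.

```python
import csv
import io
import json

# Hypothetical sample; the real file's columns and JSON keys may differ.
SAMPLE_CSV = '''id,cast
1,"[{""name"": ""Actor X""}, {""name"": ""Actor Y""}]"
2,"[{""name"": ""Actor X""}]"
3,
'''


def to_long_format(rows, id_column, json_column):
    """Emit one (id, name) row per entry in the JSON list cell."""
    long_rows = []
    for row in rows:
        cell = row[json_column]
        # Treat an empty cell as an empty list instead of failing on it.
        entries = json.loads(cell) if cell.strip() else []
        for entry in entries:
            long_rows.append({id_column: row[id_column], json_column: entry["name"]})
    return long_rows


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
long_rows = to_long_format(rows, "id", "cast")
for long_row in long_rows:
    print(long_row)
```

Rows with an empty JSON cell produce no output rows here, so a join back to the original table on `id` would be needed to keep them.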
  • chenx
    chenx New Altair Community Member
    I did a quick experiment and created a decision tree model using Auto Model. It seems that Auto Model can handle the CSV file with JSON columns: it created a model that uses values from genres, crew, cast, and production_companies, all of which are JSON columns. Does this mean that at least Auto Model in RapidMiner Studio can read a CSV file with JSON columns correctly?
  • RolandJones
    Altair Employee
    Hi xc,

    This surprises me slightly, as I'm not seeing the same behaviour. Would you be able to send a screenshot of the Select Task stage of Auto Model, like the one I've shown here?

    [Screenshot: Select Task stage of Auto Model]

    Best,

    Roland
  • chenx
    chenx New Altair Community Member
    Hi Roland,

    Please see the attached image for the Select Task stage and the result. I didn't see the genre values in the decision tree, but the cast and crew values are there.

    Thanks,
    xc
  • RolandJones
    Altair Employee
    Answer ✓
    Hi xc,

    It definitely looks like it. I must admit I wasn't aware of that functionality! I will run a few tests on my side over the weekend to confirm, but you should be fine with this.

    Best,

    Roland
