JSON behind URL to Excel
gianluca_scheid
New Altair Community Member
Dear Rapidminer Community
I have about 55 URLs each containing JSON data. How can I crawl the JSON data behind the URLs and combine/convert them to one .xlsx file?
Best regards
GL
I have about 55 URLs each containing JSON data. How can I crawl the JSON data behind the URLs and combine/convert them to one .xlsx file?
Here are three URLs as a sample:
Best regards
GL
Tagged:
1
Best Answer
-
Hi Gianluca,
There is a third method which uses a Python script inside the RM process.
So you have first :
- to install the Python Scripting extension from the Marketplace.
- to install Python on your computer
More over you have to copy the link of your 55 URLs in the Excel file in attached file :
Here a screenshot of this XLS file with the 3 URLs you shared :
Secondly open the Python script and modify the 2 following variables :
- path : it is the absolute path where the Excel file containing your URL links is stored. Don't forget the quotes ('')
- path_to_JSON : it is the absolute path of the directory where all your 55 JSON files will be stored after extraction of the URL content. Don't forget the quotes ('')
Thirdly, set, in the Write Excel operator parameters, the filename and the directory where your final output excel file will be stored :
Finally execute the process. In addition of the generated excel output file, the data will be available as an example set inside RapidMiner.
Here a screenshot of the output excel file :
Hope this helps,
Regards,
Lionel
PS : the process is in attached file
6
Answers
-
hi @gianluca_scheid I'm not sure what you mean by crawl the JSON data. If you want to convert it to a tabular format, you have a couple of choices...
- you can use the JSON to Data operator in the Text Processing extension. Unfortunately this will simply "flatten" your JSON into one huge row. You can of course use other operators or Turbo Prep to massage it further.
- you can use the much better JSON tools in the Web Automation extension. Note this is a paid extension sold by a third party (Old World Computing).
Scott
2 -
Hi Gianluca,
There is a third method which uses a Python script inside the RM process.
So you have first :
- to install the Python Scripting extension from the Marketplace.
- to install Python on your computer
More over you have to copy the link of your 55 URLs in the Excel file in attached file :
Here a screenshot of this XLS file with the 3 URLs you shared :
Secondly open the Python script and modify the 2 following variables :
- path : it is the absolute path where the Excel file containing your URL links is stored. Don't forget the quotes ('')
- path_to_JSON : it is the absolute path of the directory where all your 55 JSON files will be stored after extraction of the URL content. Don't forget the quotes ('')
Thirdly, set, in the Write Excel operator parameters, the filename and the directory where your final output excel file will be stored :
Finally execute the process. In addition of the generated excel output file, the data will be available as an example set inside RapidMiner.
Here a screenshot of the output excel file :
Hope this helps,
Regards,
Lionel
PS : the process is in attached file
6