JSON to data and de-pivot with missing attributes and values

eduard_wagner
eduard_wagner New Altair Community Member
edited November 5 in Community Q&A

Hi everyone,

 

after all i could not find out how to get this problem fixed: i have a JSON file that is inconsistent with its attribute and values, some are missing. Like this:

[{"date":1465632900,"high":0.00199281,"low":0.00199281,"open":0.00199281,"close":0.00199281,"volume":0.00078269,"quoteVolume":0.39276167,"weightedAverage":0.00199281},{"date":1465633200, "low":0.00199281,"open":0.00199281,"close":0.00199281,"volume":0.00034535,"quoteVolume":0.17329899,"weightedAverage":0.00199281}]

As you see for the second entry there is the seconed attribute "high" and its value missing. Its not possible to De-Pivot the JSON Data without creating the missing attributes+values before, this error shows up: "series must have the same lenght: no conversion is performed. I tried the "Declare missing value", but of cause this needs the attribute before correcting any value.

 

Unfortunataly the dataset is pretty large. Is there an way/operator to get the process fixed?

Thanks a lot!

 

Best Answer

  • sgenzer
    sgenzer
    Altair Employee
    Answer ✓

    Hello @eduard_wagner - yes this gets rather icky with JSON arrays.  :)  This is a workaround for you:

     

    <?xml version="1.0" encoding="UTF-8"?><process version="7.6.001">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="7.6.001" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="text:create_document" compatibility="7.5.000" expanded="true" height="68" name="Create Document" width="90" x="45" y="34">
    <parameter key="text" value="[{&quot;date&quot;:1465632900,&quot;high&quot;:0.00199281,&quot;low&quot;:0.00199281,&quot;open&quot;:0.00199281,&quot;close&quot;:0.00199281,&quot;volume&quot;:0.00078269,&quot;quoteVolume&quot;:0.39276167,&quot;weightedAverage&quot;:0.00199281},{&quot;date&quot;:1465633200, &quot;low&quot;:0.00199281,&quot;open&quot;:0.00199281,&quot;close&quot;:0.00199281,&quot;volume&quot;:0.00034535,&quot;quoteVolume&quot;:0.17329899,&quot;weightedAverage&quot;:0.00199281}]"/>
    </operator>
    <operator activated="true" class="text:json_to_data" compatibility="7.5.000" expanded="true" height="82" name="JSON To Data" width="90" x="179" y="34"/>
    <operator activated="true" class="transpose" compatibility="7.6.001" expanded="true" height="82" name="Transpose" width="90" x="313" y="34"/>
    <operator activated="true" class="split" compatibility="7.6.001" expanded="true" height="82" name="Split" width="90" x="447" y="34">
    <parameter key="attribute_filter_type" value="single"/>
    <parameter key="attribute" value="id"/>
    <parameter key="include_special_attributes" value="true"/>
    <parameter key="split_pattern" value="\][.]"/>
    </operator>
    <operator activated="true" class="replace" compatibility="7.6.001" expanded="true" height="82" name="Replace" width="90" x="581" y="34">
    <parameter key="replace_what" value="\["/>
    </operator>
    <operator activated="true" class="pivot" compatibility="7.6.001" expanded="true" height="82" name="Pivot" width="90" x="715" y="34">
    <parameter key="group_attribute" value="id_1"/>
    <parameter key="index_attribute" value="id_2"/>
    <parameter key="consider_weights" value="false"/>
    </operator>
    <operator activated="true" class="rename_by_replacing" compatibility="7.6.001" expanded="true" height="82" name="Rename by Replacing" width="90" x="835" y="34">
    <parameter key="replace_what" value=".*1[_]"/>
    </operator>
    <connect from_op="Create Document" from_port="output" to_op="JSON To Data" to_port="documents 1"/>
    <connect from_op="JSON To Data" from_port="example set" to_op="Transpose" to_port="example set input"/>
    <connect from_op="Transpose" from_port="example set output" to_op="Split" to_port="example set input"/>
    <connect from_op="Split" from_port="example set output" to_op="Replace" to_port="example set input"/>
    <connect from_op="Replace" from_port="example set output" to_op="Pivot" to_port="example set input"/>
    <connect from_op="Pivot" from_port="example set output" to_op="Rename by Replacing" to_port="example set input"/>
    <connect from_op="Rename by Replacing" from_port="example set output" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>

    Scott

Answers

  • sgenzer
    sgenzer
    Altair Employee

    Hello @eduard_wagner - welcome to the community.  So there are a few ways to parse out JSON files: 1) use the "JSON to Data" operator; 2) use the "JSON to XML" operator to convert to XML, and then use "Read XML", 3) if you're getting the JSON from a webservice, you can use the built-in JSON parser in the Enrich Data via Webservice operator.  These operators are in the Text Processing / Web Mining extensions (accessible via the Marketplace).

     

    If you need more help, perhaps post your RapidMiner process in this thread as XML using the </> tool.  It makes it much easier for us to help.

     

    Scott

  • eduard_wagner
    eduard_wagner New Altair Community Member

    Thanks for your help @sgenzer. I got the process as you discribed, here is the xml:

     

    Create Document  --> JSON to Data --> De-Pivot --> res

    (for this example i use a reduced JSON dataset)

     

     

    <?xml version="1.0" encoding="UTF-8"?><process version="7.5.001">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="7.5.001" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="text:create_document" compatibility="7.4.001" expanded="true" height="68" name="Create Document" width="90" x="112" y="34">
    <parameter key="text" value="[{&quot;date&quot;:1465632900,&quot;high&quot;:0.00199281,&quot;low&quot;:0.00199281,&quot;open&quot;:0.00199281,&quot;close&quot;:0.00199281,&quot;volume&quot;:0.00078269,&quot;quoteVolume&quot;:0.39276167,&quot;weightedAverage&quot;:0.00199281},{&quot;date&quot;:1465633200, &quot;low&quot;:0.00199281,&quot;open&quot;:0.00199281,&quot;close&quot;:0.00199281,&quot;volume&quot;:0.00034535,&quot;quoteVolume&quot;:0.17329899,&quot;weightedAverage&quot;:0.00199281}]"/>
    </operator>
    <operator activated="true" class="text:json_to_data" compatibility="7.4.001" expanded="true" height="82" name="JSON To Data" width="90" x="246" y="34"/>
    <operator activated="true" class="de_pivot" compatibility="7.5.001" expanded="true" height="82" name="De-Pivot" width="90" x="648" y="34">
    <list key="attribute_name">
    <parameter key="close" value="\[\d+\]\.close"/>
    <parameter key="high" value="\[\d+\]\.high"/>
    <parameter key="low" value="\[\d+\]\.low"/>
    <parameter key="open" value="\[\d+\]\.open"/>
    <parameter key="date" value="\[\d+\]\.date"/>
    <parameter key="volume" value="\[\d+\]\.volume"/>
    <parameter key="quoteVolume" value="\[\d+\]\.quoteVolume"/>
    <parameter key="weightedAverage" value="\[\d+\]\.weightedAverage"/>
    </list>
    <parameter key="index_attribute" value="id"/>
    </operator>
    <connect from_op="Create Document" from_port="output" to_op="JSON To Data" to_port="documents 1"/>
    <connect from_op="JSON To Data" from_port="example set" to_op="De-Pivot" to_port="example set input"/>
    <connect from_op="De-Pivot" from_port="example set output" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>

     

     

    The process error occurs in the De-pivot operator, when it tries to create the attributes and fill in the values. Problem is, that the attribute/JSON key "high" and its value is missing for the second object of the JSON dataset. 

    Is there a way that the depivot operator skips missing attributes/values or inserts predefined values? I tried to catch the error with the "Replace / Declare missung value" operators, but they cant be applied, since the attributes are not specified before depivoting. Also the "Generate Attribute" operator didnt work.

     

    Hope you see the problem in the xml :) 

     

    Thanks a lot!

    Eduard

  • sgenzer
    sgenzer
    Altair Employee
    Answer ✓

    Hello @eduard_wagner - yes this gets rather icky with JSON arrays.  :)  This is a workaround for you:

     

    <?xml version="1.0" encoding="UTF-8"?><process version="7.6.001">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="7.6.001" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="text:create_document" compatibility="7.5.000" expanded="true" height="68" name="Create Document" width="90" x="45" y="34">
    <parameter key="text" value="[{&quot;date&quot;:1465632900,&quot;high&quot;:0.00199281,&quot;low&quot;:0.00199281,&quot;open&quot;:0.00199281,&quot;close&quot;:0.00199281,&quot;volume&quot;:0.00078269,&quot;quoteVolume&quot;:0.39276167,&quot;weightedAverage&quot;:0.00199281},{&quot;date&quot;:1465633200, &quot;low&quot;:0.00199281,&quot;open&quot;:0.00199281,&quot;close&quot;:0.00199281,&quot;volume&quot;:0.00034535,&quot;quoteVolume&quot;:0.17329899,&quot;weightedAverage&quot;:0.00199281}]"/>
    </operator>
    <operator activated="true" class="text:json_to_data" compatibility="7.5.000" expanded="true" height="82" name="JSON To Data" width="90" x="179" y="34"/>
    <operator activated="true" class="transpose" compatibility="7.6.001" expanded="true" height="82" name="Transpose" width="90" x="313" y="34"/>
    <operator activated="true" class="split" compatibility="7.6.001" expanded="true" height="82" name="Split" width="90" x="447" y="34">
    <parameter key="attribute_filter_type" value="single"/>
    <parameter key="attribute" value="id"/>
    <parameter key="include_special_attributes" value="true"/>
    <parameter key="split_pattern" value="\][.]"/>
    </operator>
    <operator activated="true" class="replace" compatibility="7.6.001" expanded="true" height="82" name="Replace" width="90" x="581" y="34">
    <parameter key="replace_what" value="\["/>
    </operator>
    <operator activated="true" class="pivot" compatibility="7.6.001" expanded="true" height="82" name="Pivot" width="90" x="715" y="34">
    <parameter key="group_attribute" value="id_1"/>
    <parameter key="index_attribute" value="id_2"/>
    <parameter key="consider_weights" value="false"/>
    </operator>
    <operator activated="true" class="rename_by_replacing" compatibility="7.6.001" expanded="true" height="82" name="Rename by Replacing" width="90" x="835" y="34">
    <parameter key="replace_what" value=".*1[_]"/>
    </operator>
    <connect from_op="Create Document" from_port="output" to_op="JSON To Data" to_port="documents 1"/>
    <connect from_op="JSON To Data" from_port="example set" to_op="Transpose" to_port="example set input"/>
    <connect from_op="Transpose" from_port="example set output" to_op="Split" to_port="example set input"/>
    <connect from_op="Split" from_port="example set output" to_op="Replace" to_port="example set input"/>
    <connect from_op="Replace" from_port="example set output" to_op="Pivot" to_port="example set input"/>
    <connect from_op="Pivot" from_port="example set output" to_op="Rename by Replacing" to_port="example set input"/>
    <connect from_op="Rename by Replacing" from_port="example set output" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>

    Scott

  • eduard_wagner
    eduard_wagner New Altair Community Member

    It works, awesome, thanks @sgenzer! I see, the pivot operator groups according to the id attribute and when pivoting the missing values are shown as question marks. 

  • danhlee
    danhlee New Altair Community Member

    Where do you place this XML code?

  • danhlee
    danhlee New Altair Community Member

    Hey @sgenzer

     

    Thanks so much for the info! I'll check it out now.