Bug with Loop Values?

User: "pari1234"
New Altair Community Member
Updated by Jocelyn

Hi RM team, I'm trying to call the Facebook Graph API using the Enrich Data by Webservice operator, which sits inside a Loop Values operator that outputs a collection of documents. The input data is a CSV with a list of Facebook business page usernames. As far as I understand, Loop Values is supposed to take each username and return some Facebook content for that handle, but:

 

  • it only does that partially
  • each document in the collection from Loop Values should contain data for just one username; instead, each one contains all the usernames, with only one row of data per username.

Attached:

  1. RM process
  2. Input Excel file
  3. JSON output from the Facebook API, captured with an API testing tool.

Any help will be greatly appreciated as I'm kind of on a deadline for this. Thank you.
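
For readers following along, the behaviour the post expects can be sketched outside RapidMiner: one request per username, and one separate result table per username with one row per post. This is only an illustrative Python sketch, not the RapidMiner implementation. The names `build_url`, `posts_per_username`, and `fake_fetch` are hypothetical, and `fake_fetch` is a stand-in that returns made-up data so the example runs without network access or a real token.

```python
from typing import Callable, Dict, List

# URL shape taken from the process above; "TOKEN" below is a placeholder.
GRAPH_URL = "https://graph.facebook.com/v2.10/{username}/posts?access_token={token}"


def build_url(username: str, token: str) -> str:
    """Fill the username into the URL template, like <%Username%> in the process."""
    return GRAPH_URL.format(username=username, token=token)


def posts_per_username(
    usernames: List[str],
    token: str,
    fetch: Callable[[str], dict],
) -> Dict[str, List[dict]]:
    """Return one list of post rows per username (one result per handle)."""
    result: Dict[str, List[dict]] = {}
    for username in usernames:
        response = fetch(build_url(username, token))
        # Each element of "data" is one post; keep the two fields queried in the process.
        result[username] = [
            {"username": username, "post id": post.get("id"), "message": post.get("message")}
            for post in response.get("data", [])
        ]
    return result


def fake_fetch(url: str) -> dict:
    """Hypothetical offline stand-in for the HTTP call; returns invented posts."""
    handle = url.split("/")[4]
    return {"data": [{"id": f"{handle}_1", "message": "first"},
                     {"id": f"{handle}_2", "message": "second"}]}


tables = posts_per_username(["pageA", "pageB"], "TOKEN", fake_fetch)
```

In this sketch `tables["pageA"]` holds only the rows for `pageA`, which is the per-username separation the post describes as missing.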

 

PROCESS

 

<?xml version="1.0" encoding="UTF-8"?><process version="7.5.001">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="7.5.001" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="read_excel" compatibility="7.5.001" expanded="true" height="68" name="Read Excel" width="90" x="112" y="34">
<parameter key="excel_file" value="C:\Users\Pari\Documents\BDC\Socials\Facebook Scrapper\Test\TestHandles.xlsx"/>
<parameter key="imported_cell_range" value="A1:A5"/>
<parameter key="first_row_as_names" value="false"/>
<list key="annotations">
<parameter key="0" value="Name"/>
</list>
<list key="data_set_meta_data_information">
<parameter key="0" value="Username.true.polynominal.attribute"/>
</list>
</operator>
<operator activated="true" class="concurrency:loop_values" compatibility="7.5.001" expanded="true" height="82" name="Loop Values" width="90" x="313" y="34">
<parameter key="attribute" value="Username"/>
<process expanded="true">
<operator activated="true" class="web:enrich_data_by_webservice" compatibility="7.3.000" expanded="true" height="68" name="Enrich Data by Webservice (2)" width="90" x="313" y="34">
<parameter key="query_type" value="JsonPath"/>
<list key="string_machting_queries"/>
<list key="regular_expression_queries"/>
<list key="regular_region_queries"/>
<list key="xpath_queries"/>
<list key="namespaces"/>
<list key="index_queries"/>
<list key="jsonpath_queries">
<parameter key="message" value="$..message"/>
<parameter key="post id" value="$..id"/>
</list>
<parameter key="url" value="https://graph.facebook.com/v2.10/&lt;%Username%&gt;/posts?access_token=1745625495738593|w_a8sajfHCYsCHNZOTDr5H1r-wY"/>
<list key="request_properties"/>
<parameter key="encoding" value="UTF-8"/>
</operator>
<connect from_port="input 1" to_op="Enrich Data by Webservice (2)" to_port="Example Set"/>
<connect from_op="Enrich Data by Webservice (2)" from_port="ExampleSet" to_port="output 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="source_input 2" spacing="0"/>
<portSpacing port="sink_output 1" spacing="0"/>
<portSpacing port="sink_output 2" spacing="0"/>
</process>
</operator>
<connect from_op="Read Excel" from_port="output" to_op="Loop Values" to_port="input 1"/>
<connect from_op="Loop Values" from_port="output 1" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>

JSON Output

 

 

{
"data": [
{
"created_time": "2017-10-31T12:01:32+0000",
"message": "Click to read news on #Tableau latest conference.\n#BigData #Tech",
"id": "1563861787269208_1910035019318548"
},
{
"created_time": "2017-10-30T22:02:02+0000",
"message": "\"South Australia is about to get “Big Doctor”, cloud-based artificial intelligence that analyses our health and intervenes when it spots something amiss.\"-Brad Crouch",
"id": "1563861787269208_1909800592675324"
},
{
"created_time": "2017-10-30T21:21:00+0000",
"message": "Why you should welcome Artificial Intelligence with open arms",
"id": "1563861787269208_1909790786009638"
},
{
"created_time": "2017-10-30T12:00:59+0000",
"message": "\"AI will put bankers out of work? Some people think these advances will boost productivity, enabling industries to actually increase the number of jobs\"",
"id": "1563861787269208_1909600706028646"
},
{
"created_time": "2017-10-27T12:01:38+0000",
"message": "What's Elon Musks stance on Artificial Intelligence?",
"id": "1563861787269208_1908177749504275"
}
],
"paging": {
"cursors": {
"before": "Q2c4U1pXNTBYM0YxWlhKNVgzTjBiM0o1WDJsa0R5TXhOVFl6T0RZAeE56ZAzNNalk1TWpBNE9qazFPVEE1TURrME56STJOelkzTlRZAeU9ROE1ZAWEJwWDNOMGIzSjVYMmxrRHlFeE5UWXpPRFl4TnpnM01qWTVNakE0WHpFNU1UQXdNelV3TVRrek1UZAzFORGdQQkhScGJXVUdXZAmhtSEFFPQZDZD",
"after": "Q2c4U1pXNTBYM0YxWlhKNVgzTjBiM0o1WDJsa0R5UXhOVFl6T0RZAeE56ZAzNNalk1TWpBNE9pMHlORGs0TURBMk9UZA3pOVEEyTkRJMU9EUVBER0ZA3YVY5emRHOXllVjlwWkE4aE1UVTJNemcyTVRjNE56STJPVEl3T0Y4eE9UQTRNVGMzTnpRNU5UQTBNamMxRHdSMGFXMWxCbG56SUNJQgZDZD"
},
"next": "https://graph.facebook.com/v2.10/1563861787269208/posts?pretty=1&limit=5&after=Q2c4U1pXNTBYM0YxWlhKNVgzTjBiM0o1WDJsa0R5UXhOVFl6T0RZAeE56ZAzNNalk1TWpBNE9pMHlORGs0TURBMk9UZA3pOVEEyTkRJMU9EUVBER0ZA3YVY5emRHOXllVjlwWkE4aE1UVTJNemcyTVRjNE56STJPVEl3T0Y4eE9UQTRNVGMzTnpRNU5UQTBNamMxRHdSMGFXMWxCbG56SUNJQgZDZD"
}
}
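
The response above holds several posts inside one `data` array, so a recursive query such as `$..message` matches a list of values rather than a single one. The following Python sketch (not RapidMiner code) flattens a shortened copy of that response into one row per post. The function name `flatten_posts` is hypothetical, and how the Enrich Data by Webservice operator itself treats multiple matches is not confirmed here.

```python
import json

# Two of the five posts from the response above, shortened for the example.
sample = """
{
  "data": [
    {"created_time": "2017-10-31T12:01:32+0000",
     "message": "Click to read news on #Tableau latest conference.\\n#BigData #Tech",
     "id": "1563861787269208_1910035019318548"},
    {"created_time": "2017-10-30T21:21:00+0000",
     "message": "Why you should welcome Artificial Intelligence with open arms",
     "id": "1563861787269208_1909790786009638"}
  ]
}
"""


def flatten_posts(response: dict) -> list:
    """Return one row per element of the "data" array."""
    return [
        {
            "created_time": post.get("created_time"),
            "id": post.get("id"),
            "message": post.get("message"),
        }
        for post in response.get("data", [])
    ]


rows = flatten_posts(json.loads(sample))
# All message values in the response: a list with one entry per post.
messages = [row["message"] for row in rows]
```

A single request therefore yields as many rows as there are posts, which is why one row per username loses data.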

 

    User: "sgenzer"
    Altair Employee
    Accepted Answer

    Ah, I see. Sorry about that. :) This is a common challenge that we are currently working on: parsing JSON arrays returned by a webservice. There are a couple of workarounds you can use in the meantime; converting to XML is probably the easiest. RapidMiner handles XML much, much better than JSON in its current version. The process below captures the raw JSON response for each username, converts it to XML, reads the posts back with Read XML, and appends the per-username results.

     

    <?xml version="1.0" encoding="UTF-8"?><process version="7.6.001">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="7.6.001" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="read_csv" compatibility="7.6.001" expanded="true" height="68" name="Read CSV" width="90" x="112" y="34">
    <parameter key="csv_file" value="/Users/GenzerConsulting/OneDrive - RapidMiner/OneDrive Repository/random community stuff/TestHandles.csv"/>
    <parameter key="first_row_as_names" value="false"/>
    <list key="annotations">
    <parameter key="0" value="Name"/>
    </list>
    <parameter key="encoding" value="UTF-8"/>
    <list key="data_set_meta_data_information">
    <parameter key="0" value="Username.true.polynominal.attribute"/>
    </list>
    </operator>
    <operator activated="true" class="web:enrich_data_by_webservice" compatibility="7.3.000" expanded="true" height="68" name="Enrich Data by Webservice (2)" width="90" x="246" y="34">
    <parameter key="query_type" value="Regular Expression"/>
    <list key="string_machting_queries"/>
    <list key="regular_expression_queries">
    <parameter key="jsonResponse" value=".*"/>
    </list>
    <list key="regular_region_queries"/>
    <list key="xpath_queries"/>
    <list key="namespaces"/>
    <list key="index_queries"/>
    <list key="jsonpath_queries">
    <parameter key="message" value="$..message"/>
    <parameter key="post id" value="$..id"/>
    </list>
    <parameter key="url" value="https://graph.facebook.com/v2.10/&lt;%Username%&gt;/posts?access_token=1745625495738593|w_a8sajfHCYsCHNZOTDr5H1r-wY"/>
    <list key="request_properties"/>
    <parameter key="encoding" value="UTF-8"/>
    </operator>
    <operator activated="true" class="loop_examples" compatibility="7.6.001" expanded="true" height="103" name="Loop Examples" width="90" x="380" y="34">
    <process expanded="true">
    <operator activated="true" class="filter_example_range" compatibility="7.6.001" expanded="true" height="82" name="Filter Example Range" width="90" x="45" y="34">
    <parameter key="first_example" value="%{example}"/>
    <parameter key="last_example" value="%{example}"/>
    </operator>
    <operator activated="true" class="text:data_to_documents" compatibility="7.5.000" expanded="true" height="68" name="Data to Documents" width="90" x="179" y="34">
    <parameter key="select_attributes_and_weights" value="true"/>
    <list key="specify_weights">
    <parameter key="jsonResponse" value="1.0"/>
    </list>
    </operator>
    <operator activated="true" class="text:combine_documents" compatibility="7.5.000" expanded="true" height="82" name="Combine Documents" width="90" x="313" y="34"/>
    <operator activated="true" class="web:json_to_xml" compatibility="7.3.000" expanded="true" height="68" name="JSON to XML" width="90" x="447" y="34"/>
    <operator activated="true" class="text:write_document" compatibility="7.5.000" expanded="true" height="82" name="Write Document" width="90" x="581" y="34">
    <parameter key="file" value="/Users/GenzerConsulting/OneDrive - RapidMiner/OneDrive Repository/random community stuff/jsonExport.xml"/>
    </operator>
    <operator activated="true" class="advanced_file_connectors:read_xml" compatibility="7.6.001" expanded="true" height="68" name="Read XML" width="90" x="715" y="34">
    <parameter key="file" value="/Users/GenzerConsulting/OneDrive - RapidMiner/OneDrive Repository/random community stuff/jsonExport.xml"/>
    <parameter key="xpath_for_examples" value="//json/data"/>
    <enumeration key="xpaths_for_attributes">
    <parameter key="xpath_for_attribute" value="created_time[1]/text()"/>
    <parameter key="xpath_for_attribute" value="id[1]/text()"/>
    <parameter key="xpath_for_attribute" value="message[1]/text()"/>
    </enumeration>
    <list key="namespaces"/>
    <parameter key="use_default_namespace" value="false"/>
    <list key="annotations"/>
    <list key="data_set_meta_data_information">
    <parameter key="0" value="created_time[1]/text().true.attribute_value.attribute"/>
    <parameter key="1" value="id[1]/text().true.attribute_value.attribute"/>
    <parameter key="2" value="message[1]/text().true.attribute_value.attribute"/>
    </list>
    </operator>
    <connect from_port="example set" to_op="Filter Example Range" to_port="example set input"/>
    <connect from_op="Filter Example Range" from_port="example set output" to_op="Data to Documents" to_port="example set"/>
    <connect from_op="Data to Documents" from_port="documents" to_op="Combine Documents" to_port="documents 1"/>
    <connect from_op="Combine Documents" from_port="document" to_op="JSON to XML" to_port="document"/>
    <connect from_op="JSON to XML" from_port="document" to_op="Write Document" to_port="document"/>
    <connect from_op="Write Document" from_port="file" to_op="Read XML" to_port="file"/>
    <connect from_op="Read XML" from_port="output" to_port="output 1"/>
    <portSpacing port="source_example set" spacing="0"/>
    <portSpacing port="sink_example set" spacing="0"/>
    <portSpacing port="sink_output 1" spacing="0"/>
    <portSpacing port="sink_output 2" spacing="0"/>
    </process>
    </operator>
    <operator activated="true" class="subprocess" compatibility="7.6.001" expanded="true" height="82" name="Union Append" width="90" x="514" y="34">
    <process expanded="true">
    <operator activated="true" class="loop_collection" compatibility="7.6.001" expanded="true" height="82" name="Output (4)" width="90" x="45" y="34">
    <parameter key="set_iteration_macro" value="true"/>
    <process expanded="true">
    <operator activated="false" breakpoints="after" class="select" compatibility="7.6.001" expanded="true" height="68" name="Select (5)" width="90" x="112" y="34">
    <parameter key="index" value="%{iteration}"/>
    </operator>
    <operator activated="true" class="branch" compatibility="7.6.001" expanded="true" height="82" name="Branch (2)" width="90" x="313" y="34">
    <parameter key="condition_type" value="expression"/>
    <parameter key="expression" value="%{iteration}==1"/>
    <process expanded="true">
    <connect from_port="condition" to_port="input 1"/>
    <portSpacing port="source_condition" spacing="0"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_input 1" spacing="0"/>
    <portSpacing port="sink_input 2" spacing="0"/>
    </process>
    <process expanded="true">
    <operator activated="true" class="recall" compatibility="7.6.001" expanded="true" height="68" name="Recall (5)" width="90" x="45" y="187">
    <parameter key="name" value="LoopData"/>
    </operator>
    <operator activated="true" class="union" compatibility="7.6.001" expanded="true" height="82" name="Union (2)" width="90" x="179" y="34"/>
    <connect from_port="condition" to_op="Union (2)" to_port="example set 1"/>
    <connect from_op="Recall (5)" from_port="result" to_op="Union (2)" to_port="example set 2"/>
    <connect from_op="Union (2)" from_port="union" to_port="input 1"/>
    <portSpacing port="source_condition" spacing="0"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_input 1" spacing="0"/>
    <portSpacing port="sink_input 2" spacing="0"/>
    </process>
    </operator>
    <operator activated="true" class="remember" compatibility="7.6.001" expanded="true" height="68" name="Remember (5)" width="90" x="581" y="34">
    <parameter key="name" value="LoopData"/>
    </operator>
    <connect from_port="single" to_op="Branch (2)" to_port="condition"/>
    <connect from_op="Branch (2)" from_port="input 1" to_op="Remember (5)" to_port="store"/>
    <connect from_op="Remember (5)" from_port="stored" to_port="output 1"/>
    <portSpacing port="source_single" spacing="0"/>
    <portSpacing port="sink_output 1" spacing="0"/>
    <portSpacing port="sink_output 2" spacing="0"/>
    </process>
    </operator>
    <operator activated="true" class="select" compatibility="7.6.001" expanded="true" height="68" name="Select (6)" width="90" x="179" y="34">
    <parameter key="index" value="%{iteration}"/>
    </operator>
    <connect from_port="in 1" to_op="Output (4)" to_port="collection"/>
    <connect from_op="Output (4)" from_port="output 1" to_op="Select (6)" to_port="collection"/>
    <connect from_op="Select (6)" from_port="selected" to_port="out 1"/>
    <portSpacing port="source_in 1" spacing="0"/>
    <portSpacing port="source_in 2" spacing="0"/>
    <portSpacing port="sink_out 1" spacing="0"/>
    <portSpacing port="sink_out 2" spacing="0"/>
    </process>
    </operator>
    <connect from_op="Read CSV" from_port="output" to_op="Enrich Data by Webservice (2)" to_port="Example Set"/>
    <connect from_op="Enrich Data by Webservice (2)" from_port="ExampleSet" to_op="Loop Examples" to_port="example set"/>
    <connect from_op="Loop Examples" from_port="output 1" to_op="Union Append" to_port="in 1"/>
    <connect from_op="Union Append" from_port="out 1" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>
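
The idea behind the process above (convert each JSON response to XML, read one row per `data` element, then append everything) can be sketched in Python for illustration. This is not what RapidMiner runs internally. The XML layout `<json><data>...</data></json>` is an assumption inferred from the `//json/data` XPath in the Read XML operator, and the function names and sample values are hypothetical.

```python
import xml.etree.ElementTree as ET

FIELDS = ("created_time", "id", "message")


def json_to_xml(response: dict) -> ET.Element:
    """Build the assumed layout: <json><data><created_time/><id/><message/></data>...</json>."""
    root = ET.Element("json")
    for post in response.get("data", []):
        node = ET.SubElement(root, "data")
        for key in FIELDS:
            ET.SubElement(node, key).text = post.get(key)
    return root


def xml_to_rows(root: ET.Element) -> list:
    """Mirror xpath_for_examples=//json/data: one row per <data> element."""
    return [
        {key: node.findtext(key) for key in FIELDS}
        for node in root.findall("data")
    ]


def append_all(responses: list) -> list:
    """Mirror the Union Append step: stack the rows of every response into one table."""
    combined = []
    for response in responses:
        combined.extend(xml_to_rows(json_to_xml(response)))
    return combined


# Invented responses for two usernames, just to exercise the sketch.
responses = [
    {"data": [{"created_time": "t1", "id": "a_1", "message": "m1"},
              {"created_time": "t2", "id": "a_2", "message": "m2"}]},
    {"data": [{"created_time": "t3", "id": "b_1", "message": "m3"}]},
]
table = append_all(responses)
```

Here `table` ends up with three rows, two from the first response and one from the second, matching the intent of looping over examples and appending the results.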

    Is this better?


    Scott