Igor, I tried your suggestion of using the "Enrich Data by Webservice" operator to create such an attribute; however, I am not sure about:
1. What query type to use.
2. What the regular expression would have to look like for this to work.
I do have an API key from detectlanguage.com and I am able to pass data to the detectlanguage.com service. Now the question is: how do I get the language value parsed out of the response?
Thanks in advance for your help.
hello @tibi - welcome to the community. This is an old thread but maybe I can help? Can you please post your XML process (see instructions on the right)?
Thanks.
Scott
Thank you for writing back. Attached is my XML code. I edited the code so that it does not show my API key.
<?xml version="1.0" encoding="UTF-8"?><process version="8.0.001"> <context> <input/> <output/> <macros/> </context> <operator activated="true" class="process" compatibility="8.0.001" expanded="true" name="Process"> <process expanded="true"> <operator activated="true" class="retrieve" compatibility="8.0.001" expanded="true" height="68" name="Retrieve Samorincan_facebook_statuses - orig short" width="90" x="313" y="238"> <parameter key="repository_entry" value="//Facebooklanguage/data/Samorincan_facebook_statuses - orig short"/> </operator> <operator activated="true" class="web:enrich_data_by_webservice" compatibility="7.3.000" expanded="true" height="68" name="Enrich Data by Webservice" width="90" x="581" y="238"> <parameter key="query_type" value="Regular Expression"/> <list key="string_machting_queries"/> <list key="regular_expression_queries"/> <list key="regular_region_queries"/> <list key="xpath_queries"/> <list key="namespaces"/> <list key="index_queries"/> <list key="jsonpath_queries"/> <parameter key="url" value="http://ws.detectlanguage.com/0.2/detect?q=&lt;%status_message%&gt;&amp;key=MYKYHERE"/> <parameter key="delay" value="1"/> <list key="request_properties"/> </operator> <connect from_op="Retrieve Samorincan_facebook_statuses - orig short" from_port="output" to_op="Enrich Data by Webservice" to_port="Example Set"/> <connect from_op="Enrich Data by Webservice" from_port="ExampleSet" to_port="result 1"/> <portSpacing port="source_input 1" spacing="0"/> <portSpacing port="sink_result 1" spacing="0"/> <portSpacing port="sink_result 2" spacing="0"/> </process> </operator></process>
hello @tibi - looks like an encoding issue. Give this a try (again deleting API key):
<?xml version="1.0" encoding="UTF-8"?><process version="8.0.001"> <context> <input/> <output/> <macros/> </context> <operator activated="true" class="process" compatibility="8.0.001" expanded="true" name="Process"> <process expanded="true"> <operator activated="true" class="generate_data_user_specification" compatibility="8.0.001" expanded="true" height="68" name="Generate Data by User Specification" width="90" x="45" y="34"> <list key="attribute_values"> <parameter key="message" value="&quot;buenos dias señor&quot;"/> </list> <list key="set_additional_roles"/> </operator> <operator activated="true" class="web:encode_urls" compatibility="7.3.000" expanded="true" height="82" name="Encode URLs" width="90" x="179" y="34"> <parameter key="url_attribute" value="message"/> <parameter key="encoding" value="UTF-8"/> </operator> <operator activated="true" class="web:enrich_data_by_webservice" compatibility="7.3.000" expanded="true" height="68" name="Enrich Data by Webservice" width="90" x="313" y="34"> <parameter key="query_type" value="JsonPath"/> <list key="string_machting_queries"/> <list key="regular_expression_queries"> <parameter key="foo" value=".*"/> </list> <list key="regular_region_queries"/> <list key="xpath_queries"/> <list key="namespaces"/> <list key="index_queries"/> <list key="jsonpath_queries"> <parameter key="language" value="$..language"/> <parameter key="isReliable" value="$..isReliable"/> <parameter key="confidence" value="$..confidence"/> </list> <parameter key="url" value="http://ws.detectlanguage.com/0.2/detect?q=&lt;%message%&gt;&amp;key=YOUR-KEY-HERE"/> <list key="request_properties"/> </operator> <connect from_op="Generate Data by User Specification" from_port="output" to_op="Encode URLs" to_port="example set input"/> <connect from_op="Encode URLs" from_port="example set output" to_op="Enrich Data by Webservice" to_port="Example Set"/> <connect from_op="Enrich Data by Webservice" from_port="ExampleSet" to_port="result 2"/> <portSpacing port="source_input 1" spacing="0"/> <portSpacing port="sink_result 1" spacing="0"/> <portSpacing port="sink_result 2" spacing="0"/> <portSpacing port="sink_result 3" spacing="0"/> </process> </operator></process>
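Applied to the original process, the encoding fix would amount to placing an Encode URLs operator between Retrieve and Enrich Data by Webservice. The fragment below is only a sketch: it reuses the operator class and parameters from the process above and assumes the text attribute is called status_message, as in the URL of the original process. Layout attributes and connections are omitted.

```xml
<!-- Sketch: URL-encode status_message before it is substituted into the request URL. -->
<!-- Assumes the attribute name status_message from the original process. -->
<operator activated="true" class="web:encode_urls" compatibility="7.3.000" expanded="true" name="Encode URLs">
  <parameter key="url_attribute" value="status_message"/>
  <parameter key="encoding" value="UTF-8"/>
</operator>
```

The Enrich Data by Webservice operator would then keep `&lt;%status_message%&gt;` in its url parameter and switch query_type to JsonPath, with the three jsonpath_queries shown above.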
Scott,
Yes. That is what it was. Thank you!
One more thing. When a text string contains two languages, the web API actually returns two sets of values for language, isReliable and confidence. I actually need these values. Here is an example of what the API returns in this situation:
{"data": {"detections": [{"language": "sk", "isReliable": true, "confidence": 13.38}, {"language": "hu", "isReliable": false, "confidence": 14.68}]}}
I assume I have to edit the jsonpath queries for the Enrich Data by Webservice operator. Any suggestions, please?
Thanks,
Tibor
ok I think that would be fine but...can you please give me a text string that will give that result?
[EDIT: OK, I got a snippet from the DetectLanguage site. I have never found a reliable way to parse anything beyond simple JSON with that operator, so, strangely enough, I find it more straightforward to convert the response to XML and go from there. It looks completely bizarre, but until RapidMiner makes a good Read JSON operator, this is what I have found works best for me.]
<?xml version="1.0" encoding="UTF-8"?><process version="8.0.001"> <context> <input/> <output/> <macros/> </context> <operator activated="true" class="process" compatibility="8.0.001" expanded="true" name="Process"> <process expanded="true"> <operator activated="true" class="generate_data_user_specification" compatibility="8.0.001" expanded="true" height="68" name="Generate Data by User Specification" width="90" x="45" y="34"> <list key="attribute_values"> <parameter key="message" value="&quot;jak sie jambo prosze bardzo&quot;"/> </list> <list key="set_additional_roles"/> </operator> <operator activated="true" class="web:encode_urls" compatibility="7.3.000" expanded="true" height="82" name="Encode URLs" width="90" x="179" y="34"> <parameter key="url_attribute" value="message"/> <parameter key="encoding" value="UTF-8"/> </operator> <operator activated="true" class="web:enrich_data_by_webservice" compatibility="7.3.000" expanded="true" height="68" name="Enrich Data by Webservice" width="90" x="313" y="34"> <parameter key="query_type" value="Regular Expression"/> <list key="string_machting_queries"/> <list key="regular_expression_queries"> <parameter key="foo" value=".*"/> </list> <list key="regular_region_queries"/> <list key="xpath_queries"/> <list key="namespaces"/> <list key="index_queries"/> <list key="jsonpath_queries"> <parameter key="language" value="$..language"/> <parameter key="isReliable" value="$..isReliable"/> <parameter key="confidence" value="$..confidence"/> </list> <parameter key="url" value="http://ws.detectlanguage.com/0.2/detect?q=&lt;%message%&gt;&amp;key=YOUR-KEY-HERE"/> <list key="request_properties"/> </operator> <operator activated="true" class="text:data_to_documents" compatibility="7.5.000" expanded="true" height="68" name="Data to Documents" width="90" x="447" y="34"> <parameter key="select_attributes_and_weights" value="true"/> <list key="specify_weights"> <parameter key="foo" value="1.0"/> </list> </operator> <operator activated="true" class="text:combine_documents" compatibility="7.5.000" expanded="true" height="82" name="Combine Documents" width="90" x="581" y="34"/> <operator activated="false" class="text:create_document" compatibility="7.5.000" expanded="true" height="68" name="Create Document" width="90" x="581" y="136"> <parameter key="text" value="{ &quot;data&quot;:{ &quot;detections&quot;:[ { &quot;isReliable&quot;:true, &quot;confidence&quot;:39.45, &quot;language&quot;:&quot;es&quot; }, { &quot;isReliable&quot;:false, &quot;confidence&quot;:3.08, &quot;language&quot;:&quot;pt&quot; } ] } }"/> </operator> <operator activated="true" class="web:json_to_xml" compatibility="7.3.000" expanded="true" height="68" name="JSON to XML" width="90" x="715" y="34"/> <operator activated="true" class="text:write_document" compatibility="7.5.000" expanded="true" height="82" name="Write Document" width="90" x="849" y="34"/> <operator activated="true" class="advanced_file_connectors:read_xml" compatibility="8.0.001" expanded="true" height="68" name="Read XML" width="90" x="981" y="85"> <parameter key="file" value="/Users/genzerconsulting/Desktop/Untitled 3.xml"/> <parameter key="xpath_for_examples" value="//json"/> <enumeration key="xpaths_for_attributes"> <parameter key="xpath_for_attribute" value="data[1]/detections[1]/isReliable[1]/text()"/> <parameter key="xpath_for_attribute" value="data[1]/detections[1]/confidence[1]/text()"/> <parameter key="xpath_for_attribute" value="data[1]/detections[1]/language[1]/text()"/> <parameter key="xpath_for_attribute" value="data[1]/detections[2]/isReliable[1]/text()"/> <parameter key="xpath_for_attribute" value="data[1]/detections[2]/confidence[1]/text()"/> <parameter key="xpath_for_attribute" value="data[1]/detections[2]/language[1]/text()"/> </enumeration> <list key="namespaces"/> <parameter key="use_default_namespace" value="false"/> <list key="annotations"/> <list key="data_set_meta_data_information"> <parameter key="0" value="isReliable[1].true.nominal.attribute"/> <parameter key="1" value="confidence[1].true.numeric.attribute"/> <parameter key="2" value="language[1].true.nominal.attribute"/> <parameter key="3" value="isReliable[2].true.nominal.attribute"/> <parameter key="4" value="/confidence[2].true.numeric.attribute"/> <parameter key="5" value="language[2].true.nominal.attribute"/> </list> </operator> <connect from_op="Generate Data by User Specification" from_port="output" to_op="Encode URLs" to_port="example set input"/> <connect from_op="Encode URLs" from_port="example set output" to_op="Enrich Data by Webservice" to_port="Example Set"/> <connect from_op="Enrich Data by Webservice" from_port="ExampleSet" to_op="Data to Documents" to_port="example set"/> <connect from_op="Data to Documents" from_port="documents" to_op="Combine Documents" to_port="documents 1"/> <connect from_op="Combine Documents" from_port="document" to_op="JSON to XML" to_port="document"/> <connect from_op="JSON to XML" from_port="document" to_op="Write Document" to_port="document"/> <connect from_op="Write Document" from_port="file" to_op="Read XML" to_port="file"/> <connect from_op="Read XML" from_port="output" to_port="result 1"/> <portSpacing port="source_input 1" spacing="0"/> <portSpacing port="sink_result 1" spacing="0"/> <portSpacing port="sink_result 2" spacing="0"/> </process> </operator></process>
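Before resorting to the XML round trip, it may be worth trying indexed JsonPath queries directly in Enrich Data by Webservice, one query per detection. This is an untested sketch: it assumes the operator's JsonPath engine accepts array indices such as [0] and [1], and the attribute names language_1, language_2 and so on are made up for illustration. How the operator behaves for rows where the API returns only one detection is also not verified here.

```xml
<!-- Sketch: settings inside the Enrich Data by Webservice operator. -->
<!-- Assumption: array indices like [0] are supported by the operator's JsonPath queries. -->
<parameter key="query_type" value="JsonPath"/>
<list key="jsonpath_queries">
  <!-- first detection -->
  <parameter key="language_1" value="$.data.detections[0].language"/>
  <parameter key="isReliable_1" value="$.data.detections[0].isReliable"/>
  <parameter key="confidence_1" value="$.data.detections[0].confidence"/>
  <!-- second detection, present only for mixed-language text -->
  <parameter key="language_2" value="$.data.detections[1].language"/>
  <parameter key="isReliable_2" value="$.data.detections[1].isReliable"/>
  <parameter key="confidence_2" value="$.data.detections[1].confidence"/>
</list>
```

If the indexed paths do not work in your version, the JSON to XML and Read XML route in the process above reaches the same six values through the XPath expressions detections[1] and detections[2].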