Converting text files containing articles into Solr index JSON format
Vicky
New Altair Community Member
Hi Folks,
I'm creating a chatbot to retrieve content from articles. I have about 10 text files. When I tried using Solr, I found that it accepts JSON/XML in a key/value pair format.
How do I convert the text to this format?
Please help.
Answers
I don't have one. It's a general blog article collection. Wondering how that can be converted.
One sample I see in the Solr example is:
[{"id" : "978-0641723445","cat" : ["book","hardcover"],"name" : "The Lightning Thief","author" : "Rick Riordan","series_t" : "Percy Jackson and the Olympians","sequence_i" : 1,"genre_s" : "fantasy","inStock" : true,"price" : 12.50,"pages_i" : 384}]
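For plain article files like the ones described in the question, here is a minimal Python sketch of one way to get into that key/value shape outside of any particular tool. It assumes each article is a UTF-8 `.txt` file whose first line is the title. The field names `title_t` and `content_t` are hypothetical, chosen only to follow the `_t` suffix in the sample, and the function names are made up for illustration, so adjust all of them to your own Solr schema.

```python
import json
from pathlib import Path


def articles_to_solr_docs(folder):
    """Turn every .txt file in `folder` into one Solr-style document (a dict)."""
    docs = []
    for path in sorted(Path(folder).glob("*.txt")):
        text = path.read_text(encoding="utf-8").strip()
        # Assumption: the first line of each file is the article title,
        # and everything after it is the body.
        title, _, body = text.partition("\n")
        docs.append({
            "id": path.stem,            # file name without extension as the unique key
            "title_t": title.strip(),   # hypothetical field name
            "content_t": body.strip(),  # hypothetical field name
        })
    return docs


def write_solr_json(docs, out_path):
    """Write the documents as one JSON array, the same shape as the sample."""
    payload = json.dumps(docs, indent=2, ensure_ascii=False)
    Path(out_path).write_text(payload, encoding="utf-8")
```

Calling `write_solr_json(articles_to_solr_docs("articles"), "articles.json")` (with `articles` being whatever folder holds your files) would produce one array with one object per article. Using the file name as `id` is only a convenience; any value that is unique per article would do.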
Yep, so you just build it. Let me see if I can build this example for you so you can see...
OK, this is everything except for the 'cat' field, which can be built in a similar way once you understand what I'm doing here:
<?xml version="1.0" encoding="UTF-8"?>
<process version="9.5.000-BETA4">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="9.5.000-BETA4" expanded="true" name="Process">
    <parameter key="logverbosity" value="init"/>
    <parameter key="random_seed" value="-1"/>
    <parameter key="send_mail" value="never"/>
    <parameter key="notification_email" value=""/>
    <parameter key="process_duration_for_mail" value="30"/>
    <parameter key="encoding" value="SYSTEM"/>
    <process expanded="true">
      <operator activated="true" breakpoints="after" class="utility:create_exampleset" compatibility="9.5.000-BETA4" expanded="true" height="68" name="Create ExampleSet" width="90" x="45" y="187">
        <parameter key="generator_type" value="comma separated text"/>
        <parameter key="number_of_examples" value="100"/>
        <parameter key="use_stepsize" value="false"/>
        <list key="function_descriptions"/>
        <parameter key="add_id_attribute" value="false"/>
        <list key="numeric_series_configuration"/>
        <list key="date_series_configuration"/>
        <list key="date_series_configuration (interval)"/>
        <parameter key="date_format" value="yyyy-MM-dd HH:mm:ss"/>
        <parameter key="time_zone" value="America/New_York"/>
        <parameter key="input_csv_text" value="id,name,author,series_t,sequence_i,genre_s,inStock,price,pages_i&#10;978-0641723445,The Lightning Thief,Rick Riordan,Percy Jackson and the Olympians,1,fantasy,true,12.50,384"/>
        <parameter key="column_separator" value=","/>
        <parameter key="parse_all_as_nominal" value="false"/>
        <parameter key="decimal_point_character" value="."/>
        <parameter key="trim_attribute_names" value="true"/>
        <description align="center" color="transparent" colored="false" width="126">everything except cat</description>
      </operator>
      <operator activated="true" class="text:data_to_json" compatibility="8.2.000" expanded="true" height="82" name="Data To JSON" width="90" x="179" y="187">
        <parameter key="ignore_arrays" value="false"/>
        <parameter key="generate_array" value="false"/>
        <parameter key="include_missing_values" value="false"/>
      </operator>
      <operator activated="true" class="text:create_document" compatibility="8.2.000" expanded="true" height="68" name="Create Document" width="90" x="179" y="34">
        <parameter key="text" value="["/>
        <parameter key="add label" value="false"/>
        <parameter key="label_type" value="nominal"/>
        <description align="center" color="transparent" colored="false" width="126">[</description>
      </operator>
      <operator activated="true" class="text:create_document" compatibility="8.2.000" expanded="true" height="68" name="Create Document (2)" width="90" x="179" y="340">
        <parameter key="text" value="]"/>
        <parameter key="add label" value="false"/>
        <parameter key="label_type" value="nominal"/>
        <description align="center" color="transparent" colored="false" width="126">]</description>
      </operator>
      <operator activated="true" class="text:combine_documents" compatibility="8.2.000" expanded="true" height="124" name="Combine Documents" width="90" x="313" y="136"/>
      <connect from_op="Create ExampleSet" from_port="output" to_op="Data To JSON" to_port="example set 1"/>
      <connect from_op="Data To JSON" from_port="documents" to_op="Combine Documents" to_port="documents 2"/>
      <connect from_op="Create Document" from_port="output" to_op="Combine Documents" to_port="documents 1"/>
      <connect from_op="Create Document (2)" from_port="output" to_op="Combine Documents" to_port="documents 3"/>
      <connect from_op="Combine Documents" from_port="document" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
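For readers who want to see the same idea outside RapidMiner, here is a rough Python sketch of what the process does: it takes the comma-separated row, turns it into a JSON object, and wraps the result in `[` and `]`. It also shows one possible way to add the multi-valued 'cat' field as a list. The helper names `convert` and `rows_to_solr_json` are made up for this illustration, and the type guessing is a simplifying assumption, not how the Data To JSON operator decides types.

```python
import csv
import io
import json

# Same header and row as the "Create ExampleSet" operator in the process.
CSV_TEXT = """id,name,author,series_t,sequence_i,genre_s,inStock,price,pages_i
978-0641723445,The Lightning Thief,Rick Riordan,Percy Jackson and the Olympians,1,fantasy,true,12.50,384"""


def convert(value):
    """Best-effort typing: booleans first, then int, then float, else the raw string."""
    if value in ("true", "false"):
        return value == "true"
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value


def rows_to_solr_json(csv_text, cat=None):
    """Build a JSON array with one object per CSV row; `cat` is an optional list of values."""
    docs = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Keep the id as a string even if it happens to look numeric.
        doc = {key: (val if key == "id" else convert(val)) for key, val in row.items()}
        if cat:
            doc["cat"] = list(cat)  # multi-valued field stored as a JSON array
        docs.append(doc)
    return json.dumps(docs)
```

For example, `rows_to_solr_json(CSV_TEXT, cat=["book", "hardcover"])` returns a JSON array string containing a single object with the fields from the sample plus the 'cat' list.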
Scott
[PS nice choice of book - love Percy Jackson!]
Thanks. About to travel for some hours. I'll check it out.