Converting text file containing articles into solr index json format

Vicky New Altair Community Member
edited November 2024 in Community Q&A
Hi Folks,
I'm creating a chatbot that retrieves content from articles. I have about 10 text files. When I tried Solr, I found that it accepts JSON/XML documents in a key/value format.
How do I convert the text to this format?

Please help.


Answers

  • sgenzer
    Altair Employee
    Hi @Vicky, yes, that can be done pretty easily. Do you have the JSON format or a sample of the JSON that Solr is looking for?
  • Vicky
    Vicky New Altair Community Member
    I don't have one. It's a general collection of blog articles, and I'm wondering how it can be converted.

    One sample I see in the Solr examples is:

    [
      {
        "id" : "978-0641723445",
        "cat" : ["book","hardcover"],
        "name" : "The Lightning Thief",
        "author" : "Rick Riordan",
        "series_t" : "Percy Jackson and the Olympians",
        "sequence_i" : 1,
        "genre_s" : "fantasy",
        "inStock" : true,
        "price" : 12.50,
        "pages_i" : 384
      }
    ]
  • sgenzer
    Altair Employee
    Yep, so you just build it. Let me see if I can build this example for you so you can see...


  • sgenzer
    Altair Employee
    OK, this process builds everything except the 'cat' field, which can be built in a similar way once you understand what I'm doing here:

    <?xml version="1.0" encoding="UTF-8"?><process version="9.5.000-BETA4">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="9.5.000-BETA4" expanded="true" name="Process">
        <parameter key="logverbosity" value="init"/>
        <parameter key="random_seed" value="-1"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value=""/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="SYSTEM"/>
        <process expanded="true">
          <operator activated="true" breakpoints="after" class="utility:create_exampleset" compatibility="9.5.000-BETA4" expanded="true" height="68" name="Create ExampleSet" width="90" x="45" y="187">
            <parameter key="generator_type" value="comma separated text"/>
            <parameter key="number_of_examples" value="100"/>
            <parameter key="use_stepsize" value="false"/>
            <list key="function_descriptions"/>
            <parameter key="add_id_attribute" value="false"/>
            <list key="numeric_series_configuration"/>
            <list key="date_series_configuration"/>
            <list key="date_series_configuration (interval)"/>
            <parameter key="date_format" value="yyyy-MM-dd HH:mm:ss"/>
            <parameter key="time_zone" value="America/New_York"/>
            <parameter key="input_csv_text" value="id,name,author,series_t,sequence_i,genre_s,inStock,price,pages_i&#10;978-0641723445,The Lightning Thief,Rick Riordan,Percy Jackson and the Olympians,1,fantasy,true,12.50,384"/>
            <parameter key="column_separator" value=","/>
            <parameter key="parse_all_as_nominal" value="false"/>
            <parameter key="decimal_point_character" value="."/>
            <parameter key="trim_attribute_names" value="true"/>
            <description align="center" color="transparent" colored="false" width="126">everything except cat</description>
          </operator>
          <operator activated="true" class="text:data_to_json" compatibility="8.2.000" expanded="true" height="82" name="Data To JSON" width="90" x="179" y="187">
            <parameter key="ignore_arrays" value="false"/>
            <parameter key="generate_array" value="false"/>
            <parameter key="include_missing_values" value="false"/>
          </operator>
          <operator activated="true" class="text:create_document" compatibility="8.2.000" expanded="true" height="68" name="Create Document" width="90" x="179" y="34">
            <parameter key="text" value="["/>
            <parameter key="add label" value="false"/>
            <parameter key="label_type" value="nominal"/>
            <description align="center" color="transparent" colored="false" width="126">[</description>
          </operator>
          <operator activated="true" class="text:create_document" compatibility="8.2.000" expanded="true" height="68" name="Create Document (2)" width="90" x="179" y="340">
            <parameter key="text" value="]"/>
            <parameter key="add label" value="false"/>
            <parameter key="label_type" value="nominal"/>
            <description align="center" color="transparent" colored="false" width="126">]</description>
          </operator>
          <operator activated="true" class="text:combine_documents" compatibility="8.2.000" expanded="true" height="124" name="Combine Documents" width="90" x="313" y="136"/>
          <connect from_op="Create ExampleSet" from_port="output" to_op="Data To JSON" to_port="example set 1"/>
          <connect from_op="Data To JSON" from_port="documents" to_op="Combine Documents" to_port="documents 2"/>
          <connect from_op="Create Document" from_port="output" to_op="Combine Documents" to_port="documents 1"/>
          <connect from_op="Create Document (2)" from_port="output" to_op="Combine Documents" to_port="documents 3"/>
          <connect from_op="Combine Documents" from_port="document" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
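
For readers working outside RapidMiner, the same idea (one key/value record per article, serialized together as a JSON array) can be sketched in Python. This is only a sketch under assumptions that are not from the thread: the function names, the field names `title_s` and `content_t`, the `cat` values, and the convention that the first line of each text file is the article title are all hypothetical, so adjust them to your own files and Solr schema.

```python
import json
from pathlib import Path


def articles_to_solr_docs(folder):
    """Build one Solr-style document (a dict) per .txt file in `folder`."""
    docs = []
    for path in sorted(Path(folder).glob("*.txt")):
        text = path.read_text(encoding="utf-8").strip()
        # Assumption: the first line of each file is the article title.
        title, _, body = text.partition("\n")
        docs.append({
            "id": path.stem,             # file name without extension as the unique key
            "title_s": title.strip(),    # hypothetical field name
            "content_t": body.strip(),   # hypothetical field name
            "cat": ["blog", "article"],  # multi-valued field, like 'cat' in the sample
        })
    return docs


def write_solr_json(docs, out_file):
    """Write all documents as a single JSON array, the shape of the Solr sample."""
    payload = json.dumps(docs, indent=2, ensure_ascii=False)
    Path(out_file).write_text(payload, encoding="utf-8")
```

Calling `write_solr_json(articles_to_solr_docs("articles"), "articles.json")` would produce one file holding all articles, where `articles` is a hypothetical folder name. Letting `json.dumps` serialize the whole list takes care of the surrounding brackets, the commas between records, and the escaping of quotes and newlines inside the article text.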
    



    Scott

    [PS nice choice of book - love Percy Jackson!]
  • Vicky
    Vicky New Altair Community Member
    Thanks. About to travel for some hours. I'll check it out.
