Federalist Papers

btibert
btibert New Altair Community Member
edited November 2024 in Community Q&A
Has anyone had success importing the Federalist Papers dataset? The JSON version can be found here: http://ptrckprry.com/course/ssd/data/federalist.json

These are the steps I have attempted so far:

- Parsing the JSON into a CSV, but the newline characters in the text seem to cause problems when using Read CSV
- Using the Python extension operator configured with a local conda environment, with the same result

Regarding point 2 above: in pandas, outside of RapidMiner (RM), the DataFrame is exactly what I wanted.

For context, I use this dataset in class to show that we can use the text, and the similarity between texts, to classify the author.

import pandas as pd
URL = "http://ptrckprry.com/course/ssd/data/federalist.json"
# lines=True tells pandas to read one JSON object per line
fed = pd.read_json(URL, lines=True)
# Preview the first rows
fed.head()
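
One possible workaround for the CSV route in point 1 is to flatten the line breaks inside the text before writing the file, so that every record sits on a single physical line. The following is only a minimal sketch under assumptions: the helper `flatten_newlines`, the two-record sample, and its field names `author` and `text` are hypothetical stand-ins for the real file, and whether Read CSV then imports the result cleanly has not been verified here.

```python
import csv
import io
import re

import pandas as pd


def flatten_newlines(df):
    """Return a copy of df with line breaks inside string cells replaced by a space."""
    def clean(value):
        # Only touch strings; numbers, dates and missing values pass through unchanged.
        if isinstance(value, str):
            return re.sub(r"\s*[\r\n]+\s*", " ", value)
        return value

    out = df.copy()
    for col in out.columns:
        out[col] = out[col].map(clean)
    return out


# Hypothetical sample in the same one-object-per-line layout as the real file
sample = io.StringIO(
    '{"author": "HAMILTON", "text": "To the People:\\n\\nAfter an unequivocal experience"}\n'
    '{"author": "JAY", "text": "When the people of America\\nreflect"}\n'
)

fed = pd.read_json(sample, lines=True)
flat = flatten_newlines(fed)

# Quote every field so commas and quotes inside the text stay within one column
csv_text = flat.to_csv(index=False, quoting=csv.QUOTE_ALL)
print(csv_text)
```

For the real data, the `sample` object would be replaced by the URL above and `to_csv` would be given a file path. The trade-off is that paragraph breaks within each paper are lost.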

Best Answer

  • btibert
    btibert New Altair Community Member
    Answer ✓
    Thanks, I was actually able to port another file from SAS using Read SAS which did the trick.

Answers

  • lionelderkrikor
    lionelderkrikor New Altair Community Member
    Hi @btibert,

    As a partial answer, have you tried the Read Document operator (after downloading the federalist.json file to your computer) followed by the JSON To Data operator?

    The process is below.

    Hope this helps,

    Regards,

    Lionel

    <?xml version="1.0" encoding="UTF-8"?><process version="9.5.000">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="9.5.000" expanded="true" name="Process">
        <parameter key="logverbosity" value="init"/>
        <parameter key="random_seed" value="2001"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value=""/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="SYSTEM"/>
        <process expanded="true">
          <operator activated="true" breakpoints="after" class="text:read_document" compatibility="8.2.000" expanded="true" height="68" name="Read Document" width="90" x="179" y="136">
            <parameter key="file" value="C:\Users\Lionel\Desktop\json.json"/>
            <parameter key="extract_text_only" value="true"/>
            <parameter key="use_file_extension_as_type" value="true"/>
            <parameter key="content_type" value="txt"/>
            <parameter key="encoding" value="SYSTEM"/>
          </operator>
          <operator activated="true" class="text:json_to_data" compatibility="8.2.000" expanded="true" height="82" name="JSON To Data" width="90" x="380" y="136">
            <parameter key="ignore_arrays" value="false"/>
            <parameter key="limit_attributes" value="false"/>
            <parameter key="skip_invalid_documents" value="false"/>
            <parameter key="guess_data_types" value="true"/>
            <parameter key="keep_missing_attributes" value="false"/>
            <parameter key="missing_values_aliases" value=", null, NaN, missing"/>
          </operator>
          <connect from_op="Read Document" from_port="output" to_op="JSON To Data" to_port="documents 1"/>
          <connect from_op="JSON To Data" from_port="example set" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
    


  • btibert
    btibert New Altair Community Member
    Thanks for this, but right now it only parses the first entry. There are about 85 entries.
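
A likely explanation, judging only from the `lines=True` argument in the original pandas call, is that the file is newline-delimited JSON (one object per line) rather than a single JSON document, so a parser expecting one document would stop after the first object. The sketch below shows one way to split such a file into individual records in plain Python. The function name `parse_json_lines` and the sample field names `paper_id` and `author` are hypothetical, not taken from the real file.

```python
import json


def parse_json_lines(raw):
    """Parse newline-delimited JSON into a list of dicts, one per non-empty line."""
    records = []
    # Split on "\n" only, so unusual separators inside JSON strings are left alone
    for line_no, line in enumerate(raw.split("\n"), start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines, e.g. a trailing newline at end of file
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError as err:
            raise ValueError(f"line {line_no} is not valid JSON: {err}") from err
    return records


# Hypothetical three-record sample standing in for the downloaded file contents
sample = (
    '{"paper_id": 1, "author": "HAMILTON"}\n'
    '{"paper_id": 2, "author": "JAY"}\n'
    '{"paper_id": 3, "author": "JAY"}\n'
)

records = parse_json_lines(sample)
print(len(records))
```

Each element of `records` is then a separate entry that could be saved or passed on as its own document.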
  • sgenzer
    sgenzer
    Altair Employee
    @btibert the JSON parsing in "JSON To Data" is pretty terrible. I'd strongly recommend trying OWC's "Web Automation" extension, which is much more powerful.

    Scott