[SEMI-SOLVED] Reading CSV file of unknown structure into purely nominal/text

tennenrishin
tennenrishin New Altair Community Member
edited November 2024 in Community Q&A
What is the easiest way to read a CSV file with an unknown set (and number) of attributes, named in the first row, into an ExampleSet in which every attribute is read simply as nominal (or text)?

My attempt:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.008">
 <context>
   <input/>
   <output/>
   <macros/>
 </context>
 <operator activated="true" class="process" compatibility="5.3.008" expanded="true" name="Process">
   <process expanded="true">
     <operator activated="true" class="read_csv" compatibility="5.3.008" expanded="true" height="60" name="Read CSV" width="90" x="112" y="30">
       <parameter key="csv_file" value="/blahblahblah/VTX.csv"/>
       <parameter key="column_separators" value=","/>
       <parameter key="parse_numbers" value="false"/>
       <list key="annotations"/>
       <list key="data_set_meta_data_information"/>
     </operator>
     <connect from_op="Read CSV" from_port="output" to_port="result 1"/>
     <portSpacing port="source_input 1" spacing="0"/>
     <portSpacing port="sink_result 1" spacing="0"/>
     <portSpacing port="sink_result 2" spacing="0"/>
   </process>
 </operator>
</process>
This still parses numeric-looking data as numeric attributes, even with parse_numbers set to false.

Failing that, what is the easiest way to do it if the number of attributes is known (but not the names)?

My attempt:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.008">
 <context>
   <input/>
   <output/>
   <macros/>
 </context>
 <operator activated="true" class="process" compatibility="5.3.008" expanded="true" name="Process">
   <process expanded="true">
     <operator activated="true" class="read_csv" compatibility="5.3.008" expanded="true" height="60" name="Read CSV" width="90" x="112" y="30">
       <parameter key="csv_file" value="/blahblahblah/VTX.csv"/>
       <parameter key="column_separators" value=","/>
       <parameter key="parse_numbers" value="false"/>
       <list key="annotations"/>
       <list key="data_set_meta_data_information">
         <parameter key="0" value=".true.nominal.regular"/>
         <parameter key="1" value=".true.nominal.regular"/>
         <parameter key="2" value=".true.nominal.regular"/>
         <parameter key="3" value=".true.nominal.regular"/>
         <parameter key="4" value=".true.nominal.regular"/>
         <parameter key="5" value=".true.nominal.regular"/>
       </list>
     </operator>
     <connect from_op="Read CSV" from_port="output" to_port="result 1"/>
     <portSpacing port="source_input 1" spacing="0"/>
     <portSpacing port="sink_result 1" spacing="0"/>
     <portSpacing port="sink_result 2" spacing="0"/>
   </process>
 </operator>
</process>
This only reads the last attribute and discards the rest.

Answers

  • tennenrishin
    tennenrishin New Altair Community Member
    Forgot to say please  ;D
  • Marco_Boeck
    Marco_Boeck New Altair Community Member
    Hi,

    if you just use the Read CSV operator as in your first example, you can simply follow it with a "Numerical to Polynominal" operator, set to include all attributes. If you like, you can then add a "Nominal to Text" operator after it. After that, all your attributes are of type text.

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.3.013">
     <context>
       <input/>
       <output/>
       <macros/>
     </context>
     <operator activated="true" class="process" compatibility="5.3.013" expanded="true" name="Process">
       <process expanded="true">
         <operator activated="true" class="read_csv" compatibility="5.3.008" expanded="true" height="60" name="Read CSV" width="90" x="112" y="30">
           <parameter key="csv_file" value="/blahblahblah/VTX.csv"/>
           <parameter key="column_separators" value=","/>
           <parameter key="parse_numbers" value="false"/>
           <list key="annotations"/>
           <list key="data_set_meta_data_information"/>
         </operator>
         <operator activated="true" class="numerical_to_polynominal" compatibility="5.3.013" expanded="true" height="76" name="Numerical to Polynominal" width="90" x="246" y="30"/>
         <operator activated="true" class="nominal_to_text" compatibility="5.3.013" expanded="true" height="76" name="Nominal to Text" width="90" x="380" y="30"/>
         <connect from_op="Read CSV" from_port="output" to_op="Numerical to Polynominal" to_port="example set input"/>
         <connect from_op="Numerical to Polynominal" from_port="example set output" to_op="Nominal to Text" to_port="example set input"/>
         <connect from_op="Nominal to Text" from_port="example set output" to_port="result 1"/>
         <portSpacing port="source_input 1" spacing="0"/>
         <portSpacing port="sink_result 1" spacing="0"/>
         <portSpacing port="sink_result 2" spacing="0"/>
       </process>
     </operator>
    </process>
    Regards,
    Marco
  • tennenrishin
    tennenrishin New Altair Community Member
    Thanks Marco,

    but then "00005" ends up as "5", for example. I need the original values as plain text attributes, and I don't know the attribute names at design time. This seems like a very basic requirement, or am I missing something obvious?

    Regards,
    Isak
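
To make the leading-zero problem concrete outside RapidMiner, here is a minimal Python sketch using only the standard library. The sample data and the `read_all_text` helper are hypothetical (the real VTX.csv is not shown in this thread). It only illustrates why converting to text after numeric parsing cannot bring "00005" back, whereas reading every field as text keeps it.

```python
import csv
import io

# Hypothetical sample data; the real VTX.csv is not shown in the thread.
SAMPLE = "id,code\n00005,A1\n00120,B2\n"


def read_all_text(source):
    """Read CSV rows as dicts; csv leaves every value as the raw string."""
    return list(csv.DictReader(source))


rows = read_all_text(io.StringIO(SAMPLE))

# The raw field keeps its leading zeros.
as_text = rows[0]["id"]

# A numeric round trip (parse as number, then convert back to text) loses them.
as_number = str(int(rows[0]["id"]))

print(as_text, as_number)  # prints: 00005 5
```

The column names come from the header row at run time, so nothing about the structure has to be known in advance.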
  • Marco_Boeck
    Marco_Boeck New Altair Community Member
    Hi,

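    unfortunately, I don't think there is an out-of-the-box way at the moment. I've modified your second process (the one for a known number of attributes) so that it at least does what you want. The meta data entries now end in "attribute" instead of "regular":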

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.3.013">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.3.013" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="read_csv" compatibility="5.3.013" expanded="true" height="60" name="Read CSV" width="90" x="112" y="30">
            <parameter key="csv_file" value="/blahblahblah/VTX.csv"/>
            <parameter key="column_separators" value=","/>
            <parameter key="parse_numbers" value="false"/>
            <list key="annotations"/>
            <list key="data_set_meta_data_information">
              <parameter key="0" value=".true.nominal.attribute"/>
              <parameter key="1" value=".true.nominal.attribute"/>
              <parameter key="2" value=".true.nominal.attribute"/>
              <parameter key="3" value=".true.nominal.attribute"/>
              <parameter key="4" value=".true.nominal.attribute"/>
              <parameter key="5" value=".true.nominal.attribute"/>
            </list>
          </operator>
          <connect from_op="Read CSV" from_port="output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
    Regards,
    Marco
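
Since the process above hard-codes six entries, one possible workaround for files with a different column count is to generate the `<parameter>` lines from the CSV header and paste them into the `data_set_meta_data_information` list. The Python sketch below is only that: the `meta_data_entries` helper is hypothetical, the value string is copied from the process above, and nothing here has been verified against Read CSV itself.

```python
import csv
import io
from xml.sax.saxutils import quoteattr


def meta_data_entries(header_line, value=".true.nominal.attribute"):
    """Return one <parameter> line per column found in the CSV header line.

    The default value string is copied from the process in this thread;
    its exact semantics in Read CSV are an assumption, not verified here.
    """
    # csv.reader handles quoted column names that contain commas.
    columns = next(csv.reader(io.StringIO(header_line)), [])
    return [
        f"<parameter key={quoteattr(str(i))} value={quoteattr(value)}/>"
        for i in range(len(columns))
    ]


entries = meta_data_entries("a,b,c")
print("\n".join(entries))
```

For a three-column header this prints three lines with keys 0 to 2, in the same shape as the entries in the process above.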
  • tennenrishin
    tennenrishin New Altair Community Member
    Thanks!
