CSV with uncommon header can't be processed correctly
mugicagonzalez_
New Altair Community Member
Hi all,
I am using the "Read CSV" operator to read a CSV file with multiple lines. The problem is that the first few lines contain technical information that is not in a valid CSV format, so I annotate them as Comment. But then only the first column of the last row with the values is read.
Is this a common error? I think it might be caused by the other lines having different numbers of columns, but since I define these as Comment, I don't understand why it doesn't work.
This is my operator for "TEST_Jette.csv"
<?xml version="1.0" encoding="UTF-8"?><process version="8.1.003"> <context> <input/> <output/> <macros/> </context> <operator activated="true" class="process" compatibility="8.1.003" expanded="true" name="Process"> <process expanded="true"> <operator activated="true" class="read_csv" compatibility="8.1.003" expanded="true" height="68" name="Read CSV" width="90" x="179" y="34"> <parameter key="csv_file" value="/Users/pello/Downloads/TEST_Jette.csv"/> <parameter key="skip_comments" value="true"/> <parameter key="parse_numbers" value="false"/> <parameter key="decimal_character" value=","/> <parameter key="first_row_as_names" value="false"/> <list key="annotations"> <parameter key="0" value="Comment"/> <parameter key="1" value="Comment"/> <parameter key="2" value="Comment"/> <parameter key="3" value="Comment"/> <parameter key="4" value="Comment"/> <parameter key="5" value="Comment"/> <parameter key="6" value="Comment"/> <parameter key="7" value="Comment"/> <parameter key="8" value="Comment"/> <parameter key="9" value="Comment"/> <parameter key="10" value="Comment"/> <parameter key="11" value="Comment"/> <parameter key="12" value="Comment"/> <parameter key="13" value="Comment"/> <parameter key="14" value="Comment"/> <parameter key="15" value="Comment"/> <parameter key="16" value="Comment"/> <parameter key="17" value="Name"/> </list> <parameter key="encoding" value="UTF-8"/> <parameter key="read_all_values_as_polynominal" value="true"/> <list key="data_set_meta_data_information"> <parameter key="0" value="timestamp.true.polynominal.attribute"/> </list> </operator> <connect from_op="Read CSV" from_port="output" to_port="result 1"/> <portSpacing port="source_input 1" spacing="0"/> <portSpacing port="sink_result 1" spacing="0"/> <portSpacing port="sink_result 2" spacing="0"/> </process> </operator> </process>
Thanks in advance
Pello
Best Answer
SOLVED! Thanks to jczgalla (can't post link to thread)!
<?xml version="1.0" encoding="UTF-8"?><process version="8.1.003"> <context> <input/> <output/> <macros/> </context> <operator activated="true" class="process" compatibility="8.1.003" expanded="true" name="Process"> <process expanded="true"> <operator activated="true" class="open_file" compatibility="8.1.003" expanded="true" height="68" name="Open File" width="90" x="45" y="34"> <parameter key="filename" value="/Users/pello/Downloads/TEST_Jette.csv"/> </operator> <operator activated="true" class="text:read_document" compatibility="8.1.000" expanded="true" height="68" name="Read Document" width="90" x="179" y="34"> <parameter key="extract_text_only" value="false"/> </operator> <operator activated="true" class="text:cut_document" compatibility="8.1.000" expanded="true" height="68" name="Cut Document" width="90" x="313" y="34"> <parameter key="query_type" value="Regular Expression"/> <list key="string_machting_queries"/> <list key="regular_expression_queries"> <parameter key="text" value="((?:[^&quot;]+?|&quot;(.|\n)*?&quot;|)*?)\n"/> </list> <list key="regular_region_queries"/> <list key="xpath_queries"/> <list key="namespaces"/> <list key="index_queries"/> <list key="jsonpath_queries"/> <process expanded="true"> <operator activated="true" class="text:remove_document_parts" compatibility="8.1.000" expanded="true" height="68" name="Remove Document Parts" width="90" x="45" y="34"> <parameter key="deletion_regex" value="&quot;"/> </operator> <connect from_port="segment" to_op="Remove Document Parts" to_port="document"/> <connect from_op="Remove Document Parts" from_port="document" to_port="document 1"/> <portSpacing port="source_segment" spacing="0"/> <portSpacing port="sink_document 1" spacing="0"/> <portSpacing port="sink_document 2" spacing="0"/> </process> </operator> <operator activated="true" class="text:documents_to_data" compatibility="8.1.000" expanded="true" height="82" name="Documents to Data" width="90" x="447" y="34"> <parameter key="text_attribute" value="text"/> </operator> 
<operator activated="true" class="select_attributes" compatibility="8.1.003" expanded="true" height="82" name="Select Attributes" width="90" x="581" y="34"> <parameter key="attribute_filter_type" value="single"/> <parameter key="attribute" value="text"/> </operator> <operator activated="true" class="filter_example_range" compatibility="8.1.003" expanded="true" height="82" name="Filter Example Range" width="90" x="715" y="34"> <parameter key="first_example" value="18"/> <parameter key="last_example" value="19"/> </operator> <operator activated="true" class="split" compatibility="8.1.003" expanded="true" height="82" name="Split" width="90" x="849" y="34"> <parameter key="split_pattern" value=";"/> </operator> <operator activated="true" class="rename_by_example_values" compatibility="8.1.003" expanded="true" height="82" name="Rename by Example Values" width="90" x="983" y="34"/> <connect from_op="Open File" from_port="file" to_op="Read Document" to_port="file"/> <connect from_op="Read Document" from_port="output" to_op="Cut Document" to_port="document"/> <connect from_op="Cut Document" from_port="documents" to_op="Documents to Data" to_port="documents 1"/> <connect from_op="Documents to Data" from_port="example set" to_op="Select Attributes" to_port="example set input"/> <connect from_op="Select Attributes" from_port="example set output" to_op="Filter Example Range" to_port="example set input"/> <connect from_op="Filter Example Range" from_port="example set output" to_op="Split" to_port="example set input"/> <connect from_op="Split" from_port="example set output" to_op="Rename by Example Values" to_port="example set input"/> <connect from_op="Rename by Example Values" from_port="example set output" to_port="result 1"/> <portSpacing port="source_input 1" spacing="0"/> <portSpacing port="sink_result 1" spacing="0"/> <portSpacing port="sink_result 2" spacing="0"/> </process> </operator> </process>
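For readers who want to follow what the process above does without opening it in RapidMiner Studio, here is a rough Python sketch of the same steps: read the file as plain text, cut it into lines, strip the double quotes, keep only lines 18 and 19, split on ";" and use the first kept line as the column names. It is only an illustration, not part of the original process. The function name `parse_with_preamble` and its parameters are made up for this example, and unlike the regular expression in Cut Document it assumes that no quoted field contains a line break.

```python
def parse_with_preamble(text, first_line=18, last_line=19, sep=";"):
    """Mimic the workaround process on a string of file contents.

    first_line / last_line are 1-based and inclusive, like the
    first_example / last_example parameters of Filter Example Range.
    Hypothetical helper for illustration only.
    """
    # Cut Document: one segment per line (assumes no multi-line quoted fields)
    lines = text.splitlines()

    # Remove Document Parts: delete every double quote
    lines = [line.replace('"', "") for line in lines]

    # Filter Example Range: keep only the header line and the value line(s)
    kept = lines[first_line - 1:last_line]

    # Split: break each kept line into columns at the separator
    rows = [line.split(sep) for line in kept]

    # Rename by Example Values: the first kept row supplies the column names
    header, data = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in data]
```

To use it on a real file, pass in the file contents, for example `parse_with_preamble(open(path, encoding="UTF-8").read())`. All values stay strings, which matches reading everything as polynominal in the original Read CSV setup.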
Answers
Yes, sorry @mugicagonzalez_, we no longer allow "Newbies" to use hyperlinks because of the high number of clickbait spammers.
[Helpful hint from community manager - if you just "like" a few posts or mark something as solution or practically anything else, you will gain points and move way beyond Newbie quickly!!]
Scott
Hi,
RapidMiner Studio 9.1 will feature a better way of skipping lines and defining the header row, together with the structural changes that come with it. So the workaround above will soon no longer be necessary.
Regards,
Marco
Already answered.