"[SOLVED] Importing: text with fixed length attributes"
Hi Marcin,
I already had a look at the CSV reader but it requires the use of a delimiter.
The file I have has no such delimiters. As an example, assume I have a line:
AABBCCCC
In this case I have 3 attributes with lengths 2, 2 and 4 respectively.
The attribute values would be AA, BB and CCCC. Note that no separator exists.
Right now I am preparing AWK scripts to deal with this, but I assumed RapidMiner could deal with this type of data easily.
Thanks for the feedback.
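For illustration, the layout described above can be sketched outside RapidMiner in a few lines of Python. This is only a minimal sketch: the function name split_fixed_width is hypothetical, and the widths 2, 2 and 4 are taken from the values AA, BB and CCCC in the example line.

```python
from itertools import accumulate

def split_fixed_width(line, widths):
    """Cut `line` into consecutive fields whose lengths are given by `widths`."""
    ends = list(accumulate(widths))   # running end offsets, e.g. [2, 4, 8]
    starts = [0] + ends[:-1]          # matching start offsets, e.g. [0, 2, 4]
    return [line[s:e] for s, e in zip(starts, ends)]

fields = split_fixed_width("AABBCCCC", [2, 2, 4])
print(fields)  # ['AA', 'BB', 'CCCC']
```

A line shorter than the total width simply yields shorter or empty trailing fields, so real data would probably need a length check first.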
Hello
You could read the text file with the Read CSV operator, using the regular expression "\r\n" as the column separator so that each complete line is read into a single attribute.
Then use the Generate Extract operator to split each line into the required components using regular expressions.
Here's an example:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.000">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.3.000" expanded="true" name="Process">
    <process expanded="true" height="679" width="841">
      <operator activated="true" class="read_csv" compatibility="5.3.000" expanded="true" height="60" name="Read CSV" width="90" x="112" y="75">
        <parameter key="csv_file" value="C:\logs\fixedwidth.txt"/>
        <parameter key="column_separators" value="&quot;\r\n&quot;"/>
        <parameter key="first_row_as_names" value="false"/>
        <list key="annotations"/>
        <parameter key="encoding" value="windows-1252"/>
        <list key="data_set_meta_data_information">
          <parameter key="0" value="att1.true.polynominal.attribute"/>
        </list>
      </operator>
      <operator activated="true" class="text:generate_extract" compatibility="5.3.000" expanded="true" height="60" name="Generate Extract" width="90" x="246" y="75">
        <parameter key="source_attribute" value="att1"/>
        <parameter key="query_type" value="Regular Expression"/>
        <list key="string_machting_queries"/>
        <list key="regular_expression_queries">
          <parameter key="a1" value="(.{2})"/>
          <parameter key="a2" value="(?:.{2})(.{3})"/>
          <parameter key="a3" value="(?:.{5})(.{3})"/>
        </list>
        <list key="regular_region_queries"/>
        <list key="xpath_queries"/>
        <list key="namespaces"/>
        <list key="index_queries"/>
      </operator>
      <connect from_op="Read CSV" from_port="output" to_op="Generate Extract" to_port="Example Set"/>
      <connect from_op="Generate Extract" from_port="Example Set" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
The text file contains this:
AABBCCCC
ABCDBBDS
ABDBQBDD
AASHHFGU
and the result looks like this:
a1  a2   a3
AA  BBC  CCC
AB  CDB  BDS
AB  DBQ  BDD
AA  SHH  FGU
regards
Andrew
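To make the regular-expression idea easier to check, here is a rough Python sketch of what the three queries do to each line. It is not RapidMiner code: Python's re module stands in for the operator's regex engine, and taking the first capture group of the first match is an assumption about how the queries are evaluated. The helper names extract and build_queries are hypothetical. The second part builds equivalent patterns from a list of widths, for example 2, 2 and 4 for the layout asked about.

```python
import re

# The three queries exactly as they appear in the posted process.
posted_queries = {
    "a1": "(.{2})",
    "a2": "(?:.{2})(.{3})",
    "a3": "(?:.{5})(.{3})",
}

def extract(line, queries):
    """Return the first capture group of each query, or None if it does not match."""
    result = {}
    for name, pattern in queries.items():
        match = re.search(pattern, line)
        result[name] = match.group(1) if match else None
    return result

def build_queries(widths):
    """Build one anchored pattern per field: skip the earlier characters, capture the field."""
    queries, offset = {}, 0
    for i, width in enumerate(widths, start=1):
        skip = f"(?:.{{{offset}}})" if offset else ""
        queries[f"a{i}"] = f"^{skip}(.{{{width}}})"
        offset += width
    return queries

for line in ["AABBCCCC", "ABCDBBDS", "ABDBQBDD", "AASHHFGU"]:
    print(extract(line, posted_queries))

# Patterns for the 2, 2, 4 layout from the original question.
print(build_queries([2, 2, 4]))
print(extract("AABBCCCC", build_queries([2, 2, 4])))
```

The posted queries use widths 2, 3 and 3, which is why the first line comes out as AA, BBC, CCC. With the generated 2, 2, 4 patterns the same line gives AA, BB, CCCC.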
You probably need the "Read CSV" operator. This operator can read a structured data set from a text file. Use the wizard to import and configure this operator correctly. For instance, it is important to specify the separator so the operator knows where the value of an attribute begins and ends.
Best regards
Marcin
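If a separator-based import is preferred, one possible workaround is to rewrite the fixed-width file as delimited text first and then point Read CSV at the result. The sketch below is only an illustration in Python: the function name fixed_width_to_csv is hypothetical, the widths 2, 2 and 4 come from the example in this thread, and the semicolon is an arbitrary choice of separator.

```python
import csv
import io

def fixed_width_to_csv(lines, widths, delimiter=";"):
    """Convert fixed-width lines to delimited text, one output row per input line."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    for line in lines:
        line = line.rstrip("\r\n")        # drop the line ending before slicing
        fields, pos = [], 0
        for width in widths:
            fields.append(line[pos:pos + width])
            pos += width
        writer.writerow(fields)
    return out.getvalue()

print(fixed_width_to_csv(["AABBCCCC\r\n", "ABCDBBDS\r\n"], [2, 2, 4]))
```

The csv module quotes any field that happens to contain the separator, so the output stays parseable even if the data includes semicolons.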