[SOLVED] How to add column names from a database table to a headerless text file
gvanvuuren
New Altair Community Member
I've got a standard csv text file, but without headers, which I want to write to an existing database schema. So first I load both the csv file and the database table into RM. Then I need to extract the column names from the existing database table. Then I need to rename the default column names assigned by RM to the loaded csv file by using the extracted column names from the database. And only then can I write the modified csv file back to the database.
The problem is that I cannot seem to join the db column names with the headerless csv columns. I've tried all the operators I could think of, but it doesn't seem possible to extract the db table's meta information into a seperate row in order to prepend the row to the csv file.
I'm sure such a simple ETL task shouldn't be a problem for RM. So I'd appreciate it if someone could point me in the right direction. Is there a way to extract meta-information from a table?
Thanks
Gideon
The problem is that I cannot seem to join the db column names with the headerless csv columns. I've tried all the operators I could think of, but it doesn't seem possible to extract the db table's meta information into a seperate row in order to prepend the row to the csv file.
I'm sure such a simple ETL task shouldn't be a problem for RM. So I'd appreciate it if someone could point me in the right direction. Is there a way to extract meta-information from a table?
Thanks
Gideon
Tagged:
0
Answers
-
Hey, I'm sorry, it's not that easy. You need some macro magic and a loop. Please see the attached process for details. In your setting, instead of Generate Data you would read the CSV file.
Maybe you can do the job also with some clever Transposings and Joins.
Best, Marius0 -
Hi Marius,
thanks. I appreciate the effort. Could you please re-attach the process? I cannot see it.
But I think what's missing is an operator that appends rows to a reference table based on the column indices (and types) of the reference table, instead of expecting column names to match. I've been meaning to try my hand at writing an operator, so maybe this is my opportunity
Best
Gideon0 -
Well, I can't see it neither. Probably someone ::) forgot to insert the process. Here you go:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.000">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.3.000" expanded="true" name="Process">
<process expanded="true" height="341" width="547">
<operator activated="true" class="generate_data" compatibility="5.3.000" expanded="true" height="60" name="Generate Data" width="90" x="45" y="30">
<parameter key="number_examples" value="10"/>
<parameter key="number_of_attributes" value="450"/>
</operator>
<operator activated="true" class="remember" compatibility="5.3.000" expanded="true" height="60" name="Remember" width="90" x="179" y="30">
<parameter key="name" value="data"/>
<parameter key="io_object" value="ExampleSet"/>
</operator>
<operator activated="true" class="read_database" compatibility="5.3.000" expanded="true" height="60" name="Read Database" width="90" x="45" y="120">
<parameter key="connection" value="GezondeKas"/>
<parameter key="query" value="SHOW COLUMNS FROM WeeklyAggregatedValue;"/>
<enumeration key="parameters"/>
</operator>
<operator activated="true" class="extract_macro" compatibility="5.3.000" expanded="true" height="60" name="Extract Macro" width="90" x="179" y="120">
<parameter key="macro" value="attributeCount"/>
<list key="additional_macros"/>
</operator>
<operator activated="true" class="loop" compatibility="5.3.000" expanded="true" height="76" name="Loop" width="90" x="313" y="120">
<parameter key="set_iteration_macro" value="true"/>
<parameter key="iterations" value="%{attributeCount}"/>
<process expanded="true" height="439" width="1003">
<operator activated="true" class="extract_macro" compatibility="5.3.000" expanded="true" height="60" name="Extract Macro (2)" width="90" x="45" y="30">
<parameter key="macro" value="attributeName"/>
<parameter key="macro_type" value="data_value"/>
<parameter key="attribute_name" value="Field"/>
<parameter key="example_index" value="%{iteration}"/>
<list key="additional_macros"/>
</operator>
<operator activated="true" class="recall" compatibility="5.3.000" expanded="true" height="60" name="Recall" width="90" x="179" y="30">
<parameter key="name" value="data"/>
<parameter key="io_object" value="ExampleSet"/>
</operator>
<operator activated="true" class="rename" compatibility="5.3.000" expanded="true" height="76" name="Rename" width="90" x="313" y="30">
<parameter key="old_name" value="att%{iteration}"/>
<parameter key="new_name" value="%{attributeName}"/>
<list key="rename_additional_attributes"/>
</operator>
<operator activated="true" class="remember" compatibility="5.3.000" expanded="true" height="60" name="Remember (2)" width="90" x="648" y="30">
<parameter key="name" value="data"/>
<parameter key="io_object" value="ExampleSet"/>
</operator>
<connect from_port="input 1" to_op="Extract Macro (2)" to_port="example set"/>
<connect from_op="Recall" from_port="result" to_op="Rename" to_port="example set input"/>
<connect from_op="Rename" from_port="example set output" to_op="Remember (2)" to_port="store"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="source_input 2" spacing="0"/>
<portSpacing port="sink_output 1" spacing="0"/>
</process>
</operator>
<operator activated="true" class="recall" compatibility="5.3.000" expanded="true" height="60" name="Recall (2)" width="90" x="447" y="120">
<parameter key="name" value="data"/>
<parameter key="io_object" value="ExampleSet"/>
</operator>
<connect from_op="Generate Data" from_port="output" to_op="Remember" to_port="store"/>
<connect from_op="Read Database" from_port="output" to_op="Extract Macro" to_port="example set"/>
<connect from_op="Extract Macro" from_port="example set" to_op="Loop" to_port="input 1"/>
<connect from_op="Recall (2)" from_port="result" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>0 -
I still haven't got it working quite yet, but I got the idea, thanks!
The operators "Extract Macro", "Set Macro", "Loop Attributes", and "Rename" will do the trick.0