ETL standard text modification
Hello
What are the options for preparing a dataset within Rapidminer Studio when it comes to string attributes ?.
I know I can split and Combine, but is there a way to remove White space, transform into upper lower or capital case etc, with and without generating a new attribute. Is there a dedicated Operator for this or do I have to use the Script Operator ?
I there an other Operator showing pre build function or any kind of help when comming to scripting, maccros, or regular expressions ?
thanks ;D
What are the options for preparing a dataset within Rapidminer Studio when it comes to string attributes ?.
I know I can split and Combine, but is there a way to remove White space, transform into upper lower or capital case etc, with and without generating a new attribute. Is there a dedicated Operator for this or do I have to use the Script Operator ?
I there an other Operator showing pre build function or any kind of help when comming to scripting, maccros, or regular expressions ?
thanks ;D
Find more posts tagged with
Sort by:
1 - 7 of
71
Hi,
Generating Attribute is indeed the best way to do this.
What you can do if you don't want to create new attributes, is just to use the old attribute name and it will overrides the values.
For pre-build functions take a look at the tutorial process in the help to the Generate Attributes operator, it shows a lot of examples what you can do with macros and expressions.
Best,
David
Generating Attribute is indeed the best way to do this.
What you can do if you don't want to create new attributes, is just to use the old attribute name and it will overrides the values.
For pre-build functions take a look at the tutorial process in the help to the Generate Attributes operator, it shows a lot of examples what you can do with macros and expressions.
Best,
David
Thanks all for the information
I have a last question though regarding simple and more complex cross table, I saw the example with pivoting operator, and also check de pivot and transpose.
I am unable to create the same kind form on the left with the input data on the right of this picture

Any hint how to proceed ? ???
Thanks
I have a last question though regarding simple and more complex cross table, I saw the example with pivoting operator, and also check de pivot and transpose.
I am unable to create the same kind form on the left with the input data on the right of this picture

Any hint how to proceed ? ???
Thanks
I think you need to combine aggreagte and pivot in some way to do this. I would need to sit down to get this.
OWC's stats extension (commerical) includes an operator which should do the trick. See: https://oldworldcomputing.com/products/statistics-extension-for-rapidminer
Best,\Martin
Edit: Apparently there is a demo process, see http://www.myexperiment.org/workflows/1488.html
OWC's stats extension (commerical) includes an operator which should do the trick. See: https://oldworldcomputing.com/products/statistics-extension-for-rapidminer
Best,\Martin
Edit: Apparently there is a demo process, see http://www.myexperiment.org/workflows/1488.html
Here you go. The trick is to use depivot with the right regular expression. - just to confirm, this takes the cross-table from the Excel sheet and converts it into an exampleset as displayed on the left of your picture.
^(?!Type$|myID$).*$ -> this performs a negative lookahead for all atrribute names separated by | within the brackets (the $ tells it that the attribute name must end that way so "Type" is excluded from the depivot, but "Type2" is included. Any other attributes get included in the depivoted operation.
^(?!Type$|myID$).*$ -> this performs a negative lookahead for all atrribute names separated by | within the brackets (the $ tells it that the attribute name must end that way so "Type" is excluded from the depivot, but "Type2" is included. Any other attributes get included in the depivoted operation.
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="7.0.000">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="7.0.000" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="read_excel" compatibility="7.0.000" expanded="true" height="68" name="Read Excel" width="90" x="45" y="85">
<parameter key="excel_file" value="C:\Users\user\Desktop\ExcelSummaryTable.xlsx"/>
<list key="annotations"/>
<list key="data_set_meta_data_information"/>
</operator>
<operator activated="true" class="transpose" compatibility="7.0.000" expanded="true" height="82" name="Transpose" width="90" x="179" y="136"/>
<operator activated="true" class="rename_by_example_values" compatibility="7.0.000" expanded="true" height="82" name="Rename by Example Values" width="90" x="179" y="238"/>
<operator activated="true" class="rename" compatibility="7.0.000" expanded="true" height="82" name="Rename" width="90" x="313" y="238">
<parameter key="old_name" value="Country"/>
<parameter key="new_name" value="Type"/>
<list key="rename_additional_attributes">
<parameter key="A" value="myID"/>
</list>
</operator>
<operator activated="true" class="de_pivot" compatibility="7.0.000" expanded="true" height="82" name="De-Pivot" width="90" x="380" y="34">
<list key="attribute_name">
<parameter key="myValue" value="^(?!Type$|myID$).*$"/>
</list>
<parameter key="index_attribute" value="Country"/>
<parameter key="create_nominal_index" value="true"/>
</operator>
<connect from_op="Read Excel" from_port="output" to_op="Transpose" to_port="example set input"/>
<connect from_op="Transpose" from_port="example set output" to_op="Rename by Example Values" to_port="example set input"/>
<connect from_op="Rename by Example Values" from_port="example set output" to_op="Rename" to_port="example set input"/>
<connect from_op="Rename" from_port="example set output" to_op="De-Pivot" to_port="example set input"/>
<connect from_op="De-Pivot" from_port="example set output" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
Generate Attribute has quite some text function. Further there is of course the text mining extension.
~Martin