ETL standard text modification

Neon67
Neon67 New Altair Community Member
edited November 5 in Community Q&A
Hello

What are the options for preparing a dataset within Rapidminer Studio when it comes to string attributes ?.

I know I can split and Combine, but is there a way to remove White space, transform into upper lower or capital case etc, with and without generating a new attribute. Is there a dedicated Operator for this or do I have to use the Script Operator ?

I there an other Operator showing pre build function or any kind of help when comming to scripting, maccros, or regular expressions ?

thanks  ;D

Answers

  • MartinLiebig
    MartinLiebig
    Altair Employee
    Hi,

    Generate Attribute has quite some text function. Further there is of course the text mining extension.

    ~Martin
  • David_A
    David_A New Altair Community Member
    Hi,

    Generating Attribute is indeed the best way to do this.
    What you can do if you don't want to create new attributes, is just to use the old attribute name and it will overrides the values.

    For pre-build functions take a look at the tutorial process in the help to the Generate Attributes operator, it shows a lot of examples what you can do with macros and expressions.

    Best,
    David
  • Neon67
    Neon67 New Altair Community Member
    Thanks all for the information

    I have a last question though regarding simple and more complex cross table, I saw the example with pivoting operator, and also check de pivot and transpose.
    I am unable to create the same kind form on the left with the input data on the right of this picture

    image


    Any hint how to proceed ?  ???

    Thanks
  • MartinLiebig
    MartinLiebig
    Altair Employee
    I think you need to combine aggreagte and pivot in some way to do this. I would need to sit down to get this.

    OWC's stats extension (commerical) includes an operator which should do the trick. See: https://oldworldcomputing.com/products/statistics-extension-for-rapidminer

    Best,\Martin

    Edit: Apparently there is a demo process, see http://www.myexperiment.org/workflows/1488.html
  • Neon67
    Neon67 New Altair Community Member
    Thanks for the tip
    I reviewed the second link and the problem is that he is doing an aggregation, and in my case I shouldn't have one.
    I am looking for a free and quickest solution, the less operation needed, the better actually.

    If you have any other suggestions please feel free to share  :)
  • JEdward
    JEdward New Altair Community Member
    Here you go.  The trick is to use depivot with the right regular expression.  - just to confirm, this takes the cross-table from the Excel sheet and converts it into an exampleset as displayed on the left of your picture.

    ^(?!Type$|myID$).*$  -> this performs a negative lookahead for all atrribute names separated by | within the brackets (the $ tells it that the attribute name must end that way so "Type" is excluded from the depivot, but "Type2" is included.  Any other attributes get included in the depivoted operation.
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="7.0.000">
     <context>
       <input/>
       <output/>
       <macros/>
     </context>
     <operator activated="true" class="process" compatibility="7.0.000" expanded="true" name="Process">
       <process expanded="true">
         <operator activated="true" class="read_excel" compatibility="7.0.000" expanded="true" height="68" name="Read Excel" width="90" x="45" y="85">
           <parameter key="excel_file" value="C:\Users\user\Desktop\ExcelSummaryTable.xlsx"/>
           <list key="annotations"/>
           <list key="data_set_meta_data_information"/>
         </operator>
         <operator activated="true" class="transpose" compatibility="7.0.000" expanded="true" height="82" name="Transpose" width="90" x="179" y="136"/>
         <operator activated="true" class="rename_by_example_values" compatibility="7.0.000" expanded="true" height="82" name="Rename by Example Values" width="90" x="179" y="238"/>
         <operator activated="true" class="rename" compatibility="7.0.000" expanded="true" height="82" name="Rename" width="90" x="313" y="238">
           <parameter key="old_name" value="Country"/>
           <parameter key="new_name" value="Type"/>
           <list key="rename_additional_attributes">
             <parameter key="A" value="myID"/>
           </list>
         </operator>
         <operator activated="true" class="de_pivot" compatibility="7.0.000" expanded="true" height="82" name="De-Pivot" width="90" x="380" y="34">
           <list key="attribute_name">
             <parameter key="myValue" value="^(?!Type$|myID$).*$"/>
           </list>
           <parameter key="index_attribute" value="Country"/>
           <parameter key="create_nominal_index" value="true"/>
         </operator>
         <connect from_op="Read Excel" from_port="output" to_op="Transpose" to_port="example set input"/>
         <connect from_op="Transpose" from_port="example set output" to_op="Rename by Example Values" to_port="example set input"/>
         <connect from_op="Rename by Example Values" from_port="example set output" to_op="Rename" to_port="example set input"/>
         <connect from_op="Rename" from_port="example set output" to_op="De-Pivot" to_port="example set input"/>
         <connect from_op="De-Pivot" from_port="example set output" to_port="result 1"/>
         <portSpacing port="source_input 1" spacing="0"/>
         <portSpacing port="sink_result 1" spacing="0"/>
         <portSpacing port="sink_result 2" spacing="0"/>
       </process>
     </operator>
    </process>
  • Neon67
    Neon67 New Altair Community Member
    Thanks a lot  :)