replace hyphen
I am trying to replace a hyphen in a Grade attribute by using the Replace operator. I would like to replace it with text indicating that no value has been entered (i.e., Not indicated). The problem is that the attribute includes values such as - (the hyphen I want to replace), A-, B-, and C-. The Replace operator replaces all of the hyphens, including those used as minuses. I tried the regular expression \b[-]\b, but that does not work. I also tried \b["-"]\b without success.
Hi @pb42 ,
In the Replace operator, use the expression
^-$
in the replace what parameter and Not indicated in the replace by parameter.
That way, only values consisting of a single hyphen are replaced, and the minuses (e.g. A-, B-, ...) are kept.
Short explanation:
RapidMiner uses the Java RegEx functions: ^ represents the beginning of a line and $ represents the end of a line, so ^-$ matches only a value that is a hyphen and nothing else.
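For anyone who wants to check the pattern outside RapidMiner, here is a minimal Java sketch. It assumes the Replace operator applies the expression to each value roughly the way String.replaceAll does; the class and method names are made up for illustration. It also shows why \b-\b from the question cannot work: \b needs a word character on one side, and a lone hyphen has none next to it.

```java
import java.util.List;
import java.util.stream.Collectors;

public class HyphenReplaceDemo {

    // Hypothetical helper: replaces a value only if it is exactly one hyphen.
    static String replaceLoneHyphen(String value) {
        // ^ and $ anchor the match to the start and end of the value
        return value.replaceAll("^-$", "Not indicated");
    }

    public static void main(String[] args) {
        List<String> grades = List.of("-", "A-", "B-", "C-", "A");

        // Grades such as A- keep their minus; only the lone hyphen changes.
        System.out.println(grades.stream()
                .map(HyphenReplaceDemo::replaceLoneHyphen)
                .collect(Collectors.toList()));

        // \b-\b finds no word boundary around a lone hyphen, so nothing is replaced.
        System.out.println("-".replaceAll("\\b-\\b", "Not indicated"));
    }
}
```

The first line printed is [Not indicated, A-, B-, C-, A]; the second is the unchanged hyphen.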
Happy Mining,
Edin
Hi @sgnarkhede2016 ,
If I understood you correctly, you want the entries in the attribute to be wrapped in leading and trailing double quotes: Value => "Value"
In this case, replace
^(.+)$
by
"$1"
Happy Mining,
Edin
P.S.:
The Generate Attributes operator could also have been used. The expression would have been: "\"" + AttributeName + "\"" where AttributeName is the name of the attribute whose values you want to change.
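As a quick check of the capture-group replacement, here is a small Java sketch. It again assumes the operator behaves like String.replaceAll, and the class and method names are hypothetical. The group (.+) captures the whole value and $1 inserts it between the two quotes.

```java
public class QuoteValuesDemo {

    // Hypothetical helper: wraps a non-empty value in double quotes.
    static String quote(String value) {
        // $1 refers to the text captured by (.+)
        return value.replaceAll("^(.+)$", "\"$1\"");
    }

    public static void main(String[] args) {
        System.out.println(quote("Value"));
        // .+ needs at least one character, so an empty value is left as it is.
        System.out.println(quote("").isEmpty());
    }
}
```

Note that an empty value stays empty with this pattern; if empty values should become "" as well, (.*) would be needed instead of (.+).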
@Edin_Klapic
Hello
I am working on data for a store and I want to analyze the customers' baskets. The column names contain a lot of symbols that RM is not able to understand, and I cannot replace all of them because they are of different types. Could you please tell me how I can solve this?
I also think it would be useful if the RM team could solve this problem in the next version of RM (feature request).
Thank you in advance
sara
Hi @sara20 ,
Although your problem is somewhat similar to the above-mentioned "hyphen" issue, it affects the names of attributes and not attribute values.
Thus, I suggest that in the future you open a new thread when the answers in a thread don't provide the help you need. That also makes it easier to find for users who might have a similar problem later.
You can use "Rename by Replacing" to replace certain patterns represented by regular expressions, but only one pattern at a time.
So, unfortunately, the solution to your problem is not yet (as of version 9.6) a single-operator solution. Please find attached a quick solution that uses "Rename by Replacing" in loops together with a self-created dictionary, with which you will hopefully be able to achieve your goal.
Happy Mining,
Edin
<?xml version="1.0" encoding="UTF-8"?><process version="9.5.001">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="9.5.001" expanded="true" name="Process">
<parameter key="logverbosity" value="init"/>
<parameter key="random_seed" value="2001"/>
<parameter key="send_mail" value="never"/>
<parameter key="notification_email" value=""/>
<parameter key="process_duration_for_mail" value="30"/>
<parameter key="encoding" value="SYSTEM"/>
<process expanded="true">
<operator activated="true" class="retrieve" compatibility="9.5.001" expanded="true" height="68" name="Retrieve Golf" width="90" x="179" y="34">
<parameter key="repository_entry" value="//Samples/data/Golf"/>
</operator>
<operator activated="true" class="concurrency:loop_attributes" compatibility="9.5.001" expanded="true" height="82" name="Loop Attributes" width="90" x="313" y="34">
<parameter key="attribute_filter_type" value="all"/>
<parameter key="attribute" value=""/>
<parameter key="attributes" value=""/>
<parameter key="use_except_expression" value="false"/>
<parameter key="value_type" value="attribute_value"/>
<parameter key="use_value_type_exception" value="false"/>
<parameter key="except_value_type" value="time"/>
<parameter key="block_type" value="attribute_block"/>
<parameter key="use_block_type_exception" value="false"/>
<parameter key="except_block_type" value="value_matrix_row_start"/>
<parameter key="invert_selection" value="false"/>
<parameter key="include_special_attributes" value="false"/>
<parameter key="attribute_name_macro" value="loop_attribute"/>
<parameter key="reuse_results" value="true"/>
<parameter key="enable_parallel_execution" value="true"/>
<process expanded="true">
<operator activated="true" class="utility:create_exampleset" compatibility="9.5.001" expanded="true" height="68" name="Create ExampleSet" width="90" x="112" y="85">
<parameter key="generator_type" value="comma separated text"/>
<parameter key="number_of_examples" value="100"/>
<parameter key="use_stepsize" value="false"/>
<list key="function_descriptions"/>
<parameter key="add_id_attribute" value="false"/>
<list key="numeric_series_configuration"/>
<list key="date_series_configuration"/>
<list key="date_series_configuration (interval)"/>
<parameter key="date_format" value="yyyy-MM-dd HH:mm:ss"/>
<parameter key="time_zone" value="SYSTEM"/>
<parameter key="input_csv_text" value="old,new&#10;o,-&#10;i,%"/>
<parameter key="column_separator" value=","/>
<parameter key="parse_all_as_nominal" value="true"/>
<parameter key="decimal_point_character" value="."/>
<parameter key="trim_attribute_names" value="true"/>
</operator>
<operator activated="true" class="extract_macro" compatibility="9.5.001" expanded="true" height="68" name="Extract Macro (4)" width="90" x="246" y="85">
<parameter key="macro" value="number_of_examples"/>
<parameter key="macro_type" value="number_of_examples"/>
<parameter key="statistics" value="average"/>
<parameter key="attribute_name" value=""/>
<list key="additional_macros"/>
</operator>
<operator activated="true" class="concurrency:loop" compatibility="9.5.001" expanded="true" height="103" name="Loop (2)" width="90" x="380" y="187">
<parameter key="number_of_iterations" value="%{number_of_examples}"/>
<parameter key="iteration_macro" value="iteration"/>
<parameter key="reuse_results" value="true"/>
<parameter key="enable_parallel_execution" value="false"/>
<process expanded="true">
<operator activated="true" class="extract_macro" compatibility="9.5.001" expanded="true" height="68" name="Extract Macro (5)" width="90" x="112" y="34">
<parameter key="macro" value="old_character"/>
<parameter key="macro_type" value="data_value"/>
<parameter key="statistics" value="average"/>
<parameter key="attribute_name" value="old"/>
<parameter key="example_index" value="%{iteration}"/>
<list key="additional_macros"/>
</operator>
<operator activated="true" class="extract_macro" compatibility="9.5.001" expanded="true" height="68" name="Extract Macro (6)" width="90" x="246" y="34">
<parameter key="macro" value="new_character"/>
<parameter key="macro_type" value="data_value"/>
<parameter key="statistics" value="average"/>
<parameter key="attribute_name" value="new"/>
<parameter key="example_index" value="%{iteration}"/>
<list key="additional_macros"/>
</operator>
<operator activated="true" class="delay" compatibility="9.5.001" expanded="true" height="103" name="only to ensure execution order (2)" width="90" x="447" y="85">
<parameter key="delay" value="none"/>
<parameter key="delay_amount" value="1000"/>
<parameter key="min_delay_amount" value="0"/>
<parameter key="max_delay_amount" value="1000"/>
</operator>
<operator activated="true" class="rename_by_replacing" compatibility="9.5.001" expanded="true" height="82" name="Rename by Replacing (2)" width="90" x="581" y="136">
<parameter key="attribute_filter_type" value="all"/>
<parameter key="attribute" value=""/>
<parameter key="attributes" value=""/>
<parameter key="use_except_expression" value="false"/>
<parameter key="value_type" value="attribute_value"/>
<parameter key="use_value_type_exception" value="false"/>
<parameter key="except_value_type" value="time"/>
<parameter key="block_type" value="attribute_block"/>
<parameter key="use_block_type_exception" value="false"/>
<parameter key="except_block_type" value="value_matrix_row_start"/>
<parameter key="invert_selection" value="false"/>
<parameter key="include_special_attributes" value="false"/>
<parameter key="replace_what" value="%{old_character}"/>
<parameter key="replace_by" value="%{new_character}"/>
</operator>
<connect from_port="input 1" to_op="Extract Macro (5)" to_port="example set"/>
<connect from_port="input 2" to_op="only to ensure execution order (2)" to_port="through 2"/>
<connect from_op="Extract Macro (5)" from_port="example set" to_op="Extract Macro (6)" to_port="example set"/>
<connect from_op="Extract Macro (6)" from_port="example set" to_op="only to ensure execution order (2)" to_port="through 1"/>
<connect from_op="only to ensure execution order (2)" from_port="through 1" to_port="output 1"/>
<connect from_op="only to ensure execution order (2)" from_port="through 2" to_op="Rename by Replacing (2)" to_port="example set input"/>
<connect from_op="Rename by Replacing (2)" from_port="example set output" to_port="output 2"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="source_input 2" spacing="0"/>
<portSpacing port="source_input 3" spacing="0"/>
<portSpacing port="sink_output 1" spacing="0"/>
<portSpacing port="sink_output 2" spacing="0"/>
<portSpacing port="sink_output 3" spacing="0"/>
</process>
</operator>
<connect from_port="input 1" to_op="Loop (2)" to_port="input 2"/>
<connect from_op="Create ExampleSet" from_port="output" to_op="Extract Macro (4)" to_port="example set"/>
<connect from_op="Extract Macro (4)" from_port="example set" to_op="Loop (2)" to_port="input 1"/>
<connect from_op="Loop (2)" from_port="output 2" to_port="output 1"/>
<portSpacing port="source_input 1" spacing="147"/>
<portSpacing port="source_input 2" spacing="0"/>
<portSpacing port="sink_output 1" spacing="0"/>
<portSpacing port="sink_output 2" spacing="0"/>
</process>
</operator>
<connect from_op="Retrieve Golf" from_port="output" to_op="Loop Attributes" to_port="input 1"/>
<connect from_op="Loop Attributes" from_port="output 1" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
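The idea behind the process above, stripped of the RapidMiner plumbing, can be sketched in Java as follows. This is only an illustration under assumptions: the class and method names are hypothetical, the dictionary holds the same two example rows as the process (o becomes -, i becomes %), and the regular Golf attribute names are typed in by hand. Each dictionary row is applied to all names in turn, like one "Rename by Replacing" pass per loop iteration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RenameByDictionaryDemo {

    // Hypothetical helper: applies every (old, new) row to every name, in order.
    static List<String> renameAll(List<String> names, Map<String, String> dictionary) {
        List<String> result = new ArrayList<>(names);
        for (Map.Entry<String, String> row : dictionary.entrySet()) {
            for (int i = 0; i < result.size(); i++) {
                // The "old" entry is treated as a regular expression.
                result.set(i, result.get(i).replaceAll(row.getKey(), row.getValue()));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the rows in insertion order, like the rows of the dictionary.
        Map<String, String> dictionary = new LinkedHashMap<>();
        dictionary.put("o", "-");
        dictionary.put("i", "%");

        List<String> names = List.of("Outlook", "Temperature", "Humidity", "Wind");
        System.out.println(renameAll(names, dictionary));
    }
}
```

This prints [Outl--k, Temperature, Hum%d%ty, W%nd]. Because the old entries are regular expressions, symbols with a special meaning in a regex (for example $, ( or .) would have to be escaped in the dictionary, e.g. \$ instead of $.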
Hello
This is very similar to your question:
https://community.rapidminer.com/discussion/comment/63840#Comment_63840
I hope this helps
mbs