Generate an attribute with the number of occurrences of a specific word in a cell
Hi,
I am a newbie...
I have some Excel data and I want to generate a new attribute holding the number of occurrences of a specific word in each cell, without tokenizing or anything like that.
How should I calculate that? Can I do it with a regular expression?
For example, I want to count the word "out" in the column star and generate a new column with the number of times "out" appears in it.
Sorry for my English.
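For illustration outside RapidMiner, counting one word per cell is a single whole-word regular expression. The sketch below assumes Python, a case-sensitive match, and made-up cell values. The function name `count_word` is hypothetical.

```python
import re


def count_word(cell, word):
    """Count whole-word occurrences of `word` in one cell (case-sensitive)."""
    if not cell:  # empty or missing cell
        return 0
    # \b is a word boundary, so "out" inside "outside" or "shout" is not counted;
    # re.escape protects against regex special characters in `word`.
    pattern = r"\b" + re.escape(word) + r"\b"
    return len(re.findall(pattern, cell))


# Hypothetical values for the column "star"
cells = ["out, in, out", "in in", "outside out", ""]
print([count_word(c, "out") for c in cells])  # → [2, 0, 1, 0]
```

Pass `flags=re.IGNORECASE` to `re.findall` if "Out" and "out" should count as the same word.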
Thanks a lot for your help, it worked very well...
But if I have more than one attribute column and want to generate a count attribute for each one, it doesn't work with the subset attribute selection, and it becomes too complicated.
Is there a simple solution for this?
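Conceptually, handling several columns means repeating the same count once per column. Here is a minimal Python sketch of that idea, assuming each example is a plain dict. The function name `add_word_counts`, the `<column>_<word>_count` naming scheme and the sample data are hypothetical, not RapidMiner features.

```python
import re


def add_word_counts(rows, columns, word):
    """Return copies of `rows` with one count attribute per text column.

    Each row is a dict (one example); the new attribute name
    "<column>_<word>_count" is a hypothetical naming scheme.
    """
    pattern = re.compile(r"\b" + re.escape(word) + r"\b")
    result = []
    for row in rows:
        new_row = dict(row)  # copy, so the input rows stay unchanged
        for col in columns:
            text = row.get(col) or ""  # missing value -> empty text
            new_row[f"{col}_{word}_count"] = len(pattern.findall(text))
        result.append(new_row)
    return result


# Hypothetical example set with two text columns
rows = [
    {"star": "out, in, out", "comment": "out"},
    {"star": "in", "comment": None},
]
for r in add_word_counts(rows, ["star", "comment"], "out"):
    print(r)
```

The list of columns is the only thing that changes between runs, so adding a column does not add any new steps.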
I have one more question:
how can I count all of the words in one cell and put the result in a new column (like in this pic)?
Thank you
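Counting all words in a cell depends on what counts as a "word". The sketch below assumes a word is any run of letters, digits or underscores (the regex `\w+`). With that definition "don't" counts as two words, so the pattern may need adjusting (for example `\S+` for whitespace-separated tokens). The function name `count_all_words` is hypothetical.

```python
import re


def count_all_words(cell):
    """Count the words in one cell, a word being a run of \\w characters."""
    if not cell:  # empty or missing cell
        return 0
    return len(re.findall(r"\w+", cell))


# Hypothetical cell values
for cell in ["out, in, out", "in in", "don't stop", ""]:
    print(repr(cell), count_all_words(cell))
```

Here "don't stop" gives 3 (`don`, `t`, `stop`), which shows why the word definition matters.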
Hi @neginz,
Here is a process that does what you want:
<?xml version="1.0" encoding="UTF-8"?><process version="8.2.001">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="8.2.001" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="read_excel" compatibility="8.2.001" expanded="true" height="68" name="Read Excel" width="90" x="45" y="34">
<parameter key="excel_file" value="C:\Users\Lionel\Documents\Formations_DataScience\Rapidminer\Tests_Rapidminer\Extract_Count_Word\Extract_Count_Word.xlsx"/>
<parameter key="imported_cell_range" value="A1:B5"/>
<parameter key="first_row_as_names" value="false"/>
<list key="annotations">
<parameter key="0" value="Name"/>
</list>
<list key="data_set_meta_data_information">
<parameter key="0" value="star.true.polynominal.attribute"/>
<parameter key="1" value="out.true.attribute_value.attribute"/>
</list>
</operator>
<operator activated="true" class="generate_attributes" compatibility="8.2.001" expanded="true" height="82" name="Generate Attributes" width="90" x="179" y="34">
<list key="function_descriptions">
<parameter key="out" value="star"/>
</list>
</operator>
<operator activated="true" class="set_macro" compatibility="8.2.001" expanded="true" height="82" name="Set word to count" width="90" x="313" y="34">
<parameter key="macro" value="word"/>
<parameter key="value" value="out"/>
</operator>
<operator activated="true" class="replace" compatibility="8.2.001" expanded="true" height="82" name="Replace" width="90" x="447" y="34">
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="out"/>
<parameter key="replace_what" value="\b(?!%{word}\b)\w+"/>
</operator>
<operator activated="true" class="replace" compatibility="8.2.001" expanded="true" height="82" name="Replace (2)" width="90" x="581" y="34">
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="out"/>
<parameter key="replace_what" value=","/>
</operator>
<operator activated="true" class="split" compatibility="8.2.001" expanded="true" height="82" name="Split" width="90" x="715" y="34">
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="out"/>
<parameter key="split_pattern" value="[&quot; &quot;]|[&quot; &quot;]"/>
</operator>
<operator activated="true" class="replace" compatibility="8.2.001" expanded="true" height="82" name="Replace (3)" width="90" x="849" y="34">
<parameter key="attribute_filter_type" value="regular_expression"/>
<parameter key="attribute" value="star"/>
<parameter key="regular_expression" value="out.*"/>
<parameter key="replace_what" value="[&quot; &quot;]"/>
<parameter key="replace_by" value="?"/>
</operator>
<operator activated="true" class="generate_aggregation" compatibility="8.2.001" expanded="true" height="82" name="Generate Aggregation" width="90" x="983" y="34">
<parameter key="attribute_name" value="out"/>
<parameter key="attribute_filter_type" value="regular_expression"/>
<parameter key="attribute" value="out"/>
<parameter key="regular_expression" value="out.*"/>
<parameter key="value_type" value="nominal"/>
<parameter key="aggregation_function" value="count"/>
</operator>
<operator activated="true" class="select_attributes" compatibility="8.2.001" expanded="true" height="82" name="Select Attributes" width="90" x="1184" y="34">
<parameter key="attribute_filter_type" value="regular_expression"/>
<parameter key="regular_expression" value="out.*"/>
<parameter key="use_except_expression" value="true"/>
<parameter key="except_regular_expression" value="out"/>
<parameter key="invert_selection" value="true"/>
</operator>
<connect from_op="Read Excel" from_port="output" to_op="Generate Attributes" to_port="example set input"/>
<connect from_op="Generate Attributes" from_port="example set output" to_op="Set word to count" to_port="through 1"/>
<connect from_op="Set word to count" from_port="through 1" to_op="Replace" to_port="example set input"/>
<connect from_op="Replace" from_port="example set output" to_op="Replace (2)" to_port="example set input"/>
<connect from_op="Replace (2)" from_port="example set output" to_op="Split" to_port="example set input"/>
<connect from_op="Split" from_port="example set output" to_op="Replace (3)" to_port="example set input"/>
<connect from_op="Replace (3)" from_port="example set output" to_op="Generate Aggregation" to_port="example set input"/>
<connect from_op="Generate Aggregation" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
<connect from_op="Select Attributes" from_port="example set output" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
You have to set the word to count in the parameters of the Set Macro operator, named "Set word to count" in the process (in your case, out).
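To make the logic of the process easier to follow, here is a rough Python re-trace of its main steps on a single cell. Python and the function name `count_like_process` are assumptions. The sketch leaves out Replace (3) and the final Select Attributes clean-up, and it splits on whitespace as an approximation of the Split pattern. Like Replace (2), it only strips commas, so other punctuation left in a cell would show up as extra pieces in this sketch.

```python
import re


def count_like_process(cell, word):
    r"""Re-trace the main steps of the process on one cell (illustrative only).

    Replace              : delete every word that is not `word`, via \b(?!word\b)\w+
    Replace (2)          : delete the commas
    Split                : split what is left into pieces
    Generate Aggregation : count the pieces
    """
    if not cell:
        return 0
    # In the process, %{word} is filled in from the "Set word to count" macro
    not_the_word = r"\b(?!" + re.escape(word) + r"\b)\w+"
    remaining = re.sub(not_the_word, "", cell)  # Replace
    remaining = remaining.replace(",", "")      # Replace (2)
    pieces = remaining.split()                  # Split (no-arg split drops empty pieces)
    return len(pieces)                          # Generate Aggregation (count)


print(count_like_process("out, in, out", "out"))  # → 2
print(count_like_process("outside out", "out"))   # → 1
```

The negative lookahead `(?!out\b)` is what keeps the target word: every other whole word is deleted, so only copies of "out" remain to be counted.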
I hope it helps,
Regards,
Lionel