"Text Mining with Excel File"
Hi,
I have an Excel file with e-mail addresses in one column. Now I want to add a second column in which the addresses are grouped. For example, @abc is group 1, @dfg is group 2, and so on. I thought about using text mining on the addresses, but I already failed at converting the Excel file into documents with Data to Documents.
Hoping for help.
Greetings,
Joshua
Best Answer
-
Hi,
so you want to extract the domain of an email address? If yes - you can do this with Replace. Attached is an example process.
Cheers,
Martin
<?xml version="1.0" encoding="UTF-8"?><process version="8.0.001">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="8.0.001" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="generate_data_user_specification" compatibility="8.0.001" expanded="true" height="68" name="Generate Data by User Specification" width="90" x="179" y="85">
<list key="attribute_values">
<parameter key="mail" value="&quot;name@domain.com&quot;"/>
</list>
<list key="set_additional_roles"/>
</operator>
<operator activated="true" class="generate_copy" compatibility="8.0.001" expanded="true" height="82" name="Generate Copy" width="90" x="313" y="85">
<parameter key="attribute_name" value="mail"/>
<parameter key="new_name" value="domain"/>
</operator>
<operator activated="true" class="replace" compatibility="8.0.001" expanded="true" height="82" name="Replace" width="90" x="447" y="85">
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="domain"/>
<parameter key="replace_what" value=".+@(.+)"/>
<parameter key="replace_by" value="$1"/>
</operator>
<connect from_op="Generate Data by User Specification" from_port="output" to_op="Generate Copy" to_port="example set input"/>
<connect from_op="Generate Copy" from_port="example set output" to_op="Replace" to_port="example set input"/>
<connect from_op="Replace" from_port="example set output" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
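For readers who want to check what the Replace step does before importing the process, here is a minimal sketch in Python (not RapidMiner). It applies the same pattern, `.+@(.+)`, and keeps only the first capture group, which is what `$1` refers to in the process. The function name `extract_domain` is made up for this illustration.

```python
import re

def extract_domain(address: str) -> str:
    """Return the part after the '@', mirroring the Replace step's regex."""
    # ".+@(.+)" matches the whole address; group 1 is everything after the '@'.
    # Python writes the back-reference as \1 where the process uses $1.
    return re.sub(r".+@(.+)", r"\1", address)

print(extract_domain("name@domain.com"))  # prints "domain.com"
```

A value without an `@` does not match the pattern and is returned unchanged, so such rows would need separate handling.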
Answers
-
Hello Martin,
thank you for your answer. I think the Replace operator is not the right one for my problem. In my Excel list I have a lot of e-mails from different companies. Now I want to add another column and group them, so that all e-mails with @company1 get 1 and all e-mails with @company2 get 2 in the new column.
-
Hi,
have a look at the process I have posted. It will give you a new attribute called domain with "company1.com" for the one and "company2.com" for the other.
Best,
Martin
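To make the link between the domain attribute and the numbered groups from the question concrete, here is a rough sketch in Python (outside RapidMiner). It extracts the domain with the same pattern as the Replace operator and then numbers the domains in order of first appearance. The function name `group_by_domain`, the sample addresses, and the choice to lowercase the domain are assumptions for this example only.

```python
import re

def group_by_domain(addresses):
    """Return (address, domain, group) tuples, numbering domains from 1."""
    groups = {}  # domain -> group number, filled in order of first appearance
    rows = []
    for address in addresses:
        # Same pattern as the Replace operator: keep what follows the '@'.
        domain = re.sub(r".+@(.+)", r"\1", address.strip()).lower()
        # Reuse the existing number for a known domain, else assign the next one.
        group = groups.setdefault(domain, len(groups) + 1)
        rows.append((address, domain, group))
    return rows

sample = ["anna@company1.com", "ben@company2.com", "cara@company1.com"]
for row in group_by_domain(sample):
    print(row)
```

With the sample list, both company1.com addresses end up in group 1 and the company2.com address in group 2.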
-
Okay, thank you. How can I use/copy your process in my RapidMiner?
-
Hi,
have a look at this thread, which describes it in detail: http://community.rapidminer.com/t5/RapidMiner-Studio-Knowledge-Base/How-can-I-share-processes-without-RapidMiner-Server/ta-p/37047
Cheers,
Martin
-
Thank you! Looks like a great solution, I will try!