"Dictionary Based Sentiment Analysis"
Hey guys,
I'm currently working on a dictionary-based sentiment analysis with the Operator Toolbox. Everything works fine so far, but in the end I cannot add a date as an attribute to my output. The model only allows "Text", "Score", "Positivity", "Negativity" and "uncovered token".
Is there a way to add the date from my dataset, so that the output becomes "Date", "Text", "Score", ...?
Best Answer
-
Hi,
I've sent you a version of the new operator via mail. I will add it to the Marketplace toolbox if this is what you need.
Cheers,
Martin
Answers
-
Here is my process:
<?xml version="1.0" encoding="UTF-8"?><process version="7.6.001">
<operator activated="true" class="subprocess" compatibility="7.6.001" expanded="true" height="82" name="Subprocess" width="90" x="112" y="340">
<process expanded="true">
<operator activated="true" class="read_excel" compatibility="7.6.001" expanded="true" height="68" name="Read Excel" width="90" x="45" y="34">
<parameter key="excel_file" value="C:\Users\Benedict\Desktop\Seminar\Dictionaries\Dictionary 2_SentiWS_mitFlexionen_final.xlsx"/>
<parameter key="sheet_number" value="1"/>
<parameter key="imported_cell_range" value="A1:D15633"/>
<parameter key="encoding" value="SYSTEM"/>
<parameter key="first_row_as_names" value="true"/>
<list key="annotations">
<parameter key="0" value="Name"/>
</list>
<parameter key="date_format" value=""/>
<parameter key="time_zone" value="SYSTEM"/>
<parameter key="locale" value="English (United States)"/>
<list key="data_set_meta_data_information">
<parameter key="0" value="A.false.attribute_value.attribute"/>
<parameter key="1" value="Wortstamm.true.text.attribute"/>
<parameter key="2" value="Worart.false.polynominal.attribute"/>
<parameter key="3" value="Weighting.true.numeric.attribute"/>
</list>
<parameter key="read_not_matching_values_as_missings" value="true"/>
<parameter key="datamanagement" value="double_array"/>
<parameter key="data_management" value="auto"/>
</operator>
<operator activated="true" class="read_excel" compatibility="7.6.001" expanded="true" height="68" name="Read Excel (2)" width="90" x="45" y="136">
<parameter key="excel_file" value="C:\Users\Benedict\Desktop\Seminar\Dictionaries\Dictionary 2_SentiWS_mitFlexionen_final.xlsx"/>
<parameter key="sheet_number" value="2"/>
<parameter key="imported_cell_range" value="A1:D15650"/>
<parameter key="encoding" value="SYSTEM"/>
<parameter key="first_row_as_names" value="true"/>
<list key="annotations">
<parameter key="0" value="Name"/>
</list>
<parameter key="date_format" value=""/>
<parameter key="time_zone" value="SYSTEM"/>
<parameter key="locale" value="English (United States)"/>
<list key="data_set_meta_data_information">
<parameter key="0" value="A.false.attribute_value.attribute"/>
<parameter key="1" value="Wortstamm.true.text.attribute"/>
<parameter key="2" value="Worart.false.polynominal.attribute"/>
<parameter key="3" value="Weighting.true.numeric.attribute"/>
</list>
<parameter key="read_not_matching_values_as_missings" value="true"/>
<parameter key="datamanagement" value="double_array"/>
<parameter key="data_management" value="auto"/>
</operator>
<operator activated="true" class="text:process_document_from_data" compatibility="7.5.000" expanded="true" height="82" name="Process Documents from Data" width="90" x="179" y="34">
<parameter key="create_word_vector" value="false"/>
<parameter key="vector_creation" value="Term Occurrences"/>
<parameter key="add_meta_information" value="true"/>
<parameter key="keep_text" value="true"/>
<parameter key="prune_method" value="none"/>
<parameter key="prune_below_percent" value="3.0"/>
<parameter key="prune_above_percent" value="30.0"/>
<parameter key="prune_below_rank" value="0.05"/>
<parameter key="prune_above_rank" value="0.95"/>
<parameter key="datamanagement" value="double_sparse_array"/>
<parameter key="data_management" value="auto"/>
<parameter key="select_attributes_and_weights" value="false"/>
<list key="specify_weights"/>
<process expanded="true">
<operator activated="true" class="text:transform_cases" compatibility="7.5.000" expanded="true" height="68" name="Transform Cases" width="90" x="112" y="34">
<parameter key="transform_to" value="lower case"/>
</operator>
<connect from_port="document" to_op="Transform Cases" to_port="document"/>
<connect from_op="Transform Cases" from_port="document" to_port="document 1"/>
<portSpacing port="source_document" spacing="0"/>
<portSpacing port="sink_document 1" spacing="0"/>
<portSpacing port="sink_document 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="text:process_document_from_data" compatibility="7.5.000" expanded="true" height="82" name="Process Documents from Data (2)" width="90" x="179" y="187">
<parameter key="create_word_vector" value="false"/>
<parameter key="vector_creation" value="Term Occurrences"/>
<parameter key="add_meta_information" value="true"/>
<parameter key="keep_text" value="true"/>
<parameter key="prune_method" value="none"/>
<parameter key="prune_below_percent" value="3.0"/>
<parameter key="prune_above_percent" value="30.0"/>
<parameter key="prune_below_rank" value="0.05"/>
<parameter key="prune_above_rank" value="0.95"/>
<parameter key="datamanagement" value="double_sparse_array"/>
<parameter key="data_management" value="auto"/>
<parameter key="select_attributes_and_weights" value="false"/>
<list key="specify_weights"/>
<process expanded="true">
<operator activated="true" class="text:transform_cases" compatibility="7.5.000" expanded="true" height="68" name="Transform Cases (3)" width="90" x="112" y="34">
<parameter key="transform_to" value="lower case"/>
</operator>
<connect from_port="document" to_op="Transform Cases (3)" to_port="document"/>
<connect from_op="Transform Cases (3)" from_port="document" to_port="document 1"/>
<portSpacing port="source_document" spacing="0"/>
<portSpacing port="sink_document 1" spacing="0"/>
<portSpacing port="sink_document 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="set_role" compatibility="7.6.001" expanded="true" height="82" name="Set Role" width="90" x="313" y="34">
<parameter key="attribute_name" value="text"/>
<parameter key="target_role" value="regular"/>
<list key="set_additional_roles"/>
</operator>
<operator activated="true" class="text_to_nominal" compatibility="7.6.001" expanded="true" height="82" name="Text to Nominal" width="90" x="447" y="34">
<parameter key="attribute_filter_type" value="all"/>
<parameter key="attribute" value=""/>
<parameter key="attributes" value=""/>
<parameter key="use_except_expression" value="false"/>
<parameter key="value_type" value="text"/>
<parameter key="use_value_type_exception" value="false"/>
<parameter key="except_value_type" value="text"/>
<parameter key="block_type" value="value_matrix"/>
<parameter key="use_block_type_exception" value="false"/>
<parameter key="except_block_type" value="value_matrix_row_start"/>
<parameter key="invert_selection" value="false"/>
<parameter key="include_special_attributes" value="false"/>
</operator>
<operator activated="true" class="set_role" compatibility="7.6.001" expanded="true" height="82" name="Set Role (2)" width="90" x="313" y="187">
<parameter key="attribute_name" value="text"/>
<parameter key="target_role" value="regular"/>
<list key="set_additional_roles"/>
</operator>
<operator activated="true" class="text_to_nominal" compatibility="7.6.001" expanded="true" height="82" name="Text to Nominal (2)" width="90" x="447" y="187">
<parameter key="attribute_filter_type" value="all"/>
<parameter key="attribute" value=""/>
<parameter key="attributes" value=""/>
<parameter key="use_except_expression" value="false"/>
<parameter key="value_type" value="text"/>
<parameter key="use_value_type_exception" value="false"/>
<parameter key="except_value_type" value="text"/>
<parameter key="block_type" value="value_matrix"/>
<parameter key="use_block_type_exception" value="false"/>
<parameter key="except_block_type" value="value_matrix_row_start"/>
<parameter key="invert_selection" value="false"/>
<parameter key="include_special_attributes" value="false"/>
</operator>
<operator activated="true" class="append" compatibility="7.6.001" expanded="true" height="103" name="Append" width="90" x="581" y="85">
<parameter key="datamanagement" value="double_array"/>
<parameter key="data_management" value="auto"/>
<parameter key="merge_type" value="all"/>
</operator>
<connect from_op="Read Excel" from_port="output" to_op="Process Documents from Data" to_port="example set"/>
<connect from_op="Read Excel (2)" from_port="output" to_op="Process Documents from Data (2)" to_port="example set"/>
<connect from_op="Process Documents from Data" from_port="example set" to_op="Set Role" to_port="example set input"/>
<connect from_op="Process Documents from Data (2)" from_port="example set" to_op="Set Role (2)" to_port="example set input"/>
<connect from_op="Set Role" from_port="example set output" to_op="Text to Nominal" to_port="example set input"/>
<connect from_op="Text to Nominal" from_port="example set output" to_op="Append" to_port="example set 1"/>
<connect from_op="Set Role (2)" from_port="example set output" to_op="Text to Nominal (2)" to_port="example set input"/>
<connect from_op="Text to Nominal (2)" from_port="example set output" to_op="Append" to_port="example set 2"/>
<connect from_op="Append" from_port="merged set" to_port="out 1"/>
<portSpacing port="source_in 1" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
</process>
</operator>
</process>
<?xml version="1.0" encoding="UTF-8"?><process version="7.6.001">
<operator activated="true" class="operator_toolbox:dictionary_sentiment_learner" compatibility="0.6.000" expanded="true" height="82" name="Dictionary Based Sentiment" width="90" x="313" y="340">
<parameter key="Value Attribute" value="Weighting"/>
<parameter key="Key Attribute" value="text"/>
</operator>
</process>
<?xml version="1.0" encoding="UTF-8"?><process version="7.6.001">
<operator activated="true" class="subprocess" compatibility="7.6.001" expanded="true" height="82" name="Subprocess (2)" width="90" x="112" y="85">
<process expanded="true">
<operator activated="true" class="read_excel" compatibility="7.6.001" expanded="true" height="68" name="Read Excel (3)" width="90" x="112" y="34">
<parameter key="excel_file" value="C:\Users\Benedict\Desktop\Seminar\Datensatz\Thomas Daily Datensatz.xls"/>
<parameter key="sheet_number" value="1"/>
<parameter key="imported_cell_range" value="A1:C15750"/>
<parameter key="encoding" value="SYSTEM"/>
<parameter key="first_row_as_names" value="true"/>
<list key="annotations">
<parameter key="0" value="Name"/>
</list>
<parameter key="date_format" value="yyyy-MM-dd"/>
<parameter key="time_zone" value="US/Pacific"/>
<parameter key="locale" value="English (United States)"/>
<list key="data_set_meta_data_information">
<parameter key="0" value="Datum.true.nominal.attribute"/>
<parameter key="1" value="Titel.true.text.attribute"/>
<parameter key="2" value="Text.true.text.attribute"/>
</list>
<parameter key="read_not_matching_values_as_missings" value="true"/>
<parameter key="datamanagement" value="double_array"/>
<parameter key="data_management" value="auto"/>
</operator>
<operator activated="true" class="nominal_to_date" compatibility="7.6.001" expanded="true" height="82" name="Nominal to Date" width="90" x="246" y="34">
<parameter key="attribute_name" value="Datum"/>
<parameter key="date_type" value="date"/>
<parameter key="date_format" value="yyyy-MM-dd"/>
<parameter key="time_zone" value="US/Pacific"/>
<parameter key="locale" value="English (United States)"/>
<parameter key="keep_old_attribute" value="true"/>
</operator>
<operator activated="true" class="text:data_to_documents" compatibility="7.5.000" expanded="true" height="68" name="Data to Documents" width="90" x="447" y="34">
<parameter key="select_attributes_and_weights" value="false"/>
<list key="specify_weights"/>
</operator>
<connect from_op="Read Excel (3)" from_port="output" to_op="Nominal to Date" to_port="example set input"/>
<connect from_op="Nominal to Date" from_port="example set output" to_op="Data to Documents" to_port="example set"/>
<connect from_op="Data to Documents" from_port="documents" to_port="out 1"/>
<portSpacing port="source_in 1" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
</process>
</operator>
</process>
<?xml version="1.0" encoding="UTF-8"?><process version="7.6.001">
<operator activated="true" class="collect" compatibility="7.6.001" expanded="true" height="82" name="Collect" width="90" x="246" y="85">
<parameter key="unfold" value="false"/>
</operator>
</process>
<?xml version="1.0" encoding="UTF-8"?><process version="7.6.001">
<operator activated="true" class="loop_collection" compatibility="7.6.001" expanded="true" height="82" name="Loop Collection" width="90" x="380" y="85">
<parameter key="set_iteration_macro" value="false"/>
<parameter key="macro_name" value="iteration"/>
<parameter key="macro_start_value" value="1"/>
<parameter key="unfold" value="true"/>
<process expanded="true">
<operator activated="true" class="text:tokenize" compatibility="7.5.000" expanded="true" height="68" name="Tokenize" width="90" x="246" y="34">
<parameter key="mode" value="non letters"/>
<parameter key="characters" value=".:"/>
<parameter key="language" value="English"/>
<parameter key="max_token_length" value="3"/>
</operator>
<operator activated="true" class="text:filter_stopwords_german" compatibility="7.5.000" expanded="true" height="68" name="Filter Stopwords (German)" width="90" x="447" y="34">
<parameter key="stop_word_list" value="Standard"/>
</operator>
<operator activated="true" class="text:filter_by_length" compatibility="7.5.000" expanded="true" height="68" name="Filter Tokens (by Length)" width="90" x="112" y="187">
<parameter key="min_chars" value="4"/>
<parameter key="max_chars" value="40"/>
</operator>
<operator activated="true" class="text:transform_cases" compatibility="7.5.000" expanded="true" height="68" name="Transform Cases (2)" width="90" x="246" y="187">
<parameter key="transform_to" value="lower case"/>
</operator>
<operator activated="true" class="text:extract_length" compatibility="7.5.000" expanded="true" height="68" name="Extract Length" width="90" x="380" y="187">
<parameter key="metadata_key" value="document_length"/>
</operator>
<operator activated="true" class="text:extract_token_number" compatibility="7.5.000" expanded="true" height="68" name="Extract Token Number" width="90" x="514" y="187">
<parameter key="metadata_key" value="token_number"/>
<parameter key="condition" value="all"/>
<parameter key="case_sensitive" value="false"/>
<parameter key="invert_condition" value="false"/>
</operator>
<connect from_port="single" to_op="Tokenize" to_port="document"/>
<connect from_op="Tokenize" from_port="document" to_op="Filter Stopwords (German)" to_port="document"/>
<connect from_op="Filter Stopwords (German)" from_port="document" to_op="Filter Tokens (by Length)" to_port="document"/>
<connect from_op="Filter Tokens (by Length)" from_port="document" to_op="Transform Cases (2)" to_port="document"/>
<connect from_op="Transform Cases (2)" from_port="document" to_op="Extract Length" to_port="document"/>
<connect from_op="Extract Length" from_port="document" to_op="Extract Token Number" to_port="document"/>
<connect from_op="Extract Token Number" from_port="document" to_port="output 1"/>
<portSpacing port="source_single" spacing="0"/>
<portSpacing port="sink_output 1" spacing="0"/>
<portSpacing port="sink_output 2" spacing="0"/>
</process>
</operator>
</process>
<?xml version="1.0" encoding="UTF-8"?><process version="7.6.001">
<operator activated="true" class="operator_toolbox:apply_dictionary_learner" compatibility="0.6.000" expanded="true" height="103" name="Apply Dictionary Based Sentiment" width="90" x="514" y="187"/>
</process>
<?xml version="1.0" encoding="UTF-8"?><process version="7.6.001">
<operator activated="true" class="text:process_documents" compatibility="7.5.000" expanded="true" height="103" name="Process Documents" width="90" x="648" y="238">
<parameter key="create_word_vector" value="false"/>
<parameter key="vector_creation" value="TF-IDF"/>
<parameter key="add_meta_information" value="true"/>
<parameter key="keep_text" value="true"/>
<parameter key="prune_method" value="none"/>
<parameter key="prune_below_percent" value="3.0"/>
<parameter key="prune_above_percent" value="30.0"/>
<parameter key="prune_below_rank" value="0.05"/>
<parameter key="prune_above_rank" value="0.95"/>
<parameter key="datamanagement" value="double_sparse_array"/>
<parameter key="data_management" value="auto"/>
<process expanded="true">
<operator activated="true" class="text:extract_token_number" compatibility="7.5.000" expanded="true" height="68" name="Extract Token Number (2)" width="90" x="112" y="34">
<parameter key="metadata_key" value="token_number"/>
<parameter key="condition" value="all"/>
<parameter key="case_sensitive" value="false"/>
<parameter key="invert_condition" value="false"/>
</operator>
<operator activated="true" class="text:aggregate_token_length" compatibility="7.5.000" expanded="true" height="68" name="Aggregate Token Length" width="90" x="313" y="34">
<parameter key="metadata_key" value="token_length"/>
<parameter key="aggregation" value="average"/>
</operator>
<connect from_port="document" to_op="Extract Token Number (2)" to_port="document"/>
<connect from_op="Extract Token Number (2)" from_port="document" to_op="Aggregate Token Length" to_port="document"/>
<connect from_op="Aggregate Token Length" from_port="document" to_port="document 1"/>
<portSpacing port="source_document" spacing="0"/>
<portSpacing port="sink_document 1" spacing="0"/>
<portSpacing port="sink_document 2" spacing="0"/>
</process>
</operator>
</process>
<?xml version="1.0" encoding="UTF-8"?><process version="7.6.001">
<operator activated="true" class="write_csv" compatibility="7.6.001" expanded="true" height="82" name="Write CSV" width="90" x="648" y="34">
<parameter key="csv_file" value="C:\Users\Benedict\Desktop\Seminar\Auswertung Rapidminer.csv"/>
<parameter key="column_separator" value=";"/>
<parameter key="write_attribute_names" value="true"/>
<parameter key="quote_nominal_values" value="true"/>
<parameter key="format_date_attributes" value="true"/>
<parameter key="append_to_file" value="false"/>
<parameter key="encoding" value="SYSTEM"/>
</operator>
</process>
-
Your XML process is corrupted. Please open the XML view and copy the XML from there.
-
How do I get to the XML view?
-
<?xml version="1.0" encoding="UTF-8"?><process version="7.6.001">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="7.6.001" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="subprocess" compatibility="7.6.001" expanded="true" height="103" name="Subprocess" width="90" x="112" y="340">
<process expanded="true">
<operator activated="true" class="read_excel" compatibility="7.6.001" expanded="true" height="68" name="Read Excel" width="90" x="45" y="34">
<parameter key="excel_file" value="C:\Users\Benedict\Desktop\Seminar\Dictionaries\Dictionary 2_SentiWS_mitFlexionen_final.xlsx"/>
<parameter key="imported_cell_range" value="A1:D15633"/>
<list key="annotations">
<parameter key="0" value="Name"/>
</list>
<list key="data_set_meta_data_information">
<parameter key="0" value="A.false.attribute_value.attribute"/>
<parameter key="1" value="Wortstamm.true.text.attribute"/>
<parameter key="2" value="Worart.false.polynominal.attribute"/>
<parameter key="3" value="Weighting.true.numeric.attribute"/>
</list>
</operator>
<operator activated="true" class="read_excel" compatibility="7.6.001" expanded="true" height="68" name="Read Excel (2)" width="90" x="45" y="136">
<parameter key="excel_file" value="C:\Users\Benedict\Desktop\Seminar\Dictionaries\Dictionary 2_SentiWS_mitFlexionen_final.xlsx"/>
<parameter key="sheet_number" value="2"/>
<parameter key="imported_cell_range" value="A1:D15650"/>
<list key="annotations">
<parameter key="0" value="Name"/>
</list>
<list key="data_set_meta_data_information">
<parameter key="0" value="A.false.attribute_value.attribute"/>
<parameter key="1" value="Wortstamm.true.text.attribute"/>
<parameter key="2" value="Worart.false.polynominal.attribute"/>
<parameter key="3" value="Weighting.true.numeric.attribute"/>
</list>
</operator>
<operator activated="true" class="text:process_document_from_data" compatibility="7.5.000" expanded="true" height="82" name="Process Documents from Data" width="90" x="179" y="34">
<parameter key="create_word_vector" value="false"/>
<parameter key="vector_creation" value="Term Occurrences"/>
<parameter key="keep_text" value="true"/>
<list key="specify_weights"/>
<process expanded="true">
<operator activated="true" class="text:transform_cases" compatibility="7.5.000" expanded="true" height="68" name="Transform Cases" width="90" x="112" y="34"/>
<connect from_port="document" to_op="Transform Cases" to_port="document"/>
<connect from_op="Transform Cases" from_port="document" to_port="document 1"/>
<portSpacing port="source_document" spacing="0"/>
<portSpacing port="sink_document 1" spacing="0"/>
<portSpacing port="sink_document 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="text:process_document_from_data" compatibility="7.5.000" expanded="true" height="82" name="Process Documents from Data (2)" width="90" x="179" y="187">
<parameter key="create_word_vector" value="false"/>
<parameter key="vector_creation" value="Term Occurrences"/>
<parameter key="keep_text" value="true"/>
<list key="specify_weights"/>
<process expanded="true">
<operator activated="true" class="text:transform_cases" compatibility="7.5.000" expanded="true" height="68" name="Transform Cases (3)" width="90" x="112" y="34"/>
<connect from_port="document" to_op="Transform Cases (3)" to_port="document"/>
<connect from_op="Transform Cases (3)" from_port="document" to_port="document 1"/>
<portSpacing port="source_document" spacing="0"/>
<portSpacing port="sink_document 1" spacing="0"/>
<portSpacing port="sink_document 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="set_role" compatibility="7.6.001" expanded="true" height="82" name="Set Role" width="90" x="313" y="34">
<parameter key="attribute_name" value="text"/>
<list key="set_additional_roles"/>
</operator>
<operator activated="true" class="text_to_nominal" compatibility="7.6.001" expanded="true" height="82" name="Text to Nominal" width="90" x="447" y="34"/>
<operator activated="true" class="set_role" compatibility="7.6.001" expanded="true" height="82" name="Set Role (2)" width="90" x="313" y="187">
<parameter key="attribute_name" value="text"/>
<list key="set_additional_roles"/>
</operator>
<operator activated="true" class="text_to_nominal" compatibility="7.6.001" expanded="true" height="82" name="Text to Nominal (2)" width="90" x="447" y="187"/>
<operator activated="true" class="append" compatibility="7.6.001" expanded="true" height="103" name="Append" width="90" x="581" y="85"/>
<connect from_op="Read Excel" from_port="output" to_op="Process Documents from Data" to_port="example set"/>
<connect from_op="Read Excel (2)" from_port="output" to_op="Process Documents from Data (2)" to_port="example set"/>
<connect from_op="Process Documents from Data" from_port="example set" to_op="Set Role" to_port="example set input"/>
<connect from_op="Process Documents from Data" from_port="word list" to_op="Process Documents from Data (2)" to_port="word list"/>
<connect from_op="Process Documents from Data (2)" from_port="example set" to_op="Set Role (2)" to_port="example set input"/>
<connect from_op="Process Documents from Data (2)" from_port="word list" to_port="out 2"/>
<connect from_op="Set Role" from_port="example set output" to_op="Text to Nominal" to_port="example set input"/>
<connect from_op="Text to Nominal" from_port="example set output" to_op="Append" to_port="example set 1"/>
<connect from_op="Set Role (2)" from_port="example set output" to_op="Text to Nominal (2)" to_port="example set input"/>
<connect from_op="Text to Nominal (2)" from_port="example set output" to_op="Append" to_port="example set 2"/>
<connect from_op="Append" from_port="merged set" to_port="out 1"/>
<portSpacing port="source_in 1" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
<portSpacing port="sink_out 3" spacing="0"/>
</process>
</operator>
<operator activated="true" class="operator_toolbox:dictionary_sentiment_learner" compatibility="0.6.000" expanded="true" height="82" name="Dictionary Based Sentiment" width="90" x="313" y="340">
<parameter key="Value Attribute" value="Weighting"/>
<parameter key="Key Attribute" value="text"/>
</operator>
<operator activated="true" class="subprocess" compatibility="7.6.001" expanded="true" height="82" name="Subprocess (2)" width="90" x="112" y="85">
<process expanded="true">
<operator activated="true" class="read_excel" compatibility="7.6.001" expanded="true" height="68" name="Read Excel (3)" width="90" x="112" y="34">
<parameter key="excel_file" value="C:\Users\Benedict\Desktop\Seminar\Datensatz\Thomas Daily Datensatz.xls"/>
<parameter key="imported_cell_range" value="A1:C15750"/>
<list key="annotations">
<parameter key="0" value="Name"/>
</list>
<parameter key="date_format" value="yyyy-MM-dd"/>
<parameter key="time_zone" value="US/Pacific"/>
<list key="data_set_meta_data_information">
<parameter key="0" value="Datum.true.nominal.attribute"/>
<parameter key="1" value="Titel.true.text.attribute"/>
<parameter key="2" value="Text.true.text.attribute"/>
</list>
</operator>
<operator activated="true" class="nominal_to_date" compatibility="7.6.001" expanded="true" height="82" name="Nominal to Date" width="90" x="246" y="34">
<parameter key="attribute_name" value="Datum"/>
<parameter key="date_format" value="yyyy-MM-dd"/>
<parameter key="time_zone" value="US/Pacific"/>
<parameter key="keep_old_attribute" value="true"/>
</operator>
<operator activated="true" class="text:data_to_documents" compatibility="7.5.000" expanded="true" height="68" name="Data to Documents" width="90" x="447" y="34">
<list key="specify_weights"/>
</operator>
<connect from_op="Read Excel (3)" from_port="output" to_op="Nominal to Date" to_port="example set input"/>
<connect from_op="Nominal to Date" from_port="example set output" to_op="Data to Documents" to_port="example set"/>
<connect from_op="Data to Documents" from_port="documents" to_port="out 1"/>
<portSpacing port="source_in 1" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="collect" compatibility="7.6.001" expanded="true" height="82" name="Collect" width="90" x="246" y="85"/>
<operator activated="true" class="loop_collection" compatibility="7.6.001" expanded="true" height="82" name="Loop Collection" width="90" x="380" y="85">
<parameter key="unfold" value="true"/>
<process expanded="true">
<operator activated="true" class="text:tokenize" compatibility="7.5.000" expanded="true" height="68" name="Tokenize" width="90" x="246" y="34"/>
<operator activated="true" class="text:filter_stopwords_german" compatibility="7.5.000" expanded="true" height="68" name="Filter Stopwords (German)" width="90" x="447" y="34"/>
<operator activated="true" class="text:filter_by_length" compatibility="7.5.000" expanded="true" height="68" name="Filter Tokens (by Length)" width="90" x="112" y="187">
<parameter key="max_chars" value="40"/>
</operator>
<operator activated="true" class="text:transform_cases" compatibility="7.5.000" expanded="true" height="68" name="Transform Cases (2)" width="90" x="246" y="187"/>
<operator activated="true" class="text:extract_length" compatibility="7.5.000" expanded="true" height="68" name="Extract Length" width="90" x="380" y="187"/>
<operator activated="true" class="text:extract_token_number" compatibility="7.5.000" expanded="true" height="68" name="Extract Token Number" width="90" x="514" y="187"/>
<connect from_port="single" to_op="Tokenize" to_port="document"/>
<connect from_op="Tokenize" from_port="document" to_op="Filter Stopwords (German)" to_port="document"/>
<connect from_op="Filter Stopwords (German)" from_port="document" to_op="Filter Tokens (by Length)" to_port="document"/>
<connect from_op="Filter Tokens (by Length)" from_port="document" to_op="Transform Cases (2)" to_port="document"/>
<connect from_op="Transform Cases (2)" from_port="document" to_op="Extract Length" to_port="document"/>
<connect from_op="Extract Length" from_port="document" to_op="Extract Token Number" to_port="document"/>
<connect from_op="Extract Token Number" from_port="document" to_port="output 1"/>
<portSpacing port="source_single" spacing="0"/>
<portSpacing port="sink_output 1" spacing="0"/>
<portSpacing port="sink_output 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="operator_toolbox:apply_dictionary_learner" compatibility="0.6.000" expanded="true" height="103" name="Apply Dictionary Based Sentiment" width="90" x="514" y="187"/>
<operator activated="true" class="text:process_documents" compatibility="7.5.000" expanded="true" height="103" name="Process Documents" width="90" x="648" y="238">
<parameter key="create_word_vector" value="false"/>
<parameter key="keep_text" value="true"/>
<process expanded="true">
<operator activated="true" class="text:extract_token_number" compatibility="7.5.000" expanded="true" height="68" name="Extract Token Number (2)" width="90" x="112" y="34"/>
<operator activated="true" class="text:aggregate_token_length" compatibility="7.5.000" expanded="true" height="68" name="Aggregate Token Length" width="90" x="313" y="34">
<parameter key="aggregation" value="average"/>
</operator>
<connect from_port="document" to_op="Extract Token Number (2)" to_port="document"/>
<connect from_op="Extract Token Number (2)" from_port="document" to_op="Aggregate Token Length" to_port="document"/>
<connect from_op="Aggregate Token Length" from_port="document" to_port="document 1"/>
<portSpacing port="source_document" spacing="0"/>
<portSpacing port="sink_document 1" spacing="0"/>
<portSpacing port="sink_document 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="write_csv" compatibility="7.6.001" expanded="true" height="82" name="Write CSV" width="90" x="648" y="34">
<parameter key="csv_file" value="C:\Users\Benedict\Desktop\Seminar\Auswertung Rapidminer.csv"/>
</operator>
<connect from_op="Subprocess" from_port="out 1" to_op="Dictionary Based Sentiment" to_port="exa"/>
<connect from_op="Dictionary Based Sentiment" from_port="mod" to_op="Apply Dictionary Based Sentiment" to_port="mod"/>
<connect from_op="Subprocess (2)" from_port="out 1" to_op="Collect" to_port="input 1"/>
<connect from_op="Collect" from_port="collection" to_op="Loop Collection" to_port="collection"/>
<connect from_op="Loop Collection" from_port="output 1" to_op="Apply Dictionary Based Sentiment" to_port="doc"/>
<connect from_op="Apply Dictionary Based Sentiment" from_port="res" to_op="Write CSV" to_port="input"/>
<connect from_op="Apply Dictionary Based Sentiment" from_port="doc" to_op="Process Documents" to_port="documents 1"/>
<connect from_op="Process Documents" from_port="example set" to_port="result 2"/>
<connect from_op="Write CSV" from_port="through" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
</process>
</operator>
</process>
-
Yeah, thanks.
My code is just above your comment.
-
Have you tried using a Set Role operator to give your date attribute the id role in the first subprocess? An attribute with a special role like id should flow through the process.
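For example, a Set Role right after the Nominal to Date operator could look roughly like this (a sketch only; the attribute name "Datum" comes from the Read Excel (3) metadata above, the operator name and position are made up):
<operator activated="true" class="set_role" compatibility="7.6.001" expanded="true" height="82" name="Set Role (Date)" width="90" x="380" y="34">
<parameter key="attribute_name" value="Datum"/>
<parameter key="target_role" value="id"/>
<list key="set_additional_roles"/>
</operator>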
-
Hey, thanks. I fixed the date problem in Excel, but now I have run into a new problem.
Do you know a way to get the number of covered/recognized tokens for each of my texts?
Say the text of day 1 contains 50 tokens and the dictionary model gives a score of, e.g., +2; I don't know whether that comes from a single word with a weighting of +2 or from 100 words with a weighting of 0.02 each.
That would be very important to know.
My code is still the same as above.
Thanks
-
Hi,
this is not yet implemented, but not too hard to do... I will check if I can do this tomorrow.
Edit: As a workaround, you can simply set all weights to -1 or +1 and run it a second time (with unit weights, the scores effectively count matched tokens instead of summing their weights). Afterwards you just rename and join the results.
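A minimal sketch of the unit-weight step, assuming a Generate Attributes operator placed between the Append and the Dictionary Based Sentiment operator for the second run (operator name, position and the exact expression are assumptions; the expression keeps the sign of the original Weighting):
<operator activated="true" class="generate_attributes" compatibility="7.6.001" expanded="true" height="82" name="Generate Attributes (unit weights)" width="90" x="447" y="340">
<list key="function_descriptions">
<parameter key="Weighting" value="if(Weighting&gt;0,1,-1)"/>
</list>
<parameter key="keep_all" value="true"/>
</operator>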
Best,
Martin
-
Hi,
I've sent you a version of the new operator via mail. I will add it to the Marketplace toolbox if this is what you need.
Cheers,
Martin
-
Hi Martin,
could you also send me the updated version?
Best regards
Simon
-
Hi Martin,
Thanks. I tried it, but it didn't solve my problem.
I need some additional data, or at least the ID, in the results after applying the Dictionary Based Sentiment operator, not only "text". The role of the ID variable is set to "id".
Thank you!
Simon
-
Hi,
good point. You might be able to join the original data back in by joining on the text attribute.
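A rough sketch of such a Join, assuming both example sets still carry a "text" attribute (operator position and join type are assumptions):
<operator activated="true" class="join" compatibility="7.6.001" expanded="true" height="82" name="Join (on text)" width="90" x="648" y="340">
<parameter key="remove_double_attributes" value="true"/>
<parameter key="join_type" value="inner"/>
<parameter key="use_id_attribute_as_key" value="false"/>
<list key="key_attributes">
<parameter key="text" value="text"/>
</list>
</operator>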
Best,
Martin
-
Unfortunately, this is not possible, as the text is not a unique attribute. Many tweets are retweets and look the same after some of the document-processing steps, but they differ in their metadata (e.g., geo coordinates).
Is there a workaround or anything?
Best,
Simon
-
Hi,
I see the point. You could add a nominal counter right before the operator to make the texts unique.
Best,
Martin
-
Hi,
I solved it by using a Select Attributes right before the text-processing loop, where I selected only the text variable. After applying the dictionary-based sentiment analysis I also added a Generate ID. I did the same for the original output of the Select Attributes operator and then joined the data using the ID.
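A rough sketch of that last part, i.e. a Generate ID on each branch followed by a Join on the generated id role (operator names, positions and the nominal-id setting are assumptions):
<operator activated="true" class="generate_id" compatibility="7.6.001" expanded="true" height="82" name="Generate ID (sentiment)" width="90" x="648" y="187">
<parameter key="create_nominal_ids" value="false"/>
<parameter key="offset" value="0"/>
</operator>
<operator activated="true" class="generate_id" compatibility="7.6.001" expanded="true" height="82" name="Generate ID (metadata)" width="90" x="648" y="340">
<parameter key="create_nominal_ids" value="false"/>
<parameter key="offset" value="0"/>
</operator>
<operator activated="true" class="join" compatibility="7.6.001" expanded="true" height="82" name="Join (on ID)" width="90" x="782" y="238">
<parameter key="remove_double_attributes" value="true"/>
<parameter key="join_type" value="inner"/>
<parameter key="use_id_attribute_as_key" value="true"/>
<list key="key_attributes"/>
</operator>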
Thanks for your help.
Best,
Simon
-
Hi @simon_kuehne,
Nice workaround! If you used the Merge operator from the toolbox, you would not need an ID.
How would you enhance the operator?
Best,
Martin
-
Hi Martin,
I think the best way to enhance this operator would be to add options for passing metadata through and for choosing which variable is used as the text for the sentiment analysis.
I had another issue with the operator: as I am analyzing tweets, there are lots of hashtags that contain relevant keywords, e.g. #ILikeThatKeyword. The problem is that I did not find a way to split such a hashtag into its four tokens, or to make the operator match against those parts instead of whole word tokens only.
Best,
Simon