Does the Extract Sentiment operator work with French words?
EL75
New Altair Community Member
Hi,
Could someone tell me whether VADER or WordNet can handle French when you select one of them in the "Extract Sentiment" operator?
- WordNet exists for French (WOLF): http://pauillac.inria.fr/~sagot/index.html#wolf
- VADER has also been ported: https://github.com/thomas7lieues/vader_FR
But what about the standard RapidMiner operator? I have found no way to set its language, not even in the help window.
If the standard RapidMiner operator doesn't work for French, is there a way to connect RapidMiner to the French projects mentioned above?
Thanks.
Answers
-
Hi,
this operator is actually just a wrapper around models created with the Dictionary-Based Sentiment operator. You can easily use the Dictionary-Based Sentiment operator directly to do this.
Best,
Martin
-
Hello mschmitz,
Thanks for your answer. How can I set up the Dictionary-Based Sentiment operator so that it uses the French versions of VADER or WordNet mentioned above?
Best regards
-
If you mean this one, yes. Tell me if I'm wrong.
If not, how does this process allow me to access one of these resources?
- WordNet exists for French (WOLF): http://pauillac.inria.fr/~sagot/index.html#wolf
- VADER has also been ported: https://github.com/thomas7lieues/vader_FR
Best regards
-
Hi,
a full training process looks like this:
<?xml version="1.0" encoding="UTF-8"?><process version="9.8.000">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="9.8.000" expanded="true" name="Process">
<parameter key="logverbosity" value="init"/>
<parameter key="random_seed" value="2001"/>
<parameter key="send_mail" value="never"/>
<parameter key="notification_email" value=""/>
<parameter key="process_duration_for_mail" value="30"/>
<parameter key="encoding" value="SYSTEM"/>
<process expanded="true">
<operator activated="true" class="open_file" compatibility="9.8.000" expanded="true" height="68" name="Open File" width="90" x="45" y="85">
<parameter key="resource_type" value="URL"/>
<parameter key="url" value="https://raw.githubusercontent.com/thomas7lieues/vader_FR/master/vaderSentiment_fr/fr_lexicon.txt"/>
<description align="center" color="transparent" colored="false" width="126">https://github.com/cjhutto/vaderSentiment</description>
</operator>
<operator activated="true" class="read_csv" compatibility="9.8.000" expanded="true" height="68" name="Read CSV" width="90" x="179" y="85">
<parameter key="column_separators" value="\t"/>
<parameter key="trim_lines" value="false"/>
<parameter key="use_quotes" value="true"/>
<parameter key="quotes_character" value="&quot;"/>
<parameter key="escape_character" value="\"/>
<parameter key="skip_comments" value="false"/>
<parameter key="comment_characters" value="#"/>
<parameter key="starting_row" value="1"/>
<parameter key="parse_numbers" value="true"/>
<parameter key="decimal_character" value="."/>
<parameter key="grouped_digits" value="false"/>
<parameter key="grouping_character" value=","/>
<parameter key="infinity_representation" value=""/>
<parameter key="date_format" value=""/>
<parameter key="first_row_as_names" value="false"/>
<list key="annotations"/>
<parameter key="time_zone" value="SYSTEM"/>
<parameter key="locale" value="English (United States)"/>
<parameter key="encoding" value="SYSTEM"/>
<parameter key="read_all_values_as_polynominal" value="false"/>
<list key="data_set_meta_data_information"/>
<parameter key="read_not_matching_values_as_missings" value="true"/>
<parameter key="datamanagement" value="double_array"/>
<parameter key="data_management" value="auto"/>
</operator>
<operator activated="true" class="rename" compatibility="9.8.000" expanded="true" height="82" name="Rename" width="90" x="313" y="85">
<parameter key="old_name" value="att1"/>
<parameter key="new_name" value="word"/>
<list key="rename_additional_attributes">
<parameter key="att2" value="score"/>
</list>
</operator>
<operator activated="true" class="operator_toolbox:dictionary_sentiment_learner" compatibility="2.8.000-SNAPSHOT" expanded="true" height="103" name="Dictionary-Based Sentiment (Documents)" width="90" x="514" y="85">
<parameter key="value_attribute" value="score"/>
<parameter key="key_attribute" value="word"/>
<parameter key="negation_attribute" value=""/>
<parameter key="negation_window_size" value="1"/>
<parameter key="negation_strength" value=""/>
<parameter key="use_symmetric_negation_window" value="false"/>
<parameter key="use_intensifier" value="false"/>
<parameter key="intensifier_word" value=""/>
<parameter key="intensifier_value" value=""/>
<parameter key="use_symmetric_intensifier_window" value="false"/>
</operator>
<operator activated="true" class="text:create_document" compatibility="9.3.001" expanded="true" height="68" name="Create Document" width="90" x="246" y="289">
<parameter key="text" value="Rapidminer est un excellent logiciel"/>
<parameter key="add label" value="false"/>
<parameter key="label_type" value="nominal"/>
</operator>
<operator activated="true" class="collect" compatibility="9.8.000" expanded="true" height="82" name="Collect" width="90" x="380" y="289">
<parameter key="unfold" value="false"/>
</operator>
<operator activated="true" class="loop_collection" compatibility="9.8.000" expanded="true" height="82" name="Loop Collection" width="90" x="514" y="289">
<parameter key="set_iteration_macro" value="false"/>
<parameter key="macro_name" value="iteration"/>
<parameter key="macro_start_value" value="1"/>
<parameter key="unfold" value="false"/>
<process expanded="true">
<operator activated="true" class="text:tokenize" compatibility="9.3.001" expanded="true" height="68" name="Tokenize" width="90" x="246" y="34">
<parameter key="mode" value="non letters"/>
<parameter key="characters" value=".:"/>
<parameter key="language" value="English"/>
<parameter key="max_token_length" value="3"/>
</operator>
<connect from_port="single" to_op="Tokenize" to_port="document"/>
<connect from_op="Tokenize" from_port="document" to_port="output 1"/>
<portSpacing port="source_single" spacing="0"/>
<portSpacing port="sink_output 1" spacing="0"/>
<portSpacing port="sink_output 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="operator_toolbox:apply_model_documents" compatibility="2.8.000-SNAPSHOT" expanded="true" height="103" name="Apply Model (Documents)" width="90" x="648" y="289">
<list key="application_parameters"/>
</operator>
<connect from_op="Open File" from_port="file" to_op="Read CSV" to_port="file"/>
<connect from_op="Read CSV" from_port="output" to_op="Rename" to_port="example set input"/>
<connect from_op="Rename" from_port="example set output" to_op="Dictionary-Based Sentiment (Documents)" to_port="exa"/>
<connect from_op="Dictionary-Based Sentiment (Documents)" from_port="mod" to_op="Apply Model (Documents)" to_port="mod"/>
<connect from_op="Create Document" from_port="output" to_op="Collect" to_port="input 1"/>
<connect from_op="Collect" from_port="collection" to_op="Loop Collection" to_port="collection"/>
<connect from_op="Loop Collection" from_port="output 1" to_op="Apply Model (Documents)" to_port="doc"/>
<connect from_op="Apply Model (Documents)" from_port="exa" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
This is even more powerful than Extract Sentiment, but obviously also harder to use. I will create a ticket to add French VADER to the Extract Sentiment operator. Do you have any other dictionary to add?
Best,
Martin
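For readers who want to see the idea behind the process outside RapidMiner, here is a minimal Python sketch of dictionary-based scoring. It assumes the lexicon is tab-separated with the word in the first column and its score in the second (the columns the process renames to "word" and "score"), and it assumes the document score is simply the sum of the scores of the matched tokens; the actual operator may compute more than that. The function names and the sample entries are hypothetical and only serve the illustration.

```python
import re

# Made-up sample in the assumed layout: word <TAB> score <TAB> extra columns.
SAMPLE_LEXICON = "excellent\t2.7\t0.6\nmauvais\t-2.5\t0.7\n"

def load_lexicon(tsv_text):
    """Map each word (column 1) to its score (column 2); ignore other columns."""
    lexicon = {}
    for line in tsv_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        try:
            lexicon[fields[0].strip().lower()] = float(fields[1])
        except ValueError:
            continue  # skip rows whose score is not numeric
    return lexicon

def tokenize(text):
    """Lowercase the text and keep runs of letters (split on non-letters)."""
    return re.findall(r"[^\W\d_]+", text.lower())

def score(text, lexicon):
    """Sum the scores of all tokens found in the lexicon (assumed aggregation)."""
    return sum(lexicon.get(token, 0.0) for token in tokenize(text))

lexicon = load_lexicon(SAMPLE_LEXICON)
print(score("Rapidminer est un excellent logiciel", lexicon))
```

Tokens that are not in the lexicon contribute nothing, so the quality of the result depends almost entirely on how well the dictionary covers the vocabulary of the reviews.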
-
Thanks for your answer!
The WOLF project is the French translation of WordNet, so it would probably be a good idea to add it too.
RapidMiner's popularity would increase within the French community.
- WordNet exists for French (WOLF): http://pauillac.inria.fr/~sagot/index.html#wolf
- VADER has also been ported: https://github.com/thomas7lieues/vader_FR
-
Martin,
I tried to copy and paste the XML code ("a full training process looks like this") into RapidMiner, but nothing happens.
Could you help?
-
Hi,
there is something odd with the escaping of / and so on. Please try this process and adapt the path in Read CSV so that it points to your downloaded copy of: https://raw.githubusercontent.com/thomas7lieues/vader_FR/master/vaderSentiment_fr/fr_lexicon.txt
Best,
Martin
<?xml version="1.0" encoding="UTF-8"?><process version="9.8.000">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="9.8.000" expanded="true" name="Process">
<parameter key="logverbosity" value="init"/>
<parameter key="random_seed" value="2001"/>
<parameter key="send_mail" value="never"/>
<parameter key="notification_email" value=""/>
<parameter key="process_duration_for_mail" value="30"/>
<parameter key="encoding" value="SYSTEM"/>
<process expanded="true">
<operator activated="true" class="read_csv" compatibility="9.8.000" expanded="true" height="68" name="Read CSV" width="90" x="246" y="85">
<parameter key="csv_file" value="C:/Users/MartinSchmitz/Downloads/fr_lexicon.txt"/>
<parameter key="column_separators" value="\t"/>
<parameter key="trim_lines" value="false"/>
<parameter key="use_quotes" value="true"/>
<parameter key="quotes_character" value="&quot;"/>
<parameter key="escape_character" value="\"/>
<parameter key="skip_comments" value="false"/>
<parameter key="comment_characters" value="#"/>
<parameter key="starting_row" value="1"/>
<parameter key="parse_numbers" value="true"/>
<parameter key="decimal_character" value="."/>
<parameter key="grouped_digits" value="false"/>
<parameter key="grouping_character" value=","/>
<parameter key="infinity_representation" value=""/>
<parameter key="date_format" value=""/>
<parameter key="first_row_as_names" value="false"/>
<list key="annotations"/>
<parameter key="time_zone" value="SYSTEM"/>
<parameter key="locale" value="English (United States)"/>
<parameter key="encoding" value="SYSTEM"/>
<parameter key="read_all_values_as_polynominal" value="false"/>
<list key="data_set_meta_data_information"/>
<parameter key="read_not_matching_values_as_missings" value="true"/>
<parameter key="datamanagement" value="double_array"/>
<parameter key="data_management" value="auto"/>
</operator>
<operator activated="true" class="rename" compatibility="9.8.000" expanded="true" height="82" name="Rename" width="90" x="447" y="85">
<parameter key="old_name" value="att1"/>
<parameter key="new_name" value="word"/>
<list key="rename_additional_attributes">
<parameter key="att2" value="score"/>
</list>
</operator>
<operator activated="true" class="operator_toolbox:dictionary_sentiment_learner" compatibility="2.8.000-SNAPSHOT" expanded="true" height="103" name="Dictionary-Based Sentiment (Documents)" width="90" x="581" y="85">
<parameter key="value_attribute" value="score"/>
<parameter key="key_attribute" value="word"/>
<parameter key="negation_attribute" value=""/>
<parameter key="negation_window_size" value="1"/>
<parameter key="negation_strength" value=""/>
<parameter key="use_symmetric_negation_window" value="false"/>
<parameter key="use_intensifier" value="false"/>
<parameter key="intensifier_word" value=""/>
<parameter key="intensifier_value" value=""/>
<parameter key="use_symmetric_intensifier_window" value="false"/>
</operator>
<operator activated="true" class="text:create_document" compatibility="9.3.001" expanded="true" height="68" name="Create Document" width="90" x="246" y="289">
<parameter key="text" value="Rapidminer est un excellent logiciel"/>
<parameter key="add label" value="false"/>
<parameter key="label_type" value="nominal"/>
</operator>
<operator activated="true" class="collect" compatibility="9.8.000" expanded="true" height="82" name="Collect" width="90" x="380" y="289">
<parameter key="unfold" value="false"/>
</operator>
<operator activated="true" class="loop_collection" compatibility="9.8.000" expanded="true" height="82" name="Loop Collection" width="90" x="514" y="289">
<parameter key="set_iteration_macro" value="false"/>
<parameter key="macro_name" value="iteration"/>
<parameter key="macro_start_value" value="1"/>
<parameter key="unfold" value="false"/>
<process expanded="true">
<operator activated="true" class="text:tokenize" compatibility="9.3.001" expanded="true" height="68" name="Tokenize" width="90" x="246" y="34">
<parameter key="mode" value="non letters"/>
<parameter key="characters" value=".:"/>
<parameter key="language" value="English"/>
<parameter key="max_token_length" value="3"/>
</operator>
<connect from_port="single" to_op="Tokenize" to_port="document"/>
<connect from_op="Tokenize" from_port="document" to_port="output 1"/>
<portSpacing port="source_single" spacing="0"/>
<portSpacing port="sink_output 1" spacing="0"/>
<portSpacing port="sink_output 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="operator_toolbox:apply_model_documents" compatibility="2.8.000-SNAPSHOT" expanded="true" height="103" name="Apply Model (Documents)" width="90" x="648" y="289">
<list key="application_parameters"/>
</operator>
<connect from_op="Read CSV" from_port="output" to_op="Rename" to_port="example set input"/>
<connect from_op="Rename" from_port="example set output" to_op="Dictionary-Based Sentiment (Documents)" to_port="exa"/>
<connect from_op="Dictionary-Based Sentiment (Documents)" from_port="mod" to_op="Apply Model (Documents)" to_port="mod"/>
<connect from_op="Create Document" from_port="output" to_op="Collect" to_port="input 1"/>
<connect from_op="Collect" from_port="collection" to_op="Loop Collection" to_port="collection"/>
<connect from_op="Loop Collection" from_port="output 1" to_op="Apply Model (Documents)" to_port="doc"/>
<connect from_op="Apply Model (Documents)" from_port="exa" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
-
Thanks a lot! It works fine.
May I ask you a few additional questions in order to fine-tune the process?
1- Working with an example set
As I have an example set containing reviews, I added a "Data to Documents" operator before the "Loop Collection" operator (I haven't seen an operator like "Apply Model (Documents)" dedicated to example sets). Then I put all my text processing operators inside the loop, and it looks fine. Is this the right way?
2- Using emojis
I've seen in the VADER repository that there are two other files that could be helpful (I have a lot of emoticons in my reviews):
Is there a way to integrate them into this process?
3- Understanding the columns in the dictionary
- att1 is the dictionary word
- att2 seems to be the polarity value
- att3: is it the weight?
- att4: how are those values used?
4- Using polarity_scores_max
https://github.com/thomas7lieues/vader_FR
This web page says that polarity_scores_max can be used: how is that possible here?
# Note : You can use polarity_scores_max instead of polarity_scores. polarity_scores_max uses fuzzywuzzy to get the most similar words with your inputs. For example "connar" won't be detected with polarity_scores but with polarity_scores_max
5- Building my own dictionary
If I want to add sentiment words and weights related to the specific domain I'm working on, what would be the best process?
Just adding new lines to the dictionary file?
I really enjoy using this dictionary on my data set.
All the best,
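Questions 4 and 5 can be illustrated with a small Python sketch outside RapidMiner. It is not the vader_FR implementation: the README quoted above says polarity_scores_max relies on fuzzywuzzy, whereas this sketch substitutes `difflib.get_close_matches` from the standard library to show the same idea of falling back to the most similar lexicon word. The function names, the cutoff value, and all scores are hypothetical.

```python
import difflib

def merge_lexicons(base, domain):
    """Return a new lexicon where domain-specific entries override the base ones."""
    merged = dict(base)
    merged.update(domain)
    return merged

def fuzzy_score(token, lexicon, cutoff=0.8):
    """Look a token up exactly first, then fall back to its closest lexicon word."""
    if token in lexicon:
        return lexicon[token]
    # get_close_matches returns the best candidates whose similarity >= cutoff
    matches = difflib.get_close_matches(token, list(lexicon), n=1, cutoff=cutoff)
    return lexicon[matches[0]] if matches else 0.0

# Made-up scores, for illustration only.
base = {"connard": -3.0, "excellent": 2.7}
domain = {"addictif": -1.5}  # hypothetical domain-specific entry
lexicon = merge_lexicons(base, domain)

print(fuzzy_score("connar", lexicon))    # misspelling resolved to "connard"
print(fuzzy_score("logiciel", lexicon))  # no close match, contributes nothing
```

In a dictionary file, the equivalent of `merge_lexicons` would be appending or editing lines; the fuzzy fallback has no direct counterpart in the process shown in this thread, so misspellings would have to be normalized before scoring.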
-
Hi Martin,
something strange: the process works fine on its own. But when I copy and paste it into a bigger process with other operators (I did this to compare results), I get an error message about tokenization, although the "Loop Collection" subprocess contains the tokenization step. I'm 100% sure that all connections are correct.
I even tried something absurd that seems to reveal a bug: in the process that works fine, I imported the other operators (the ones that trigger the error), then moved them to the trash (so that I was back to the process that worked fine), and then the process crashed.
Below: the process containing, at the bottom, the "Vader FR" part (deactivated)
The "Vader FR" process (works fine alone):
Thanks for your help.
Best
-
Hi @EL75,
I would love to help, but I am very busy and this is somewhat complex, so I cannot dive deep into it.
Is this something commercial or is it an academic project? If this is a commercial request, we can move this over and assign resources to it. Otherwise, maybe @lionelderkrikor or someone else can help?
Cheers,
Martin
-
Hi Martin,
Of course not, this is not commercial; it is for research (I am working on the health aspects and impacts of digital practices, using reviews from parents and children collected from app stores, Twitter, blogs, etc.).
As I'm working on a French dataset, this would be very useful.
May I also ask you:
1- Word2Vec
I've read your article "Synonym Detection with Word2Vec" and tried to implement the process, but I obtained strange results: does this operator work with every language (e.g. French, of course)?
2- Topic extraction
As I'm trying to extract topics from the data set, I've read and adapted your excellent article dealing with Amazon reviews, thinking that this process could fit part of my needs. It is really inspiring! I wonder if there are other ways to visualize the results, such as a dendrogram?
Best,
-
Hi @EL75,
maybe you could explain a bit more what you are trying to accomplish from a "business" perspective, so we can map this to a data science method?
~Martin
-
Hi @EL75,
I added French to the operator a minute ago. It will not be publicly available for a while (we usually wait until there are more new things to release). Please let me know if you need a preview build.
Best,
Martin
-
Hi Martin,
thanks for doing this. I would indeed appreciate receiving a preview build.
I wish you a happy new year!
Best,