Song text Sentiment Analysis
mv070
New Altair Community Member
Hello,
I would like to do a sentiment analysis of songs. I have 250 songs, but I'm not sure how to go about it. I use an XPath to extract the songs, and after some preprocessing each song is split into individual words. Each word is therefore a separate attribute, and not every song has every word attribute. How do I continue from here? I'm pretty stuck at this stage.
I'm also thinking about keeping the whole text of each song as one attribute instead of many.
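For readers following along, the "one attribute per word" layout described above is a word-count table. Here is a minimal Python sketch of it, assuming two made-up song texts (the names `song_a` and `song_b` and their lyrics are placeholders, not data from this thread). It shows that a word a song does not contain can simply be stored as 0 instead of a missing value.

```python
import re
from collections import Counter

# Hypothetical stand-ins for the 250 song texts.
songs = {
    "song_a": "good times, good friends",
    "song_b": "bad moon rising",
}

def tokenize(text):
    """Lowercase the text and split on anything that is not a letter."""
    return [t for t in re.split(r"[^a-z]+", text.lower()) if t]

# One Counter per song: word -> number of occurrences.
counts = {name: Counter(tokenize(text)) for name, text in songs.items()}

# The vocabulary is the union of all words; it defines the columns.
vocabulary = sorted(set().union(*counts.values()))

# One row per song; a word the song lacks gets 0 rather than a missing value.
table = {name: [c[word] for word in vocabulary] for name, c in counts.items()}

print(vocabulary)
for name, row in table.items():
    print(name, row)
```

Keeping the whole text as one attribute and building this table only at modeling time is the other common layout; both carry the same information.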
0
Best Answers
-
Hi @mv070,
the best you can do is import all songs as text and then use the text processing extension. In principle, having one attribute per word is OK; you can use stemming to reduce the number a bit and maybe filter out some of the words (all things you can do with the text processing extension).
I guess you don't have labels in the data. In that case you have the option of using the Dictionary Based Sentiment operator. You will need to generate or download a dictionary for that.
Edit: I developed a sample process:
<?xml version="1.0" encoding="UTF-8"?><process version="9.1.000">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="9.1.000" expanded="true" name="Process" origin="GENERATED_TUTORIAL">
    <parameter key="logverbosity" value="init"/>
    <parameter key="random_seed" value="2001"/>
    <parameter key="send_mail" value="never"/>
    <parameter key="notification_email" value=""/>
    <parameter key="process_duration_for_mail" value="30"/>
    <parameter key="encoding" value="SYSTEM"/>
    <process expanded="true">
      <operator activated="true" class="subprocess" compatibility="9.1.000" expanded="true" height="82" name="Subprocess" origin="GENERATED_TUTORIAL" width="90" x="112" y="340">
        <process expanded="true">
          <operator activated="true" class="generate_data_user_specification" compatibility="9.1.000" expanded="true" height="68" name="Generate Data by User Specification" origin="GENERATED_TUTORIAL" width="90" x="45" y="34">
            <list key="attribute_values">
              <parameter key="Key" value="&quot;good&quot;"/>
              <parameter key="Value" value="1"/>
            </list>
            <list key="set_additional_roles"/>
          </operator>
          <operator activated="true" class="generate_data_user_specification" compatibility="9.1.000" expanded="true" height="68" name="Generate Data by User Specification (2)" origin="GENERATED_TUTORIAL" width="90" x="45" y="136">
            <list key="attribute_values">
              <parameter key="Key" value="&quot;bad&quot;"/>
              <parameter key="Value" value="-1"/>
            </list>
            <list key="set_additional_roles"/>
          </operator>
          <operator activated="true" class="append" compatibility="9.1.000" expanded="true" height="103" name="Append" origin="GENERATED_TUTORIAL" width="90" x="179" y="85">
            <parameter key="datamanagement" value="double_array"/>
            <parameter key="data_management" value="auto"/>
            <parameter key="merge_type" value="all"/>
          </operator>
          <connect from_op="Generate Data by User Specification" from_port="output" to_op="Append" to_port="example set 1"/>
          <connect from_op="Generate Data by User Specification (2)" from_port="output" to_op="Append" to_port="example set 2"/>
          <connect from_op="Append" from_port="merged set" to_port="out 1"/>
          <portSpacing port="source_in 1" spacing="0"/>
          <portSpacing port="sink_out 1" spacing="0"/>
          <portSpacing port="sink_out 2" spacing="0"/>
        </process>
        <description align="center" color="transparent" colored="false" width="126">Generate dummy dictionary</description>
      </operator>
      <operator activated="true" class="operator_toolbox:dictionary_sentiment_learner" compatibility="1.7.000" expanded="true" height="82" name="Dictionary Based Sentiment" origin="GENERATED_TUTORIAL" width="90" x="380" y="340">
        <parameter key="value_attribute" value="Value"/>
        <parameter key="key_attribute" value="Key"/>
        <parameter key="negation_attribute" value=""/>
        <parameter key="negation_window_size" value="1"/>
        <parameter key="use_symmetric_negation_window" value="false"/>
      </operator>
      <operator activated="true" class="text:create_document" compatibility="8.1.000" expanded="true" height="68" name="Create Document" width="90" x="112" y="34">
        <parameter key="text" value="the good, the bad and the ugly is a good film"/>
        <parameter key="add label" value="false"/>
        <parameter key="label_type" value="nominal"/>
      </operator>
      <operator activated="true" class="text:tokenize" compatibility="8.1.000" expanded="true" height="68" name="Tokenize (2)" width="90" x="246" y="34">
        <parameter key="mode" value="non letters"/>
        <parameter key="characters" value=".:"/>
        <parameter key="language" value="English"/>
        <parameter key="max_token_length" value="3"/>
      </operator>
      <operator activated="true" class="text:create_document" compatibility="8.1.000" expanded="true" height="68" name="Create Document (2)" width="90" x="112" y="136">
        <parameter key="text" value="the good, the bad and the ugly is a bad bad film"/>
        <parameter key="add label" value="false"/>
        <parameter key="label_type" value="nominal"/>
      </operator>
      <operator activated="true" class="text:tokenize" compatibility="8.1.000" expanded="true" height="68" name="Tokenize (3)" width="90" x="246" y="136">
        <parameter key="mode" value="non letters"/>
        <parameter key="characters" value=".:"/>
        <parameter key="language" value="English"/>
        <parameter key="max_token_length" value="3"/>
      </operator>
      <operator activated="true" class="collect" compatibility="9.1.000" expanded="true" height="103" name="Collect" width="90" x="447" y="34">
        <parameter key="unfold" value="false"/>
      </operator>
      <operator activated="true" class="operator_toolbox:apply_model_documents" compatibility="1.7.000" expanded="true" height="103" name="Apply Model (Documents)" width="90" x="581" y="187">
        <list key="application_parameters"/>
      </operator>
      <connect from_op="Subprocess" from_port="out 1" to_op="Dictionary Based Sentiment" to_port="exa"/>
      <connect from_op="Dictionary Based Sentiment" from_port="mod" to_op="Apply Model (Documents)" to_port="mod"/>
      <connect from_op="Create Document" from_port="output" to_op="Tokenize (2)" to_port="document"/>
      <connect from_op="Tokenize (2)" from_port="document" to_op="Collect" to_port="input 1"/>
      <connect from_op="Create Document (2)" from_port="output" to_op="Tokenize (3)" to_port="document"/>
      <connect from_op="Tokenize (3)" from_port="document" to_op="Collect" to_port="input 2"/>
      <connect from_op="Collect" from_port="collection" to_op="Apply Model (Documents)" to_port="doc"/>
      <connect from_op="Apply Model (Documents)" from_port="exa" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
I'm also tagging @mschmitz because the tutorial process of the Apply Model (Documents) seems to be broken.
I hope it helps!
Regards,
Sebastian
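For intuition, the dictionary idea can be sketched outside RapidMiner in a few lines of Python. This is only an illustration under stated assumptions, not the operator's actual implementation: it assumes the score of a document is the plain sum of the dictionary values of its tokens, it lowercases the text (my choice), and it ignores the negation settings. The dictionary and the two example sentences are taken from the sample process.

```python
import re

# Same toy dictionary as in the sample process: word -> sentiment value.
dictionary = {"good": 1.0, "bad": -1.0}

def tokenize(text):
    """Split on non-letters, similar in spirit to the 'non letters' mode."""
    return [t for t in re.split(r"[^A-Za-z]+", text) if t]

def score(text, dictionary):
    """Sum the dictionary values of all tokens; unknown words count as 0."""
    tokens = tokenize(text.lower())  # lowercasing is an assumption of this sketch
    return sum(dictionary.get(token, 0.0) for token in tokens)

docs = [
    "the good, the bad and the ugly is a good film",
    "the good, the bad and the ugly is a bad bad film",
]

for doc in docs:
    print(score(doc, dictionary), doc)
```

With a real sentiment dictionary in place of the two-word toy one, the same loop would run over the 250 song texts.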
5 -
You can also build a model directly to predict sentiment but you would need to hand label some songs to train the model first, assign that as your label, and then build the model on that dataset (using appropriate validation strategies). That might be fairly easy if it is a song corpus that you already know and can quickly decide whether a song is positive vs negative (or whatever other sentiment dimension you are going to be using).
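To make the labeled route concrete, here is a rough Python sketch of it. Everything in it is an assumption for illustration: the six short "songs" and their labels are invented, the model is a small hand-written Naive Bayes classifier rather than any particular RapidMiner learner, and leave-one-out is just one possible validation strategy.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical hand-labeled songs: (lyrics, label).
labeled = [
    ("sunshine love happy dancing", "positive"),
    ("love and joy all night", "positive"),
    ("happy days dancing in the sun", "positive"),
    ("tears and pain lonely night", "negative"),
    ("broken heart lonely and cold", "negative"),
    ("pain tears goodbye", "negative"),
]

def tokenize(text):
    return [t for t in re.split(r"[^a-z]+", text.lower()) if t]

def train(examples):
    """Collect per-label word counts, label frequencies and the vocabulary."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, label_counts, vocab

def predict(model, text):
    """Return the label with the highest Naive Bayes log-score."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for token in tokenize(text):
            if token in vocab:  # words never seen in training are ignored
                # Add-one smoothing keeps unseen label/word pairs above zero.
                score += math.log((word_counts[label][token] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def leave_one_out_accuracy(examples):
    """Train on all songs but one, test on the held-out song, repeat."""
    correct = 0
    for i, (text, label) in enumerate(examples):
        model = train(examples[:i] + examples[i + 1:])
        correct += predict(model, text) == label
    return correct / len(examples)

accuracy = leave_one_out_accuracy(labeled)
print("leave-one-out accuracy:", accuracy)

model = train(labeled)
print(predict(model, "lonely tears tonight"))
```

With only a handful of labeled songs the accuracy estimate is very noisy, so the more songs you can label by hand, the more you can trust it.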
5
Answers
-
Thanks guys, both ways worked perfectly for me. My teacher wanted me to try more than one approach, so thanks a lot.
0