Hey guys,
I'm currently working on an automatic Music Lyrics Analyzer (MLA). The MLA uses text analytics methods based on an established platform to analyze the vocabulary used in song lyrics of different artists / genres and to build clusters of songs based on their lyrics. In many songs, some sections of the lyrics are repeated, which is indicated by a string such as "x2".
In my opinion, I have to account for those repetitions to avoid skewing the model's results. Do you agree? If so, how should I handle this? Which operators should I choose?
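To make clear what I mean by accounting for repetitions, here is a rough Python sketch of the preprocessing I have in mind, outside of the platform. It rests on assumptions: the marker sits at the end of a line and applies to that line only (in my data it may well apply to a whole section), and the name expand_repeats is just something I made up for illustration.

```python
import re

# Assumption: a marker such as "x2" or "(x2)" sits at the end of a line,
# is separated from the lyrics by whitespace, and applies to that line only.
REPEAT_MARKER = re.compile(r"(?:^|\s+)\(?[xX](\d+)\)?\s*$")

def expand_repeats(lyrics: str) -> str:
    """Replace each trailing 'xN' marker by N copies of the line it follows."""
    expanded = []
    for line in lyrics.splitlines():
        match = REPEAT_MARKER.search(line)
        if match:
            text = line[:match.start()].rstrip()
            count = int(match.group(1))
            # A line holding only the marker has no text to repeat,
            # so this sketch simply drops it.
            if text:
                expanded.extend([text] * count)
        else:
            expanded.append(line)
    return "\n".join(expanded)

print(expand_repeats("La la la x2\nAnd then we go"))
```

After this step the term counts would reflect how often the words are actually sung, and the marker itself would no longer show up as a token.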
Many thanks for your help! Have a good day!