"Brand new user - text mining basics"
Hello. If anyone is willing to help point a newb in the right direction, I'd appreciate it very much. I am working on a personal project to get a feel for the software and concepts, which will lead into a school project. I have an Excel file with the lyrics and some other basic information for several hundred songs. I want to look for interesting relationships between word usage and things like the artist's gender, the year the song was written, and/or whether the song was a hit.
For my first go I thought I'd try focusing on just the decade (70s, 80s, 90s) and the lyrics. Maybe certain words didn't appear until a certain timeframe, or there are some interesting cultural references. I can import the data, get the word frequency lists, and understand on a basic level how to use the association operators. However, I'm not sure what I need to do so that RapidMiner groups the text by years/decades. Will I be able to see easily which words appear together, or appear at all, in different years/decades? What operators should I use and what should my data look like? Is an Excel file with a separate row for each song sufficient?
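To make concrete what I mean by "grouping by decade", here is a minimal sketch in plain Python (not RapidMiner) of the kind of result I am after. The field names `year` and `lyrics`, the helper functions, and the sample rows are all made up for illustration; my real data is one Excel row per song.

```python
from collections import Counter, defaultdict
import re

# Hypothetical rows standing in for the Excel sheet: one row per song.
songs = [
    {"year": 1975, "lyrics": "dancing queen dancing all night"},
    {"year": 1984, "lyrics": "video radio night"},
    {"year": 1992, "lyrics": "pager radio love"},
]

def decade_label(year):
    """Map a four-digit year such as 1984 to a label such as '1980s'."""
    return f"{year // 10 * 10}s"

def word_counts_by_decade(rows):
    """Return {decade label: Counter of lowercase words} for the given rows."""
    counts = defaultdict(Counter)
    for row in rows:
        # Very rough tokenization: runs of letters and apostrophes.
        words = re.findall(r"[a-z']+", row["lyrics"].lower())
        counts[decade_label(row["year"])].update(words)
    return dict(counts)

def first_decade_of_each_word(counts):
    """Return {word: earliest decade label in which the word occurs}."""
    first = {}
    for decade in sorted(counts):  # labels sort chronologically for 4-digit years
        for word in counts[decade]:
            first.setdefault(word, decade)
    return first

counts = word_counts_by_decade(songs)
first_seen = first_decade_of_each_word(counts)
print(counts["1970s"].most_common(1))
print(first_seen["radio"])
```

So the output I would like is a word frequency list per decade, plus some way to see which decade a word first shows up in. I assume RapidMiner has operators that do the equivalent, I just don't know which ones.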
Do you think this is even a good starter project, or will nothing interesting/useful come out of it?
Thanks in advance for any advice or pointers in the right direction.