Text mining for open-ended questions?
Hi, how are you?
I was recently presented with the following problem: a company has around 20,000 answers to one open question, "What are the aspects of your work you like the most?", and would like to analyze them.
I have already analyzed around 300 of them manually and derived several flags, for example HELPING_CUSTOMERS, SHORT_HOURS, etc.
My idea was to simply train one model per flag on the labeled answers, predict the flags for the remaining answers (roughly 20,000), and obtain the percentage of employees who value each flag.
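To make the "one model per flag" idea concrete, here is a minimal sketch in plain Python. It assumes a small binary naive Bayes classifier per flag, which is only one possible choice of model; the class name `FlagModel`, the helper `tokenize`, and all example answers are made up for illustration and are not real survey data.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class FlagModel:
    """Hypothetical binary naive Bayes classifier for one flag (Laplace smoothing)."""

    def fit(self, texts, labels):
        # Token counts and document counts, kept separately for
        # answers that carry the flag (True) and answers that do not (False).
        self.counts = {True: Counter(), False: Counter()}
        self.n_docs = {True: 0, False: 0}
        for text, label in zip(texts, labels):
            self.counts[label].update(tokenize(text))
            self.n_docs[label] += 1
        self.vocab = set(self.counts[True]) | set(self.counts[False])
        return self

    def predict(self, text):
        total_docs = sum(self.n_docs.values())
        scores = {}
        for label in (True, False):
            # Smoothed log prior plus smoothed log likelihood of each known token.
            score = math.log((self.n_docs[label] + 1) / (total_docs + 2))
            total_tokens = sum(self.counts[label].values())
            for tok in tokenize(text):
                if tok in self.vocab:  # tokens never seen in training are skipped
                    score += math.log(
                        (self.counts[label][tok] + 1) / (total_tokens + len(self.vocab))
                    )
            scores[label] = score
        return scores[True] > scores[False]


# Made-up stand-ins for the manually labeled answers (text, set of flags).
labeled = [
    ("I love helping customers solve problems", {"HELPING_CUSTOMERS"}),
    ("Helping customers every day is rewarding", {"HELPING_CUSTOMERS"}),
    ("The short hours leave time for my family", {"SHORT_HOURS"}),
    ("Short hours and helping customers", {"HELPING_CUSTOMERS", "SHORT_HOURS"}),
    ("I like the short working hours", {"SHORT_HOURS"}),
    ("Nice office and good coffee", set()),
]
# Made-up stand-ins for the answers that still need flags.
unlabeled = [
    "Helping customers is great",
    "I enjoy the short hours",
    "Good coffee in the office",
]

flags = ["HELPING_CUSTOMERS", "SHORT_HOURS"]
texts = [text for text, _ in labeled]

# One independent model per flag, so an answer can receive several flags.
models = {
    flag: FlagModel().fit(texts, [flag in tags for _, tags in labeled])
    for flag in flags
}

# Percentage of unlabeled answers predicted to carry each flag.
shares = {
    flag: 100 * sum(models[flag].predict(text) for text in unlabeled) / len(unlabeled)
    for flag in flags
}
print(shares)
```

With only about 300 labeled answers, it would probably be wise to hold some of them out and check each flag's model against the manual labels before trusting the predicted percentages.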
1. I was wondering whether there is another approach, and what the advantage of a model would be over simply drawing a sample from the 20,000 answers, labeling it by hand, computing percentages, and extrapolating them statistically, without any text-based predictive model.
2. Another valid question is how text mining differs from a simple tag cloud, but that remains to be seen and I guess it depends on the individual problem. For example, a more neutral question like "What do you think about your job?" may draw positive and negative sentiments that use the same words, but right now I'm working on a question biased towards receiving positive sentiments.
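For the sampling alternative in point 1, this is roughly what I have in mind: a minimal sketch that puts a confidence interval around a flag's percentage using the Wilson score interval. It assumes the manually labeled answers are a simple random sample of all answers, which is an assumption and not necessarily true of the 300 I already did. The function name `wilson_interval` and the count of 90 flagged answers are made up for illustration.

```python
import math


def wilson_interval(hits, n, z=1.96):
    """Wilson score interval for a proportion; z=1.96 gives roughly 95% confidence."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    p = hits / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# Hypothetical numbers: 90 of 300 randomly sampled answers carry a given flag.
low, high = wilson_interval(hits=90, n=300)
print(f"Estimated share: {90 / 300:.1%}, interval: {low:.1%} to {high:.1%}")
```

The interval width depends mostly on the sample size, so labeling more answers by hand narrows it without any text model involved.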
Thanks a lot for your insight. I'll make sure to share mine!