"Word Clustering/Classification"
dsaraph
New Altair Community Member
Thought this question deserved its own topic for others looking in the future.
Hi Matthias,
Just wanted to report back that I was able to run the n-grams quite well, but in the end the results were not exactly what I was looking for so I'm going to be tinkering with the data for the next little bit. Thanks for all your help on this.
On another topic, I wanted to ask whether anyone is familiar with word clustering. For example, is there a way to cluster the text by topic without considering word order (n-grams are built from the order of the words)? If this were Major League Baseball data, for instance, the clusters might end up corresponding to teams; in my case I want the clusters to form on their own. I've looked at some of the clustering operators, but I'm not sure which would apply to what I'm trying to do. I was hoping there would be an operator that could simply replace the n-gram operator, since I still want to keep the data pre-processing, stemming, and filtering I currently have. Any suggestions are greatly appreciated.
Thanks.
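For anyone reading later: "clustering without considering word order" usually means representing each document as a bag of words (term counts, order discarded) and then running a clustering algorithm such as k-means on those vectors. The sketch below illustrates that general idea in plain Python, not in RapidMiner operators. Assumptions: the tiny `STOP_WORDS` list and the helper names `tokenize`, `bag_of_words`, and `kmeans` are hypothetical; stemming is omitted for brevity (so "yankee" and "yankees" stay separate terms); k-means is seeded with the first `k` documents purely to keep the example deterministic.

```python
import math
import re
from collections import Counter

# Hypothetical, minimal stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "of", "in", "to", "on", "at", "for", "is", "was", "as"}


def tokenize(text):
    """Lowercase, keep alphabetic tokens, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def bag_of_words(docs):
    """Build L2-normalized term-frequency vectors over a shared vocabulary.

    Only term counts are kept, so word order has no effect on the vectors.
    """
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    vectors = []
    for c in counts:
        v = [float(c[w]) for w in vocab]
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        vectors.append([x / norm for x in v])
    return vocab, vectors


def kmeans(vectors, k, iterations=20):
    """Plain k-means with squared Euclidean distance; first k vectors seed the centroids."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each document joins its nearest centroid.
        labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(v, centroids[j])))
            for v in vectors
        ]
        # Update step: each centroid moves to the mean of its assigned documents.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


if __name__ == "__main__":
    docs = [
        "Yankees win at Yankee Stadium as Yankees pitching dominates",
        "Red Sox rally at Fenway Park behind Red Sox hitting",
        "Yankees pitching shuts out opponent, Yankees fans cheer",
        "Fenway crowd watches Red Sox win again",
    ]
    _, vecs = bag_of_words(docs)
    print(kmeans(vecs, k=2))  # prints [0, 1, 0, 1]
```

Because the vectors are normalized, squared Euclidean distance between two documents equals 2 minus twice their cosine similarity, so documents sharing many terms land in the same cluster. In a tool-based workflow the same idea would mean keeping the existing pre-processing, stemming, and filtering, producing word-count (or TF-IDF) vectors instead of n-grams, and feeding those vectors to a clustering step.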