1. Limit the terms to a list of domain keywords. Ideally, I think this is the best option, but it is the hardest to generate: building a list of domain keywords requires a lot of manual work.
2. Stop word lists (English and domain-specific).
3. Use only a small part of each document for indexing. Title only: this results in fewer terms, but I don't know whether it affects the quality of the clustering. Abstract only: I find that it generates lots of noisy terms.
4. Use n-grams. This is OK, but it multiplies the number of terms.
5. Stemmer (on/off)
6. InteractiveAttributeWeighing operator. I think this is OK, but it requires manual work, considering that I always get thousands of attributes.
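Options 1, 2 and 5 are all filters applied to the token stream before the term vector is built. The following is a minimal, tool-independent sketch in plain Python, not the RapidMiner operators themselves. The stop word lists are small hypothetical examples, and `crude_stem` is a deliberately simple suffix stripper standing in for a real stemmer such as Porter.

```python
import re

# Hypothetical stop word lists; real lists would be much longer and loaded from files.
ENGLISH_STOPWORDS = {"the", "a", "an", "of", "and", "in", "for", "to", "is", "on", "with"}
DOMAIN_STOPWORDS = {"paper", "study", "results", "approach"}  # assumed examples


def crude_stem(token):
    """Crude stand-in for a real stemmer: strips '-ing' and a plural '-s'."""
    if token.endswith("ing") and len(token) > 5:
        return token[:-3]
    if token.endswith("s") and not token.endswith("ss") and len(token) > 3:
        return token[:-1]
    return token


def extract_terms(text, stem=False, keywords=None):
    """Tokenize, drop stop words, optionally stem, optionally keep only domain keywords.

    The keyword filter runs after stemming, so `keywords` must be given in stemmed
    form whenever stem=True.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    stopwords = ENGLISH_STOPWORDS | DOMAIN_STOPWORDS
    terms = [t for t in tokens if t not in stopwords]
    if stem:
        terms = [crude_stem(t) for t in terms]
    if keywords is not None:
        terms = [t for t in terms if t in keywords]
    return terms


text = "The study of clustering algorithms for text documents"
print(extract_terms(text))
print(extract_terms(text, stem=True))
print(extract_terms(text, stem=True, keywords={"cluster", "document"}))
```

Each filter only ever removes or merges terms, so toggling them one at a time makes it easy to see how much each contributes to shrinking the attribute set.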
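Options 3 and 4 pull in opposite directions: indexing a shorter field shrinks the vocabulary, while adding n-grams grows it. A rough way to compare them is to count distinct terms per configuration. The sketch below assumes each document is a dict with hypothetical `title` and `abstract` fields and applies no stop word removal, so that only the field choice and n-gram length vary.

```python
import re


def ngrams(tokens, n):
    """Return the contiguous n-grams of a token list, joined with '_'."""
    return ["_".join(tokens[i : i + n]) for i in range(len(tokens) - n + 1)]


def vocabulary(docs, field, max_n=1):
    """Distinct terms in one document field, using all 1..max_n-grams."""
    vocab = set()
    for doc in docs:
        tokens = re.findall(r"[a-z]+", doc[field].lower())
        for n in range(1, max_n + 1):
            vocab.update(ngrams(tokens, n))
    return vocab


# Hypothetical documents with "title" and "abstract" fields.
docs = [
    {"title": "Text clustering", "abstract": "We cluster short text documents by topic"},
    {"title": "Document clustering", "abstract": "Topic models for document clustering"},
]

for field in ("title", "abstract"):
    for max_n in (1, 2):
        print(field, max_n, len(vocabulary(docs, field, max_n)))
```

Even on two toy documents, adding bigrams nearly doubles the abstract vocabulary (11 to 21 terms), while titles alone stay at a handful of terms.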