Hi,
I have collected a huge database of products (with their names, descriptions, prices and labels).
Now I am trying to create a multiclass classifier to automatically classify these products.
Right now I am using a k-NN classifier trained on a subset of that data.
1) Now, a basic question: how do you build a good classifier when the data is this sparse? There are a lot of categories but only a few attributes. In document classification the word lists are usually huge, but here each item only has a product name, a short description and a price, so there are not many words. How do we solve that? Any suggestions or advice would be greatly appreciated.
I used the inputs from the Vancouver Data blogspot.
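A minimal sketch of one common way to squeeze more signal out of short product texts, assuming Python 3 and only the standard library. Besides word tokens, it adds character 3-grams (so related words such as "phone" and "smartphone" still overlap) and a coarse log-scale price bucket, then runs a similarity-weighted k-NN with cosine similarity. The function names, feature prefixes and toy products are hypothetical illustrations, not taken from the data set or the blog mentioned above.

```python
import math
import re
from collections import Counter

def featurize(name, description, price):
    """Hypothetical featurizer: one product -> sparse bag of features (feature -> count)."""
    text = f"{name} {description}".lower()
    words = re.findall(r"[a-z0-9]+", text)
    feats = Counter(f"w:{w}" for w in words)
    # Words from the product name are tagged separately so they can carry extra weight.
    feats.update(f"n:{w}" for w in re.findall(r"[a-z0-9]+", name.lower()))
    # Character 3-grams give short texts many more overlapping features.
    for w in words:
        padded = f"#{w}#"
        feats.update(f"c:{padded[i:i + 3]}" for i in range(len(padded) - 2))
    # Coarse price bucket on a log10 scale: 1-9 -> 0, 10-99 -> 1, 100-999 -> 2, ...
    feats[f"p:{int(math.log10(max(price, 1.0)))}"] += 1
    return feats

def cosine(a, b):
    """Cosine similarity between two sparse feature dicts."""
    dot = sum(v * b.get(key, 0) for key, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def knn_predict(train_set, query, k=3):
    """train_set: list of (features, label). Similarity-weighted vote of the k nearest."""
    nearest = sorted(((cosine(query, f), label) for f, label in train_set), reverse=True)[:k]
    votes = Counter()
    for sim, label in nearest:
        votes[label] += sim
    return votes.most_common(1)[0][0]

# Toy usage with made-up products.
train_set = [
    (featurize("Apple iPhone 13", "smartphone 128GB black", 799.0), "phones"),
    (featurize("Samsung Galaxy S21", "android smartphone 5G", 699.0), "phones"),
    (featurize("Nike Air Max", "running shoes men", 120.0), "shoes"),
    (featurize("Adidas Ultraboost", "running shoe women", 180.0), "shoes"),
]
query = featurize("Galaxy A52 smartphone", "android phone", 349.0)
print(knn_predict(train_set, query, k=3))
```

The character n-grams and the price token are just one choice; weighting them (for example TF-IDF) or dropping the price bucket are reasonable variations to test against a held-out subset.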
2) How do I feed this huge training set to the classifier?
I always hit memory limits.
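k-NN has to keep every training example in memory, so its footprint grows with the data. A minimal sketch of the usual alternative, assuming Python 3 and only the standard library: stream the file one row at a time, map tokens into a fixed number of buckets with the hashing trick, and train an online model one example at a time. A multiclass perceptron is swapped in for k-NN here; the CSV layout (name, description, label columns) and all function and class names are hypothetical.

```python
import csv
import zlib
from collections import defaultdict

N_BUCKETS = 2 ** 18  # fixed feature space: memory no longer grows with the vocabulary

def hashed_features(text):
    """Hashing trick: map each token to one of N_BUCKETS indices.

    zlib.crc32 is used because Python's built-in hash() of str is salted per process.
    """
    counts = defaultdict(float)
    for tok in text.lower().split():
        counts[zlib.crc32(tok.encode("utf-8")) % N_BUCKETS] += 1.0
    return counts

def stream_rows(path):
    """Yield (text, label) one row at a time; the whole file is never loaded.

    Assumes a hypothetical CSV with a header row: name, description, label.
    """
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            yield f"{row['name']} {row['description']}", row["label"]

class OnlinePerceptron:
    """Multiclass perceptron updated one example at a time."""

    def __init__(self):
        self.w = defaultdict(lambda: defaultdict(float))  # label -> bucket -> weight

    def score(self, label, x):
        wl = self.w[label]
        return sum(v * wl.get(i, 0.0) for i, v in x.items())

    def predict(self, x):
        if not self.w:
            return None  # no labels seen yet
        return max(self.w, key=lambda lbl: self.score(lbl, x))

    def update(self, x, label):
        guess = self.predict(x)
        if guess != label:
            # Move weights toward the true label and away from the wrong guess.
            for i, v in x.items():
                self.w[label][i] += v
                if guess is not None:
                    self.w[guess][i] -= v

def train(path, epochs=3):
    """Each epoch re-reads the file from disk instead of holding it in memory."""
    model = OnlinePerceptron()
    for _ in range(epochs):
        for text, label in stream_rows(path):
            model.update(hashed_features(text), label)
    return model
```

Usage, with a hypothetical file products.csv: `model = train("products.csv")`, then `model.predict(hashed_features("galaxy android smartphone"))`. Memory is bounded by the number of classes times N_BUCKETS rather than by the number of rows. scikit-learn offers the same pattern through `HashingVectorizer` combined with `SGDClassifier.partial_fit`.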