"How to improve Classification in Text Mining"
I'm classifying technical papers into 15 classes using their abstracts.
My processes are simple.
Learning:
+ TextInput
+ String Tokenizer
+ English StopwordFilter
+ TokenLengthFilter
+ Binary2MultiClassLearner
+ LibSVMLearner
+ ModelWriter
Applying:
+ TextInput
+ String Tokenizer
+ English StopwordFilter
+ TokenLengthFilter
+ ModelLoader
+ ModelApplier
+ ExcelExampleSetWriter
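To make the preprocessing part concrete, here is roughly what the tokenizer and the two filters do, written as a plain Python sketch rather than the actual operators. The stopword list, the length bounds (`min_len`, `max_len`), and the function names are my own assumptions for illustration, not the operators' real defaults.

```python
import re

# Tiny illustrative stopword list (an assumption); a real English
# stopword filter uses a much longer list.
STOPWORDS = {"a", "an", "and", "are", "for", "in", "is", "of",
             "on", "the", "to", "we", "with"}


def preprocess(text, min_len=3, max_len=25):
    """Tokenize, drop stopwords, then drop tokens outside [min_len, max_len]."""
    # Lowercase and split on anything that is not a letter.
    tokens = re.findall(r"[a-z]+", text.lower())
    # Stopword filter.
    tokens = [t for t in tokens if t not in STOPWORDS]
    # Token length filter.
    return [t for t in tokens if min_len <= len(t) <= max_len]


def term_counts(tokens):
    """Turn a token list into a term -> occurrence count mapping."""
    counts = {}
    for t in tokens:
        counts[t] = counts.get(t, 0) + 1
    return counts


abstract = "We propose a method for the classification of technical papers."
print(preprocess(abstract))
```

The same `preprocess` steps run in both the learning and the applying process, so both produce the same kind of attributes.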
I get results, but I'm not satisfied with them. How do I improve them?
I've been searching the forum and have seen that feature selection is one way. There are lots of examples that use the FeatureSelection operator, but I couldn't find one that writes a model file. One example from the installer is shown below, but I couldn't figure out where to add the ModelWriter. Or am I thinking about this the wrong way?
....
+ FeatureSelection
+ XValidation
+ NearestNeighbors
+ OperatorChain
+ ModelApplier
+ Performance
+ ProcessLog
I'm also thinking of forcing bigger weights on some attributes. Is this a good idea, and how would I do it?
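To be concrete about what I mean by forcing weights, here is a minimal Python sketch: each document is a term-to-value mapping, and selected terms are multiplied by a factor before learning. The terms, the weight values, and the `apply_weights` helper are all hypothetical; I don't know whether this actually helps accuracy, and the same scaling would have to be used in both the learning and the applying process.

```python
def apply_weights(vector, weights, default=1.0):
    """Multiply each term value by its weight; unlisted terms use `default`."""
    return {term: value * weights.get(term, default)
            for term, value in vector.items()}


# Hypothetical terms and weights, for illustration only.
weights = {"kernel": 3.0, "protein": 2.0}
doc = {"kernel": 1.0, "method": 2.0, "protein": 0.5}

weighted = apply_weights(doc, weights)
print(weighted)
```

With a learner such as an SVM, scaling an attribute up makes differences in that attribute count for more when examples are compared.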
Thanks,
Matthew