"text mining (classification
mksaad
New Altair Community Member
Hello all,
I have read many tutorials about text mining (TM), including tutorials about TM using RapidMiner (RM).
Most of these tutorials use support vector machines (SVM) and Naive Bayes (NB) as classification methods, so I concluded that they are the best algorithms for text classification.
Do you recommend that I use these algorithms, or are there other suitable algorithms for text classification? (I am looking for algorithms that are implemented in RM.)
If SVM and NB are the best ones, any references about that would be appreciated.
I would also appreciate any recommendations for RM clustering algorithms for text.
Thanks in advance,
--
Motaz K. Saad
Answers
-
Hi,
I would suggest any clustering algorithm that supports cosine similarity. And as always, k-Means is worth a try.
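To illustrate why cosine similarity suits text, here is a minimal Python sketch (not RapidMiner code). It compares documents by the angle between their term-count vectors, so document length matters less than shared vocabulary. The whitespace tokenizer, the function names, and the sample documents are assumptions made only for this illustration.

```python
import math
from collections import Counter


def term_counts(text):
    """Lowercase, split on whitespace, and count terms (a deliberately naive tokenizer)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        # An empty document has no direction; treat it as dissimilar to everything.
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical sample documents, for illustration only.
docs = [
    "java developer with spring experience",
    "senior java developer",
    "sql database administrator",
]
vectors = [term_counts(d) for d in docs]

# The first two documents share terms; the first and third share none.
print(cosine_similarity(vectors[0], vectors[1]))
print(cosine_similarity(vectors[0], vectors[2]))
```

A clustering algorithm such as k-Means would use a measure like this to decide which documents belong together.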
Greetings,
Sebastian
-
Motaz,
Have you done anything on Text Classification?
I need help there...
-
Hello,
You can take a look at http://sites.google.com/site/motazsite/publications
There you can find conclusions on Arabic text classification and on text classification in general.
Regards,
Motaz
-
Is there a good algorithm to use when my documents can have multiple categories assigned to them? An example might be resumes, where some are for Java developers, some are for SQL developers, and some are for both Java and SQL developers.
-
Hi, you can use Polynominal by Binominal Classification for this. This operator trains a model based on its inner process, where it tries to discriminate between each class and all other classes. During application the confidence for each class is calculated, and the one with the highest value is predicted. Please have a look at the attached process.
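To make the one-vs-all idea concrete outside RapidMiner, here is a minimal Python sketch. It is not the operator's implementation: the binary learner is a toy centroid-based scorer standing in for whatever learner sits in the inner process, and every function name and data point is hypothetical. The `predict_multi` variant, which keeps every class above a threshold, is an assumption about how one might obtain several labels per document. The operator itself, as described above, predicts only the single class with the highest confidence.

```python
import math


def centroid(points):
    """Mean of a non-empty list of equal-length numeric tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def train_one_vs_all(examples, labels):
    """Train one binary model per class: that class versus all other classes.

    Each model is a (positive centroid, negative centroid) pair, a toy
    stand-in for a real binary learner. Requires at least two classes.
    """
    models = {}
    for cls in sorted(set(labels)):
        pos = [x for x, y in zip(examples, labels) if y == cls]
        neg = [x for x, y in zip(examples, labels) if y != cls]
        models[cls] = (centroid(pos), centroid(neg))
    return models


def confidences(models, x):
    """Per-class confidence in [0, 1]: relative closeness to the positive centroid."""
    result = {}
    for cls, (pos_c, neg_c) in models.items():
        d_pos = math.dist(x, pos_c)
        d_neg = math.dist(x, neg_c)
        total = d_pos + d_neg
        result[cls] = 0.5 if total == 0 else d_neg / total
    return result


def predict(models, x):
    """Single-label prediction: the class with the highest confidence."""
    conf = confidences(models, x)
    return max(conf, key=conf.get)


def predict_multi(models, x, threshold=0.5):
    """Hypothetical multi-label variant: every class whose confidence exceeds the threshold."""
    return sorted(c for c, v in confidences(models, x).items() if v > threshold)


# Hypothetical two-attribute training data with three classes.
examples = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0), (10, 1)]
labels = ["a", "a", "b", "b", "c", "c"]
models = train_one_vs_all(examples, labels)

print(predict(models, (0.5, 0.5)))
print(predict_multi(models, (0.5, 0.5), threshold=0.3))
```

Lowering the threshold in `predict_multi` admits more labels per example, so a suitable value would have to be tuned on held-out data.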
Best, Marius
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.006">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.2.006" expanded="true" name="Process">
    <process expanded="true" height="494" width="752">
      <operator activated="true" class="generate_data" compatibility="5.2.006" expanded="true" height="60" name="Generate Data" width="90" x="45" y="30">
        <parameter key="target_function" value="three ring clusters"/>
        <parameter key="number_of_attributes" value="2"/>
      </operator>
      <operator activated="true" class="polynomial_by_binomial_classification" compatibility="5.2.006" expanded="true" height="76" name="Polynominal by Binominal Classification" width="90" x="246" y="30">
        <process expanded="true" height="512" width="770">
          <operator activated="true" class="naive_bayes" compatibility="5.2.006" expanded="true" height="76" name="Naive Bayes" width="90" x="313" y="30"/>
          <connect from_port="training set" to_op="Naive Bayes" to_port="training set"/>
          <connect from_op="Naive Bayes" from_port="model" to_port="model"/>
          <portSpacing port="source_training set" spacing="0"/>
          <portSpacing port="sink_model" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="apply_model" compatibility="5.2.006" expanded="true" height="76" name="Apply Model" width="90" x="461" y="30">
        <list key="application_parameters"/>
      </operator>
      <connect from_op="Generate Data" from_port="output" to_op="Polynominal by Binominal Classification" to_port="training set"/>
      <connect from_op="Polynominal by Binominal Classification" from_port="model" to_op="Apply Model" to_port="model"/>
      <connect from_op="Polynominal by Binominal Classification" from_port="example set" to_op="Apply Model" to_port="unlabelled data"/>
      <connect from_op="Apply Model" from_port="labelled data" to_port="result 2"/>
      <connect from_op="Apply Model" from_port="model" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
    </process>
  </operator>
</process>
-
Thanks, I'll try that.