Creating ExcelExampleSet
rtaank
New Altair Community Member
Hi, I have successfully managed to read an Excel sheet containing 30 rows of unique data (with just one regular attribute).
I am then trying to pipe this into the NB operator and I am getting the following error upon execution:
Mar 11, 2009 12:53:25 PM: [Fatal] UserError occured in 1st application of NaiveBayes (NaiveBayes)
Mar 11, 2009 12:53:25 PM: [Fatal] Process failed: Input example set has no attributes
Root[1] (Process)
+- ExcelExampleSource[1] (ExcelExampleSource)
here ==> +- NaiveBayes[1] (NaiveBayes)
Any ideas why this is the case?
I want to classify the 30 pieces of text (i.e. each row in the Excel sheet) into associated groups.
Thanks.
Answers
Hello,
Several remarks:
1. You have to use the Text Plugin to transform your texts into word vectors with the StringTextInput operator.
2. You do not seem to have a label, so clustering seems more appropriate than NaiveBayes, which is a classification method.
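To illustrate what "word vectors" means in the first remark, here is a minimal plain-Python sketch. It is not what the Text Plugin does internally; the `tokenize` and `word_vectors` helpers and the two sample rows are hypothetical, and the tokenization (lowercase alphabetic words only) is an assumption. The point is that one text column becomes many numeric attributes, one per word, which is what a learner needs as input.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def word_vectors(texts):
    """Turn each text into a list of term counts over a shared vocabulary."""
    counts = [Counter(tokenize(t)) for t in texts]
    vocabulary = sorted(set().union(*counts))
    vectors = [[c[word] for word in vocabulary] for c in counts]
    return vocabulary, vectors

# Hypothetical rows standing in for the single text column of the Excel sheet.
rows = ["The cat sat on the mat", "The dog chased the cat"]
vocabulary, vectors = word_vectors(rows)
```

Each entry in `vectors` has one count per word in `vocabulary`, so two rows of text turn into two rows of numeric attributes.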
Cheers,
Ingo
Thanks for that.
So which clustering algorithm do you recommend for standard written English text?
Hi,
There is no standard algorithm; just try them and check which one delivers the results you like best. If performance is an issue, I would start with KMeans. If you want something hierarchical and you do not have too many examples, I would try agglomerative clustering.
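For readers unfamiliar with KMeans, the idea can be sketched in a few lines of plain Python. This is only an illustration, not the KMeans operator itself: the `kmeans` helper, the fixed choice of the first `k` points as starting centroids, and the sample word-count vectors are all assumptions made for the example.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=20):
    """Plain k-means: assign each point to its nearest centroid, then recompute centroids."""
    # Assumption: the first k points serve as initial centroids (real tools usually randomize).
    centroids = [list(p) for p in points[:k]]
    assignments = [0] * len(points)
    for _ in range(iterations):
        assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignments, centroids

# Hypothetical word-count vectors over the vocabulary ["cat", "dog", "stock", "market"].
vectors = [[2, 1, 0, 0], [0, 0, 2, 1], [1, 2, 0, 0], [0, 0, 1, 2]]
assignments, centroids = kmeans(vectors, k=2)
```

Note that no label column is involved: the groups come only from which vectors lie close to each other.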
Cheers,
Ingo
Okay, I will consider those clustering algorithms. Performance really isn't an issue, but I will experiment with the various unsupervised algorithms.
Going back to your original remarks, however:
1. you have to use the Text Plugin in order to transform your texts into word vectors with the StringTextInput operator
my_response: Will I need to do this for the clustering algorithms too, or just for the classification algorithms?
2. you do not seem to have a label --> clustering seems more appropriate than NaiveBayes which is a classification method
my_response: What are these labels? I have been through the documentation but cannot work out why the labels are required. Also, within my Excel sheet, do I need to have another column for these labels? What are they used for? Ideally I would like to use supervised learning in order to produce a model.
Thanks, Ingo.
Hi,
"Will I need to do this for the clustering algorithms too, or just for the classification algorithms?"
Yes, you will need to do this for the clustering algorithms too.
"What are these labels? I have been through the documentation but cannot work out why the labels are required. Also, within my Excel sheet, do I need to have another column for these labels? What are they used for? Ideally I would like to use supervised learning in order to produce a model."
Labels are the classes you provide during the training phase. The different values of the label column will then be predicted by a classification model for new and unseen data (which no longer needs a given label). For supervised learning, you will always need a label (target, class... you name it). If you are not able to provide a label, then you usually perform an unsupervised learning method instead (like clustering).
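To make the role of the label column concrete, here is a small plain-Python sketch of a Naive Bayes text classifier. It is an illustration under stated assumptions, not the NaiveBayes operator's implementation: the `train_naive_bayes` and `predict` helpers, the whitespace tokenization, the add-one smoothing, and the "spam"/"ham" training rows are all hypothetical. The second element of each training pair plays the role of the extra label column in the sheet.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(examples):
    """examples: list of (text, label) pairs. Returns the counts needed to predict."""
    label_counts = Counter(label for _, label in examples)
    word_counts = defaultdict(Counter)
    for text, label in examples:
        word_counts[label].update(text.lower().split())
    vocabulary = {w for counts in word_counts.values() for w in counts}
    return label_counts, word_counts, vocabulary

def predict(model, text):
    """Return the label with the highest log posterior for an unlabeled text."""
    label_counts, word_counts, vocabulary = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in label_counts.items():
        # log prior plus log likelihoods with add-one (Laplace) smoothing
        score = math.log(n / total)
        denom = sum(word_counts[label].values()) + len(vocabulary)
        for word in text.lower().split():
            if word in vocabulary:  # ignore words never seen in training
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical training rows: the second element plays the role of the label column.
training = [
    ("cheap pills buy now", "spam"),
    ("buy cheap watches", "spam"),
    ("meeting agenda for monday", "ham"),
    ("project meeting notes", "ham"),
]
model = train_naive_bayes(training)
prediction = predict(model, "buy cheap pills")
```

Training needs the labels; prediction does not. The new text passed to `predict` carries no label, and the model supplies one.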
Cheers,
Ingo
Thanks Ingo, a fantastic explanation!