Creating ExcelExampleSet

rtaank · March 2009

Hi, I have successfully managed to read an Excel sheet containing 30 rows of unique data (with just one regular attribute).

I am then trying to pipe this into the NaiveBayes (NB) operator, and I get the following error on execution:

Mar 11, 2009 12:53:25 PM: [Fatal] UserError occured in 1st application of NaiveBayes (NaiveBayes)
Mar 11, 2009 12:53:25 PM: [Fatal] Process failed: Input example set has no attributes
Root[1] (Process)
+- ExcelExampleSource[1] (ExcelExampleSource)
here ==> +- NaiveBayes[1] (NaiveBayes)

Any ideas why this is the case?

I want to classify the 30 pieces of text (i.e. each row in the Excel sheet) into associated groups.

Thanks.

IngoRM · March 2009

Hello,

several remarks:

1. You have to use the Text Plugin in order to transform your texts into word vectors with the StringTextInput operator.
2. You do not seem to have a label, so clustering seems more appropriate than NaiveBayes, which is a classification method.
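To make the first remark a little more concrete: roughly speaking, learners such as NaiveBayes or KMeans work on word attributes rather than on one raw text column. The Python sketch below is not the Text Plugin's code; it only illustrates the general idea under assumptions chosen for the example (a simple lowercase tokenizer and TF-IDF weighting). Each text becomes one vector, with one attribute per word in the shared vocabulary.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase, then keep runs of letters only
    return re.findall(r"[a-z]+", text.lower())


def word_vectors(texts):
    """Turn a list of texts into TF-IDF word vectors over a shared vocabulary."""
    docs = [Counter(tokenize(t)) for t in texts]
    vocab = sorted({w for d in docs for w in d})
    n = len(docs)
    # Document frequency: in how many texts each word appears
    df = {w: sum(1 for d in docs if w in d) for w in vocab}
    vectors = []
    for d in docs:
        total = sum(d.values()) or 1  # avoid division by zero for empty texts
        # Term frequency times inverse document frequency, one value per word
        vectors.append([(d[w] / total) * math.log(n / df[w]) for w in vocab])
    return vocab, vectors
```

For example, `word_vectors(["the cat sat", "the dog sat", "stock prices fell"])` yields a vocabulary of seven words and three vectors of length seven, where each text scores zero for the words it does not contain.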

Cheers,
Ingo

rtaank · March 2009

Thanks for that.

So which clustering algorithm do you recommend for standard written English text?

IngoRM · March 2009

Hi,

there is no standard algorithm; just try them and check which one delivers the results you like best. If performance is an issue, I would start with KMeans. If you want something hierarchical and you do not have too many examples, I would try agglomerative clustering.
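For readers unfamiliar with KMeans, here is a minimal plain-Python sketch of the algorithm itself, not RapidMiner's KMeans operator. It assumes the texts have already been turned into equal-length word vectors, and it picks random initial centroids with a fixed seed (an assumption for reproducibility; real implementations often use several restarts or smarter seeding). Agglomerative clustering works differently: it starts with every example in its own cluster and repeatedly merges the closest pair, which is why it gets expensive with many examples.

```python
import math
import random


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(vectors, k, iterations=100, seed=0):
    """Plain k-means: returns a cluster index for every input vector."""
    rng = random.Random(seed)
    # Assumed initialization: k distinct input vectors chosen at random
    centroids = [list(v) for v in rng.sample(vectors, k)]
    assignment = None
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid
        new_assignment = [
            min(range(k), key=lambda c: euclidean(v, centroids[c]))
            for v in vectors
        ]
        if new_assignment == assignment:
            break  # converged: no vector changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:  # keep the old centroid if a cluster went empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment
```

On two well-separated groups such as `[[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]` with `k=2`, the first three points end up in one cluster and the last three in the other; the cluster indices themselves are arbitrary.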

Cheers,
Ingo

rtaank · March 2009

Okay, I will consider those clustering algorithms. Performance really isn't an issue, but I will experiment with the various unsupervised algorithms.

Going back to your original remarks, however:

1. You have to use the Text Plugin in order to transform your texts into word vectors with the StringTextInput operator.

My response: will I need to do this for the clustering algorithms too, or just for the classification algorithms?

2. You do not seem to have a label, so clustering seems more appropriate than NaiveBayes, which is a classification method.

My response: what are these labels? I have been through the documentation but cannot fully work out why labels are required. Also, do I need another column in my Excel sheet for these labels? What are they used for? Ideally, I would like to use supervised learning to produce a model.

Thanks Ingo.

IngoRM · March 2009

Hi,

Will I need to do this for the clustering algorithms too, or just for the classification algorithms?

Yes.

What are these labels? I have been through the documentation but cannot fully work out why labels are required. Also, do I need another column in my Excel sheet for these labels? What are they used for? Ideally, I would like to use supervised learning to produce a model.

Labels are the classes you provide during the training phase. The different values of the label column will then be predicted by a classification model for new and unseen data (which no longer needs a given label). For supervised learning, you will always need a label (target, class... you name it). If you are not able to provide a label, then you usually perform an unsupervised learning method instead (like clustering).
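To illustrate what a label column does, here is a small multinomial naive Bayes sketch in plain Python. It is not RapidMiner's NaiveBayes operator, and the texts and the "sport"/"finance" classes are hypothetical. The `labels` list plays the role of the extra label column in the Excel sheet: it is only needed for training, while `predict` takes a new text without any label.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Assumed tokenizer: lowercase, then keep runs of letters only
    return re.findall(r"[a-z]+", text.lower())


def train_naive_bayes(texts, labels):
    """Train on labeled examples; labels[i] is the class of texts[i]."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # class -> word frequencies
    for text, label in zip(texts, labels):
        word_counts[label].update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return class_counts, word_counts, vocab


def predict(model, text):
    """Predict a label for a new, unlabeled text."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        # Log prior plus Laplace-smoothed log likelihood of each known word
        score = math.log(n_docs / total_docs)
        total_words = sum(word_counts[label].values())
        for w in tokenize(text):
            if w in vocab:  # ignore words never seen during training
                score += math.log((word_counts[label][w] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training data: the labels list acts as the label column
texts = ["the match ended in a draw", "the striker scored a goal",
         "shares fell on the stock market", "the bank raised interest rates"]
labels = ["sport", "sport", "finance", "finance"]
model = train_naive_bayes(texts, labels)
print(predict(model, "a late goal won the match"))     # sport
print(predict(model, "the bank cut interest rates"))   # finance
```

Without the `labels` list there is nothing for the model to learn to predict, which is why an unlabeled sheet points towards clustering instead.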

Cheers,
Ingo

rtaank · March 2009

Thanks Ingo, a fantastic explanation!
