Process failed - X-means clustering
Hi everyone,
I'm trying to run an X-Means clustering process on a set of texts stored in an Excel file, but every time I try, I get the same failure:
"Process Failed
The setup does not seem to contain any obvious errors, but you should check the log messages or activate the debug mode in the settings dialog in order to get more information about this problem."
The log messages are:
Jun 28, 2017 6:32:59 PM SEVERE: Process failed: operator cannot be executed. Check the log messages...
Jun 28, 2017 6:32:59 PM SEVERE: Here:
Jun 28, 2017 6:32:59 PM SEVERE: Process[1] (Process)
Jun 28, 2017 6:32:59 PM SEVERE: subprocess 'Main Process'
Jun 28, 2017 6:32:59 PM SEVERE: +- Read Excel[1] (Read Excel)
Jun 28, 2017 6:32:59 PM SEVERE: +- Process Documents from Data[1] (Process Documents from Data)
Jun 28, 2017 6:32:59 PM SEVERE: subprocess 'Vector Creation'
Jun 28, 2017 6:32:59 PM SEVERE: | +- Tokenize[2096] (Tokenize)
Jun 28, 2017 6:32:59 PM SEVERE: | +- Transform Cases[2096] (Transform Cases)
Jun 28, 2017 6:32:59 PM SEVERE: | +- Filter Stopwords (English)[2096] (Filter Stopwords (English))
Jun 28, 2017 6:32:59 PM SEVERE: | +- Stem (Snowball)[2096] (Stem (Snowball))
Jun 28, 2017 6:32:59 PM SEVERE: ==> +- X-Means[1] (X-Means)
Jun 28, 2017 6:32:59 PM SEVERE: java.lang.ArrayIndexOutOfBoundsException
The structure of the process is in the attached image. If I run the same process with only the X-Means operator replaced by a K-Means operator, it works without problems and I get the corresponding clustering results.
I have also tried X-Means clustering with a different data input (direct input from a folder containing PDF files), and that process does not work either.
Could anyone help me?
I really appreciate your help!
Thank you very much
Alberto
Answers
-
This type of error typically means that there is a problem with a data type in your data. Did you check the output of the Process Documents operator before it goes into the X-Means operator?
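To make the kind of check meant here concrete, below is a minimal sketch in plain Python (not RapidMiner code). It assumes the output of Process Documents has been exported as a list of row dictionaries; the function name `non_numeric_columns` and the sample rows are hypothetical and only for illustration. It lists every column that contains something other than a finite number, which is the sort of thing to look for when inspecting the data by eye.

```python
import math


def non_numeric_columns(rows):
    """Return the names of columns holding any value that is not a finite number."""
    bad = set()
    for row in rows:
        for name, value in row.items():
            # bool is a subclass of int in Python, so exclude it explicitly
            is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
            if not is_number or not math.isfinite(value):
                bad.add(name)
    return sorted(bad)


# Hypothetical sample: a kept text column plus two TF-IDF columns
rows = [
    {"text": "cluster the documents", "cluster": 0.7, "document": 0.7},
    {"text": "prune rare terms", "cluster": 0.0, "document": float("nan")},
]
print(non_numeric_columns(rows))
```

Any column reported by a check like this (a leftover text column, missing values) is worth a closer look before the data reaches a distance-based clustering operator.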
-
Hi Thomas, thank you very much for your quick answer.
I have also tried other kinds of input data, such as a set of PDF files, and the process failed too.
Attached is an image with the content of "Process Documents from Data". As you can see, it is a typical preprocessing chain (Tokenize, Transform Cases, Filter Stopwords, Stem) that creates a TF-IDF vector.
Besides, if I replace the X-Means operator with a K-Means operator, there is no problem and I get the clustering result.
How do you think I should proceed? Thanks again
Alberto Arenal
-
Yes, but did you check what comes out of the Process Documents operator? Did you put a breakpoint there and inspect the data?
-
Thank you Thomas, I really appreciate your help.
Yes, I put a breakpoint just after the Process Documents operator and got a regular TF-IDF vector (image attached). I can't see a problem with it, but I may be missing something.
Could the problem be the number of rows/examples (1048) or the number of attributes (1 special attribute, 3197 regular attributes)?
Alberto
-
Try toggling off the 'keep text' option on the Process Documents operator and run again. Sometimes this can confuse the X-means operator.
Another side note, you should probably prune more. I normally don't like to have wide data sets feeding into a clustering algorithm but that's just me.
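For anyone unsure what pruning does to the word vector, here is a rough sketch in plain Python (not RapidMiner code) of percentage-based pruning by document frequency. It assumes the documents are already tokenized; the function name `prune_vocabulary`, its default thresholds, and the sample documents are hypothetical. Terms that occur in too few or too many documents are dropped, which narrows the data set before clustering.

```python
from collections import Counter


def prune_vocabulary(docs, below_percent=3.0, above_percent=30.0):
    """Keep terms whose document frequency lies between the two percentages."""
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each term once per document
    kept = set()
    for term, count in doc_freq.items():
        share = 100.0 * count / n_docs
        if below_percent <= share <= above_percent:
            kept.add(term)
    return sorted(kept)


# Hypothetical tokenized documents
docs = [
    ["cluster", "text", "data"],
    ["cluster", "text", "excel"],
    ["cluster", "pdf", "data"],
    ["cluster", "token", "stem"],
]
print(prune_vocabulary(docs, below_percent=30.0, above_percent=80.0))
```

In this toy example "cluster" appears in every document and four terms appear in only one, so only the two mid-frequency terms survive. The right thresholds depend on the corpus, so treat the numbers above as placeholders.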
-
Dear Thomas,
Problem solved, thank you very much. I tried toggling off the "keep text" option, but the process still failed.
However, after following your advice to prune more, it works.
I really appreciate your help; you saved me a lot of time and frustration.
best regards
Alberto