KernelKMeans now produces an error when classifying text

Question

RM team,

I have switched to RM 4.2. I began testing with an existing project that classifies text using KernelKMeans. Text is read from a database and passed through StringTextInput and StringTokenizer. This operator chain worked before. Now I receive an error message:

Error 104 - non-numeric
Error in: KernelKMeans (KernelKMeans)
The example set contains non-numerical attribute #0: StockItemDesc (nominal/single_value)/values=

Using KMedoids to classify text works. Looking at the metadata with ExampleVisualizer, there are string vectors and weights. Here is the project.

Thanks for your help.
B

B_ · Answer

Ingo

This runs successfully now. Thanks for the help.

B.

IngoRM · Answer

Hi,
"I reinstalled RM 4.1 alongside RM 4.2. I tested this project. It runs under 4.1 and fails under 4.2."

Thanks for this info. I have now found the reason for this behaviour. It actually has nothing to do with the clustering operator but with StringTextInput. There is a new parameter "remove_original_attributes" whose default setting is unfortunately "false" rather than "true" (which would have kept backwards compatibility), so the original nominal (or string) attributes were not removed. This caused the error in the clustering, since the kernel cannot handle nominal values, and these are still present in the data set unless "remove_original_attributes" is set to "true". So the solution is quite simple: just set this parameter to "true" and everything should work as usual. You could add a breakpoint after the StringTextInput operator to see the difference with and without this setting.
"Doesn't the FilterNominalAttributes convert the attributes to a usable format for further processing?"

Yes, but with the new parameter they are also still kept as part of the example set as long as "remove_original_attributes" is set to "false". Instead of removing them directly here (with the parameter setting mentioned above), you could of course also use the operator "AttributeFilter" after the text processing to filter out all nominal attributes and keep only the numerical ones.
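
To illustrate why a leftover nominal attribute breaks a kernel-based clusterer, here is a minimal Python sketch. It is not RapidMiner code: the example set and the helper names `numeric_attributes`, `drop_non_numeric`, and `rbf_kernel` are hypothetical and only mimic the behaviour described above (a kernel that needs numbers, and a filter step that keeps only the numerical attributes).

```python
from math import exp

# Hypothetical example set after text vectorization: the original nominal
# attribute (StockItemDesc) is still present next to the numeric term weights,
# as happens when the original attributes are not removed.
examples = [
    {"StockItemDesc": "red widget", "red": 0.7, "widget": 0.7, "blue": 0.0},
    {"StockItemDesc": "blue widget", "red": 0.0, "widget": 0.7, "blue": 0.7},
]


def numeric_attributes(example_set):
    """Return the attribute names whose values are numeric in every example."""
    names = list(example_set[0])
    return [
        name
        for name in names
        if all(
            isinstance(ex[name], (int, float)) and not isinstance(ex[name], bool)
            for ex in example_set
        )
    ]


def drop_non_numeric(example_set):
    """Keep only the numerical attributes (the filtering step, in spirit)."""
    keep = numeric_attributes(example_set)
    return [{name: ex[name] for name in keep} for ex in example_set]


def rbf_kernel(a, b, gamma=1.0):
    """RBF kernel over all attributes; fails with TypeError on non-numeric values."""
    squared_distance = sum((a[name] - b[name]) ** 2 for name in a)
    return exp(-gamma * squared_distance)


if __name__ == "__main__":
    try:
        rbf_kernel(examples[0], examples[1])
    except TypeError as err:
        # The nominal attribute cannot be used in the distance computation.
        print("kernel failed on raw example set:", err)

    cleaned = drop_non_numeric(examples)
    print("kernel on numeric attributes only:", rbf_kernel(cleaned[0], cleaned[1]))
```

Either route in the thread (removing the original attributes at the text input step, or filtering them out afterwards) corresponds to the `drop_non_numeric` step in this sketch.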

Cheers,
Ingo

B_ · Answer

Ingo,

I reinstalled RM 4.1 alongside RM 4.2. I tested this project. It runs under 4.1 and fails under 4.2. Same SQL query to pull records and same text in the records.

4.2 error message:
Error in: KernelKMeans (KernelKMeans)
The example set contains non-numerical attribute #0: StockItemDesc

Doesn't the FilterNominalAttributes convert the attributes to a usable format for further processing?

Thanks for your help.
B.