Document Classification by single word and its position?

Axel_Borowitsch
Axel_Borowitsch New Altair Community Member
edited November 5 in Altair RapidMiner
I would like to get the class of a document out of the document structure consider single words and their position.

I have Training-Data within an Excel-Document:

Training-data:

image

I would like to use Rapidminer to identify different document-classes in a medical environment. I thought about a neuronal network but I can't handle the polynominal attribute WORD.

How can I use the string WORD as an attribute? I assume the nominal2numeric conversion would provide too many rows.
The training has to consider all lines of one DOC_ID and its entries as one document. The occurrence and position of the words in one document have to be part of the neuronal network. How can I deliver this information to the training process, keeping the information together?

Thank you very much in advance!

All the best
Axel