
Plain Text Classification/Clustering

User: "rtaank"
New Altair Community Member
Updated by Jocelyn
Hi all,

This is the scenario.

I have an input text file containing many thousands of paragraphs of comments made by different people in plain English. Each person's comment or statement is one paragraph, separated from the next by a newline (\n).

I want to read in this single file and have RapidMiner classify each paragraph in it into a particular cluster or topic. I know that RapidMiner will expect me to specify up front how many clusters (distinct classes) I want. That is fine, although ideally I would like RapidMiner to determine the number itself from the input file.
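To make the unsupervised route concrete, here is a minimal sketch of grouping paragraphs into a fixed number k of topics using TF-IDF word weights and spherical k-means (k-means with cosine similarity). It is plain Python, not what RapidMiner's text plugin does internally; the tokenizer, the tiny stopword list, the sample comments, and all function names are illustrative assumptions. The `assign` function is also how a trained model of this kind would place future comments into an existing cluster.

```python
import math
import re
from collections import Counter

# Illustrative stopword list (an assumption); real projects use a much fuller one.
STOPWORDS = {"a", "an", "and", "the", "was", "is", "it", "my", "to", "of", "very"}


def tokenize(text):
    """Lowercase, split into word-like tokens, and drop stopwords."""
    return [t for t in re.findall(r"[a-z0-9']+", text.lower()) if t not in STOPWORDS]


def normalize(vec):
    """Scale a sparse vector (dict term -> weight) to unit length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else vec


def vectorize(tokens, idf):
    """Unit-length TF-IDF vector for one document; terms unseen in training are ignored."""
    tf = Counter(t for t in tokens if t in idf)
    return normalize({t: c * idf[t] for t, c in tf.items()})


def tfidf_vectors(docs):
    """Return (vectors, idf) for a list of raw paragraph strings."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF: every term gets a positive weight, rarer terms get more.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return [vectorize(toks, idf) for toks in tokenized], idf


def cosine(a, b):
    """Dot product of two unit vectors, i.e. their cosine similarity."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def assign(vec, centroids):
    """Index of the most similar centroid (ties go to the lowest index)."""
    return max(range(len(centroids)), key=lambda j: cosine(vec, centroids[j]))


def kmeans(vectors, k, iterations=20):
    """Spherical k-means for a fixed k; returns (labels, centroids)."""
    # Farthest-first seeding keeps the result deterministic.
    centroids = [vectors[0]]
    while len(centroids) < k:
        far = min(range(len(vectors)),
                  key=lambda i: max(cosine(vectors[i], c) for c in centroids))
        centroids.append(vectors[far])
    labels = None
    for _ in range(iterations):
        new_labels = [assign(v, centroids) for v in vectors]
        if new_labels == labels:
            break  # converged: no paragraph changed cluster
        labels = new_labels
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # an emptied cluster keeps its previous centroid
                total = {}
                for v in members:
                    for t, w in v.items():
                        total[t] = total.get(t, 0.0) + w
                centroids[j] = normalize(total)
    return labels, centroids


# Hypothetical sample comments, one paragraph each.
comments = [
    "The delivery was late and the parcel arrived damaged.",
    "My parcel delivery took two weeks, very late.",
    "Great customer service, the support agent was helpful.",
    "The support team answered quickly and was very helpful.",
]
vectors, idf = tfidf_vectors(comments)
labels, centroids = kmeans(vectors, k=2)
new = vectorize(tokenize("Late delivery again, parcel missing."), idf)
print(labels, assign(new, centroids))  # [0, 0, 1, 1] 0
```

Like the setup described in the post, this sketch takes k as a fixed parameter; choosing k automatically would need an extra step (for example, comparing clusterings over several values of k), which is not shown here.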

I have installed the text plugin for RapidMiner and am using the TextInput operator to read the single input file. However, I am having difficulty getting RapidMiner to treat each paragraph within the file as a separate example. Any ideas on how this can be done?
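The thread does not show how the text plugin's operators are configured, so the following is not their API. It is a plain-Python sketch of the same idea, turning one file into one example per paragraph, assuming (as the post says) that each comment sits on its own line. `split_paragraphs` and `load_comments` are hypothetical helper names.

```python
from pathlib import Path


def split_paragraphs(text):
    """Return one string per non-blank line, with surrounding whitespace removed.

    Assumes each comment is a single line; splitlines() also handles \\r\\n endings.
    """
    return [line.strip() for line in text.splitlines() if line.strip()]


def load_comments(path):
    # Read the whole file at once; fine for a few thousand short paragraphs.
    return split_paragraphs(Path(path).read_text(encoding="utf-8"))


if __name__ == "__main__":
    sample = "First comment about delivery.\n\nSecond comment about support.\n"
    for i, paragraph in enumerate(split_paragraphs(sample)):
        print(i, paragraph)
```

Each string in the returned list then plays the role of one example that a clustering or classification model can be trained on or applied to.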

Secondly, I would like to know which type of learning is most suitable for the problem above: unsupervised or supervised?

Finally, once I have decided which type of learning suits this task best, can somebody suggest which algorithms work best for classifying natural English text?

My plan is to create a learner (model) that can then easily be applied to future comments as and when they occur.

Thanks in advance for your time.

Ritesh
    User: "IngoRM"
    Hi,

    For tasks like this, you can probably use the "Segmenter" operator, which is also part of the text plugin.

    Cheers,
    Ingo
    User: "rtaank"
    OP
    Hi, thanks for that response.

    When you say 'segmentation', are you referring to reading in the text file itself, or to the actual learning?
    User: "IngoRM"
    I mean the reading. The Segmenter splits the single text file into lots of smaller texts, one for each paragraph. You can then apply the learned model to each of those texts.

    Cheers,
    Ingo