Input file format for the Process Documents from Files operator

User: "ccricha"
New Altair Community Member
Updated by Jocelyn

Does anyone know what text structure the Process Documents from Files operator expects or can parse? I am working on Chapter 15 of the book by Markus Hofmann and Ralf Klinkenberg. They use the Process Documents from Files operator to loop over a bunch of text files containing hotel rating data. An entry for a single hotel looks like this:

 

<Author>everywhereman2
<Content>Truncated for brevity....
<Date>Jan 6, 2009
<Rating>5 5 5 5 5 5 5 5

 

What irks me is that there is absolutely nothing in the documentation for this operator telling me that this is an acceptable text structure that it can parse. Does anyone happen to know more about this operator?
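The operator reference does not say how (or whether) Process Documents from Files interprets these tags, so the following only illustrates the record structure itself. It is a minimal Python sketch that runs outside RapidMiner and splits such text into fields. It assumes that each field starts with a <Tag> marker at the beginning of a line, that a field's value runs until the next marker, and that every record begins with <Author>. The names parse_records and SAMPLE are hypothetical and not part of RapidMiner.

```python
import re
from typing import Dict, List

# Assumption: every field starts at the beginning of a line with <Name>,
# and its value runs until the next tag (so multi-line values are kept).
TAG_RE = re.compile(r"^<(\w+)>", re.MULTILINE)

# Sample entry copied from the question above.
SAMPLE = """<Author>everywhereman2
<Content>Truncated for brevity....
<Date>Jan 6, 2009
<Rating>5 5 5 5 5 5 5 5
"""


def parse_records(text: str) -> List[Dict[str, str]]:
    """Split tagged text into records (hypothetical helper).

    Assumption: a new record begins at each <Author> tag.
    """
    matches = list(TAG_RE.finditer(text))
    records: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    for i, match in enumerate(matches):
        # The value ends where the next tag starts, or at the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        name = match.group(1)
        value = text[match.end():end].strip()
        if name == "Author" and current:
            records.append(current)
            current = {}
        current[name] = value
    if current:
        records.append(current)
    return records


if __name__ == "__main__":
    for record in parse_records(SAMPLE):
        # The Rating field holds space-separated scores; convert them to ints.
        print(record["Author"], [int(x) for x in record["Rating"].split()])
```

Inside RapidMiner, the equivalent splitting would have to be done by operators inside the Process Documents from Files subprocess. Which operators the book uses for this is not covered here.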

 

    User: "Thomas_Ott"
    New Altair Community Member
    Accepted Answer

    The Text Processing extension is a bit sparse on operator reference documentation.


    What I would do is review the Text Analytics KB and watch these videos on how to properly load/parse text data and build models from it.

     

    I will be recording a very detailed and updated Text Mining in RapidMiner video over the next few weeks.

    User: "ccricha"
    New Altair Community Member
    OP
    Accepted Answer

    Are there plans to update the documentation for this extension? Even just some JavaDoc would be better than nothing.