Input file format for the Process Documents from Files operator

ccricha New Altair Community Member
edited November 5 in Community Q&A

Does anyone know what text structure the Process Documents from Files operator expects or can parse? I am working on Chapter 15 of the book by Markus Hofmann and Ralf Klinkenberg. The authors use the Process Documents from Files operator to loop over a set of text files containing hotel rating data. An entry for a single hotel looks like this:

<Author>everywhereman2
<Content>Truncated for brevity....
<Date>Jan 6, 2009
<Rating>5 5 5 5 5 5 5 5

What irks me is that there is absolutely nothing in the documentation for this operator telling me that this is a text structure it can parse. Does anyone know more about this operator?
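
For comparison, here is a minimal Python sketch of what parsing this structure involves outside RapidMiner. It assumes each review is a run of `<Tag>value` lines, exactly as in the sample above, and that an `<Author>` line starts a new review. The names `parse_reviews` and `parse_ratings` are hypothetical helpers for illustration only; this is not how the RapidMiner operator works internally.

```python
import re
from typing import Dict, List

# Hypothetical pattern: matches lines such as "<Author>everywhereman2".
# Assumes one field per line, the tag name in angle brackets, the value after it.
TAG_LINE = re.compile(r"^<([^>]+)>(.*)$")


def parse_reviews(text: str) -> List[Dict[str, str]]:
    """Split raw text into review records, one dict per <Author> block."""
    records: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        match = TAG_LINE.match(line)
        if not match:
            continue  # skip blank lines and lines without a leading tag
        tag, value = match.group(1), match.group(2).strip()
        # Assumption: an <Author> line marks the start of a new review.
        if tag == "Author" and current:
            records.append(current)
            current = {}
        current[tag] = value
    if current:
        records.append(current)
    return records


def parse_ratings(value: str) -> List[int]:
    """Turn a <Rating> value such as '5 5 5' into a list of integers."""
    return [int(tok) for tok in value.split()]


if __name__ == "__main__":
    sample = """<Author>everywhereman2
<Content>Truncated for brevity....
<Date>Jan 6, 2009
<Rating>5 5 5 5 5 5 5 5
"""
    for review in parse_reviews(sample):
        print(review["Author"], review["Date"], parse_ratings(review["Rating"]))
```

Treating `<Author>` as the record boundary is a design choice based only on the sample shown; if the real files separate reviews some other way, the boundary check in `parse_reviews` is the place to change.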

Best Answers

  • Thomas_Ott New Altair Community Member

    The Text Processing extension is a bit sparse on operator reference documentation.

    What I would do is review the Text Analytics KB and watch these videos on how to properly load and parse text data and build models from it.

    I will be recording a very detailed and updated Text Mining in RapidMiner video over the next few weeks.

  • ccricha New Altair Community Member

    Are there plans to update the documentation for this extension? Even just some JavaDoc would be better than nothing.
