How to convert rich text (RTF) into readable text? (for text mining)

User: "duygu"
New Altair Community Member
Updated by Jocelyn
Hi everyone!

I finally managed to read my database with RapidMiner. But, again, there is a problem. My items look like this:

{\rtf1\ansi\ansicpg1254\deff0{\fonttbl{\f0\fnil\fcharset162 Microsoft Sans Serif;}}
\viewkind4\uc1\pard\lang1055\f0\fs17 6 ayd\'fdr sol kol a\'f0. Boyun a\'f0 az.
G\'fc\'e7s\'fczl\'fck ve a\'f0 dan \'e7ok uyu\'feukluk var. NPBY. Belki C7-8 hipoaljezi.
Torasik \'e7\'fdk\'fd\'fe gibi de\'f0il. Miyofasial a\'f0 gibi. \'d6neriler.+\par \par \par \par \par }

How can I convert this into readable text? I need to do text mining :)
Thanks!

    User: "homburg"
    New Altair Community Member
    Hi duygu,

    Have you already installed the text mining extension? If so, you will find an operator called "Data to Documents", which can be used to convert an example set into document objects. But to answer your question: currently there is no option to parse RTF code directly in RapidMiner. Maybe you can find a library or scripting tool to pipe your data through. To get the text content out of your input, you could try filtering out the RTF code via regular expressions (using the "Replace" or "Replace Token" operator, with an empty replacement) and a search pattern like this:
    \{\*?\\[^{}]+}|[{}]|\\\n?[A-Za-z]+\n?(?:-?\d+)?[ ]?
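If preprocessing outside RapidMiner is an option, the same idea can be sketched in a few lines of Python. This is only a rough sketch under some assumptions: the function name `rtf_to_text` is made up for illustration, the code page `cp1254` is taken from the `\ansicpg1254` declaration in the sample above, and the `\'xx` hex escapes (which the search pattern does not touch) are decoded in a separate step. It is not a full RTF parser and will likely miss constructs that do not appear in the sample.

```python
import re

# The search pattern from the post: removes simple groups such as font
# table entries, stray braces, and control words like \pard or \fs17.
RTF_MARKUP = re.compile(r"\{\*?\\[^{}]+}|[{}]|\\\n?[A-Za-z]+\n?(?:-?\d+)?[ ]?")

# RTF writes non-ASCII characters as \'xx, where xx is a hex byte in the
# document's code page. The pattern above leaves these escapes untouched.
HEX_ESCAPE = re.compile(r"\\'([0-9a-fA-F]{2})")


def rtf_to_text(rtf, codepage="cp1254"):
    """Strip RTF markup and decode \\'xx escapes (hypothetical helper)."""
    # Step 1: replace all markup with the empty string.
    text = RTF_MARKUP.sub("", rtf)
    # Step 2: turn each \'xx escape into the character it encodes.
    text = HEX_ESCAPE.sub(
        lambda m: bytes.fromhex(m.group(1)).decode(codepage, errors="replace"),
        text,
    )
    return text.strip()


# The item from the original question, copied as a raw string.
sample = r"""{\rtf1\ansi\ansicpg1254\deff0{\fonttbl{\f0\fnil\fcharset162 Microsoft Sans Serif;}}
\viewkind4\uc1\pard\lang1055\f0\fs17 6 ayd\'fdr sol kol a\'f0. Boyun a\'f0 az.
G\'fc\'e7s\'fczl\'fck ve a\'f0 dan \'e7ok uyu\'feukluk var. NPBY. Belki C7-8 hipoaljezi.
Torasik \'e7\'fdk\'fd\'fe gibi de\'f0il. Miyofasial a\'f0 gibi. \'d6neriler.+\par \par \par \par \par }"""

print(rtf_to_text(sample))
```

Markup is stripped before the escapes are decoded so that a decoded character can never be mistaken for part of a control word. Note that `\par` is simply dropped here; if paragraph breaks matter for the text mining step, it could be replaced with a newline before the markup is stripped.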
    Since text mining is a rather complex topic, it may be a good idea to take a closer look at some introductory videos. A video that shows how to classify texts on different topics can be found here:
    http://rapidminerresources.com/index.php?page=text-mining-3
    In addition, Neil McGuigan has produced a great series of videos on RapidMiner and text mining, which are available via his blog. The first one of the series is here:
    http://vancouverdata.blogspot.de/2010/11/text-analytics-with-rapidminer-loading.html

    Cheers,
    Helge
    User: "duygu"
    New Altair Community Member
    OP
    Yes, I already tried "Data to Documents", but I had never thought about regular expressions. I'm going to try it now.

    Yeah, Neil McGuigan's site is really helpful :D

    Thank you!
    User: "duygu"
    New Altair Community Member
    OP
    I couldn't do it with a regular expression because I couldn't decide what to replace with the regular expression. So I tried to copy my document into a file (with the WriteDocument operator), but now I can't see all the content in the document. I can only see a few lines, even though the document is 27 MB.