"Regarding Text Mining"

Question

Hi,

I have a text document. How can I delete the content between two special characters (for example, my document contains #something#)? I want to delete the special characters as well. I tried TextCleaner, but there we have to specify exactly the content we want to delete, so I think this will not work for a huge amount of data. Are there any operators available in RM for this?

Thanks,
Maria

maria_godric · Answer

Thanks Sebastian.

It worked fine. But I would like to get the edited text in the same format as the original data, i.e. I need to save it in .txt format.

Thanks,
Maria
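
For readers who want the cleaned text back as a plain .txt file, here is a minimal sketch in Python, outside of RapidMiner. It is not the RM operator setup discussed in this thread. The function name `clean_file`, the file names, and the pattern `#[^#]*#` are assumptions made for illustration; the pattern assumes the markers are single `#` characters that always come in pairs.

```python
import re
import tempfile
from pathlib import Path

# Assumed pattern: a '#', any run of non-'#' characters, then the closing '#'.
MARKED = re.compile(r"#[^#]*#")

def clean_file(src: Path, dst: Path, encoding: str = "utf-8") -> None:
    """Read src, drop every #...# span (markers included), write the result to dst."""
    text = src.read_text(encoding=encoding)
    dst.write_text(MARKED.sub("", text), encoding=encoding)

# Small demo in a temporary directory; the file names are hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "original.txt"
    target = Path(tmp) / "cleaned.txt"
    source.write_text("first line #something# end\nsecond line\n", encoding="utf-8")
    clean_file(source, target)
    print(target.read_text(encoding="utf-8"))
```

Writing to a separate target file keeps the original untouched, which is useful for checking the result before overwriting anything.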

land · Answer

Hi,

you might add a TokenReplace operator before the Tokenizer during TextProcessing and then use regular expressions to capture whatever you want. Here's an example process setup:

For more information about regular expressions, you could visit Wikipedia at http://en.wikipedia.org/wiki/Regular_expression, and for trying something without executing the process, you could use the online form at http://en.wikipedia.org/wiki/Regular_expression.

Greetings,
Sebastian
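
To illustrate the kind of regular expression that could be used for this, here is a small sketch in Python rather than an RM process. The pattern `#[^#]*#` and the helper name `strip_marked` are assumptions for illustration, not taken from the original process setup. It assumes each marked span starts and ends with a single `#` and contains no `#` in between.

```python
import re

# '#'      opening marker
# '[^#]*'  any run of characters that are not '#' (this also spans line breaks)
# '#'      closing marker
MARKED = re.compile(r"#[^#]*#")

def strip_marked(text: str) -> str:
    """Remove every #...# span, including the two '#' characters."""
    return MARKED.sub("", text)

sample = "keep this #something# and this"
print(strip_marked(sample))
```

A lone `#` without a partner is left in place, because the pattern only matches complete pairs. The whitespace around a removed span is kept, so a follow-up step may be needed if double spaces are a problem.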