"Regarding Text Mining"
maria_godric
New Altair Community Member
Hi,
I have a text document. How can I delete the content between two special characters (for example, my document contains #something#)? I want to delete the special characters as well. I tried TextCleaner, but it requires listing the exact content to delete, so I don't think it will work for a large amount of data. Are there any operators available in RM for this?
Thanks,
Maria
Answers
Hi,
You might add a TokenReplace operator before the tokenizer during text processing and then use a regular expression to capture whatever you want to remove.
For more information about regular expressions, you could visit Wikipedia at http://en.wikipedia.org/wiki/Regular_expression, and to try something without executing the process, you could use the online form at http://en.wikipedia.org/wiki/Regular_expression.
Here's an example process setup:
<operator name="Root" class="Process" expanded="yes">
<operator name="TextInput" class="TextInput" expanded="yes">
<list key="texts">
</list>
<list key="namespaces">
</list>
<operator name="TokenReplace" class="TokenReplace">
<list key="replace_dictionary">
<parameter key="#[^#]*#" value=" "/>
</list>
</operator>
<operator name="StringTokenizer" class="StringTokenizer">
</operator>
</operator>
</operator>
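If you want to check what the pattern does before running the process, here is a minimal sketch in Python. Python is only an assumption for quick testing and is not part of the process above. The function name strip_hash_spans is made up for illustration. The pattern and the single-space replacement are the same as in the replace_dictionary entry.

```python
import re

# Same pattern as the replace_dictionary key in the process above:
# a '#', then any run of characters that are not '#', then the closing '#'.
HASH_SPAN = re.compile(r"#[^#]*#")


def strip_hash_spans(text: str, replacement: str = " ") -> str:
    """Replace every #...# span, delimiters included, with `replacement`.

    Hypothetical helper for illustration only; it mirrors the regex
    replacement configured in the process but is not RapidMiner code.
    """
    return HASH_SPAN.sub(replacement, text)


if __name__ == "__main__":
    sample = "keep this #something# and this #else# too"
    print(strip_hash_spans(sample))
```

Note that [^#] also matches line breaks, so a span that starts on one line and ends on another is removed as well, while a single unmatched # is left untouched.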
Greetings,
Sebastian
Thanks, Sebastian.
It worked fine. However, I would like to get the edited text in the same format as the original data, i.e. I need to save it as a .txt file.
Thanks,
Maria