"Regarding Text Mining"

User: "maria_godric"
New Altair Community Member
Updated by Jocelyn
Hi,

I have a text document. How can I delete the content between two special characters (for example, my document contains #something#)? I want to delete the special characters as well. I tried TextCleaner, but it requires listing exactly the content to delete, so I don't think it will work for a large amount of data. Is there an operator available in RM for this?

Thanks,
Maria

    User: "land"
    New Altair Community Member
    Hi,
    you might add a TokenReplace operator before the tokenizer during text processing and then use regular expressions to capture whatever you want to remove.

    Here's an example process setup:
    <operator name="Root" class="Process" expanded="yes">
        <operator name="TextInput" class="TextInput" expanded="yes">
            <list key="texts">
            </list>
            <list key="namespaces">
            </list>
            <operator name="TokenReplace" class="TokenReplace">
                <list key="replace_dictionary">
                  <parameter key="#[^#]*#" value=" "/>
                </list>
            </operator>
            <operator name="StringTokenizer" class="StringTokenizer">
            </operator>
        </operator>
    </operator>
    For more information about regular expressions, you could visit Wikipedia at http://en.wikipedia.org/wiki/Regular_expression, and to try an expression without executing the process, you could use the online form at http://en.wikipedia.org/wiki/Regular_expression.
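
To see what the pattern `#[^#]*#` from the replace dictionary matches, here is a minimal sketch in Python rather than RapidMiner. It assumes Python's `re` module treats this simple pattern the same way the operator's regex engine does; the function name `strip_delimited` and the sample text are made up for illustration.

```python
import re

# Same pattern as the replace_dictionary key in the process above:
# a '#', then any run of characters that are not '#', then the closing '#'.
DELIMITED = re.compile(r"#[^#]*#")


def strip_delimited(text: str, replacement: str = " ") -> str:
    """Replace every #...# span, delimiters included, with `replacement`."""
    return DELIMITED.sub(replacement, text)


# Hypothetical sample text, not taken from the original document.
sample = "my document contains #something# and #more# text"
print(strip_delimited(sample))
```

Because `[^#]*` cannot cross a `#`, each match stops at the nearest closing delimiter, so several marked spans on one line are removed separately. A single unpaired `#` is left untouched.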

    Greetings,
      Sebastian
    User: "maria_godric"
    New Altair Community Member
    OP
    Thanks, Sebastian.

    It worked fine. However, I would like to get the edited text in the same format as the original data, i.e. I need to save it in .txt format.

    Thanks,
    Maria