Text Processing with Document type (how to use modified output?)

User: "colo"
New Altair Community Member
Updated by Jocelyn
Hello everybody,

I want to do some text processing with the Document type. In a simple example, I use "Read Document" to load a previously crawled and stored web page (an HTML file). The content should then be filtered and inspected with some regular expressions. To begin with, I just added the "Keep document parts" operator to discard everything except the <body>...</body> part. The Document output shows the desired modified content in the upper window. This is the part I need for further text processing, but some operators seem to always work on the original document. For example, a subsequent "Extract information" with the regex "<head>" still finds a match, even though the head section should have been discarded. Conversely, content that only becomes available through filtering and transformation (left out of the simple example above) is never found. "Write Document" also writes out the original text, ignoring all changes made to the Document in my operator chain.
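To make the expected behavior concrete, here is a minimal Python sketch of what I would expect the chain to do. This is not RapidMiner code; the helper names keep_body and extract_first and the sample HTML are made up purely for illustration.

```python
import re

def keep_body(html: str) -> str:
    """Return only the <body>...</body> part, or an empty string if there is none."""
    match = re.search(r"<body\b[^>]*>.*?</body>", html, flags=re.DOTALL | re.IGNORECASE)
    return match.group(0) if match else ""

def extract_first(text: str, pattern: str):
    """Return the first regex match in text, or None if nothing matches."""
    match = re.search(pattern, text, flags=re.IGNORECASE)
    return match.group(0) if match else None

# Hypothetical stand-in for the stored web page.
original = "<html><head><title>t</title></head><body><p>Hello</p></body></html>"

# Step 1: discard everything but the body (what "Keep document parts" should do).
modified = keep_body(original)

# Step 2: later steps should see the modified text, not the original.
print(extract_first(original, r"<head>"))  # matches in the original page
print(extract_first(modified, r"<head>"))  # should find nothing after filtering
```

In other words, every step after the filter should receive the filtered text as its input, which is the behavior I am not seeing in my process.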

This leads to my simple but important question: how do I make the following operators work with the modified document?

Thanks in advance!
