"[SOLVED] Tokenize - Generate n-grams and Filters"

MarcosRL
MarcosRL New Altair Community Member
edited November 5 in Community Q&A
Hello friends of the community. A question:
I need to perform the following procedure:
1) Read a text document
2) Tokenize it
3) Generate compound words (n-grams)
4) Delete all compound words that are not in a given list.
I was able to tokenize and generate the compound words,
and with the filter operator "Text: Filter Tokens (by Content)" I added a compound word to the "string" parameter and filtered on it.
The problem is that I do not know how to add more than one word, in order to filter several compound words at once.

Thank you very much in advance
Regards

Answers

  • MariusHelf
    MariusHelf New Altair Community Member
    Hey,

    you marked this topic as solved - the community would be grateful if you posted your solution :)

    TIA

    ~Marius
  • MarcosRL
    MarcosRL New Altair Community Member
    It's a secret, I can not tell  :D  ;D

    The solution was to use "Filter Tokens (by Content)" with the parameter "condition" = "matches" and create a regular expression containing all the words you want to filter, in the following format:
    word1|word2|wordN
    That is, the words separated by the alternation operator "|", with no spaces around it.
    It took me four hours to find this solution.
    Regards
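The "matches" condition applies the regular expression to each whole token, keeping only tokens that match. The effect can be sketched in Python's re module (the token list, the pattern words, and the underscore as n-gram separator are illustrative assumptions, not taken from the thread):

```python
import re

# Alternation pattern in the word1|word2|wordN format described above.
# These compound words are made-up examples.
pattern = re.compile(r"data_mining|text_mining|machine_learning")

# Hypothetical token stream after n-gram generation.
tokens = ["data_mining", "hello", "text_mining", "mining"]

# fullmatch mirrors a "matches" condition: the whole token must match,
# so the plain token "mining" is dropped even though it is a substring.
kept = [t for t in tokens if pattern.fullmatch(t)]
print(kept)
```

Note that "matches" tests the entire token, so partial matches such as "mining" inside "data_mining" do not pass; that is why no anchors or wildcards are needed around each word.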
  • MariusHelf
    MariusHelf New Altair Community Member
    Thank you!