"[SOLVED] Tokenize - Generate n-grams and Filters"
MarcosRL
Hello friends of the community, I have a question.
I need to perform the following procedure:
1) Read a text document
2) Tokenize it
3) Generate compound words (n-grams)
4) Delete all compound words that are not in a given list.
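For readers who want to see steps 2 and 3 outside the tool, here is a minimal Python sketch. It is only an illustration, not the operators' actual implementation: the function names `tokenize` and `generate_ngrams`, the `max_length` parameter, and the choice to join the words of a compound with "_" are all assumptions made for this example.

```python
import re


def tokenize(text):
    # Lowercase the text and keep runs of letters only (digits and punctuation are dropped).
    return re.findall(r"[^\W\d_]+", text.lower())


def generate_ngrams(tokens, max_length=2):
    # Keep the single tokens and append every n-gram up to max_length.
    # Joining with "_" is an assumption made for this sketch.
    result = list(tokens)
    for n in range(2, max_length + 1):
        for i in range(len(tokens) - n + 1):
            result.append("_".join(tokens[i:i + n]))
    return result


tokens = tokenize("Text mining with regular expressions")
terms = generate_ngrams(tokens, max_length=2)
print(terms)
```

The resulting list contains both the single words and the compounds, so step 4 still has to remove everything that is not on the wanted list.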
I was able to tokenize and generate the compound words.
In the "Text: Filter Tokens (by Content)" operator I entered a compound word in the "string" parameter, and it filters correctly.
The problem is that I do not know how to add more than one word in order to filter on several compounds.
Thank you very much in advance.
Regards
MariusHelf
Hey,
you marked this topic as solved - the community would be grateful if you posted your solution
TIA
~Marius
MarcosRL
It's a secret, I can not tell
;D
The solution was to set the "condition" parameter of the "Filter Tokens (by Content)" operator to "matches" and enter a "regular expression" listing all the words you want to filter for, in the following format:
word1|word2|wordN
That is, separate the words with the "|" character (regex alternation rather than a wildcard) and do not put spaces around it.
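The same idea can be tried in plain Python. This is only a sketch: the helper names `build_keep_pattern` and `filter_tokens` and the sample tokens are hypothetical, and it assumes that "matches" means the whole token must match the expression, which `fullmatch` imitates here.

```python
import re


def build_keep_pattern(words):
    # Escape each word so it is matched literally, then join with "|" and no spaces.
    return re.compile("|".join(re.escape(w) for w in words))


def filter_tokens(tokens, pattern):
    # Keep only tokens that match the pattern in full (assumed "matches" semantics).
    return [t for t in tokens if pattern.fullmatch(t)]


keep = build_keep_pattern(["text_mining", "regular_expressions"])
tokens = ["text", "text_mining", "mining_with", "regular_expressions"]
kept = filter_tokens(tokens, keep)
print(kept)

# With spaces around "|", the spaces become part of each alternative,
# so a token without spaces no longer matches.
spaced = re.compile("text_mining | regular_expressions")
print(spaced.fullmatch("text_mining"))
```

The second pattern shows why the spaces matter: its alternatives are "text_mining " and " regular_expressions", each including a space.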
It took me four hours to find this solution
Regards
MariusHelf
Thank you!