[SOLVED] Problem with tokenize
jose
Hello!
My question is this: as I understand it, the Tokenize operator divides sentences into words. Is there some way to divide the sentences into pairs of words, instead of the single words that Tokenize usually produces?
text_miner
Hi Jose,
Are you asking if you can have terms of more than one word/token? If so, the answer is yes. After you tokenize, use the Generate n-Grams (Terms) operator. This will generate phrases of n sequential tokens. Note: you will still have the single terms in your term-by-document matrix too. For example, generating 2-grams you would have "heart", "attack", and "heart attack" in the matrix.
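To illustrate what the 2-gram output looks like, here is a minimal Python sketch of the same idea. It is not RapidMiner's implementation: the tokenizer (lowercase, split on non-letters) and the space used to join tokens are assumptions chosen to mirror the "heart attack" example. It shows that the single terms stay in each document's term counts alongside the generated pairs.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase and split on any run of non-letter characters.
    # This is a simplification, not RapidMiner's exact Tokenize behavior.
    return [t for t in re.split(r"[^a-z]+", text.lower()) if t]


def generate_ngrams(tokens: list[str], max_n: int = 2) -> list[str]:
    # Keep the original single tokens, then append every run of
    # 2..max_n consecutive tokens, joined with a space (an assumption).
    terms = list(tokens)
    for n in range(2, max_n + 1):
        for i in range(len(tokens) - n + 1):
            terms.append(" ".join(tokens[i:i + n]))
    return terms


def term_by_document(docs: list[str], max_n: int = 2) -> list[Counter]:
    # One Counter per document, mapping each term to its occurrence count.
    return [Counter(generate_ngrams(tokenize(d), max_n)) for d in docs]


if __name__ == "__main__":
    matrix = term_by_document(["He had a heart attack."])
    print(sorted(matrix[0]))
```

Running it on "He had a heart attack." yields "heart", "attack" and "heart attack" (plus the other words and pairs), which is why the single terms still appear in the matrix.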
jose
ok, perfect, thanks
km
How can I keep only the 2-grams and not the 1-grams, e.g. "Heart Attack" but not "Heart" and "Attack" in the matrix?
MartinLiebig
Hi,
I think there is no way to prevent the operator from generating the single terms in the table. However, you can use a clever regex in Select Attributes to simply remove them.
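The regex idea can be sketched in Python on a term-by-document matrix stored as a dict of column name to values. This is only an illustration of the filtering logic, not the Select Attributes operator itself. The pattern assumes n-gram terms are tokens joined by a single space, as in the "heart attack" example; if your n-grams use a different separator, the pattern (hypothetical `NGRAM_PATTERN`) must be adjusted.

```python
import re

# Hypothetical pattern: a column is an n-gram (n >= 2) if its whole name is
# non-space tokens joined by single spaces. Assumes a space separator.
NGRAM_PATTERN = re.compile(r"\S+(?: \S+)+")


def keep_only_ngrams(columns: list[str]) -> list[str]:
    # fullmatch requires at least one separator, so single words are dropped.
    return [c for c in columns if NGRAM_PATTERN.fullmatch(c)]


def filter_matrix(matrix: dict[str, list[int]]) -> dict[str, list[int]]:
    # Keep only the columns whose names are n-grams; 1-gram columns are removed.
    return {name: values for name, values in matrix.items()
            if NGRAM_PATTERN.fullmatch(name)}


if __name__ == "__main__":
    print(keep_only_ngrams(["heart", "attack", "heart attack"]))
```

Applied to the columns "heart", "attack" and "heart attack", only "heart attack" is kept.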
~Martin