Split text tokens where words have been concatenated
mob
I have text tokens like
stylesexploration
expressionresearch
technologypractice
curriculaimprovisationsurvey
where the punctuation and/or spaces are missing in the original text.
Besides keeping a list of replacements (for example, replacing "expressionresearch" with the two tokens "expression" and "research"), is there a smarter way to handle this situation?
Tags: AI Studio, Text Mining + NLP, Split
MartinLiebig
Whatever you do, you need a list of the words that can occur inside the tokens, i.e. some kind of dictionary.
Then you can work with Generate Attributes functions such as contains or find to locate those words.
~Martin
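To illustrate the dictionary idea outside of AI Studio, here is a minimal Python sketch of dictionary-based splitting. It is only one possible approach, not an AI Studio operator: the `VOCAB` set and the function name `split_concatenated` are made up for this example, and a real word list would have to be much larger. The function picks the split with the fewest dictionary words and returns the token unchanged when no full split exists.

```python
# Hypothetical vocabulary for illustration; in practice load a real word list.
VOCAB = {
    "style", "styles", "exploration", "expression", "research",
    "technology", "practice", "curricula", "improvisation", "survey",
}


def split_concatenated(token, vocab=VOCAB):
    """Split a token into dictionary words, preferring the fewest words.

    Returns [token] unchanged if the token cannot be fully covered
    by words from the vocabulary.
    """
    if not token or not vocab:
        return [token]

    n = len(token)
    max_len = max(len(word) for word in vocab)

    # best[i] is the shortest list of words spelling token[:i], or None.
    best = [None] * (n + 1)
    best[0] = []

    for end in range(1, n + 1):
        # Only look back as far as the longest vocabulary word.
        for start in range(max(0, end - max_len), end):
            piece = token[start:end]
            if best[start] is not None and piece in vocab:
                candidate = best[start] + [piece]
                if best[end] is None or len(candidate) < len(best[end]):
                    best[end] = candidate

    return best[n] if best[n] is not None else [token]


if __name__ == "__main__":
    for t in ["stylesexploration", "expressionresearch",
              "technologypractice", "curriculaimprovisationsurvey"]:
        print(t, "->", split_concatenated(t))
```

Preferring the fewest words is a simple heuristic for ambiguous splits; with a larger dictionary, weighting candidates by word frequency would probably give better results.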
JEdward
One approach might be to try a word tokenizer built for non-English text, such as Jieba, a Chinese word segmenter (link below). You can then provide it with your own dictionary of words to split by. Hope that helps.
https://github.com/fxsjy/jieba