Define splitting characters for tokenizer?

New Altair Community Member

Aug 12, 2009

Updated Nov 5, 2024 by Jocelyn

Hi!
I was playing around with the text plugin because it seemed to be the easiest way to run SVMs on the data I am working with, and the examples already seem quite useful. However, the StringTokenizer does too much splitting for my files: for example, it splits "get_file" at "_", "c:\windows" at "\", and so on.
Is there a way to tell it to split only on blank spaces, only on newlines, etc.? I tried writing my own tokenizer, but sadly the provided one only calls edu.udo.cs.wvtool.generic.tokenizer.StringTokenizer, which comes from a library...
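
To make the desired behaviour concrete, here is a minimal sketch in plain Java of splitting on a caller-chosen set of delimiter characters. It uses only the JDK's java.util.StringTokenizer, not the wvtool class mentioned above, and the class and method names (WhitespaceTokenizerSketch, tokenize) are made up for illustration. How such logic would be plugged into the text plugin is not shown here and would depend on the plugin's own tokenizer interface.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical helper, not part of the text plugin or wvtool.
public class WhitespaceTokenizerSketch {

    // Splits text only at the characters listed in 'delimiters'.
    // Any other character (for example '_' or '\') stays inside the token.
    public static List<String> tokenize(String text, String delimiters) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(text, delimiters);
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "call get_file on c:\\windows\nthen exit";

        // Split on blanks, tabs and line breaks only.
        System.out.println(tokenize(text, " \t\r\n"));

        // Split on newlines only, so each line is one token.
        System.out.println(tokenize(text, "\n"));
    }
}
```

With a delimiter set of " \t\r\n", "get_file" and "c:\windows" each remain a single token. Passing "\n" alone yields one token per line.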
