Japanese Tokenizing

Question

Hi.

I'm a newbie at RapidMiner.

I'm trying to mine some web pages with "GetPage", "Extract Content" and "Process Documents".
It seems to work well for English pages, but for Japanese pages the tokenizer doesn't work well.

Is Japanese tokenization not supported?
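
To illustrate the likely cause, here is a minimal Python sketch. It assumes the tokenizer splits text wherever a non-letter character (such as a space) appears. That is an assumption about the behaviour described above, not RapidMiner's actual code, and `split_on_non_letters` is a hypothetical helper. Japanese is written without spaces between words and its characters count as letters, so a whole sentence comes back as a single token.

```python
def split_on_non_letters(text):
    """Hypothetical tokenizer: keep runs of letters, split on everything else."""
    tokens, current = [], []
    for ch in text:
        if ch.isalpha():          # kanji and kana are letters too
            current.append(ch)
        elif current:             # a non-letter ends the current token
            tokens.append("".join(current))
            current = []
    if current:
        tokens.append("".join(current))
    return tokens


english = "I am a student"
japanese = "私は学生です"  # "I am a student", written without spaces

print(split_on_non_letters(english))   # ['I', 'am', 'a', 'student']
print(split_on_non_letters(japanese))  # ['私は学生です']
```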

turutosiya · Answer

Hi All.

It has taken a really long time to get this project started, but at last I have time to try.

I'm looking for documentation describing the API spec for the Tokenizer.
Does anyone know where to find it?

I'm trying to implement a JapaneseTokenizer that works with a morphological analysis engine such as ChaSen or MeCab.
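
As a rough sketch of that idea (in Python, not against the RapidMiner Tokenizer API, which is not described in this thread), the tokenizer can delegate to a pluggable backend. `mecab_tokenize` assumes the third-party `mecab-python3` package and a dictionary are installed. `naive_tokenize` is a hypothetical stand-in that only splits where the script changes (kanji, hiragana, katakana); it is not morphological analysis and is included only so the example runs without MeCab.

```python
import unicodedata
from itertools import groupby


def script_of(ch):
    """Coarse script class, derived from the Unicode character name."""
    name = unicodedata.name(ch, "")
    if name.startswith("HIRAGANA"):
        return "hiragana"
    if name.startswith("KATAKANA"):
        return "katakana"
    if name.startswith("CJK UNIFIED IDEOGRAPH"):
        return "kanji"
    if ch.isalnum():
        return "alnum"
    return "other"  # spaces, punctuation, symbols


def naive_tokenize(text):
    """Crude fallback: one token per run of same-script characters."""
    return ["".join(run) for script, run in groupby(text, key=script_of)
            if script != "other"]


def mecab_tokenize(text):
    """Morphological segmentation via MeCab (assumes mecab-python3 is installed)."""
    import MeCab  # imported lazily so the sketch loads without MeCab
    tagger = MeCab.Tagger("-Owakati")  # wakati mode: space-separated surface forms
    return tagger.parse(text).split()


def tokenize(text, backend=naive_tokenize):
    """Hypothetical entry point: swap the backend without touching callers."""
    return backend(text)


print(tokenize("私は学生です"))  # ['私', 'は', '学生', 'です']
```

The fallback happens to segment this sentence sensibly, but it breaks on ordinary words that mix scripts: 食べる ("to eat") becomes 食 and べる. That is why a dictionary-based engine is the realistic choice for the backend, e.g. `tokenize(text, backend=mecab_tokenize)`.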

land · Answer

Hi Karl,
You are very welcome if you can come up with a good algorithm for Japanese tokenization!

With kind regards,
  Sebastian

karlrb · Answer

If I can be of any help, I would be happy to look into any specific questions on this subject. My wife is Japanese and I'm in the process of learning Japanese, which is amazingly complex.

Karl Bergerson
Seattle WA USA
karl.bergerson@gmail.com