tokenizing by sentences and learning algorithms

lavramu

Hi,

Thanks for the help so far. I have another question, and I'm sorry to bother you again.

Most of the tutorials and examples I have seen so far on text classification with machine learning in RapidMiner use word vectors: the text files are tokenized into words before any learning algo is run. My problem needs sentences, not words. For this I use tokenize, select linguistic sentences, and then try to run the learning algo. So the text files contain sentences and are tokenized into sentences, not words.

Will this work similarly? How is it different? I know that Perl's Naive Bayes allows this.

Second question: what is the minimum amount of data needed for an algo to be able to learn?

Third question (more important): is there a difference between these two?
1) read in the text files (with the appropriate class) --> tokenize by sentence --> learning algo
2) read in the text files --> tokenize --> write to disk so that each file holds one sentence --> read each of these files --> learning algo

(Basically, I am trying to understand whether tokenize ensures that the learning algo takes the input in sentence by sentence.)
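
To make the two options concrete, here is a minimal plain-Python sketch (not RapidMiner). It assumes that tokenizing inside a document only changes that document's features, so option 1 gives one example per file with each sentence as a feature, while option 2 gives one example per sentence. The function names `split_sentences`, `pipeline_one` and `pipeline_two`, the naive regex splitter, and the sample texts are all made up for illustration.

```python
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def pipeline_one(docs):
    # Option 1: one example per document; each sentence is a feature of that document.
    return [(Counter(split_sentences(text)), label) for text, label in docs]


def pipeline_two(docs):
    # Option 2: one example per sentence; each sentence inherits its file's class.
    return [
        (Counter([sentence]), label)
        for text, label in docs
        for sentence in split_sentences(text)
    ]


# Hypothetical labelled documents.
docs = [
    ("I like it. It works well.", "pos"),
    ("It broke. I want a refund.", "neg"),
]

examples_one = pipeline_one(docs)
examples_two = pipeline_two(docs)

print(len(examples_one))  # number of training examples under option 1
print(len(examples_two))  # number of training examples under option 2
```

Under these assumptions the learner sees a different number of training examples in each case, and that is the difference I am asking about.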

I didn't want to start a new thread, so I posted this here. Please help, thanks!