
User: "mauricenew"
New Altair Community Member
Updated by Jocelyn
I am running a naive Bayes classification, in the simplest way I could find on the internet. The results are... well... weird.

My training data looks like this: 2 columns, column1 = a combination of terms/words, column2 = the category of that combination.

Example: column1 => "where to buy a mercedes" column2 => "mercedes"
Example: column1 => "whats the newest mercedes model" column2 => "mercedes"

So basically I am categorizing into "brands" of cars, let's say.

My dataset which should be classified obviously only has 1 column, with combinations of terms/words.

What's the best way to optimize this, or to achieve it at all?


    User: "kayman"
    New Altair Community Member
    Are you tokenizing your dataset (splitting by word, setting cases, stripping stopwords, etc.) or are you doing your classification on the full sentence?

    What needs to be done is to follow a text-processing workflow as described before, using the "Process Documents from Data" operator, and to ensure your string attribute is of type text (not the default nominal). Create a vector set using TF-IDF (or another scheme) with this operator and use the output to train your model. 

    Results can be improved further by tweaking the settings (e.g. increasing or decreasing the pruning) or adding extra steps to your tokenizing workflow. 
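    To illustrate what those tokenizing steps do (this is a plain-Python sketch, not the RapidMiner operator; the stopword list here is a made-up minimal example):

    ```python
    import re

    # Tiny illustrative stopword list -- a real list would be much longer.
    STOPWORDS = {"a", "the", "to", "whats", "where"}

    def tokenize(sentence: str) -> list[str]:
        # Split by word: lower-case everything and keep only letter runs.
        tokens = re.findall(r"[a-z]+", sentence.lower())
        # Strip stopwords so only content-bearing words remain.
        return [t for t in tokens if t not in STOPWORDS]

    print(tokenize("Where to buy a Mercedes"))  # ['buy', 'mercedes']
    ```

    Classifying on these tokens instead of the raw sentence is what lets the model see that two differently worded queries share the word "mercedes".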
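    The TF-IDF vectors that "Process Documents from Data" builds can be sketched in plain Python like this (a rough illustration with a hypothetical three-document corpus, not the operator's exact implementation):

    ```python
    import math
    from collections import Counter

    # Hypothetical already-tokenized corpus standing in for the training column.
    docs = [
        ["buy", "mercedes"],
        ["newest", "mercedes", "model"],
        ["buy", "bmw"],
    ]

    def tf_idf(doc, corpus):
        """Term frequency times inverse document frequency for one document."""
        n = len(corpus)
        counts = Counter(doc)
        vec = {}
        for term, c in counts.items():
            tf = c / len(doc)                              # how often in this doc
            df = sum(1 for d in corpus if term in d)       # how many docs contain it
            vec[term] = tf * math.log(n / df)              # rare terms weigh more
        return vec

    vec = tf_idf(docs[0], docs)
    ```

    Words that appear in every document get weight zero, which is why pruning very common and very rare terms changes the results.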

    Hope this helps! 
    User: "mauricenew"
    New Altair Community Member
    OP
    Updated by mauricenew
    Do I have to tokenize both, the training data and my dataset (which should be predicted)?

    So far I do this:

    Training data -> "Nominal to Text" -> "Process Documents from Data" (inside there is a Tokenize operator) -> "Set Role" -> "Naive Bayes" -> "Apply Model"
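    The key point of that chain, sketched in plain Python (an illustrative multinomial Naive Bayes with made-up training rows, not the RapidMiner operator): the scoring data must pass through the exact same tokenization as the training data, otherwise the word features won't line up.

    ```python
    import math
    from collections import Counter, defaultdict

    def tokenize(s):
        # Same tokenizer is used for training AND for the data to be predicted.
        return s.lower().split()

    # Hypothetical labeled training rows (column1 => column2).
    train = [
        ("where to buy a mercedes", "mercedes"),
        ("whats the newest mercedes model", "mercedes"),
        ("where to buy a bmw", "bmw"),
        ("newest bmw model", "bmw"),
    ]

    # "Naive Bayes" step: per-class token counts.
    counts = defaultdict(Counter)
    for text, label in train:
        counts[label].update(tokenize(text))
    vocab = {t for c in counts.values() for t in c}

    # "Apply Model" step: score an unlabeled row (uniform class prior assumed,
    # Laplace smoothing so unseen words don't zero out the probability).
    def predict(text):
        best, best_lp = None, float("-inf")
        for label, c in counts.items():
            total = sum(c.values())
            lp = sum(
                math.log((c[t] + 1) / (total + len(vocab)))
                for t in tokenize(text)
            )
            if lp > best_lp:
                best, best_lp = label, lp
        return best

    print(predict("newest mercedes"))  # 'mercedes'
    ```

    In the RapidMiner workflow the same alignment is usually achieved by feeding the scoring data through "Process Documents from Data" too, reusing the word list produced from the training data.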

    PS: Thanks already for your input!