[SOLVED] Really basic question, I think I'm applying models wrong.

New Altair Community Member

Jan 22, 2012

Updated Nov 5, 2024 by Jocelyn

My first Read Database operator gets all of the values from the documents (20k).

My second Read Database operator (1k documents) has a value isGood = 1 if the value is good, -2 if the value is bad, and a bunch of other really bad ideas. I set isGood as the label. Should I actually only be passing true/false, or is an integer okay?

I use Nominal to Text to get the "data" field as text.

I then process the documents, looking for word frequencies etc.

Is my Naive Bayes even in the right place?

My end goal is to feed it 1000 known good documents and have it find very similar documents from the first Read Database. I want my confidence score to be based on document similarity.
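To make that goal concrete, here is a minimal sketch in plain Python (not RapidMiner) of one way a similarity-based score could work: build term-frequency vectors, average the known good documents into a centroid, and rank candidates by cosine similarity to it. The function names, document IDs, and sample texts are all made up for illustration, and this is a different technique from Naive Bayes, not a description of what any RapidMiner operator does.

```python
import math
from collections import Counter

def tf_vector(text):
    """Lowercased whitespace tokens -> term-frequency Counter."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank_by_similarity(known_good, candidates):
    """Score each candidate against the summed vector of known good docs."""
    centroid = Counter()
    for text in known_good:
        centroid.update(tf_vector(text))
    scores = [(doc_id, cosine(tf_vector(text), centroid))
              for doc_id, text in candidates.items()]
    # Most similar documents first.
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Hypothetical data standing in for the two database reads.
known_good = ["cheap flights to paris", "paris flights and hotels"]
candidates = {
    101: "flights to paris this summer",
    102: "quarterly tax report",
    103: "paris hotels",
}

ranking = rank_by_similarity(known_good, candidates)
for doc_id, score in ranking:
    print(doc_id, round(score, 3))
```

The score here is a pure similarity measure between 0 and 1, so it only needs the known good documents; it does not use the bad examples at all, which is the main difference from a classifier's confidence.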

I am getting an output that contains confidence, but I'm not sure how to present it. I don't come from a statistical background, so I'm learning on my feet. I appreciate I have a lot to learn, so in 3 weeks' time I'm going to read some books/content about how to use RapidMiner and ML in general. I can only apologize for my ignorance!

TL;DR:
Can I use an integer as a label?
Am I using Naive Bayes and Apply Model correctly?
How can I view my data in an easy-to-interpret way? Ideally something like a list of document IDs with their confidence rating.
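As a rough illustration of the kind of output asked for above, here is a small plain-Python sketch (not RapidMiner, and not its Naive Bayes operator) of a multinomial Naive Bayes classifier with add-one smoothing. It treats the integer labels 1 and -2 purely as category names, which is how a classifier needs to see them, and prints each document ID with its confidence for the "good" class. All function names, IDs, and texts are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels):
    """Count documents per class and words per class."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab

def confidences(model, text):
    """Return {label: posterior probability} for one document."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    log_scores = {}
    for label, n_docs in class_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)  # class prior
        for tok in text.lower().split():
            if tok in vocab:  # words never seen in training are skipped
                count = word_counts[label][tok]
                score += math.log((count + 1) / (total_words + len(vocab)))
        log_scores[label] = score
    # Normalise in log space so the confidences sum to 1.
    top = max(log_scores.values())
    exp_scores = {l: math.exp(s - top) for l, s in log_scores.items()}
    z = sum(exp_scores.values())
    return {l: v / z for l, v in exp_scores.items()}

# Hypothetical labelled set: 1 = good, -2 = bad (values are just names).
train_docs = ["cheap flights to paris", "paris hotels and flights",
              "quarterly tax report", "annual tax filing"]
train_labels = [1, 1, -2, -2]
model = train_nb(train_docs, train_labels)

# Hypothetical unlabelled documents keyed by document ID.
unlabelled = {101: "flights to paris", 102: "tax report due"}
results = sorted(
    ((doc_id, confidences(model, text)[1]) for doc_id, text in unlabelled.items()),
    key=lambda pair: pair[1], reverse=True)

for doc_id, conf_good in results:
    print(doc_id, round(conf_good, 3))
```

Note that this confidence is the model's estimated probability of the "good" class given both good and bad training examples; it is not a direct measure of document similarity.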

Thanks guys!
