[SOLVED] Really basic question, I think I'm applying models wrong.

User: "johnyma22"
Updated by Jocelyn

My first Read Database operator gets all of the values from the documents (20k).

My second Read Database operator (1k documents) has a value isGood, which is 1 if the value is good, -2 if the value is bad, plus a bunch of other really bad ideas. I set isGood as the label. Should I actually only be passing true/false, or is an integer okay?
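To make the label question concrete, here is a minimal Python sketch (not RapidMiner code) of turning the integer codes into a two-class nominal label. The mapping of 1 to "good" and -2 to "bad" comes from the description above; the sample rows and the choice to drop any other code as unlabeled are assumptions for illustration.

```python
from typing import Optional

# Codes described in the post: 1 = good, -2 = bad.
# Assumption: any other code is treated as unlabeled (None) rather than guessed.
CODE_TO_LABEL = {1: "good", -2: "bad"}


def to_nominal_label(is_good: int) -> Optional[str]:
    """Map an integer isGood code to a nominal class name, or None if unknown."""
    return CODE_TO_LABEL.get(is_good)


# Hypothetical rows standing in for the 1k labeled documents.
rows = [
    {"id": 1, "isGood": 1},
    {"id": 2, "isGood": -2},
    {"id": 3, "isGood": 0},  # one of the "other" codes, dropped below
]

# Keep only rows whose code maps to a known class.
labeled = [
    {**row, "label": to_nominal_label(row["isGood"])}
    for row in rows
    if to_nominal_label(row["isGood"]) is not None
]
print(labeled)
```

A classifier predicts a class, so a nominal label is the natural fit. In RapidMiner terms, my understanding is that Naive Bayes expects a nominal label, so converting the integer column (for example with Numerical to Polynominal) before setting the label role is the usual route; treat that as an assumption to check against the operator documentation.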

I use Nominal to Text to get the "data" field as text.

I then process the document, looking for word frequencies etc.
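For reference, the word-frequency step can be pictured roughly like this in plain Python. The tokenizer here (lowercase, keep runs of letters) is an assumption chosen for brevity, not a description of what the RapidMiner text-processing operators actually do.

```python
import re
from collections import Counter


def term_frequencies(text: str) -> Counter:
    """Count how often each lowercase word occurs in a document."""
    # Assumption: a token is any run of ASCII letters.
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(tokens)


# Hypothetical document text.
tf = term_frequencies("Good docs are good. Bad docs are not.")
print(tf.most_common(3))
```

Each document becomes a bag of word counts, and those counts are the attributes the learner sees.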

Is my Naive Bayes even in the right place?

My end goal is that I feed it 1000 known good documents and it can find very similar documents from the first Read Database. I want my confidence score to be based on document similarity.
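A score that is literally based on document similarity could look like the sketch below: cosine similarity between each candidate and the average word-count vector (centroid) of the known good documents. This is a different technique from Naive Bayes confidence, shown only to illustrate the goal. The tokenizer, the sample texts, and the document IDs are all made up.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Bag-of-words vector: lowercase word -> count (assumed tokenizer)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def centroid(vectors):
    """Average of several bag-of-words vectors."""
    if not vectors:
        return {}
    total = Counter()
    for vec in vectors:
        total.update(vec)
    return {term: count / len(vectors) for term, count in total.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical stand-ins for the 1000 known good documents.
known_good = ["cheap flights to paris", "cheap hotels in paris"]
# Hypothetical stand-ins for the 20k unlabeled documents.
candidates = {
    "doc1": "cheap flights and hotels in paris",
    "doc2": "naive bayes classifier tutorial",
}

good_centroid = centroid([vectorize(text) for text in known_good])
scores = {
    doc_id: cosine(vectorize(text), good_centroid)
    for doc_id, text in candidates.items()
}
print(scores)
```

With only positive examples, a similarity score like this needs no "bad" class at all, whereas a classifier such as Naive Bayes needs examples of both classes to produce a meaningful confidence.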

I am getting an output that contains confidence, but I'm not sure how to present it. I don't come from a statistical background, so I'm learning on my feet. I appreciate I have a lot to learn, so in 3 weeks' time I'm going to read some books/content about how to use RapidMiner and ML in general. I can only apologize for my ignorance!

TL;DR:
Can I use an integer as a label?
Am I using Naive Bayes and Apply Model correctly?
How can I view my data in an easy-to-interpret way? Ideally something like a list of document IDs with their confidence rating.
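The kind of output described in the last question, a list of document IDs with their confidence, could be produced along these lines once the scored rows are exported. The IDs, the confidence values, and the `threshold` parameter are hypothetical.

```python
# Hypothetical (document ID, confidence) pairs exported from a scored data set.
predictions = [("doc-17", 0.42), ("doc-03", 0.91), ("doc-58", 0.07)]


def rank_by_confidence(preds, threshold=0.0):
    """Keep rows at or above the threshold, highest confidence first."""
    kept = [(doc_id, conf) for doc_id, conf in preds if conf >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Print a two-column list: document ID, then confidence.
for doc_id, conf in rank_by_confidence(predictions, threshold=0.1):
    print(f"{doc_id}\t{conf:.2f}")
```

Sorting by confidence and cutting at a threshold keeps the result readable even with 20k documents.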

Thanks guys!
