Predicting Performance with Random Forest

Question

I am working on a university project, and it is my first time using RapidMiner.
I am trying to predict whether people will get vaccinated, so that letters are not sent to people who will not get vaccinated, which would reduce mailing costs.

I have a large data set with more than 400 attributes, so I need to evaluate the attributes and remove the useless ones. I used the operators Random Forest, Apply Model and Performance (Classification), but when I check the performance results, I always get 0% class recall for one class and 100% for the other. I tried another model that I have often seen online, k-NN, and it does not have this problem, so I assume the problem lies with the Random Forest.
Does anyone know why the model always predicts the same value?

BalazsBaranyRM · Answer

Hi,

you should really watch a few introductory videos on validation to understand what happens here.
https://academy.rapidminer.com/learn/video/introduction-to-model-validation
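Validation means measuring performance on rows the model was not trained on. As a generic illustration (plain Python, not how RapidMiner's own validation operators are implemented), stratified k-fold cross-validation splits the rows into k folds that each keep the overall class ratio, then uses each fold once as the test set while training on the rest. The labels below are hypothetical, and a real setup would shuffle the rows before splitting.

```python
from collections import defaultdict

def stratified_k_fold(labels, k):
    """Yield (train_idx, test_idx) pairs, keeping the class ratio in every fold."""
    # Group row indices by their class label.
    by_label = defaultdict(list)
    for i, y in enumerate(labels):
        by_label[y].append(i)
    # Deal each class's indices round-robin into k folds.
    folds = [[] for _ in range(k)]
    for indices in by_label.values():
        for pos, i in enumerate(indices):
            folds[pos % k].append(i)
    # Each fold serves once as the test set; all other rows form the training set.
    for f in range(k):
        test = sorted(folds[f])
        test_set = set(test)
        train = [i for i in range(len(labels)) if i not in test_set]
        yield train, test

# Hypothetical labels: 90 "no" and 10 "yes".
labels = ["no"] * 90 + ["yes"] * 10
for train, test in stratified_k_fold(labels, 5):
    n_yes = sum(labels[i] == "yes" for i in test)
    print(len(train), len(test), n_yes)  # 80 20 2 for every fold
```

Stratifying matters when one class is rare: an unstratified split could leave a test fold with almost no "yes" rows, making its class recall meaningless.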

Also, think about optimizing your model. Look at the model output. Did the random forest create trees that don't have any decisions in them? Or overly complex ones? You could have underfitting (no reasonable trees) or overfitting (overly complex trees that memorize the training data instead of learning the rules).
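A forest whose trees contain no decisions returns the same label for every row, and that is exactly what produces 100% recall for one class and 0% for the other. The minimal sketch below (plain Python, with a hypothetical 90/10 split between "no" and "yes") computes per-class recall the way it is usually defined: correct predictions of a class divided by the number of actual examples of that class.

```python
from collections import Counter

def class_recall(actual, predicted):
    """Return {label: recall}, where recall = correct predictions / actual count."""
    totals = Counter(actual)
    hits = Counter(a for a, p in zip(actual, predicted) if a == p)
    # Counter returns 0 for labels that were never predicted correctly.
    return {label: hits[label] / totals[label] for label in totals}

# Hypothetical data: 90 "no" and 10 "yes", and a model that always answers "no".
actual = ["no"] * 90 + ["yes"] * 10
predicted = ["no"] * 100

recalls = class_recall(actual, predicted)
accuracy = sum(a == p for a, p in zip(actual, predicted)) / len(actual)
print(recalls)   # {'no': 1.0, 'yes': 0.0}
print(accuracy)  # 0.9
```

Accuracy still looks respectable at 90%, which is why class recall is the more informative figure when one class dominates the data.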

For a data set with many attributes, Naive Bayes and Support Vector Machines can be helpful.
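Naive Bayes copes well with many attributes because it estimates each attribute's distribution per class independently, so training cost grows linearly with the number of attributes. The sketch below is a generic categorical Naive Bayes with add-one (Laplace) smoothing in plain Python, not RapidMiner's own implementation; the attribute values and class labels are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_nb(rows, labels):
    """Fit a categorical Naive Bayes model by counting values per (attribute, class)."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attribute index, label) -> value counts
    values_per_attr = defaultdict(set)   # attribute index -> distinct values seen
    for row, y in zip(rows, labels):
        for j, v in enumerate(row):
            value_counts[(j, y)][v] += 1
            values_per_attr[j].add(v)
    return class_counts, value_counts, values_per_attr

def predict_nb(model, row):
    """Return the class with the highest log posterior for one row."""
    class_counts, value_counts, values_per_attr = model
    n = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for y, cy in class_counts.items():
        score = math.log(cy / n)  # log prior P(y)
        for j, v in enumerate(row):
            k = len(values_per_attr[j])
            # Laplace-smoothed log likelihood P(value | y)
            score += math.log((value_counts[(j, y)][v] + 1) / (cy + k))
        if score > best_score:
            best_label, best_score = y, score
    return best_label

# Hypothetical rows: (age group, vaccinated last year)
rows = [("65+", True), ("65+", True), ("<65", False),
        ("<65", False), ("65+", False), ("<65", True)]
labels = ["vaccinated", "vaccinated", "not_vaccinated",
          "not_vaccinated", "not_vaccinated", "vaccinated"]

model = train_nb(rows, labels)
print(predict_nb(model, ("65+", True)))   # vaccinated
print(predict_nb(model, ("<65", False)))  # not_vaccinated
```

Working in log space avoids numeric underflow when hundreds of small probabilities are multiplied, which matters with 400+ attributes.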

You could also try AutoModel and RapidMiner Go, which automatically determine the best modeling algorithm for your data.

Regards,
Balázs