Got 100% accuracy, precision and recall
fatimidveil
New Altair Community Member
Hi everyone, my data set consists of 1150 entities, and one attribute is highly correlated with my class attribute. I got 100% accuracy, precision and recall with my algorithm.
My decision tree also uses only one attribute, the one that is highly correlated with my class attribute.
What should I do now?
I have applied three algorithms to my data set: ID3, CART and C4.5.
How do I work out which one performs better than the others?
Best Answer
-
Great. You can also look at the general relation between the highly correlated attribute and the outcome variable. If that relationship is acceptable in your domain, then you are fine. What do I mean by relationship? For example, suppose you have a data set of sports results and you are trying to predict a win or a loss for a team. In this dataset there is a predictor column named "Winning percentage" with values ranging from 0 to 100. Let's assume that the output "Outcome" attribute is labeled based on this winning percentage column (if winning percentage >= 50 then win, and if winning percentage < 50 then loss). In this case the algorithm can predict with very high accuracy, because there is a direct relation between "Winning percentage" and "Outcome". These sorts of general checks can be performed whenever we see very high accuracy together with a highly correlated attribute.
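That check can be sketched in Python with scikit-learn (a hedged illustration, not from the original thread: the data is synthetic and the column name winning_pct is assumed; the point is only to show that a label derived from a predictor produces near-perfect scores):

```python
# Minimal sketch of target leakage: the label is derived directly from
# "winning_pct", so a decision tree recovers the rule and scores near 100%.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(42)
winning_pct = rng.uniform(0, 100, size=1150)   # the leaky predictor
noise = rng.normal(size=(1150, 3))             # unrelated filler features
X = np.column_stack([winning_pct, noise])
y = (winning_pct >= 50).astype(int)            # label defined FROM the predictor

scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=10)
print(f"mean accuracy: {scores.mean():.3f}")   # close to 1.0 -> leakage, not skill
```

If you see accuracy like this, the fix is usually to drop the leaky attribute, not to celebrate the model.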
Answers
-
Hi @fatimidveil,
Have you tried submitting your data to Auto Model?
It can be a good starting point...
Regards,
Lionel
-
No, I am working on my thesis and I collected the data myself.
-
Hello @fatimidveil
You can try the Auto Model option in RapidMiner, as mentioned by @lionelderkrikor. If you want to build the model yourself, use cross-validation with 5 folds and see how the model performances vary. Within cross-validation you can also use feature selection and an optimal hyperparameter search for better model performance.
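To make the comparison concrete, here is a hedged Python/scikit-learn sketch of 5-fold cross-validation across tree learners. Note that scikit-learn does not implement ID3 or C4.5 directly; the entropy and Gini criteria of DecisionTreeClassifier stand in as assumptions, and the breast-cancer dataset is just a placeholder for your own data:

```python
# Compare tree learners with 5-fold cross-validation and report mean accuracy.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)  # stand-in for your 1150-row data set

models = {
    "entropy tree (ID3/C4.5-like)": DecisionTreeClassifier(criterion="entropy", random_state=0),
    "gini tree (CART-like)": DecisionTreeClassifier(criterion="gini", random_state=0),
}

results = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    results[name] = scores.mean()
    print(f"{name}: mean accuracy {scores.mean():.3f} (+/- {scores.std():.3f})")
```

The learner with the higher mean accuracy (and a small standard deviation across folds) is the better performer on that data.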
There is nothing wrong with having a single attribute in the tree. One reason for this is pruning, which removes attributes that don't provide much information.
-
Yes, I performed cross-validation with 10 folds and also applied feature selection techniques such as weight by information gain, Gini index, chi-square statistic, and weight by information gain ratio.
-
I ran Auto Model as well and got the same tree as before, with the one attribute.
-
Thank you so much for your support. My variable is not that highly correlated; if I discard it, my tree seems fine to me. In fact my tree is built a