Classification

Madcap New Altair Community Member
edited November 5 in Community Q&A
Hi all,
When classification has been explained to me before, it has been described in two steps:
First, a learning phase, which takes the data set and puts it through a classification algorithm to produce a model (e.g. a set of rules).
Second, a classification phase, where test data is used to measure the accuracy of the model.

When I use a classification method such as a decision tree in RapidMiner, does it do both of these steps in one component? Or am I doing the second phase when I use cross-validation etc.?

Thanks for any help and please correct any flaws in my logic.
-Madcap

Best Answer

  • varunm1
    varunm1 New Altair Community Member
    edited February 2019 Answer ✓
    Hi @Madcap

    When you use the decision tree algorithm (operator) on its own, it only trains a model. Testing happens once you connect that model to the Apply Model operator together with the testing data.
    In cross-validation (CV), you just connect your data set to the CV operator, which automatically splits the data into training and testing parts based on the number of folds provided (default 10 in RM). You place the algorithm in the training section and the Apply Model and Performance operators in the testing section; the operator then tests the model and gives you the performance metrics. So both training and testing are done inside CV, and your understanding of CV is correct.

    You can post your process XML here so that we can check whether there is an issue.
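
To make the two phases concrete outside of RapidMiner, here is a minimal Python sketch of k-fold cross-validation. It is not RapidMiner's implementation: the helper names (`train_stump`, `apply_model`, `make_folds`, `cross_validate`), the toy data, and the one-threshold "stump" learner standing in for a decision tree are all assumptions made for illustration. The point is only that each fold runs a learning phase on the training rows and then a testing phase on the held-out rows.

```python
import random

def train_stump(train_rows):
    """Learning phase: choose the threshold that best separates the training rows.

    Assumes one numeric feature and labels 0/1, with class 1 on the high side.
    """
    xs = sorted({x for x, _ in train_rows})
    # Candidate thresholds are midpoints between consecutive feature values.
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])] or [xs[0]]

    def train_accuracy(t):
        return sum((x > t) == bool(y) for x, y in train_rows) / len(train_rows)

    return max(candidates, key=train_accuracy)

def apply_model(threshold, x):
    """Apply the learned rule to a single unseen value."""
    return 1 if x > threshold else 0

def accuracy(threshold, test_rows):
    """Testing phase: fraction of held-out rows predicted correctly."""
    hits = sum(apply_model(threshold, x) == y for x, y in test_rows)
    return hits / len(test_rows)

def make_folds(n_rows, k, seed=0):
    """Shuffle the row indices and deal them into k disjoint folds."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def cross_validate(rows, k=10, seed=0):
    """Train on k-1 folds, test on the remaining fold, and average the scores."""
    scores = []
    for test_idx in make_folds(len(rows), k, seed):
        held_out = set(test_idx)
        train_rows = [r for i, r in enumerate(rows) if i not in held_out]
        test_rows = [rows[i] for i in test_idx]
        model = train_stump(train_rows)              # learning phase
        scores.append(accuracy(model, test_rows))    # testing phase
    return sum(scores) / len(scores)

# Toy, cleanly separable data: low values are class 0, high values are class 1.
rows = [(x, 0) for x in range(20)] + [(x, 1) for x in range(30, 50)]
mean_accuracy = cross_validate(rows, k=10)
print(mean_accuracy)
```

Every row is used for testing exactly once and for training in the other folds, which is why a single cross-validation step covers both phases from the question.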
