Recombine predicted records with original dataset
How do I recombine the predicted data with the original data? I want to do this so that I can check the misclassifications. I extracted new features from the original ones, and now I need to refer back to the original features that were filtered out during training.
Answers
Your question is a bit unclear. If you post your process, it may be easier to diagnose what you are doing.
If you are using cross-validation to build your model, you should already have the combined dataset you are describing available from the "test" output for review. It contains your predictions, the confidences, the original labels, and all the attributes used for scoring. You just need to connect it from the inner process on the "Testing" side so that it is passed out and you can output it.
If you split your dataset earlier (for whatever reason) and need to recombine it, you can do that with the "Join" operator, as long as both parts share an index attribute to join on.
I often face the same issue, and this process should give you a general idea of how to achieve it:
Just create an ID (if there isn't one already) and multiply the initial dataset before doing any modelling work. At the end, do an inner join (using the ID as the key) of the initial dataset and the labeled dataset that comes out of the validated and tested model. This way you get all the old features back in the new labeled dataset.
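In RapidMiner this is built as a visual process, so the following is only a Python illustration of the same ID-and-join idea, not RapidMiner code. It assumes rows are plain dictionaries; the attribute names, the sample rows and `toy_classifier` are hypothetical placeholders for your own data and trained model.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]

def add_ids(rows: List[Row]) -> List[Row]:
    """Give every row a unique "id" before any modelling work."""
    return [{"id": i, **row} for i, row in enumerate(rows, start=1)]

def select_features(rows: List[Row], features: List[str]) -> List[Row]:
    """Keep only the ID, the label and the features used for training."""
    return [{k: row[k] for k in ["id", "label", *features]} for row in rows]

def toy_classifier(row: Row) -> str:
    """Hypothetical stand-in for a trained model: long messages are 'spam'."""
    return "spam" if row["length"] > 10 else "ham"

def inner_join(left: List[Row], right: List[Row], key: str = "id") -> List[Row]:
    """Merge rows that share the same key; rows without a match are dropped."""
    by_key = {row[key]: row for row in right}
    return [{**row, **by_key[row[key]]} for row in left if row[key] in by_key]

# 1. Create IDs, then keep a separate copy of the full dataset (the "multiply" step).
original = add_ids([
    {"text": "hi", "sender": "alice", "length": 2, "label": "ham"},
    {"text": "buy now!!", "sender": "shop", "length": 9, "label": "spam"},
    {"text": "meeting moved to 3pm", "sender": "bob", "length": 20, "label": "ham"},
])
full_copy = [dict(row) for row in original]

# 2. Filter the features used for modelling and score the reduced rows.
reduced = select_features(original, ["length"])
scored = [{**row, "prediction": toy_classifier(row)} for row in reduced]

# 3. Inner-join on "id" to bring the filtered-out attributes back next to the predictions.
recombined = inner_join(full_copy, scored)
misclassified = [row for row in recombined if row["prediction"] != row["label"]]
for row in misclassified:
    print(row["id"], row["sender"], row["text"], row["label"], "->", row["prediction"])
```

Because this is an inner join, any row that never made it into the scored set also disappears from `recombined`; joining with the full copy as the left side of a left join would keep those rows instead.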