Impute Missing Values Error

dasoxori New Altair Community Member
edited November 5 in Community Q&A
I am trying to fill in the missing values in the dataset using the Impute Missing Values operator.

After adding the operator to the process and connecting it to the dataset, I get the information about the data shown in the image below.



When I open the operator, before adding k-NN, the information shown says there are no missing values, as you can see in the image below.




When I add k-NN, I get the following error:




And if I connect the exa port directly to the mod port, I get the logical error "Wrong Connection".

Any idea?

Thank you

Best Answer

  • CKönig
    CKönig New Altair Community Member
    Answer ✓
    Hi @dasoxori,
    The discrepancy between the whole dataset and the dataset on the input port of the inner subprocess can be partly explained by the default setting of "learn on complete cases".

    The operator "Impute Missing Values" essentially builds a prediction model to predict the missing values. By default, the option "learn on complete cases" makes sure that no examples with any missing value are fed into the training subprocess, since some machine learning algorithms cannot handle missing values. So seeing no missing values on the inside is expected as long as that option is activated. If you deactivate it, the missing values should be shown again. Still, the metadata is probably not completely correct: the total number of examples it reports is not accurate when the examples with missing values are excluded.

    The follow-up error "Example set is empty" is most likely a result of the same parameter setting: your dataset seems to include a very large number of missing values. Is it possible that there is no "complete" example (row) in your dataset? In that case, all of the examples get discarded and there is no data left for training the impute model.

    Kind regards,
    Christian
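
The effect of "learn on complete cases" described above can be illustrated with a minimal sketch in plain Python. This is not RapidMiner code and does not reproduce the operator's implementation; the attribute names and values are made up for illustration, and missing values are represented as None. It only shows why the inner subprocess sees no missing values, and why the training set can end up empty.

```python
# Toy data: None marks a missing value. Attribute names are hypothetical.
rows = [
    {"age": 34, "income": None, "city": "Athens"},
    {"age": None, "income": 52000, "city": "Patras"},
    {"age": 41, "income": 48000, "city": None},
]


def complete_cases(examples):
    """Keep only examples that have no missing value in any attribute."""
    return [ex for ex in examples if all(v is not None for v in ex.values())]


def count_missing(examples):
    """Count missing values across all examples and attributes."""
    return sum(v is None for ex in examples for v in ex.values())


# Every row has at least one missing value, so nothing survives the filter.
training = complete_cases(rows)
print(count_missing(rows))  # 3
print(len(training))        # 0 -> comparable to "Example set is empty"

# With one complete row added, the training data has no missing values left.
with_complete = rows + [{"age": 29, "income": 39000, "city": "Volos"}]
training_ok = complete_cases(with_complete)
print(len(training_ok))            # 1
print(count_missing(training_ok))  # 0
```

If a check like this on the real data shows that no row is complete, deactivating "learn on complete cases" is the setting to try first, as suggested above, keeping in mind that the learner inside the subprocess then receives missing values.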

Answers

  • Hi @dasoxori,

    What you're viewing here is the metadata rather than the data itself, and in some cases the metadata does not keep up with the data. There may be a problem with your dataset. Could you right-click on your k-NN, add a Breakpoint Before to view the data, and report back?

    Best,
    Roland