Performance of Impute Missing Values
HeikoPaulheim
Hi,
just by chance, I found out that the Impute Missing Values operator trains a model for every attribute. As far as I understand, it would be perfectly sufficient to train a model only for those attributes that actually contain missing values, and the result would be 100% identical. This tweak could improve the operator's performance by a large factor in many cases.
Best,
Heiko
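The proposal above can be illustrated with a minimal sketch. It is not the operator's actual implementation: the per-attribute "model" is a stand-in mean predictor, and the names `impute`, `train_mean_model`, and `only_where_needed` are hypothetical. It only shows that skipping attributes without missing values leaves the output unchanged while training fewer models, assuming the models are built and applied on the same data.

```python
from statistics import mean

MISSING = None  # hypothetical marker for a missing value


def train_mean_model(rows, target):
    """Stand-in learner: predicts the mean of the observed target values."""
    observed = [r[target] for r in rows if r[target] is not MISSING]
    value = mean(observed)
    return lambda row: value


def impute(rows, only_where_needed):
    """Impute missing values; return (imputed rows, number of models trained)."""
    attributes = list(rows[0].keys())
    models_trained = 0
    result = [dict(r) for r in rows]
    for attr in attributes:
        has_missing = any(r[attr] is MISSING for r in rows)
        if only_where_needed and not has_missing:
            continue  # no value of this attribute would be replaced anyway
        model = train_mean_model(rows, attr)
        models_trained += 1
        for original, out in zip(rows, result):
            if original[attr] is MISSING:
                out[attr] = model(original)
    return result, models_trained


data = [
    {"a": 1.0, "b": 10.0, "c": 5.0},
    {"a": 2.0, "b": None, "c": 6.0},
    {"a": 3.0, "b": 30.0, "c": 7.0},
]
full, n_full = impute(data, only_where_needed=False)
lazy, n_lazy = impute(data, only_where_needed=True)
```

In this toy data only `b` has a missing value, so the selective variant trains one model instead of three and produces the same rows.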
RalfKlinkenberg
Hi Heiko,
yes, this would accelerate the operator. However, please consider the following: while some attributes may not have missing values during the training phase, they might have missing values during the deployment phase. The operator in its current implementation can handle that, whereas the accelerated version could not impute missing values of such attributes in the deployment phase and hence would do only an incomplete job. Since the new data occurring during deployment is not known in advance, you cannot be sure that certain attributes will not have missing values in the future. You therefore need value prediction models for all attributes if you want a robust implementation of this operator.
If you would like to apply the missing value imputation only to a subset of the attributes, you can combine it with an attribute selection operator and re-join the other attributes later.
Best wishes,
Ralf
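The select-then-re-join workaround can be sketched roughly as follows. This is plain Python rather than an actual process with operators, the imputation step is a stand-in that fills in the column mean, and `impute_with_mean` and `impute_subset` are hypothetical names chosen for illustration.

```python
from statistics import mean


def impute_with_mean(rows):
    """Stand-in for the imputation step: fill None with the column mean."""
    out = [dict(r) for r in rows]
    for attr in rows[0]:
        observed = [r[attr] for r in rows if r[attr] is not None]
        if len(observed) == len(rows):
            continue  # nothing missing in this attribute
        fill = mean(observed)
        for r in out:
            if r[attr] is None:
                r[attr] = fill
    return out


def impute_subset(rows, selected):
    """Impute only the selected attributes, then re-join the untouched ones."""
    subset = [{k: r[k] for k in selected} for r in rows]
    rest = [{k: v for k, v in r.items() if k not in selected} for r in rows]
    imputed = impute_with_mean(subset)
    # Re-join row by row; row order is preserved, so no explicit ID is needed.
    return [{**i, **o} for i, o in zip(imputed, rest)]


data = [
    {"id": 1, "x": 1.0, "y": None},
    {"id": 2, "x": None, "y": 4.0},
    {"id": 3, "x": 3.0, "y": 8.0},
]
joined = impute_subset(data, selected=["x"])
```

Here only `x` is imputed; the missing value in `y` is deliberately left as it is because `y` was not selected.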
HeikoPaulheim
Hi Ralf,
this is an interesting argument. However, if the operator inspected the attributes on the fly and decided whether or not they contain missing values, it should still work. The models seem to be built at the very moment the operator is applied, so I would have a model for every attribute I need. Am I missing anything here?
The matter would be different, of course, if I trained an imputation model on training data in order to apply it to test data later on. In that case, however, I would expect the Impute Missing Values operator to provide a preprocessing model output.
Best,
Heiko
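The distinction discussed in this thread, building models on the fly versus fitting on training data and applying to new data later, can be made concrete with a small sketch. It assumes a hypothetical `ImputationModel` class with separate `fit` and `apply` steps and again uses a mean predictor as a stand-in for the real per-attribute learner; none of these names come from the actual operator.

```python
from statistics import mean


class ImputationModel:
    """Hypothetical preprocessing model: one stand-in mean 'model' per attribute."""

    def __init__(self, all_attributes):
        # True: fit a model for every attribute (robust at deployment time).
        # False: fit only for attributes with missing values in the training data.
        self.all_attributes = all_attributes
        self.fill_values = {}

    def fit(self, rows):
        for attr in rows[0]:
            has_missing = any(r[attr] is None for r in rows)
            if self.all_attributes or has_missing:
                observed = [r[attr] for r in rows if r[attr] is not None]
                self.fill_values[attr] = mean(observed)
        return self

    def apply(self, rows):
        # A missing value is replaced only if a model exists for its attribute;
        # otherwise it stays missing.
        return [
            {k: (self.fill_values.get(k) if v is None else v) for k, v in r.items()}
            for r in rows
        ]


train = [{"a": 1.0, "b": None}, {"a": 3.0, "b": 4.0}]
new_data = [{"a": None, "b": None}]  # 'a' is missing only at deployment time

robust = ImputationModel(all_attributes=True).fit(train).apply(new_data)
lean = ImputationModel(all_attributes=False).fit(train).apply(new_data)
```

With a separate apply step, the lean variant leaves `a` missing in the new data because no model was fitted for it, which is the situation Ralf describes. When fitting and applying happen on the same data in one step, that case cannot arise, which is Heiko's point.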