If there's an example-set instance with a nominal attribute, you just have to get its mapping to obtain the set of all possible values. No calculation is necessary: the key-set of the mapping can be used directly as the values for the meta-data.
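To make the point concrete, here is a minimal sketch of the idea. Note that `NominalMappingSketch`, `AttributeMetaDataSketch` and `MetaDataFromMapping` are hypothetical stand-ins written for illustration only; they are not the actual RapidMiner classes, and the real API may look different. The sketch only assumes that a reader fills a value-to-index mapping while loading, so the value set for the meta-data is already available as the mapping's key-set.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a nominal attribute's mapping (NOT the RapidMiner API).
class NominalMappingSketch {
    // Maps each nominal value to its internal index, in order of first appearance.
    private final Map<String, Integer> valueToIndex = new LinkedHashMap<>();

    // Returns the index of a value, registering it first if it has not been seen yet.
    // A data reader would call this once per cell while loading.
    int mapString(String value) {
        Integer existing = valueToIndex.get(value);
        if (existing != null) {
            return existing;
        }
        int index = valueToIndex.size();
        valueToIndex.put(value, index);
        return index;
    }

    // The key-set already is the set of all known values; no extra pass over the data.
    Set<String> values() {
        return new LinkedHashSet<>(valueToIndex.keySet());
    }
}

// Hypothetical stand-in for the meta-data of a single attribute.
class AttributeMetaDataSketch {
    final String name;
    final Set<String> valueSet;

    AttributeMetaDataSketch(String name, Set<String> valueSet) {
        this.name = name;
        this.valueSet = valueSet;
    }
}

class MetaDataFromMapping {
    // Builds the meta-data for a nominal attribute straight from its mapping.
    static AttributeMetaDataSketch describe(String attributeName, NominalMappingSketch mapping) {
        return new AttributeMetaDataSketch(attributeName, mapping.values());
    }

    public static void main(String[] args) {
        NominalMappingSketch mapping = new NominalMappingSketch();
        // Simulate a reader loading four rows of a nominal column.
        for (String v : new String[] {"sunny", "rain", "sunny", "overcast"}) {
            mapping.mapString(v);
        }
        AttributeMetaDataSketch md = describe("outlook", mapping);
        System.out.println(md.name + " -> " + md.valueSet);
    }
}
```

Of course this only works once the data has actually been read, so it says nothing about what a reader can know before execution.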
If I cannot rely on the meta-data when implementing an operator (because I never know how the user has read her data), the only solution seems to be NOT to use meta-data at all.
Putting everything into a repository first looks like an ugly workaround to me, for two reasons:
1) RM doesn't enforce this.
2) Data always emerges outside of RM, so getting it into RM is a crucial step when doing data-mining. Every data-reader should therefore be implemented in such a way that the resulting example-set is fully working/compatible with any RM operator. For instance, if I have my data in a database, I want to read from it directly (that's why I have it) without first dumping the table into some special data-repository.
Currently there are first-class data-readers (the repository) and second-class readers with incorrect meta-data (AbstractReader subclasses), and this distinction is not clear to the user.
If I remember an earlier post of yours correctly, the idea of meta-data was to ease inter-operator communication. But as operators cannot rely on the meta-data, it seems better to ignore it, which is complicated because it still needs to be updated whenever the data is transformed. With a second pathway added (data, and now also meta-data), operator implementations seem to require much more effort.
Is it possible to disable meta-data processing completely, or do some operators rely on it?
I am not sure if you have ever worked with data sets containing several hundred million tuples?
works fine with directly loading from files or databases... The meta data is only for supporting GUI elements, checks, quick fixes etc. Nice things. But not necessary.