Text Mining - Name Collision with special and regular attributes
text_miner
New Altair Community Member
Hi,
Since RapidMiner requires all attribute names to be unique, I've noticed a potential naming conflict when doing text mining. If a special attribute with name X exists, then a regular attribute with the same name cannot also exist (or the regular attribute gets removed when the special attribute is created). For example, the special attributes "id" and "label" are relatively common terms that may also appear in text documents.
Is there any way to specify a prefix/postfix for all special attributes (e.g., metadata_ or specattr_) so name collisions are less likely to occur? If not, could something be added to the configuration options or on the root Process node to allow for this functionality?
Thanks!
Answers
Hi,
The Document processing operators make sure that no attribute name is used twice. If words like "label" or "id" occur in the text, they are assigned attribute names such as label_0 (or label_1 if label_0 already exists). This renaming is stored in the word list, so the attributes get the same names when the word list is applied later.
Greetings,
Sebastian
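
For readers who want to see the renaming idea concretely, here is a minimal Python sketch of the suffixing scheme described in the answer. It is not RapidMiner's actual implementation; the function names `unique_attribute_name` and `build_word_list` are hypothetical, and the sketch assumes the first free numeric suffix is always chosen.

```python
def unique_attribute_name(word, taken):
    """Return `word` if it is free, otherwise word_0, word_1, ... (first free suffix)."""
    if word not in taken:
        return word
    suffix = 0
    while f"{word}_{suffix}" in taken:
        suffix += 1
    return f"{word}_{suffix}"


def build_word_list(tokens, special_names):
    """Map each distinct token to an attribute name that collides with nothing.

    `special_names` are the names already used by special attributes
    (for example "id" and "label"). The returned dict plays the role of
    the word list: it remembers the renaming for later application.
    """
    taken = set(special_names)
    word_list = {}
    for token in tokens:
        if token not in word_list:
            name = unique_attribute_name(token, taken)
            word_list[token] = name
            taken.add(name)
    return word_list


# Build the mapping once on the training documents ...
word_list = build_word_list(["label", "id", "text", "label"], ["id", "label"])
print(word_list)  # {'label': 'label_0', 'id': 'id_0', 'text': 'text'}

# ... and reuse the same mapping when new documents are processed,
# so a given word always ends up in the same attribute.
new_doc = ["id", "text"]
print([word_list[t] for t in new_doc if t in word_list])  # ['id_0', 'text']
```

Keeping the mapping in one place is what makes the names stable: the suffix is decided once, when the word list is built, rather than recomputed for each new set of documents.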