[SOLVED] Splitting a nominal attribute that has no separator
Hello.
I have attributes whose values have the form:
AK234
A112
In other words, we have a set of alphabetical characters
followed by a set of digits. The question is: how can I split
the attribute into two attributes, one containing the
alphabetical characters and the other the numerical characters?
I have attempted to use the Split operator, but I seem to be
able to select only one of the parts, not both.
Is there any way I can do this with an operator?
TIA,
Hugo F.
All those attributes are marked regular and polynominal.
I think I am close to a solution.
My current attempt uses the expression:
[^a-z]+
so a value such as k100 will have the 100 highlighted.
This means I can generate attributes such as:
att_
att_g
att_k
att_l
att_s
etc.
This looks ok. For the second part I have:
[a-z]+
which results in attributes such as:
att_
att_100
att_985
etc.
I cannot seem to do the split in a single expression
(note that I was able to do this when I had values separated
by '/' in another attribute). I am now looking at how I can
use a "multiply" and then combine those attributes back into
a single example set. The problem is that I now have duplicate
attributes.
This seems way too convoluted. Is there an easier way to do this?
TIA
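The empty att_ name and the missing second part are what one would expect when the whole digit run (or letter run) is treated as the separator: the separator itself is discarded, so only the text on one side of it survives. A minimal sketch in Python, used here only as a stand-in for the regular-expression behaviour; RapidMiner's Split operator may treat empty pieces differently:

```python
import re

# Splitting on "anything that is not a lowercase letter" consumes the digits
# as the separator, so only the letter part (plus an empty piece) is left.
letters_side = re.split(r"[^a-z]+", "k100")

# Splitting on the letters instead consumes the letters and keeps the digits.
digits_side = re.split(r"[a-z]+", "k100")

print(letters_side)  # ['k', '']
print(digits_side)   # ['', '100']
```

Neither pattern can return both parts, because whichever part matches the pattern is thrown away as the separator.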
Hi,
In fact, the Split operator is useful if you have two values in an attribute that are separated by a fixed string. In your example with A100, AK200 or the like, there is no separator sequence. The case would be different if you had A_100, AK_200 etc. Since you don't have one, you should use Generate Attributes and create two new attributes with the following definitions:
a1: replaceAll(a, "[a-zA-Z]", "")
a2: replaceAll(a, "[^a-zA-Z]", "")
This assumes that your original attribute is called a. a1 then contains the digits and a2 the letters. The process below does what I described above.
Best regards,
Marius
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.005">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.3.005" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="generate_data_user_specification" compatibility="5.3.005" expanded="true" height="60" name="Generate Data by User Specification" width="90" x="45" y="30">
<list key="attribute_values">
<parameter key="a" value="&quot;AK100&quot;"/>
</list>
<list key="set_additional_roles"/>
</operator>
<operator activated="true" class="generate_attributes" compatibility="5.3.005" expanded="true" height="76" name="Generate Attributes" width="90" x="179" y="30">
<list key="function_descriptions">
<parameter key="a1" value="replaceAll(a, &quot;[a-zA-Z]&quot;, &quot;&quot;)"/>
<parameter key="a2" value="replaceAll(a, &quot;[^a-zA-Z]&quot;, &quot;&quot;)"/>
</list>
</operator>
<connect from_op="Generate Data by User Specification" from_port="output" to_op="Generate Attributes" to_port="example set input"/>
<connect from_op="Generate Attributes" from_port="example set output" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
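To check the two expressions outside RapidMiner, the same replace-based idea can be sketched in Python. This is only an illustration: split_value is a hypothetical helper name, Python's re.sub stands in for replaceAll, and the values are assumed to consist of letters followed by digits, as in the examples from the question.

```python
import re

def split_value(value):
    """Return (letters, digits) for values such as 'AK234' (hypothetical helper)."""
    digits = re.sub(r"[a-zA-Z]", "", value)    # same idea as a1: drop every letter
    letters = re.sub(r"[^a-zA-Z]", "", value)  # same idea as a2: drop every non-letter
    return letters, digits

for v in ["AK234", "A112"]:
    print(v, split_value(v))
```

Both parts come back as strings; turning the digit part into a number would be a separate step.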
I have just realized that even though the split via regexp
seems to work in the dialogue box that allows testing, the
output does _not_ split the attribute value.
Any help will be appreciated.
TIA