Function description

Thiru
New Altair Community Member
Dear all,
I have a data set in which the age of the subject is given as an attribute, and the values are given in
either months, years or weeks.
e.g.: 3 days, 8 weeks, 10 months
I want to convert that attribute into a number of days, so that I can group the rows by number of days. I was trying
to use the functions "finds" and "parse", but without success. Can someone help me with this? Thank you.
Regards,
Thiru
Best Answer
-
@Thiru, now I get it :-)
In this case 'contains' is what you need, in combination with the if statement. So if Age pet contains "year", take the number times 365; else if Age pet contains "month", take the number times 30; and so on. Something like this:
if(contains([Age pet],"year"),
    parse(replaceAll([Age pet],"\\D",""))*365,
    if(contains([Age pet],"month"),
        parse(replaceAll([Age pet],"\\D",""))*30,
        if(contains([Age pet],"week"),
            parse(replaceAll([Age pet],"\\D",""))*7,
            parse(replaceAll([Age pet],"\\D","")))))
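For anyone who wants to check the logic outside RapidMiner, here is a minimal Python sketch of the same nested if/contains idea. It is only an illustration: the function name `age_to_days` and the `UNIT_FACTORS` table are made up for this example, and the factors (365, 30, 7) are the same approximations as in the expression above. Like the expression, it assumes each value holds exactly one number and one unit.

```python
import re

# Checked in the same order as the nested if() above: year, month, week.
# Anything else (e.g. "3 days") falls through and is treated as days.
UNIT_FACTORS = (("year", 365), ("month", 30), ("week", 7))


def age_to_days(age: str) -> int:
    """Convert a value such as '8 weeks' to an approximate number of days."""
    # Same idea as replaceAll(..., "\\D", ""): drop every non-digit, then parse.
    number = int(re.sub(r"\D", "", age))
    for unit, factor in UNIT_FACTORS:
        if unit in age:  # plays the role of contains()
            return number * factor
    return number


for value in ("3 days", "8 weeks", "10 months", "6 years"):
    print(value, "->", age_to_days(value))
```

Note that a value with two numbers (say "1 year 2 months") would not work with this approach, because stripping the non-digits glues the digits together.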
Answers
-
It's in the Generate Attributes operator.
The idea is that you 'regenerate' your existing attribute: you use your existing attribute name, but generate new content for it.
The Generate Attributes operator contains all the search, replace, splice, trim and other functions you will need.
-
Yeah, it takes some getting used to.
Regular expressions are your friend here, but they can be frightening if you're not used to them.
Try something like the process below:
(start a new process, copy the XML, open View -> Show Panel -> XML -> paste -> click the green tick in the top corner to validate and store -> go back to the process window)
What it does is create a new field (you can also overwrite your existing field), use a regular expression to remove everything that's not a digit (using \D) and then parse the result.
For weeks you can safely multiply by 7; for months it's not so straightforward, so I just took an average of 30.
Finally I used the aggregation operator to sum them all up.
Note that in reality you can combine all of the above in a single expression using Generate Attributes, but it can become a bit unreadable then.
<?xml version="1.0" encoding="UTF-8"?>
<process version="9.6.000">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="9.6.000" expanded="true" name="Process">
    <parameter key="logverbosity" value="init"/>
    <parameter key="random_seed" value="2001"/>
    <parameter key="send_mail" value="never"/>
    <parameter key="notification_email" value=""/>
    <parameter key="process_duration_for_mail" value="30"/>
    <parameter key="encoding" value="UTF-8"/>
    <process expanded="true">
      <operator activated="true" class="utility:create_exampleset" compatibility="9.6.000" expanded="true" height="68" name="Create ExampleSet" width="90" x="112" y="34">
        <parameter key="generator_type" value="attribute functions"/>
        <parameter key="number_of_examples" value="1"/>
        <parameter key="use_stepsize" value="false"/>
        <list key="function_descriptions">
          <parameter key="MyDays" value="&quot;3 days&quot;"/>
          <parameter key="MyWeeks" value="&quot;8 weeks&quot;"/>
          <parameter key="MyMonths" value="&quot;10 months&quot;"/>
        </list>
        <parameter key="add_id_attribute" value="false"/>
        <list key="numeric_series_configuration"/>
        <list key="date_series_configuration"/>
        <list key="date_series_configuration (interval)"/>
        <parameter key="date_format" value="yyyy-MM-dd HH:mm:ss"/>
        <parameter key="time_zone" value="SYSTEM"/>
        <parameter key="column_separator" value=","/>
        <parameter key="parse_all_as_nominal" value="false"/>
        <parameter key="decimal_point_character" value="."/>
        <parameter key="trim_attribute_names" value="true"/>
      </operator>
      <operator activated="true" class="generate_attributes" compatibility="9.6.000" expanded="true" height="82" name="Generate Attributes" width="90" x="246" y="34">
        <list key="function_descriptions">
          <parameter key="MyParsedDays" value="parse(replaceAll([MyDays],&quot;\\D&quot;,&quot;&quot;))"/>
          <parameter key="MyParsedWeeks" value="parse(replaceAll([MyWeeks],&quot;\\D&quot;,&quot;&quot;))*7"/>
          <parameter key="MyParsedMonths" value="parse(replaceAll([MyMonths],&quot;\\D&quot;,&quot;&quot;))*30"/>
        </list>
        <parameter key="keep_all" value="true"/>
      </operator>
      <operator activated="true" class="generate_aggregation" compatibility="9.6.000" expanded="true" height="82" name="Generate Aggregation" width="90" x="380" y="34">
        <parameter key="attribute_name" value="TotalDays"/>
        <parameter key="attribute_filter_type" value="subset"/>
        <parameter key="attribute" value=""/>
        <parameter key="attributes" value="MyParsedDays|MyParsedMonths|MyParsedWeeks"/>
        <parameter key="use_except_expression" value="false"/>
        <parameter key="value_type" value="attribute_value"/>
        <parameter key="use_value_type_exception" value="false"/>
        <parameter key="except_value_type" value="time"/>
        <parameter key="block_type" value="attribute_block"/>
        <parameter key="use_block_type_exception" value="false"/>
        <parameter key="except_block_type" value="value_matrix_row_start"/>
        <parameter key="invert_selection" value="false"/>
        <parameter key="include_special_attributes" value="false"/>
        <parameter key="aggregation_function" value="sum"/>
        <parameter key="concatenation_separator" value="|"/>
        <parameter key="keep_all" value="true"/>
        <parameter key="ignore_missings" value="true"/>
        <parameter key="ignore_missing_attributes" value="false"/>
      </operator>
      <connect from_op="Create ExampleSet" from_port="output" to_op="Generate Attributes" to_port="example set input"/>
      <connect from_op="Generate Attributes" from_port="example set output" to_op="Generate Aggregation" to_port="example set input"/>
      <connect from_op="Generate Aggregation" from_port="example set output" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
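As a rough cross-check of what this process computes, the same three steps (strip non-digits, scale by unit, sum) can be sketched in Python. This is not part of the RapidMiner process; the helper name `digits_as_number` and the `row` dictionary are invented for the illustration, and 30 days per month is the same approximation used above.

```python
import re


def digits_as_number(text: str) -> int:
    """Remove every non-digit character (regex \\D) and parse the rest."""
    return int(re.sub(r"\D", "", text))


# One example row with the three attributes created by Create ExampleSet.
row = {"MyDays": "3 days", "MyWeeks": "8 weeks", "MyMonths": "10 months"}

# Mirrors the three Generate Attributes expressions.
parsed_days = digits_as_number(row["MyDays"])
parsed_weeks = digits_as_number(row["MyWeeks"]) * 7
parsed_months = digits_as_number(row["MyMonths"]) * 30

# Mirrors Generate Aggregation with aggregation_function = sum.
total_days = parsed_days + parsed_weeks + parsed_months
print(total_days)  # 3 + 56 + 300 = 359
```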
-
Ah, that wasn't clear to me. In the end it just means slightly more complex find-and-replace logic.
Something like this:
input = 3 days , 8 weeks , 10 months
output = 359
<?xml version="1.0" encoding="UTF-8"?>
<process version="9.6.000">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="9.6.000" expanded="true" name="Process">
    <parameter key="logverbosity" value="init"/>
    <parameter key="random_seed" value="2001"/>
    <parameter key="send_mail" value="never"/>
    <parameter key="notification_email" value=""/>
    <parameter key="process_duration_for_mail" value="30"/>
    <parameter key="encoding" value="SYSTEM"/>
    <process expanded="true">
      <operator activated="true" class="utility:create_exampleset" compatibility="9.6.000" expanded="true" height="68" name="Create ExampleSet" width="90" x="179" y="34">
        <parameter key="generator_type" value="attribute functions"/>
        <parameter key="number_of_examples" value="1"/>
        <parameter key="use_stepsize" value="false"/>
        <list key="function_descriptions">
          <parameter key="myField" value="&quot;3 days , 8 weeks , 10 months&quot;"/>
        </list>
        <parameter key="add_id_attribute" value="false"/>
        <list key="numeric_series_configuration"/>
        <list key="date_series_configuration"/>
        <list key="date_series_configuration (interval)"/>
        <parameter key="date_format" value="yyyy-MM-dd HH:mm:ss"/>
        <parameter key="time_zone" value="SYSTEM"/>
        <parameter key="column_separator" value=","/>
        <parameter key="parse_all_as_nominal" value="false"/>
        <parameter key="decimal_point_character" value="."/>
        <parameter key="trim_attribute_names" value="true"/>
      </operator>
      <operator activated="true" class="generate_attributes" compatibility="9.6.000" expanded="true" height="82" name="Generate Attributes" width="90" x="313" y="34">
        <list key="function_descriptions">
          <parameter key="days" value="parse(replaceAll(myField,&quot;(\\d+) days?[ ,]+(\\d+) weeks?[ ,]+(\\d+) months?&quot;,&quot;$1&quot;))"/>
          <parameter key="days" value="days + (parse(replaceAll(myField,&quot;(\\d+) days?[ ,]+(\\d+) weeks?[ ,]+(\\d+) months?&quot;,&quot;$2&quot;))*7)"/>
          <parameter key="days" value="days + (parse(replaceAll(myField,&quot;(\\d+) days?[ ,]+(\\d+) weeks?[ ,]+(\\d+) months?&quot;,&quot;$3&quot;))*30)"/>
        </list>
        <parameter key="keep_all" value="true"/>
      </operator>
      <connect from_op="Create ExampleSet" from_port="output" to_op="Generate Attributes" to_port="example set input"/>
      <connect from_op="Generate Attributes" from_port="example set output" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
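The capture-group trick in this process ($1, $2, $3 pick out the days, weeks and months numbers from one combined string) can be tried out in Python as well. The sketch below uses the same regular expression; the names `COMBINED_AGE` and `combined_age_to_days` are hypothetical, and it assumes the field always lists days, weeks and months in that order.

```python
import re

# Same pattern as in the Generate Attributes expressions:
# group 1 = days, group 2 = weeks, group 3 = months.
COMBINED_AGE = re.compile(r"(\d+) days?[ ,]+(\d+) weeks?[ ,]+(\d+) months?")


def combined_age_to_days(text: str) -> int:
    """Sum days, weeks*7 and months*30 from a 'D days , W weeks , M months' string."""
    match = COMBINED_AGE.search(text)
    if match is None:
        raise ValueError(f"unexpected age format: {text!r}")
    days, weeks, months = (int(group) for group in match.groups())
    return days + weeks * 7 + months * 30


print(combined_age_to_days("3 days , 8 weeks , 10 months"))  # 3 + 56 + 300 = 359
```

Unlike the RapidMiner expression, this version raises an error when the text does not match, which makes unexpected formats easier to spot.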
-
@kayman, thanks for your reply. I'm sorry that I didn't describe the data clearly.
All of these are different rows of a single attribute. That is, 3 days can be one row, 8 weeks another row, 10 months
another row, and 6 years yet another one. There are many rows like that.
I'm enclosing a sample of that attribute, "Age pet". Kindly have a look at it.