Time Series Gaps for Arima - How to fill them?
pedrodomingosdv
New Altair Community Member
Hello,
I am using auto-arima (via the R Script operator) with some success, but I'm now facing an issue. My data is sometimes not provided for all dates. For example, my data is recorded weekly, and to have it in a date format I use the Monday of each week.
Typically I do not have gaps, but every once in a while I do, and it takes a lot of time to create those rows for every run. So I would like to know whether there is a way of filling in the missing date points in RapidMiner. It would be helpful because I want to replace those gaps with an interpolated or average value.
I see that there are some operators related to similar issues. I thought that "Fill Data Gaps" might be the one, but every time I set the step size to 7 the process freezes and no output is delivered at all.
Enclosed are an example of the data source in Excel and a short process file.
Thanks,
Pedro
Best Answers
-
@lionelderkrikor
This example can get you started.
Copy the value of the Time_lapse macro generated by the Days operator I used, and use it as the number of examples in Create ExampleSet.
@IngoRM or @mschmitz, is there a way to use a generated macro of type integer as a parameter of an operator? I didn't find a way of doing it.
<?xml version="1.0" encoding="UTF-8"?><process version="9.1.000"> <context> <input/> <output/> <macros/> </context> <operator activated="true" class="process" compatibility="9.1.000" expanded="true" name="Process"> <parameter key="logverbosity" value="init"/> <parameter key="random_seed" value="2001"/> <parameter key="send_mail" value="never"/> <parameter key="notification_email" value=""/> <parameter key="process_duration_for_mail" value="30"/> <parameter key="encoding" value="SYSTEM"/> <process expanded="true"> <operator activated="true" class="subprocess" compatibility="9.1.000" expanded="true" height="82" name="Your Data Set" width="90" x="45" y="34"> <process expanded="true"> <operator activated="true" class="operator_toolbox:create_exampleset" compatibility="1.7.000" expanded="true" height="68" name="Create ExampleSet (range)" origin="GENERATED_TUTORIAL" width="90" x="45" y="34"> <parameter key="generator_type" value="date_series"/> <parameter key="number_of_examples" value="10"/> <parameter key="use_stepsize" value="true"/> <list key="function_descriptions"/> <parameter key="add_id_attribute" value="false"/> <list key="numeric_series_configuration"/> <list key="date_series_configuration"> <parameter key="Days" value="2018-01-01 00:00:00.2019-01-01 00:00:00"/> </list> <list key="date_series_configuration (interval)"> <parameter key="DAY" value="2019-01-01 00:00:00.1.day"/> </list> <parameter key="date_format" value="yyyy-MM-dd HH:mm:ss"/> <parameter key="column_separator" value=","/> <parameter key="parse_all_as_nominal" value="false"/> <parameter key="decimal_point_character" value="."/> <parameter key="trim_attribute_names" value="true"/> </operator> <operator activated="true" class="generate_id" compatibility="9.1.000" expanded="true" height="82" name="Generate ID" width="90" x="179" y="34"> <parameter 
key="create_nominal_ids" value="false"/> <parameter key="offset" value="0"/> </operator> <operator activated="true" class="operator_toolbox:create_exampleset" compatibility="1.7.000" expanded="true" height="68" name="Create ExampleSet (2)" width="90" x="45" y="238"> <parameter key="generator_type" value="numeric_series"/> <parameter key="number_of_examples" value="10"/> <parameter key="use_stepsize" value="false"/> <list key="function_descriptions"/> <parameter key="add_id_attribute" value="false"/> <list key="numeric_series_configuration"> <parameter key="Value" value="linear.0\.0.1\.0"/> </list> <list key="date_series_configuration"/> <list key="date_series_configuration (interval)"/> <parameter key="date_format" value="yyyy-MM-dd HH:mm:ss"/> <parameter key="column_separator" value=","/> <parameter key="parse_all_as_nominal" value="false"/> <parameter key="decimal_point_character" value="."/> <parameter key="trim_attribute_names" value="true"/> </operator> <operator activated="true" class="generate_id" compatibility="9.1.000" expanded="true" height="82" name="Generate ID (3)" width="90" x="179" y="238"> <parameter key="create_nominal_ids" value="false"/> <parameter key="offset" value="0"/> </operator> <operator activated="true" class="concurrency:join" compatibility="9.1.000" expanded="true" height="82" name="Join" width="90" x="313" y="136"> <parameter key="remove_double_attributes" value="true"/> <parameter key="join_type" value="inner"/> <parameter key="use_id_attribute_as_key" value="false"/> <list key="key_attributes"> <parameter key="id" value="id"/> </list> <parameter key="keep_both_join_attributes" value="false"/> </operator> <operator activated="true" class="numerical_to_polynominal" compatibility="9.1.000" expanded="true" height="82" name="Numerical to Polynominal" width="90" x="313" y="34"> <parameter key="attribute_filter_type" value="single"/> <parameter key="attribute" value="id"/> <parameter key="attributes" value=""/> <parameter 
key="use_except_expression" value="false"/> <parameter key="value_type" value="numeric"/> <parameter key="use_value_type_exception" value="false"/> <parameter key="except_value_type" value="real"/> <parameter key="block_type" value="value_series"/> <parameter key="use_block_type_exception" value="false"/> <parameter key="except_block_type" value="value_series_end"/> <parameter key="invert_selection" value="false"/> <parameter key="include_special_attributes" value="true"/> </operator> <operator activated="true" class="filter_examples" compatibility="9.1.000" expanded="true" height="103" name="Filter Examples" width="90" x="447" y="34"> <parameter key="parameter_expression" value=""/> <parameter key="condition_class" value="custom_filters"/> <parameter key="invert_filter" value="false"/> <list key="filters_list"> <parameter key="filters_entry_key" value="id.is_not_in.2;7"/> </list> <parameter key="filters_logic_and" value="false"/> <parameter key="filters_check_metadata" value="true"/> </operator> <operator activated="true" class="select_attributes" compatibility="9.1.000" expanded="true" height="82" name="Select Attributes" width="90" x="581" y="34"> <parameter key="attribute_filter_type" value="subset"/> <parameter key="attribute" value="DAY"/> <parameter key="attributes" value="|Value|DAY"/> <parameter key="use_except_expression" value="false"/> <parameter key="value_type" value="attribute_value"/> <parameter key="use_value_type_exception" value="false"/> <parameter key="except_value_type" value="time"/> <parameter key="block_type" value="attribute_block"/> <parameter key="use_block_type_exception" value="false"/> <parameter key="except_block_type" value="value_matrix_row_start"/> <parameter key="invert_selection" value="false"/> <parameter key="include_special_attributes" value="true"/> </operator> <operator activated="true" class="remember" compatibility="9.1.000" expanded="true" height="68" name="Remember" width="90" x="715" y="34"> <parameter key="name" 
value="DataSet"/> <parameter key="io_object" value="ExampleSet"/> <parameter key="store_which" value="1"/> <parameter key="remove_from_process" value="true"/> </operator> <connect from_op="Create ExampleSet (range)" from_port="output" to_op="Generate ID" to_port="example set input"/> <connect from_op="Generate ID" from_port="example set output" to_op="Join" to_port="left"/> <connect from_op="Create ExampleSet (2)" from_port="output" to_op="Generate ID (3)" to_port="example set input"/> <connect from_op="Generate ID (3)" from_port="example set output" to_op="Join" to_port="right"/> <connect from_op="Join" from_port="join" to_op="Numerical to Polynominal" to_port="example set input"/> <connect from_op="Numerical to Polynominal" from_port="example set output" to_op="Filter Examples" to_port="example set input"/> <connect from_op="Filter Examples" from_port="example set output" to_op="Select Attributes" to_port="example set input"/> <connect from_op="Select Attributes" from_port="example set output" to_op="Remember" to_port="store"/> <connect from_op="Remember" from_port="stored" to_port="out 1"/> <portSpacing port="source_in 1" spacing="0"/> <portSpacing port="sink_out 1" spacing="0"/> <portSpacing port="sink_out 2" spacing="0"/> </process> </operator> <operator activated="true" class="generate_id" compatibility="9.1.000" expanded="true" height="82" name="Generate ID (2)" width="90" x="179" y="34"> <parameter key="create_nominal_ids" value="false"/> <parameter key="offset" value="0"/> </operator> <operator activated="true" class="sort" compatibility="9.1.000" expanded="true" height="82" name="Sort" width="90" x="313" y="34"> <parameter key="attribute_name" value="DAY"/> <parameter key="sorting_direction" value="increasing"/> </operator> <operator activated="true" class="date_to_nominal" compatibility="9.1.000" expanded="true" height="82" name="Date to Nominal" width="90" x="447" y="34"> <parameter key="attribute_name" value="DAY"/> <parameter key="date_format" 
value="dd/MM/yyyy"/> <parameter key="time_zone" value="SYSTEM"/> <parameter key="locale" value="English (United States)"/> <parameter key="keep_old_attribute" value="false"/> </operator> <operator activated="true" class="multiply" compatibility="9.1.000" expanded="true" height="103" name="Multiply" width="90" x="380" y="136"/> <operator activated="true" class="extract_macro" compatibility="9.1.000" expanded="true" height="68" name="Min_Day" width="90" x="581" y="34"> <parameter key="macro" value="min_day"/> <parameter key="macro_type" value="data_value"/> <parameter key="statistics" value="min"/> <parameter key="attribute_name" value="DAY"/> <parameter key="example_index" value="1"/> <list key="additional_macros"/> </operator> <operator activated="true" class="sort" compatibility="9.1.000" expanded="true" height="82" name="Sort (2)" width="90" x="715" y="34"> <parameter key="attribute_name" value="id"/> <parameter key="sorting_direction" value="decreasing"/> </operator> <operator activated="true" class="extract_macro" compatibility="9.1.000" expanded="true" height="68" name="Max_Day" width="90" x="849" y="34"> <parameter key="macro" value="max_day"/> <parameter key="macro_type" value="data_value"/> <parameter key="statistics" value="max"/> <parameter key="attribute_name" value="DAY"/> <parameter key="example_index" value="1"/> <list key="additional_macros"/> </operator> <operator activated="true" class="generate_macro" compatibility="9.1.000" expanded="true" height="82" name="Days" width="90" x="983" y="34"> <list key="function_descriptions"> <parameter key="Time_lapse" value="date_diff(date_parse_custom(%{min_day},&quot;dd/MM/yyyy&quot;),date_parse_custom(%{max_day},&quot;dd/MM/yyyy&quot;))/(1000*60*60*24)"/> </list> <description align="center" color="green" colored="true" width="126">Use this value as the number of examples on your Create Example Set</description> </operator> <operator activated="true" class="generate_data_user_specification" compatibility="9.1.000" expanded="true" 
height="68" name="Generate Data by User Specification" width="90" x="45" y="289"> <list key="attribute_values"> <parameter key="Min_Inicial" value="%{min_day}"/> </list> <list key="set_additional_roles"/> </operator> <operator activated="true" class="nominal_to_date" compatibility="9.1.000" expanded="true" height="82" name="Nominal to Date" width="90" x="179" y="289"> <parameter key="attribute_name" value="Min_Inicial"/> <parameter key="date_type" value="date"/> <parameter key="date_format" value="dd/MM/yyyy"/> <parameter key="time_zone" value="SYSTEM"/> <parameter key="locale" value="English (United States)"/> <parameter key="keep_old_attribute" value="false"/> </operator> <operator activated="true" class="adjust_date" compatibility="9.1.000" expanded="true" height="82" name="Adjust Date" width="90" x="313" y="289"> <parameter key="attribute_name" value="Min_Inicial"/> <list key="adjustments"> <parameter key="1" value="Day"/> </list> <parameter key="keep_old_attribute" value="false"/> </operator> <operator activated="true" class="date_to_nominal" compatibility="9.1.000" expanded="true" height="82" name="Date to Nominal (2)" width="90" x="447" y="289"> <parameter key="attribute_name" value="Min_Inicial"/> <parameter key="date_format" value="dd/MM/yyyy"/> <parameter key="time_zone" value="SYSTEM"/> <parameter key="locale" value="English (United States)"/> <parameter key="keep_old_attribute" value="false"/> </operator> <operator activated="true" class="extract_macro" compatibility="9.1.000" expanded="true" height="68" name="Min_Day (2)" width="90" x="581" y="289"> <parameter key="macro" value="min_day2"/> <parameter key="macro_type" value="data_value"/> <parameter key="statistics" value="min"/> <parameter key="attribute_name" value="Min_Inicial"/> <parameter key="example_index" value="1"/> <list key="additional_macros"/> </operator> <operator activated="true" class="operator_toolbox:create_exampleset" compatibility="1.7.000" expanded="true" height="68" name="Create 
ExampleSet" width="90" x="45" y="442"> <parameter key="generator_type" value="date_series"/> <parameter key="number_of_examples" value="10"/> <parameter key="use_stepsize" value="true"/> <list key="function_descriptions"/> <parameter key="add_id_attribute" value="false"/> <list key="numeric_series_configuration"/> <list key="date_series_configuration"> <parameter key="Series" value="%{min_day} 00:00.%{max_day} 00:00"/> </list> <list key="date_series_configuration (interval)"> <parameter key="Day_in_series" value="%{min_day2} 00:00.1.day"/> </list> <parameter key="date_format" value="dd/MM/yyyy HH:mm"/> <parameter key="column_separator" value=","/> <parameter key="parse_all_as_nominal" value="false"/> <parameter key="decimal_point_character" value="."/> <parameter key="trim_attribute_names" value="true"/> </operator> <operator activated="true" class="date_to_nominal" compatibility="9.1.000" expanded="true" height="82" name="Date to Nominal (4)" width="90" x="246" y="442"> <parameter key="attribute_name" value="Day_in_series"/> <parameter key="date_format" value="dd/MM/yyyy"/> <parameter key="time_zone" value="SYSTEM"/> <parameter key="locale" value="English (United States)"/> <parameter key="keep_old_attribute" value="false"/> </operator> <operator activated="true" class="concurrency:join" compatibility="9.1.000" expanded="true" height="82" name="Join Info" width="90" x="514" y="442"> <parameter key="remove_double_attributes" value="true"/> <parameter key="join_type" value="left"/> <parameter key="use_id_attribute_as_key" value="false"/> <list key="key_attributes"> <parameter key="Day_in_series" value="DAY"/> </list> <parameter key="keep_both_join_attributes" value="false"/> <description align="center" color="orange" colored="true" width="126">Joining original Data with the date Series to find missings</description> </operator> <connect from_op="Your Data Set" from_port="out 1" to_op="Generate ID (2)" to_port="example set input"/> <connect from_op="Generate ID (2)" 
from_port="example set output" to_op="Sort" to_port="example set input"/> <connect from_op="Sort" from_port="example set output" to_op="Date to Nominal" to_port="example set input"/> <connect from_op="Date to Nominal" from_port="example set output" to_op="Multiply" to_port="input"/> <connect from_op="Multiply" from_port="output 1" to_op="Min_Day" to_port="example set"/> <connect from_op="Multiply" from_port="output 2" to_op="Join Info" to_port="right"/> <connect from_op="Min_Day" from_port="example set" to_op="Sort (2)" to_port="example set input"/> <connect from_op="Sort (2)" from_port="example set output" to_op="Max_Day" to_port="example set"/> <connect from_op="Max_Day" from_port="example set" to_op="Days" to_port="through 1"/> <connect from_op="Generate Data by User Specification" from_port="output" to_op="Nominal to Date" to_port="example set input"/> <connect from_op="Nominal to Date" from_port="example set output" to_op="Adjust Date" to_port="example set input"/> <connect from_op="Adjust Date" from_port="example set output" to_op="Date to Nominal (2)" to_port="example set input"/> <connect from_op="Date to Nominal (2)" from_port="example set output" to_op="Min_Day (2)" to_port="example set"/> <connect from_op="Create ExampleSet" from_port="output" to_op="Date to Nominal (4)" to_port="example set input"/> <connect from_op="Date to Nominal (4)" from_port="example set output" to_op="Join Info" to_port="left"/> <connect from_op="Join Info" from_port="join" to_port="result 1"/> <portSpacing port="source_input 1" spacing="0"/> <portSpacing port="sink_result 1" spacing="0"/> <portSpacing port="sink_result 2" spacing="0"/> <description align="center" color="red" colored="true" height="241" resized="true" width="983" x="164" y="10">Extracting the min and max date on your data set and calculating the amount of days the series need to create<br/></description> <description align="center" color="yellow" colored="false" height="442" resized="true" width="892" x="31" 
y="252">Generating a Data Set that includes all the dates that are covered by your information and joining with your original DataSet</description> </process> </operator> </process>
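
For readers who want to see the idea outside RapidMiner, here is a minimal Python sketch of what the process above appears to do: build the complete date series between the minimum and maximum date, then left-join the original data onto it so that missing dates show up as empty values. The function names and the sample values are hypothetical and only for illustration; a step of 7 days is assumed to match the weekly Monday data described in the question.

```python
from datetime import date, timedelta


def full_date_series(start, end, step_days=1):
    """Return every date from start to end (inclusive) in steps of step_days."""
    total_days = (end - start).days
    return [start + timedelta(days=offset)
            for offset in range(0, total_days + 1, step_days)]


def left_join_on_date(series, observed):
    """Pair each date of the full series with its observed value, or None if missing."""
    return [(day, observed.get(day)) for day in series]


# Hypothetical weekly data recorded on Mondays; 2018-01-15 is missing.
observed = {
    date(2018, 1, 1): 10.0,
    date(2018, 1, 8): 12.0,
    date(2018, 1, 22): 16.0,
}

series = full_date_series(min(observed), max(observed), step_days=7)
joined = left_join_on_date(series, observed)
missing = [day for day, value in joined if value is None]
print(missing)
```

The left join keeps every generated date, which is why the gaps become visible as rows without a value rather than disappearing.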
-
Ok let me know if there is a way in which I can help.
I guess you speak Spanish; if it is easier for you, we can switch languages.
-
Hi Marco, I'm Portuguese, and judging by your name I guess that you are too.
Sorry for the late reply, but I've been away from my desk.
I tried to adapt the process you supplied, but with no success at all.
Two questions:
1) Enclosed is the "adapted" process. What am I doing wrong? I feel that I need a couple of spare hours to understand the whole process, which is why I made only a few adaptations.
2) Once 1) is working, how can I apply it to fill my time series?
Is the output supposed to already be the time series with no gaps?
The output I'm getting doesn't seem correct to me.
Obrigado,
Pedro
-
Hi @pedrodomingosdv I'm actually from Mexico so I guess we need to stick with English then.
I've made some changes to the process and connected it to the mock file you gave me. In my earlier example the date attribute was named DAY; in your file it is DATE, so the process now refers to DATE.
I'm also attaching a picture of the process. I added a breakpoint on the Days operator; please check what value is written to the macro at that point and enter that value as the number of examples in the Create ExampleSet operator.
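
As a rough cross-check of the value you should see at the breakpoint, here is a small Python sketch of the same arithmetic. It assumes, as the division by 1000*60*60*24 in the Time_lapse expression suggests, that the date difference is in milliseconds, and it adds 1 so that both the first and the last day are counted. The function name is hypothetical; the dates are the ones from the generated example data, not from your file.

```python
from datetime import datetime, timedelta

MS_PER_DAY = 1000 * 60 * 60 * 24


def time_lapse(min_day, max_day, fmt="%d/%m/%Y"):
    """Number of daily rows needed to cover min_day..max_day inclusive."""
    start = datetime.strptime(min_day, fmt)
    end = datetime.strptime(max_day, fmt)
    # Difference expressed in whole milliseconds, mirroring the macro expression.
    diff_ms = (end - start) // timedelta(milliseconds=1)
    return diff_ms // MS_PER_DAY + 1


rows = time_lapse("01/01/2018", "01/01/2019")
print(rows)  # 366
```

Without the +1 the generated series would stop one day short of the maximum date.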
sgenzer, do you know how I can set the number of examples of Create ExampleSet through a macro? I've tried with Set Macro (real), but it does not work.
<?xml version="1.0" encoding="UTF-8"?><process version="9.1.000"> <context> <input/> <output/> <macros/> </context> <operator activated="true" class="process" compatibility="9.1.000" expanded="true" name="Process"> <parameter key="logverbosity" value="init"/> <parameter key="random_seed" value="2001"/> <parameter key="send_mail" value="never"/> <parameter key="notification_email" value=""/> <parameter key="process_duration_for_mail" value="30"/> <parameter key="encoding" value="SYSTEM"/> <process expanded="true"> <operator activated="true" class="read_excel" compatibility="9.1.000" expanded="true" height="68" name="Read Excel" width="90" x="45" y="136"> <parameter key="excel_file" value="C:\Users\mbarradas\Downloads\XL Mock File.xlsx"/> <parameter key="sheet_selection" value="sheet number"/> <parameter key="sheet_number" value="1"/> <parameter key="imported_cell_range" value="A1"/> <parameter key="encoding" value="SYSTEM"/> <parameter key="first_row_as_names" value="true"/> <list key="annotations"/> <parameter key="date_format" value=""/> <parameter key="time_zone" value="SYSTEM"/> <parameter key="locale" value="English (United States)"/> <parameter key="read_all_values_as_polynominal" value="false"/> <list key="data_set_meta_data_information"> <parameter key="0" value="DATE.true.date.attribute"/> <parameter key="1" value="SCORE.true.real.attribute"/> </list> <parameter key="read_not_matching_values_as_missings" value="false"/> <parameter key="datamanagement" value="double_array"/> <parameter key="data_management" value="auto"/> </operator> <operator activated="true" class="generate_id" compatibility="9.1.000" expanded="true" height="82" name="Generate ID (2)" width="90" x="179" y="34"> <parameter key="create_nominal_ids" value="false"/> <parameter key="offset" value="0"/> </operator> <operator activated="true" 
class="sort" compatibility="9.1.000" expanded="true" height="82" name="Sort" width="90" x="313" y="34"> <parameter key="attribute_name" value="DATE"/> <parameter key="sorting_direction" value="increasing"/> </operator> <operator activated="true" class="date_to_nominal" compatibility="9.1.000" expanded="true" height="82" name="Date to Nominal" width="90" x="447" y="34"> <parameter key="attribute_name" value="DATE"/> <parameter key="date_format" value="dd/MM/yyyy"/> <parameter key="time_zone" value="SYSTEM"/> <parameter key="locale" value="English (United States)"/> <parameter key="keep_old_attribute" value="false"/> </operator> <operator activated="true" class="multiply" compatibility="9.1.000" expanded="true" height="103" name="Multiply" width="90" x="380" y="136"/> <operator activated="true" class="extract_macro" compatibility="9.1.000" expanded="true" height="68" name="Min_Day" width="90" x="581" y="34"> <parameter key="macro" value="min_day"/> <parameter key="macro_type" value="data_value"/> <parameter key="statistics" value="min"/> <parameter key="attribute_name" value="DATE"/> <parameter key="example_index" value="1"/> <list key="additional_macros"/> </operator> <operator activated="true" class="sort" compatibility="9.1.000" expanded="true" height="82" name="Sort (2)" width="90" x="715" y="34"> <parameter key="attribute_name" value="id"/> <parameter key="sorting_direction" value="decreasing"/> </operator> <operator activated="true" class="extract_macro" compatibility="9.1.000" expanded="true" height="68" name="Max_Day" width="90" x="849" y="34"> <parameter key="macro" value="max_day"/> <parameter key="macro_type" value="data_value"/> <parameter key="statistics" value="max"/> <parameter key="attribute_name" value="DATE"/> <parameter key="example_index" value="1"/> <list key="additional_macros"/> </operator> <operator activated="true" breakpoints="after" class="generate_macro" compatibility="9.1.000" expanded="true" height="82" name="Days" width="90" x="983" 
y="34"> <list key="function_descriptions"> <parameter key="Time_lapse" value="(date_diff(date_parse_custom(%{min_day},&quot;dd/MM/yyyy&quot;),date_parse_custom(%{max_day},&quot;dd/MM/yyyy&quot;))/(1000*60*60*24))+1"/> </list> <description align="center" color="green" colored="true" width="126">Use this value as the number of examples on your Create Example Set</description> </operator> <operator activated="false" class="subprocess" compatibility="9.1.000" expanded="true" height="82" name="Your Data Set" width="90" x="45" y="34"> <process expanded="true"> <operator activated="true" class="operator_toolbox:create_exampleset" compatibility="1.7.000" expanded="true" height="68" name="Create ExampleSet (range)" origin="GENERATED_TUTORIAL" width="90" x="45" y="34"> <parameter key="generator_type" value="date_series"/> <parameter key="number_of_examples" value="10"/> <parameter key="use_stepsize" value="true"/> <list key="function_descriptions"/> <parameter key="add_id_attribute" value="false"/> <list key="numeric_series_configuration"/> <list key="date_series_configuration"> <parameter key="Days" value="2018-01-01 00:00:00.2019-01-01 00:00:00"/> </list> <list key="date_series_configuration (interval)"> <parameter key="DAY" value="2019-01-01 00:00:00.1.day"/> </list> <parameter key="date_format" value="yyyy-MM-dd HH:mm:ss"/> <parameter key="column_separator" value=","/> <parameter key="parse_all_as_nominal" value="false"/> <parameter key="decimal_point_character" value="."/> <parameter key="trim_attribute_names" value="true"/> </operator> <operator activated="true" class="generate_id" compatibility="9.1.000" expanded="true" height="82" name="Generate ID" width="90" x="179" y="34"> <parameter key="create_nominal_ids" value="false"/> <parameter key="offset" value="0"/> </operator> <operator activated="true" class="operator_toolbox:create_exampleset" compatibility="1.7.000" expanded="true" height="68" name="Create ExampleSet (2)" width="90" x="45" y="238"> <parameter key="generator_type" 
value="numeric_series"/> <parameter key="number_of_examples" value="10"/> <parameter key="use_stepsize" value="false"/> <list key="function_descriptions"/> <parameter key="add_id_attribute" value="false"/> <list key="numeric_series_configuration"> <parameter key="Value" value="linear.0\.0.1\.0"/> </list> <list key="date_series_configuration"/> <list key="date_series_configuration (interval)"/> <parameter key="date_format" value="yyyy-MM-dd HH:mm:ss"/> <parameter key="column_separator" value=","/> <parameter key="parse_all_as_nominal" value="false"/> <parameter key="decimal_point_character" value="."/> <parameter key="trim_attribute_names" value="true"/> </operator> <operator activated="true" class="generate_id" compatibility="9.1.000" expanded="true" height="82" name="Generate ID (3)" width="90" x="179" y="238"> <parameter key="create_nominal_ids" value="false"/> <parameter key="offset" value="0"/> </operator> <operator activated="true" class="concurrency:join" compatibility="9.1.000" expanded="true" height="82" name="Join" width="90" x="313" y="136"> <parameter key="remove_double_attributes" value="true"/> <parameter key="join_type" value="inner"/> <parameter key="use_id_attribute_as_key" value="false"/> <list key="key_attributes"> <parameter key="id" value="id"/> </list> <parameter key="keep_both_join_attributes" value="false"/> </operator> <operator activated="true" class="numerical_to_polynominal" compatibility="9.1.000" expanded="true" height="82" name="Numerical to Polynominal" width="90" x="313" y="34"> <parameter key="attribute_filter_type" value="single"/> <parameter key="attribute" value="id"/> <parameter key="attributes" value=""/> <parameter key="use_except_expression" value="false"/> <parameter key="value_type" value="numeric"/> <parameter key="use_value_type_exception" value="false"/> <parameter key="except_value_type" value="real"/> <parameter key="block_type" value="value_series"/> <parameter key="use_block_type_exception" value="false"/> <parameter 
key="except_block_type" value="value_series_end"/> <parameter key="invert_selection" value="false"/> <parameter key="include_special_attributes" value="true"/> </operator> <operator activated="true" class="filter_examples" compatibility="9.1.000" expanded="true" height="103" name="Filter Examples" width="90" x="447" y="34"> <parameter key="parameter_expression" value=""/> <parameter key="condition_class" value="custom_filters"/> <parameter key="invert_filter" value="false"/> <list key="filters_list"> <parameter key="filters_entry_key" value="id.is_not_in.2;7"/> </list> <parameter key="filters_logic_and" value="false"/> <parameter key="filters_check_metadata" value="true"/> </operator> <operator activated="true" class="select_attributes" compatibility="9.1.000" expanded="true" height="82" name="Select Attributes" width="90" x="581" y="34"> <parameter key="attribute_filter_type" value="subset"/> <parameter key="attribute" value="DAY"/> <parameter key="attributes" value="|Value|DAY"/> <parameter key="use_except_expression" value="false"/> <parameter key="value_type" value="attribute_value"/> <parameter key="use_value_type_exception" value="false"/> <parameter key="except_value_type" value="time"/> <parameter key="block_type" value="attribute_block"/> <parameter key="use_block_type_exception" value="false"/> <parameter key="except_block_type" value="value_matrix_row_start"/> <parameter key="invert_selection" value="false"/> <parameter key="include_special_attributes" value="true"/> </operator> <operator activated="true" class="remember" compatibility="9.1.000" expanded="true" height="68" name="Remember" width="90" x="715" y="34"> <parameter key="name" value="DataSet"/> <parameter key="io_object" value="ExampleSet"/> <parameter key="store_which" value="1"/> <parameter key="remove_from_process" value="true"/> </operator> <connect from_op="Create ExampleSet (range)" from_port="output" to_op="Generate ID" to_port="example set input"/> <connect from_op="Generate ID" 
from_port="example set output" to_op="Join" to_port="left"/> <connect from_op="Create ExampleSet (2)" from_port="output" to_op="Generate ID (3)" to_port="example set input"/> <connect from_op="Generate ID (3)" from_port="example set output" to_op="Join" to_port="right"/> <connect from_op="Join" from_port="join" to_op="Numerical to Polynominal" to_port="example set input"/> <connect from_op="Numerical to Polynominal" from_port="example set output" to_op="Filter Examples" to_port="example set input"/> <connect from_op="Filter Examples" from_port="example set output" to_op="Select Attributes" to_port="example set input"/> <connect from_op="Select Attributes" from_port="example set output" to_op="Remember" to_port="store"/> <connect from_op="Remember" from_port="stored" to_port="out 1"/> <portSpacing port="source_in 1" spacing="0"/> <portSpacing port="sink_out 1" spacing="0"/> <portSpacing port="sink_out 2" spacing="0"/> </process> </operator> <operator activated="true" class="generate_data_user_specification" compatibility="9.1.000" expanded="true" height="68" name="Generate Data by User Specification" width="90" x="45" y="289"> <list key="attribute_values"> <parameter key="Min_Inicial" value="%{min_day}"/> </list> <list key="set_additional_roles"/> </operator> <operator activated="true" class="nominal_to_date" compatibility="9.1.000" expanded="true" height="82" name="Nominal to Date" width="90" x="179" y="289"> <parameter key="attribute_name" value="Min_Inicial"/> <parameter key="date_type" value="date"/> <parameter key="date_format" value="dd/MM/yyyy"/> <parameter key="time_zone" value="SYSTEM"/> <parameter key="locale" value="English (United States)"/> <parameter key="keep_old_attribute" value="false"/> </operator> <operator activated="true" class="adjust_date" compatibility="9.1.000" expanded="true" height="82" name="Adjust Date" width="90" x="313" y="289"> <parameter key="attribute_name" value="Min_Inicial"/> <list key="adjustments"> <parameter key="1" 
value="Day"/> </list> <parameter key="keep_old_attribute" value="false"/> </operator> <operator activated="true" class="date_to_nominal" compatibility="9.1.000" expanded="true" height="82" name="Date to Nominal (2)" width="90" x="447" y="289"> <parameter key="attribute_name" value="Min_Inicial"/> <parameter key="date_format" value="dd/MM/yyyy"/> <parameter key="time_zone" value="SYSTEM"/> <parameter key="locale" value="English (United States)"/> <parameter key="keep_old_attribute" value="false"/> </operator> <operator activated="true" class="extract_macro" compatibility="9.1.000" expanded="true" height="68" name="Min_Day (2)" width="90" x="581" y="289"> <parameter key="macro" value="min_day2"/> <parameter key="macro_type" value="data_value"/> <parameter key="statistics" value="min"/> <parameter key="attribute_name" value="Min_Inicial"/> <parameter key="example_index" value="1"/> <list key="additional_macros"/> </operator> <operator activated="true" class="operator_toolbox:create_exampleset" compatibility="1.7.000" expanded="true" height="68" name="Create ExampleSet" width="90" x="45" y="442"> <parameter key="generator_type" value="date_series"/> <parameter key="number_of_examples" value="736"/> <parameter key="use_stepsize" value="true"/> <list key="function_descriptions"/> <parameter key="add_id_attribute" value="false"/> <list key="numeric_series_configuration"/> <list key="date_series_configuration"> <parameter key="Series" value="%{min_day} 00:00.%{max_day} 00:00"/> </list> <list key="date_series_configuration (interval)"> <parameter key="Day_in_series" value="%{min_day2} 00:00.1.day"/> </list> <parameter key="date_format" value="dd/MM/yyyy HH:mm"/> <parameter key="column_separator" value=","/> <parameter key="parse_all_as_nominal" value="false"/> <parameter key="decimal_point_character" value="."/> <parameter key="trim_attribute_names" value="true"/> </operator> <operator activated="true" class="date_to_nominal" compatibility="9.1.000" expanded="true" 
height="82" name="Date to Nominal (4)" width="90" x="246" y="442"> <parameter key="attribute_name" value="Day_in_series"/> <parameter key="date_format" value="dd/MM/yyyy"/> <parameter key="time_zone" value="SYSTEM"/> <parameter key="locale" value="English (United States)"/> <parameter key="keep_old_attribute" value="false"/> </operator> <operator activated="true" class="concurrency:join" compatibility="9.1.000" expanded="true" height="82" name="Join Info" width="90" x="514" y="442"> <parameter key="remove_double_attributes" value="true"/> <parameter key="join_type" value="left"/> <parameter key="use_id_attribute_as_key" value="false"/> <list key="key_attributes"> <parameter key="Day_in_series" value="DATE"/> </list> <parameter key="keep_both_join_attributes" value="false"/> <description align="center" color="orange" colored="true" width="126">Joining original Data with the date Series to find missings</description> </operator> <connect from_op="Read Excel" from_port="output" to_op="Generate ID (2)" to_port="example set input"/> <connect from_op="Generate ID (2)" from_port="example set output" to_op="Sort" to_port="example set input"/> <connect from_op="Sort" from_port="example set output" to_op="Date to Nominal" to_port="example set input"/> <connect from_op="Date to Nominal" from_port="example set output" to_op="Multiply" to_port="input"/> <connect from_op="Multiply" from_port="output 1" to_op="Min_Day" to_port="example set"/> <connect from_op="Multiply" from_port="output 2" to_op="Join Info" to_port="right"/> <connect from_op="Min_Day" from_port="example set" to_op="Sort (2)" to_port="example set input"/> <connect from_op="Sort (2)" from_port="example set output" to_op="Max_Day" to_port="example set"/> <connect from_op="Max_Day" from_port="example set" to_op="Days" to_port="through 1"/> <connect from_op="Generate Data by User Specification" from_port="output" to_op="Nominal to Date" to_port="example set input"/> <connect from_op="Nominal to Date" 
from_port="example set output" to_op="Adjust Date" to_port="example set input"/> <connect from_op="Adjust Date" from_port="example set output" to_op="Date to Nominal (2)" to_port="example set input"/> <connect from_op="Date to Nominal (2)" from_port="example set output" to_op="Min_Day (2)" to_port="example set"/> <connect from_op="Create ExampleSet" from_port="output" to_op="Date to Nominal (4)" to_port="example set input"/> <connect from_op="Date to Nominal (4)" from_port="example set output" to_op="Join Info" to_port="left"/> <connect from_op="Join Info" from_port="join" to_port="result 1"/> <portSpacing port="source_input 1" spacing="0"/> <portSpacing port="sink_result 1" spacing="0"/> <portSpacing port="sink_result 2" spacing="0"/> <description align="center" color="red" colored="true" height="242" resized="true" width="1125" x="20" y="10">Extracting the min and max date on your data set and calculating the number of days the series needs to cover<br/></description> <description align="center" color="yellow" colored="false" height="442" resized="true" width="892" x="31" y="252">Generating a Data Set that includes all the dates that are covered by your information and joining with your original DataSet</description> </process> </operator> </process>
Answers
-
Hi @pedrodomingosdv,
Have you tried the Replace Missing Values (Series) operator of the Time Series module?
Hope it helps,
Regards,
Lionel
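For readers who want to see what replacing missing values in a series by interpolation amounts to, here is a minimal sketch in plain Python. It is not the implementation of the Replace Missing Values (Series) operator; the function name `interpolate_gaps` and the sample numbers are made up for illustration, and it assumes the missing rows already exist in the series as empty values.

```python
from typing import List, Optional


def interpolate_gaps(values: List[Optional[float]]) -> List[Optional[float]]:
    """Linearly interpolate None entries that lie between two known values.

    Leading and trailing None entries are left untouched because there is
    no known neighbour on one side to interpolate from.
    """
    result = list(values)
    # Positions of the values that are actually present.
    known = [i for i, v in enumerate(result) if v is not None]
    for left, right in zip(known, known[1:]):
        # Constant increment per position between two known neighbours.
        step = (result[right] - result[left]) / (right - left)
        for i in range(left + 1, right):
            result[i] = result[left] + step * (i - left)
    return result


if __name__ == "__main__":
    # Hypothetical weekly scores with three missing weeks.
    print(interpolate_gaps([10.0, None, 14.0, None, None, 20.0]))
```

Replacing by an average instead would only change the inner assignment, for example to the mean of the two neighbouring known values.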
-
@lionelderkrikor
This example can get you started.
Copy the value of the Time_lapse macro, generated by the Days operator I used, into the number of examples parameter of the Create ExampleSet operator.
@IngoRM or @mschmitz, is there a way to use a generated macro of type integer as a parameter of an operator? I didn't find a way of doing it.
<?xml version="1.0" encoding="UTF-8"?><process version="9.1.000"> <context> <input/> <output/> <macros/> </context> <operator activated="true" class="process" compatibility="9.1.000" expanded="true" name="Process"> <parameter key="logverbosity" value="init"/> <parameter key="random_seed" value="2001"/> <parameter key="send_mail" value="never"/> <parameter key="notification_email" value=""/> <parameter key="process_duration_for_mail" value="30"/> <parameter key="encoding" value="SYSTEM"/> <process expanded="true"> <operator activated="true" class="subprocess" compatibility="9.1.000" expanded="true" height="82" name="Your Data Set" width="90" x="45" y="34"> <process expanded="true"> <operator activated="true" class="operator_toolbox:create_exampleset" compatibility="1.7.000" expanded="true" height="68" name="Create ExampleSet (range)" origin="GENERATED_TUTORIAL" width="90" x="45" y="34"> <parameter key="generator_type" value="date_series"/> <parameter key="number_of_examples" value="10"/> <parameter key="use_stepsize" value="true"/> <list key="function_descriptions"/> <parameter key="add_id_attribute" value="false"/> <list key="numeric_series_configuration"/> <list key="date_series_configuration"> <parameter key="Days" value="2018-01-01 00:00:00.2019-01-01 00:00:00"/> </list> <list key="date_series_configuration (interval)"> <parameter key="DAY" value="2019-01-01 00:00:00.1.day"/> </list> <parameter key="date_format" value="yyyy-MM-dd HH:mm:ss"/> <parameter key="column_separator" value=","/> <parameter key="parse_all_as_nominal" value="false"/> <parameter key="decimal_point_character" value="."/> <parameter key="trim_attribute_names" value="true"/> </operator> <operator activated="true" class="generate_id" compatibility="9.1.000" expanded="true" height="82" name="Generate ID" width="90" x="179" y="34"> <parameter 
key="create_nominal_ids" value="false"/> <parameter key="offset" value="0"/> </operator> <operator activated="true" class="operator_toolbox:create_exampleset" compatibility="1.7.000" expanded="true" height="68" name="Create ExampleSet (2)" width="90" x="45" y="238"> <parameter key="generator_type" value="numeric_series"/> <parameter key="number_of_examples" value="10"/> <parameter key="use_stepsize" value="false"/> <list key="function_descriptions"/> <parameter key="add_id_attribute" value="false"/> <list key="numeric_series_configuration"> <parameter key="Value" value="linear.0\.0.1\.0"/> </list> <list key="date_series_configuration"/> <list key="date_series_configuration (interval)"/> <parameter key="date_format" value="yyyy-MM-dd HH:mm:ss"/> <parameter key="column_separator" value=","/> <parameter key="parse_all_as_nominal" value="false"/> <parameter key="decimal_point_character" value="."/> <parameter key="trim_attribute_names" value="true"/> </operator> <operator activated="true" class="generate_id" compatibility="9.1.000" expanded="true" height="82" name="Generate ID (3)" width="90" x="179" y="238"> <parameter key="create_nominal_ids" value="false"/> <parameter key="offset" value="0"/> </operator> <operator activated="true" class="concurrency:join" compatibility="9.1.000" expanded="true" height="82" name="Join" width="90" x="313" y="136"> <parameter key="remove_double_attributes" value="true"/> <parameter key="join_type" value="inner"/> <parameter key="use_id_attribute_as_key" value="false"/> <list key="key_attributes"> <parameter key="id" value="id"/> </list> <parameter key="keep_both_join_attributes" value="false"/> </operator> <operator activated="true" class="numerical_to_polynominal" compatibility="9.1.000" expanded="true" height="82" name="Numerical to Polynominal" width="90" x="313" y="34"> <parameter key="attribute_filter_type" value="single"/> <parameter key="attribute" value="id"/> <parameter key="attributes" value=""/> <parameter 
key="use_except_expression" value="false"/> <parameter key="value_type" value="numeric"/> <parameter key="use_value_type_exception" value="false"/> <parameter key="except_value_type" value="real"/> <parameter key="block_type" value="value_series"/> <parameter key="use_block_type_exception" value="false"/> <parameter key="except_block_type" value="value_series_end"/> <parameter key="invert_selection" value="false"/> <parameter key="include_special_attributes" value="true"/> </operator> <operator activated="true" class="filter_examples" compatibility="9.1.000" expanded="true" height="103" name="Filter Examples" width="90" x="447" y="34"> <parameter key="parameter_expression" value=""/> <parameter key="condition_class" value="custom_filters"/> <parameter key="invert_filter" value="false"/> <list key="filters_list"> <parameter key="filters_entry_key" value="id.is_not_in.2;7"/> </list> <parameter key="filters_logic_and" value="false"/> <parameter key="filters_check_metadata" value="true"/> </operator> <operator activated="true" class="select_attributes" compatibility="9.1.000" expanded="true" height="82" name="Select Attributes" width="90" x="581" y="34"> <parameter key="attribute_filter_type" value="subset"/> <parameter key="attribute" value="DAY"/> <parameter key="attributes" value="|Value|DAY"/> <parameter key="use_except_expression" value="false"/> <parameter key="value_type" value="attribute_value"/> <parameter key="use_value_type_exception" value="false"/> <parameter key="except_value_type" value="time"/> <parameter key="block_type" value="attribute_block"/> <parameter key="use_block_type_exception" value="false"/> <parameter key="except_block_type" value="value_matrix_row_start"/> <parameter key="invert_selection" value="false"/> <parameter key="include_special_attributes" value="true"/> </operator> <operator activated="true" class="remember" compatibility="9.1.000" expanded="true" height="68" name="Remember" width="90" x="715" y="34"> <parameter key="name" 
value="DataSet"/> <parameter key="io_object" value="ExampleSet"/> <parameter key="store_which" value="1"/> <parameter key="remove_from_process" value="true"/> </operator> <connect from_op="Create ExampleSet (range)" from_port="output" to_op="Generate ID" to_port="example set input"/> <connect from_op="Generate ID" from_port="example set output" to_op="Join" to_port="left"/> <connect from_op="Create ExampleSet (2)" from_port="output" to_op="Generate ID (3)" to_port="example set input"/> <connect from_op="Generate ID (3)" from_port="example set output" to_op="Join" to_port="right"/> <connect from_op="Join" from_port="join" to_op="Numerical to Polynominal" to_port="example set input"/> <connect from_op="Numerical to Polynominal" from_port="example set output" to_op="Filter Examples" to_port="example set input"/> <connect from_op="Filter Examples" from_port="example set output" to_op="Select Attributes" to_port="example set input"/> <connect from_op="Select Attributes" from_port="example set output" to_op="Remember" to_port="store"/> <connect from_op="Remember" from_port="stored" to_port="out 1"/> <portSpacing port="source_in 1" spacing="0"/> <portSpacing port="sink_out 1" spacing="0"/> <portSpacing port="sink_out 2" spacing="0"/> </process> </operator> <operator activated="true" class="generate_id" compatibility="9.1.000" expanded="true" height="82" name="Generate ID (2)" width="90" x="179" y="34"> <parameter key="create_nominal_ids" value="false"/> <parameter key="offset" value="0"/> </operator> <operator activated="true" class="sort" compatibility="9.1.000" expanded="true" height="82" name="Sort" width="90" x="313" y="34"> <parameter key="attribute_name" value="DAY"/> <parameter key="sorting_direction" value="increasing"/> </operator> <operator activated="true" class="date_to_nominal" compatibility="9.1.000" expanded="true" height="82" name="Date to Nominal" width="90" x="447" y="34"> <parameter key="attribute_name" value="DAY"/> <parameter key="date_format" 
value="dd/MM/yyyy"/> <parameter key="time_zone" value="SYSTEM"/> <parameter key="locale" value="English (United States)"/> <parameter key="keep_old_attribute" value="false"/> </operator> <operator activated="true" class="multiply" compatibility="9.1.000" expanded="true" height="103" name="Multiply" width="90" x="380" y="136"/> <operator activated="true" class="extract_macro" compatibility="9.1.000" expanded="true" height="68" name="Min_Day" width="90" x="581" y="34"> <parameter key="macro" value="min_day"/> <parameter key="macro_type" value="data_value"/> <parameter key="statistics" value="min"/> <parameter key="attribute_name" value="DAY"/> <parameter key="example_index" value="1"/> <list key="additional_macros"/> </operator> <operator activated="true" class="sort" compatibility="9.1.000" expanded="true" height="82" name="Sort (2)" width="90" x="715" y="34"> <parameter key="attribute_name" value="id"/> <parameter key="sorting_direction" value="decreasing"/> </operator> <operator activated="true" class="extract_macro" compatibility="9.1.000" expanded="true" height="68" name="Max_Day" width="90" x="849" y="34"> <parameter key="macro" value="max_day"/> <parameter key="macro_type" value="data_value"/> <parameter key="statistics" value="max"/> <parameter key="attribute_name" value="DAY"/> <parameter key="example_index" value="1"/> <list key="additional_macros"/> </operator> <operator activated="true" class="generate_macro" compatibility="9.1.000" expanded="true" height="82" name="Days" width="90" x="983" y="34"> <list key="function_descriptions"> <parameter key="Time_lapse" value="date_diff(date_parse_custom(%{min_day},&quot;dd/MM/yyyy&quot;),date_parse_custom(%{max_day},&quot;dd/MM/yyyy&quot;))/(1000*60*60*24)"/> </list> <description align="center" color="green" colored="true" width="126">Use this value as the number of examples on your Create Example Set</description> </operator> <operator activated="true" class="generate_data_user_specification" compatibility="9.1.000" expanded="true" 
height="68" name="Generate Data by User Specification" width="90" x="45" y="289"> <list key="attribute_values"> <parameter key="Min_Inicial" value="%{min_day}"/> </list> <list key="set_additional_roles"/> </operator> <operator activated="true" class="nominal_to_date" compatibility="9.1.000" expanded="true" height="82" name="Nominal to Date" width="90" x="179" y="289"> <parameter key="attribute_name" value="Min_Inicial"/> <parameter key="date_type" value="date"/> <parameter key="date_format" value="dd/MM/yyyy"/> <parameter key="time_zone" value="SYSTEM"/> <parameter key="locale" value="English (United States)"/> <parameter key="keep_old_attribute" value="false"/> </operator> <operator activated="true" class="adjust_date" compatibility="9.1.000" expanded="true" height="82" name="Adjust Date" width="90" x="313" y="289"> <parameter key="attribute_name" value="Min_Inicial"/> <list key="adjustments"> <parameter key="1" value="Day"/> </list> <parameter key="keep_old_attribute" value="false"/> </operator> <operator activated="true" class="date_to_nominal" compatibility="9.1.000" expanded="true" height="82" name="Date to Nominal (2)" width="90" x="447" y="289"> <parameter key="attribute_name" value="Min_Inicial"/> <parameter key="date_format" value="dd/MM/yyyy"/> <parameter key="time_zone" value="SYSTEM"/> <parameter key="locale" value="English (United States)"/> <parameter key="keep_old_attribute" value="false"/> </operator> <operator activated="true" class="extract_macro" compatibility="9.1.000" expanded="true" height="68" name="Min_Day (2)" width="90" x="581" y="289"> <parameter key="macro" value="min_day2"/> <parameter key="macro_type" value="data_value"/> <parameter key="statistics" value="min"/> <parameter key="attribute_name" value="Min_Inicial"/> <parameter key="example_index" value="1"/> <list key="additional_macros"/> </operator> <operator activated="true" class="operator_toolbox:create_exampleset" compatibility="1.7.000" expanded="true" height="68" name="Create 
ExampleSet" width="90" x="45" y="442"> <parameter key="generator_type" value="date_series"/> <parameter key="number_of_examples" value="10"/> <parameter key="use_stepsize" value="true"/> <list key="function_descriptions"/> <parameter key="add_id_attribute" value="false"/> <list key="numeric_series_configuration"/> <list key="date_series_configuration"> <parameter key="Series" value="%{min_day} 00:00.%{max_day} 00:00"/> </list> <list key="date_series_configuration (interval)"> <parameter key="Day_in_series" value="%{min_day2} 00:00.1.day"/> </list> <parameter key="date_format" value="dd/MM/yyyy HH:mm"/> <parameter key="column_separator" value=","/> <parameter key="parse_all_as_nominal" value="false"/> <parameter key="decimal_point_character" value="."/> <parameter key="trim_attribute_names" value="true"/> </operator> <operator activated="true" class="date_to_nominal" compatibility="9.1.000" expanded="true" height="82" name="Date to Nominal (4)" width="90" x="246" y="442"> <parameter key="attribute_name" value="Day_in_series"/> <parameter key="date_format" value="dd/MM/yyyy"/> <parameter key="time_zone" value="SYSTEM"/> <parameter key="locale" value="English (United States)"/> <parameter key="keep_old_attribute" value="false"/> </operator> <operator activated="true" class="concurrency:join" compatibility="9.1.000" expanded="true" height="82" name="Join Info" width="90" x="514" y="442"> <parameter key="remove_double_attributes" value="true"/> <parameter key="join_type" value="left"/> <parameter key="use_id_attribute_as_key" value="false"/> <list key="key_attributes"> <parameter key="Day_in_series" value="DAY"/> </list> <parameter key="keep_both_join_attributes" value="false"/> <description align="center" color="orange" colored="true" width="126">Joining original Data with the date Series to find missings</description> </operator> <connect from_op="Your Data Set" from_port="out 1" to_op="Generate ID (2)" to_port="example set input"/> <connect from_op="Generate ID (2)" 
from_port="example set output" to_op="Sort" to_port="example set input"/> <connect from_op="Sort" from_port="example set output" to_op="Date to Nominal" to_port="example set input"/> <connect from_op="Date to Nominal" from_port="example set output" to_op="Multiply" to_port="input"/> <connect from_op="Multiply" from_port="output 1" to_op="Min_Day" to_port="example set"/> <connect from_op="Multiply" from_port="output 2" to_op="Join Info" to_port="right"/> <connect from_op="Min_Day" from_port="example set" to_op="Sort (2)" to_port="example set input"/> <connect from_op="Sort (2)" from_port="example set output" to_op="Max_Day" to_port="example set"/> <connect from_op="Max_Day" from_port="example set" to_op="Days" to_port="through 1"/> <connect from_op="Generate Data by User Specification" from_port="output" to_op="Nominal to Date" to_port="example set input"/> <connect from_op="Nominal to Date" from_port="example set output" to_op="Adjust Date" to_port="example set input"/> <connect from_op="Adjust Date" from_port="example set output" to_op="Date to Nominal (2)" to_port="example set input"/> <connect from_op="Date to Nominal (2)" from_port="example set output" to_op="Min_Day (2)" to_port="example set"/> <connect from_op="Create ExampleSet" from_port="output" to_op="Date to Nominal (4)" to_port="example set input"/> <connect from_op="Date to Nominal (4)" from_port="example set output" to_op="Join Info" to_port="left"/> <connect from_op="Join Info" from_port="join" to_port="result 1"/> <portSpacing port="source_input 1" spacing="0"/> <portSpacing port="sink_result 1" spacing="0"/> <portSpacing port="sink_result 2" spacing="0"/> <description align="center" color="red" colored="true" height="241" resized="true" width="983" x="164" y="10">Extracting the min and max date on your data set and calculating the number of days the series needs to cover<br/></description> <description align="center" color="yellow" colored="false" height="442" resized="true" width="892" x="31" 
y="252">Generating a Data Set that includes all the dates that are covered by your information and joining with your original DataSet</description> </process> </operator> </process>
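The idea behind the process above (take the earliest and latest date, generate every expected date in between, then left-join the original data onto that calendar) can be sketched outside RapidMiner as well. The following is only an illustration in plain Python under stated assumptions: the function name `fill_date_gaps`, the `step_days` parameter and the sample scores are hypothetical and do not come from the process. For weekly data recorded every Monday, as in the original question, the step would be 7 days.

```python
from datetime import date, timedelta
from typing import Dict, List, Optional, Tuple


def fill_date_gaps(
    observations: Dict[date, float], step_days: int = 1
) -> List[Tuple[date, Optional[float]]]:
    """Return one row per expected date between the earliest and the latest
    observed date; dates without an observation get None as their value."""
    if step_days < 1:
        raise ValueError("step_days must be at least 1")
    if not observations:
        return []
    start, end = min(observations), max(observations)
    rows: List[Tuple[date, Optional[float]]] = []
    current = start
    while current <= end:
        # dict.get acts like the left join: the date is always kept,
        # the value is None when the original data has no row for it.
        rows.append((current, observations.get(current)))
        current += timedelta(days=step_days)
    return rows


if __name__ == "__main__":
    # Made-up weekly data (Mondays); 21 January 2019 is missing.
    weekly = {
        date(2019, 1, 7): 3.2,
        date(2019, 1, 14): 3.5,
        date(2019, 1, 28): 4.1,
    }
    for day, score in fill_date_gaps(weekly, step_days=7):
        print(day.isoformat(), score)
```

The rows that come back with an empty value are exactly the gaps; they can then be replaced by an interpolated or averaged value in a later step.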
-
Hi guys,
Thanks for your replies.
@MarcoBarradas I think your proposal is closer to what I need.
However, I'm still struggling to fit it into my process.
Regards,
Pedro
-
OK, let me know if there is any way I can help.
I guess you speak Spanish; if it is easier for you, we can switch languages.
-
Hi Marco, I'm Portuguese and, looking at your name, I guess you are too.
Sorry for the late reply, but I've been away from my desk.
I tried to adapt the process you supplied, but without success.
Two questions:
1) I have enclosed the "adapted" process. What am I doing wrong? I feel that I need a couple of spare hours to understand the whole process, which is why I made only a few adaptations.
2) Once 1) is working, how can I apply it to fill my time series?
Is the output supposed to be already the time series with no gaps?
The output I'm getting doesn't look correct to me.
Thanks,
Pedro
-
Hi @pedrodomingosdv, I'm actually from Mexico, so I guess we need to stick with English then.
I've made some changes to the process and connected it to the mock file you gave me. In my example the date attribute was named DAY, whereas your file uses DATE.
I'm also attaching a picture of the process, and I added a breakpoint on the Days operator. Please check what value the Time_lapse macro holds at that point and enter that value as the number of examples in the Create ExampleSet operator.
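As a cross-check for the value shown at the breakpoint: the Time_lapse expression in this version of the process divides a millisecond difference by 1000*60*60*24 and adds 1, so that both the first and the last day are counted. A small Python sketch of the same arithmetic follows; the function name `inclusive_day_count` and the example dates are assumptions for illustration, and the dates are treated as plain calendar dates without time zones.

```python
from datetime import datetime, timedelta

MS_PER_DAY = 1000 * 60 * 60 * 24


def inclusive_day_count(min_day: str, max_day: str, fmt: str = "%d/%m/%Y") -> int:
    """Number of daily rows needed to cover min_day..max_day, both included."""
    start = datetime.strptime(min_day, fmt)
    end = datetime.strptime(max_day, fmt)
    # Difference in whole milliseconds, mirroring a millisecond-based date diff.
    elapsed_ms = (end - start) // timedelta(milliseconds=1)
    # The +1 accounts for the first day itself (fencepost correction).
    return elapsed_ms // MS_PER_DAY + 1


if __name__ == "__main__":
    # Hypothetical range: 1 to 5 January 2018 needs five daily rows.
    print(inclusive_day_count("01/01/2018", "05/01/2018"))
```

Without the +1, a series generated with that many examples would stop one day short of the latest date in the data.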
sgenzer do you know how I can set the number of examples of the Create ExampleSet operator through a macro? I've tried with set macro (real) but it does not work.
<?xml version="1.0" encoding="UTF-8"?><process version="9.1.000"> <context> <input/> <output/> <macros/> </context> <operator activated="true" class="process" compatibility="9.1.000" expanded="true" name="Process"> <parameter key="logverbosity" value="init"/> <parameter key="random_seed" value="2001"/> <parameter key="send_mail" value="never"/> <parameter key="notification_email" value=""/> <parameter key="process_duration_for_mail" value="30"/> <parameter key="encoding" value="SYSTEM"/> <process expanded="true"> <operator activated="true" class="read_excel" compatibility="9.1.000" expanded="true" height="68" name="Read Excel" width="90" x="45" y="136"> <parameter key="excel_file" value="C:\Users\mbarradas\Downloads\XL Mock File.xlsx"/> <parameter key="sheet_selection" value="sheet number"/> <parameter key="sheet_number" value="1"/> <parameter key="imported_cell_range" value="A1"/> <parameter key="encoding" value="SYSTEM"/> <parameter key="first_row_as_names" value="true"/> <list key="annotations"/> <parameter key="date_format" value=""/> <parameter key="time_zone" value="SYSTEM"/> <parameter key="locale" value="English (United States)"/> <parameter key="read_all_values_as_polynominal" value="false"/> <list key="data_set_meta_data_information"> <parameter key="0" value="DATE.true.date.attribute"/> <parameter key="1" value="SCORE.true.real.attribute"/> </list> <parameter key="read_not_matching_values_as_missings" value="false"/> <parameter key="datamanagement" value="double_array"/> <parameter key="data_management" value="auto"/> </operator> <operator activated="true" class="generate_id" compatibility="9.1.000" expanded="true" height="82" name="Generate ID (2)" width="90" x="179" y="34"> <parameter key="create_nominal_ids" value="false"/> <parameter key="offset" value="0"/> </operator> <operator activated="true" 
class="sort" compatibility="9.1.000" expanded="true" height="82" name="Sort" width="90" x="313" y="34"> <parameter key="attribute_name" value="DATE"/> <parameter key="sorting_direction" value="increasing"/> </operator> <operator activated="true" class="date_to_nominal" compatibility="9.1.000" expanded="true" height="82" name="Date to Nominal" width="90" x="447" y="34"> <parameter key="attribute_name" value="DATE"/> <parameter key="date_format" value="dd/MM/yyyy"/> <parameter key="time_zone" value="SYSTEM"/> <parameter key="locale" value="English (United States)"/> <parameter key="keep_old_attribute" value="false"/> </operator> <operator activated="true" class="multiply" compatibility="9.1.000" expanded="true" height="103" name="Multiply" width="90" x="380" y="136"/> <operator activated="true" class="extract_macro" compatibility="9.1.000" expanded="true" height="68" name="Min_Day" width="90" x="581" y="34"> <parameter key="macro" value="min_day"/> <parameter key="macro_type" value="data_value"/> <parameter key="statistics" value="min"/> <parameter key="attribute_name" value="DATE"/> <parameter key="example_index" value="1"/> <list key="additional_macros"/> </operator> <operator activated="true" class="sort" compatibility="9.1.000" expanded="true" height="82" name="Sort (2)" width="90" x="715" y="34"> <parameter key="attribute_name" value="id"/> <parameter key="sorting_direction" value="decreasing"/> </operator> <operator activated="true" class="extract_macro" compatibility="9.1.000" expanded="true" height="68" name="Max_Day" width="90" x="849" y="34"> <parameter key="macro" value="max_day"/> <parameter key="macro_type" value="data_value"/> <parameter key="statistics" value="max"/> <parameter key="attribute_name" value="DATE"/> <parameter key="example_index" value="1"/> <list key="additional_macros"/> </operator> <operator activated="true" breakpoints="after" class="generate_macro" compatibility="9.1.000" expanded="true" height="82" name="Days" width="90" x="983" 
y="34"> <list key="function_descriptions"> <parameter key="Time_lapse" value="(date_diff(date_parse_custom(%{min_day},&quot;dd/MM/yyyy&quot;),date_parse_custom(%{max_day},&quot;dd/MM/yyyy&quot;))/(1000*60*60*24))+1"/> </list> <description align="center" color="green" colored="true" width="126">Use this value as the number of examples on your Create Example Set</description> </operator> <operator activated="false" class="subprocess" compatibility="9.1.000" expanded="true" height="82" name="Your Data Set" width="90" x="45" y="34"> <process expanded="true"> <operator activated="true" class="operator_toolbox:create_exampleset" compatibility="1.7.000" expanded="true" height="68" name="Create ExampleSet (range)" origin="GENERATED_TUTORIAL" width="90" x="45" y="34"> <parameter key="generator_type" value="date_series"/> <parameter key="number_of_examples" value="10"/> <parameter key="use_stepsize" value="true"/> <list key="function_descriptions"/> <parameter key="add_id_attribute" value="false"/> <list key="numeric_series_configuration"/> <list key="date_series_configuration"> <parameter key="Days" value="2018-01-01 00:00:00.2019-01-01 00:00:00"/> </list> <list key="date_series_configuration (interval)"> <parameter key="DAY" value="2019-01-01 00:00:00.1.day"/> </list> <parameter key="date_format" value="yyyy-MM-dd HH:mm:ss"/> <parameter key="column_separator" value=","/> <parameter key="parse_all_as_nominal" value="false"/> <parameter key="decimal_point_character" value="."/> <parameter key="trim_attribute_names" value="true"/> </operator> <operator activated="true" class="generate_id" compatibility="9.1.000" expanded="true" height="82" name="Generate ID" width="90" x="179" y="34"> <parameter key="create_nominal_ids" value="false"/> <parameter key="offset" value="0"/> </operator> <operator activated="true" class="operator_toolbox:create_exampleset" compatibility="1.7.000" expanded="true" height="68" name="Create ExampleSet (2)" width="90" x="45" y="238"> <parameter key="generator_type" 
value="numeric_series"/> <parameter key="number_of_examples" value="10"/> <parameter key="use_stepsize" value="false"/> <list key="function_descriptions"/> <parameter key="add_id_attribute" value="false"/> <list key="numeric_series_configuration"> <parameter key="Value" value="linear.0\.0.1\.0"/> </list> <list key="date_series_configuration"/> <list key="date_series_configuration (interval)"/> <parameter key="date_format" value="yyyy-MM-dd HH:mm:ss"/> <parameter key="column_separator" value=","/> <parameter key="parse_all_as_nominal" value="false"/> <parameter key="decimal_point_character" value="."/> <parameter key="trim_attribute_names" value="true"/> </operator> <operator activated="true" class="generate_id" compatibility="9.1.000" expanded="true" height="82" name="Generate ID (3)" width="90" x="179" y="238"> <parameter key="create_nominal_ids" value="false"/> <parameter key="offset" value="0"/> </operator> <operator activated="true" class="concurrency:join" compatibility="9.1.000" expanded="true" height="82" name="Join" width="90" x="313" y="136"> <parameter key="remove_double_attributes" value="true"/> <parameter key="join_type" value="inner"/> <parameter key="use_id_attribute_as_key" value="false"/> <list key="key_attributes"> <parameter key="id" value="id"/> </list> <parameter key="keep_both_join_attributes" value="false"/> </operator> <operator activated="true" class="numerical_to_polynominal" compatibility="9.1.000" expanded="true" height="82" name="Numerical to Polynominal" width="90" x="313" y="34"> <parameter key="attribute_filter_type" value="single"/> <parameter key="attribute" value="id"/> <parameter key="attributes" value=""/> <parameter key="use_except_expression" value="false"/> <parameter key="value_type" value="numeric"/> <parameter key="use_value_type_exception" value="false"/> <parameter key="except_value_type" value="real"/> <parameter key="block_type" value="value_series"/> <parameter key="use_block_type_exception" value="false"/> <parameter 
key="except_block_type" value="value_series_end"/> <parameter key="invert_selection" value="false"/> <parameter key="include_special_attributes" value="true"/> </operator> <operator activated="true" class="filter_examples" compatibility="9.1.000" expanded="true" height="103" name="Filter Examples" width="90" x="447" y="34"> <parameter key="parameter_expression" value=""/> <parameter key="condition_class" value="custom_filters"/> <parameter key="invert_filter" value="false"/> <list key="filters_list"> <parameter key="filters_entry_key" value="id.is_not_in.2;7"/> </list> <parameter key="filters_logic_and" value="false"/> <parameter key="filters_check_metadata" value="true"/> </operator> <operator activated="true" class="select_attributes" compatibility="9.1.000" expanded="true" height="82" name="Select Attributes" width="90" x="581" y="34"> <parameter key="attribute_filter_type" value="subset"/> <parameter key="attribute" value="DAY"/> <parameter key="attributes" value="|Value|DAY"/> <parameter key="use_except_expression" value="false"/> <parameter key="value_type" value="attribute_value"/> <parameter key="use_value_type_exception" value="false"/> <parameter key="except_value_type" value="time"/> <parameter key="block_type" value="attribute_block"/> <parameter key="use_block_type_exception" value="false"/> <parameter key="except_block_type" value="value_matrix_row_start"/> <parameter key="invert_selection" value="false"/> <parameter key="include_special_attributes" value="true"/> </operator> <operator activated="true" class="remember" compatibility="9.1.000" expanded="true" height="68" name="Remember" width="90" x="715" y="34"> <parameter key="name" value="DataSet"/> <parameter key="io_object" value="ExampleSet"/> <parameter key="store_which" value="1"/> <parameter key="remove_from_process" value="true"/> </operator> <connect from_op="Create ExampleSet (range)" from_port="output" to_op="Generate ID" to_port="example set input"/> <connect from_op="Generate ID" 
from_port="example set output" to_op="Join" to_port="left"/> <connect from_op="Create ExampleSet (2)" from_port="output" to_op="Generate ID (3)" to_port="example set input"/> <connect from_op="Generate ID (3)" from_port="example set output" to_op="Join" to_port="right"/> <connect from_op="Join" from_port="join" to_op="Numerical to Polynominal" to_port="example set input"/> <connect from_op="Numerical to Polynominal" from_port="example set output" to_op="Filter Examples" to_port="example set input"/> <connect from_op="Filter Examples" from_port="example set output" to_op="Select Attributes" to_port="example set input"/> <connect from_op="Select Attributes" from_port="example set output" to_op="Remember" to_port="store"/> <connect from_op="Remember" from_port="stored" to_port="out 1"/> <portSpacing port="source_in 1" spacing="0"/> <portSpacing port="sink_out 1" spacing="0"/> <portSpacing port="sink_out 2" spacing="0"/> </process> </operator> <operator activated="true" class="generate_data_user_specification" compatibility="9.1.000" expanded="true" height="68" name="Generate Data by User Specification" width="90" x="45" y="289"> <list key="attribute_values"> <parameter key="Min_Inicial" value="%{min_day}"/> </list> <list key="set_additional_roles"/> </operator> <operator activated="true" class="nominal_to_date" compatibility="9.1.000" expanded="true" height="82" name="Nominal to Date" width="90" x="179" y="289"> <parameter key="attribute_name" value="Min_Inicial"/> <parameter key="date_type" value="date"/> <parameter key="date_format" value="dd/MM/yyyy"/> <parameter key="time_zone" value="SYSTEM"/> <parameter key="locale" value="English (United States)"/> <parameter key="keep_old_attribute" value="false"/> </operator> <operator activated="true" class="adjust_date" compatibility="9.1.000" expanded="true" height="82" name="Adjust Date" width="90" x="313" y="289"> <parameter key="attribute_name" value="Min_Inicial"/> <list key="adjustments"> <parameter key="1" 
value="Day"/> </list> <parameter key="keep_old_attribute" value="false"/> </operator> <operator activated="true" class="date_to_nominal" compatibility="9.1.000" expanded="true" height="82" name="Date to Nominal (2)" width="90" x="447" y="289"> <parameter key="attribute_name" value="Min_Inicial"/> <parameter key="date_format" value="dd/MM/yyyy"/> <parameter key="time_zone" value="SYSTEM"/> <parameter key="locale" value="English (United States)"/> <parameter key="keep_old_attribute" value="false"/> </operator> <operator activated="true" class="extract_macro" compatibility="9.1.000" expanded="true" height="68" name="Min_Day (2)" width="90" x="581" y="289"> <parameter key="macro" value="min_day2"/> <parameter key="macro_type" value="data_value"/> <parameter key="statistics" value="min"/> <parameter key="attribute_name" value="Min_Inicial"/> <parameter key="example_index" value="1"/> <list key="additional_macros"/> </operator> <operator activated="true" class="operator_toolbox:create_exampleset" compatibility="1.7.000" expanded="true" height="68" name="Create ExampleSet" width="90" x="45" y="442"> <parameter key="generator_type" value="date_series"/> <parameter key="number_of_examples" value="736"/> <parameter key="use_stepsize" value="true"/> <list key="function_descriptions"/> <parameter key="add_id_attribute" value="false"/> <list key="numeric_series_configuration"/> <list key="date_series_configuration"> <parameter key="Series" value="%{min_day} 00:00.%{max_day} 00:00"/> </list> <list key="date_series_configuration (interval)"> <parameter key="Day_in_series" value="%{min_day2} 00:00.1.day"/> </list> <parameter key="date_format" value="dd/MM/yyyy HH:mm"/> <parameter key="column_separator" value=","/> <parameter key="parse_all_as_nominal" value="false"/> <parameter key="decimal_point_character" value="."/> <parameter key="trim_attribute_names" value="true"/> </operator> <operator activated="true" class="date_to_nominal" compatibility="9.1.000" expanded="true" 
height="82" name="Date to Nominal (4)" width="90" x="246" y="442"> <parameter key="attribute_name" value="Day_in_series"/> <parameter key="date_format" value="dd/MM/yyyy"/> <parameter key="time_zone" value="SYSTEM"/> <parameter key="locale" value="English (United States)"/> <parameter key="keep_old_attribute" value="false"/> </operator> <operator activated="true" class="concurrency:join" compatibility="9.1.000" expanded="true" height="82" name="Join Info" width="90" x="514" y="442"> <parameter key="remove_double_attributes" value="true"/> <parameter key="join_type" value="left"/> <parameter key="use_id_attribute_as_key" value="false"/> <list key="key_attributes"> <parameter key="Day_in_series" value="DATE"/> </list> <parameter key="keep_both_join_attributes" value="false"/> <description align="center" color="orange" colored="true" width="126">Joining original Data with the date Series to find missings</description> </operator> <connect from_op="Read Excel" from_port="output" to_op="Generate ID (2)" to_port="example set input"/> <connect from_op="Generate ID (2)" from_port="example set output" to_op="Sort" to_port="example set input"/> <connect from_op="Sort" from_port="example set output" to_op="Date to Nominal" to_port="example set input"/> <connect from_op="Date to Nominal" from_port="example set output" to_op="Multiply" to_port="input"/> <connect from_op="Multiply" from_port="output 1" to_op="Min_Day" to_port="example set"/> <connect from_op="Multiply" from_port="output 2" to_op="Join Info" to_port="right"/> <connect from_op="Min_Day" from_port="example set" to_op="Sort (2)" to_port="example set input"/> <connect from_op="Sort (2)" from_port="example set output" to_op="Max_Day" to_port="example set"/> <connect from_op="Max_Day" from_port="example set" to_op="Days" to_port="through 1"/> <connect from_op="Generate Data by User Specification" from_port="output" to_op="Nominal to Date" to_port="example set input"/> <connect from_op="Nominal to Date" 
from_port="example set output" to_op="Adjust Date" to_port="example set input"/> <connect from_op="Adjust Date" from_port="example set output" to_op="Date to Nominal (2)" to_port="example set input"/> <connect from_op="Date to Nominal (2)" from_port="example set output" to_op="Min_Day (2)" to_port="example set"/> <connect from_op="Create ExampleSet" from_port="output" to_op="Date to Nominal (4)" to_port="example set input"/> <connect from_op="Date to Nominal (4)" from_port="example set output" to_op="Join Info" to_port="left"/> <connect from_op="Join Info" from_port="join" to_port="result 1"/> <portSpacing port="source_input 1" spacing="0"/> <portSpacing port="sink_result 1" spacing="0"/> <portSpacing port="sink_result 2" spacing="0"/> <description align="center" color="red" colored="true" height="242" resized="true" width="1125" x="20" y="10">Extracting the min and max date on your data set and calculating teh amount of days the series need to create<br></description> <description align="center" color="yellow" colored="false" height="442" resized="true" width="892" x="31" y="252">Generating a Data Set that includes all teh dates that are covered by your information and joining with your original DataSet</description> </process> </operator> </process>
-
Hi Marco,
Done
I made two small changes to your process:
1) "Adjust Date" was removed. It was adding one day to each row, which in the end caused an incorrect join.
2) In the last part of the process I added "Nominal to Date" so that the values are dates again.
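For anyone who wants to check the logic outside RapidMiner, here is a minimal sketch of the same idea in plain Python: build the complete date series between the first and last date, look up each date in the original data (the equivalent of the left join), and then interpolate the missing values. The function names `fill_date_gaps` and `interpolate_linear` and the sample values are made up for illustration; they are not part of the process above.

```python
from datetime import date, timedelta


def fill_date_gaps(rows, step_days=7):
    """Return a list of (date, value) covering min..max of `rows` in fixed steps.

    `rows` maps date -> value. Dates missing from `rows` get the value None,
    which mirrors what a left join against a generated date series produces.
    """
    start, end = min(rows), max(rows)
    filled = []
    current = start
    while current <= end:
        filled.append((current, rows.get(current)))
        current += timedelta(days=step_days)
    return filled


def interpolate_linear(series):
    """Replace None values that lie between two known values by linear interpolation."""
    values = [v for _, v in series]
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        span = right - left
        for i in range(left + 1, right):
            weight = (i - left) / span
            values[i] = values[left] + weight * (values[right] - values[left])
    return [(d, v) for (d, _), v in zip(series, values)]


# Hypothetical weekly data recorded on Mondays; the week of 2019-01-21 is missing.
weekly = {
    date(2019, 1, 7): 10.0,
    date(2019, 1, 14): 12.0,
    date(2019, 1, 28): 16.0,
}

filled = fill_date_gaps(weekly, step_days=7)
completed = interpolate_linear(filled)

for day, value in completed:
    print(day.isoformat(), value)
```

The lookup only matches when the generated dates fall exactly on the dates in the data. A series shifted by one day matches nothing, which is the same kind of mismatch the extra "Adjust Date" step was causing in the join.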
-
Hi @MarcoBarradas, I'm checking with @mschmitz on the macro in Create ExampleSet...