Sales Forecasting using ARIMA
ScottBett8
New Altair Community Member
Hi everyone,
I'm still new to RapidMiner and am having problems deploying a sales forecast model using ARIMA. My data is one year of sales transactions (60,000+ records).
Label: Tonnage
The purpose is to forecast tonnage, but the forecast needs to be broken down by product group (the Group_Name column).
For example, the forecast should look like this (this is an assumption; it may need an additional column for more information):
I tried to follow Dr. Fabian Temme's video about time series forecasting, but still no luck.
Please help.
Best regards,
ScottBett
Answers
Hello, @ScottBett8
Sorry nobody has chimed in. Do you still have issues with ARIMA forecasting? I might be able to help.
All the best,
Rod
Hi @ScottBett8,
You could use a Group Into Collection and a Loop Collection operator for your use case.
It seems that you need to forecast Qty in your problem, since Tonnage is the result of Qty (a predictable variable) and Price (you may already have those prices, or you may need to predict them ahead of time).
You'll need to install the Operator Toolbox extension to get access to the Group Into Collection operator.
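For readers who want to see the group-then-loop idea outside RapidMiner, here is a minimal Python sketch. It is not the ARIMA operator: `fit_ar1` is a simplified AR(1) least-squares stand-in chosen so the example needs no extra libraries, and all function names are hypothetical. The `Customer` and `Qty` keys mirror the attributes used in the example process.

```python
from collections import defaultdict


def group_series(rows, key="Customer", value="Qty"):
    """Split date-sorted rows into one list of values per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(float(row[value]))
    return dict(groups)


def fit_ar1(series):
    """Least-squares fit of y[t] = c + phi * y[t-1]; returns (c, phi).

    Simplified stand-in for ARIMA, used only for illustration.
    """
    if len(series) < 2:
        raise ValueError("need at least two observations")
    x, y = series[:-1], series[1:]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x)
    if var_x == 0:
        # Constant history: fall back to predicting the mean.
        return mean_y, 0.0
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    phi = cov / var_x
    return mean_y - phi * mean_x, phi


def forecast(series, horizon):
    """Roll the fitted AR(1) forward for `horizon` steps."""
    c, phi = fit_ar1(series)
    out, last = [], series[-1]
    for _ in range(horizon):
        last = c + phi * last
        out.append(last)
    return out


def forecast_per_group(rows, horizon=12, key="Customer", value="Qty"):
    """One independent model and forecast per group (the 'loop collection' step)."""
    grouped = group_series(rows, key, value)
    return {name: forecast(series, horizon) for name, series in grouped.items()}
```

Each group gets its own model, so a group with a short or constant history does not affect the others.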
I would also suggest you try the Forecasting Extension.
https://community.rapidminer.com/discussion/comment/66543#Comment_66543
If you want access to more time series training, log into our free course:
https://academy.rapidminer.com/learn/course/time-series-analytics/time-series-analytics/data-preparation-and-analysis
<?xml version="1.0" encoding="UTF-8"?><process version="9.9.002"> <context> <input/> <output/> <macros/> </context> <operator activated="true" class="process" compatibility="9.9.002" expanded="true" name="Process"> <parameter key="logverbosity" value="init"/> <parameter key="random_seed" value="-1"/> <parameter key="send_mail" value="never"/> <parameter key="notification_email" value=""/> <parameter key="process_duration_for_mail" value="30"/> <parameter key="encoding" value="SYSTEM"/> <process expanded="true"> <operator activated="true" class="subprocess" compatibility="9.9.002" expanded="true" height="82" name="Fake_Data" width="90" x="112" y="34"> <process expanded="true"> <operator activated="true" class="utility:create_exampleset" compatibility="9.9.002" expanded="true" height="68" name="Customers" width="90" x="45" y="34"> <parameter key="generator_type" value="comma separated text"/> <parameter key="number_of_examples" value="100"/> <parameter key="use_stepsize" value="false"/> <list key="function_descriptions"/> <parameter key="add_id_attribute" value="false"/> <list key="numeric_series_configuration"/> <list key="date_series_configuration"/> <list key="date_series_configuration (interval)"/> <parameter key="date_format" value="yyyy-MM-dd HH:mm:ss"/> <parameter key="time_zone" value="SYSTEM"/> <parameter key="input_csv_text" value="Customer Customer_A Customer_B Customer_C Customer_D Customer_E Customer_F"/> <parameter key="column_separator" value=","/> <parameter key="parse_all_as_nominal" value="false"/> <parameter key="decimal_point_character" value="."/> <parameter key="trim_attribute_names" value="true"/> </operator> <operator activated="true" class="loop_examples" compatibility="9.9.002" expanded="true" height="103" name="Loop Examples" width="90" x="179" y="34"> <parameter key="iteration_macro" value="example"/> <process 
expanded="true"> <operator activated="true" class="extract_macro" compatibility="9.9.002" expanded="true" height="68" name="Extract Macro" width="90" x="45" y="34"> <parameter key="macro" value="customer"/> <parameter key="macro_type" value="data_value"/> <parameter key="statistics" value="average"/> <parameter key="attribute_name" value="Customer"/> <parameter key="example_index" value="%{example}"/> <list key="additional_macros"/> </operator> <operator activated="true" class="utility:create_exampleset" compatibility="9.9.002" expanded="true" height="68" name="Days" width="90" x="179" y="34"> <parameter key="generator_type" value="date series"/> <parameter key="number_of_examples" value="365"/> <parameter key="use_stepsize" value="false"/> <list key="function_descriptions"/> <parameter key="add_id_attribute" value="false"/> <list key="numeric_series_configuration"/> <list key="date_series_configuration"> <parameter key="Date" value="2020-01-01.2020-12-31"/> </list> <list key="date_series_configuration (interval)"/> <parameter key="date_format" value="yyyy-MM-dd"/> <parameter key="time_zone" value="SYSTEM"/> <parameter key="column_separator" value=","/> <parameter key="parse_all_as_nominal" value="false"/> <parameter key="decimal_point_character" value="."/> <parameter key="trim_attribute_names" value="true"/> </operator> <operator activated="true" class="generate_data" compatibility="9.9.002" expanded="true" height="68" name="Sales" width="90" x="179" y="136"> <parameter key="target_function" value="random"/> <parameter key="number_examples" value="365"/> <parameter key="number_of_attributes" value="1"/> <parameter key="attributes_lower_bound" value="20.0"/> <parameter key="attributes_upper_bound" value="150.0"/> <parameter key="gaussian_standard_deviation" value="10.0"/> <parameter key="largest_radius" value="10.0"/> <parameter key="use_local_random_seed" value="false"/> <parameter key="local_random_seed" value="1992"/> <parameter key="datamanagement" 
value="double_array"/> <parameter key="data_management" value="auto"/> </operator> <operator activated="true" class="select_attributes" compatibility="9.9.002" expanded="true" height="82" name="Select Attributes (3)" width="90" x="313" y="136"> <parameter key="attribute_filter_type" value="single"/> <parameter key="attribute" value="label"/> <parameter key="attributes" value=""/> <parameter key="use_except_expression" value="false"/> <parameter key="value_type" value="attribute_value"/> <parameter key="use_value_type_exception" value="false"/> <parameter key="except_value_type" value="time"/> <parameter key="block_type" value="attribute_block"/> <parameter key="use_block_type_exception" value="false"/> <parameter key="except_block_type" value="value_matrix_row_start"/> <parameter key="invert_selection" value="true"/> <parameter key="include_special_attributes" value="true"/> </operator> <operator activated="true" class="blending:rename" compatibility="9.9.002" expanded="true" height="82" name="Rename" width="90" x="447" y="136"> <list key="rename attributes"> <parameter key="att1" value="Qty"/> </list> <parameter key="from_attribute" value=""/> <parameter key="to_attribute" value=""/> </operator> <operator activated="true" class="operator_toolbox:merge" compatibility="2.11.000" expanded="true" height="103" name="Merge Attributes (2)" width="90" x="648" y="85"> <parameter key="handling_of_duplicate_attributes" value="rename"/> <parameter key="handling_of_special_attributes" value="keep_first_special_other_regular"/> <parameter key="handling_of_duplicate_annotations" value="rename"/> </operator> <operator activated="true" class="generate_attributes" compatibility="9.9.002" expanded="true" height="82" name="Generate Attributes (5)" width="90" x="782" y="85"> <list key="function_descriptions"> <parameter key="Customer" value="%{customer}"/> <parameter key="Qty" value="round(Qty,0)"/> </list> <parameter key="keep_all" value="true"/> </operator> <connect 
from_port="example set" to_op="Extract Macro" to_port="example set"/> <connect from_op="Days" from_port="output" to_op="Merge Attributes (2)" to_port="example set 1"/> <connect from_op="Sales" from_port="output" to_op="Select Attributes (3)" to_port="example set input"/> <connect from_op="Select Attributes (3)" from_port="example set output" to_op="Rename" to_port="example set input"/> <connect from_op="Rename" from_port="example set output" to_op="Merge Attributes (2)" to_port="example set 2"/> <connect from_op="Merge Attributes (2)" from_port="merged set" to_op="Generate Attributes (5)" to_port="example set input"/> <connect from_op="Generate Attributes (5)" from_port="example set output" to_port="output 1"/> <portSpacing port="source_example set" spacing="0"/> <portSpacing port="sink_example set" spacing="0"/> <portSpacing port="sink_output 1" spacing="0"/> <portSpacing port="sink_output 2" spacing="0"/> </process> </operator> <operator activated="true" class="append" compatibility="9.9.002" expanded="true" height="82" name="Append" width="90" x="313" y="34"> <parameter key="datamanagement" value="double_array"/> <parameter key="data_management" value="auto"/> <parameter key="merge_type" value="all"/> </operator> <operator activated="true" class="blending:sort" compatibility="9.9.002" expanded="true" height="82" name="Sort" width="90" x="447" y="34"> <list key="sort_by"> <parameter key="Date" value="ascending"/> <parameter key="Customer" value="ascending"/> </list> </operator> <connect from_op="Customers" from_port="output" to_op="Loop Examples" to_port="example set"/> <connect from_op="Loop Examples" from_port="output 1" to_op="Append" to_port="example set 1"/> <connect from_op="Append" from_port="merged set" to_op="Sort" to_port="example set input"/> <connect from_op="Sort" from_port="example set output" to_port="out 1"/> <portSpacing port="source_in 1" spacing="0"/> <portSpacing port="sink_out 1" spacing="0"/> <portSpacing port="sink_out 2" spacing="0"/> 
</process> </operator> <operator activated="true" class="operator_toolbox:group_into_collection" compatibility="2.11.000" expanded="true" height="82" name="Group Into Collection" width="90" x="313" y="34"> <parameter key="group_by_attribute" value="Customer"/> <parameter key="group_by_attribute (numerical)" value=""/> <parameter key="sorting_order" value="alphabetical"/> </operator> <operator activated="true" class="loop_collection" compatibility="9.9.002" expanded="true" height="103" name="Loop Collection" width="90" x="514" y="34"> <parameter key="set_iteration_macro" value="false"/> <parameter key="macro_name" value="iteration"/> <parameter key="macro_start_value" value="1"/> <parameter key="unfold" value="false"/> <process expanded="true"> <operator activated="true" class="time_series:arima_trainer" compatibility="9.9.002" expanded="true" height="103" name="ARIMA" width="90" x="112" y="34"> <parameter key="time_series_attribute" value="Qty"/> <parameter key="has_indices" value="true"/> <parameter key="indices_attribute" value="Date"/> <parameter key="p:_order_of_the_autoregressive_model" value="1"/> <parameter key="d:_degree_of_differencing" value="0"/> <parameter key="q:_order_of_the_moving-average_model" value="1"/> <parameter key="estimate_constant" value="true"/> <parameter key="main_criterion" value="aic"/> </operator> <operator activated="true" class="time_series:apply_forecast" compatibility="9.9.002" expanded="true" height="82" name="Apply Forecast" width="90" x="313" y="34"> <parameter key="forecast_horizon" value="12"/> <parameter key="add_original_time_series" value="true"/> <parameter key="add_combined_time_series" value="true"/> </operator> <connect from_port="single" to_op="ARIMA" to_port="example set"/> <connect from_op="ARIMA" from_port="forecast model" to_op="Apply Forecast" to_port="forecast model"/> <connect from_op="Apply Forecast" from_port="example set" to_port="output 2"/> <portSpacing port="source_single" spacing="0"/> <portSpacing 
port="sink_output 1" spacing="0"/> <portSpacing port="sink_output 2" spacing="0"/> <portSpacing port="sink_output 3" spacing="0"/> </process> </operator> <connect from_op="Fake_Data" from_port="out 1" to_op="Group Into Collection" to_port="exa"/> <connect from_op="Group Into Collection" from_port="col" to_op="Loop Collection" to_port="collection"/> <connect from_op="Loop Collection" from_port="output 1" to_port="result 1"/> <portSpacing port="source_input 1" spacing="0"/> <portSpacing port="sink_result 1" spacing="0"/> <portSpacing port="sink_result 2" spacing="0"/> </process> </operator> </process>
Hi @MarcoBarradas,
Sorry for the late reply. I got caught up in work for weeks.
I tried your suggestion, and it was a good approach. Qty is measured in kg (kilograms), and Tonase is the same quantity converted to tons. After applying your XML, 14 product groups are indeed generated, but the forecast is empty.
I have also attached the XML. I tried using the Forecast Validation operator, but it would not run.
<?xml version="1.0" encoding="UTF-8"?><process version="9.9.002"><context><input/><output/><macros/></context><operator activated="true" class="process" compatibility="9.9.002" expanded="true" name="Process"><parameter key="logverbosity" value="init"/><parameter key="random_seed" value="-1"/><parameter key="send_mail" value="never"/><parameter key="notification_email" value=""/><parameter key="process_duration_for_mail" value="30"/><parameter key="encoding" value="SYSTEM"/><process expanded="true"><operator activated="true" class="retrieve" compatibility="9.9.002" expanded="true" height="68" name="Retrieve Data_2020" width="90" x="45" y="34"><parameter key="repository_entry" value="Data_2020"/></operator><operator activated="true" class="select_attributes" compatibility="9.9.002" expanded="true" height="82" name="Select Attributes" width="90" x="179" y="34"><parameter key="attribute_filter_type" value="subset"/><parameter key="attribute" value=""/><parameter key="attributes" value="Billing Date|Business|Tonase"/><parameter key="use_except_expression" value="false"/><parameter key="value_type" value="attribute_value"/><parameter key="use_value_type_exception" value="false"/><parameter key="except_value_type" value="time"/><parameter key="block_type" value="attribute_block"/><parameter key="use_block_type_exception" value="false"/><parameter key="except_block_type" value="value_matrix_row_start"/><parameter key="invert_selection" value="false"/><parameter key="include_special_attributes" value="false"/></operator><operator activated="true" class="blending:sort" compatibility="9.9.002" expanded="true" height="82" name="Sort" width="90" x="313" y="34"><list key="sort_by"><parameter key="Billing Date" value="ascending"/></list></operator><operator activated="true" class="aggregate" compatibility="9.9.002" expanded="true" height="82" name="Aggregate" width="90" x="447" y="34"><parameter 
key="use_default_aggregation" value="false"/><parameter key="attribute_filter_type" value="all"/><parameter key="attribute" value=""/><parameter key="attributes" value=""/><parameter key="use_except_expression" value="false"/><parameter key="value_type" value="attribute_value"/><parameter key="use_value_type_exception" value="false"/><parameter key="except_value_type" value="time"/><parameter key="block_type" value="attribute_block"/><parameter key="use_block_type_exception" value="false"/><parameter key="except_block_type" value="value_matrix_row_start"/><parameter key="invert_selection" value="false"/><parameter key="include_special_attributes" value="false"/><parameter key="default_aggregation_function" value="average"/><list key="aggregation_attributes"><parameter key="Tonase" value="average"/></list><parameter key="group_by_attributes" value="Billing Date|Business"/><parameter key="count_all_combinations" value="false"/><parameter key="only_distinct" value="false"/><parameter key="ignore_missings" value="true"/></operator><operator activated="true" class="operator_toolbox:group_into_collection" compatibility="2.11.000" expanded="true" height="82" name="Group Into Collection" width="90" x="581" y="34"><parameter key="group_by_attribute" value="Business"/><parameter key="group_by_attribute (numerical)" value=""/><parameter key="sorting_order" value="alphabetical"/></operator><operator activated="true" class="loop_collection" compatibility="9.9.002" expanded="true" height="82" name="Loop Collection" width="90" x="715" y="34"><parameter key="set_iteration_macro" value="false"/><parameter key="macro_name" value="iteration"/><parameter key="macro_start_value" value="1"/><parameter key="unfold" value="false"/><process expanded="true"><operator activated="true" class="time_series:arima_trainer" compatibility="9.9.002" expanded="true" height="103" name="ARIMA" width="90" x="112" y="34"><parameter key="time_series_attribute" value="average(Tonase)"/><parameter 
key="has_indices" value="true"/><parameter key="indices_attribute" value="Billing Date"/><parameter key="p:_order_of_the_autoregressive_model" value="1"/><parameter key="d:_degree_of_differencing" value="0"/><parameter key="q:_order_of_the_moving-average_model" value="1"/><parameter key="estimate_constant" value="true"/><parameter key="main_criterion" value="aic"/></operator><operator activated="true" class="time_series:apply_forecast" compatibility="9.9.002" expanded="true" height="82" name="Apply Forecast" width="90" x="313" y="34"><parameter key="forecast_horizon" value="12"/><parameter key="add_original_time_series" value="true"/><parameter key="add_combined_time_series" value="true"/></operator><connect from_port="single" to_op="ARIMA" to_port="example set"/><connect from_op="ARIMA" from_port="forecast model" to_op="Apply Forecast" to_port="forecast model"/><connect from_op="Apply Forecast" from_port="example set" to_port="output 1"/><portSpacing port="source_single" spacing="0"/><portSpacing port="sink_output 1" spacing="0"/><portSpacing port="sink_output 2" spacing="0"/></process></operator><connect from_op="Retrieve Data_2020" from_port="output" to_op="Select Attributes" to_port="example set input"/><connect from_op="Select Attributes" from_port="example set output" to_op="Sort" to_port="example set input"/><connect from_op="Sort" from_port="example set output" to_op="Aggregate" to_port="example set input"/><connect from_op="Aggregate" from_port="example set output" to_op="Group Into Collection" to_port="exa"/><connect from_op="Group Into Collection" from_port="col" to_op="Loop Collection" to_port="collection"/><connect from_op="Loop Collection" from_port="output 1" to_port="result 1"/><portSpacing port="source_input 1" spacing="0"/><portSpacing port="sink_result 1" spacing="0"/><portSpacing port="sink_result 2" spacing="0"/></process></operator></process>
Regards,
ScottBett8
Hi @ScottBett8,
Hmm, it is expected that the first rows are undefined.
But can you confirm that the entire column "forecast of average (Tonase)" contains only question marks?
Also, can you share your dataset so that we can run the process, see what is going on, and fix it?
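To illustrate why the first rows are expected to be undefined, here is a small Python sketch of a combined history-plus-forecast table. It assumes the layout described above: historical rows carry no forecast value, and only the appended future rows do. The column names `date`, `value`, and `forecast` and both function names are hypothetical, not RapidMiner's.

```python
from datetime import date, timedelta


def combine_with_forecast(dates, values, forecast_values, step=timedelta(days=1)):
    """Build one table: history rows first, then the appended forecast rows."""
    rows = []
    # Historical rows: the forecast column is missing (shown as "?" in a results view).
    for d, v in zip(dates, values):
        rows.append({"date": d, "value": v, "forecast": None})
    # Future rows: only the forecast column is filled.
    next_date = dates[-1]
    for f in forecast_values:
        next_date = next_date + step
        rows.append({"date": next_date, "value": None, "forecast": f})
    return rows


def forecast_is_empty(rows):
    """True only if no row at all carries a forecast value."""
    return all(r["forecast"] is None for r in rows)
```

With this layout, a forecast is only truly empty if the rows at the end of the table are missing too, which is why scrolling to the bottom of the column is the first thing to check.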
Regards,
Lionel
Hi @lionelderkrikor
Actually, only part of the column contains question marks, not all of it. In any case, after searching and reading replies in the forum, I managed to complete the forecasting using the Deep Learning and Random Forest models.
Just one last question, if anybody can help or explain: with the Windowing operator, the relative error is around 400%; without it, the relative error is below 10%. I am still confused about this result.
Regards,
ScottBett
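For reference, here is a minimal Python sketch of what windowing and relative error mean in general. It does not diagnose the 400% result: it assumes relative error is the mean of |predicted - actual| / |actual|, and both function names are hypothetical. It does show one general property worth checking, namely that the measure becomes very large when the actual values being predicted are small.

```python
def sliding_windows(series, window_size, horizon=1):
    """Turn a series into (features, label) pairs.

    Each example uses `window_size` consecutive values as features and the
    value `horizon` steps after the window as the label.
    """
    examples = []
    last_start = len(series) - window_size - horizon
    for start in range(last_start + 1):
        features = series[start:start + window_size]
        label = series[start + window_size + horizon - 1]
        examples.append((features, label))
    return examples


def relative_error(actual, predicted):
    """Mean of |predicted - actual| / |actual|, skipping actual values of zero."""
    ratios = [abs(p - a) / abs(a) for a, p in zip(actual, predicted) if a != 0]
    if not ratios:
        raise ValueError("no non-zero actual values to compare against")
    return sum(ratios) / len(ratios)
```

For example, predicting 5 when the actual value is 1 is a 400% relative error on that row, even though the absolute error is only 4. It is therefore worth comparing which values the model is actually scored against in the two setups.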