Doing LinearRegression in a loop? [Solved]
I'm having a problem trying to automate something across a dataset that works fine for subsets.
I want to generate linear regression gradients for the weekly sales of a bunch of products. My input data is of the form:
Product, Week, Quantity
"Product 1", "2012-03-02", 34
"Product 1", "2012-03-09", 72
"Product 2", "2012-03-02", 91
"Product 2", "2012-03-09", 27
I want to generate a resultset that looks like:
Product, Trend_Gradient
Product 1, 39.2
Product 2, 15.2
I have it working well enough for a dataset that contains a single product's sales data, but I can't figure out how to loop across the full dataset so that each iteration sees all the entries for one product. Essentially, I want to apply the LinearRegression operator the way an SQL "GROUP BY Product_ID" would apply an aggregate.
Any tips?
This is the process I'm trying at the moment, though something is wrong with it, probably in the loop operator.
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.006">
<operator activated="true" class="process" compatibility="5.2.006" expanded="true" name="Process">
<process expanded="true" height="762" width="685">
<operator activated="true" class="read_csv" compatibility="5.2.006" expanded="true" height="60" name="Read CSV" width="90" x="45" y="30">
<parameter key="csv_file" value="/home/user/Repots/SalesAllProducts/SalesByWeekAllProducts.csv"/>
<parameter key="column_separators" value=","/>
<parameter key="date_format" value="yyyy-MM-dd"/>
<parameter key="first_row_as_names" value="false"/>
<list key="annotations">
<parameter key="0" value="Name"/>
<parameter key="encoding" value="UTF-8"/>
<list key="data_set_meta_data_information">
<parameter key="0" value=""/>
<parameter key="1" value=""/>
<parameter key="2" value="Sold.true.numeric.label"/>
<operator activated="true" class="loop_values" compatibility="5.2.006" expanded="true" height="94" name="Loop Values" width="90" x="246" y="75">
<parameter key="attribute" value="Product"/>
<process expanded="true" height="780" width="708">
<operator activated="true" class="series:moving_average" compatibility="5.1.002" expanded="true" height="76" name="Moving Average" width="90" x="45" y="30">
<parameter key="attribute_name" value="Sold"/>
<parameter key="window_width" value="4"/>
<parameter key="ignore_missings" value="true"/>
<parameter key="keep_original_attribute" value="false"/>
<operator activated="true" class="series:replace_missing_series_values" compatibility="5.1.002" expanded="true" height="76" name="Replace Missing Values" width="90" x="179" y="30">
<parameter key="attribute_name" value="moving_average(Sold)"/>
<parameter key="replacement" value="next value"/>
<operator activated="true" class="rename" compatibility="5.2.006" expanded="true" height="76" name="Rename" width="90" x="313" y="30">
<parameter key="old_name" value="moving_average(Sold)"/>
<parameter key="new_name" value="Sold"/>
<list key="rename_additional_attributes"/>
<operator activated="true" class="set_role" compatibility="5.2.006" expanded="true" height="76" name="Set Role" width="90" x="447" y="30">
<parameter key="name" value="Sold"/>
<parameter key="target_role" value="label"/>
<list key="set_additional_roles"/>
<operator activated="true" class="linear_regression" compatibility="5.2.006" expanded="true" height="94" name="Linear Regression" width="90" x="581" y="30"/>
<connect from_port="example set" to_op="Moving Average" to_port="example set input"/>
<connect from_op="Moving Average" from_port="example set output" to_op="Replace Missing Values" to_port="example set input"/>
<connect from_op="Replace Missing Values" from_port="example set output" to_op="Rename" to_port="example set input"/>
<connect from_op="Rename" from_port="example set output" to_op="Set Role" to_port="example set input"/>
<connect from_op="Set Role" from_port="example set output" to_op="Linear Regression" to_port="training set"/>
<connect from_op="Linear Regression" from_port="model" to_port="out 1"/>
<connect from_op="Linear Regression" from_port="exampleSet" to_port="out 2"/>
<portSpacing port="source_example set" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
<portSpacing port="sink_out 3" spacing="0"/>
<connect from_op="Read CSV" from_port="output" to_op="Loop Values" to_port="example set"/>
<connect from_op="Loop Values" from_port="out 1" to_port="result 1"/>
<connect from_op="Loop Values" from_port="out 2" to_port="result 2"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
You are missing a Filter Examples operator in the loop. Please see the attached process for an example.
Best, Marius
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.006">
<operator activated="true" class="process" compatibility="5.2.006" expanded="true" name="Process">
<process expanded="true" height="762" width="685">
<operator activated="true" class="generate_data" compatibility="5.2.006" expanded="true" height="60" name="Generate Data" width="90" x="45" y="30">
<parameter key="target_function" value="random classification"/>
<operator activated="true" class="rename" compatibility="5.2.006" expanded="true" height="76" name="Rename (2)" width="90" x="179" y="30">
<parameter key="old_name" value="label"/>
<parameter key="new_name" value="product"/>
<list key="rename_additional_attributes"/>
<operator activated="true" class="loop_values" compatibility="5.2.006" expanded="true" height="94" name="Loop Values" width="90" x="313" y="30">
<parameter key="attribute" value="product"/>
<process expanded="true" height="780" width="882">
<operator activated="true" class="filter_examples" compatibility="5.2.006" expanded="true" height="76" name="Filter Examples" width="90" x="45" y="30">
<parameter key="condition_class" value="attribute_value_filter"/>
<parameter key="parameter_string" value="product=%{loop_value}"/>
<operator activated="true" class="series:moving_average" compatibility="5.1.002" expanded="true" height="76" name="Moving Average" width="90" x="179" y="30">
<parameter key="attribute_name" value="att1"/>
<parameter key="window_width" value="4"/>
<parameter key="ignore_missings" value="true"/>
<parameter key="keep_original_attribute" value="false"/>
<operator activated="true" class="series:replace_missing_series_values" compatibility="5.1.002" expanded="true" height="76" name="Replace Missing Values" width="90" x="313" y="30">
<parameter key="attribute_name" value="moving_average(att1)"/>
<parameter key="replacement" value="next value"/>
<operator activated="true" class="rename" compatibility="5.2.006" expanded="true" height="76" name="Rename" width="90" x="447" y="30">
<parameter key="old_name" value="moving_average(att1)"/>
<parameter key="new_name" value="att1"/>
<list key="rename_additional_attributes"/>
<operator activated="true" class="set_role" compatibility="5.2.006" expanded="true" height="76" name="Set Role" width="90" x="581" y="30">
<parameter key="name" value="att1"/>
<parameter key="target_role" value="label"/>
<list key="set_additional_roles"/>
<operator activated="true" class="linear_regression" compatibility="5.2.006" expanded="true" height="94" name="Linear Regression" width="90" x="715" y="30"/>
<connect from_port="example set" to_op="Filter Examples" to_port="example set input"/>
<connect from_op="Filter Examples" from_port="example set output" to_op="Moving Average" to_port="example set input"/>
<connect from_op="Moving Average" from_port="example set output" to_op="Replace Missing Values" to_port="example set input"/>
<connect from_op="Replace Missing Values" from_port="example set output" to_op="Rename" to_port="example set input"/>
<connect from_op="Rename" from_port="example set output" to_op="Set Role" to_port="example set input"/>
<connect from_op="Set Role" from_port="example set output" to_op="Linear Regression" to_port="training set"/>
<connect from_op="Linear Regression" from_port="model" to_port="out 1"/>
<connect from_op="Linear Regression" from_port="exampleSet" to_port="out 2"/>
<portSpacing port="source_example set" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
<portSpacing port="sink_out 3" spacing="0"/>
<connect from_op="Generate Data" from_port="output" to_op="Rename (2)" to_port="example set input"/>
<connect from_op="Rename (2)" from_port="example set output" to_op="Loop Values" to_port="example set"/>
<connect from_op="Loop Values" from_port="out 1" to_port="result 1"/>
<connect from_op="Loop Values" from_port="out 2" to_port="result 2"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
</process>
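For readers who want to see the pattern outside RapidMiner, here is a minimal plain-Python sketch of what the loop with a filter step amounts to: iterate over the distinct product values, keep only that product's rows, and fit a least-squares gradient on the subset. The function names `slope` and `trend_gradients` are made up for illustration, and measuring time in weeks since each product's first week is an assumption, not something the process above does. The four rows are the sample rows from the question, so the printed gradients come from those rows only and are not the figures shown in the question.

```python
from datetime import date

# Sample rows in the shape given in the question: (Product, Week, Quantity).
rows = [
    ("Product 1", "2012-03-02", 34),
    ("Product 1", "2012-03-09", 72),
    ("Product 2", "2012-03-02", 91),
    ("Product 2", "2012-03-09", 27),
]


def slope(xs, ys):
    """Ordinary least-squares gradient of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def trend_gradients(rows):
    """Return {product: gradient in quantity per week} (hypothetical helper)."""
    gradients = {}
    # Outer loop: one iteration per distinct product (the role of Loop Values).
    for product in sorted({r[0] for r in rows}):
        # Filter step: keep only this product's rows (the role of Filter Examples).
        subset = [r for r in rows if r[0] == product]
        # Assumption: express time as weeks elapsed since the product's first week.
        days = [date.fromisoformat(week).toordinal() for _, week, _ in subset]
        weeks = [(d - min(days)) / 7 for d in days]
        quantities = [q for _, _, q in subset]
        gradients[product] = slope(weeks, quantities)
    return gradients


result = trend_gradients(rows)
print("Product, Trend_Gradient")
for product, gradient in result.items():
    print(f"{product}, {gradient}")
```

Dropping the filter line and fitting on `rows` directly would give every product the same gradient, which is the symptom to look for if the loop seems to run but the models do not differ.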
Aah, thank you Marius. That should do the trick.
My Product ID is polynomial but other than that, your process is pretty straightforward to adapt.
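When adapting the process, it may also help to picture what the two smoothing steps inside the loop do before the regression runs: Moving Average with window_width 4, followed by Replace Missing Values with "next value". The sketch below is plain Python, not RapidMiner code. It assumes a trailing window that yields a missing value until four observations are available, which may not match the operator's exact behaviour, and the helper names and the `sold` figures are hypothetical.

```python
def trailing_moving_average(values, window):
    """Mean of the last `window` values; None until a full window exists."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)  # assumption: incomplete windows count as missing
        else:
            out.append(sum(values[i + 1 - window : i + 1]) / window)
    return out


def fill_with_next_value(values):
    """Replace each None with the nearest later non-missing value."""
    filled = list(values)
    next_seen = None
    # Walk backwards so the most recent later value is always at hand.
    for i in range(len(filled) - 1, -1, -1):
        if filled[i] is None:
            filled[i] = next_seen
        else:
            next_seen = filled[i]
    return filled


# Hypothetical weekly quantities for a single product, already filtered.
sold = [34, 72, 50, 60, 80, 40]
smoothed = fill_with_next_value(trailing_moving_average(sold, 4))
print(smoothed)
```

Under these assumptions the first three weeks all take the first complete window's average, so a short series is flattened considerably before the regression sees it. With only a handful of weeks per product, a smaller window may be worth trying.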