I am also looking to normalize within an attribute, and I wonder whether I need the same operators as Ruca.
I have a list of product groups and prices. Since prices can range from a few $/item up to $1M/item, I need to normalize. Rather than normalizing by ID, I would normalize by product group, so that all products in group 1 are normalized against each other, all products in group 2 against each other, and so on. That way I can make meaningful price comparisons across product groups. My data look like this:
ID  group   price
1   group1  1
2   group2  10
3   group3  1000
4   group1  2
5   group2  20
6   group3  2000
Group is a polynominal attribute; price is numerical. Of course I don't want to normalize all prices against each other, so if I can use the looping operator to normalize one group at a time, that would be awesome.
Thanks!
Scott
I'm not sure if I understand what you want the output to look like, but here is an example of statistical standardization using the Altair Personal SLC.
/* Step 1: Create example dataset */
data products;
   input ID group $ price;
   datalines;
1 group1 1
2 group2 10
3 group3 1000
4 group1 2
5 group2 20
6 group3 2000
;
run;

/* Step 2: Sort data by group for BY processing */
proc sort data=products;
   by group;
run;

/* Step 3: Use PROC STDIZE to normalize prices by group */
proc stdize data=products out=products_norm method=std oprefix=orig_ sprefix=std_;
   by group;
   var price;
run;

/* Step 4: View normalized prices */
proc print data=products_norm noobs;
run;
ORIG_ID  ORIG_GROUP  ORIG_PRICE  STD_PRICE
1        group1      1           -0.707106781
4        group1      2           0.7071067812
2        group2      10          -0.707106781
5        group2      20          0.7071067812
3        group3      1000        -0.707106781
6        group3      2000        0.7071067812
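For anyone wondering why every group comes out as ±0.7071 in the output above, here is the arithmetic for group1 (prices 1 and 2). This assumes method=std subtracts the group mean and divides by the sample standard deviation (n − 1 denominator), which is consistent with the values printed:

```latex
\begin{aligned}
\bar{x} &= \frac{1 + 2}{2} = 1.5 \\
s &= \sqrt{\frac{(1 - 1.5)^2 + (2 - 1.5)^2}{2 - 1}} = \sqrt{0.5} \approx 0.7071 \\
z_1 &= \frac{1 - 1.5}{0.7071} \approx -0.7071, \qquad
z_2 = \frac{2 - 1.5}{0.7071} \approx 0.7071
\end{aligned}
```

With only two observations per group, the z-scores are always ±1/√2 regardless of the price level, so group2 (10, 20) and group3 (1000, 2000) give the same pair of values. Real data with more products per group will show a spread of scores.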
6733
6734      /* Step 1: Create example dataset */
6735      data products;
6736      input ID group $ price;
6737      datalines;

NOTE: Data set "WORK.products" has 6 observation(s) and 3 variable(s)
NOTE: The data step took :
      real time : 0.005
      cpu time  : 0.015

6738      1 group1 1
6739      2 group2 10
6740      3 group3 1000
6741      4 group1 2
6742      5 group2 20
6743      6 group3 2000
6744      ;
6745      run;
6746
6747      /* Step 2: Sort data by group for BY processing */
6748      proc sort data=products;
6749      by group;
6750      run;
NOTE: Automatically set SORTSIZE to 10240MiB
NOTE: 6 observations were read from "WORK.products"
NOTE: Data set "WORK.products" has 6 observation(s) and 3 variable(s)
NOTE: Procedure sort step took :
      real time : 0.025
      cpu time  : 0.000

6751
6752      /* Step 3: Use PROC STDIZE to normalize prices by group */
6753      proc stdize data=products out=products_norm method=std oprefix=orig_ sprefix=std_;;
6754      by group;
6755      var price;
6756      run;
NOTE: Data set "WORK.products_norm" has 6 observation(s) and 4 variable(s)
NOTE: Procedure stdize step took :
      real time : 0.012
      cpu time  : 0.000

6757
6758      /* Step 4: View normalized prices */
6759      proc print data=products_norm noobs;
6760      run;
NOTE: 6 observations were read from "WORK.products_norm"
NOTE: Procedure print step took :
      real time : 0.020
      cpu time  : 0.015
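If "normalize" in the question means rescaling each group to a 0-1 range rather than computing z-scores, a minimal sketch could look like the following. It relies on METHOD=RANGE, which SAS documents for PROC STDIZE (subtract the minimum, divide by the range); whether the Altair SLC supports that method is an assumption here, and the output data set name products_range is only a placeholder.

```sas
/* Sketch only: assumes WORK.products exists and is already sorted by group */
proc stdize data=products out=products_range method=range;
   by group;    /* scale each product group separately */
   var price;   /* price becomes (price - min) / (max - min) within its group */
run;

/* Inspect the rescaled prices */
proc print data=products_range noobs;
run;
```

Within each group the lowest price maps to 0 and the highest to 1, so prices in group1 and group3 end up on the same scale even though their raw values differ by orders of magnitude.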