Smoothing values
msacs09
New Altair Community Member
I have a following data set and i wanted to smooth the values, so that the outliers (high peaks are smoothed out with the general sample). How can we achieve this tried exponential smoothing with alpha =0.1 and didn't help. The column i need to smooth it out is VALUE
DT | ID | TOTAL | VALUE |
201704 | 1 | 1383860 | 10.95 |
201705 | 1 | 1828848 | 7.81 |
201706 | 1 | 1520265 | 11.86 |
201707 | 1 | 1823590 | 12.44 |
201708 | 1 | 1371700 | 9 |
201709 | 1 | 1117848 | 8.33 |
201710 | 1 | 1261479 | 6.73 |
201711 | 1 | 1673933 | 16.62 |
201711 | 2 | 561574 | 1316.41 |
201712 | 2 | 454604 | 15.53 |
201801 | 2 | 671094 | 32.87 |
201802 | 2 | 648078 | 22.48 |
201803 | 2 | 664679 | 9.49 |
201804 | 2 | 785520 | 20.59 |
201805 | 2 | 775158 | 18.3 |
201806 | 2 | 658977 | 346.71 |
201807 | 2 | 1075298 | 11.13 |
201808 | 2 | 709763 | 32.56 |
201809 | 2 | 800129 | 17.01 |
201810 | 2 | 817088 | 9.49 |
201811 | 2 | 882086 | 15.36 |
201812 | 2 | 862648 | 41.4 |
201901 | 2 | 891263 | 36.05 |
201902 | 2 | 893897 | 227.04 |
201903 | 2 | 843360 | 131.17 |
201904 | 2 | 559655 | 5.06 |
201905 | 2 | 919638 | 107.18 |
201804 | 3 | 5375.65 | 32.5 |
201805 | 3 | 5344.03 | 168.65 |
201806 | 3 | 5312.43 | 168.1 |
201807 | 3 | 6747.5 | 14.76 |
201808 | 3 | 6715.76 | 14.69 |
201809 | 3 | 5217.5 | 164.88 |
201810 | 3 | 5185.33 | 9.96 |
201811 | 3 | 5153.71 | 9.9 |
201812 | 3 | 5122.06 | 161 |
201901 | 3 | 5090.43 | 160.92 |
201902 | 3 | 5058.79 | 159.89 |
201903 | 3 | 5027.07 | 158.77 |
201904 | 3 | 4995.42 | 157.73 |
201905 | 3 | 4963.74 | 156.67 |
201806 | 4 | 3505205 | 15.49 |
201807 | 4 | 3548029 | 13.18 |
201808 | 4 | 3473771 | 28.84 |
201809 | 4 | 4727992 | 11.6 |
201810 | 4 | 5900274 | 7.53 |
201811 | 4 | 7799624 | 5.41 |
201812 | 4 | 6454174 | 11.22 |
201901 | 4 | 5810997 | 195.27 |
201902 | 4 | 7789077 | 2215.65 |
201903 | 4 | 9437120 | 9.49 |
201904 | 4 | 9575089 | 7.63 |
201905 | 4 | 9442829 | 17.13 |
201801 | 5 | 2128374 | 9 |
201802 | 5 | 2262821 | 12.69 |
201803 | 5 | 1948217 | 38.53 |
201804 | 5 | 2116825 | 549.87 |
201805 | 5 | 1369574 | 4.6 |
201806 | 5 | 1601645 | 13.86 |
Tagged:
0
Answers
-
Hi @msacs09 ,
You could try the Moving Average Filter with either a simple or a binom filter type. Out of interest, what do you mean with the Exponential Smoothing didn't help? Wasn't there an effect at all, or was the smoothing effect too small (and how did you defined what a good smoothing is for you).
If you have only outliers (seems to be in your case) and you are sure that these are outliers in your data and can be removed, you can also try to replace them.
First replace all the outliers by missing values. For example by using a fixed max value for valid values. You could use if (Value < 200.0, Value, MISSING_NUMERICAL) in a Generate Attribute operator, which would replace all values in Value attribute which are larger than 200.0. Then you could use Replace Missing Values (Series) to replace the new missing values.
Hopes this helpes
Fabian
PS: If you want a more sophisticated way to identify the outliers, have a look into outlier detection. For example the Detect Outlier operators or the Tukey Test from the Operator Toolbox.
1 -
Hi @msacs09,the probably confusing thing of exponential smoothing is the effect of different values for the alpha value.The closer the value is to 1, the less you are smoothing you values and the result looks more similar to the original data.In contrast, an alpha value of 0.1 is a very strong smoothing, that can eliminate a lot of characteristics from your data.Another thing is, that in your data the peaks a very extreme, compared to baseline data points, so finding the right amount of smoothing, without loosing to much information can be tricky.Best,David0
-
You can also try capping high outlier values---you can do this with a Generate Attributes and a simple IF expression to replace with a fixed value above a certain threshold.
0 -
Thank you all GREAT suggestions.0