Inferential Statistics - R, Python or Extension
As a partner, I am looking to use RapidMiner to integrate inferential statistical methods such as hypothesis testing, confidence intervals, chi-square, etc., as part of a client implementation. I see there is a paid extension that does this work, but given the simplicity of these methods and the unwanted burden of managing a paid subscription for only occasional use, is there a no-charge library of operators available, or do I need to leverage R or Python and create my own? We only need a few methods for occasional use, and I'd like to know whether there are other options besides R, Python, or the paid extension. Thanks!
Hi Michael,
I've just added (last Thursday) an operator called 'Compare Distributions' to the SMILE extension. It provides a KS test, chi-square test, F-test, and t-test. Would this already help?
BR,
Martin
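For anyone who does end up going the Python route instead of an extension, the same tests are available at no charge in SciPy's `stats` module. A minimal sketch (the sample data here is made up for illustration, not from the thread):

```python
import numpy as np
from scipy import stats

# Two synthetic samples to compare (illustrative only)
rng = np.random.default_rng(42)
a = rng.normal(loc=0.0, scale=1.0, size=200)
b = rng.normal(loc=0.2, scale=1.2, size=200)

ks_stat, ks_p = stats.ks_2samp(a, b)    # two-sample KS test
t_stat, t_p = stats.ttest_ind(a, b)     # two-sample t-test

# Chi-square goodness-of-fit: observed vs. expected counts
chi2, chi_p = stats.chisquare([18, 22, 20, 40], f_exp=[25, 25, 25, 25])

print(f"KS : D={ks_stat:.3f}, p={ks_p:.4f}")
print(f"t  : t={t_stat:.3f}, p={t_p:.4f}")
print(f"chi2: stat={chi2:.3f}, p={chi_p:.4f}")
```

Each call returns the test statistic and a p-value, which covers the occasional-use cases described in the question without a subscription.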
Hi Michael,
so the idea is to get the number of standard deviations from the mean? I think we don't have that yet.
But the Tukey Test in the Operator Toolbox is fairly similar, and in my opinion superior. It's defined as:
For each selected attribute, a confidence of the Tukey Test is calculated. This confidence is defined as the distance between the current value and the median, divided by the distance of the lower/upper 'Tukey Test boundary' to the median.
So instead of mean and standard deviation we take the interquartile range and median. The median is more robust to outliers than the mean, so I and many stats people prefer it.
Can you have a look at the Tukey Test? We may just write the same operator with mean and std_dev if that's what you need.
Cheers,
Martin
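One possible reading of that confidence, sketched in Python. This assumes the 'Tukey Test boundaries' are the usual Tukey fences at Q1 − 1.5·IQR and Q3 + 1.5·IQR; the actual Operator Toolbox implementation may differ, so treat this as an illustration of the idea, not the operator's source:

```python
import numpy as np

def tukey_confidence(values, k=1.5):
    """Distance of each value from the median, scaled by the distance of the
    matching Tukey fence (assumed Q1 - k*IQR / Q3 + k*IQR) to the median."""
    values = np.asarray(values, dtype=float)
    q1, med, q3 = np.percentile(values, [25, 50, 75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    conf = np.empty_like(values)
    below = values < med
    conf[below] = (med - values[below]) / (med - lower)
    conf[~below] = (values[~below] - med) / (upper - med)
    return conf

scores = tukey_confidence([1, 2, 3, 4, 5, 100])
# Values with a confidence above 1 lie outside the fences, i.e. likely outliers.
print(scores)
```

With this reading, a confidence near 0 means the value sits at the median, and anything above 1 falls outside the fence on its side.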
I normally calculate the z test statistic by taking the sample mean (or median) minus the null hypothesis value (what I'm testing), all divided by the standard error, assuming the constraints of the central limit theorem hold. For the SE I usually use the sample standard deviation divided by the square root of the sample size. I then compare this result with the critical z value (1.65 for a one-tailed test at a 5% significance level) to decide whether to reject or fail to reject the null hypothesis. The math is quite simple; I was just looking for a simple operator to automate the work, given how important testing our data and results is to our particular use cases. I believe I can make all of this work with your suggestions above.
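The calculation described above is easy to automate in a Python scripting operator. A minimal sketch of a one-sample, upper-tailed z-test (the sample data and the 1.645 critical value for alpha = 0.05 are illustrative):

```python
import math
from statistics import mean, stdev

def one_sample_z(sample, mu0, z_crit=1.645):
    """z = (sample mean - mu0) / (s / sqrt(n)); upper-tailed decision at z_crit."""
    n = len(sample)
    se = stdev(sample) / math.sqrt(n)   # SE from the sample standard deviation
    z = (mean(sample) - mu0) / se
    return z, z > z_crit                # True -> reject H0 at the chosen level

sample = [5.1, 4.9, 5.3, 5.6, 5.0, 5.2, 5.4, 4.8, 5.5, 5.1]
z, reject = one_sample_z(sample, mu0=5.0)
print(f"z = {z:.2f}, reject H0: {reject}")
```

Swapping in the median, a different critical value, or a two-tailed comparison is a one-line change each.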
Hi @CB123,
in the KS test, the KS statistic and p-value are returned, as Dr. Martin mentioned above. What significance level do you usually use in practice?
The common alpha values (significance levels) of 0.05 and 0.01 are simply based on tradition.
When a p-value is less than or equal to the significance level, you reject the null hypothesis. So we take the p-value from a statistical test and compare it to the common significance levels: for example, a p-value of 0.03112 is statistically significant at an alpha level of 0.05, but not at the 0.01 level.
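That comparison in code, using the p = 0.03112 figure from the example above:

```python
# Compare one p-value against the two traditional significance levels.
p_value = 0.03112
for alpha in (0.05, 0.01):
    decision = "reject H0" if p_value <= alpha else "fail to reject H0"
    print(f"alpha={alpha}: {decision}")
# prints "alpha=0.05: reject H0" then "alpha=0.01: fail to reject H0"
```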
KStest http://haifengl.github.io/api/java/smile/stat/hypothesis/KSTest.html
Hope it helps.
YY