"Probability of mean being the same in first and second part of series?"
wessel
Assume a series of 1000 data points. The mean of the first 500 points is 2, with s.d. 1. What is the probability that the mean of the second 500 data points is > 0?
Extra information:
The first 500 points are approximately normally distributed.
The first 500 points are not independently distributed: for most points, point(t) is similar to point(t+1).
rakirk
This is actually a more complex question than I thought upon initial inspection.
Knowledge of the first 500 points in the series affects knowledge about the rest of the series.
Your question relates to the probability that the mean is > 0, and this requires several tools to answer: Bayes' theorem, a truth table and the characteristics of your normal distribution.
1. Truth table: the polarity of the means of the two halves of your series can be: +, - ; +, + ; -, - ; -, +
However, the question restricts these outcomes to +, - ; +, + ; -, + since the second half has to be > 0
Thus, the prior probability of each remaining outcome is 1/3
2. There are conditional probabilities for polarity outcomes within the data based upon the evidence from your distribution:
P(+) : the SD is 1, so 0 lies 2 SD below the mean of 2 (roughly 97% of the data would likely fall above 0 given the distribution you have listed). Thus, P(+) = .97
P(-) = 1 - P(+) = 0.03
3. Now reason using the probabilities and the outcomes:
The probability that the second half of the data is positive = prior odds that a positive outcome will occur * probability of obtaining your series outcome
= P(+|+) = P(+) * p(+|+)
~ (.97)*(.33)
~ .3201 or about a 32% chance.
So, I believe this is your answer.
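As a rough check of the numbers in steps 2 and 3, here is a minimal Python sketch. It assumes a single point drawn from a normal distribution with mean 2 and SD 1, as described in the question; the helper name `normal_cdf` is made up for illustration. Note that this is the tail probability for one data point, not for the mean of 500 dependent points.

```python
import math

def normal_cdf(x, mean=0.0, sd=1.0):
    """CDF of a normal distribution, computed with the error function."""
    z = (x - mean) / (sd * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))

# Step 2: probability that one point from N(mean=2, sd=1) lies above 0.
p_pos = 1.0 - normal_cdf(0.0, mean=2.0, sd=1.0)
p_neg = 1.0 - p_pos
print(f"P(+) = {p_pos:.4f}, P(-) = {p_neg:.4f}")

# Step 3, using the rounded figures from the post (.97 and .33).
product = 0.97 * 0.33
print(f"0.97 * 0.33 = {product:.4f}")
```

The exact tail value is slightly above the rounded .97 used in the post, so P(-) comes out closer to .02 than .03.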
regards,
rk
rakirk
There was an error in my previous post. I believe that I was correct in stating that the odds of the series outcome are 1/3 and the odds for the mean to be positive are .97. However, I conditioned the results incorrectly. Through normalization (alpha is the normalizing constant) we can see that the true answer is ~94%:
alpha * <(1/3)*(.97^2), (2/3)*(.97*.03)> = <.9417, .0583>
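The normalization step can be reproduced with a few lines of Python. This is only a sketch of the arithmetic in the post, using its figures (.97, .03, 1/3, 2/3) as given; the variable names are illustrative, and the sketch says nothing about whether the underlying independence assumption holds.

```python
p_pos = 0.97  # P(+) as stated in the post
p_neg = 0.03  # P(-) as stated in the post

# Unnormalized weights for the two hypotheses in the post.
weights = [(1 / 3) * p_pos ** 2, (2 / 3) * p_pos * p_neg]

# alpha is the constant that makes the weights sum to 1.
alpha = 1.0 / sum(weights)
posterior = [alpha * w for w in weights]

print([round(p, 4) for p in posterior])
```

The two normalized values come out at roughly 0.9417 and 0.0583, matching the ~94% quoted above.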
wessel
Hey, I believe you can only calculate it like this if you assume that points are independently distributed.
I believe the true answer is:
"Not enough information to give an answer"
Best regards,
Wessel
rakirk
This approach assumes they are conditionally independent...
wessel
Then I do not understand what you are doing.
I think you should look at the trend of the 500 data points.
The mean and variance are not sufficient statistics to say something useful about this trend.
Also I do not understand why you would want to restrict your values to + and -.
The data generated are real numbers, possibly produced by a random walk process.
http://en.wikipedia.org/wiki/Random_walk
If it were a random walk, only the last value of the 500 would say something meaningful.
The mean and variance would be useless statistics.
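To illustrate the random walk point, here is a small simulation sketch. Everything in it is an assumption made for illustration: Gaussian steps with SD 1, 200 simulated series, and the function names `simulate_walk` and `compare_predictors`. It compares two guesses for the mean of the second 500 points: the mean of the first 500, and the last value of the first 500.

```python
import random
import statistics

def simulate_walk(n_steps, rng, step_sd=1.0):
    """Return a Gaussian random walk with n_steps points (assumed step model)."""
    position = 0.0
    path = []
    for _ in range(n_steps):
        position += rng.gauss(0.0, step_sd)
        path.append(position)
    return path

def compare_predictors(n_trials=200, n_points=1000, seed=0):
    """Mean squared error of two predictors of the second-half mean."""
    rng = random.Random(seed)
    half = n_points // 2
    sq_err_mean, sq_err_last = [], []
    for _ in range(n_trials):
        path = simulate_walk(n_points, rng)
        first, second = path[:half], path[half:]
        target = statistics.fmean(second)
        # Predictor A: mean of the first half.
        sq_err_mean.append((target - statistics.fmean(first)) ** 2)
        # Predictor B: last value of the first half.
        sq_err_last.append((target - first[-1]) ** 2)
    return statistics.fmean(sq_err_mean), statistics.fmean(sq_err_last)

mse_mean, mse_last = compare_predictors()
print(f"MSE using first-half mean: {mse_mean:.1f}")
print(f"MSE using last value:      {mse_last:.1f}")
```

Under these assumptions the last value tends to be the better predictor, since the first-half mean carries extra error from how far the walk drifted before point 500. A real series may of course follow a different process.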
Best regards,
Wessel