"Probability of mean being the same in first and second part of series?"
wessel
New Altair Community Member
Assume a series of 1000 data points. The mean of the first 500 points is 2, with s.d. 1. What is the probability of finding a mean > 0 on the second 500 data points?
Extra information:
The first 500 points are approximately normally distributed.
The first 500 points are not independently distributed, for most points: point(t) is similar to point(t+1)
Answers
-
This is actually a more complex question than I thought upon initial inspection.
Knowledge of the first 500 points in the series affects knowledge about the rest of the series.
Your question relates to the probability that the mean is > 0, and this requires several tools to answer: Bayes' theorem, a truth table and the characteristics of your normal distribution.
1. Truth table: the polarity of the mean of the outcomes of the two halves of your distribution can be: +,- ; +, + ; -, - ; -, +
However, the question restricts these outcomes to +, - ; +, + ; -, + since the second half has to be > 0
Thus, the prior odds of the series are 1/3
2. There are conditional probabilities for polarity outcomes within the data based upon the evidence from your distribution:
P(+) : the mean is 2 and the SD is 1, so 0 lies 2 SD below the mean (roughly 97% of the data would likely fall above 0 given the distribution you have listed). Thus, P(+) = .97
P(-) = 1 - P(+) = 0.03
3. Now reason using the probabilities and the outcomes:
The probability that the second half of the data is positive = prior odds that positive outcome will occur * probability of obtaining your series outcome
= P(+|+) = P(+) * p(+|+)
~ (.97)*(.33)
~ .3201 or about a 32% chance.
So, I believe this is your answer.
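For anyone who wants to check the numbers in steps 2 and 3, here is a minimal sketch. It assumes the points are exactly normal with mean 2 and s.d. 1 (the question only says "approximately"), and it only reproduces the arithmetic above, without judging whether the reasoning is sound. The variable names are my own.

```python
from statistics import NormalDist

# Assumption: individual points follow Normal(mean=2, sd=1) exactly.
first_half = NormalDist(mu=2.0, sigma=1.0)

# Step 2: chance that a single point falls below / above 0.
p_neg = first_half.cdf(0.0)   # area more than 2 SD below the mean
p_pos = 1.0 - p_neg           # the post rounds this to .97

# Step 3: the product used in the post, with its rounded inputs.
step3 = 0.97 * 0.33

print(f"P(+) = {p_pos:.4f}, P(-) = {p_neg:.4f}")
print(f"step 3 product = {step3:.4f}")
```

Note that this is the tail probability for a single point, which is what step 2 uses. It is not the distribution of the mean of 500 points.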
regards,
rk
-
There was an error in my previous post. I believe that I was correct in stating that the odds of the series outcome are 1/3 and the odds for the mean to be positive are .97. However, I conditioned the results incorrectly. Through normalization we can see that the true answer is ~94%:
alpha * <(1/3)*(.97^2), (2/3)*(.97*.03)> = <94.17%, 5.83%>
-
Hey, I believe you can only calculate it like this if you assume that points are independently distributed.
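To illustrate why independence matters here, a rough sketch: it simulates series where point(t) is similar to point(t+1) and looks at how much the mean of 500 points varies from series to series. The AR(1) model, the value phi = 0.95 and the function name are assumptions of mine for illustration, since nothing in the question says how the data were generated.

```python
import random
from statistics import pstdev

def half_mean_spread(phi, n_points=500, n_series=300, seed=1):
    """S.d. of the 500-point mean across simulated AR(1) series.

    Assumed model: x(t) = phi * x(t-1) + noise, shifted to mean 2,
    with the noise scaled so every point has s.d. 1.
    """
    rng = random.Random(seed)
    noise_sd = (1.0 - phi * phi) ** 0.5
    means = []
    for _ in range(n_series):
        x = rng.gauss(0.0, 1.0)  # start from the stationary distribution
        total = 0.0
        for _ in range(n_points):
            x = phi * x + rng.gauss(0.0, noise_sd)
            total += x
        means.append(2.0 + total / n_points)
    return pstdev(means)

spread_independent = half_mean_spread(phi=0.0)   # independent points
spread_dependent = half_mean_spread(phi=0.95)    # strongly dependent points

print(f"independent: {spread_independent:.3f}")
print(f"dependent:   {spread_dependent:.3f}")
```

Both cases have the same mean and s.d. per point, yet the mean of 500 points is several times more variable in the dependent case. So the mean and s.d. alone do not pin down how far the second half can drift.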
I believe the true answer is:
"Not enough information to give an answer"
Best regards,
Wessel
-
This approach assumes they are conditionally independent...
-
Then I do not understand what you are doing.
I think you should look at the trend of the 500 data points.
The mean and variance are not sufficient statistics to say something useful about this trend.
Also I do not understand why you would want to restrict your values to + and -.
The data generated are real numbers.
Possibly by a random walk process.
http://en.wikipedia.org/wiki/Random_walk
If it were a random walk, only the last of the 500 values would say something meaningful.
The mean and variance would be useless statistics.
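A small simulation sketch of this point, assuming a plain Gaussian random walk starting at 0 (my assumption, the real data may well be something else). It compares how well the first-half mean and the last first-half value each track the second-half mean, and checks whether the first-half mean still helps once the last value is known. All names are hypothetical.

```python
import random
from itertools import accumulate
from statistics import fmean

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    mx, my = fmean(xs), fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def simulate(n_walks=1000, n_points=1000, seed=0):
    """Return first-half means, last first-half values, second-half means."""
    rng = random.Random(seed)
    half = n_points // 2
    first_means, last_values, second_means = [], [], []
    for _ in range(n_walks):
        # Assumed process: cumulative sum of independent N(0, 1) steps.
        walk = list(accumulate(rng.gauss(0.0, 1.0) for _ in range(n_points)))
        first_means.append(fmean(walk[:half]))
        last_values.append(walk[half - 1])
        second_means.append(fmean(walk[half:]))
    return first_means, last_values, second_means

first_means, last_values, second_means = simulate()

corr_with_mean = pearson(first_means, second_means)
corr_with_last = pearson(last_values, second_means)

# What is left of the second-half mean after subtracting the last value.
leftover = [s - l for s, l in zip(second_means, last_values)]
corr_leftover = pearson(first_means, leftover)

print(f"first-half mean vs second-half mean: {corr_with_mean:.2f}")
print(f"last value vs second-half mean:      {corr_with_last:.2f}")
print(f"first-half mean vs leftover:         {corr_leftover:.2f}")
```

Under this assumption the last value should track the second-half mean more closely than the first-half mean does, and the leftover correlation should be near zero: once the last value is known, the first-half mean adds essentially nothing.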
Best regards,
Wessel