"Probability of mean being the same in first and second part of series?"

wessel
wessel New Altair Community Member
edited November 5 in Community Q&A
Assume a series of 1000 data points. The mean of the first 500 points is 2, with s.d. 1. What is the probability of finding a mean > 0 on the second 500 data points?
Extra information:
The first 500 points are approximately normally distributed.
The first 500 points are not independent: for most points, point(t) is similar to point(t+1).

Answers

  • rakirk
    rakirk New Altair Community Member
    This is actually a more complex question than I thought upon initial inspection.

    Knowledge of the first 500 in the series affects knowledge about the rest of the series.

    Your question relates to the probability that the mean is > 0, and this requires several tools to answer: Bayes' theorem, a truth table and the characteristics of your normal distribution.

    1. Truth table: the polarity of the mean of the outcomes of the two halves of your distribution can be: +,- ; +, + ; -, - ; -, +
    However, the question restricts these outcomes to +, - ; +, + ; -, + since the second half has to be > 0
    Thus, the prior odds of the series is 1/3

    2. There are conditional probabilities for polarity outcomes within the data based upon the evidence from your distribution:
    P(+): the SD is 1, so values < 0 fall more than 2 SD below the mean (roughly 97% of the data would fall above 0 given the distribution you described). Thus, P(+) = .97

    P(-) = 1 - P(+) = 0.03

    3. Now reason using the probabilities and the outcomes:
    The probability that the second half of the data is positive = prior odds that positive outcome will occur * probability of obtaining your series outcome
    = P(+|+) = P(+) * p(+|+)
    ~ (.97)*(.33)
    ~ .3201 or about a 32% chance.

    So, I believe this is your answer.

    regards,

    rk
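The figures in the post above can be checked with a minimal sketch, assuming a single point follows exactly a normal distribution with mean 2 and s.d. 1. The helper name `normal_cdf` is introduced here for illustration only. Note that this gives the probability that one point exceeds 0; it says nothing about the mean of 500 correlated points.

```python
import math

def normal_cdf(x, mean=0.0, sd=1.0):
    """Cumulative distribution function of a normal distribution (via erf)."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

# Assumption: one data point is distributed exactly as N(mean=2, sd=1).
p_positive = 1.0 - normal_cdf(0.0, mean=2.0, sd=1.0)  # P(point > 0)
p_negative = 1.0 - p_positive                         # P(point <= 0)

# The product used in step 3 of the post, with the post's rounded inputs.
product = 0.97 * 0.33

print(f"P(point > 0)  = {p_positive:.4f}")
print(f"P(point <= 0) = {p_negative:.4f}")
print(f"0.97 * 0.33   = {product:.4f}")
```

The exact tail probability at 2 s.d. is slightly above the rounded .97 used in the post, so the post's figure is a reasonable approximation for a single point.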

  • rakirk
    rakirk New Altair Community Member
    There was an error in my previous post. I believe I was correct in stating that the odds of the series outcome are 1/3 and that the probability of the mean being positive is .97. However, I conditioned the results incorrectly. Normalizing (alpha is the normalizing constant) gives the true answer of ~94%:

    alpha * <(1/3)*(.97^2), (2/3)*(.97*.03)> = <94.17%, 5.83%>
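The normalization in the post above can be reproduced with a short sketch. It only checks the arithmetic, taking the post's inputs (1/3, 2/3, .97 and .03) as given; the helper name `normalize` is an assumption made for illustration.

```python
def normalize(weights):
    """Scale non-negative weights so that they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]

# Inputs taken from the post; whether they are the right inputs is a separate question.
p_pos, p_neg = 0.97, 0.03
unnormalized = [
    (1 / 3) * p_pos ** 2,     # first component in the post
    (2 / 3) * p_pos * p_neg,  # second component in the post
]

posterior = normalize(unnormalized)
print([f"{100 * p:.2f}%" for p in posterior])
```

Here alpha is simply 1 divided by the sum of the two unnormalized terms.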
  • wessel
    wessel New Altair Community Member
    Hey, I believe you can only calculate it like this if you assume that points are independently distributed.

    I believe the true answer is:
    "Not enough information to give an answer"

    Best regards,

    Wessel
  • rakirk
    rakirk New Altair Community Member
    This approach assumes they are conditionally independent...
  • wessel
    wessel New Altair Community Member
    Then I do not understand what you are doing.

    I think you should look at the trend of the 500 data points.
    The mean and variance are not sufficient statistics to say something useful about this trend.

    Also I do not understand why you would want to restrict your values to + and -.
    The data are real numbers, possibly generated by a random walk process.
    http://en.wikipedia.org/wiki/Random_walk

    If it were a random walk, only the last of the 500 values would say something meaningful.
    The mean and variance would be useless statistics.

    Best regards,

    Wessel
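The random walk point can be illustrated with a small simulation. This is a sketch under stated assumptions, not a model of the actual data: the steps are taken to be independent Gaussian increments with unit variance, and the names `simulate` and `pearson` are hypothetical helpers. It checks two things: the mean of the second half tracks the last value of the first half, and once that last value is subtracted, the first-half mean carries essentially no further information.

```python
import random

def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def simulate(n_walks=500, half=500, seed=0):
    """Simulate random walks and summarize each half (assumed Gaussian steps)."""
    rng = random.Random(seed)
    first_means, last_values, second_means = [], [], []
    for _ in range(n_walks):
        level, walk = 0.0, []
        for _ in range(2 * half):
            level += rng.gauss(0.0, 1.0)  # assumption: i.i.d. N(0, 1) increments
            walk.append(level)
        first_means.append(sum(walk[:half]) / half)
        last_values.append(walk[half - 1])
        second_means.append(sum(walk[half:]) / half)
    return first_means, last_values, second_means

first_means, last_values, second_means = simulate()

# How well does the last observed value predict the second-half mean?
corr_last = pearson(last_values, second_means)

# What remains after subtracting the last value depends only on future steps,
# so it should be nearly uncorrelated with the first-half mean.
leftover = [s - l for s, l in zip(second_means, last_values)]
corr_leftover = pearson(first_means, leftover)

print(f"corr(last value, second-half mean)          = {corr_last:.3f}")
print(f"corr(first-half mean, second mean - last)   = {corr_leftover:.3f}")
```

Under these assumptions the first correlation should come out strongly positive and the second close to zero, which is consistent with the claim that only the last value is informative for a random walk.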