"Calculate confidence interval of RMSE"

wessel
wessel New Altair Community Member
edited November 5 in Community Q&A
Dear All,

I have two forecasting algorithms that each output a temperature forecast 24 hours ahead.
Algorithm A uses 1-nearest neighbour.
Algorithm B is a baseline algorithm, and simply outputs the last known temperature value as a prediction.

Let's say I calculate the Mean Squared Error and the variance of the squared error for A and B on a separate test set with N data points.
Then what is the confidence interval of MSE_A?
And what is the confidence interval of MSE_B?

Best regards,

Wessel

Answers

  • wessel
    wessel New Altair Community Member
    I have solved this problem as follows, although I'm not sure it is correct:

    diffErrMean = baseErrMean - predErrMean;
    diffVarMean = baseVarMean + predVarMean;
    varOverSqrtN = diffVarMean / Math.sqrt(N);
    z = diffErrMean / varOverSqrtN;
    z = Math.abs(z);
    upper = diffErrMean + z * diffVarMean;
    lower = diffErrMean - z * diffVarMean;
    (Where B = baseline = baseErrMean, and A = algorithm = predErrMean)

    I can then print something like:
    N: 13 // number of test points
    Target: "temp"
    Run time: 0.105 ms
    predErrMean: 0.134  predVarMean: 0.067
    baseErrMean: 0.246  baseVarMean: 0.141
    diffErrMean: 0.113 +- 0.058 = [-0.003, 0.228] // kinda weird that this is already nearly significant with only 13 test points
    Ratio:  1.843
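A note on the snippet above: the usual z statistic for a difference of two means divides by a standard error, i.e. the square root of (variance / N), not by variance / sqrt(N). Below is a minimal Java sketch of that calculation. It assumes the two sets of squared errors are independent and both have N points, and it uses the normal critical value 1.96. The class and method names are made up for illustration. Since both algorithms are scored on the same test points, a paired analysis of the per-point differences may be more appropriate.

```java
class DiffInterval {
    // Standard error of (meanB - meanA), assuming two independent samples of size n.
    // varA and varB are the sample variances of the squared errors.
    static double standardError(double varA, double varB, int n) {
        return Math.sqrt((varA + varB) / n);
    }

    // Interval for the difference baseline minus algorithm: diff +- crit * standard error.
    // crit is the critical value, e.g. 1.96 for a 95% normal interval.
    static double[] interval(double meanA, double meanB,
                             double varA, double varB, int n, double crit) {
        double diff = meanB - meanA;
        double se = standardError(varA, varB, n);
        return new double[] { diff - crit * se, diff + crit * se };
    }

    public static void main(String[] args) {
        // Numbers taken from the printout above (A = pred, B = base, N = 13).
        double[] ci = interval(0.134, 0.246, 0.067, 0.141, 13, 1.96);
        double z = (0.246 - 0.134) / standardError(0.067, 0.141, 13);
        System.out.printf("z = %.3f, interval = [%.3f, %.3f]%n", z, ci[0], ci[1]);
    }
}
```

With only 13 points the normal critical value is optimistic; a t-based critical value would widen the interval further.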
  • wessel
    wessel New Altair Community Member
    Okay this does not make any sense.

    You need to use the CDF of the t-distribution (strictly, its inverse) to get the critical value at the 2.5% point.

    But this is hard in Java since there is no easy access to the CDF of the t-distribution.

    So for now I think I will assume the normal distribution and use confidence interval = MEAN +- 2 * S.D.

    But then the problem is:
    The differences are not normally distributed.
    Take the maximum possible difference: algorithm A has 0 error and the baseline has some big error. Then the S.D. would be equal to that big error,
    and nothing would ever be significant.
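On the t-distribution point above: the CDF can be computed in plain Java without a library. The sketch below is one possible way, not a reference implementation. It integrates the t density with Simpson's rule and inverts the CDF by bisection. It assumes integer degrees of freedom and p >= 0.5, and all names are made up for illustration. The last method uses the quantile for an interval around a single mean, such as MSE_A, as mean +- t * sqrt(variance / N). Squared errors are skewed, so with small N this interval is only approximate.

```java
class TQuantile {
    // Normalising constant of the t density for integer df:
    // Gamma((df+1)/2) / (sqrt(df*pi) * Gamma(df/2)), built by recurrence from df = 1 or 2.
    static double tPdfConstant(int df) {
        boolean odd = (df % 2 == 1);
        double r = odd ? 1.0 / Math.sqrt(Math.PI) : Math.sqrt(Math.PI) / 2.0;
        for (int v = odd ? 1 : 2; v < df; v += 2) {
            r *= (v + 1.0) / v;
        }
        return r / Math.sqrt(df * Math.PI);
    }

    // CDF of Student's t: 0.5 plus the integral of the density from 0 to t (Simpson's rule).
    static double tCdf(double t, int df) {
        if (t < 0) return 1.0 - tCdf(-t, df);
        int n = 2000;            // number of Simpson intervals, must be even
        double h = t / n;
        double sum = 0.0;
        for (int i = 0; i <= n; i++) {
            double u = i * h;
            double f = Math.pow(1.0 + u * u / df, -(df + 1) / 2.0);
            double w = (i == 0 || i == n) ? 1.0 : (i % 2 == 1 ? 4.0 : 2.0);
            sum += w * f;
        }
        return 0.5 + tPdfConstant(df) * sum * h / 3.0;
    }

    // Inverse CDF by bisection; assumes 0.5 <= p < 1.
    static double tQuantile(double p, int df) {
        double lo = 0.0, hi = 1.0;
        while (tCdf(hi, df) < p) hi *= 2.0;   // grow the bracket until it contains the quantile
        for (int i = 0; i < 60; i++) {
            double mid = (lo + hi) / 2.0;
            if (tCdf(mid, df) < p) lo = mid; else hi = mid;
        }
        return (lo + hi) / 2.0;
    }

    // Two-sided interval for a mean: mean +- t_{(1+level)/2, n-1} * sqrt(variance / n).
    static double[] meanInterval(double mean, double variance, int n, double level) {
        double t = tQuantile((1.0 + level) / 2.0, n - 1);
        double half = t * Math.sqrt(variance / n);
        return new double[] { mean - half, mean + half };
    }

    public static void main(String[] args) {
        // predErrMean and predVarMean from the printout above, N = 13.
        double[] ci = meanInterval(0.134, 0.067, 13, 0.95);
        System.out.printf("MSE_A 95%% interval: [%.3f, %.3f]%n", ci[0], ci[1]);
    }
}
```

If adding a dependency is acceptable, a statistics library with a t-distribution class would replace the first three methods.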