(Bug?) Which definition of autocorrelation does the AutoCorrelation operator in the ValueSeries plugin use?

owen New Altair Community Member
edited November 5 in Altair RapidMiner
Hello statistical friends,

I examined the code of
[tt]rapidminer\operator\valueseries\transformations\basis\AutoCorrelation.java[/tt]
so that I could understand the meaning of its three input parameters: [tt]factor[/tt], [tt]start[/tt], and [tt]end[/tt].

Here is the relevant excerpt from [tt]AutoCorrelation.java[/tt] v5.3.000.
    for (int i = start; i < end; i++) {
        double differences = 0.0d;
        int numberOfValues = 0;
        for (int j = 0; j < series.length(); j++) {
            int lag = (int) ((double) factor / (double) i);
            if ((j + lag) >= series.length())
                break;
            numberOfValues++;
            double difference = series.getValue(j) - series.getValue(j + lag);
            differences += (difference * difference);
        }
        differences /= numberOfValues;

        displacements[i - start] = i;
        result[i - start] = new Vector(differences);
    }
For a stationary sequence x, the function appears to calculate an estimate that converges to
[tt]result( i ) = 2 Variance( x ) - 2 Covariance( x( j ), x( j+factor/i ) )[/tt].
The term "factor/i" is unfamiliar to me.
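
To make that claim checkable, here is a minimal sketch that recomputes the inner loop of the excerpt for one fixed lag on a plain [tt]double[][/tt] and compares it with the decomposition Var(a) + Var(b) - 2 Cov(a, b) + (mean(a) - mean(b))^2, where a is the series without its last lag values and b is the series without its first lag values. For a stationary series the two variances approach Var(x) and the mean gap approaches zero, which gives the expression above. The class name, the method names, and the AR(1) test series are my own assumptions for illustration; none of them come from the plugin.

```java
import java.util.Random;

public class MeanSquaredDifferenceCheck {

    // Same quantity as the inner loop of the excerpt, for one fixed lag.
    public static double meanSquaredDifference(double[] x, int lag) {
        double sum = 0.0;
        int n = 0;
        for (int j = 0; j + lag < x.length; j++) {
            double d = x[j] - x[j + lag];
            sum += d * d;
            n++;
        }
        return sum / n;
    }

    // Var(a) + Var(b) - 2 Cov(a, b) + (mean(a) - mean(b))^2, with
    // a = x[0 .. N-lag) and b = x[lag .. N), all moments using 1/n.
    public static double decomposition(double[] x, int lag) {
        int n = x.length - lag;
        double meanA = 0.0, meanB = 0.0;
        for (int j = 0; j < n; j++) {
            meanA += x[j];
            meanB += x[j + lag];
        }
        meanA /= n;
        meanB /= n;
        double varA = 0.0, varB = 0.0, cov = 0.0;
        for (int j = 0; j < n; j++) {
            double da = x[j] - meanA;
            double db = x[j + lag] - meanB;
            varA += da * da;
            varB += db * db;
            cov += da * db;
        }
        varA /= n;
        varB /= n;
        cov /= n;
        double meanGap = meanA - meanB;
        return varA + varB - 2.0 * cov + meanGap * meanGap;
    }

    // Hypothetical test data: AR(1) series x[t] = 0.8 * x[t-1] + Gaussian noise.
    public static double[] ar1Series(int length, long seed) {
        Random rng = new Random(seed);
        double[] x = new double[length];
        for (int t = 1; t < length; t++) {
            x[t] = 0.8 * x[t - 1] + rng.nextGaussian();
        }
        return x;
    }

    public static void main(String[] args) {
        double[] x = ar1Series(10000, 42L);
        for (int lag : new int[] {1, 5, 20}) {
            System.out.printf("lag=%d  msd=%.6f  decomposition=%.6f%n",
                    lag, meanSquaredDifference(x, lag), decomposition(x, lag));
        }
    }
}
```

The two columns agree up to floating-point rounding for any series, because the decomposition is an algebraic identity; only the step from there to 2 Var(x) - 2 Cov relies on stationarity.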

To calculate the auto-covariance function of a sequence x, I would have expected to see [tt]Cov(x( j ), x( j+factor * i ))[/tt]. There, the purpose of factor is to let the user control the computational effort by sampling the lag axis sparsely.
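
To show how different the two readings are, the following sketch lists the lags each one visits as i runs from start to end - 1. It assumes that factor is an int and that start is at least 1 (with i = 0 the quotient form divides by zero). The class and method names are hypothetical; only the quotient expression is copied from the excerpt.

```java
import java.util.Arrays;

public class LagSchedules {

    // Lag as computed in the excerpt: factor / i, truncated to int.
    public static int[] quotientLags(int factor, int start, int end) {
        int[] lags = new int[end - start];
        for (int i = start; i < end; i++) {
            lags[i - start] = (int) ((double) factor / (double) i);
        }
        return lags;
    }

    // Lag as I would have expected it: factor * i.
    public static int[] productLags(int factor, int start, int end) {
        int[] lags = new int[end - start];
        for (int i = start; i < end; i++) {
            lags[i - start] = factor * i;
        }
        return lags;
    }

    public static void main(String[] args) {
        // Example values chosen for illustration only.
        int factor = 100, start = 1, end = 6;
        System.out.println("factor / i: " + Arrays.toString(quotientLags(factor, start, end)));
        System.out.println("factor * i: " + Arrays.toString(productLags(factor, start, end)));
    }
}
```

With factor = 100, start = 1, end = 6, the quotient form visits lags 100, 50, 33, 25, 20, while the product form visits 100, 200, 300, 400, 500. So in the quotient form the lag shrinks as i grows and is spaced like a period rather than evenly along the lag axis.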

A few questions arise for me:
1. Is this a bug? Or is the "autocorrelation transformation" something mathematically distinct from the autocovariance of the sequence?
2. Suppose the output were [tt]result(lag) = 2 Var(x) - 2 Cov( x( j ), x( j+lag ) )[/tt]. Is there a reason in machine learning why that expression is more useful than just [tt]result(lag) = Cov( x( j ), x( j+lag ) )[/tt]?
3. Where is the public repository for the ValueSeries plugin, so that I can be sure that my comments are relevant to the latest code?

Thanks and regards,

Owen