As best I can determine, the only difference in these calculations between
2007 and earlier versions is the number of available rows. Everything else
follows from the order of evaluation.
For A1:A1048576, the data is in a single column, so all three formulas sum
from smallest to largest, which generally minimizes truncation error. The
resulting sum is 384307717958019968 = (93825126454594*2^6+46)*2^6. All
versions of Excel give the reported DEVSQ when applied to (x,x,x) with x
equal to this value. Surprisingly, STDEV also gives the reported result in
all versions, despite the algorithm change in 2003.
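To make the ordering effect concrete, here is a minimal C sketch of the
ascending-order summation with all intermediates held in 64-bit doubles.
Whether a given compiler keeps the accumulator in an 80-bit x87 register
instead is platform and flag dependent, so treat it as illustrative rather
than as a reproduction of Excel's internal code:

#include <stdio.h>

/* Sum the squares of 1..2^20 in ascending order, keeping every
   intermediate in a 64-bit double, and compare with the exact value. */
int main(void)
{
    double sum = 0.0;
    for (unsigned long i = 1; i <= 1048576UL; i++)
        sum += (double)i * (double)i;      /* i^2 <= 2^40, exact as a double */

    double exact = 733008800427.0 * 524288.0;  /* (2n+1)(n+1)n/6 = 733008800427*2^19 */

    printf("ascending double sum: %.0f\n", sum);
    printf("exact value         : %.0f\n", exact);
    printf("difference          : %.0f\n", exact - sum);
    return 0;
}

The single-column figure quoted above, 384307717958019968, is what 2007
reportedly produces for this ascending order, so it gives a point of
comparison for a strict-double run of the sketch.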
For A1:AF32768, the formulas sum across each row in turn, which has
different truncation properties. The resulting sum (in either 2007 beta or
earlier versions) is 384307717958225920 = 93825126454645*2^12. All versions
of Excel give zero for DEVSQ and STDEV when applied to (x,x,x) with x equal
to this value.
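For the row-by-row case I can only guess at the fill pattern, so the
following sketch assumes the values 1..2^20 were entered down column A,
then column B, and so on through AF. It is meant only to show that a
different traversal order gives a different double-precision sum, not to
reproduce Excel's exact order:

#include <stdio.h>

/* Assumed layout: the cell in row r, column c holds c*32768 + r + 1,
   i.e. 1..2^20 filled down each of the 32 columns in turn.  The
   summation then walks across each row, as described for A1:AF32768. */
int main(void)
{
    double sum = 0.0;
    for (unsigned long r = 0; r < 32768UL; r++)
        for (unsigned long c = 0; c < 32UL; c++) {
            double v = (double)(c * 32768UL + r + 1UL);
            sum += v * v;
        }

    printf("row-order double sum: %.0f\n", sum);
    printf("sum quoted above    : 384307717958225920\n");
    return 0;
}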
The exact (unlimited precision) result of the summation is (2*n+1)*(n+1)*n/6
with n = 2^20, i.e. 384307717958270976 = 733008800427*2^19, which is exactly
representable in IEEE double precision, although intermediate partial sums
are not. If intermediate partial sums were retained in 10-byte registers,
then this value would be the result of the summation. Therefore it appears
that no version of Excel has ever retained intermediate partial sums in
10-byte registers for SUM, SUMPRODUCT, or SUMSQ.
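To see what 10-byte intermediates would buy, here is a sketch using C's
long double, which is the 80-bit x87 extended format on most x86 toolchains
(on MSVC and most ARM targets it is just a 64-bit double, so the comparison
collapses there). Every partial sum is an integer below 2^64, so the
extended accumulation is exact and must land on 733008800427*2^19:

#include <stdio.h>

int main(void)
{
    long double esum = 0.0L;  /* 80-bit extended on typical x86 toolchains */
    double      dsum = 0.0;   /* plain 64-bit double for comparison */

    for (unsigned long i = 1; i <= 1048576UL; i++) {
        esum += (long double)i * (long double)i;
        dsum += (double)i * (double)i;
    }

    /* With a true 80-bit long double, esum prints 384307717958270976 exactly;
       dsum shows the truncation from 53-bit partial sums. */
    printf("extended-precision sum: %.0Lf\n", esum);
    printf("double-precision sum  : %.0f\n", dsum);
    return 0;
}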
The fact that the current algorithm for AVERAGE (and therefore DEVSQ,
STDEV, etc.) need not be exact even when all values are equal has been
discussed in previous threads that we have participated in. If AVERAGE (and
therefore DEVSQ) used 10-byte register storage of intermediates, then
AVERAGE(x,x,x) would always equal x, and hence DEVSQ(x,x,x) would always
equal zero. Therefore it appears that no version of Excel has ever retained
intermediates in 10-byte registers for AVERAGE or DEVSQ.
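Here is a small illustration of the AVERAGE(x,x,x) point in C, using the
column-order sum above as x. With pure double intermediates, 3*x is not
representable (its odd part needs 54 significant bits), so the mean rounds
away from x and a naive two-pass sum of squared deviations is nonzero; with
80-bit intermediates the mean comes back as exactly x. This mirrors the
arithmetic only, not Excel's actual internal code:

#include <stdio.h>

int main(void)
{
    double x = 384307717958019968.0;   /* exactly representable as a double */

    double      davg = (x + x + x) / 3.0;                /* double intermediates   */
    long double eavg = ((long double)x + x + x) / 3.0L;  /* extended intermediates */

    double d     = x - davg;
    double devsq = 3.0 * d * d;    /* naive two-pass sum of squared deviations */

    printf("double AVERAGE(x,x,x) - x  : %.0f\n", davg - x);
    printf("naive double DEVSQ(x,x,x)  : %.0f\n", devsq);
    printf("extended AVERAGE(x,x,x) - x: %.0Lf\n", eavg - (long double)x);
    return 0;
}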
The increased worksheet size raises the potential for these anomalies to
crop up more often. IMHO it strengthens the case for MS either to store
intermediates in 10-byte registers or to switch to updating algorithms for
these calculations, but they have done neither.
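By an updating algorithm I mean something along the lines of Welford's
one-pass recurrence, sketched below (my own illustration, not Excel's or
Spread32's actual code). Because each deviation is taken from a running
mean rather than from a rounded grand total, identical inputs give a sum of
squared deviations of exactly zero for any x:

#include <stdio.h>

/* Welford-style updating of the mean and the sum of squared deviations (M2).
   For identical values the first delta is absorbed into the mean exactly
   (and contributes x*(x - x) = 0 to M2), and every later delta is exactly
   zero, so M2 stays at 0. */
static double devsq_updating(const double *v, long n)
{
    double mean = 0.0, m2 = 0.0;
    for (long i = 0; i < n; i++) {
        double delta = v[i] - mean;
        mean += delta / (double)(i + 1);
        m2   += delta * (v[i] - mean);
    }
    return m2;
}

int main(void)
{
    double x = 384307717958019968.0;
    double v[3] = { x, x, x };
    printf("updating DEVSQ(x,x,x) = %.0f\n", devsq_updating(v, 3)); /* prints 0 */
    return 0;
}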
Interestingly, version 2.00 of Stephen Bye’s Spread32 program
http://www.byedesign.freeserve.co.uk/
has switched to updating algorithms, and so returns zero for DEVSQ(x,x,x)
and STDEV(x,x,x) for any value of x. He also has probability functions that
approach the accuracy and working range of Ian Smith’s VBA library, so
Spread32 now sets the standard for accuracy of native worksheet statistical
functions.
Jerry
"Harlan Grove" wrote:
> Jerry W. Lewis wrote...
> > =C1=C2
> > is a less reliable comparison than
> > =C1-C2
> > which has been subject to a fuzz factor since Excel 97
> > http://support.microsoft.com/kb/78113
> > The only reliable test of exact equality is whether
> > =(C1-C2) is exactly zero.
> ....
>
> I accept this, but my concern is that XL12 beta shows nonzero results
> for STDEV and DEVSQ of 3 different calculations of what mathematically
> are the same thing, the sum of squares of the integers from 1 to 2^20
> given by SUMPRODUCT of an array raised to the 2nd power, SUM of a range
> of what should be the same values, and SUMSQ of the same unraised array
> as used for SUMPRODUCT. Those calculations give *different* results
> than XL11, but the same result as a simple gawk script in which all
> intermediate calculations are stored in 64-bit double precision. My
> concern is that XL11 and prior seem to have taken advantage of 80-bit
> extended precision FPU registers for intermediate calculations inside
> built-in function calls, but XL12 beta seems to behave in this
> particular case like the gawk script, i.e., as if all intermediate
> calculations are only in double precision.
>
>