
Chapter 17: Principles and pathology of orthodox statistics

  • p. 512, first full paragraph: ``...it seems obvious that, at least for large $n$, this has made things worse instead of better.'' I believe ``large'' should be ``small.''

  • p. 513, footnote 2: ``indavertently'' should be ``inadvertently.''

  • p. 518, first line after (17.27), and last line on page: ``Schwartz'' should be ``Schwarz.''

  • p. 519, eqn. (17.30): $d\alpha$ (in the denominator) should be $dx$.

  • p. 519, eqn. (17.32): $p(x\mid a)$ should be $p(x\mid \alpha)$.

  • p. 520, third full paragraph, start of third line: ``Schwartz'' should be ``Schwarz.''

  • p. 530, eqn. (17.65): ``$=$'' should be ``$\propto$''.

  • p. 538, eqn. (17.93): $\overline{s^2 t^2}$ should be $(\overline{s^2})(\overline{t^2})$.

  • p. 541, eqn. (17.107): As far as I can tell, this equation is wrong. The final term of the rightmost expression should be

    \begin{displaymath}
\frac{(\overline{t^2})(\overline{se}) - (\overline{st})(\overline{te})}{
(\overline{s^2})(\overline{t^2})}
\end{displaymath}

    instead of $\overline{se}/\overline{s^2}$.

  • p. 543, first line after (17.118): ``Schwartz'' should be ``Schwarz.''

  • p. 544, eqn. (17.125): subscripts of $t$ and $t'$ are missing from the summation symbol $\sum$; the factor $1/N$ for the middle expression should be $1/N^2$; and the rightmost expression should be $\overline{g^2}\sigma^2/N$.

  • p. 544, eqn. (17.126): $\sqrt{\overline{g^2}}$ should be $\sqrt{\overline{g^2}/N}$.

  • p. 545, eqn. (17.128): The denominator needs to be squared--that is,

    \begin{displaymath}
\mbox{replace}\qquad \overline{s^2}(1-r^2) \qquad\mbox{with}\qquad
\left(\overline{s^2}(1-r^2)\right)^2.
\end{displaymath}

  • p. 545, eqn. (17.129): this should read

    \begin{displaymath}
A_0 \pm \sigma \sqrt{\frac{1}{N\overline{s^2}(1-r^2)}}.
\end{displaymath}

  • p. 545, eqn. (17.130): $\tilde{A}$ should be $A_0$.

  • p. 546, eqn. (17.135): $T_j$, here and in the rest of the section, should be $\Phi_j$. Alternatively, $\Phi_k$ in (17.132) should be replaced with $T_k$.

  • p. 546, eqn. (17.136): I think the series should be $A_1,\ldots,A_6,B_1,\ldots,B_6$, as $A_0$ provides only a constant term, which is subsumed by the trend. Likewise, in (17.137), replace ``$0\leq k\leq 6$'' with ``$1\leq k\leq 6$''.

  • p. 548, middle of page: ``$F$ is the $(N\times n)$ matrix of model functions.'' We called this matrix $G$ on the previous two pages.

  • p. 549, eqn. (17.164): This is redundant; it just repeats eqn. (17.161), which appeared half a page earlier.

  • p. 551, third full paragraph, third-to-last line: ``student'' should be ``Student.''


Miscellaneous Commentary

  • p. 517, eqn. (17.23): the normalization constant $1/n!$ is not a typo, even though it is the same as the normalization constant in (17.17). That's right: defining $f(n,l) \equiv \exp(-l)l^n$, we have $\int_0^\infty f(n,l)\, dl = n!$ and $\sum_{n=0}^\infty f(n,l)/n! = 1$, so the same factor $1/n!$ normalizes $f$ over $l$ in (17.23) and over $n$ in (17.17). (Both evaluations are spelled out after this list.)

  • p. 517, eqn. (17.23) and preceding line: Since $l$ is a scale factor, it might seem more reasonable to use an ignorance prior $\propto l^{-1}$ instead of a uniform prior over $l$. The only effect this has is to replace $n$ with $n-1$ in the posterior $\exp(-l)l^n/n!$ and the corresponding posterior expectation formula (17.23). (A one-line derivation is given after this list.)

  • p. 518, footnote 6: This story about Kendall and Jeffreys is probably based on a confusion with G.U. Yule. Kendall was a student at St John's, but he did not become a Fellow. It was Jeffreys and Yule who were both Fellows for many years. [Contributed by Nick Cox.]

  • p. 519, eqn. (17.31): Note that the ``change of parameters'' mentioned in the preceding text is not a reparameterization of a distribution. Recall that to achieve the minimum variance, there must exist some $q(\alpha)$ such that

    \begin{displaymath}
\frac{\partial\log p(x\mid\alpha)}{\partial\alpha} =
q(\alpha)(\beta(x) - \langle\beta\rangle).
\end{displaymath}

    We then define $l(\alpha)$ implicitly via $q(\alpha) = -l'(\alpha)$; the purpose is to make (17.32) come out neatly.

  • p. 519, eqn. (17.33): Note that $x$ is a vector of observables, not a single observable. Writing $x_1^n$ for the vector $(x_1,\ldots,x_n)$, (17.33) may be written more explicitly as

    \begin{displaymath}
p(x_1^n\mid\alpha) = \frac{m_n(x_1^n)}{Z_n(l_n)}\exp(-l_n(\alpha)\beta_n(x_1^n));
\end{displaymath}

    so to make (17.33) correspond to the results of Chapter 11, we must choose $n = 1$. If the estimators $\beta_i$ are related by

    \begin{displaymath}
\beta_n(x_1^n) = \frac{1}{n} \sum_{i=1}^n \beta_1(x_i),
\end{displaymath}

    then the distribution that minimizes $\mbox{var}(\beta_1)$ also minimizes $\mbox{var}(\beta_n)$, assuming the $x_i$ are assigned independent and identical distributions:

    \begin{displaymath}
p(x_1^n \mid \alpha) = \prod_i \frac{m_1(x_i)}{Z_1(l_1)}\exp\left(-l_1\sum_i\beta_1(x_i)\right)
\end{displaymath}

    is identical to (17.33) for arbitrary $n$ if we choose $m_n(x_1^n) =
\prod_i m_1(x_i)$, $l_n = n l_1 $, and $Z_n(l_n) = Z_1(l_1)^n$.

  • p. 530, end of section, and p. 549, end of section: A more detailed discussion of the topic can be found in Larry Bretthorst's Ph.D. dissertation, Bayesian Spectrum Analysis and Parameter Estimation (Bretthorst got his doctorate under Jaynes). This was published as Lecture Notes in Statistics 48, but is out of print; the best way to get it now is to download it from Bretthorst's web page on Probability Theory as Extended Logic (near the end of the page).

  • p. 531 (Section 17.7, ``The folly of randomization''): Jaynes's example of using Monte Carlo methods to do a simple one-dimensional integral is a bit misleading, in that most uses of MC methods involve high-dimensional integrals and sampling from a distribution $p(x)$ to compute an expectation over that distribution; in such cases, numerical integration on an $n$-dimensional grid is impractical. However, the emergence of techniques such as Latin squares sampling and the use of quasi-random sequences to improve the convergence of MC integration certainly seems to support Jaynes's contention (on p. 532) that ``Whenever there is a randomized way of doing something, there is a nonrandomized way that yields better results from the same data, but requires more thinking.'' I would only add the caveat that the better, nonrandomized way often enough seems to require much more thinking, and years of research. (A small illustration of the one-dimensional case appears after this list.)

  • p. 545, end of section 17.10.5: Chapter 17 is one of those roughed-out chapters that Jaynes never really finished, and it's unclear where he intended to go with this comparison of the Bayesian vs. orthodox estimators using the orthodox criterion of performance. However, let's continue where Jaynes left off. Using the corrected versions of equations (17.129) and (17.130) [see errata above], the expected squared error for the Bayesian estimator is

    \begin{displaymath}
\frac{\sigma^2}{N \overline{s^2} (1 - r^2)},
\end{displaymath}

    whereas the expected squared error for the orthodox estimator is

    \begin{displaymath}
\frac{\sigma^2 (1 - r^2)}{N \overline{s^2}} + r^4 A_0^2,
\end{displaymath}

    the extra term coming from the bias of the estimator.

    To better understand the behavior of $r$, let us assume that $\omega = 2\pi/k$ for some integer $k>2$, and let $N = nk$ (a whole number of cycles) for ease of analysis. It is easily shown that $r^2\leq 1$. Using

    \begin{displaymath}
\overline{s^2} = \frac{1}{k} \sum_{j=1}^k \sin^2(2\pi j/k) = 0.5
\end{displaymath}

    and

    \begin{displaymath}
\overline{t^2} = \frac{1}{N}\sum_{t=1}^N t^2 = \frac{(N+1)(2N+1)}{6}
\end{displaymath}

    and

    \begin{displaymath}
\overline{st} = \frac{1}{k}\sum_{j=1}^k j \sin(2\pi j/k)
\approx -\frac{k}{2\pi} \quad\mbox{for moderate to large $k$},
\end{displaymath}

    we find that

    \begin{displaymath}
r^2 = \frac{72\overline{st}^2}{(N+1)^2 (2N+1)^2}.
\end{displaymath}

    This goes to zero quite rapidly; even for the minimum values of $k=3$ and $n=1$, giving $N=3$, we have $r^2\approx 0.00765$. As a result, the difference in the expected squared errors of the two estimators is usually small. Experimenting with different values of $k$, $n$, and the ratio $\rho\equiv\sigma/A_0$, I found a substantial advantage for the Bayesian estimator only for small values of $n$ and $\rho$; in particular, for $n = 1$ I found the expected squared error for the orthodox estimator to become much higher than that for the Bayesian estimator as $\rho$ drops below 0.02. For other cases I found only very small differences, often favoring the orthodox estimator. (A short script reproducing this experiment follows this list.)
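
Here is a minimal Python sketch (my own, not Jaynes's) of the experiment described in the last item above. It simply encodes the expected-squared-error expressions and the $r^2$ formula exactly as displayed in that item, with $A_0 = 1$ as an arbitrary normalization and $\sigma = \rho A_0$; the particular $(k, n, \rho)$ triples are illustrative choices.

    import numpy as np

    def compare(k, n, rho, A0=1.0):
        """Expected squared errors of the two estimators, per the formulas above."""
        N = n * k                              # a whole number of cycles
        sigma = rho * A0
        s2 = 0.5                               # \overline{s^2}
        j = np.arange(1, k + 1)
        st = np.mean(j * np.sin(2 * np.pi * j / k))        # \overline{st}
        r2 = 72 * st**2 / ((N + 1)**2 * (2 * N + 1)**2)    # r^2 as displayed above
        bayes = sigma**2 / (N * s2 * (1 - r2))             # corrected (17.129)
        orthodox = sigma**2 * (1 - r2) / (N * s2) + r2**2 * A0**2   # incl. bias term
        return r2, bayes, orthodox

    for k, n, rho in [(3, 1, 0.5), (3, 1, 0.02), (3, 1, 0.005), (12, 10, 0.1)]:
        r2, b, o = compare(k, n, rho)
        print(f"k={k:3d}  n={n:3d}  rho={rho:6.3f}  r^2={r2:.3g}  "
              f"Bayes={b:.3g}  orthodox={o:.3g}")

For $k=3$, $n=1$ this reproduces $r^2\approx 0.00765$, and pushing $\rho$ below $0.02$ shows the orthodox estimator's expected squared error pulling well ahead of the Bayesian one, consistent with the observations above.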
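
To spell out the two evaluations behind the p. 517 note on (17.23) above: the integral is the Gamma-function integral and the sum is the exponential series,

    \begin{displaymath}
\int_0^\infty e^{-l} l^n\, dl = \Gamma(n+1) = n!,
\qquad\qquad
\sum_{n=0}^\infty \frac{e^{-l} l^n}{n!} = e^{-l}\sum_{n=0}^\infty \frac{l^n}{n!} = e^{-l} e^{l} = 1,
\end{displaymath}

which is the coincidence noted in that item.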
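
Likewise, for the p. 517 note on the $\propto l^{-1}$ prior: multiplying the likelihood $\propto e^{-l}l^n$ by that prior gives

    \begin{displaymath}
p(l\mid n) \;\propto\; l^{-1}\, e^{-l} l^{n} \;=\; e^{-l} l^{\,n-1},
\qquad\mbox{hence}\qquad
p(l\mid n) = \frac{e^{-l} l^{\,n-1}}{(n-1)!} \quad (n\geq 1),
\end{displaymath}

which is the uniform-prior posterior with $n$ replaced by $n-1$, as stated in that item.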
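
Finally, a small illustration of the one-dimensional point in the p. 531 item above. This is a minimal sketch of my own, not anything from the book: the integrand $\int_0^1 e^{-x^2}\,dx$, the sample size, and the random seed are arbitrary choices; it just shows a deterministic midpoint rule beating plain Monte Carlo at equal cost in one dimension.

    import numpy as np

    def f(x):
        return np.exp(-x**2)          # arbitrary smooth integrand on [0, 1]

    exact = 0.7468241328124271        # \int_0^1 exp(-x^2) dx = (sqrt(pi)/2) erf(1)
    n = 1000
    rng = np.random.default_rng(0)

    mc = f(rng.uniform(0.0, 1.0, n)).mean()        # plain Monte Carlo, n samples
    midpoint = f((np.arange(n) + 0.5) / n).mean()  # midpoint rule, n nodes

    print("Monte Carlo error:   ", abs(mc - exact))
    print("Midpoint rule error: ", abs(midpoint - exact))

With the same $n = 1000$ function evaluations the midpoint rule is accurate to better than $10^{-6}$, while plain Monte Carlo is typically good only to a few parts in $10^{3}$; in high dimensions, of course, the comparison is much less lopsided.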
