The concept of convexity and Jensen's inequality (see practice problems at the end of the convexity notes).
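
As a quick self-check on Jensen's inequality, the sketch below compares f(E[X]) with E[f(X)] for one convex function. The choice f(x) = x^2 and the uniform distribution on four points are assumptions picked for illustration; they are not taken from the convexity notes.

```python
def expectation(values, probs):
    """E[g(X)] for a finite distribution, given the values g(x) and P(x)."""
    return sum(p * v for v, p in zip(values, probs))

def f(x):
    """A convex function chosen for illustration."""
    return x * x

# Illustrative distribution: uniform over four points.
values = [1.0, 2.0, 3.0, 4.0]
probs = [0.25, 0.25, 0.25, 0.25]

lhs = f(expectation(values, probs))                # f(E[X])
rhs = expectation([f(v) for v in values], probs)   # E[f(X)]

# Jensen's inequality for convex f: f(E[X]) <= E[f(X)].
print(lhs, rhs, lhs <= rhs)
```

For a concave function such as log, the inequality reverses.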

Information Theory: The definition of entropy. The interpretation of log_2(1/P(x)) as the number of bits needed to specify x. The definition of cross-entropy and KL divergence and basic properties KL(P,Q) >= 0, KL(P,Q)=0 iff P=Q, H(P,Q) = H(P) + KL(P,Q).
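
The identities above can be checked numerically on a small example. This is a minimal sketch for finite distributions given as lists of probabilities; the function names and the two example distributions are assumptions made for illustration, and logs are base 2 so that all quantities are in bits.

```python
import math

def entropy(p):
    """H(P) = sum_x P(x) * log2(1 / P(x))."""
    return sum(px * math.log2(1.0 / px) for px in p if px > 0)

def cross_entropy(p, q):
    """H(P,Q) = sum_x P(x) * log2(1 / Q(x)).

    Assumes Q(x) > 0 wherever P(x) > 0.
    """
    return sum(px * math.log2(1.0 / qx) for px, qx in zip(p, q) if px > 0)

def kl_divergence(p, q):
    """KL(P,Q) = sum_x P(x) * log2(P(x) / Q(x)).

    Assumes Q(x) > 0 wherever P(x) > 0.
    """
    return sum(px * math.log2(px / qx) for px, qx in zip(p, q) if px > 0)

# Illustrative distributions over three outcomes.
P = [0.5, 0.25, 0.25]
Q = [0.25, 0.25, 0.5]

print(entropy(P))            # H(P)
print(cross_entropy(P, Q))   # H(P,Q)
print(kl_divergence(P, Q))   # KL(P,Q), nonnegative
print(kl_divergence(P, P))   # zero when the two distributions are equal
```

With these numbers H(P) = 1.5 bits, H(P,Q) = 1.75 bits, and KL(P,Q) = 0.25 bits, so H(P,Q) = H(P) + KL(P,Q) holds.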

Covariance: The definition of the covariance matrix. The fact that it is positive semi-definite and has orthogonal eigenvectors with nonnegative eigenvalues. Equation (12).
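
One way to see why the covariance matrix is positive semi-definite: for any direction v, the quadratic form v^T Sigma v equals the variance of the data projected onto v, and a variance cannot be negative. The sketch below checks this on a small data set. The data, the helper names, and the 1/n normalization are assumptions made for illustration; the notes may use a different convention.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def covariance_matrix(xs):
    """Sigma[i][j] = E[(x_i - mu_i)(x_j - mu_j)], using 1/n normalization."""
    n, d = len(xs), len(xs[0])
    mu = [sum(x[i] for x in xs) / n for i in range(d)]
    return [[sum((x[i] - mu[i]) * (x[j] - mu[j]) for x in xs) / n
             for j in range(d)] for i in range(d)]

def quadratic_form(m, v):
    """v^T M v."""
    d = len(v)
    return sum(v[i] * m[i][j] * v[j] for i in range(d) for j in range(d))

def projected_variance(xs, v):
    """Variance (1/n normalization) of the scalar projections v . x."""
    zs = [dot(v, x) for x in xs]
    mean = sum(zs) / len(zs)
    return sum((z - mean) ** 2 for z in zs) / len(zs)

# Illustrative 2-D data set.
xs = [[1.0, 2.0], [2.0, 4.5], [3.0, 5.0], [4.0, 8.5]]
sigma = covariance_matrix(xs)

# For each direction, v^T Sigma v matches the projected variance.
for v in ([1.0, 0.0], [0.0, 1.0], [1.0, -1.0], [-2.0, 0.5]):
    print(v, quadratic_form(sigma, v), projected_variance(xs, v))
```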

PCA: Equation (5).

Linear Regression: Equations 3.15 and 3.28 of Bishop.

Linear Classification: Equations (3), (5), (6), (7), (10), (12), (18), (24). Which loss functions and regularizers are convex.

Bias-Variance: Equation (9).

Generalization Bounds: Equation (1), Chernoff bound, union bound, and Kraft inequality. Equations (13) and (15).

Kernel Methods: Equations (3), (4), (5), (6), (8), (13), (16). General definition of a kernel in terms of a feature map mapping objects to $\ell_2$. Gaussian kernels, polynomial kernels.
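
The feature-map definition can be made concrete with a degree-2 polynomial kernel on 2-D inputs, where the feature map is small enough to write out. This is a sketch under assumptions: the homogeneous form K(x,y) = (x . y)^2, the Gaussian parametrization exp(-||x - y||^2 / (2 sigma^2)), and all function names are chosen for illustration and may differ from the conventions in the notes.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def poly2_kernel(x, y):
    """Homogeneous polynomial kernel of degree 2: K(x,y) = (x . y)^2."""
    return dot(x, y) ** 2

def poly2_features(x):
    """Explicit feature map for poly2_kernel on 2-D inputs.

    Phi(x) = (x1^2, sqrt(2) * x1 * x2, x2^2), so Phi(x) . Phi(y) = (x . y)^2.
    """
    x1, x2 = x
    return [x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2]

def gaussian_kernel(x, y, sigma=1.0):
    """K(x,y) = exp(-||x - y||^2 / (2 * sigma^2)).

    The corresponding feature map is infinite-dimensional, so only the
    kernel value is computed here.
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

x = [1.0, 2.0]
y = [3.0, -1.0]

# The kernel value and the inner product of explicit features agree.
print(poly2_kernel(x, y), dot(poly2_features(x), poly2_features(y)))

# The Gaussian kernel is symmetric and equals 1 when x = y.
print(gaussian_kernel(x, y), gaussian_kernel(y, x), gaussian_kernel(x, x))
```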

Boosting: Equations (8) and (9).

Neural Networks: Equations (2), (3), (4) and (5). The Newton-Raphson update equation.

Hidden Markov Models: The Viterbi algorithm --- the definition of V_{i,t} and equation (4). The forward-backward algorithm --- the definition of F and B and equations (7), (9), and (10).

PCFGs: The PCFG Viterbi algorithm and inside-outside.

K-means Clustering and EM: Equations (1), (2), (3), (4), (6), (7). The update equations at the end of section 3 and again for section 4. Equations (8) and (9). The Baum-Welch algorithm --- the update equations at the end of section 6. The ability to design a new EM algorithm by inspection.