In a funny coincidence, this post has the same basic structure as my previous one: proving some technical result, and then looking at an application to machine learning. This time it’s Mercer’s theorem from functional analysis, and the kernel trick for SVMs. The proof of Mercer’s theorem mostly follows Lax’s *Functional Analysis*.

**1. Mercer’s Theorem**

Consider a real-valued function $latex {K(s,t)}&fg=000000$, and the corresponding integral operator $latex {\mathbf{K}: L^2[0,1]\rightarrow L^2[0,1]}&fg=000000$ given by

$latex \displaystyle (\mathbf{K} u)(s)=\int_0^1 K(s,t)\, u(t)\, dt.&fg=000000$
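As a concrete sanity check on this definition, the operator can be approximated by discretizing the integral with a quadrature rule. A minimal Python sketch, using the illustrative kernel $latex {K(s,t)=\min(s,t)}&fg=000000$ (my choice, not one from the post), for which the integral can be done by hand:

```python
import numpy as np

# Trapezoidal discretization of (K u)(s) = \int_0^1 K(s,t) u(t) dt.
# The kernel K(s,t) = min(s,t) is an illustrative choice: for u = 1
# the integral works out by hand to (K u)(s) = s - s^2/2.

n = 400
t = np.linspace(0.0, 1.0, n)
w = np.full(n, 1.0 / (n - 1))       # trapezoid weights: h inside...
w[[0, -1]] = 0.5 / (n - 1)          # ...and h/2 at the endpoints

K = np.minimum.outer(t, t)          # kernel matrix K(s_i, t_j)
A = K * w                           # matrix approximating the operator

Ku = A @ np.ones(n)                 # apply the operator to u(t) = 1
exact = t - t**2 / 2                # closed-form answer
print(np.max(np.abs(Ku - exact)))   # tiny: min(s,.) is piecewise linear
                                    # with its kink at a grid point, so
                                    # the trapezoid rule is exact here
```

For a general smooth kernel the quadrature error would instead decay like $latex {O(1/n^2)}&fg=000000$.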

We begin with two facts connecting the properties of $latex {K}&fg=000000$ to the properties of $latex {\mathbf{K}}&fg=000000$.

**Proposition 1.** *If $latex {K}&fg=000000$ is continuous, then $latex {\mathbf{K}}&fg=000000$ is compact.*

*Proof:* Consider a bounded sequence $latex {\{f_n\}_{n=1}^{\infty} \subset L^2[0,1]}&fg=000000$. We wish to show that the image of this sequence, $latex {\{\mathbf{K} f_n\}_{n=1}^{\infty}}&fg=000000$, has a convergent subsequence. We show that $latex {\{\mathbf{K} f_n\}}&fg=000000$ is equicontinuous, and Arzelà–Ascoli then gives a…
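Compactness also has a tangible numerical signature: a compact operator is a limit of finite-rank operators, so the eigenvalues of a discretization decay to zero. A sketch with the illustrative kernel $latex {K(s,t)=\min(s,t)}&fg=000000$ (again my choice, not from the post; it is the Brownian-motion covariance kernel, whose eigenvalues are known in closed form to be $latex {\lambda_k = 1/((k-\tfrac{1}{2})^2\pi^2)}&fg=000000$):

```python
import numpy as np

# Eigenvalue decay of the discretized integral operator with kernel
# K(s,t) = min(s,t) (an illustrative choice, not from the post).
# Its exact eigenvalues are 1 / ((k - 1/2)^2 pi^2), k = 1, 2, ...

n = 400
t = np.linspace(0.0, 1.0, n)
w = np.full(n, 1.0 / (n - 1))       # trapezoid quadrature weights
w[[0, -1]] = 0.5 / (n - 1)
A = np.minimum.outer(t, t) * w      # Nystrom discretization of the operator

# Eigenvalues are real and nonnegative (the kernel is symmetric PSD);
# sort them in decreasing order and compare with the closed form.
lam = np.sort(np.linalg.eigvals(A).real)[::-1]
exactk = 1.0 / ((np.arange(1, 6) - 0.5) ** 2 * np.pi ** 2)
print(lam[:5])                      # rapid decay toward zero
print(exactk)                       # closely matches the exact spectrum
```

Truncating to the top few eigenvalues gives a finite-rank operator that already approximates $latex {\mathbf{K}}&fg=000000$ well, which is exactly the behavior compactness promises.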