Optimization problem using Reproducing Kernel Hilbert Spaces

Beginning with the answer to your second problem, suppose that $f \in H$, where $H$ is the reproducing kernel Hilbert space. Let $S$ be the subspace spanned by the kernel functions $k(x_i, \cdot)$. Then, by the projection theorem for Hilbert spaces, $f$ can be written as $f = f_S + f_P$, where $$f_S(x) = \sum_{i=1}^n a_i k(x_i, x)$$ and $f_P$ is orthogonal to $S$. Moreover, by the Pythagorean theorem, $$\| f \|^2 = \| f_S \|^2 + \| f_P \|^2.$$ In particular, this tells us that $\|f\| > \|f_S\|$ whenever $f_P \neq 0$.

Now consider $f(x_i)$. Using the reproducing property, and the fact that $\langle f_P, k(x_i, \cdot) \rangle = 0$ since $k(x_i, \cdot) \in S$, we can write $$f(x_i)=\langle f, k(x_i, \cdot) \rangle = \langle f_S, k(x_i, \cdot) \rangle + \langle f_P, k(x_i, \cdot) \rangle = \langle f_S, k(x_i, \cdot) \rangle + 0 = f_S(x_i).$$

Thus for every $f$ we have $$\sum_{i=1}^n L(y_i, f(x_i) + b) = \sum_{i=1}^n L(y_i, f_S(x_i) + b)$$

Hence, $$F[f] = \lambda \| f\|^2 + \sum_{i=1}^n L(y_i, f(x_i) + b) \geq \lambda \| f_S\|^2 + \sum_{i=1}^n L(y_i, f_S(x_i) + b) = F[f_S],$$ with strict inequality whenever $f_P \neq 0$, and this holds for all $f \in H$. This means that any minimizer of $F$ must lie in the subspace $S$, i.e. it is a linear combination of the kernel functions.
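To make this concrete, here is a minimal numerical sketch. The setup is assumed, not taken from your question: squared loss $L(y,t)=(y-t)^2$, no bias term $b$, an RBF kernel, and synthetic data. In that case the functional is kernel ridge regression, and the minimizer $f_S(x)=\sum_i a_i k(x_i, x)$ has coefficients solving $(K + \lambda I)a = y$:

```python
import numpy as np

# Minimal sketch (assumed setup): squared loss, no bias b, RBF kernel, synthetic data.
# The functional is then  F[f] = lambda * ||f||^2 + sum_i (y_i - f(x_i))^2,
# and by the argument above the minimizer is f_S(x) = sum_i a_i k(x_i, x)
# with coefficients solving (K + lambda*I) a = y.

def rbf_kernel(X, Z, gamma=1.0):
    """Gaussian (RBF) kernel k(x, z) = exp(-gamma * ||x - z||^2)."""
    d2 = (np.sum(X**2, axis=1)[:, None]
          + np.sum(Z**2, axis=1)[None, :]
          - 2.0 * X @ Z.T)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(30, 1))               # training inputs x_1, ..., x_n
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=30)    # noisy targets y_i
lam = 0.1

K = rbf_kernel(X, X)                               # Gram matrix K_ij = k(x_i, x_j)
a = np.linalg.solve(K + lam * np.eye(len(X)), y)   # coefficients a_i

def f_S(x_new):
    """Evaluate the minimizer f_S(x) = sum_i a_i k(x_i, x)."""
    return rbf_kernel(np.atleast_2d(x_new), X) @ a

print(f_S(np.array([[0.5]])))                      # prediction at a new point
```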


As for the first question, quadratic terms resembling $w^T A w$ appear through what is known as the Gram matrix, which is built from the kernel: $$K = \left( k(x_i,x_j) \right)_{i,j=1}^n.$$ It is straightforward to prove that this matrix is symmetric and positive (semi-)definite, since if $a = (a_1, a_2, \ldots, a_n)$ then $$a^T K a = \left\langle \sum_{i=1}^n a_i k(x_i, \cdot), \sum_{j=1}^n a_j k(x_j, \cdot)\right\rangle=\left\|\sum_{i=1}^n a_i k(x_i, \cdot)\right\|^2 \geq 0.$$
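A quick numerical check of this symmetry and positive semi-definiteness (with an assumed RBF kernel and random points, purely for illustration):

```python
import numpy as np

# Check (assumed RBF kernel, random points) that the Gram matrix K = (k(x_i, x_j))
# is symmetric and positive semi-definite, i.e.
# a^T K a = || sum_i a_i k(x_i, .) ||^2 >= 0 for every coefficient vector a.

def rbf_kernel(X, gamma=1.0):
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * d2)

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 3))
K = rbf_kernel(X)

print(np.allclose(K, K.T))                    # symmetric
print(np.linalg.eigvalsh(K).min() >= -1e-10)  # eigenvalues >= 0 (up to round-off)

a = rng.normal(size=20)
print(a @ K @ a >= -1e-10)                    # quadratic form is non-negative
```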

This gives us our first hint at how to recast $w^T A w$ into the language of reproducing kernel Hilbert spaces.


Take for instance $$A = \operatorname{diag}(a_1, a_2, a_3, \ldots, a_n),$$ where each $a_i > 0$. Then $$w^T A w = \sum_{i=1}^n a_i w_i^2.$$

Now imagine replacing $w$ with $f$, so that each $w_i = f(x_i)$. Then $$\sum_{i=1}^n a_i w_i^2 = \sum_{i=1}^n a_i f(x_i)^2.$$

By the same reasoning as above, $$\sum_{i=1}^n a_i f(x_i)^2 = \sum_{i=1}^n a_i f_S(x_i)^2,$$ and so we may add this term to the loss function and still be guaranteed that any minimizer is a linear combination of kernel functions.

So, in short, you may introduce the term you want into your loss function, keeping in mind that $w = (f(x_1), f(x_2), \ldots, f(x_n))$.
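One way to see this in computation: if $f = \sum_i a_i k(x_i, \cdot)$, then $w_j = f(x_j) = (Ka)_j$, so $\|f\|^2 = a^T K a$ and $w^T A w = a^T K A K a$, and the whole objective becomes a function of the coefficient vector $a$. A small sketch of this identification (random matrices, purely illustrative):

```python
import numpy as np

# Illustration (random matrices): if f = sum_i a_i k(x_i, .), then
# w_j = f(x_j) = sum_i a_i k(x_i, x_j), i.e. w = K a.  Hence
#   ||f||^2 = a^T K a   and   w^T A w = a^T K A K a,
# so the objective can be written entirely in terms of the coefficients a.
# (Here diag_a holds the diagonal entries of A, to avoid clashing with the
# kernel expansion coefficients, which are also called a_i in the text.)

rng = np.random.default_rng(2)
n = 10
B = rng.normal(size=(n, n))
K = B @ B.T + 1e-3 * np.eye(n)           # stand-in for a Gram matrix (SPD)
diag_a = rng.uniform(0.5, 2.0, size=n)
A = np.diag(diag_a)                      # the diagonal A = diag(a_1, ..., a_n) above

a = rng.normal(size=n)                   # kernel expansion coefficients
w = K @ a                                # w_j = f(x_j)

print(np.isclose(w @ A @ w, a @ K @ A @ K @ a))       # w^T A w == a^T K A K a
print(np.isclose(w @ A @ w, np.sum(diag_a * w**2)))   # == sum_i a_i f(x_i)^2
```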


I would like to clarify my final question, as discussed with @Joel above (see the comments).

Let $\mathbf{w}=(w_1,\ldots,w_n)^T$, $\mathbf{x}_i=(x_{i1},\ldots,x_{in})^T\in\mathbb{R}^n$, $i=1,\ldots,m$, and $A=\big(a_{ij}\big)_{i,j=1}^{n}$ an $n\times n$ symmetric positive definite real matrix.

Let's suppose that we would like to minimize the following quantity with respect to $\mathbf{w}$

$$ J =\mathbf{w}\cdot\mathbf{w} + \sum_{i=1}^{m}\mathbf{w}\cdot\mathbf{x}_i + \mathbf{w}^TA\mathbf{w}. $$

Instead of the above optimization problem, we choose to look for a function $f$ that minimizes a functional, so that the problem remains equivalent to the first one. Let this function belong to a Reproducing Kernel Hilbert Space $\mathcal{H}$. The appropriate functional should be of the form $$ \Phi[f]= \big\lVert f \big\rVert^2_{\mathcal{H}} + \sum_{i=1}^{m}f(\mathbf{x}_i) + \cdots, $$ but I do not know how to express the quadratic form $\mathbf{w}^TA\mathbf{w}$ in terms of $f$. Could you help?

What I have thought so far is as follows. We have "replaced" the quantity $\mathbf{w}\cdot\mathbf{w}$ by the norm $\big\lVert f \big\rVert^2_{\mathcal{H}}$, so we could probably write $$ \mathbf{w}^TA\mathbf{w} = \mathbf{w}^T\big(LDL^T\big)\mathbf{w} = \mathbf{w}^T \big(LD^{1/2}\big)\big(LD^{1/2}\big)^T \mathbf{w} = \mathbf{w'}\cdot\mathbf{w'}, $$ where $\mathbf{w'}=\big(LD^{1/2}\big)^T\mathbf{w}$. Could we then identify the corresponding norm to use instead?
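As a numerical sanity check of this factorization idea (using the Cholesky factor $C$ with $A = CC^T$ in place of $LD^{1/2}$, and a random positive definite $A$ purely for illustration):

```python
import numpy as np

# Sanity check (illustrative only): with A = C C^T from a Cholesky factorization
# (C plays the role of L D^{1/2} in the LDL^T factorization above), the vector
# w' = C^T w satisfies  w'.w' = w^T A w.

rng = np.random.default_rng(3)
n = 8
B = rng.normal(size=(n, n))
A = B @ B.T + 1e-3 * np.eye(n)    # a random symmetric positive definite A

C = np.linalg.cholesky(A)         # A = C C^T
w = rng.normal(size=n)
w_prime = C.T @ w                 # w' = C^T w

print(np.isclose(w_prime @ w_prime, w @ A @ w))   # True: w'.w' == w^T A w
```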