# Automatic Feature Extraction

Summary

"Automatic Feature Extraction (AFE) is a technique by which approximate eigen-functions can be extracted from gram matrices constructed via kernel functions. These eigen-functions are features engineered by a particular kernel.""

## Some Mathematical Background¶

### Definitions¶

Let $X_k \ \in \ \mathbb{R}^d \ , \ k = 1, \cdots ,n$ be a random sample drawn from a distribution $F(x)$. Let $C \in \mathbb{R}^d$ be a compact set such that, $\mathcal{H} = \mathcal{L}^2(C)$ be a Hilbert space of functions given by the inner product below.

$$<f,g>_{\mathcal{H}} = \int f(x)g(x) dF(x)$$

Further let $M(\mathcal{H}, \mathcal{H})$ be a class of linear operators from $\mathcal{H}$ to $\mathcal{H}$.

### Nyström method¶

Automatic Feature Extraction (AFE) using the Nyström method aims at finding a finite dimensional approximation to the kernel eigenfunction expansion of Mercer kernels, as shown below.

$$K(x,t) = \sum_i{\lambda_i \phi(x)\phi(t)}$$

It is well known that Mercer kernels form a Reproducing Kernel Hilbert Space (RHKS) of functions. Every Mercer kernel defines a unique RHKS of functions as shown by the Moore-Aronszajn theorem. For a more involved treatment of RHKS and their applications the reader may refer to the book written by Bertinet et.al.

Mercer's theorem states that the spectral decomposition of integral operator of $K$, $\mathcal{T} \in M(\mathcal{H},\mathcal{H})$ defined below yields the eigenfunctions which span the RHKS generated by $K$ and having an inner product defined as above.

$$(\mathcal{T}\phi_i)(t) = \int K(x,t) \phi(x) dF(x)$$

Equation above is more commonly also known as the Fredholm integral equation of the first kind. Nyström's method method approximates this integral using the quadrature constructed by considering a finite kernel matrix constructed out of a prototype set $X_k \ k = 1, \cdots, m$ and calculating its spectral decomposition consisting of eigenvalues $\lambda_k$ and eigen-vectors $u_k$. This yields an expression for the approximate non-linear feature map $\hat{\phi} : \mathbb{R}^d \longrightarrow \mathbb{R}^m$.

$$\hat{\phi}_{i}(t) = \frac{\sqrt{m}}{\lambda_i}\sum_{k=1}^{m}K(X_k, t)u_{k,i}$$

## AFE in DynaML Kernels¶

The SVMKernel[M] contains an implementation of AFE in the method

 1 2 3 4 featureMapping( decomposition: (DenseVector[Double], DenseMatrix[Double]))( prototypes: List[DenseVector[Double]])( data: DenseVector[Double]): DenseVector[Double] 

Note

The SVMKernel class is extended by all the implemented library kernels in DynaML thereby enabling the use of AFE in potentially any model employing kernels.