# Convex

## Model Solvers¶

Model solvers are implementations which solve for the parameters/coefficients that determine the predictions of a model. Below is a list of all model solvers currently implemented; they are all sub-classes/sub-traits of the top level optimization API. Refer to the wiki page on optimizers for more details on extending the API and writing your own optimizers.

### Gradient Descent¶

The bread and butter of any machine learning framework: the GradientDescent class in the dynaml.optimization package provides gradient-based optimization primitives for solving optimization problems of the form

$$f(w) := \lambda\, R(w) + \frac{1}{n} \sum_{k=1}^{n} L(w;x_k,y_k)$$

#### Gradients¶

| Name | Class | Equation |
|------|-------|----------|
| Logistic Gradient | LogisticGradient | $L = \frac{1}{n} \sum_{k=1}^{n} \log(1+\exp(-y_k w^T x_k)), \ y_k \in \{-1, +1\}$ |
| Least Squares Gradient | LeastSquaresGradient | $L = \frac{1}{n} \sum_{k=1}^{n} \Vert w^{T} x_k - y_k \Vert^2$ |
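As a concrete illustration, the per-example gradient of the logistic loss in the table above, $\nabla_w L = -y\, x / (1 + \exp(y\, w^T x))$, can be computed directly. This is a plain-Scala sketch, independent of the library's Gradient classes:

```scala
// Sketch: per-example gradient of the logistic loss
// L(w; x, y) = log(1 + exp(-y * w.x)), with y in {-1, +1}.
// Plain Scala arrays, independent of DynaML's LogisticGradient class.
def logisticGradient(w: Array[Double], x: Array[Double], y: Double): Array[Double] = {
  // margin = y * (w . x)
  val margin = y * w.zip(x).map { case (wi, xi) => wi * xi }.sum
  // dL/dw = -y * x / (1 + exp(margin))
  val scale = -y / (1.0 + math.exp(margin))
  x.map(_ * scale)
}

val w = Array(0.5, -0.25)
val x = Array(1.0, 2.0)
// Gradient pushes w towards correctly classifying (x, +1)
val g = logisticGradient(w, x, 1.0)
```

Summing this quantity over the training set (and adding the gradient of $\lambda R(w)$) yields the full gradient of $f$ used by the descent step.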

#### Updaters¶

| Name | Class | Equation |
|------|-------|----------|
| $L_1$ Updater | L1Updater | $R = \Vert w \Vert_{1}$ |
| $L_2$ Updater | SquaredL2Updater | $R = \frac{1}{2} \Vert w \Vert^2$ |
| BFGS Updater | SimpleBFGSUpdater | |
```scala
val data: Stream[(DenseVector[Double], Double)] = ...
val num_points = data.length
val initial_params: DenseVector[Double] = ...

val optimizer = new GradientDescent(
  new LogisticGradient,
  new SquaredL2Updater)

val params = optimizer.setRegParam(0.002).optimize(
  num_points, data, initial_params)
```

### Quasi-Newton (BFGS)¶

The Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm is a quasi-Newton, second order optimization method. To calculate an update to the parameters, it requires the gradient at each iteration along with an approximation to the inverse Hessian $\mathit{H}^{-1}$.
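Rather than inverting the Hessian directly, BFGS maintains a running approximation to $\mathit{H}^{-1}$ via a rank-two update. One standard formulation (shown here for reference; the exact update performed by SimpleBFGSUpdater may differ in detail), with $s_k = w_{k+1} - w_{k}$ and $g_k = \nabla f(w_{k+1}) - \nabla f(w_k)$, is

$$H_{k+1}^{-1} = \left(I - \rho_k s_k g_k^T\right) H_k^{-1} \left(I - \rho_k g_k s_k^T\right) + \rho_k s_k s_k^T, \qquad \rho_k = \frac{1}{g_k^T s_k}$$

so only gradient evaluations and matrix-vector products are needed per iteration.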

```scala
val optimizer = QuasiNewtonOptimizer(
  new LeastSquaresGradient,
  new SimpleBFGSUpdater)

val data: Stream[(DenseVector[Double], Double)] = ...
val num_points = data.length
val initial_params: DenseVector[Double] = ...

val params = optimizer.setRegParam(0.002).optimize(
  num_points, data, initial_params)
```

### Regularized Least Squares¶

This subroutine solves the regularized least squares optimization problem as shown below.

$$\min_{w} \ \mathcal{J}_{P}(w) = \frac{1}{2} \gamma \ w^Tw + \frac{1}{2} \sum_{k = 1}^{N} (y_k - w^T \varphi(x_k))^2$$
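Setting $\nabla_w \mathcal{J}_{P}(w) = 0$ yields the normal equations: with $\Phi$ denoting the design matrix whose $k$-th row is $\varphi(x_k)^T$, the minimizer has the closed form

$$\left(\Phi^T \Phi + \gamma I\right) w = \Phi^T y \ \Longrightarrow \ w = \left(\Phi^T \Phi + \gamma I\right)^{-1} \Phi^T y$$

which is the linear system such a solver must solve, up to implementation details.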
```scala
val num_dim = ...
val designMatrix: DenseMatrix[Double] = ...
val response: DenseVector[Double] = ...

val optimizer = new RegularizedLSSolver()

val x = optimizer.setRegParam(0.05).optimize(
  designMatrix.rows,
  (designMatrix, response),
  DenseVector.ones[Double](num_dim))
```

### Back propagation with Momentum¶

This is one of the most common learning methods for supervised training of feed forward neural networks; the edge weights are adjusted using the generalized delta rule.
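The generalized delta rule augments the plain gradient step with a momentum term, so the weight change at iteration $t$ reuses a fraction of the previous change (here $\eta$ is the learning rate and $\mu$ the momentum parameter; this is the textbook form, independent of FFBackProp's internals):

$$\Delta w(t) = -\eta \, \nabla_{w} E\left(w(t)\right) + \mu \, \Delta w(t-1)$$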

```scala
val data: Seq[(DenseVector[Double], DenseVector[Double])] = ...

//Input, Hidden, Output
val num_units_by_layer = Seq(5, 8, 3)
val acts = Seq(VectorSigmoid, VectorTansig)

val breezeStackFactory = NeuralStackFactory(num_units_by_layer)(acts)

//Random variable which samples layer weights
val stackInitializer = GenericFFNeuralNet.getWeightInitializer(num_units_by_layer)

val opt_backprop = new FFBackProp(breezeStackFactory)

val learned_stack = opt_backprop.optimize(
  data.length, data, stackInitializer.draw)
```

Deprecated back propagation API

```scala
val data: Stream[(DenseVector[Double], DenseVector[Double])] = ...

val initParam = FFNeuralGraph(
  num_inputs = data.head._1.length,
  num_outputs = data.head._2.length,
  hidden_layers = 1,
  List("logsig", "linear"),
  List(5))

val optimizer = new BackPropogation()
  .setNumIterations(100)
  .setStepSize(0.01)

val newparams = optimizer.optimize(data.length, data, initParam)
```

### Conjugate Gradient¶

The conjugate gradient method is used to solve linear systems of the form $Ax = b$ where $A$ is a symmetric positive definite matrix.

```scala
val num_dim = ...
val A: DenseMatrix[Double] = ...
val b: DenseVector[Double] = ...

//Solves A.x = b
val x = ConjugateGradient.runCG(
  A, b, DenseVector.ones[Double](num_dim),
  epsilon = 0.005, MAX_ITERATIONS = 50)
```
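To make the method itself concrete, here is a minimal plain-Scala sketch of conjugate gradient over `Array`-based vectors. It only illustrates the algorithm; `ConjugateGradient.runCG` is the library entry point:

```scala
// Sketch: conjugate gradient for A.x = b, A symmetric positive definite.
// Plain Scala, for illustration only.
def cg(A: Array[Array[Double]], b: Array[Double],
       epsilon: Double = 1e-8, maxIter: Int = 100): Array[Double] = {
  val n = b.length
  def matVec(v: Array[Double]) =
    A.map(row => row.zip(v).map { case (a, vi) => a * vi }.sum)
  def dot(u: Array[Double], v: Array[Double]) =
    u.zip(v).map { case (a, c) => a * c }.sum

  val x = Array.fill(n)(0.0)
  val r = b.clone()            // residual b - A.x (x = 0 initially)
  var p = r.clone()            // search direction
  var rsOld = dot(r, r)
  var iter = 0
  while (iter < maxIter && math.sqrt(rsOld) > epsilon) {
    val Ap = matVec(p)
    val alpha = rsOld / dot(p, Ap)
    for (i <- 0 until n) { x(i) += alpha * p(i); r(i) -= alpha * Ap(i) }
    val rsNew = dot(r, r)
    // New direction is conjugate to the previous ones w.r.t. A
    p = r.zip(p).map { case (ri, pi) => ri + (rsNew / rsOld) * pi }
    rsOld = rsNew
    iter += 1
  }
  x
}

// 2x2 SPD system: [[4,1],[1,3]] x = [1,2]
val x = cg(Array(Array(4.0, 1.0), Array(1.0, 3.0)), Array(1.0, 2.0))
```

For an $n \times n$ system, exact arithmetic converges in at most $n$ iterations; in practice the `epsilon` tolerance stops it much earlier for well-conditioned matrices.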

### Dual LSSVM Solver¶

The LSSVM solver solves the linear system that results from applying the Karush-Kuhn-Tucker (KKT) conditions to the LSSVM optimization problem.

$$\left[\begin{array}{c|c} 0 & 1^\intercal_v \\ \hline 1_v & K + \gamma^{-1} \mathit{I} \end{array}\right] \left[\begin{array}{c} b \\ \hline \alpha \end{array}\right] = \left[\begin{array}{c} 0 \\ \hline y \end{array}\right]$$
```scala
val data: Stream[(DenseVector[Double], Double)] = ...
val kernelMatrix: DenseMatrix[Double] = ...
val num_points = data.length

val initParam = DenseVector.ones[Double](num_points + 1)

val optimizer = new LSSVMLinearSolver()

val alpha = optimizer.optimize(
  num_points,
  (kernelMatrix, DenseVector(data.map(_._2).toArray)),
  initParam)
```

### Committee Model Solver¶

The committee model solver finds the optimal weights to apply to the predictions of a set of base models. The weights are calculated as follows.

$$\alpha = \frac{C^{-1} \overrightarrow{1}}{\overrightarrow{1}^{T} C^{-1} \overrightarrow{1}}$$

Where $C$ is the sample correlation matrix of the errors between all pairs of base models, calculated on the training data.

```scala
val optimizer = new CommitteeModelSolver()

//Data structure containing, for each training point, the couple
//(predictions from base models as a vector, actual target)
val predictionsTargets: Stream[(DenseVector[Double], Double)] = ...
val num_points = predictionsTargets.length
val num_of_models = predictionsTargets.head._1.length

val params = optimizer.optimize(
  num_points,
  predictionsTargets,
  DenseVector.ones[Double](num_of_models))
```
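For intuition, the weight formula above can be evaluated by hand for two base models, where the $2 \times 2$ inverse has a closed form. This is a plain-Scala sketch, independent of CommitteeModelSolver, and the matrix below is an assumed error-correlation matrix chosen for illustration:

```scala
// Sketch: committee weights alpha = C^{-1} 1 / (1^T C^{-1} 1)
// for two base models, via the closed-form 2x2 inverse.
def committeeWeights2(c: Array[Array[Double]]): Array[Double] = {
  val det = c(0)(0) * c(1)(1) - c(0)(1) * c(1)(0)
  // inverse of [[a, b], [c, d]] is [[d, -b], [-c, a]] / det
  val inv = Array(
    Array( c(1)(1) / det, -c(0)(1) / det),
    Array(-c(1)(0) / det,  c(0)(0) / det))
  val cInvOne = inv.map(_.sum)   // C^{-1} applied to the all-ones vector
  val norm = cInvOne.sum         // 1^T C^{-1} 1
  cInvOne.map(_ / norm)
}

// Hypothetical error-correlation matrix: model 1 has lower error variance
val alpha = committeeWeights2(Array(Array(1.0, 0.2), Array(0.2, 2.0)))
```

By construction the weights sum to one, and the model with the smaller error variance receives the larger weight.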