# Pont-sur-Sambre Power Plant

System identification refers to the process of learning a predictive model for a given dynamic system, i.e. a system whose dynamics evolve with time. The most important aspect of these models is their structure; the models described below are common choices for discretely sampled, time-dependent systems.

## DaISy: System Identification Database

DaISy is a database of (artificial and real-world) dynamic systems maintained by the STADIUS research group at KU Leuven. In this post we will work with the power plant data set listed on the DaISy home page. Using DynaML, which comes preloaded with the power plant data, we will train LS-SVM models to predict the various output indicators of the plant.

## System Identification Models

Below is a quick description of nonlinear autoregressive models (NAR and NARX), which are popular in the system identification research community and among practitioners.

### Nonlinear AutoRegressive (NAR)

The signal $y(t)$ is modeled as a function of its previous $p$ values plus a noise term $\epsilon(t)$.

\begin{align} y(t) = f(y(t-1), y(t-2), \cdots, y(t-p)) + \epsilon(t) \end{align}
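In practice a NAR model is trained as an ordinary regression problem: every time step $t \geq p$ yields one input vector of lagged values and one target $y(t)$. Below is a minimal sketch in Scala (the language DynaML is written in); `NarFeatures` and `narFeatures` are hypothetical names for illustration, not part of DynaML.

```scala
// Hypothetical helper (not part of DynaML): turn a univariate series into
// NAR training pairs (regressor vector, target) for autoregressive order p.
object NarFeatures {
  /** For each t >= p the regressor is (y(t-1), ..., y(t-p)) and the target is y(t). */
  def narFeatures(y: Vector[Double], p: Int): Vector[(Vector[Double], Double)] = {
    require(p >= 1, "order p must be positive")
    (p until y.length).toVector.map { t =>
      val regressor = (1 to p).toVector.map(lag => y(t - lag))
      (regressor, y(t))
    }
  }
}
```

For the series 1, 2, 3, 4, 5 and $p = 2$ this yields the three pairs (2, 1) → 3, (3, 2) → 4 and (4, 3) → 5; the first $p$ samples have no complete history and are dropped.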

### Nonlinear AutoRegressive with eXogenous inputs (NARX)

The signal $y(t)$ is modeled as a function of the previous $p$ values of itself and of the $m$ exogenous inputs $u_{1}, \cdots, u_{m}$, plus a noise term $\epsilon(t)$.

\begin{align} \begin{split} y(t) = & f(y(t-1), y(t-2), \cdots, y(t-p), \\ & u_{1}(t-1), u_{1}(t-2), \cdots, u_{1}(t-p),\\ & \cdots, \\ & u_{m}(t-1), u_{m}(t-2), \cdots, u_{m}(t-p)) \\ & + \epsilon(t) \end{split} \end{align}
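The NARX equation translates into a regressor that concatenates $p$ lags of the output with $p$ lags of every exogenous input. A minimal sketch, again with hypothetical names (`NarxFeatures`, `narxFeatures`) that are not DynaML's API:

```scala
// Hypothetical helper (not part of DynaML): NARX regressors of order p for one
// output series y and m exogenous input series u(0), ..., u(m-1).
object NarxFeatures {
  /** Regressor at time t: y(t-1..t-p) followed by u_j(t-1..t-p) for each input j. */
  def narxFeatures(
      y: Vector[Double],
      u: Vector[Vector[Double]],
      p: Int
  ): Vector[(Vector[Double], Double)] = {
    require(p >= 1, "order p must be positive")
    require(u.forall(_.length == y.length), "all series must have the same length")
    val lags = (1 to p).toVector
    (p until y.length).toVector.map { t =>
      val outputLags = lags.map(l => y(t - l))
      val inputLags = u.flatMap(series => lags.map(l => series(t - l)))
      (outputLags ++ inputLags, y(t))
    }
  }
}
```

Each regressor has $p(m + 1)$ entries, so with the five power plant inputs and $p = 2$ this construction gives 12 features per time step. DynaML's own feature layout (including its `timelag` option) may differ from this sketch.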

## Pont-sur-Sambre Power Plant Data

You can obtain the metadata from this link; it is also summarized below.

### Data Attributes

Instances: 200

Inputs:

1. Gas flow
2. Turbine valves opening
3. Super heater spray flow
4. Gas dampers
5. Air flow

Outputs:

6. Steam pressure
7. Main steam temperature
8. Reheat steam temperature

## System Model

An LS-SVM NARX of autoregressive order $p = 2$ is chosen to model the plant output data. An LS-SVM model builds a predictor of the following form.

\begin{align*} y(\mathbf{x}) = \sum_{k = 1}^{N}\alpha_k K(\mathbf{x}, \mathbf{x}_k) + b \end{align*}

The coefficients $\alpha_k$ and the bias $b$ are obtained by solving the following linear system.

\begin{align*} \left[\begin{array}{c|c} 0 & 1_v^\intercal \\ \hline 1_v & K + \gamma^{-1} \mathit{I} \end{array}\right] \left[\begin{array}{c} b \\ \hline \alpha \end{array}\right] = \left[\begin{array}{c} 0 \\ \hline y \end{array}\right] \end{align*}

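Here $K$ is the $N \times N$ kernel matrix built from the training inputs with a kernel function $K(\mathbf{x}, \mathbf{y})$, i.e. $K_{ij} = K(\mathbf{x}_i, \mathbf{x}_j)$; $1_v$ is the $N$-dimensional vector of ones, $\mathit{I}$ is the identity matrix and $\gamma$ is the regularization parameter.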
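To make the linear system concrete, here is a minimal sketch that assembles the bordered matrix and solves it with Gaussian elimination in plain Scala. It is not DynaML's implementation; the object `LsSvmSketch` and its methods `solve`, `train` and `predict` are hypothetical. Dense elimination costs $O(N^3)$, which is fine for a data set of 200 instances but not for large ones.

```scala
// Sketch (not DynaML's implementation): train an LS-SVM regressor by solving
//   [ 0    1^T          ] [ b     ]   [ 0 ]
//   [ 1    K + I/gamma  ] [ alpha ] = [ y ]
// then predict y(x) = sum_k alpha_k K(x, x_k) + b.
object LsSvmSketch {
  type Kernel = (Vector[Double], Vector[Double]) => Double

  /** Solves A z = rhs (modifying both arrays) by Gaussian elimination with partial pivoting. */
  def solve(a: Array[Array[Double]], rhs: Array[Double]): Array[Double] = {
    val n = rhs.length
    for (col <- 0 until n) {
      // Swap in the row with the largest pivot for numerical stability.
      val pivot = (col until n).maxBy(r => math.abs(a(r)(col)))
      val tmpRow = a(col); a(col) = a(pivot); a(pivot) = tmpRow
      val tmpRhs = rhs(col); rhs(col) = rhs(pivot); rhs(pivot) = tmpRhs
      for (r <- col + 1 until n) {
        val factor = a(r)(col) / a(col)(col)
        for (c <- col until n) a(r)(c) -= factor * a(col)(c)
        rhs(r) -= factor * rhs(col)
      }
    }
    // Back substitution on the upper-triangular system.
    val z = new Array[Double](n)
    for (r <- n - 1 to 0 by -1) {
      val s = (r + 1 until n).map(c => a(r)(c) * z(c)).sum
      z(r) = (rhs(r) - s) / a(r)(r)
    }
    z
  }

  /** Returns (b, alpha) for training inputs xs, targets ys and regularization gamma. */
  def train(xs: Vector[Vector[Double]], ys: Vector[Double], kernel: Kernel, gamma: Double): (Double, Vector[Double]) = {
    require(xs.length == ys.length && gamma > 0.0)
    val n = xs.length
    val a = Array.ofDim[Double](n + 1, n + 1) // a(0)(0) stays 0
    for (i <- 1 to n) {
      a(0)(i) = 1.0 // first row: 1_v^T
      a(i)(0) = 1.0 // first column: 1_v
      for (j <- 1 to n) a(i)(j) = kernel(xs(i - 1), xs(j - 1))
      a(i)(i) += 1.0 / gamma // K + gamma^{-1} I
    }
    val z = solve(a, (0.0 +: ys).toArray)
    (z(0), z.toVector.tail)
  }

  /** Evaluates y(x) = sum_k alpha_k K(x, x_k) + b. */
  def predict(x: Vector[Double], xs: Vector[Vector[Double]], b: Double, alpha: Vector[Double], kernel: Kernel): Double =
    xs.indices.map(k => alpha(k) * kernel(x, xs(k))).sum + b
}
```

Two properties follow directly from the system and make handy sanity checks: the first row forces $\sum_k \alpha_k = 0$, and the $i$-th row says the training residual $y_i - y(\mathbf{x}_i)$ equals $\alpha_i / \gamma$.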

### Choice of Kernel Function

For this problem we choose a polynomial kernel of degree $d$ with offset $c$.

\begin{align*} K(\mathbf{x},\mathbf{y}) = K_{poly}(\mathbf{x},\mathbf{y}) = (\mathbf{x}^{T}\mathbf{y} + c)^{d} \end{align*}
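The polynomial kernel is a one-liner. The sketch below writes it as a plain Scala function, with the additive constant of the formula named `c`, together with the Gram matrix $K_{ij} = K(\mathbf{x}_i, \mathbf{x}_j)$ used in the LS-SVM linear system. `PolyKernelSketch`, `polynomialKernel` and `gram` are hypothetical names; this is not DynaML's `PolynomialKernel` class.

```scala
// Sketch (not DynaML's PolynomialKernel): K(x, y) = (x^T y + c)^d.
object PolyKernelSketch {
  def polynomialKernel(d: Int, c: Double)(x: Vector[Double], y: Vector[Double]): Double = {
    require(x.length == y.length, "vectors must have the same dimension")
    val dot = x.zip(y).map { case (xi, yi) => xi * yi }.sum
    math.pow(dot + c, d)
  }

  /** Gram matrix K_ij = k(x_i, x_j) over the training inputs. */
  def gram(xs: Vector[Vector[Double]], k: (Vector[Double], Vector[Double]) => Double): Vector[Vector[Double]] =
    xs.map(xi => xs.map(xj => k(xi, xj)))
}
```

For example, with $d = 2$, $c = 0.5$, $\mathbf{x} = (1, 2)$ and $\mathbf{y} = (3, 4)$ we get $(11 + 0.5)^2 = 132.25$.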

## Syntax

The DaisyPowerPlant program can be used to train and test LS-SVM models on the Pont-sur-Sambre power plant data.

| Parameter | Type | Default value | Notes |
|-----------|------|---------------|-------|
| kernel | CovarianceFunction | - | The kernel function driving the LS-SVM model. |
| deltaT | Int | 2 | Order of the autoregressive model, i.e. the number of past steps used as input features. |
| timelag | Int | 0 | The number of steps in the past at which to start using inputs. |
| num_training | Int | 150 | Number of training data instances. |
| column | Int | 7 | The column number of the output variable (indexed from 0). |
| opt | Map[String, String] | - | Extra options for the model selection routine. |

## Steam Pressure

```scala
DaisyPowerPlant(new PolynomialKernel(2, 0.5),
  opt = Map("regularization" -> "2.5", "globalOpt" -> "GS",
    "grid" -> "4", "step" -> "0.1"),
  num_training = 100, deltaT = 2,
  column = 6) // column 6 = steam pressure (indexed from 0)
```
```
16/03/04 17:13:43 INFO RegressionMetrics: Regression Model Performance: steam pressure
16/03/04 17:13:43 INFO RegressionMetrics: ============================
16/03/04 17:13:43 INFO RegressionMetrics: MAE: 82.12740530161123
16/03/04 17:13:43 INFO RegressionMetrics: RMSE: 104.39251587470388
16/03/04 17:13:43 INFO RegressionMetrics: RMSLE: 0.9660077848586197
16/03/04 17:13:43 INFO RegressionMetrics: R^2: 0.8395534877128238
16/03/04 17:13:43 INFO RegressionMetrics: Corr. Coefficient: 0.9311734118932473
16/03/04 17:13:43 INFO RegressionMetrics: Model Yield: 0.6288000962818303
16/03/04 17:13:43 INFO RegressionMetrics: Std Dev of Residuals: 87.82754320038951
```
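The log reports several error metrics. For reference, here is a minimal sketch of the textbook definitions of four of them: MAE, RMSE, $R^2 = 1 - SS_{res}/SS_{tot}$ and the Pearson correlation coefficient. We have not verified that DynaML's `RegressionMetrics` computes them identically, and RMSLE, model yield and the residual standard deviation are left out; `MetricsSketch` is a hypothetical name.

```scala
// Sketch of standard regression metrics over actual values y and predictions yHat.
// DynaML's RegressionMetrics may differ in details (not verified here).
object MetricsSketch {
  def mae(y: Vector[Double], yHat: Vector[Double]): Double =
    y.zip(yHat).map { case (a, p) => math.abs(a - p) }.sum / y.length

  def rmse(y: Vector[Double], yHat: Vector[Double]): Double =
    math.sqrt(y.zip(yHat).map { case (a, p) => (a - p) * (a - p) }.sum / y.length)

  /** Coefficient of determination: 1 - SS_res / SS_tot. */
  def rSquared(y: Vector[Double], yHat: Vector[Double]): Double = {
    val mean = y.sum / y.length
    val ssRes = y.zip(yHat).map { case (a, p) => (a - p) * (a - p) }.sum
    val ssTot = y.map(a => (a - mean) * (a - mean)).sum
    1.0 - ssRes / ssTot
  }

  /** Pearson correlation coefficient between actual and predicted values. */
  def correlation(y: Vector[Double], yHat: Vector[Double]): Double = {
    val my = y.sum / y.length
    val mp = yHat.sum / yHat.length
    val cov = y.zip(yHat).map { case (a, p) => (a - my) * (p - mp) }.sum
    val sy = math.sqrt(y.map(a => (a - my) * (a - my)).sum)
    val sp = math.sqrt(yHat.map(p => (p - mp) * (p - mp)).sum)
    cov / (sy * sp)
  }
}
```

Note that under these definitions $R^2$ need not equal the squared correlation coefficient, since $R^2$ also penalizes bias and scale errors in the predictions.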

## Reheat Steam Temperature

```scala
DaisyPowerPlant(new PolynomialKernel(2, 1.5),
  opt = Map("regularization" -> "2.5", "globalOpt" -> "GS", "grid" -> "4", "step" -> "0.1"),
  num_training = 150, deltaT = 1,
  column = 8) // column 8 = reheat steam temperature (indexed from 0)
```
```
16/03/04 16:50:42 INFO RegressionMetrics: Regression Model Performance: reheat steam temperature
16/03/04 16:50:42 INFO RegressionMetrics: ============================
16/03/04 16:50:42 INFO RegressionMetrics: MAE: 124.60921194767073
16/03/04 16:50:42 INFO RegressionMetrics: RMSE: 137.33314302068544
16/03/04 16:50:42 INFO RegressionMetrics: RMSLE: 0.5275727128626408
16/03/04 16:50:42 INFO RegressionMetrics: R^2: 0.8247581957573777
16/03/04 16:50:42 INFO RegressionMetrics: Corr. Coefficient: 0.9744133881055823
16/03/04 16:50:42 INFO RegressionMetrics: Model Yield: 0.7871288689840381
16/03/04 16:50:42 INFO RegressionMetrics: Std Dev of Residuals: 111.86852905896446
```

## Source Code

Below is the example program as a GitHub gist; to view the original program in DynaML, click here.