# Performance Evaluation

Model evaluation is the litmus test for whether a modeling effort is headed in the right direction, and it is the basis for comparing alternative models (or hypotheses) that attempt to explain a phenomenon. The evaluation package contains classes and traits for calculating performance metrics for DynaML models.

Classes that calculate model performance can extend the Metrics[P] trait. The Metrics trait requires that its sub-classes implement a few methods or behaviors, including:

• Print the performance metrics (whatever they may be) to the screen, i.e. the print method.
• Return the key performance indicators as a breeze DenseVector[Double], i.e. the kpi() method.
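
The two behaviors listed above can be sketched without any dependencies. The trait below, SimpleMetrics, is a hypothetical stand-in and not DynaML's actual Metrics[P] trait: it returns a plain Scala Vector from kpi() where the real trait returns a breeze DenseVector, and the MaeMetrics class is likewise an illustrative name.

```scala
// Hypothetical stand-in for DynaML's Metrics[P] trait. The real trait
// returns a breeze DenseVector from kpi(); a plain Vector is used here
// so the sketch runs without extra dependencies.
trait SimpleMetrics[P] {
  def print(): Unit
  def kpi(): Vector[P]
}

// Illustrative implementation: mean absolute error as the single KPI.
// Each pair is (prediction, actual output).
class MaeMetrics(scoresAndLabels: List[(Double, Double)]) extends SimpleMetrics[Double] {
  require(scoresAndLabels.nonEmpty, "need at least one (prediction, actual) pair")

  val mae: Double =
    scoresAndLabels.map { case (p, y) => math.abs(p - y) }.sum / scoresAndLabels.length

  // Print the metric to the screen.
  override def print(): Unit = println(f"MAE: $mae%.4f")

  // Return the key performance indicators as a vector.
  override def kpi(): Vector[Double] = Vector(mae)
}
```

A caller would construct the class from a list of pairs, for example `new MaeMetrics(List((1.0, 2.0), (3.0, 3.0)))`, and then call print() or kpi() on it.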

## Regression Models

Regression models are generally evaluated on a few standard metrics such as mean square error, mean absolute error, coefficient of determination ($R^2$), etc. DynaML has implementations for single output and multi-output regression models.

### Single Output

Small Test Set

The RegressionMetrics class takes as input a Scala List containing the predictions and actual outputs and calculates the following metrics.

• Mean Absolute Error (mae)
• Root Mean Square Error (rmse)
• Correlation Coefficient ($\rho_{y \hat{y}}$)
• Coefficient of Determination ($R^2$)

```scala
//Predictions computed by any model.
val predictionAndOutputs: List[(Double, Double)] = ...

val metrics = new RegressionMetrics(predictionAndOutputs, predictionAndOutputs.length)

//Print results on screen
metrics.print
```
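
To make the four listed metrics concrete, here is a minimal dependency-free sketch that computes them from the same kind of list of (prediction, actual) pairs. The names RegressionSketch, Summary, and summarize are hypothetical, and the formulas are the standard textbook definitions; whether DynaML computes each quantity in exactly this way is an assumption.

```scala
object RegressionSketch {
  // Hypothetical container for the four metrics listed above.
  final case class Summary(mae: Double, rmse: Double, corr: Double, rSquared: Double)

  // Each pair is (prediction, actual output).
  def summarize(pairs: List[(Double, Double)]): Summary = {
    require(pairs.nonEmpty, "need at least one (prediction, actual) pair")
    val n = pairs.length.toDouble
    val (preds, actuals) = pairs.unzip
    val residuals = pairs.map { case (p, y) => y - p }

    // Mean absolute error and root mean square error.
    val mae = residuals.map(r => math.abs(r)).sum / n
    val sse = residuals.map(r => r * r).sum
    val rmse = math.sqrt(sse / n)

    val meanP = preds.sum / n
    val meanY = actuals.sum / n
    val cov = pairs.map { case (p, y) => (p - meanP) * (y - meanY) }.sum
    val varP = preds.map(p => (p - meanP) * (p - meanP)).sum
    val sst = actuals.map(y => (y - meanY) * (y - meanY)).sum

    // Pearson correlation between predictions and actual outputs.
    // Evaluates to NaN when either series is constant.
    val corr = cov / math.sqrt(varP * sst)
    // Coefficient of determination: R^2 = 1 - SS_res / SS_tot.
    val rSquared = 1.0 - sse / sst

    Summary(mae, rmse, corr, rSquared)
  }
}
```

Calling `RegressionSketch.summarize(List((1.0, 2.0), (2.0, 2.0), (3.0, 5.0)))` returns a Summary whose fields can be compared against the values printed by a metrics object on the same data.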

Large Test Set

The RegressionMetricsSpark class takes as input an Apache Spark RDD containing the predictions and actual outputs and calculates the same metrics as above.

```scala
import org.apache.spark.rdd.RDD

//Predictions computed by any model.
val predictionAndOutputs: RDD[(Double, Double)] = ...

//An RDD has no length member; count() returns the number of elements as a Long
val metrics = new RegressionMetricsSpark(predictionAndOutputs, predictionAndOutputs.count())

//Print results on screen
metrics.print
```

### Multiple Outputs

The MultiRegressionMetrics class calculates regression performance for multi-output models.

```scala
import breeze.linalg.DenseVector

//Predictions computed by any model.
val predictionAndOutputs: List[(DenseVector[Double], DenseVector[Double])] = ...

val metrics = new MultiRegressionMetrics(predictionAndOutputs, predictionAndOutputs.length)

//Print results on screen
metrics.print
```
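
One natural way to score a multi-output model is to compute each metric separately for every output dimension. The sketch below does this for root mean square error only. It is an illustration under stated assumptions: MultiOutputSketch and perOutputRmse are hypothetical names, plain Scala Vectors stand in for breeze DenseVectors, and whether MultiRegressionMetrics aggregates in this way is not confirmed here.

```scala
object MultiOutputSketch {
  // Each pair is (prediction vector, actual vector); all vectors share one length.
  // Returns one RMSE value per output dimension.
  def perOutputRmse(pairs: List[(Vector[Double], Vector[Double])]): Vector[Double] = {
    require(pairs.nonEmpty, "need at least one (prediction, actual) pair")
    val dims = pairs.head._1.length
    require(
      pairs.forall { case (p, y) => p.length == dims && y.length == dims },
      "all vectors must have the same length")
    val n = pairs.length.toDouble
    Vector.tabulate(dims) { d =>
      // Sum of squared errors for output dimension d.
      val sse = pairs.map { case (p, y) => (y(d) - p(d)) * (y(d) - p(d)) }.sum
      math.sqrt(sse / n)
    }
  }
}
```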

## Classification Models

Currently (as of v1.4), model performance calculation is implemented only for binary classification.

### Binary Classification

Small Test Sets

The BinaryClassificationMetrics class calculates the following performance indicators.

• Classification accuracy
• F-measure
• Precision-Recall Curve (and area under it)
• Receiver Operating Characteristic (and area under it)
• Matthews Correlation Coefficient

```scala
val scoresAndLabels: List[(Double, Double)] = ...

//Set logisticFlag = true in case outputs are produced via logistic regression
val metrics = new BinaryClassificationMetrics(
  scoresAndLabels,
  scoresAndLabels.length,
  logisticFlag = true)

metrics.print
```
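
The threshold-based indicators in the list above all derive from a confusion matrix, which the following dependency-free sketch makes explicit. BinarySketch, Confusion, and confusion are hypothetical names. The conventions used (a label of 1.0 marks the positive class, and a score at or above the threshold counts as a positive prediction) are assumptions for illustration and may differ from the conventions DynaML uses.

```scala
object BinarySketch {
  // Counts of true/false positives and negatives at one threshold.
  final case class Confusion(tp: Int, fp: Int, tn: Int, fn: Int) {
    private val total = (tp + fp + tn + fn).toDouble
    def accuracy: Double = (tp + tn) / total
    // Precision and recall evaluate to NaN when their denominator is zero.
    def precision: Double = tp.toDouble / (tp + fp)
    def recall: Double = tp.toDouble / (tp + fn)
    // F-measure: harmonic mean of precision and recall.
    def fMeasure: Double = 2 * precision * recall / (precision + recall)
    // Matthews correlation coefficient.
    def mcc: Double = {
      val denom = math.sqrt((tp + fp).toDouble * (tp + fn) * (tn + fp) * (tn + fn))
      (tp.toDouble * tn - fp.toDouble * fn) / denom
    }
  }

  // Assumption: label 1.0 is the positive class, and a score >= threshold
  // is treated as a positive prediction.
  def confusion(scoresAndLabels: List[(Double, Double)], threshold: Double): Confusion = {
    val outcomes = scoresAndLabels.map { case (score, label) =>
      (score >= threshold, label == 1.0)
    }
    Confusion(
      tp = outcomes.count { case (pred, actual) => pred && actual },
      fp = outcomes.count { case (pred, actual) => pred && !actual },
      tn = outcomes.count { case (pred, actual) => !pred && !actual },
      fn = outcomes.count { case (pred, actual) => !pred && actual })
  }
}
```

Evaluating `BinarySketch.confusion` over a grid of thresholds yields the (recall, precision) and (false positive rate, recall) points from which Precision-Recall and ROC curves are drawn.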

Large Test Sets

The BinaryClassificationMetricsSpark class takes as input an Apache Spark RDD containing the predictions and actual labels and calculates the same metrics as above.