Microsoft ML.NET 2.0: Improve machine learning model quality in three easy steps

1. ML.NET 2.0

Sometimes the shortest path is the one you know

As a .NET developer, I have been tempted many times into the world of Python by what it offers in the field of AI. I remember preparing material for an event where I was demoing .NET on a Raspberry Pi. Of course, when you work on a Raspberry Pi you can't resist reading data directly from the sensors, and once you are reading a lot of sensor data, the natural next step is to do something useful with it. It is well known that the most painful part of AI is the lack of data. Or, as I like to say, in AI data is not the fuel but the engine, unlike in classical programming. Well, I had the data, but despite its huge potential I didn't know what to do with it.

So I had the data and didn't know what to do with it. Learning another language is not necessarily a problem, but learning the related libraries costs a lot of time and integration effort. Imagine an environment where all the processes are built around a .NET solution: having to set up a Python environment on top of it can be a serious problem for a .NET developer, because it is a different ecosystem.

The release of ML.NET came at the perfect moment for me; I felt like I had been given wings. ML.NET makes it easy to build an ML model, even for those with no data science background, and, perhaps most importantly, it is a .NET framework where you write only C# code.

ML.NET is a young framework and has a lot of catching up to do before it reaches the level of the AI leaders, but that doesn't mean we should see only the disadvantages. Let's take a look at this ML.NET performance paper (https://arxiv.org/pdf/1905.05715.pdf):

Using a 9 GB Amazon review dataset, ML.NET trained a sentiment-analysis model with 95% accuracy. Other popular machine learning frameworks failed to process the dataset due to memory errors. When trained on 10% of the dataset, so that all the frameworks could complete training, ML.NET demonstrated the highest speed and accuracy.
The performance evaluation found similar results in other machine learning scenarios, including click-through rate prediction and flight delay prediction.

Model Builder (AutoML)

Through Model Builder, ML.NET gives developers a very useful tool for training ML models within a predefined time interval, starting from a dataset and ending with the selection of the best trainer for the chosen scenario. The selection is made according to the quality of the explored models, and the resulting ML model can be consumed immediately. In addition, Model Builder can generate boilerplate code for all the steps taken interactively, providing a good starting point in the world of AI.

Model Builder is an extraordinary tool that lets you preselect the features you want to include in the model, but it does not give you any suggestions about which features are more relevant. A model that contains too many features requires more time for training and for prediction, and quite often some features hurt the quality of the model more than they help. This means that by carefully reducing the dimensionality of the ML model we can increase both its accuracy and its performance. Let's keep this in mind, because we will deal with it a little further down in this article.

I said earlier that Model Builder is capable of generating boilerplate code, but from a software development perspective we want to automate the process of training the ML model, and the generated code alone does not satisfy that need.
The good news is that at its core Model Builder uses AutoML, which provides the entire ML model training experience. It would be great to take control of this code and push it a few steps further by analyzing the quality of the resulting model in order to improve it.

Assuming we have loaded our dataset into the data object (an IDataView) and inferred its columns into columnInference, these few lines trigger the AutoML experiment for multi-class classification (the scenario I chose for this article) within a given time interval.

// Seeded MLContext for reproducible results across runs.
Context = new MLContext(seed: 1);

// Map the label column to a key type, then let AutoML featurize the remaining columns.
SweepablePipeline preprocessingPipeline = Context.Transforms
  .Conversion.MapValueToKey(columnInference.ColumnInformation.LabelColumnName, columnInference.ColumnInformation.LabelColumnName)
  .Append(Context.Auto().Featurizer(data, columnInformation: columnInference.ColumnInformation));

// Append the sweepable set of multi-class classification trainers.
var pipeline = preprocessingPipeline
  .Append(Context.Auto().MultiClassification(labelColumnName: columnInference.ColumnInformation.LabelColumnName));

// Configure the experiment: optimize MicroAccuracy within the given time budget.
AutoMLExperiment experiment = Context.Auto()
  .CreateExperiment()
  .SetPipeline(pipeline)
  .SetMulticlassClassificationMetric(MulticlassClassificationMetric.MicroAccuracy, labelColumn: columnInference.ColumnInformation.LabelColumnName)
  .SetTrainingTimeInSeconds(time)
  .SetDataset(data);

var experimentResult = await experiment.RunAsync();

AutoML Output:

Best trainer: FastTreeOva                    Accuracy: 0.926  Training time: 338
----------------------------------------------------------------------------------
            MicroAccuracy      MacroAccuracy            LogLoss   LogLossReduction
                    0.926              0.929              0.235              0.826

2. Model Analysis

Measure twice and cut once

I would like to go a little further and see how we can improve an ML model. Again, we are doing this without any data science knowledge, but don't get me wrong: a basic understanding of data science would help you better understand what you are doing.

We might think that the most difficult part is training the model, because that is where the magic seems to happen, but Model Builder already handles that part with great success through AutoML. In fact, the most difficult part is building a good enough model, and for this we will rely on correlation matrix analysis and PFI analysis. If we could automate these steps as well, it would be extraordinary, because for each model obtained through AutoML we could then obtain a better model with minimal effort.

Correlation Matrix

In machine learning, a correlation matrix (often visualized as a heatmap) is a table that displays the correlation coefficients between all possible pairs of features (the Cartesian product of the feature set with itself).
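For reference, the coefficient in each cell is the standard Pearson correlation between two feature columns x and y over n samples:

r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}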

Generally speaking, having more features is a good thing, but more features increase the dimensionality of the model and therefore its cost (training time as well as prediction time). Do we really need all of these features? Some of them might be redundant.

A coefficient close to 0 means the features are weakly correlated (or not correlated at all), a value close to 1 means they are highly correlated, and a value close to -1 means they are highly inversely correlated.

var matrix = Correlation.PearsonMatrix(dataArray);
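For context, here is one way the dataArray argument might be assembled; this is a minimal sketch assuming the features are float (Single) columns in the trainingData IDataView and that the MathNet.Numerics package (which provides Correlation.PearsonMatrix) is referenced:

// Requires the MathNet.Numerics package and Microsoft.ML's GetColumn extension.
string[] features = { "Temperature", "Temperature2", "Luminosity", "Infrared", "Distance", "PIR", "Humidity" };

// One double[] per feature column, extracted from the training data.
double[][] dataArray = features
  .Select(f => trainingData.GetColumn<float>(f).Select(v => (double)v).ToArray())
  .ToArray();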

Correlation Matrix output (using a predefined threshold of 0.9):

Correlation Matrix, threshold: 0.9
--------------------------------------------------------------------------------------------------------
               Temperature Temperature2   Luminosity     Infrared     Distance          PIR     Humidity
  Temperature       1.0000       0.9998       0.4831       0.5865      -0.2873      -0.0020       0.0607
 Temperature2       0.9998       1.0000       0.4822       0.5855      -0.2859      -0.0018       0.0621
   Luminosity       0.4831       0.4822       1.0000       0.4388      -0.5428       0.1175       0.0457
     Infrared       0.5865       0.5855       0.4388       1.0000      -0.3765       0.0359       0.0051
     Distance      -0.2873      -0.2859      -0.5428      -0.3765       1.0000      -0.1412      -0.0824
          PIR      -0.0020      -0.0018       0.1175       0.0359      -0.1412       1.0000       0.0809
     Humidity       0.0607       0.0621       0.0457       0.0051      -0.0824       0.0809       1.0000
--------------------------------------------------------------------------------------------------------
No Feature         vs. Feature                    Rate
1. Temperature     vs. Temperature2             0.9998

Since Temperature and Temperature2 appear to be redundant (perhaps the physical distance between the two sensors is small!), we may consider removing one of them.
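The pair listing above can be produced by scanning the upper triangle of the matrix against the threshold; a minimal sketch, reusing the features array and matrix from above:

// List every feature pair whose absolute Pearson coefficient exceeds the threshold.
const double threshold = 0.9;
for (int i = 0; i < features.Length; i++)
{
    for (int j = i + 1; j < features.Length; j++)
    {
        if (Math.Abs(matrix[i, j]) >= threshold)
        {
            Console.WriteLine($"{features[i],-15} vs. {features[j],-15} {matrix[i, j]:F4}");
        }
    }
}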

Permutation Feature Importance

Using Permutation Feature Importance (PFI) we can learn how to interpret the predictions of an ML.NET machine learning model, because PFI shows the relative contribution of each feature to a prediction. PFI works by randomly shuffling the data one feature at a time across the entire dataset and calculating how much the performance metric of interest decreases; the larger the change, the more important that feature is. By highlighting the most relevant features, we can focus on a subset of more meaningful features, which can potentially reduce noise and training time. We must decide carefully which features to drop, however, because removing even a not-so-relevant feature risks introducing bias into our model.

You may occasionally see negative values in the PFI results. In those cases, the predictions on the shuffled (noisy) data happened to be more accurate than on the real data. This happens when a feature does not actually matter (its importance should be close to 0), but random chance makes the predictions on the shuffled data look more accurate. It is more common with small datasets, like the one in this example, because there is more room for chance.

The code looks a little more complicated here because we need to re-apply the pipeline produced by AutoML before running PFI against its final trainer.

// The trained model is a chain of transformers; PFI needs the final trainer
// separately from the transforms that precede it.
var model = experimentResult.Model as Microsoft.ML.Data.TransformerChain<Microsoft.ML.ITransformer>;
var transformedTrainingData = experimentResult.Model.Transform(trainingData);

// Shuffle each feature column permutationCount times and measure the drop in the metric.
var pfi = Context.MulticlassClassification
  .PermutationFeatureImportance(model.LastTransformer, transformedTrainingData, permutationCount: 3);

// Order the features by mean change in MicroAccuracy (most impactful first).
var metrics = pfi.Select(p => (p.Key, p.Value.MicroAccuracy)).OrderBy(m => m.MicroAccuracy.Mean);

PFI metrics output (using a predefined threshold of 0.02):

PFI (by MicroAccuracy), threshold: 0.02
----------------------------------------------------------------------------------
    No Feature           MicroAccuracy (mean)     95% CI
    1. Infrared                -0.2467          0.0274
    2. Luminosity              -0.2181          0.0121
    3. Temperature             -0.1224          0.0019
    4. Distance                -0.0795          0.0025
    5. Temperature2            -0.0257          0.0043
    6. CreatedAt               -0.0186          0.0074 (candidate for deletion!)
    7. PIR                     -0.0076          0.0033 (candidate for deletion!)
    8. Humidity                 0.0000          0.0000 (candidate for deletion!)
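The "candidate for deletion" marks in the table above follow from comparing each feature's mean metric change against the threshold; a minimal sketch, reusing the metrics sequence computed above:

// Features whose mean |MicroAccuracy change| falls below the threshold (0.02 here)
// contribute little to the model and become candidates for deletion.
const double pfiThreshold = 0.02;
var candidates = metrics
  .Where(m => Math.Abs(m.MicroAccuracy.Mean) < pfiThreshold)
  .Select(m => m.Key)
  .ToList();

Console.WriteLine($"Candidates for deletion: {string.Join(", ", candidates)}");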

We take the best model from the experimentResult object returned by AutoML and proceed to its evaluation on the test data.

// Score the held-out test data with the best model found by AutoML.
var predictions = experimentResult.Model.Transform(testingData);
var metrics = Context.MulticlassClassification.Evaluate(predictions);

Evaluation results with the original dataset:

Best trainer: FastTreeOva                    Accuracy: 0.926  Training time: 338
----------------------------------------------------------------------------------
            MicroAccuracy      MacroAccuracy            LogLoss   LogLossReduction
                    0.926              0.929              0.235              0.826

Let's remove the redundant features (flagged by the correlation matrix) and the irrelevant features (flagged by PFI) from the dataset, run another experiment on the diminished dataset, and check the evaluation metrics again.

Now the diminished feature list looks like this:

var features = new[] { "Temperature", "Luminosity", "Infrared", "Distance" };
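One way to apply this reduction (a sketch, assuming the columnInference object from the first experiment is still at hand) is to move the dropped columns into the ignored set, then rebuild the pipeline and re-run the experiment exactly as before:

// Move each dropped column into the ignored set so the AutoML featurizer
// only uses the reduced feature set on the next run. (CreatedAt may have been
// inferred into a different set, e.g. text; remove it from wherever it landed.)
string[] dropped = { "Temperature2", "CreatedAt", "PIR", "Humidity" };
foreach (var column in dropped)
{
    columnInference.ColumnInformation.NumericColumnNames.Remove(column);
    columnInference.ColumnInformation.IgnoredColumnNames.Add(column);
}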

Evaluation results with the diminished feature set (redundant and irrelevant features removed):

Best trainer: FastTreeOva                    Accuracy: 0.943  Training time: 342
----------------------------------------------------------------------------------
            MicroAccuracy      MacroAccuracy            LogLoss   LogLossReduction
                    0.943              0.942              0.207              0.847

3. Conclusions

Less is more

If we have a dataset with many unprocessed features, PFI will mark candidate features for deletion. By removing one or more of these features and retraining the model, we may get a better model (AutoML takes care of finding the best trainer for a given set of features).

If the correlation matrix identifies highly correlated features, we may likewise delete such features one by one, retrain, and check whether the model improves with the new feature set.

A nice improvement to the source code associated with this article would be to automate the deletion of the most irrelevant features (using PFI) or of the most correlated, redundant features (using the correlation matrix).
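Sketched below is one possible shape for that automation; TrainAndEvaluateAsync and GetPfiCandidates are hypothetical helpers that would wrap the AutoML experiment and the PFI code shown earlier in this article:

// Hypothetical greedy loop: drop PFI candidates one at a time, retrain with
// AutoML, and keep each deletion only if MicroAccuracy does not degrade.
var features = allFeatures.ToList();
var (bestModel, bestAccuracy) = await TrainAndEvaluateAsync(features);

foreach (var candidate in GetPfiCandidates(bestModel, threshold: 0.02))
{
    var reduced = features.Where(f => f != candidate).ToList();
    var (model, accuracy) = await TrainAndEvaluateAsync(reduced);

    if (accuracy >= bestAccuracy)
    {
        (features, bestModel, bestAccuracy) = (reduced, model, accuracy);
    }
}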

You can find the code associated with this article here:

https://github.com/dcostea/AutoMLSample and more articles about ML.NET on my blog: http://apexcode.ro/

Author: Daniel Costea
Trainer. Developer. Speaker. Microsoft MVP.