Technical Debt in ML models

Technical debt shows up in ML systems too:

Complex models erode boundaries

* -- Entanglement of features and feature distributions
* -- Correction cascades creating cascade chains of models and dependency hell
* -- Undeclared consumers for the model predictions

Data dependencies are costlier than code dependencies

* -- Unstable Data Dependencies: unstable input data, signals, or predictions from a
  previous model. (For ex: in speech to text, the syllables are a prediction and also the
  signal/input to the word-language model.)
* -- Underutilized Data Dependencies (these creep in via Legacy Features, Bundled Features,
  Correlated Features etc.)
* -- Static analysis of data dependencies can help mitigate these issues to some extent
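
A minimal sketch of what static analysis of data dependencies could look like, assuming every signal declares the signals it is derived from. The `DEPENDS_ON` table and the signal names (`audio_frames`, `legacy_pitch_feature`, etc.) are hypothetical, loosely following the speech-to-text example, and are not from any real system.

```python
from typing import Dict, Iterable, Set

# Hypothetical declarations: each signal lists the signals it is derived from.
DEPENDS_ON: Dict[str, Set[str]] = {
    "syllables": {"audio_frames"},
    "language_id": {"audio_frames"},
    "words": {"syllables", "language_id"},
    "legacy_pitch_feature": {"audio_frames"},
}


def upstream(signal: str, graph: Dict[str, Set[str]]) -> Set[str]:
    """Return every signal that `signal` transitively depends on."""
    seen: Set[str] = set()
    stack = [signal]
    while stack:
        current = stack.pop()
        for parent in graph.get(current, set()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def unused(graph: Dict[str, Set[str]], outputs: Iterable[str]) -> Set[str]:
    """Signals that no declared output depends on (candidates for removal)."""
    outputs = list(outputs)
    needed = set(outputs)
    for out in outputs:
        needed |= upstream(out, graph)
    all_signals = set(graph) | {p for parents in graph.values() for p in parents}
    return all_signals - needed
```

With these declarations, `upstream("words", DEPENDS_ON)` lists everything the word model is exposed to, and `unused(DEPENDS_ON, ["words"])` flags `legacy_pitch_feature` as an underutilized dependency. This only works to the extent that the declarations are kept in sync with the actual pipeline.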

Feedback loops

* -- Direct Feedback loops (In speech to text, it can come from changes in languages and
* -- Hidden Feedback loops (These can come from not understanding the business use-case as
  explicitly as possible or other things like change in the nature of the use-case and
  demand itself. For ex: user expectations changing after getting used to the tool)

ML-system anti-patterns:

* -- Glue code: in general, things like data-cleaning code and the code connecting model
  predictions to the business use-case.
* -- Pipeline Jungles: a huge mess of pre-processing for audio files (different formats,
  different language and accent detection, which can itself be cascaded models, etc.)

* -- Dead code on Experimental codepaths: probably left over from a bunch of experimental
  models, different NN architectures, different custom models etc.

* -- Abstraction Debt: No clear standard abstraction for ML models. (like RDBMS for

Common Smells:

* --  Plain-old-data type smells.. assume some data types but the input stream is
* -- Multiple Language smell: this is about programming languages, and how using multiple
  languages in a project causes problems at the interfaces.
* -- Prototype smell: the prototype is written and makes invalid assumptions. Whatever
  validation was done on the prototype does not hold outside the small audience it was
  tested on.

Configuration Debt:

* -- Wide range of configurable options: input data stream segregation/categorization,
  model size and dependencies tuned to the latency/throughput of the predictions, model
  choice, input features, data summarization methods, verification methods etc.
* -- If there's a lack of configuration management the system can become a black box
  impossible to debug and therefore improve. While these are similar to common software
  applications, these are doubly problematic in ML models as a lot of models are considered
  black-box by default and are already hard to reason about without these configuration

Dealing with changes in the external world:

* -- Fixed thresholds in Dynamic Systems:
* -- Monitoring and Testing for the model's failure limits (for ex: in case of a data
  Things to monitor:
    * -- Prediction Bias
    * -- Action Limits (say, a trading algo relying on a model should have limits)
    * -- Up-stream Producers (aka data pre-processing pipelines. For ex: a moving window
      of 100 ticks/events may not be right for a different (higher) velocity of input
      data.)
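
As a rough illustration of the first two monitoring items, here is a small sketch. It assumes prediction bias is measured as the gap between the average prediction and the average observed label, and that an action limit is a simple clamp on whatever the model asks for. The function names and the `tolerance` default are made up for the example.

```python
from statistics import mean


def prediction_bias(predictions, observed):
    """Average prediction minus average observed label over the same window."""
    return mean(predictions) - mean(observed)


def bias_alert(predictions, observed, tolerance=0.05):
    """True when the absolute bias exceeds the tolerance (an assumed value)."""
    return abs(prediction_bias(predictions, observed)) > tolerance


def limit_action(requested, max_abs):
    """Clamp a model-driven action (e.g. an order size) to [-max_abs, max_abs]."""
    return max(-max_abs, min(max_abs, requested))
```

For example, `bias_alert([0.2, 0.4], [0, 1])` fires because the model predicts about 0.3 on average while the observed rate is 0.5, and `limit_action(150, 100)` caps the action at 100 no matter what the model produced. A fixed `tolerance` is itself one of the fixed thresholds mentioned above, so it would need revisiting as the data changes.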


* -- Data testing Debt
* -- Reproducibility Debt
* -- Process Management Debt
* -- Cultural Debt


Based on a small post found here.

One of the standard problems in ML with meta-modelling algorithms (algorithms that run multiple statistical models over the given data and identify the best-fitting one, for ex: random forest or the rarely practical genetic algorithm) is that they might favour overly complex models that overfit the training data but perform poorly on live/test data.

The way these meta-modelling algorithms work is that they have an objective function (usually the RMS error of the stats/sub-model on the data) and pick whichever model yields the lowest value of it. So we can just add a complexity penalty (one obvious idea is the degree of the polynomial the model uses to fit, but how does that work when comparing with exponential functions?) and the objective function becomes RMS(Error) + Complexity_penalty(model).
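
A toy sketch of that penalized objective is below. It assumes complexity is measured as the number of free parameters, which is only one possible choice, though it does apply equally to polynomial and non-polynomial models. The two candidate models, the data and the `penalty_weight` values are invented for illustration.

```python
import math


def rms_error(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def penalized_score(model, xs, ys, penalty_weight):
    """RMS(Error) + Complexity_penalty(model), with the penalty taken to be
    penalty_weight * number of free parameters (an assumption)."""
    predictions = [model["predict"](x) for x in xs]
    return rms_error(ys, predictions) + penalty_weight * model["complexity"]


def select_model(models, xs, ys, penalty_weight):
    """Return the name of the model with the lowest penalized score."""
    best = min(models, key=lambda m: penalized_score(m, xs, ys, penalty_weight))
    return best["name"]


# Training data: roughly y = 2x + 1 with some noise.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.1, 2.9, 5.2, 6.8]

# A model that memorizes every training point: zero training error,
# but one parameter per data point.
lookup = dict(zip(xs, ys))

models = [
    {"name": "linear", "predict": lambda x: 2 * x + 1, "complexity": 2},
    {"name": "memorizer", "predict": lambda x: lookup[x], "complexity": len(xs)},
]
```

With `penalty_weight=0.0` the selection is driven by training error alone and `select_model` picks the memorizer. With `penalty_weight=0.2` the linear model wins despite its higher training error.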


Now, with the right choice of error function and complexity penalty, this can find models that perform worse than more complex models on the training data but better in the live scenario.

The idea of a complexity penalty is not new. I don’t dare say ML borrowed it from scientific experimentation methods or something, but the idea that a more complex theory or model should be penalized relative to a simpler one is very old. Here’s a better written post on it.
