Philip Rooney

How Probabilistic Methods Can Help You Make Business Decisions


TL;DR

With probabilistic methods:

  • You don’t need as much data as neural networks require, making AI more accessible to companies with limited data.

  • Probabilistic methods capture intelligence from both data and existing domain knowledge.

  • By capturing information about uncertainty, probabilistic methods can naturally handle data which is not pristine.

  • Models developed are naturally more interpretable, thereby mitigating the problem of black-box approaches (such as deep learning).

  • Models developed will predict the probability of an outcome, making for better decision-making.

By now, anybody in the technology space is aware of the power of deep learning, reinforcement learning and NLP. Indeed, we are all influenced every day by applications built using these machine learning techniques. Fewer have heard of another type of machine learning that will have a huge impact on the further adoption and efficacy of AI: Probabilistic Programming.


What is Probabilistic Programming?

Probabilistic programming is applicable to many traditional problems; most notably, and unlike the AI methods mentioned above, it does not depend on “big data”. That makes AI innovations possible for almost any company. After all, every company has some unique datasets, even if they aren’t huge. My aim here is to demystify what probabilistic programming is about without getting too technical.

At the heart of probabilistic programming are Bayesian methods, which use Bayes’ Theorem, a mathematical idea dating back to 1763. These methods use new data to update models that describe how a particular system behaves. Despite having been known for two and a half centuries, applications of Bayesian methods have been limited by the complexity of solving the systems of equations they produce.
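
To make this concrete, here is a toy sketch of a Bayesian update in Python. The scenario and every number below are invented purely for illustration; with a Beta prior and count data, Bayes’ Theorem reduces to simple arithmetic:

```python
# Toy Bayesian update: estimating the rate at which customers
# respond to an offer. All numbers are invented.

# Prior belief: roughly 1 in 10 customers respond, encoded as
# a Beta(2, 18) distribution (mean 2 / (2 + 18) = 0.10).
prior_alpha, prior_beta = 2, 18

# New data arrives: 30 of 200 contacted customers responded.
responses, contacted = 30, 200

# For this conjugate pair, Bayes' Theorem is just addition:
# add successes to alpha and failures to beta.
post_alpha = prior_alpha + responses
post_beta = prior_beta + (contacted - responses)

posterior_mean = post_alpha / (post_alpha + post_beta)
print(f"Updated estimate of response rate: {posterior_mean:.3f}")  # ~0.145
```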


Probabilistic programming makes it possible to apply Bayesian reasoning to a wider range of problems than was possible only a decade ago. This is thanks to the introduction of expressive, high-level probabilistic programming languages like Stan, and the emergence of cutting-edge probabilistic inference techniques such as Hamiltonian Monte Carlo (HMC). These developments allow data scientists to create sophisticated statistical models without being held back by the complexities of how to fit those models to the data.
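
What does a probabilistic program actually look like? Here is a minimal sketch using PyMC, a Python library similar in spirit to Stan (the data are simulated and the priors invented; this is an illustration, not a recipe). PyMC’s default sampler, NUTS, is a Hamiltonian Monte Carlo variant, so the data scientist never has to derive the fitting equations by hand:

```python
import numpy as np
import pymc as pm

# Fifty fake measurements standing in for real business data.
rng = np.random.default_rng(42)
observed = rng.normal(loc=3.0, scale=1.5, size=50)

with pm.Model() as model:
    # Priors encode what we believe before seeing the data.
    mu = pm.Normal("mu", mu=0, sigma=10)
    sigma = pm.HalfNormal("sigma", sigma=5)
    # The likelihood ties the model to the observed data.
    pm.Normal("y", mu=mu, sigma=sigma, observed=observed)
    # Sampling (NUTS, an HMC variant) happens behind the scenes.
    idata = pm.sample()
```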


Making the most of data and knowledge

The key to getting the most out of your data is becoming comfortable with converting business problems into data problems. Capturing data allows the creation of statistical models that, if built correctly, will produce useful insights and recommendations for your business.


The real world is complex, so we use models as approximations, making simplifying assumptions so that a problem is computationally tractable. Roughly speaking, a good model should capture the important aspects of the behaviour of a system that interests you. By thoughtfully building probabilistic models, with purposefully chosen assumptions about your data, you will have outputs that are easier to interpret. This will help you and your team of experts assess the validity of the results. For instance, your data scientists should be able to run tests to ensure that the model fits the data reasonably.
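
One standard such test is a posterior predictive check: simulate replicate datasets from the fitted model and compare them with the real data. Continuing the PyMC sketch above (again an illustrative assumption, not a prescribed workflow):

```python
import arviz as az

# Simulate replicate datasets from the model fitted earlier.
with model:
    idata.extend(pm.sample_posterior_predictive(idata))

# Overlay simulated datasets on the observed data; a systematic
# mismatch suggests the model is missing important behaviour.
az.plot_ppc(idata)
```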


For this reason, probabilistic models are often created by teams made up of domain experts and data scientists. Probabilistic models do not need to be simple and can be programmed to capture more complicated behaviours, such as seasonal trends. By combining the skills and know-how of domain experts and data scientists, you can extract more useful information from your data.


Domain experts will often have a solid, nuanced, and sometimes complex understanding of the processes they wish to capture in a model. They are often best placed to make reasonable assumptions about the behaviour you want to understand.


Data scientists should have the technical and creative skills necessary to make a probabilistic model that represents this system and is capable of producing meaningful results. Both parties should then work together to interpret the results of the updated model and understand its strengths and weaknesses.


A project should begin by identifying a business problem. For example, suppose that you want to improve the product recommendations made to the customers of your online business. Your domain experts might explain how best to do that based on their experience and data analytics; for instance, they might believe that customer preferences are likely to fall within a few discrete categories. Your data scientists can then identify appropriate methods to create a model; in this case, they might recommend the use of matrix factorisation.
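
As a sketch of how that could look (the ratings, dimensions and priors below are all invented), a probabilistic matrix factorisation model in PyMC gives every customer and product a small vector of latent preference factors and models each rating as how well the two vectors match:

```python
import numpy as np
import pymc as pm

rng = np.random.default_rng(1)
n_users, n_items, k = 20, 15, 3  # k latent preference factors

# Invented sparse ratings: (customer, product, rating) triples.
user_idx = rng.integers(0, n_users, size=100)
item_idx = rng.integers(0, n_items, size=100)
ratings = rng.normal(3.0, 1.0, size=100)  # stand-in for real ratings

with pm.Model() as pmf:
    # Each customer and each product gets a k-dimensional vector.
    u = pm.Normal("u", mu=0, sigma=1, shape=(n_users, k))
    v = pm.Normal("v", mu=0, sigma=1, shape=(n_items, k))
    noise = pm.HalfNormal("noise", sigma=1)
    # A rating is modelled as the match between the two vectors.
    match = (u[user_idx] * v[item_idx]).sum(axis=-1)
    pm.Normal("rating", mu=match, sigma=noise, observed=ratings)
    idata = pm.sample()
```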



Data Quality, Size and Uncertainties

Having read about machine learning, and knowing the nature of your own data, you may have concerns about what results to expect. Perhaps you are worried about the quality of your data, its size, or the uncertainty in applying a model. I now explain how these issues may be mitigated.


Perhaps you run a business which collects data that is incomplete or contains errors. For instance, some data may be missing, or mistakes may be made during input. The data may also carry uncertainty due to measurement error. However, your domain experts may be aware of the types of errors that occur and have some idea of their rate and circumstances. For instance, suppose you want to use customers buying products as a proxy for liking them. However, you know that a non-negligible number of customers have returned those products, and this data has not been saved over the years.


In these cases, probabilistic programming can be an excellent tool. Many machine learning methods assume that the data they are applied to is pristine and representative. The problem is that by failing to account for the inherent uncertainty in the data, you can easily end up making uninformed and wrong decisions. A probabilistic model does not need to assume that your data is correct. So long as you can model the uncertainty in the data, probabilistic programming will fold it through into uncertainty in the result. So, in our example, instead of treating a purchase as proof that a customer liked the product, you can treat it as a probability that they liked it, scaled by the return rate.
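
Here is a minimal sketch of that idea in PyMC, where the purchase counts and the experts’ guess at the return rate are both invented for illustration:

```python
import pymc as pm

with pm.Model() as like_model:
    # How likely a customer is to buy this product.
    p_buy = pm.Beta("p_buy", alpha=1, beta=1)
    # The true return rate was never recorded; domain experts
    # believe it is around 15%, encoded here as a prior.
    return_rate = pm.Beta("return_rate", alpha=3, beta=17)
    # A purchase only signals "liked" if the item was kept.
    p_like = pm.Deterministic("p_like", p_buy * (1 - return_rate))
    # Invented data: 240 of 1,000 customers bought the product.
    pm.Binomial("bought", n=1000, p=p_buy, observed=240)
    idata = pm.sample()

# The posterior for p_like carries the uncertainty in the return
# rate through to the quantity we actually care about.
```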


Probabilistic programming techniques are therefore more robust to errors in data. Using them can make otherwise unusable data sources useful.


The dataset you intend to use may be small, but your team may have expert knowledge of the system that created the data. We'll discuss how much data is required in a future post; for now, suffice it to say that a useful probabilistic model may often require only a few data points. It is worth remembering that if a decision is likely to have a high impact on your business, a small improvement in your understanding, through a useful model, could be invaluable.


As well as handling data of variable quality, Bayesian models lend themselves neatly to combining multiple complex datasets to improve results. If a real-world process affects two datasets, then modelling both may give a better estimate of that underlying process than using either set individually. This makes it natural and easy to combine multiple datasets that you own, or to supplement them with suitable public datasets.
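
A sketch of the idea, with two invented datasets that measure the same underlying quantity with different amounts of noise:

```python
import numpy as np
import pymc as pm

rng = np.random.default_rng(2)
# Two invented datasets driven by the same underlying demand:
# daily sales measured in-store (noisy) and online (cleaner).
store_sales = rng.normal(100, 20, size=30)
online_sales = rng.normal(100, 5, size=10)

with pm.Model() as joint:
    demand = pm.Normal("demand", mu=80, sigma=50)  # shared by both
    store_noise = pm.HalfNormal("store_noise", sigma=30)
    online_noise = pm.HalfNormal("online_noise", sigma=30)
    pm.Normal("store", mu=demand, sigma=store_noise, observed=store_sales)
    pm.Normal("online", mu=demand, sigma=online_noise, observed=online_sales)
    idata = pm.sample()

# The posterior for `demand` pools evidence from both datasets and
# is tighter than a fit to either dataset on its own.
```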


Probabilistic programming in practice


Let’s consider an example where you are an agricultural manager who wants to use a machine learning model to help make decisions about crops on a farm owner’s land, and you have to explain those decisions to the owner. With historic data on crop yields, temperature and precipitation, you and your team can build a model that explains yields as a function of weather. As the manager, you will be interested in applying this model to predict your yields in the coming year. Probabilistic programming can be used to develop a model which uses a plausible range of conditions (temperature and precipitation) to calculate the possible range of yield outcomes.
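
A minimal sketch of such a yield model in PyMC (the historic records below are simulated and the priors are placeholders, not agronomic advice):

```python
import numpy as np
import pymc as pm

rng = np.random.default_rng(3)
# Invented historic records, one row per season.
temp = rng.normal(16, 2, size=20)     # mean temperature (deg C)
rain = rng.normal(600, 120, size=20)  # precipitation (mm)
yields = 2.0 + 0.15 * temp + 0.004 * rain + rng.normal(0, 0.3, size=20)

with pm.Model() as crop_model:
    intercept = pm.Normal("intercept", mu=0, sigma=10)
    temp_effect = pm.Normal("temp_effect", mu=0, sigma=1)
    rain_effect = pm.Normal("rain_effect", mu=0, sigma=1)
    noise = pm.HalfNormal("noise", sigma=1)
    # Yield is modelled as a function of the season's weather.
    mu = intercept + temp_effect * temp + rain_effect * rain
    pm.Normal("yield", mu=mu, sigma=noise, observed=yields)
    idata = pm.sample()
```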


In the case of predicting yields, you may want to know not only the outcome for the most plausible weather conditions but the range of outcomes for all possible weather conditions, weighted by their likelihood; after all, you will want to know if there is a non-zero chance of a catastrophic outcome, such as total crop failure. Probabilistic programming can help here because it can make predictions from a fitted model over the plausible range of future data, weighted by the chance of each scenario occurring. This makes probabilistic programming an excellent tool for understanding the full uncertainty of a model given a dataset.
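
Continuing the crop sketch, one hypothetical way to do this is to draw plausible weather scenarios (the forecast spread below is invented) and push each through a posterior draw of the fitted model; tail risks can then be read straight off the resulting predictive distribution:

```python
import numpy as np

rng = np.random.default_rng(4)
# Posterior samples from the crop model fitted above.
post = idata.posterior
intercept = post["intercept"].values.ravel()
temp_effect = post["temp_effect"].values.ravel()
rain_effect = post["rain_effect"].values.ravel()
noise = post["noise"].values.ravel()

# One plausible weather scenario per posterior draw, so scenarios
# are weighted by how likely the model believes them to be.
temp_next = rng.normal(16, 2, size=intercept.size)
rain_next = rng.normal(600, 120, size=intercept.size)
predicted = rng.normal(
    intercept + temp_effect * temp_next + rain_effect * rain_next, noise
)

# The full predictive distribution exposes tail risks directly.
print("Chance of yield below 5.0:", (predicted < 5.0).mean())
```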


You may also be as interested in why a prediction is being made, and what it can tell you about a critical business process, as in the final result. As the agricultural manager, you will likely need to justify your decisions to the landowner, who would be unsatisfied with trusting everything to a difficult-to-interpret model (such as one built on deep learning). In the yield example, you might need to explain to the farm owner why you chose a particular weather model and demonstrate how well your model fits the data, giving them confidence in the decisions you have subsequently made.


Unlocking the potential in your data


Probabilistic programming is the natural solution to many data problems which businesses face. While it can require more effort and technical skill to create a useful model, it has broad applications and can be robust to the complex, messy data that is often generated. Furthermore, it is usually simpler to interpret and can give more nuanced results which can help you make better business decisions.


There is a common misconception that in order to benefit from machine learning you need vast quantities of pristine data that is expensive and time-consuming to acquire. As we have tried to explain in this post, this simply is not true. So, while being mindful that data quality matters, no company should rule out the possible benefits of machine learning; after all, every company has some datasets that are unique and hold potential, and often untapped, value.
