
Review: CausalML, a Python Package for Causal Machine Learning

This is a review of the CausalML package, a Python package that provides a suite of uplift modeling and causal inference methods using machine learning algorithms based on recent research.

2 years ago • 5 min read

By Adrien Payong


Introduction

Algorithms that combine causal inference with machine learning have been attracting growing attention. CausalML is a Python toolkit that makes a range of such causal inference methods available. Its objective is to bridge academic research and the practical application of these approaches. This review summarizes the package's main ideas and applications.

The CausalML package applies machine learning algorithms to uplift modeling and causal inference. Standard causal analysis techniques can compare a treatment (or intervention) with a control and estimate the average treatment effect (ATE) across the whole population.

It is often useful to estimate these effects at a finer grain. CausalML lets the user estimate how the treatment effect varies across individuals or subgroups, a quantity known as the conditional average treatment effect (CATE). Giving each customer the treatment that best fits them, based on these estimates, opens up many optimization and personalization opportunities.
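For reference, here are the two quantities in the standard potential-outcomes notation. The notation is an assumption, since the article does not define symbols. Let $Y(1)$ and $Y(0)$ be a user's outcomes with and without the treatment, and let $X$ be the user's features:

```latex
\mathrm{ATE} = \mathbb{E}\big[\,Y(1) - Y(0)\,\big],
\qquad
\tau(x) = \mathrm{CATE}(x) = \mathbb{E}\big[\,Y(1) - Y(0) \mid X = x\,\big]
```

By the law of total expectation, averaging $\tau(X)$ over the population recovers the ATE, that is, $\mathbb{E}[\tau(X)] = \mathrm{ATE}$.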

What is uplift modeling?

CausalML makes uplift modeling, a powerful modeling approach, readily available. Uplift modeling is a set of methods that a business can use to predict the positive or negative impact of an action on a particular customer outcome. It is used in customer relationship management, promotions, incentives, advertising, customer service, recommendation systems, and even product design to target customers more effectively and allocate budgets more efficiently.

Once the individual treatment effect (ITE) or CATE of a treatment has been estimated for a user or a group of users, an optimal treatment strategy can be derived by weighing the potential lift of the treatment against its cost. For example, a manager at a telecommunications company can predict how many customers with a certain profile will renew their service in the next billing cycle as a result of receiving a promotional email.
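The lift-versus-cost trade-off can be sketched as a simple decision rule. This is a minimal illustration that assumes per-customer uplift estimates are already available from some uplift model. The function name `choose_customers_to_treat` and all numbers are hypothetical.

```python
def choose_customers_to_treat(uplifts, value_per_renewal, cost_per_email):
    """Return the indices of customers worth treating.

    Hypothetical helper: a customer is treated only if the expected
    incremental value of the treatment (uplift x value) exceeds its cost.
    """
    selected = []
    for i, uplift in enumerate(uplifts):
        expected_gain = uplift * value_per_renewal  # value caused by the email
        if expected_gain > cost_per_email:
            selected.append(i)
    return selected


# Made-up uplift estimates: the increase in renewal probability from the email.
predicted_uplift = [0.10, 0.01, -0.03, 0.05]
print(choose_customers_to_treat(predicted_uplift, value_per_renewal=50.0, cost_per_email=1.0))
# → [0, 3]
```

Customers with negative predicted uplift are never selected, because contacting them would make renewal less likely. This is one practical advantage of modeling uplift rather than response probability.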

Uplift modeling process

Researchers apply the action being analyzed to a random sample of the population (the treatment dataset).
A second, disjoint random sample is selected, and the action is not applied to it. This is the control dataset, which serves as a baseline for measuring how well the action worked.
With the two datasets (treatment and control) in hand, we can build a model that predicts the difference in outcome between treatment and control (the uplift), rather than the probability that an object belongs to a specific class.
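The three steps above can be simulated in a few lines of NumPy. This sketch assumes a binary outcome and a made-up true uplift of 5 percentage points, which is used only to generate the data. The difference in response rates it computes is the population-level uplift (the ATE). An uplift model refines this into per-user predictions based on features.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical population: 10,000 users with a 10% baseline response rate.
n_users = 10_000
true_uplift = 0.05  # assumed effect of the action, used only to simulate data

# Steps 1-2: randomly split users into two disjoint groups.
shuffled = rng.permutation(n_users)
treatment_idx = shuffled[: n_users // 2]
control_idx = shuffled[n_users // 2 :]

# Simulate binary outcomes: treated users respond with probability 0.10 + uplift.
outcome = np.zeros(n_users)
outcome[control_idx] = rng.random(control_idx.size) < 0.10
outcome[treatment_idx] = rng.random(treatment_idx.size) < 0.10 + true_uplift

# Step 3: the simplest uplift estimate is the difference in response rates.
estimated_uplift = outcome[treatment_idx].mean() - outcome[control_idx].mean()
print(f"estimated uplift: {estimated_uplift:.3f}")
```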

CausalML: Python package for causal machine learning

This toolkit is not meant to replace randomized experiments for drawing causal inferences. Estimating the effect of a treatment (or intervention) on a business problem sometimes requires a randomized experiment. Although uplift modeling can be applied to observational data, this implementation is best used with data from a randomized experiment.

According to the paper:

Applications to observational data where the treatment is not assigned randomly should take extra caution. In a non-randomized experiment, there is often a selection bias in the treatment assignment (a.k.a. the confounding effect). One main challenge is that omitting potential confounding variables from the model can produce biased estimation for the treatment effect. On the other hand, properly randomized experiments do not suffer from such selection bias, that provides a better basis for uplift modeling to estimate the CATE (or individual level lift).
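The bias described in the quote can be illustrated with simulated data. In this sketch, a hypothetical confounder (a "loyal customer" flag) drives both who gets treated and how much they spend. The naive treated-versus-untreated comparison is biased. Comparing within strata of the confounder, a basic adjustment technique chosen here for illustration and not CausalML's method, recovers the assumed true effect. The adjustment only works because the confounder is observed and included.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
n = 20_000
true_effect = 1.0  # assumed constant treatment effect used to simulate data

# Hypothetical confounder: loyal customers are more likely to be treated
# and also spend more regardless of treatment.
loyal = rng.random(n) < 0.5
p_treat = np.where(loyal, 0.8, 0.2)  # non-random assignment
treated = rng.random(n) < p_treat
spend = 2.0 + 3.0 * loyal + true_effect * treated + rng.normal(0.0, 1.0, n)

# Naive comparison: ignores the confounder, so it is biased upward.
naive = spend[treated].mean() - spend[~treated].mean()

# Adjusted comparison: estimate the effect within each stratum of the
# confounder, then average the stratum effects weighted by stratum size.
adjusted = 0.0
for stratum in (True, False):
    mask = loyal == stratum
    effect = spend[mask & treated].mean() - spend[mask & ~treated].mean()
    adjusted += effect * mask.mean()

print(f"naive: {naive:.2f}, adjusted: {adjusted:.2f}, true: {true_effect:.2f}")
```

With these settings, the naive estimate should land near 2.8 rather than 1.0. The treated group contains far more loyal customers, so part of their higher spending is wrongly attributed to the treatment.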

Related Python packages

Several other packages address related problems:

Pylift implements only one meta-learner, whereas CausalML aims to serve as a central hub for uplift modeling techniques.
Ensemble algorithms for uplift modeling.
DoWhy uses causal graphical models to provide a structured approach to causal inference.
EconML combines machine learning techniques with econometric methods to estimate how treatment effects vary across units.

Why do we need CausalML?

Causal inference and machine learning have been popular academic topics. Based on their experience at Uber, the authors believe this research can lead to real-world applications, and they built this toolkit to bring such applications to a wider audience. The first release focused on making uplift modeling techniques more accessible.

The paper also notes:

Further, we have built the package flexible in terms of the types of outcome variables that can be modelled, covering both regression and classification type tasks. The package also contains algorithms that can be used with data from experiments with multiple treatment groups.

Algorithms supported by CausalML

The package supports a range of algorithms. Here are some examples:

Meta-learner algorithms

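T-Learner: T-Learner is a two-step process. First, a base learner, which can be any supervised learning or regression estimator, estimates the control response function from the control-group data. Second, the treatment response function is estimated in the same way from the treatment-group data. The CATE estimate is the difference between the two predicted responses.
S-Learner: S-Learner estimates the treatment effect with a single machine learning model that includes the treatment indicator as a feature. The CATE estimate is the difference between the model's predictions with the indicator set to treatment and set to control.
X-Learner: X-Learner works in three stages. First, estimate the control and treatment response functions with any supervised learning or regression algorithm, and denote the estimates $\hat{\mu}_0(x)$ and $\hat{\mu}_1(x)$. Second, impute user-level treatment effects, $D^1_i = Y^1_i - \hat{\mu}_0(X^1_i)$ for treated users and $D^0_i = \hat{\mu}_1(X^0_i) - Y^0_i$ for control users, and fit models $\hat{\tau}_1(x)$ and $\hat{\tau}_0(x)$ to these imputed effects. Third, define the CATE estimate as the weighted average $\hat{\tau}(x) = g(x)\hat{\tau}_0(x) + (1 - g(x))\hat{\tau}_1(x)$, where $g(x) \in [0, 1]$ is a weight function such as the propensity score.
R-Learner: R-Learner uses out-of-fold estimates of the outcome and of the propensity score, then estimates the CATE by minimizing a loss defined on the resulting residuals.
Doubly robust (DR) learner: DR-Learner estimates the CATE in two stages with cross-fitting. It first builds a doubly robust score from outcome and propensity models, then regresses that score on the features.
TMLE learner: TMLE learner uses semiparametric Targeted Maximum Likelihood Estimation (TMLE) to estimate a statistical quantity of interest, such as the average treatment effect.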
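Here is a minimal NumPy sketch of the first three meta-learners, using ordinary least squares as the base learner. It illustrates the ideas under simplifying assumptions and is not CausalML's implementation. The package provides its own classes, such as `BaseTRegressor`, `BaseSRegressor`, and `BaseXRegressor` in `causalml.inference.meta`, which accept a user-supplied base learner. The helper names below are hypothetical. The X-learner uses a constant propensity score, which is appropriate only for randomized data.

```python
import numpy as np


def fit_linear(X, y):
    """Ordinary least squares base learner; returns a prediction function."""
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return lambda X_new: np.column_stack([np.ones(len(X_new)), X_new]) @ coef


def t_learner(X, w, y):
    """T-learner: separate response models for control (w == 0) and treatment (w == 1)."""
    mu0 = fit_linear(X[w == 0], y[w == 0])
    mu1 = fit_linear(X[w == 1], y[w == 1])
    return lambda X_new: mu1(X_new) - mu0(X_new)


def s_learner(X, w, y):
    """S-learner: one model with the treatment indicator as an extra feature."""
    model = fit_linear(np.column_stack([X, w]), y)

    def cate(X_new):
        ones, zeros = np.ones(len(X_new)), np.zeros(len(X_new))
        return model(np.column_stack([X_new, ones])) - model(np.column_stack([X_new, zeros]))

    return cate


def x_learner(X, w, y):
    """X-learner with a constant propensity score (valid only for randomized data)."""
    X1, y1, X0, y0 = X[w == 1], y[w == 1], X[w == 0], y[w == 0]
    # Stage 1: response functions for each group.
    mu0, mu1 = fit_linear(X0, y0), fit_linear(X1, y1)
    # Stage 2: imputed user-level effects, then an effect model per group.
    tau1 = fit_linear(X1, y1 - mu0(X1))
    tau0 = fit_linear(X0, mu1(X0) - y0)
    # Stage 3: weighted average, with g = share of treated users.
    g = w.mean()
    return lambda X_new: g * tau0(X_new) + (1 - g) * tau1(X_new)


# Simulated randomized experiment with a heterogeneous effect tau(x) = 1 + 2x.
rng = np.random.default_rng(seed=2)
n = 5_000
X = rng.normal(size=(n, 1))
w = (rng.random(n) < 0.5).astype(int)
y = X[:, 0] + w * (1.0 + 2.0 * X[:, 0]) + rng.normal(0.0, 0.5, n)

X_test = np.array([[-1.0], [0.0], [1.0]])  # true CATE at these points: -1, 1, 3
for name, learner in [("T", t_learner), ("S", s_learner), ("X", x_learner)]:
    print(name, np.round(learner(X, w, y)(X_test), 2))
```

The T- and X-learners should recover values close to -1, 1, and 3. The S-learner with a purely linear base learner and no interaction terms can only produce a constant effect near the ATE of about 1. With a more flexible base learner, such as gradient-boosted trees, it can capture heterogeneity.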

Tree-based algorithms

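Uplift trees and random forests that split on the Euclidean distance, Kullback-Leibler (KL) divergence, or chi-square divergence between the treatment and control outcome distributions.
Uplift trees and random forests based on Contextual Treatment Selection (CTS).
Causal Tree (work in progress)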
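The divergence-based split criteria can be illustrated for a binary outcome. In this simplified sketch, a candidate split is scored by how much it increases the size-weighted divergence between the treatment and control response rates. The published uplift-tree criteria also apply a normalization term, which is omitted here. The function names are hypothetical and are not CausalML's API.

```python
import math


def divergence(p_t, p_c, kind="kl", eps=1e-9):
    """Divergence between treatment and control outcome distributions.

    Assumes a binary outcome, so each distribution is (p, 1 - p).
    """
    pairs = list(zip((p_t, 1.0 - p_t), (p_c, 1.0 - p_c)))
    if kind == "ed":  # squared Euclidean distance
        return sum((a - b) ** 2 for a, b in pairs)
    if kind == "chi":  # chi-square divergence
        return sum((a - b) ** 2 / max(b, eps) for a, b in pairs)
    if kind == "kl":  # Kullback-Leibler divergence
        return sum(a * math.log(a / max(b, eps)) for a, b in pairs if a > 0)
    raise ValueError(f"unknown divergence: {kind}")


def split_gain(parent, children, kind="kl"):
    """Weighted divergence of the children minus the divergence of the parent.

    parent and each child are (n_users, treated_rate, control_rate) tuples.
    """
    n_parent = parent[0]
    after = sum(n / n_parent * divergence(p_t, p_c, kind) for n, p_t, p_c in children)
    return after - divergence(parent[1], parent[2], kind)


# Hypothetical node: the treatment raises response in one child and lowers it in the other.
parent = (1000, 0.12, 0.10)
children = [(500, 0.20, 0.10), (500, 0.04, 0.10)]
for kind in ("ed", "kl", "chi"):
    print(kind, round(split_gain(parent, children, kind), 4))
```

A split that separates users who respond positively to the treatment from those who respond negatively earns a positive gain under all three criteria. A split whose children look exactly like the parent earns zero gain.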

You can look at the documentation for more information.

What issues does CausalML address?

Campaign targeting optimization, personalized engagement, and causal impact analysis are just a few of the many applications of CausalML.

Striving for the best performance

To maximize marketing ROI, the toolkit can be used to zero in on the most promising prospects.
When promoting products and services to existing customers, a business can aim the campaign at the customers most likely to buy a new product or service because of it. This spares the rest of the audience from unnecessary messages.
According to an internal study, applying uplift modeling to as few as 30 percent of users can have the same impact on sales as a blanket campaign offered to all users.
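The reasoning behind targeting only a fraction of users can be sketched with made-up numbers. These are not the study's data. When uplift is concentrated in a minority of users, ranking users by predicted uplift and treating only the top slice captures most of the incremental effect. The uplift values below are hypothetical; in practice they would be model predictions. The same ranking idea underlies the uplift (gain) curves commonly used to evaluate uplift models.

```python
import numpy as np

rng = np.random.default_rng(seed=3)
n_users = 10_000

# Hypothetical uplift scores: about 25% of users respond strongly to the
# campaign, the rest barely respond at all.
uplift = np.where(rng.random(n_users) < 0.25, 0.20, 0.005)

# Rank users by predicted uplift (highest first) and compute the share of
# the total incremental effect captured by treating only the top fraction.
order = np.argsort(-uplift)
cumulative_gain = np.cumsum(uplift[order]) / uplift.sum()

for fraction in (0.1, 0.3, 0.5, 1.0):
    k = round(fraction * n_users)
    print(f"treat top {fraction:.0%}: {cumulative_gain[k - 1]:.0%} of total uplift")
```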

Evaluate the relationship between cause and effect

CausalML can also be used to assess the causal effect of a specific event from observational data.
For example, one can analyze how cross-selling a new product affects customers’ future spending on the platform. A randomized test is not feasible here, because it would mean preventing some customers from adopting the new product.
CausalML can still be used to estimate the effect of cross-selling across the entire platform.

Personalization

You can use CausalML to personalize the user’s experience.
A company can engage its customers through many channels, from upsell offers to messaging.
By estimating the impact of every possible combination of options for each customer, CausalML helps identify the best tailored offer for every customer.
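Here is a minimal sketch of choosing a personalized offer once per-customer uplift estimates exist for several treatments. The offer names, uplift matrix, purchase value, and costs are all hypothetical. Each offer is treated as a separate option; a combination of offers could be handled by adding it as its own column.

```python
import numpy as np

# Hypothetical per-customer uplift estimates (rows: customers, columns: offers),
# e.g. the predicted increase in purchase probability from each offer.
offers = ["discount", "free_shipping", "upsell_message"]
uplift = np.array([
    [0.08, 0.03, 0.01],
    [0.01, 0.06, 0.02],
    [-0.02, 0.00, 0.002],
])
value_per_purchase = 40.0
cost_per_offer = np.array([2.0, 1.5, 0.1])  # assumed cost of each offer

# Expected net value of each offer for each customer.
net_value = uplift * value_per_purchase - cost_per_offer


def best_offer(row):
    """Pick the offer with the highest net value, or no offer if none is profitable."""
    j = int(np.argmax(row))
    return offers[j] if row[j] > 0 else None


recommendations = [best_offer(row) for row in net_value]
print(recommendations)
# → ['discount', 'free_shipping', None]
```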

Conclusion

Uber’s CausalML developers continue to refine and update the package, with the goal of making the methods already in the toolkit more efficient. More powerful uplift modeling tools are planned, and the team is exploring uplift modeling and other modeling strategies for solving optimization problems.


References

Huigang Chen et al., CausalML: Python Package for Causal Machine Learning, https://arxiv.org/pdf/2002.11631.pdf

CausalML documentation, Methodology (Meta-Learner Algorithms), https://causalml.readthedocs.io/en/latest/methodology.html

CausalML documentation, https://causalml.readthedocs.io/en/latest/about.html

Sören R. Künzel et al., Meta-learners for Estimating Heterogeneous Treatment Effects using Machine Learning, https://arxiv.org/pdf/1706.03461.pdf

An Illustrated Guide to TMLE, Part I: Introduction and Motivation, https://www.khstats.com/blog/tmle/tutorial
