CORRELATION IS NOT A MEASURE OF CAUSATION
Indira Baimbetova
Master's student, Faculty of Information Technology, Kazakh-British Technical University,
Almaty, Kazakhstan
ABSTRACT
Most Machine Learning-based projects are concerned with forecasting outcomes rather than with understanding causality. Machine learning excels at finding correlations in data, but not at finding causes, and it is important not to mistake correlation for causation. This problem severely limits our ability to rely on machine learning for decision-making. As a result, we need tools that can capture causal links in data and produce machine learning solutions that generalize well. According to machine learning specialists, current systems cannot recognize or react to unexpected situations for which they have not been expressly programmed or trained.
Keywords: causality, correlation, randomization, causal models.
Introduction
We're engrossed in the concept of cause and effect. Am I more or less likely to contract COVID if I get vaccinated? These are the kinds of judgments we make all the time, some good and some poor. Whenever we examine the potential downstream repercussions of our decisions, whether consciously or unconsciously, we are reasoning about cause. We're picturing what the world would look like in various scenarios: what would happen if we did X?
Although connecting this to human consciousness may be a stretch, causation is about to trigger a revolution in the way we use data. "Causality...is the next frontier of AI and machine learning," writes Jeannette Wing in an MIT Technology Review article.
Causality is a concept that helps us reason about the world and is crucial in all types of decision-making. It is central to business decisions, yet hard to pin down: will sales increase if we cut our prices? In medicine, causality is crucial: would this new treatment shrink cancer tumors? This type of reasoning requires imagination: we must be able to envision what will happen if we do X, as well as what will happen if we do not do X. When data is used correctly, it allows us to infer the future from what has happened in the past. When it is misused, we simply repeat the mistakes we have already made. We may also use causal inference to design interventions: understanding why a client takes certain actions, such as churning, will have a significant impact on the success of your intervention.
Where causality may not exist, we have heuristics like "correlation does not imply causation" and "past performance is no guarantee of future returns," but pinning down causal effects rigorously is difficult. It is no coincidence that most causality heuristics are negative: it is much easier to disprove causality than to prove it. As data science, statistics, machine learning, and AI grow in importance in business, it is more critical than ever to re-evaluate methodologies for establishing causality.
In this paper, I try to convey the significance and necessity of causal inference for the progress of machine learning.
Machine Learning Problems
Machine Learning-based solutions have a number of drawbacks. As you may be aware, machine learning algorithms in their current state are biased, offer little explainability, and are limited in their ability to generalize patterns learned from a training data set across applications. Improving generalization has therefore become critical. Generalization refers to a model's ability to adapt to new, previously unseen data drawn from the same distribution as the model's training data. Furthermore, existing machine learning techniques tend to overfit: instead of identifying the real, causal links that will continue to hold over time, they aim to reproduce the past as accurately as possible.
We commonly employ all available variables (or a subset selected for predictive performance) to predict an outcome in supervised learning. With structural causal models, we instead encode a considerably richer dependency structure between variables (Figure 1).
Figure 1. Dependency structure between variables in a structural causal model
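As a concrete (and deliberately simplified) illustration, the sketch below writes such a structure in Python; the variables age, treatment, and outcome and all coefficients are hypothetical, chosen only for illustration. Where a supervised learner would treat every column as an interchangeable feature, the structural causal model records each variable as an explicit function of its direct causes plus independent noise.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 10_000

    # Hypothetical structural causal model, one assignment per variable:
    age = rng.normal(50.0, 10.0, n)                       # exogenous
    p_treat = 1.0 / (1.0 + np.exp(-(age - 50.0) / 10.0))  # age -> treatment
    treatment = rng.binomial(1, p_treat)
    outcome = 0.3 * treatment - 0.01 * age + rng.normal(0.0, 0.5, n)

    # The graph age -> treatment, age -> outcome, treatment -> outcome is
    # part of the model itself, not something inferred from correlations.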
When a problem expands beyond very narrow instances, data alone won't answer the question, because Deep Learning (DL) has concentrated on correlation without causality. In fact, much real-world data is not generated in the same way as the data used to train AI models [1]. To put it another way, deep learning is good at detecting patterns in data, but it cannot explain how those patterns are related.
Causality is defined as the influence of one event, process, or state (a cause) on the production of another event, process, or state (an effect), where the cause is partly responsible for the effect and the effect is partly dependent on the cause. The ability to understand the causes and effects of occurrences in complex systems would help us develop better solutions in a variety of fields, including health care, justice, and agriculture. Indeed, these are exactly the fields that must be cautious when correlations are mistaken for causation.
The Cause-and-Effect Chain
Judea Pearl, the author of much of the key work [2] in causality, portrays three types of reasoning we might undertake as rungs on a ladder in his book The Book of Why. These rungs illustrate when causality is required and what it provides (Figure 2).
Figure 2. The ladder of causation, as described in The Book of Why
We can use statistical and predictive reasoning on the first rung. This covers the majority (but not all) of our machine learning work. We can construct complex forecasts, infer latent variables in complex deep generative models, and group data based on subtle relationships. All of these tasks sit on the first rung. For instance, a bank wants to know which of its current company loans are likely to default so that it can prepare financial projections that account for potential losses.
Interventional reasoning is the second rung. We can use interventional reasoning to forecast what will happen if a system is altered. This allows us to specify which traits are unique to the specific observations we've made and which should remain constant in new situations. A causal model is required for this type of reasoning. Intervening is a crucial step in causation, and interventions and causal models are covered below. For example, a bank may change its policies in order to reduce the number of defaulted loans. To predict what will happen as a result of this intervention, the bank must first understand the causal relationships that influence loan default.
Counterfactual reasoning is the third rung. On this rung, we can discuss not only what has occurred, but also what might have occurred had conditions been different. Counterfactual reasoning requires a more thoroughly specified causal model than interventional reasoning does. This type of reasoning is extremely powerful, as it allows us to reason mathematically about alternate worlds in which events unfolded differently. For instance, a bank could want to know what the anticipated return on a loan would have been if it had offered different terms.
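The gap between the first two rungs can be made concrete with a short simulation. In the sketch below (Python; the model and its coefficients are invented for illustration), a confounder U drives both the action X and the outcome Y, so conditioning on an observed X = 1 (rung one) and intervening with do(X = 1) (rung two) give different answers.

    import numpy as np

    rng = np.random.default_rng(1)
    n = 200_000

    # Toy model: a confounder U drives both the action X and the outcome Y.
    u = rng.binomial(1, 0.5, n)
    x = rng.binomial(1, 0.2 + 0.6 * u)          # U makes X more likely
    y = 0.5 * u + 0.2 * x + rng.normal(0.0, 0.1, n)

    # Rung 1 (seeing): condition on the observed value of X.
    print("E[Y | X = 1]     =", y[x == 1].mean())        # ~0.60

    # Rung 2 (doing): set X ourselves, independently of U.
    y_do = 0.5 * u + 0.2 * 1 + rng.normal(0.0, 0.1, n)
    print("E[Y | do(X = 1)] =", y_do.mean())             # ~0.45

The two numbers differ because observing X = 1 also selects for high values of the confounder U, whereas do(X = 1) breaks that link.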
Causal Findings
One of the most basic objectives in science is to discover causal relationships. Randomized experiments [3] are a popular method. For example, to see whether a newly developed drug is effective against cancer, researchers recruit subjects and randomly divide them into two groups: a control group, which receives a placebo, and a treatment group, which receives the new medicine. Randomization eliminates the influence of potential confounders. Age, for instance, may affect both whether the drug is taken and the treatment outcome; under randomization, the age distributions of the two groups should be nearly identical.
The Average Treatment Effect (ATE) can be used to calculate a quantitative estimate of a medicine's effectiveness:
ATE = E[Y|do(X = 1)] - E[Y|do(X = 0)], (1)
Here, do(X = 1) denotes administering the medicine to the patients and do(X = 0) denotes administering a placebo. Intervention is represented by the do operator, which removes all incoming edges to X and sets X to a fixed value (Figure 3).
Figure 3. Intervening on X is equivalent to removing all incoming edges to X and setting X to a fixed value
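Because randomization makes treatment assignment independent of confounders such as age, the do-expectations in Equation (1) can be estimated by simple group means. A minimal sketch, with simulated data standing in for a real trial (the effect size of 0.3 is invented for illustration):

    import numpy as np

    rng = np.random.default_rng(2)
    n = 10_000

    # Simulated randomized trial: assignment is a coin flip, so it is
    # independent of the confounder (age) by construction.
    age = rng.normal(50.0, 10.0, n)
    treated = rng.binomial(1, 0.5, n)
    outcome = 0.3 * treated - 0.01 * age + rng.normal(0.0, 0.5, n)

    # Under randomization, E[Y | do(X = x)] is estimable as E[Y | X = x],
    # so the ATE reduces to a difference of group means.
    ate = outcome[treated == 1].mean() - outcome[treated == 0].mean()
    print(f"Estimated ATE: {ate:.3f} (true effect: 0.3)")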
Randomized studies, however, are sometimes prohibitively expensive and difficult to carry out, and they may even raise ethical concerns. This motivates causal discovery from purely observational data, for which there are two traditional families of techniques.
First, consider how conditional independence constraints can be used to learn causal links (Figure 4).
Figure 4. Illustrations of using conditional independence constraints to identify the causal structure
Assume you have three variables, X, Y, and Z, whose causal graph is the fork Y <- X -> Z. There is only one conditional independence constraint in this case: Y and Z must be independent given X. Using this constraint, we can reconstruct the causal graph only up to the Markov equivalence class Y – X – Z. Note that the causal directions cannot be determined uniquely in this example: Y <- X -> Z, Y -> X -> Z, and Y <- X <- Z all impose the same independence constraint (Figure 4(a)). Now consider the collider Y -> X <- Z (Figure 4(b)). Here Y and Z are dependent given X, although Y and Z are marginally independent. As a result, the causal structure between the three variables, Y -> X <- Z, can be determined uniquely. The so-called score-based approach instead searches for the equivalence class that achieves the highest score under a chosen scoring criterion, such as the Bayesian Information Criterion (BIC), the posterior of the graph given the data, or generalized score functions [4].
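Returning to the constraint-based idea: it can be tried directly on simulated data, using partial correlation as a crude, linear-Gaussian stand-in for a general conditional independence test. The sketch below distinguishes the fork of Figure 4(a) from the collider of Figure 4(b); the noise scales are invented for illustration.

    import numpy as np

    rng = np.random.default_rng(3)
    n = 50_000

    def partial_corr(a, b, c):
        """Correlation of a and b after regressing out c: a linear
        proxy for a conditional independence test."""
        ra = a - np.polyval(np.polyfit(c, a, 1), c)
        rb = b - np.polyval(np.polyfit(c, b, 1), c)
        return np.corrcoef(ra, rb)[0, 1]

    # Fork (Figure 4(a)): Y <- X -> Z
    x = rng.normal(size=n)
    y = x + rng.normal(size=n)
    z = x + rng.normal(size=n)
    print("fork:     corr(Y,Z) =", round(np.corrcoef(y, z)[0, 1], 3),
          " partial given X =", round(partial_corr(y, z, x), 3))

    # Collider (Figure 4(b)): Y -> X <- Z
    y2 = rng.normal(size=n)
    z2 = rng.normal(size=n)
    x2 = y2 + z2 + rng.normal(size=n)
    print("collider: corr(Y,Z) =", round(np.corrcoef(y2, z2)[0, 1], 3),
          " partial given X =", round(partial_corr(y2, z2, x2), 3))

For the fork, Y and Z are correlated marginally but not after adjusting for X; for the collider the pattern reverses, which is exactly what makes its orientation identifiable.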
Constrained functional causal models [5], which express the effect as a function of its direct causes plus an independent noise term, are another family of techniques. The causal direction implied by a constrained functional causal model is generically identifiable: the model assumptions, such as the independence between the noise and the cause, hold only for the true causal direction and are violated for the incorrect one. Assume we have two observed variables, X and Y, related by X -> Y. We further suppose that Y follows the functional causal model below:
Y = ƒ(X) + E , (2)
where X represents the cause of Y, E represents the effect of some unmeasured factors, and ƒ is the causal mechanism that determines the value of Y from the values of X and E. Furthermore, the noise term E is independent of the cause X. If we instead fit the model in the anti-causal direction,
X = ɡ(Y) + E' , (3)
then E' is, in general, no longer independent of Y. As a result, we can use this asymmetry to determine the causal direction.
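This asymmetry can be checked numerically. Below is a minimal sketch (Python, with a toy cubic mechanism and uniform noise chosen purely for illustration) of the constrained-functional-model recipe: regress in both directions and keep the direction whose residuals are more independent of the putative cause. A simple biased HSIC estimate with Gaussian kernels stands in for a proper independence test.

    import numpy as np

    rng = np.random.default_rng(4)
    n = 500

    # Toy additive noise model: X -> Y with nonlinear f and uniform noise.
    x = rng.uniform(-2, 2, n)
    y = x ** 3 + rng.uniform(-1, 1, n)

    def hsic(a, b, sigma=1.0):
        """Biased HSIC estimate with Gaussian kernels: a nonlinear
        dependence score; values near 0 suggest independence."""
        a = (a - a.mean()) / a.std()
        b = (b - b.mean()) / b.std()
        def gram(v):
            return np.exp(-(v[:, None] - v[None, :]) ** 2 / (2 * sigma ** 2))
        m = len(a)
        H = np.eye(m) - np.ones((m, m)) / m
        return np.trace(gram(a) @ H @ gram(b) @ H) / (m - 1) ** 2

    def residual_dependence(cause, effect, degree=5):
        """Fit effect = f(cause) + E by polynomial regression and score
        how dependent the residual is on the putative cause."""
        resid = effect - np.polyval(np.polyfit(cause, effect, degree), cause)
        return hsic(cause, resid)

    print("X -> Y score:", residual_dependence(x, y))  # typically near 0
    print("Y -> X score:", residual_dependence(y, x))  # typically larger

On data generated this way, the X -> Y score is typically close to zero while the Y -> X score is noticeably larger, matching the identifiability argument above.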
Conclusion
Correlation is and will continue to be a valuable tool, but it is a means to an end, not a goal in itself. We urgently need to move past the notion that correlation is a sufficient proxy for causality. As the examples above have shown, a lack of understanding of a problem's causal structure can lead to incorrect conclusions. Because issues with causal inference are common in socially significant domains of research, such as health, sociology, and finance, researchers must exercise extreme caution when validating any causal claims they make after examining the data. Without this understanding, a researcher risks reaching incorrect findings with potentially disastrous implications, so this step should never be skipped. This paper discussed the fundamentals of causal inference and presented examples demonstrating its usefulness in machine learning research.
References:
1. Moritz Hardt, Benjamin Recht. Patterns, Predictions, and Actions: A Story About Machine Learning. 2021.
2. Judea Pearl, Dana Mackenzie. The Book of Why: The New Science of Cause and Effect. New York: Basic Books, 2018.
3. Peng Ding. Exploring the Role of Randomization in Causal Inference. 2015.
4. B. Huang, K. Zhang, Y. Lin, B. Schölkopf, C. Glymour. Generalized Score Functions for Causal Discovery. 2018.
5. Harmen Oppewal. Concept of Causality and Conditions for Causality. 2010.