The Return of Pseudosciences in Artificial Intelligence: Have Machine Learning and Deep Learning Forgotten Lessons from Statistics and History?
In this paper, we contend that the designers and end users of these ML methods have forgotten a fundamental lesson from statistics: correlation does not imply causation. Not only do most state-of-the-art methods neglect this crucial principle, but by doing so they often produce nonsensical or flawed causal models, akin to social astrology or physiognomy. Consequently, we argue that current efforts to make AI models more ethical by merely reducing biases in the training data are insufficient. Through examples, we will demonstrate that the potential for harm posed by these methods can only be mitigated by a complete rethinking of their core models, improved quality assessment metrics and policies, and by maintaining human oversight throughout the process.
Given the seriousness and sensitivity of some of these tasks delegated to AI systems, there is a growing interest within both the ML and ethics communities in advocating for fairer and socially responsible AI algorithms [1, 2] that minimize the risk of harm and are as unbiased as possible. This movement has taken various forms, including the development of explainable AI [3] and a stronger focus on training Machine Learning algorithms using data that have been curated to reduce biases of all kinds [4].
A primary ethical concern with current machine learning and deep learning methods lies in the undue attribution of causality by their designers and users. Indeed, while it is undeniable that Deep Learning methods (in particular, convolutional networks [5] for images and videos, as well as autoencoders [6] for complex data) are highly effective at identifying complex and intricate relationships and correlations in large amounts of training data, it remains a fundamental error to assume that such systems, which are inherently statistical, can be trusted with sensitive tasks that require explainability. We contend that bestowing these deep learning-based systems with what amounts to “oracle-like” powers is not only selling snake oil, but also akin to endorsing pseudosciences such as Lombrosianism, physiognomy, and social astrology. Moreover, we argue that, in addition to their lack of knowledge of and disregard for historical context in various application domains, too many Machine Learning researchers seem to have forgotten that the field of machine learning originated as a branch of statistics, where a key tenet is that correlation does not imply causation.
• By examining the metrics used to validate these methods and working through simple probability calculations, we will highlight the substantial harm these algorithms can cause when applied in critical public sectors such as justice and security.
• We will then address how the prevailing focus on reducing bias through curated training data, while promoting AI fairness, fails to tackle the core issue, which lies in the models themselves. Indeed, the “theory-free” argument put forward by proponents of current AI methods not only makes biases more challenging to detect, but is frequently used alongside favorable quality metrics as a misleading justification for the alleged fairness of a system, as the sketch after this list illustrates.
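To make the second point concrete, the following minimal sketch (entirely synthetic, hypothetical data; variable names are ours) shows how a model can post a high overall accuracy while systematically disadvantaging one group, precisely because the historical labels it is evaluated against encode that same bias:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: historical labels were biased against group 1,
# and a "theory-free" score learned from them reproduces that bias.
n = 10_000
group = rng.integers(0, 2, size=n)                  # protected attribute
skill = rng.normal(size=n)                          # the true causal driver
# Biased historical labels: group 1 was systematically under-rated.
label = (skill - 0.8 * group + rng.normal(scale=0.5, size=n)) > 0

# A purely data-driven score that exploits the group attribute.
pred = (skill - 0.8 * group) > 0

print(f"overall accuracy:       {(pred == label).mean():.1%}")   # high
print(f"selection rate group 0: {pred[group == 0].mean():.1%}")
print(f"selection rate group 1: {pred[group == 1].mean():.1%}")  # far lower
```

Judged only by its accuracy against the (biased) labels, the model appears excellent; the disparity surfaces only when its decisions are broken down by group.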
Marketing is perhaps the most well-known of these purposes, and also the most innocuous: such AI programs are referred to as recommender systems [28]. These systems aim to target the right advertisement at the right individual, increasing the likelihood of converting views into sales. AI-sponsored advertising selects online ads, is embedded in smart TVs, and is also used by retailers to recommend products based on prior purchases. However, other categories of AI-based profiling carry more serious implications. One example is credit rating algorithms, which determine who is eligible for a loan, whether a small loan for e-commerce [29] or a larger one for purchasing a home [30]. Furthermore, despite regulations such as those in the European Union, the explainability of these models remains challenging and costly [31], and they have often proven to perpetuate the same racial or socioeconomic biases as other AI methods.
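One simple check often borrowed as a fairness heuristic for such credit-scoring systems is the “four-fifths rule”, originally a US employment-law guideline: if the approval rate of a protected group falls below 80% of the most-favored group's rate, the system is flagged for possible disparate impact. A minimal sketch, using purely hypothetical approval counts:

```python
def disparate_impact(approved_a: int, total_a: int,
                     approved_b: int, total_b: int) -> float:
    """Ratio of the approval rate of group A to that of group B."""
    return (approved_a / total_a) / (approved_b / total_b)

# Hypothetical audit figures for two applicant groups.
ratio = disparate_impact(approved_a=240, total_a=1000,   # 24% approved
                         approved_b=400, total_b=1000)   # 40% approved
print(f"disparate impact ratio: {ratio:.2f}")            # 0.60 < 0.80 -> flagged
```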
With 95% precision and recall (within the norm for current state-of-the-art methods), London, for instance, could count between 4,800 and 9,600 people wrongly convicted by such AI systems.
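This is the classic base-rate effect: with 95% precision, one in twenty people flagged by the system is innocent, and in a large population that fraction adds up quickly. The back-of-the-envelope sketch below reproduces the range above, assuming a population of roughly 9.6 million and a hypothetical offender prevalence of 1% to 2% (values chosen purely for illustration):

```python
population = 9_600_000          # approximate population of Greater London
precision = recall = 0.95      # assumed, per the text

for prevalence in (0.01, 0.02):          # hypothetical offender base rates
    offenders = prevalence * population
    true_positives = recall * offenders
    # precision = TP / (TP + FP)  =>  FP = TP * (1 - precision) / precision
    false_positives = true_positives * (1 - precision) / precision
    print(f"prevalence {prevalence:.0%}: {false_positives:,.0f} wrongly flagged")
# prevalence 1%: 4,800 wrongly flagged
# prevalence 2%: 9,600 wrongly flagged
```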
However, numerous philosophers and scientists have compellingly argued that Science and the knowledge it produces are shaped by human normative values [50, 51]. Thus, the notion of “value-free” Science may well be a myth. Indeed, scientific research is always conducted within a broader context, and its value depends on the specific applications it serves and its direct (or indirect) impacts on human lives [52]. In essence, any scientific research that serves a purpose can never truly be “value-free”.
In the field of Machine Learning and Artificial Intelligence, a parallel concept to “value-free” science has emerged in the form of so-called “theory-free” models [49, 53]. Proponents of theory-free models argue that because these models do not rely on specific mechanisms from their application fields and are “data-driven”, they are free from human biases, preconceived judgments, and ontological categories. We contend that the argument of “theory-free” AI models is a fallacy and scientific quackery, and that it far too often serves as a smoke screen to legitimize bigotry through a “data-driven” pseudo-truth.
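The core statistical objection is easy to demonstrate. In the minimal sketch below (synthetic data, hypothetical variable names), a hidden confounder Z drives both X and Y; a purely data-driven analysis finds a strong correlation between X and Y even though neither causes the other, and the association vanishes as soon as the confounder is accounted for:

```python
import numpy as np

rng = np.random.default_rng(0)

# A hidden confounder Z drives both X and Y; X has no causal effect on Y.
n = 100_000
z = rng.normal(size=n)                      # unobserved common cause
x = z + rng.normal(scale=0.5, size=n)
y = z + rng.normal(scale=0.5, size=n)

# A "theory-free" analysis finds a strong association...
print(f"corr(X, Y)     = {np.corrcoef(x, y)[0, 1]:.2f}")   # ~0.80

# ...which disappears once the confounder is controlled for:
print(f"corr(X, Y | Z) = {np.corrcoef(x - z, y - z)[0, 1]:.2f}")  # ~0.00
```

Without domain theory to suggest that Z even exists, nothing in the data warns the modeler that the X-Y relationship is spurious.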
In this paper, we have explored the troubling resurgence of pseudoscientific methodologies within the realm of Artificial Intelligence. In particular, we have discussed how Deep Learning technology has made it easier to hide the pseudoscientific nature of some applied tasks, owing to the inherent complexity and black-box nature of its models, but also to their seemingly high accuracy. Our analysis further highlights a critical issue: despite their advanced capabilities, these AI systems have often neglected fundamental lessons from statistics, in particular the principle that correlation does not imply causation. We have shown how the high performance of these models, together with reliance on the “theory-free” ideology, has made it possible to inadvertently replicate and even exacerbate the errors of past pseudosciences. This includes approaches reminiscent of Lombrosianism and physiognomy, which once justified discriminatory practices through dubious correlations.
Our findings suggest that merely addressing biases in training data is insufficient for mitigating the risks posed by these technologies. Instead, a more comprehensive strategy is required. Such a strategy should start with better training of future ML experts, including ethics courses, so that they can judge in advance whether an application is ethically and morally acceptable. Further key elements of the strategy should involve a fundamental rethinking of Deep Learning models and “theory-free” models at large, the systematic use of quality metrics suited to assessing the harm potential of an AI model, and stringent human oversight by field experts.