Trust, Fairness, Explainability, and the Black Box
A Discussion of the Grounds for Responsible AI Systems
As algorithms take on more decisions in our daily lives — from what we see online to whether we qualify for a loan — the question arises: what does it really mean to trust artificial intelligence? In his 2024 article, “Fairness and Explainability for Enabling Trust in AI Systems,” Dimitris Sacharidis offers a thorough examination of the ethical, technical, and regulatory measures aimed at keeping artificial intelligence accountable to human values.
His analysis is not concerned with AI technologies in general, but with AI systems, which Sacharidis defines as “a computer-based technology or software that is designed to perform tasks that typically require human intelligence.” (2024: 86) He adds that his inquiry is mostly concerned with systems based on machine learning, and particularly on deep learning. He writes:
The distinguishing aspect of machine learning systems over rule-based systems is that they are not explicitly programmed how to act intelligently, but are rather implicitly programmed and learn from data directly. An ML model consists of a large set of parameters, whose values are learned during the training phase to best achieve a task-specific objective defined over a collection of training data. After testing and validation, the fitted ML model is then deployed as an AI system to make inferences about new, unseen data. While the training, testing, validation, and deployment phases are explicitly specified and well understood, the model itself, or more precisely its internal parameters, are not always interpretable by humans. (Sacharidis 2024: 86)
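To make the quoted pipeline concrete, here is a minimal sketch, not Sacharidis’ own example, in which a toy model’s parameters are learned from invented training data and then used to make inferences on unseen inputs. The data, the learning rate, and the logistic-regression model are all assumptions chosen for illustration; even here the learned parameters are just numbers, and a deep network would have millions of them.

```python
import numpy as np

# Synthetic, invented training data: 2 features, binary label.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 2))
y_train = (X_train[:, 0] + 0.5 * X_train[:, 1] > 0).astype(float)

# "Training phase": learn the model's internal parameters (weights, bias)
# by gradient descent on a logistic-regression objective.
w, b = np.zeros(2), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X_train @ w + b)))   # predicted probabilities
    grad_w = X_train.T @ (p - y_train) / len(y_train)
    grad_b = np.mean(p - y_train)
    w -= 0.5 * grad_w
    b -= 0.5 * grad_b

# "Deployment phase": the fitted model makes inferences about new, unseen data.
X_new = rng.normal(size=(5, 2))
predictions = (1.0 / (1.0 + np.exp(-(X_new @ w + b))) > 0.5).astype(int)

# The pipeline is explicit and well understood, but the fitted parameters are
# bare numbers: nothing in `w` or `b` says, in human terms, why a given input
# was classified one way rather than the other.
print("learned parameters:", w, b)
print("predictions on unseen data:", predictions)
```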
When it comes to such technologies, people’s feelings tend to fluctuate between blind enthusiasm and irrational fear. In response, philosophers, especially those specializing in the philosophy of trust, have begun discussing what would make AI trustworthy or deserving of trust. At its core, this is a philosophical problem that challenges what we mean by trust itself. But it also has practical significance, shaping how people accept and respond to technological shifts like AI.
Sacharidis’ response to this question is original in that it calls for a dual approach grounded in both fairness and explainability. While the former is largely based on Rawls’s theory of justice and is well understood, the latter is more open to discussion.
Sacharidis’ key argument is that trustworthiness depends on explainability. He calls for designing AI systems in which explainability is not an afterthought but a guiding principle, making it more likely that we will trust them.
In this article, I will first discuss his position on fairness and then his treatment of explainability. Building on related discussions of trust and AI, I will then argue that Sacharidis places too much weight on explainability.
What would a “fair” AI system be?
Sacharidis’ dual approach starts with a discussion of the relationship between fairness and AI, grounded in regulatory guidelines such as the European “Ethics Guidelines for Trustworthy AI,” and describes this relationship as one of the fundamental pillars of trust in AI systems. He explains that the fairness of AI systems refers to “algorithmic fairness” and adds that:
The general objective is to ensure that AI systems, or algorithms, do not produce outcomes that discriminate against individuals. In algorithmic fairness, this general objective is formalized so that it can be quantified. This leads to measures of unfairness that can be calculated over the outputs of an AI system, ML model, or generally an algorithm. A specific measure can then be used to design fairness-aware systems or fairness mitigating procedures. (Sacharidis 2024: 97)
Drawing from legal, political, and technical viewpoints, he argues that AI systems should be designed to promote both “individual fairness” and “group fairness.”
Individual fairness asserts that similar individuals should be treated similarly. In contrast, group fairness aims to achieve statistical parity between protected groups (e.g., race, gender, or age). Both are important because, while individual fairness appeals to the intuition of equal treatment, group fairness targets structural inequities.
However, these two approaches are not always compatible, and choosing between them inevitably involves a moral judgment. In a hiring context, for instance, individual fairness might mean that two candidates with similar qualifications should have the same probability of being hired, regardless of their demographic group, whereas group fairness might require that the proportions of hires from different demographic groups be roughly equal overall. Strictly following individual fairness could perpetuate an existing imbalance if one group has historically had fewer opportunities to gain qualifications; strictly following group fairness could require hiring a less-qualified candidate in order to correct systemic disparities.
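To see how these two notions are quantified and how they can pull apart, consider the following sketch; the hiring scores, group labels, and decisions are invented for illustration and are not drawn from Sacharidis.

```python
import numpy as np

# Hypothetical hiring data: each row is a candidate.
# `score` is a qualification score, `group` a protected attribute (0/1),
# `hired` the model's binary decision. All values are invented.
score = np.array([0.9, 0.8, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3])
group = np.array([0,   0,   1,   0,   1,   1,   0,   1])
hired = np.array([1,   1,   0,   1,   0,   0,   0,   0])

# Group fairness (statistical parity): compare hiring rates per group.
rate_0 = hired[group == 0].mean()
rate_1 = hired[group == 1].mean()
print(f"hiring rate group 0: {rate_0:.2f}, group 1: {rate_1:.2f}, "
      f"parity gap: {abs(rate_0 - rate_1):.2f}")

# Individual fairness: similar individuals should receive similar outcomes.
# Here "similar" means qualification scores within 0.1 of each other.
violations = [
    (i, j)
    for i in range(len(score))
    for j in range(i + 1, len(score))
    if abs(score[i] - score[j]) <= 0.1 and hired[i] != hired[j]
]
print("individual-fairness violations (index pairs):", violations)
```

On this toy data the hiring rates differ sharply between groups while equally scored candidates also receive different outcomes, so repairing one measure (for example by hiring more from group 1) can worsen the other, which is exactly the tension described above.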
Fairness is also discussed in the context of ranking systems, such as job or loan recommendation algorithms. In these systems, fairness can apply to either the individuals being ranked (“item-side”) or the users receiving the rankings (“user-side”). These nuances show that fairness is not something that can be captured by one formula. It depends on the social setting, the system’s purpose, and who is affected.
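As a rough illustration of item-side fairness, the sketch below measures how much position-discounted exposure a protected group receives in a hypothetical ranked list; the group labels and the logarithmic discount are assumptions chosen for illustration, not a metric taken from the chapter.

```python
import numpy as np

# Hypothetical ranked list returned by a recommender, top position first;
# `group` marks membership in a protected group (invented labels).
group = np.array([0, 0, 0, 1, 0, 1, 1, 1])

# Item-side fairness: how much "exposure" does each group receive?
# A common choice is a logarithmic position discount, as in DCG-style metrics.
positions = np.arange(1, len(group) + 1)
exposure = 1.0 / np.log2(positions + 1)

share_protected = exposure[group == 1].sum() / exposure.sum()
share_in_pool = (group == 1).mean()

print(f"protected group's share of exposure:   {share_protected:.2f}")
print(f"protected group's share of candidates: {share_in_pool:.2f}")
# A large gap between the two suggests the ranking concentrates visibility
# on one group even though both groups are equally represented in the pool.
```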
Much of this discussion is rooted in a Rawlsian sensibility. In A Theory of Justice, Rawls argues that fairness requires designing principles of justice from an original position, where individuals are ignorant of their own social status, identity, and natural endowments. This idea resonates with attempts to define fairness in AI systems in a group-blind, risk-averse way.
Remarkably, Sacharidis’ overview reveals an implicit limitation of Rawls’s model. Achieving such fairness often requires considering empirical distributions, causal structures, and actionable forms of recourse. Designing just AI systems necessitates navigating the tension between ideal theory and real-world asymmetries.
Fairness is thus a core element of trust in AI systems. However, what counts as fair is shaped by context, society, and culture: what is fair in one setting may be considered unfair in another. This is likely difficult for AI systems to capture, at least at their current stage of development.
Why is explainability important here?
While fairness addresses whether people are treated justly by AI systems, explainability addresses why decisions are made. According to Sacharidis, trust in AI presupposes a certain degree of intelligibility. In other words, we tend to trust what we can understand and withhold trust from what remains opaque.
In traditional rule-based systems, the logic behind decisions could often be traced and audited. However, contemporary machine learning models, particularly deep learning systems, generate decisions through layers of abstraction that are inaccessible to human reason. As a result, even high-performing models can be difficult to understand — leaving users and regulators unsure about how and why decisions are made.
Sacharidis distinguishes two broad categories of explainability: intrinsic explainability and post-hoc explainability. He explains the difference as follows:
Intrinsic explainability (or interpretability) “concerns ML models that are transparent and simple enough that we can understand how they work. Therefore, providing explanations is relatively straightforward and it often suffices to describe how the model works and what its feature values are.” (Sacharidis 2024: 90)
Post-hoc explainability “refers to the idea that the model is not modified or restricted in any way and the goal is to explain its outputs after it is trained.” Researchers “often call this black-box explainability to emphasize that it is suitable for opaque complex models that to us seem like black boxes.” (Sacharidis 2024: 90)
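The post-hoc idea can be illustrated with a model-agnostic technique such as permutation importance, sketched below. The “black box” here is just an invented scoring function standing in for an opaque trained model, and the method is one common example rather than a procedure Sacharidis prescribes.

```python
import numpy as np

# A stand-in "black box": in practice this would be a trained deep model whose
# internals we cannot read; here it is simply an opaque scoring function.
def black_box_predict(X):
    return (2.0 * X[:, 0] - 0.5 * X[:, 2] + 0.1 * np.sin(X[:, 1]) > 0).astype(int)

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 3))
y = black_box_predict(X)          # treat the model's own outputs as ground truth

# Post-hoc, model-agnostic explanation via permutation importance:
# shuffle one feature at a time and measure how much accuracy drops.
# The model itself is never opened or modified.
baseline = (black_box_predict(X) == y).mean()
for j in range(X.shape[1]):
    X_perm = X.copy()
    X_perm[:, j] = rng.permutation(X_perm[:, j])
    drop = baseline - (black_box_predict(X_perm) == y).mean()
    print(f"feature {j}: accuracy drop when shuffled = {drop:.3f}")
```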
Moreover, explainability serves different purposes for different users. Sacharidis notes that explanations may aim to enhance transparency, enable error correction, foster trust, persuade users, improve decision-making processes, and increase satisfaction. These goals are not always compatible.
For example, a persuasive explanation may not be truthful — an AI might blame a system error on “a temporary glitch,” which sounds reassuring but hides the real cause. Conversely, a technically accurate explanation involving neural network weights or data drift may be too complex for most users to understand, making it less useful.
Furthermore, explanations may be local, focused on individual outputs, or global, summarizing model behavior across the board. Explanations come in many forms: a highlighted image, a short text, a confidence score — each suited to a different audience or purpose.
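To illustrate the local side of this distinction, the sketch below computes an occlusion-style attribution for a single input: each feature is replaced by an assumed neutral baseline value and the change in the (again invented) black-box score is recorded. A global explanation would instead summarize such effects across an entire dataset, as in the permutation-importance sketch above.

```python
import numpy as np

# The same kind of opaque scoring function as before, standing in for a trained model.
def black_box_score(x):
    return 2.0 * x[0] - 0.5 * x[2] + 0.1 * np.sin(x[1])

# A *local* explanation targets one specific decision: here, a single input is
# explained by masking each feature with an assumed neutral baseline value and
# reporting how much the model's score changes.
x = np.array([1.2, -0.4, 0.8])          # the individual case being explained
baseline_value = 0.0                     # assumed "neutral" reference value
full_score = black_box_score(x)

for j in range(len(x)):
    x_masked = x.copy()
    x_masked[j] = baseline_value
    contribution = full_score - black_box_score(x_masked)
    print(f"feature {j}: local contribution = {contribution:+.3f}")
```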
Evaluating explanations is also multifaceted. Sacharidis reviews various quantitative and qualitative methods, but no single metric settles the matter: what counts as a good explanation is inextricably linked to human judgment, institutional context, and the purpose of the interaction.
Explainability, like fairness, is not just a technical feature. It is shaped by how people use the system, what they expect, and what is at stake. In this sense, Sacharidis’ approach aligns with a broader shift in AI ethics, moving away from abstract metrics and toward human-centered design.
Sacharidis argues that explainability is essential for establishing trust in AI systems because people are more likely to trust decisions they understand. In many cases, understanding how a model arrives at its output is crucial, for instance for accountability and error correction. Explanations help users detect biases, contest outcomes, and understand the task itself. Without explainability, particularly in complex models such as deep neural networks, AI decisions remain opaque, which puts their legitimacy at risk.
Is explainability really a factor here?
In a recent paper published in Philosophy & Technology, Sam Baron (2025) challenges the idea that explainability is necessary for trusting AI systems. He argues that this view imports assumptions about interpersonal trust into human-machine relations, where they do not belong.
Strong conceptions of trust, those that involve understanding the intentions, reasoning, or motivations of the trusted agent, are particularly difficult to apply to AI systems, which lack minds, intentions, and moral agency. If explainability is justified only by appeal to such strong conceptions of trust, then the case for its necessity in AI systems becomes much weaker.
Instead, Baron favors a moderate conception of trust that aligns more naturally with AI contexts. According to this view, trust can be based on evidence of reliability, consistency, and competence rather than insight into internal reasoning. We routinely trust systems — such as airplanes and medical instruments — without knowing exactly how they work, as long as they perform reliably. Similarly, an AI system that demonstrates accuracy, robustness, and safety under different conditions may be considered trustworthy in practice, even if its decision-making process remains unclear.
This perspective questions the growing emphasis on post-hoc explainability methods as a necessary condition for trustworthy AI. While explainability can be valuable in certain situations, such as supporting contestability or facilitating debugging, it is not essential to trust. In fact, placing too much weight on explainability can be misleading, particularly when it leads to overly simplistic, selective, or inaccurate interpretations of complex models.
Baron also emphasizes the importance of considering context when evaluating explainability. In high-stakes domains such as healthcare, explanations may serve legal, ethical, or procedural purposes that extend beyond mere trust. In such cases, transparency can justify decisions to affected parties and enable recourse in the event of harm.
However, these are fundamentally institutional demands, not epistemic ones. The important thing is not whether an individual user understands the model, but whether the system is governed in ways that ensure accountability and fairness.
Moreover, regarding the necessity of explainability for trusting AI systems, Hähnel and Hauswald (2025) note that:
While some authors consider AI systems trustworthy only to the extent that their internal processes can be made transparent and explainable, others point out that, after all, we do trust humans without being able to understand their cognitive processes.
Conclusion
In light of these discussions, it becomes clear that, while explainability is often presented as a cornerstone of trustworthy AI, it may not occupy the foundational role that some assume.
Sacharidis convincingly demonstrates the role of explainability in promoting contestability and user confidence. However, critics are right to point out that trusting AI may not require an understanding of how a system works. Rather, it requires confidence that the system works reliably, safely, and within appropriate institutional safeguards.
This does not mean explainability does not matter — it just changes how and when it matters. It may be vital in domains where decisions must be challenged or justified, but less so where reliability and performance suffice.
Finally, both fairness and explainability must be understood as normative tools whose relevance depends on the goals, users, and social settings in which AI systems operate, not as universal technical criteria. Building responsible AI is not about following one rule. It means balancing different values — justice, transparency, accountability — with the practical need for systems that actually work.
References
Baron, S. (2025). Trust, explainability and AI. Philosophy & Technology, 38(4).
Hähnel, M., & Hauswald, R. (2025). Trust and Opacity in Artificial Intelligence: Mapping the Discourse. Philosophy & Technology, 38, 115.
Sacharidis, D. (2024). Fairness and Explainability for Enabling Trust in AI Systems. In B. Ferwerda et al. (Eds.), A Human-Centered Perspective of Intelligent Personalized Environments and Systems (pp. 85–110). Cham: Springer.
High-Level Expert Group on AI. (2019). Ethics Guidelines for Trustworthy Artificial Intelligence. European Commission.