Agents and Explainable Reinforcement Learning
Agent Systems within modern AI are generally based on Reinforcement Learning (RL). RL agents learn by interacting with an environment to maximize cumulative rewards, following a closed-loop cycle of observation, decision-making, action, and learning. Each cycle for an RL agent typically consists of the following steps:
Observation
The agent receives information from the environment (e.g., the state of a process, environment, or sensor data). This information is used to understand the current situation.
Decision Process
The agent uses a Policy to decide on an action. A Policy is a strategy that maps observations (states) to actions.
Actions can be chosen using either a deterministic (same action every time for a given state) or stochastic (probability-based) process.
The symbolic system of Hybrid Intelligence excels in the deterministic part of the policy, while the neural system excels in the stochastic (uncertain or unstructured) part.
Action
The agent takes an action that influences the environment.
This could be filling in a form, executing an application, adjusting a parameter, making a recommendation, and so on.
Feedback & Rewards
The environment returns feedback, typically in the form of a reward or penalty.
The agent also receives a new observation representing the environment's updated state.
Learning Mechanism
The agent updates its policy to improve decision-making using a Learning Algorithm, which in turn can be:
Model-Free (e.g., Q-Learning, Policy Gradient): learns directly from experience, without an explicit model of the environment.
Model-Based: uses a predictive model of the environment for planning.
Loop & Iteration
This cycle repeats continuously, allowing the agent to learn from experience and improve over time.
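To make this loop concrete, the following sketch shows a minimal tabular Q-learning agent. It assumes a hypothetical environment object with reset() and step(action) methods in the style of common RL toolkits; the environment interface, state and action names, and hyperparameters are illustrative only.

```python
import random
from collections import defaultdict

# Minimal tabular Q-learning sketch (illustrative only).
# Assumes a hypothetical `env` with reset() -> state and
# step(action) -> (next_state, reward, done), similar to common RL toolkits.
def train(env, actions, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    q = defaultdict(float)  # Q-values keyed by (state, action)

    for _ in range(episodes):
        state = env.reset()                     # Observation
        done = False
        while not done:
            # Decision process: epsilon-greedy policy maps state -> action
            if random.random() < epsilon:
                action = random.choice(actions)
            else:
                action = max(actions, key=lambda a: q[(state, a)])

            # Action + Feedback & Rewards: act, then observe reward and new state
            next_state, reward, done = env.step(action)

            # Learning mechanism: model-free Q-learning update
            best_next = max(q[(next_state, a)] for a in actions)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])

            state = next_state                  # Loop & iteration
    return q
```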
In traditional RL, the reward function relies on neural network components without symbolic reasoning. This has two characteristics relevant to the discussion in this paper:
Neural networks used within RL, especially for reward functions, are opaque, making it difficult to interpret why a specific reward was assigned to an action.
Optimisation in RL is focused exclusively on maximising cumulative reward, which can result in unintended behaviours or misaligned optimisation outcomes.
Standard RL Agent typical flow.
Neuro-Symbolic Reinforcement Learning
As in Explainable Neural Networks (XNNs), decision-making in Explainable Reinforcement Learning (XRL) agents combines:
The neural network components used in standard RL, replaced by an XNN.
Symbolic Hierarchical Decision Structures, embodying concepts, rules and relationships into a layered framework.
Symbolic Embedding Framework, ensuring logical consistency and modular knowledge processing.
The Symbolic Hierarchical Decision Structure is built on a foundational Symbolic Knowledge Graph that organises concepts, rules, and relationships into a layered framework. This enforces structured reasoning, prevents contradictions, and ensures that decisions follow predefined logic rather than relying solely on statistical patterns.
The Symbolic Embedding Framework decomposes learned knowledge into distinct components, allowing causal relationships, symbolic rules, and statistical associations to be processed separately while maintaining coherence. This enhances adaptability, precision, and interpretability.
Multi-Modal Vector Spaces integrate symbolic logic with unstructured data formats such as text, images, and numerical inputs, bridging structured reasoning with real-world variability. This allows Agent Services built on Hybrid Intelligence to process diverse data sources while maintaining alignment with a structured knowledge base.
In tandem, these mechanisms ensure structured, explainable decisions that align with human reasoning, balancing logical consistency with adaptable knowledge representation.
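As a rough illustration of how these neural and symbolic mechanisms can interact, the sketch below has a neural scorer propose and rank candidate actions while a set of symbolic rules filters out candidates that violate predefined constraints. The interfaces, rule format, and example rule are hypothetical assumptions rather than an actual Hybrid Intelligence implementation.

```python
from typing import Callable, Dict, List

# Hypothetical sketch: the neural component proposes, the symbolic component constrains.
Rule = Callable[[Dict, str], bool]   # rule(state, action) -> True if the action is permitted

def neuro_symbolic_decision(state: Dict,
                            candidate_actions: List[str],
                            neural_score: Callable[[Dict, str], float],
                            rules: List[Rule]) -> str:
    # Neural part: rank candidates by learned (statistical) preference.
    ranked = sorted(candidate_actions, key=lambda a: neural_score(state, a), reverse=True)

    # Symbolic part: enforce predefined logic; reject actions that break any rule.
    for action in ranked:
        if all(rule(state, action) for rule in rules):
            return action

    raise ValueError("No candidate action satisfies the symbolic constraints")

# Example rule (hypothetical): never recommend a treatment contraindicated for the patient.
def no_contraindication(state: Dict, action: str) -> bool:
    return action not in state.get("contraindicated", [])
```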
XRL significantly expands the concept of reinforcement learning by introducing a number of innovations that enable the dual objective of maximising reward and maximising explanation quality.
Key Innovations in XRL
XRL agents act based both on their observations of the data and on an evaluation of why they are making certain decisions, a process that assesses the reasoning behind their actions. Unlike standard RL agents, XRL agents maximise cumulative reward while maintaining transparency, trust, and accountability. These innovations make a significant difference in high-impact applications:
Trust & Adoption: In critical domains, stakeholders demand explainable decisions.
Ethics & Bias Reduction: Introspection helps avoid biased or harmful decisions.
Regulatory Compliance: As AI regulations tighten, explainability becomes essential.
Human-AI Collaboration: Explanations enable better collaboration and understanding.
XRL Agent typical flow.
How XRL agents differ from standard RL agents
Introspection Mechanism
The agent evaluates its own decisions before acting and checks for consistency, validity, and alignment with goals or ethical considerations. This helps the agent avoid biased or illogical decisions.
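A minimal way to picture the introspection step is a pre-action check that generates an explanation for a proposed action and only allows the action through if the explanation passes consistency, validity, and alignment tests. The agent interface and check methods below are placeholders, not a specified API.

```python
# Illustrative introspection loop (agent methods and checks are placeholder assumptions).
def introspect_and_act(agent, state, max_attempts=3):
    rejected = []
    for _ in range(max_attempts):
        action = agent.propose_action(state, excluding=rejected)
        explanation = agent.explain(state, action)

        # Pre-action checks: consistency, validity, and goal/ethics alignment.
        if (agent.is_consistent(explanation)
                and agent.is_valid(state, action)
                and agent.aligned_with_goals(explanation)):
            return action, explanation

        rejected.append(action)   # try a different action and re-explain

    return agent.safe_fallback(state)  # e.g. defer to a human
```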
Action and Explanation Pairing
Unlike standard RL agents, which only output actions, XRL agents output both actions and explanations, which builds trust and transparency by justifying decisions.
Example: An AI recommending a medical treatment would also explain "why this treatment is best".
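One simple way to represent this paired output is a small record type that carries the action together with its justification and the symbolic rules it cites, as in the hypothetical sketch below.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ExplainedAction:
    action: str                     # what the agent does
    rationale: str                  # human-readable "why"
    supporting_rules: List[str] = field(default_factory=list)  # symbolic rules cited

# Hypothetical medical example mirroring the one above.
recommendation = ExplainedAction(
    action="prescribe_treatment_A",
    rationale="Treatment A has the best expected outcome for this patient profile "
              "and no contraindications were found.",
    supporting_rules=["R12: prefer the lowest-risk treatment with equivalent efficacy"],
)
```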
Explanation-Based World Model
The agent maintains a world model that incorporates cause-and-effect relationships, which allows it to reason about consequences and provide causal explanations.
It supports symbolic reasoning, enhancing interpretability.
The world model is used to perform plausibility checks and eliminate algorithmic hallucinations.
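One minimal way to picture the plausibility check is to simulate a proposed action against the causal world model and reject it if any predicted consequence violates a known constraint. The world_model.predict interface and the constraint functions below are hypothetical.

```python
# Illustrative plausibility check against a causal world model (interfaces are assumptions).
def is_plausible(world_model, state, action, constraints):
    predicted = world_model.predict(state, action)   # hypothetical causal simulation
    # Reject the action if any predicted consequence violates a known constraint,
    # which is one way to screen out hallucinated or impossible outcomes.
    return all(constraint(predicted) for constraint in constraints)
```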
Causal and Symbolic Reasoning
XRL agents use a symbolic hierarchy or knowledge graph to reason explicitly.
They can infer causal relationships, supporting more meaningful explanations.
Example: "I chose this action because it leads to a positive outcome according to this rule."
Feedback with Explanations
XRL agents learn both from rewards and from feedback on explanations.
If an explanation is unclear or inconsistent, the agent can improve its introspection mechanism.
The explanation models inherent in Hybrid Intelligence allow XRL Agents to introspect by reasoning and “thinking through” the explanation behind a suggested plan of action, before it executes the plan. This means that the learning mechanism is informed by knowledge that is richer in both depth and breadth, enabling it to reason, self-evaluate, and refine its logic.
| Aspect | Standard RL Agents | XRL Agents |
| --- | --- | --- |
| Output | Action | Action and Explanation |
| Objective Function | Maximize reward only | Maximize reward AND explanation quality |
| State Space | Raw features only | Extended with Introspection |
| Decision Making | State → Action | State → Action + Why? |
| Exploration Strategy | Greedy | Explanation-aware exploration |
| Explanation Depth | None | Causal depth, testability, symbolic coherence |
| Symbolic Alignment | None | Aligned with symbolic hierarchies |
| Complexity Control | None | Regularized for minimal variation (hard-to-vary explanations) |
| World Model | Optional, Predictive | Explanation-based, causal reasoning with plausibility checks |
| Transparency | Opaque ("Black Box") | Transparent ("Glass Box") |
| Trust and Safety | Limited by Unexplained Decisions | High, due to Introspection and Justification |
Key Differences between RL and XRL Agents
Explanation Quality
XRL agents optimise a dual-objective function that balances task performance and the quality of generated explanations. The explanation component is weighted and assessed using metrics for causal depth, testability, and symbolic coherence. Additionally, a hard-to-vary constraint prevents arbitrary justifications by penalising overly flexible explanations, ensuring consistency and reliability in decision-making.
Causal depth measures how effectively an explanation uncovers the underlying causes of an observed phenomenon, focusing on why something happens by tracing back to root causes and systemic influences. It enables layered insights by exploring immediate triggers, contributing factors, and interconnected mechanisms, and it supports actionability by revealing intervention points in the causal chain, making it easier to address issues or optimise processes intentionally.
Testability measures how well an explanation or model can be evaluated, verified, and challenged based on empirical evidence or logical consistency. A testable explanation allows for structured validation, enabling models to be systematically assessed for accuracy and reliability. This ensures that decisions are based on evidence rather than assumptions, reducing uncertainty and increasing confidence in outcomes. By supporting iterative refinement and falsifiability, testability enables continuous improvement, ensuring that models remain robust and aligned with real-world conditions.
Difficult-to-vary refers to how resistant an explanation is to arbitrary modifications while remaining valid. An explanation with high difficulty-to-vary is one where each component is necessary and interdependent, meaning that altering any part would weaken its coherence or accuracy. This principle ensures that explanations are not constructed post hoc or adjusted to fit specific outcomes but instead reflect fundamental, underlying truths. By enforcing logical consistency and robustness, difficult-to-vary explanations provide higher reliability, trust, rigour and defensibility.
Non-ad-hoc reasoning ensures that explanations and decisions are derived from structured, principled methods rather than arbitrary adjustments or post hoc justifications. A non-ad-hoc explanation remains consistent across different contexts and does not rely on after-the-fact modifications to fit a specific outcome. This principle guarantees that reasoning is stable, predictable, logically sound, rigorous, and internally consistent, preventing bias or the retrofitting of conclusions to align with desired results.
Symbolic coherence ensures that explanations and decision-making processes align with established symbolic knowledge, including predefined rules, logical structures, and expert-validated principles. Strong symbolic coherence maintains internal consistency, ensuring that outputs are interpretable and meaningfully connected to structured knowledge representations. This reinforces alignment with domain-specific standards, reduces contradictions, and prevents erratic or inconsistent reasoning.
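Putting these ideas together, one hedged way to express the dual objective is as a weighted sum of task reward and an explanation-quality score built from the metrics above, minus a penalty for explanations that are too easy to vary. The weights, metric functions, and scoring scales below are illustrative assumptions, not a prescribed formula.

```python
# Illustrative dual-objective scoring (weights and metric functions are assumptions).
def explanation_quality(expl, metrics, weights):
    # metrics: dict of name -> function(expl) returning a score in [0, 1],
    # e.g. causal_depth, testability, symbolic_coherence, non_ad_hoc.
    return sum(weights[name] * metric(expl) for name, metric in metrics.items())

def dual_objective(reward, expl, metrics, weights,
                   lambda_expl=0.5, mu_variability=0.3, variability=None):
    quality = explanation_quality(expl, metrics, weights)
    # Hard-to-vary constraint: penalise explanations that could be freely altered
    # without losing validity (variability returns a score in [0, 1], higher = worse).
    penalty = mu_variability * (variability(expl) if variability else 0.0)
    return reward + lambda_expl * quality - penalty
```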
The Benefits of Hybrid Intelligence XRL Agents
XRL agents provide multiple commercially useful benefits over standard RL agents.
Transparency and Interpretability
Clear Decision Justifications provide transparent, human-readable explanations for the agent's actions, ensuring interpretability and accountability.
Enhanced Trust ensures users can trace decisions back to their causal origins, verifying alignment with established reasoning principles, which builds trust and confidence in AI systems.
Causal Self-Assessment and Policy Justification
Self-Analysis of Decisions empowers agents to continuously evaluate the reasoning behind their actions, examining both observed data and learned policies to ensure logical consistency.
Feature Influence Explanations identify and explain which features or symbolic relationships most impacted a decision, enhancing traceability and clarity in reasoning.
Robust and Adaptive Decision-Making
Continuous Improvement through ongoing introspection enables XRL agents to refine their decision policies, leading to more accurate and reliable choices over time.
Alignment with Symbolic Models allows agents to compare learned policies with predefined symbolic rules, detect inconsistencies, and adjust their strategies to maintain structured reasoning.
Debugging, Fault Detection, and Bias Mitigation
Comprehensive Debugging becomes possible because XRL agents maintain a detailed record of decisions and explanations, allowing them to identify inconsistencies such as overfitting, symbolic violations, or unstable explanations.
Bias Detection and Mitigation is achieved through introspection by assessing whether sensitive features disproportionately influence decisions without causal justification, upholding fairness and ethical standards.
Refined Exploration and Learning Strategies
Guided Exploration allows XRL agents to adapt exploration strategies based on the quality of explanations, optimizing reward maximization while improving decision clarity and precision.
Selective De-Automation provides an effective backstop: when explanation quality is low, agents can request human input, reinforcing decision-making with expert knowledge.
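Selective de-automation can be pictured as a simple quality gate: if the explanation score for a proposed action falls below a threshold, the agent defers to a human reviewer. The threshold value and handler functions below are illustrative assumptions.

```python
# Illustrative selective de-automation gate (threshold and handlers are assumptions).
def execute_or_escalate(action, explanation, quality_score,
                        execute, request_human_review,
                        quality_threshold=0.7):
    if quality_score >= quality_threshold:
        return execute(action)                      # confident, well-explained decision
    # Low explanation quality: hand the case to a human with full context.
    return request_human_review(action, explanation, quality_score)
```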
Enhanced User Collaboration and Decision Support
Contextual Feedback Loops combined with introspective feedback allow users to engage with the agent’s reasoning process, enabling collaborative decision-making and enhanced understanding.
Dynamic Learning Adaptation allows agents to learn from human corrections or feedback, continuously improving their reasoning framework and symbolic alignment.