Hybrid Agentic Computing
What is the difference between standard RL Agents and Hybrid XRL Agents?
Hybrid Intelligence Agents use Explainable Reinforcement Learning (XRL). XRL agents use an explainable version of the Bellman equation, with the addition of an explainable state space and explainable actions.
Explainable actions can consume and/or generate explanations for the current state.
The explanations generated for all states can be stored in the explanation state space, which is used by the XRL agent for introspection.
XRL introduces a new type of operator, the explainable operator, which defines the expected reward for an agent that, in a particular state and with a particular explanation, performs an action while providing an explanation for that action.
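As an illustration only (the notation below is an assumption made for this article, not taken from the cited patent), the explainable operator can be pictured as a Bellman-style expectation over a joint state–explanation space, where s is the environment state, x the explanation state, a the action taken, and e the explanation provided with it:

Q(s, x, a, e) = \mathbb{E}\left[ R(s, x, a, e) + \gamma \max_{a', e'} Q(s', x', a', e') \right]

The transition yields both a new environment state s' and a new explanation state x', so the operator scores how good an action is together with the explanation that accompanies it.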
XRL agents differ from standard RL agents in that an XRL action may act purely on the explanation space without affecting the agent's environment.
What we call machine introspection is a chain of purely explanatory actions, made possible in the Hybrid Intelligence Framework through XRL.
XRL agents with introspection offer a transformative approach to Agents by prioritizing:
Performance and Causal Rigor: Maximizing both reward and explanation quality.
Symbolic Integration: Grounding decisions in the Hybrid Intelligence symbolic hierarchy.
Self-Assessment: Enabling trust, transparency, and robustness through introspective feedback.
This unique XRL feature is useful in any situation where explanation, interpretation, justification, model fitting, scenario reasoning, planning, or a similar criterion is necessary to achieve an optimal solution to the problem or goal the agent is currently working on.
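A minimal, illustrative Python sketch of this idea follows. All names here (ExplanationState, explanatory_only, the toy step dynamics) are assumptions introduced for this example, not part of any published Hybrid Intelligence API; the point is only to show actions that touch the explanation space without touching the environment.

```python
from dataclasses import dataclass, field

@dataclass
class ExplanationState:
    """Hypothetical explanation state space: stores explanations per visited state."""
    explanations: dict = field(default_factory=dict)

    def record(self, state, explanation):
        self.explanations.setdefault(state, []).append(explanation)

@dataclass
class Action:
    name: str
    explanatory_only: bool  # True => acts on the explanation space only

def step(env_state, action, explanation_state):
    """One XRL-style step: purely explanatory actions leave the environment untouched."""
    if action.explanatory_only:
        # Introspection: update the internal explanation model only.
        explanation_state.record(env_state, f"reconsidered state {env_state} via {action.name}")
        return env_state, explanation_state  # environment unchanged
    # Ordinary action: the environment transitions (trivial toy dynamics here).
    next_state = env_state + 1
    explanation_state.record(env_state, f"took {action.name} because it advances toward the goal")
    return next_state, explanation_state

if __name__ == "__main__":
    xs = ExplanationState()
    s = 0
    # A chain of purely explanatory actions (machine introspection), then an environment action.
    for a in [Action("review-options", True), Action("check-consistency", True), Action("move-right", False)]:
        s, xs = step(s, a, xs)
    print(s, xs.explanations)
```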
Standard RL Action Space (Affects the Environment)
In traditional Reinforcement Learning (RL), the action space consists of all possible actions an agent can take to directly influence the environment. The agent learns purely through trial and error, adjusting its policy based on observed rewards.
Actions are chosen to maximize expected cumulative reward under the learned policy.
The agent does not explicitly evaluate or justify why an action is optimal beyond its observed impact on the reward function.
This reactive approach is analogous to biological learning in animals, adapting behavior based on past reinforcement without deeper introspection.
Standard RL decision rationale: "I am taking this action because it maximizes my reward under the policy I have learned."
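For contrast, here is a minimal tabular Q-learning sketch, a standard textbook formulation rather than code from the Hybrid Intelligence Framework, showing that a conventional RL agent selects and updates actions purely on expected reward, with no explanation involved.

```python
import random
from collections import defaultdict

def q_learning_step(Q, state, actions, reward_fn, transition_fn,
                    alpha=0.1, gamma=0.99, epsilon=0.1):
    """One tabular Q-learning step: act, observe the reward, update Q toward the Bellman target."""
    # Epsilon-greedy selection: the only criterion is expected reward, no justification.
    if random.random() < epsilon:
        action = random.choice(actions)
    else:
        action = max(actions, key=lambda a: Q[(state, a)])
    next_state = transition_fn(state, action)
    reward = reward_fn(state, action)
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
    return next_state

if __name__ == "__main__":
    # Toy chain environment: states are integers clipped to [-3, 3]; reward for reaching state 3.
    Q = defaultdict(float)
    actions = [-1, +1]
    state = 0
    for _ in range(1000):
        state = q_learning_step(
            Q, state, actions,
            reward_fn=lambda s, a: 1.0 if min(3, max(-3, s + a)) == 3 else 0.0,
            transition_fn=lambda s, a: min(3, max(-3, s + a)),
        )
    print({k: round(v, 2) for k, v in Q.items()})
```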
Explainable RL (XRL) Explanatory Space (Affects Internal Model)
Explainable Reinforcement Learning (XRL) introduces a second dimension, the Explanatory Space: the agent learns both which actions maximize reward and why those actions are justified.
The agent maintains an internal explanation model that tracks causal relationships, decision rationale, and alignment with symbolic reasoning frameworks.
Changes to Explanatory Space alone modify only the agent’s internal representation, without directly affecting the environment—this is introspection (the machine equivalent of thinking through scenarios before acting).
When both Explanatory Space and Action Space are updated together, the agent engages in enhanced justified action, ensuring that actions are not only reward-optimal but also explainable.
XRL decision rationale: "I am taking this action because it maximizes my reward while also improving my understanding and the quality of my explanation according to my internal explanation model."
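Continuing the sketch above, an XRL-style selection rule might score each candidate action on both its estimated reward and the quality of the explanation it would produce. The additive scoring rule and names such as explanation_quality below are illustrative assumptions for this article, not the framework's actual objective.

```python
def select_xrl_action(state, candidates, q_value, explanation_quality, weight=0.5):
    """Illustrative XRL-style scoring: expected reward plus a weighted explanation-quality term."""
    def score(action):
        return q_value(state, action) + weight * explanation_quality(state, action)
    return max(candidates, key=score)

if __name__ == "__main__":
    # Toy example: action "b" has slightly lower reward but a much better explanation.
    q = {("s0", "a"): 1.0, ("s0", "b"): 0.9}
    expl = {("s0", "a"): 0.1, ("s0", "b"): 0.8}
    best = select_xrl_action("s0", ["a", "b"],
                             q_value=lambda s, a: q[(s, a)],
                             explanation_quality=lambda s, a: expl[(s, a)])
    print(best)  # "b": the justified action once explanation quality is weighed in
```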
Technical Side Note
For more information and to delve deeper, refer to Dalli et al., 2020, "Architecture for explainable reinforcement learning", US11455576B2.