Crossfactual AI: How Charli is Verifying the Verifiers
Fact-checking Large Language Models with Agentic AI and Contextual Cross-Retrieval Augmentation
“I have the privilege of authoring this article, but the credit truly belongs to the brilliant minds behind the work—our exceptional team of scientists, including Dr. Iman Saberi, Amirreza Esmaeili, and our Chief Scientist, Dr. Elham Alipour, along with Dr. Fatemah Fard’s outstanding team at UBC.”
One of the core intellectual property pillars embedded within Charli’s Adaptive Agentic AI architecture is its advanced capability for financial crossfactual verification—a mechanism designed to validate and challenge the outputs of large language models (LLMs) through structured reference checks against trusted ground truth data. This capability is central to maintaining accuracy in financial and analytical contexts, where even minor hallucinations or misattributions can have significant consequences.
Most LLM-based systems operate under a dangerous assumption: that if an output sounds plausible, it is plausible. In the high-stakes world of financial AI, we don’t get that luxury. Hallucinated figures, out-of-date references, or subtle inconsistencies can erode trust, misinform investors, and amplify risk.
At the heart of this process is Charli’s Fact Check Analysis (FCA) subsystem—part of the broader Forensic AI toolkit. FCA is engineered not simply as a post-processing layer but as an active, model-integrated verifier that operates in conjunction with LLM outputs. It evaluates the semantic alignment, temporal consistency, and citation integrity of generated responses against a curated and dynamically weighted set of ground truth references.
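To make the shape of such a check concrete, here is a minimal sketch, assuming a heavily simplified verifier: each claim is scored against weighted ground-truth references along the three dimensions described above. The class names, the token-overlap similarity stand-in, and the scoring formula are illustrative assumptions, not Charli's FCA implementation.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Reference:
    """A curated ground-truth snippet with provenance and a trust weight."""
    text: str
    source_id: str
    as_of: date        # date the reference is valid for
    weight: float      # dynamically assigned trust weight, 0..1

def token_overlap(a: str, b: str) -> float:
    """Crude stand-in for semantic similarity; a real verifier would use embeddings."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / max(len(ta | tb), 1)

def verify_claim(claim: str, claim_date: date, cited_source: str,
                 references: list[Reference]) -> float:
    """Score one claim on semantic alignment, temporal consistency,
    and citation integrity against weighted ground-truth references."""
    best = 0.0
    for ref in references:
        semantic = token_overlap(claim, ref.text)                  # alignment with the reference text
        temporal = 1.0 if ref.as_of <= claim_date else 0.0         # no future-dated evidence
        citation = 1.0 if cited_source == ref.source_id else 0.5   # provenance traceability
        best = max(best, ref.weight * semantic * temporal * citation)
    return best   # confidence-style score in [0, 1]
```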
As demonstrated in the video, the FCA model systematically compares the outputs from LLMs like ChatGPT against structured source data—evaluating factual accuracy with a probabilistic scoring system. The system flags discrepancies not just at a sentence level, but at the claim level, annotating outputs with confidence scores that reflect both the semantic proximity to truth and the provenance traceability of cited information.
Notably, the Fact Check Analysis engine doesn’t tolerate mediocrity. In the video, you'll observe responses with ~70% confidence ratings—levels that might pass muster in general-use GenAI applications but are categorically filtered out within Charli. Our threshold for surfacing actionable AI-derived insights typically exceeds 90% confidence, ensuring that only high-fidelity results propagate into investor dashboards or decision-making environments.
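As a rough sketch of that filtering step, the snippet below triages claim-level confidence scores against a 0.90 surfacing bar, with a 0.70 tier held back for review rather than shown to users. The thresholds mirror the figures quoted above; the function and bucket names are hypothetical.

```python
SURFACE_THRESHOLD = 0.90   # only high-fidelity claims propagate to dashboards
REVIEW_THRESHOLD = 0.70    # plausible-sounding, but below the bar

def triage_claims(scored_claims: dict[str, float]) -> dict[str, list[str]]:
    """Split claim-level confidence scores into surfaced, review, and rejected buckets."""
    buckets: dict[str, list[str]] = {"surfaced": [], "review": [], "rejected": []}
    for claim, confidence in scored_claims.items():
        if confidence >= SURFACE_THRESHOLD:
            buckets["surfaced"].append(claim)
        elif confidence >= REVIEW_THRESHOLD:
            buckets["review"].append(claim)
        else:
            buckets["rejected"].append(claim)
    return buckets

# A ~70% claim is flagged for review rather than surfaced to investors.
print(triage_claims({"Q3 revenue grew 12% YoY": 0.94, "Merger closed in 2021": 0.71}))
```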
This fact-checking capability is currently undergoing third-party validation as part of an independent academic collaboration, with plans to release a research version of the model to institutions including the University of British Columbia. We believe strongly in advancing the state of verifiable AI through open collaboration, and are committed to contributing IP that supports trusted, explainable, and academically rigorous AI methodologies.
Beyond RAG
A second foundational component of Charli’s architecture is a retrieval method that significantly outperforms conventional Retrieval-Augmented Generation (RAG) by ensuring that only contextually appropriate and reliable reference material is used during analysis. Charli’s proprietary approach—Contextual Cross-Retrieval Augmentation (CCRA)—elevates retrieval accuracy by layering in temporal constraints, precise entity resolution, and discourse-aware segmentation. The result is not just relevant inputs, but situationally coherent ones—crucial in dynamic domains like finance, where even subtle context shifts (such as referencing outdated M&A events or misaligned quarterly data) can lead to misleading or invalid outcomes.
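To illustrate the layering described above, here is a minimal sketch, assuming a simplified pipeline in which candidate passages have already been tagged by entity resolution, publication date, and discourse segment; CCRA itself is proprietary, and the names and filter logic here are illustrative only.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Passage:
    text: str
    entity: str          # canonical entity after entity resolution, e.g. a ticker
    published: date      # publication date used for temporal filtering
    segment: str         # discourse segment, e.g. "MD&A" or "risk_factors"

def contextual_retrieve(query_entity: str, window_start: date, window_end: date,
                        segment: str, candidates: list[Passage]) -> list[Passage]:
    """Keep only passages that match the resolved entity, fall inside the
    temporal window, and come from the relevant discourse segment."""
    return [
        p for p in candidates
        if p.entity == query_entity
        and window_start <= p.published <= window_end
        and p.segment == segment
    ]
```

Even in this toy form, the intent is visible: a passage describing a long-closed acquisition simply never reaches the model when the question concerns the current quarter, which is the kind of context shift the layering is meant to catch.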
Where this is going
The FCA model and CCRA are just two components in Charli’s comprehensive architecture for Agentic inference and explainability. As we continue to evolve our pipeline, we’re exploring deeper integrations with other forensic subsystems, including counterfactual simulation engines, citation path tracing, and error propagation detectors. These tools are not optional in high-stakes domains—they’re foundational.