TMCnet News

Causal Dynamics Lab outperforms Anthropic and OpenAI in multiple coding tests

[May 05, 2026]


San Francisco, CA, May 05, 2026 (GLOBE NEWSWIRE) -- AI coding tools are now producing code faster than teams can check what it will do in real use. Today, Causal Dynamics Lab (CDL) announced new research explaining why this happens, along with a new product called Cielara Code. This product achieved the highest accuracy in code localization among AI coding tools, outperforming both Claude Code (Opus-4.6) and OpenAI Codex (GPT-5.4) across three independent tests.

CDL studied how coding agents operate by tracking their actions across thousands of coding sessions. It found that 56.8% of agents' actions involved reading files and 24.2% involved using grep, while less than 1% were actual code edits. The problem was not that agents couldn't write code; it was that they had difficulty finding the correct code to edit. The situation worsened with more complex tasks: when a correct fix involved more than six files, the agents' ability to recall the necessary information dropped significantly, and the computing power used in failed attempts was four times that of successful ones.

"Every coding agent out there today uses grep, which is like a surgeon operating without imaging," said Hasibul Haque, CEO at Causal Dynamics Lab. "We created Cielara Code to help agents see better: it provides a clear understanding of the working environment, making the reasons behind each change clear and verifiable."

[Photo: The Causal Dynamics team.]

The 2025 DORA report showed the use of AI coding tools led to a 7.2% drop in deployment stability. AWS CTO Werner Vogels called this problem "dynamic verification debt." A well-known issue with Claude Code (GitHub issue #42796) illustrates the same problem on a larger scale: current agents treat code as flat text without showing how files connect, how functions call each other, or how changes affect the overall system.

How Cielara Code works
Cielara Code uses a model to represent a customer's production environment in a 6-layer causal graph. This graph includes information on what the code does, why it was created, who owns it, its limitations, where it runs, and what happens at runtime. If there is a failure, it can be linked back to the specific code change, the developer who approved it, and the reason for that change. Before an agent begins to explore, Cielara Code builds a Code Dependency Causal Graph. This graph tracks four types of relationships, allowing the agent to navigate the structure rather than just look through files one by one.
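To illustrate the difference between navigating a dependency graph and scanning files one by one, here is a minimal Python sketch. It is not CDL's implementation: the release does not name the four relationship types, so the edge kinds ("calls", "imports"), the file names, and the functions build_graph and related_files are all hypothetical placeholders.

```python
from collections import deque

# Hypothetical edges (source file, relationship kind, target file).
# The edge kinds and file names are illustrative assumptions only.
EDGES = [
    ("api/handlers.py", "calls", "billing/invoice.py"),
    ("billing/invoice.py", "imports", "billing/tax.py"),
    ("billing/invoice.py", "calls", "db/session.py"),
    ("tests/test_invoice.py", "imports", "billing/invoice.py"),
]


def build_graph(edges):
    """Build an adjacency map that can be walked in both directions."""
    graph = {}
    for src, kind, dst in edges:
        graph.setdefault(src, []).append((kind, dst))
        # Reverse edge, so a file can also reach the files that depend on it.
        graph.setdefault(dst, []).append(("reverse:" + kind, src))
    return graph


def related_files(graph, start, max_hops=2):
    """Breadth-first search: map each file within max_hops of start to its distance."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for _kind, neighbor in graph.get(node, []):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return {f: d for f, d in distance.items() if f != start}


if __name__ == "__main__":
    graph = build_graph(EDGES)
    for path, hops in sorted(related_files(graph, "billing/tax.py").items()):
        print(hops, path)
```

Starting from a single file, the traversal surfaces the files connected to it through typed edges, which is the kind of structural lookup a text search such as grep does not provide on its own.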

Benchmark results
Across three independent benchmarks, Cielara Code beat both Claude Code (Opus-4.6) and OpenAI Codex (GPT-5.4) at the hardest part of agent work: finding the right place to make a change. Overall localization accuracy hit 0.774, versus 0.738 for Claude Code and 0.707 for Codex. On MULocBench (1,033 issues across 46 repositories), Cielara reached 0.752 recall@5 versus 0.727 for Claude Code, and cut mean task time from 141.84 to 128.62 seconds. The result: fewer wrong-file edits, fewer failed runs, and 30 to 40 percent lower compute cost per task.
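For readers unfamiliar with recall@5, the sketch below shows one common way such a localization metric is computed: the share of files that truly need editing that appear among an agent's top five ranked candidates, averaged over issues. This is an assumption about the general form of the metric, not MULocBench's exact scoring code, and the function names and sample data are hypothetical.

```python
def recall_at_k(ranked_files, relevant_files, k=5):
    """Fraction of the relevant files that appear in the top-k ranked predictions."""
    relevant = set(relevant_files)
    if not relevant:
        return 0.0  # assumption: an issue with no ground-truth files scores zero
    top_k = set(ranked_files[:k])
    return len(top_k & relevant) / len(relevant)


def mean_recall_at_k(cases, k=5):
    """Average recall@k over (ranked_files, relevant_files) pairs."""
    return sum(recall_at_k(ranked, gold, k) for ranked, gold in cases) / len(cases)


if __name__ == "__main__":
    # Hypothetical issues: each pairs an agent's ranking with the true edit locations.
    cases = [
        (["a.py", "b.py", "c.py", "d.py", "e.py", "f.py"], ["b.py", "f.py"]),
        (["x.py"], ["x.py"]),
    ]
    print(mean_recall_at_k(cases))
```

In the first hypothetical issue only one of the two required files is ranked in the top five, so the agent scores 0.5 on it even though it found a correct file.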

REASONARA: causal memory at enterprise scale
Cielara Code makes this practical through REASONARA, a graph-structured causal memory layer that stores 125M+ tokens of effective context but retrieves only what matters for each query. A typical lookup uses 1,000–2,500 tokens, compared with 23,000–115,000 for full-context approaches, a reduction of up to 98%. On independent benchmarks, REASONARA scores 94% on UltraDomain, 92% on LoCoMo, 73% on LoCoMo-plus, and 87.4% on LongMemEval, and runs 5–8× faster than Codex high-reasoning mode. The roadmap targets a one-billion-token context window.
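The token figures above can be checked with simple arithmetic. The sketch below computes the percentage reduction for the low ends and the high ends of the two ranges. The release does not say which endpoints are paired to produce its "up to 98%" figure, so the pairing here is an assumption, and percent_reduction is a hypothetical helper.

```python
def percent_reduction(before, after):
    """Percentage by which `after` is smaller than `before`."""
    return (before - after) / before * 100


if __name__ == "__main__":
    # Token counts taken from the paragraph above; pairing of endpoints is assumed.
    low_ends = percent_reduction(23_000, 1_000)
    high_ends = percent_reduction(115_000, 2_500)
    print(f"low ends:  {low_ends:.1f}%")   # 95.7%
    print(f"high ends: {high_ends:.1f}%")  # 97.8%
```

Pairing the high ends of both ranges gives roughly 97.8%, which is consistent with the "up to 98%" figure quoted.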

Cielara Code is a safety layer for AI coding agents. It aims to make their output safer rather than to replace them. Currently, 11 Fortune 100 companies and more than 40 Fortune 500 companies use Cielara Code on their codebases.

"Board members and auditors expect more proactive risk management. Leaders now want proof that security can anticipate risks caused by fast-moving AI and automation, instead of just reacting after incidents," said the CISO of one of the largest law firms in the United States, who is also a Cielara Code customer.

Phillip Miller, Vice President, Global Chief Information Security Officer, H&R Block, added: “Enterprises need solutions to problems they cannot solve with people alone. Cielara's technology is a generational leap towards the original promise of AI: tackling complexity 7x24 with acquired knowledge, deep reasoning, and unbeatable accuracy. For engineering teams, this means a single engine to discover faults in real-world deployments (including legacy, cloud) and provide clear resolution steps. When I wrote Hacking Success, I described a world where AI needs strong, directive policy (not rules / guardrails) to be safe and effective. Information Security lags behind the innovation curve, as most options rely on legacy thinking including posture, gateways, and logging. Enterprises now have an option to leverage Cielara's models to oversee deployments of AI agents, models, and their supporting infrastructure.”

The team
The team's experience closely matches the problem it is addressing. CEO Hasibul Haque led platform engineering at Uber during its rapid growth. CTO Ryan Turner was a Staff Engineer at Uber and helped maintain the SPIRE Project within the Cloud Native Computing Foundation (CNCF). R&D is led by Dr. Xuchao Zhang, who worked at Microsoft Research, and Dr. Liang Zhao from Emory University, who has 200+ publications and is ranked among the top 2% of scientists by Stanford University. CDL has a formal research partnership with Emory's AI Lab.

"AI has already changed how people find information. The next step is to change how people make decisions by exploring possibilities, comparing options, and understanding the outcomes before making a choice," said Matt Fisher, former Co-Founder and CTO of Daydream and an Adjunct Professor at Brown University. "That shift towards exploring outcomes is what CDL is focusing on."

What's next
The Production World Model serves as a foundation. Cielara Code and REASONARA are the first products to use this foundation. In the future, Causal Dynamics Lab will fully simulate the effects of changes in code, infrastructure, policy, and operation. This will create a permanent reasoning layer in the enterprise system that any AI agent can access before making changes that affect production.


Methodology: Benchmarks were run against Claude Code (Opus-4.6) and OpenAI Codex (GPT-5.4) using the publicly available MULocBench, UltraDomain, LoCoMo, and LongMemEval test harnesses. Full methodology, configuration, and reproduction instructions are available at [research.causaldynamics.com/benchmarks].

About Causal Dynamics Lab
Causal Dynamics Lab builds validation infrastructure for AI-generated software. Its platform, Cielara, predicts how proposed changes will behave in production before they ship, powered by REASONARA, a graph-structured causal memory system. The company was founded by former Uber platform engineers and AI researchers from Microsoft Research and Emory University, including a Stanford Top 2% Scientist with 200+ publications at NeurIPS, ICLR, and KDD. The company is headquartered in San Francisco. CausalDynamics.com.

For further information please contact the Causal Dynamics Lab press office via Bilal Mahmoood on [email protected] or +447714007257.
