
Cecuro’s specialized AI flags 92% of exploited DeFi contracts
Specialized AI outperforms general models in DeFi exploit detection
An open benchmark created by security firm Cecuro compared a purpose-built analysis agent with a general coding assistant, both running on the same underlying frontier model. The test set contained 90 exploited smart contracts, collectively accounting for verified losses of roughly $228 million; the specialized workflow surfaced vulnerabilities linked to a substantially larger share of that value than the generalist baseline did.
Cecuro’s approach layers structured review steps, DeFi-specific detectors and targeted heuristics on top of a base model, rather than relying on out-of-the-box prompts or single-pass audits. That architectural choice produced detection results far above the baseline’s, demonstrating how the application layer alone can change security efficacy even when the underlying model is identical.
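Because the full agent is unpublished, its internals can only be sketched. The Python sketch below illustrates, under assumption, what a layered workflow of this kind might look like; the detector names, regex patterns and the stubbed base_model_pass are illustrative placeholders, not Cecuro’s implementation.

```python
import re
from dataclasses import dataclass

@dataclass
class Finding:
    detector: str
    pattern: str
    severity: int  # 1 (low) .. 3 (high)

# Crude regex detectors standing in for real DeFi-specific static checks.
DETECTORS = [
    ("delegatecall", re.compile(r"\.delegatecall\s*\("), 3),
    ("raw-call-with-value", re.compile(r"\.call\{value:"), 2),
    ("spot-price-oracle", re.compile(r"getReserves|slot0"), 2),
]

def base_model_pass(source: str) -> list:
    """Placeholder for a structured review prompt sent to a frontier model.

    In a real agent this would call an LLM API with a review rubric and
    parse the response into Finding objects; stubbed here to keep the
    sketch self-contained and runnable.
    """
    return []

def detector_pass(source: str) -> list:
    return [Finding(name, rx.pattern, sev)
            for name, rx, sev in DETECTORS if rx.search(source)]

def review(source: str) -> list:
    # Multi-pass review: model findings first, then targeted detectors,
    # then a heuristic sort so high-severity issues surface first.
    findings = base_model_pass(source) + detector_pass(source)
    return sorted(findings, key=lambda f: -f.severity)

if __name__ == "__main__":
    demo = 'function swap() external { pair.getReserves(); to.call{value: amt}(""); }'
    for f in review(demo):
        print(f.detector, f.severity)
```

The point of the layering is that the pattern detectors and the triage ordering keep working even when the model pass misses something, which is what lets two systems built on the same base model diverge in results.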
The company released the evaluation framework, the dataset and a reference baseline on GitHub, while withholding its full agent implementation to avoid potential misuse. The public materials let other teams reproduce the comparisons and test defensive methods against recorded incidents.
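The article does not document the released framework’s API, but the scoring idea, crediting a detector with the dollar value of the incidents it flags, can be sketched. All field and function names below are assumptions for illustration, not the schema of Cecuro’s repository.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Incident:
    contract_source: str
    loss_usd: float  # verified loss attributed to the exploit

def value_weighted_score(
    incidents: list,
    detects: Callable[[str], bool],  # detector under test
):
    """Return (share of incidents flagged, share of loss value covered)."""
    flagged = [i for i in incidents if detects(i.contract_source)]
    total_value = sum(i.loss_usd for i in incidents)
    covered = sum(i.loss_usd for i in flagged)
    return (
        len(flagged) / len(incidents),
        covered / total_value if total_value else 0.0,
    )

# Example: a trivial detector against a toy two-incident dataset.
toy = [Incident("uses delegatecall", 5e6), Incident("reads slot0", 1e6)]
rate, value_share = value_weighted_score(toy, lambda s: "delegatecall" in s)
print(f"{rate:.0%} of incidents, {value_share:.0%} of value")  # 50%, 83%
```

Scoring by value rather than by count is what makes a benchmark like this track real-world stakes: flagging one $50M exploit outweighs flagging ten dust-sized ones.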
This benchmark arrives as adversarial tooling rapidly improves: recent external studies indicate automated exploit capability has been accelerating, bringing down the marginal cost of scanning and exploitation. That wider arms race — easier offensive tooling versus defensive adaptation — frames why specialized detection strategies matter now.
Several contracts in the benchmark had passed prior professional audits but were still compromised, underlining gaps in conventional review practices. Cecuro argues these gaps are addressable by repeatable, domain-aware procedures rather than generalist AI alone.
- Dataset: 90 live exploited contracts (Oct 2024–early 2026).
- Verified losses covered by dataset: about $228M.
- Cecuro-detected exploit value: roughly $96.8M.
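The two headline figures use different denominators: the 92% is a share of the 90 contracts, while the dollar figures measure shares of verified loss value. The arithmetic, using only the numbers reported above:

```python
# Relating the reported figures (values from the article).
contracts = 90
flag_rate = 0.92        # headline contract-level detection rate
total_loss = 228e6      # verified losses in the dataset, USD
detected_loss = 96.8e6  # exploit value Cecuro's agent surfaced

print(round(contracts * flag_rate))         # ~83 contracts flagged
print(f"{detected_loss / total_loss:.1%}")  # ~42.5% of verified loss value
```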
The baseline agent, built on a GPT-5.1 coding stack, caught a noticeably smaller share of the high-value vulnerabilities under the same test conditions. That gap points to the limits of one-off audits and general-purpose assistants when facing sophisticated, multi-stage DeFi weaknesses.
For practitioners, the takeaway is tactical: embed protocol-aware checks, perform structured multi-pass reviews and prioritize heuristics that are proven on historical incidents. For defenders, open benchmarks provide a practical yardstick to measure progress against real loss events.
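On the last point, one simple way to make “heuristics proven on historical incidents” actionable is to order checks by the losses historically attributed to each vulnerability class. A minimal sketch follows; the classes and dollar figures are placeholders, not benchmark data.

```python
# Hypothetical triage weighting: run the checks tied to the largest
# historical losses first. Figures are illustrative placeholders.
historical_loss_usd = {
    "price-oracle-manipulation": 60e6,
    "access-control": 45e6,
    "reentrancy": 20e6,
    "rounding-precision": 5e6,
}

def review_order(checks: dict) -> list:
    """Order vulnerability classes by attributed historical losses."""
    return sorted(checks, key=checks.get, reverse=True)

for name in review_order(historical_loss_usd):
    print(name)
```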