Documented AI Incidents
Hallucinations, ghost sources, RAG failures: understand and prevent common AI agent issues.
Articles
ClawHavoc Exposed: How 824 Malicious LLM Skills Infected the OpenClaw Marketplace
🌀 Hallucinations
824 “skills” turned a trusted marketplace for large language models into an adversarial toolchain, quietly riding on verified badges and production AI agents.[9] ClawHavoc shows how one compromised ma...
10 min · 2032 words
OWASP GenAI Q1 2026 Exploit Round-up: From Flowise RCE to Claude-Assisted Breaches
1. Why GenAI Exploits Are Accelerating in 2026
OWASP’s LLM Top 10 treats GenAI as a distinct attack surface, not “just another API.”[1] It formalizes risks such as prompt injection, data leakage, ina...
10 min · 1932 words
How an AI Coding Agent Triggered a Recursive Deletion Disaster in May 2026 (and How to Architect for Failure Containment)
In May 2026, two incidents made clear that AI coding agents are no longer “IDE assistants” but autonomous actors capable of destroying production systems at machine speed.
- At PocketOS, a Claude Opu...
11 min · 2224 words
Anthropic Mythos vs OpenAI GPT‑5.5: How to Engineer with Hacking‑Capable AI Under Scrutiny
Anthropic’s Claude Mythos Preview and OpenAI’s GPT‑5.5/GPT‑5.5‑Cyber are not simple chatbots; they are cyber co‑pilots that can surface real vulnerabilities in complex codebases and browser engines. [...
10 min · 2003 words
Anthropic Mythos vs OpenAI GPT‑5.5‑Cyber: Architecting with Hacking‑Capable AI Models Safely
From Mythos to GPT‑5.5‑Cyber: why hacking‑capable LLMs exist now
Anthropic’s Mythos/Glasswing and OpenAI’s Daybreak launch with GPT‑5.5‑Cyber mark a 2026 shift: cyber‑optimized large language models...
10 min · 1940 words
Anthropic Mythos vs OpenAI GPT‑5.5: Are ‘Hacking‑Capable’ Frontier Models a Cybersecurity Time Bomb?
Two of the world’s most advanced large language models—Anthropic’s Mythos and OpenAI’s GPT‑5.5—are arriving in enterprises as governments warn that generative AI is reshaping state‑backed hacking.[1]...
7 min · 1361 words
Anthropic Mythos vs OpenAI GPT‑5.5‑Cyber: Hacking‑Capable AI Under Security Scrutiny
1. From Research Demos to Operational Hacking‑Capable Models
Anthropic’s Mythos preview and Glasswing program showed that frontier models can scan large, real production codebases for subtle security...
11 min · 2100 words
Inside Japan’s Digital Agency GENAI Stack for Secure Government AI
Japan’s public sector wants generative AI for faster policy work, better citizen services, and smarter operations—without losing sovereignty, compliance, or trust. The Digital Agency must build a G...
8 min · 1512 words
Grok V9-Medium: 1.5T Model Architecture & MLOps Guide
Grok AI’s V9-Medium 1.5T model lands in a world where GPT-5.4, Gemini 3.x, and strong open-source models are already routine production tools with strict SLOs, observability, and governance. [6][2] T...
9 min · 1874 words
Anthropic Mythos vs OpenAI GPT‑5.5: Are Hacking‑Capable LLMs a Cybersecurity Time Bomb?
Frontier large language models are shifting from autocomplete tools to semi‑autonomous digital workers that operate software, write complex code, and orchestrate tools over long tasks.[2] The same sys...
7 min · 1344 words
How ServiceNow Uses AI and Automation to Power the Agentic Enterprise
Enterprise teams no longer want “one more chatbot” on the ITSM portal. They want workflows that interpret signals, pull context, decide, and execute across tools—with humans stepping in only where jud...
7 min · 1491 words
GPT‑5.5‑Cyber vs Anthropic Mythos: Scrutinizing Hacking‑Capable AI in Production
Security‑specialized large language models (LLMs) have moved from demos into core systems. By 2026, ~83% of CAC 40 companies run at least one LLM in production [1], powering:
- Conversational co‑pilo...
12 min · 2305 words
Topics Covered
AI Hallucinations
Understanding why LLMs invent information and how to prevent it.
RAG Best Practices
Retrieval Augmented Generation: architectures, chunking, optimal retrieval.
Ghost Sources
When AI cites sources that don't exist. Detection and prevention.
KB Drift
How to detect and correct knowledge base drift.
Chunking Strategies
Optimal document splitting for better retrieval.
LLM Evaluation
Metrics and methods to evaluate AI response quality.
AI Regulation
Laws, regulations and compliance frameworks governing AI systems.
AI Safety
Risks, safeguards and best practices for safe AI deployment.