The Air Canada chatbot case is the first widely publicized ruling where an enterprise was held financially liable for an LLM hallucination in customer support. Treat it as a production post‑mortem: what failed, where legal duty sat, and how to architect, govern, and operate LLM systems so your company does not become the next headline.
In Moffatt v. Air Canada, a customer asking about bereavement fares was told by the airline’s website chatbot that he could buy a full‑fare ticket and later claim a bereavement discount. That retroactive discount did not exist in Air Canada’s written policy, yet the chatbot stated it as fact on the official site, leading to a tribunal award of roughly CA$812 when the airline refused to honor it.[9][10][11]
This was a live production system giving policy‑level assurances, and a court treated those words as the airline’s own.[9][12]
1. Deconstructing the Air Canada Chatbot Failure
When Jake Moffatt’s grandmother died, he used Air Canada’s website chatbot to ask about bereavement fares.[10] The bot:
- Described a policy allowing him to book immediately and apply for a refund after travel.
- Linked to a general bereavement page that did not contain those terms.[9][12]
After his trip, human agents denied the refund, saying discounts had to be applied before travel and no retroactive benefit existed. When Moffatt cited the chatbot transcript, Air Canada argued:
- The chatbot was a separate legal entity responsible for its own actions.
- General accuracy disclaimers on the website shielded the airline from responsibility.[9][12]
The Canadian Civil Resolution Tribunal rejected this. It held that:[9][12]
- The chatbot was part of Air Canada’s website and thus a communication channel for the airline.
- Customers could not reasonably distinguish between policy pages and chatbot answers on the same domain.
- Inconsistency between chatbot and written policy was the airline’s problem, not the customer’s.
💼 Key takeaway: There was no “AI interface” carve‑out. The chatbot’s statements were attributable to Air Canada like those of a human agent or static page.[9]
This aligns with broader evidence:
- Leading LLMs frequently make confident but incorrect statements in legal contexts, including mischaracterizing statutes and fabricating citations.[1]
- A mental‑health nonprofit suspended its chatbot after it gave harmful advice to people with eating disorders, showing regulators and the public see AI‑mediated communication as inseparable from the sponsoring organization.[5]
Mini‑conclusion: The failure chain was not just “the model hallucinated.” It was:
- Policy‑level answers,
- On an official domain,
- Without guardrails or cross‑checks,
- Backed by a legal strategy that tried to blame the interface.
This raises the question: where does liability actually sit when LLMs are embedded in customer‑facing flows?
2. Where Liability Actually Sits in LLM Systems
The tribunal’s reasoning matches a broader trend: enterprises remain responsible for AI‑mediated advice just as for human agents and scripted content.[4][9]
In financial services, governance frameworks state that generative AI used in client interactions must meet existing suitability, conduct, and disclosure standards.[4] There is no exemption because the channel is probabilistic or powered by a third‑party model.[4][6]
Boards and executives are expected to treat AI risks as part of enterprise risk management, not as an R&D side project.[2][4] Frameworks emphasize:
- Board‑level accountability for AI risk.
- Clear ownership for model design, deployment, and monitoring.
- Integration of AI incidents into existing risk and compliance structures.[2][4]
⚠️ Warning: Scapegoating “the AI team” after an incident conflicts with emerging best practice and will likely look evasive to regulators and tribunals.[2][4]
Generic website disclaimers are weak protection when an LLM gives specific, authoritative statements about entitlements—discounts, refunds, legal rights. The duty of care is far higher than for vague marketing copy.[2]
As agentic AI systems gain capabilities to plan and act—rebooking passengers, issuing credits, touching payment flows—the risk profile starts to resemble unauthorized operational actions, not just bad information.[3][5]
Policy experts expect more activity on deceptive practices, consumer protection, and sector‑specific AI rules, with proposals pointing toward more accountability, not immunity, for harms caused by deployed AI.[6][8]
Mini‑conclusion: Liability is structurally anchored in the deploying enterprise. Vendors, models, and disclaimers shape contracts but do not move the legal duty off your balance sheet.
With that allocation of responsibility, the next question is how engineering and LLMOps can reduce the risk of policy‑level hallucinations.
3. Engineering and LLMOps Controls to Prevent Policy Hallucinations
Hallucinations about legal terms, pricing, and policy are not just quality issues; they are a distinct risk class that can instantly create enforceable expectations, as Air Canada learned.[1][9]
Grounding in canonical policy
For legally consequential flows:
- Use retrieval‑augmented generation (RAG) over versioned, canonical policy documents.
- Require the model to cite the specific policy section it uses.
- Disallow answers when no relevant policy snippet is found.[1][2]
Research on AI risk management stresses task‑specific controls: LLMs answering legal or policy questions should be tightly constrained and auditable, not allowed to improvise benefits or rights.[1][2]
💡 Design pattern: Treat the LLM as a natural‑language interface to an immutable policy store, not as a policy engine.
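A minimal sketch of that pattern, assuming a retrieval layer exposing `retriever.search()` and a model client exposing `llm.complete()` (both hypothetical stand-ins for your own stack); the relevance threshold and field names are illustrative, not prescriptive:

```python
# Sketch of "LLM as a natural-language interface to an immutable policy store".
# `retriever` and `llm` are hypothetical stand-ins for your retrieval layer and
# model client; the threshold and field names are illustrative.
from dataclasses import dataclass

@dataclass
class PolicySnippet:
    doc_id: str        # canonical document identifier
    version: str       # policy version actually in force
    section: str       # section the answer must cite
    text: str
    score: float       # retrieval relevance score

REFUSAL = ("I can't find a policy that covers this. "
           "Please check the official policy page or contact an agent.")

def answer_policy_question(question: str, retriever, llm, min_score: float = 0.75) -> dict:
    snippets = retriever.search(question, top_k=3)
    grounded = [s for s in snippets if s.score >= min_score]
    if not grounded:
        # No relevant canonical text: refuse rather than let the model improvise.
        return {"answer": REFUSAL, "citations": []}
    context = "\n\n".join(f"[{s.doc_id} §{s.section} v{s.version}]\n{s.text}" for s in grounded)
    prompt = (
        "Answer ONLY from the policy excerpts below. "
        "If they do not answer the question, say so. Cite the section you used.\n\n"
        f"{context}\n\nQuestion: {question}"
    )
    answer = llm.complete(prompt)
    return {
        "answer": answer,
        "citations": [{"doc_id": s.doc_id, "section": s.section, "version": s.version}
                      for s in grounded],
    }
```

The key property is the early return: if no canonical snippet clears the threshold, the customer gets a refusal and a pointer to the official policy, never an improvised entitlement.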
Policy‑aware orchestration and guardrails
An LLM gateway or orchestration layer should enforce:
- Schema‑validated response templates (e.g., fields for “policy citation,” “effective date”).
- Deterministic business‑rules checks before including any entitlement or discount.
- Safe refusal patterns such as “I cannot find a policy that allows that; here is the official policy link.”[2][3]
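One possible shape for that enforcement layer, assuming Pydantic v2 for schema validation; `rules_engine.permits()` is a placeholder for your deterministic business-rules check, and the field names simply mirror the bullets above:

```python
# Sketch of a schema-validated response template enforced at the gateway.
# Assumes Pydantic v2; `rules_engine` is a hypothetical deterministic rules service.
from datetime import date
from pydantic import BaseModel, ValidationError

class PolicyAnswer(BaseModel):
    answer_text: str
    policy_citation: str       # e.g. "Bereavement Travel Policy §2.3"
    effective_date: date       # version of the policy the answer relies on
    entitlement_claimed: bool  # True only if a discount or refund is asserted

SAFE_REFUSAL = ("I can't confirm that entitlement. "
                "Here is the official policy link: <canonical URL>.")

def validate_model_output(raw_json: str, rules_engine) -> str:
    try:
        parsed = PolicyAnswer.model_validate_json(raw_json)
    except ValidationError:
        return SAFE_REFUSAL  # malformed output never reaches the customer
    if parsed.entitlement_claimed and not rules_engine.permits(parsed.policy_citation):
        # Deterministic business rules, not the model, decide entitlements.
        return SAFE_REFUSAL
    return parsed.answer_text
```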
For high‑impact actions—fare changes, refunds, credits—agentic flows must route through backend services that enforce canonical rules. The model can propose actions, but the service decides, logs, and enforces constraints.[3][7]
⚡ Critical control: Do not let the model directly commit transactions or update customer records without a rules engine or human in the loop.[3][7]
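One way to implement that split, as a sketch: the model emits only a proposal, and hypothetical backend components (`refund_service`, `audit_log`, `review_queue`) decide, log, and escalate. The auto-approve threshold is illustrative.

```python
# Sketch of the "propose, don't commit" pattern for agentic flows.
# `refund_service`, `audit_log`, and `review_queue` are hypothetical backend
# components; the point is that the LLM output is a proposal, never a transaction.
from dataclasses import dataclass

@dataclass
class ProposedAction:
    kind: str          # e.g. "issue_refund"
    booking_ref: str
    amount_cad: float
    rationale: str     # model-generated explanation, retained for audit

AUTO_APPROVE_LIMIT_CAD = 100.0  # illustrative threshold

def execute(action: ProposedAction, refund_service, audit_log, review_queue) -> str:
    audit_log.record(action)                       # every proposal is logged
    if not refund_service.is_permitted(action):    # canonical fare/refund rules decide
        return "rejected_by_rules"
    if action.amount_cad > AUTO_APPROVE_LIMIT_CAD:
        review_queue.enqueue(action)               # human in the loop for high impact
        return "pending_human_review"
    refund_service.commit(action)                  # the backend commits, not the model
    return "executed"
```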
Continuous evaluation and incident handling
Reliability practices from safety‑critical domains are increasingly applied to LLMs:
- Benchmarks and synthetic tests focused on legal and policy queries.
- Regression testing whenever you change models, prompts, or knowledge bases.
- Monitoring for drift in hallucination rates on key flows.[1][2][7]
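A minimal regression harness along those lines, assuming a `chatbot` callable that returns the grounded answer shape sketched earlier; the golden cases and string checks are illustrative placeholders for a fuller evaluation suite:

```python
# Golden policy questions replayed on every model, prompt, or knowledge-base change.
# `chatbot` is a hypothetical callable wrapping the production pipeline and
# returning {"answer": str, "citations": [{"doc_id": ...}, ...]}.
GOLDEN_CASES = [
    {
        "question": "Can I get a bereavement discount after I have already travelled?",
        "must_not_contain": ["retroactive", "apply after travel for a refund"],
        "must_cite": "Bereavement Travel Policy",
    },
]

def run_policy_regression(chatbot) -> list[str]:
    failures = []
    for case in GOLDEN_CASES:
        result = chatbot(case["question"])
        answer = result["answer"].lower()
        if any(bad.lower() in answer for bad in case["must_not_contain"]):
            failures.append(f"hallucinated entitlement: {case['question']}")
        if not any(case["must_cite"] in c["doc_id"] for c in result["citations"]):
            failures.append(f"missing citation: {case['question']}")
    return failures  # a non-empty list should block the release
```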
When something goes wrong, treat it like a security or safety incident:
- Capture full transcripts and metadata.
- Perform root‑cause analysis across data, prompts, and orchestration layers.
- Feed outcomes into updated guardrails and governance forums.[2][5]
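A structured incident record helps here. The sketch below is one possible shape (field names are assumptions, not a standard), designed so AI incidents can be filed into the same risk tooling as security incidents:

```python
# Illustrative incident record for an LLM policy misstatement.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class LLMIncident:
    transcript: list[dict]          # full conversation, captured verbatim
    model_version: str
    prompt_version: str
    knowledge_base_version: str
    detected_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    root_cause: str = ""            # filled in after RCA: data, prompt, or orchestration
    remediation: str = ""           # guardrail or policy-store change that closed the gap
```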
Mini‑conclusion: Architect for policy fidelity as a first‑class non‑functional requirement, alongside latency and cost. The model’s behavior alone is not a reliable control surface; your gateway, knowledge base, and rules engine are.
Technical controls only work if governance and ownership are aligned.
4. Cross‑Functional Governance and Implementation Roadmap
The Air Canada case exposed diffusion of responsibility: the chatbot was treated as something “other” than the airline’s own voice.[9][12] Governance must close that gap.
Build an AI risk register
Explicitly list “contractual misrepresentation by LLM interfaces” as a top‑tier risk in your AI risk register, with named owners in engineering, legal, compliance, and customer operations.
📊 Governance move: When a tribunal asks “Who was responsible for ensuring this chatbot did not misstate policy?” you should have a clear, documented answer.[2][4]
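For illustration only, a risk‑register entry for this scenario might look like the following; owners, scores, and controls are placeholders to adapt to your organization:

```python
# Illustrative risk-register entry; all values are placeholders.
CHATBOT_MISREPRESENTATION_RISK = {
    "risk_id": "AI-001",
    "title": "Contractual misrepresentation by LLM interfaces",
    "owners": {"engineering": "LLM platform lead",
               "legal": "Commercial counsel",
               "operations": "Customer-care director"},
    "likelihood": "medium",
    "impact": "high",              # tribunal awards, regulatory scrutiny, brand damage
    "controls": ["RAG over canonical policies", "schema validation",
                 "human review for entitlements", "policy regression suite"],
    "review_cadence_days": 90,
}
```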
Adopt structured AI governance
Adapt frameworks from regulated sectors:
- Maintain a model inventory with purposes, data sources, and owners.[2][4]
- Require use‑case approvals for customer‑facing models.
- Document the legal basis and control set for each AI‑mediated interaction.[4]
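A model‑inventory record can be as simple as the sketch below; the field names are assumptions rather than a standard schema, but they cover the purposes, data sources, owners, and legal basis called for above:

```python
# Illustrative model-inventory entry; field names are assumptions, not a standard.
from dataclasses import dataclass

@dataclass
class ModelInventoryEntry:
    model_id: str              # internal identifier for the deployed model
    purpose: str               # approved use case, e.g. "customer FAQ over fare policies"
    data_sources: list[str]    # canonical documents the model is grounded on
    owner: str                 # accountable person or team
    customer_facing: bool      # triggers the stricter approval path
    approved_on: str           # date the use case cleared governance review
    legal_basis: str           # documented basis for this AI-mediated interaction
```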
Phase deployments:
- Internal copilots, where errors are buffered by trained employees.
- Customer‑facing, read‑only assistance tightly grounded in existing content.
- Action‑taking agents, only after controls and monitoring have matured.[3][5]
UX, disclosures, and oversight
For external chatbots:
- Provide visible, plain‑language disclosures about capabilities and limits.
- Encourage users to verify key entitlements via links to canonical documents.
- Highlight when answers are based on specific policy documents and effective dates.[1][4][10]
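As a small illustration, that disclosure can be rendered directly from the citation metadata produced by the grounded pipeline sketched earlier; the wording and field names are placeholders:

```python
# Renders a plain-language footer under each chatbot answer from citation metadata.
def render_disclosure(citations: list[dict]) -> str:
    if not citations:
        return ("This answer is not based on a specific policy document. "
                "Please verify with the official policy pages before relying on it.")
    lines = [f"Based on {c['doc_id']} §{c['section']}, version {c['version']}."
             for c in citations]
    lines.append("Please verify key entitlements via the linked policy documents.")
    return "\n".join(lines)
```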
Integrate AI incident reviews into existing risk and compliance committees, not isolated postmortems inside AI teams.[2][4][8] Monitor evolving AI policy, especially around deceptive practices and digital consumer protection, and treat cases like Air Canada’s as early signals of how tribunals will interpret fairness and transparency duties.[6][8][9]
💼 Governance principle: Generative AI is not a parallel universe. It belongs inside your existing risk, legal, and operational frameworks.
Mini‑conclusion: Cross‑functional governance turns technical controls into defensible practice. Without it, even well‑engineered systems can become legal liabilities.
The Air Canada chatbot ruling crystallizes a simple reality: once LLMs enter customer‑facing flows, their words are your words.[9] Engineering, LLMOps, and legal teams must jointly design systems where models cannot invent benefits, misstate policies, or act without guardrails. By grounding outputs in canonical policies, embedding strong orchestration and monitoring, and aligning governance with emerging regulatory expectations, you can capture LLM value without inheriting avoidable liability.[1][2][4]
Use this case as a tabletop exercise: map where your current chatbots and copilots could misrepresent rights or entitlements, quantify the financial and regulatory exposure, and prioritize a cross‑functional hardening sprint that brings your architecture, controls, and governance up to the standard this ruling implicitly demands.
Sources & References (10)
- [1] Hallucinating Law: Legal Mistakes with Large Language Models are Pervasive
- [2] 3 AI Risk Management Frameworks for 2025 + Best Practices
- [3] Securing Agentic AI — A CISO playbook for autonomy, guardrails, and control
- [4] FINOS AI Governance Framework
- [5] The AI-fication of Cyberthreats: Trend Micro Security Predictions for 2026
- [6] Expert Predictions on What’s at Stake in AI Policy in 2026
- [7] Predicting the Six Biggest Impacts AI Will Have on OT Cybersecurity
- [8] Expert Predictions on What’s at Stake in AI Policy in 2026
Private documents (2)
- [9] Air Canada must pay damages after chatbot lies to grieving passenger about discount
- [10] Air Canada chatbot promised a discount. Now the airline has to pay it. (The Washington Post)