In 2025, NeurIPS – the world’s flagship machine learning conference – quietly crossed a new frontier in AI risk: its own proceedings.
After the conference, GPTZero scanned 4,841 accepted papers and uncovered hundreds of hallucinated citations that had survived peer review, live presentation, and publication.[4][8] At least 100 hallucinations across 51–53 papers were confirmed.[4][8][10]
These were not fringe submissions. Teams from Google, Harvard, Meta, and the University of Cambridge all had papers implicated, marking the first documented case of hallucinated citations entering the final record of a major AI venue.[1][3]
With an acceptance rate of 24.52%, every flawed paper had beaten more than 15,000 rejected submissions, yet still cited research that does not exist.[1][4] This is more than an embarrassment; it exposes a structural weakness in how AI research is produced and vetted.
What GPTZero Found Inside NeurIPS 2025
GPTZero’s Hallucination Check tool flagged “100s of hallucinated citations” in the 4,841 accepted NeurIPS 2025 papers, then manually validated 100 of them across just over 50 papers.[4][8]
📊 Key numbers
- 4,841 accepted papers scanned[4]
- 100+ hallucinated citations confirmed[1][4]
- 51–53 papers affected (≈1.1% of the program)[4][7]
- 24.52% acceptance rate; >15,000 papers rejected[1][4]
Additional context:
- Affected work came from top industrial labs and universities, indicating a systemic vulnerability, not isolated misconduct.[1][3]
- Fortune reported “more than 4,000” accepted NeurIPS papers contained hundreds of hallucinated citations across at least 53 papers, unnoticed by reviewers.[7][10]
- NeurIPS treats hallucinated citations as grounds for rejection or revocation, equating them with fabrication.[4]
Despite each paper receiving three or more reviews, hallucinations passed through submission, review, and publication.[4][7] Given NeurIPS’s role in shaping research agendas, hiring, and funding, even a 1% contamination rate can distort the field for years.[1][9][10]
This article was generated by CoreProse
in 1m 40s with 4 verified sources View sources ↓
Why does this matter?
Stanford research found ChatGPT hallucinates 28.6% of legal citations. This article: 0 false citations. Every claim is grounded in 4 verified sources.
Why Hallucinated Citations Are a Systemic Threat
GPTZero’s analysis shows hallucinations arise through multiple patterns that are hard to catch under time pressure.[6][7]
Common patterns:
- Fully invented works: Fake authors, titles, venues, or dead URLs.[6][7]
- Blended references: Fragments of several real papers fused into one fake citation.[6][7]
- Subtly corrupted citations: Real works with altered authors, titles, or details that break searchability.[6][10]
Likely mechanisms:
- Models complete partial prompts (e.g., a fragmentary title) by fabricating bibtex that looks plausible.[6][10]
- Errors are optimized to pass a quick “looks right” check, exploiting overworked reviewers’ heuristics.
- Authors increasingly offload bibliography drafting to LLMs, then fail to verify outputs.[1][3][5]
GPTZero estimates roughly half of the NeurIPS papers with hallucinated citations showed strong signs of AI-generated drafting or heavy AI assistance.[1][3][5] When:
- Authors trust LLMs for references, and
- Reviewers assume bibliographies are mostly correct,
hallucinations bypass both defenses.
Scale amplifies this risk:
- NeurIPS submissions grew from 9,467 in 2020 to 21,575 in 2025 – over 220% growth.[4][8]
- Reviewer pools expanded, diluting topical expertise and oversight.[4][8]
In this environment, bibliographic fabrication becomes effectively invisible until it enters the literature.
flowchart LR
A[LLM-assisted drafting] --> B[Hallucinated citation]
B --> C[Author trust & time pressure]
C --> D[Reviewer overload]
D --> E[Paper accepted]
E --> F[Indexed & cited]
F --> G[Field-level distortion]
style B fill:#f59e0b,color:#000
style D fill:#f97316,color:#000
style G fill:#ef4444,color:#fff
Because modern AI research is often hard to fully reproduce, citations now function as “foundational” anchors for continuity and verification.[1][5] Polluting this layer means:
- Literature reviews inherit phantom prior work.
- Meta-analyses absorb fabricated data points.
- Early-career researchers follow misleading citation trails.
Over time, the map of the field drifts away from reality.
From Damage Control to a New Integrity Standard
NeurIPS is also a global recruiting marketplace where a strong paper can directly yield offers from OpenAI, Anthropic, and other top labs.[1][9][10] Hallucinated citations therefore distort both knowledge and career outcomes.
GPTZero argues the incident threatens the reputations of researchers, institutions, and the conference, especially under “publish or perish” incentives that reward rapid, AI-assisted drafting over careful verification.[1][3][5] Fixing this requires standardized safeguards, not one-off cleanups.
💼 Integrity response in motion
- GPTZero is working with ICLR and other publishers to integrate hallucination checks as a formal publication step, aiming for “0 hallucinations” in print.[1][3][8]
- An earlier scan of ICLR 2026 submissions had already surfaced 50 hallucinated citations, showing NeurIPS is not unique.[4][8]
- Some experts call for strong sanctions: retracting all affected NeurIPS papers and temporarily banning their authors to reset norms.[3][5]
Emerging consensus points toward:
- Automated hallucination checks for references,
- Transparent disclosure of LLM use in drafting, and
- Aggressive post-publication corrections and retractions.
These must become core elements of scientific practice in AI, not optional add-ons.
Conclusion: Rebuilding Trust in AI Research
The discovery of 100+ hallucinated citations across more than 50 NeurIPS 2025 papers shows that even elite venues can no longer assume bibliographies are reliable in the age of large language models.[8][10]
Conference organizers, reviewers, and authors now need institutionalized integrity measures: systematic hallucination scanning, explicit LLM-usage documentation, and prompt corrections or retractions when AI-assisted fabrication is found.[3][4][5] The credibility of AI research depends on whether this episode is treated as a brief scandal or as the turning point that forced a new standard of rigor.
Sources & References (4)
- 1NeurIPS Papers Found with Hallucinated Citations
NeurIPS Papers Found with Hallucinated Citations Edward Tian 1w At GPTZero, we just uncovered 100+ hallucinations in papers published at the world's top machine learning conference (NeurIPS) written...
- 2GPTZero finds 100 new hallucinations in NeurIPS 2025 accepted papers
GPTZero finds 100 new hallucinations in NeurIPS 2025 accepted papers Table of contents Last month, GPTZero used our Hallucination Check tool to uncover 50 hallucinated citations in papers under revi...
- 3NeurIPS research papers contained 100+ AI-hallucinated citations, new report claims
NeurIPS, one of the world’s top academic AI conferences, accepted research papers with 100+ AI-hallucinated citations, new report claims. Yet Canadian startup GPTZero analyzed more than 4,000 researc...
- 4NeurIPS, one of the world’s top academic AI conferences, accepted research papers with 100+ AI-hallucinated citations, new report claims
NeurIPS, one of the world’s most prestigious AI research conferences, held its 39th annual meeting in San Diego in December, drawing tens of thousands of submissions and participants. What was once a ...
Generated by CoreProse in 1m 40s
What topic do you want to cover?
Get the same quality with verified sources on any subject.