June 4, 2026

Does legal AI still hallucinate? The honest answer for a small firm

Yes, legal AI still hallucinates, including the tools built specifically for legal work. A Stanford study published in the Journal of Empirical Legal Studies tested the leading legal research tools and found they produced false or misleading answers between 17 and 33 percent of the time, while a general chatbot was wrong about 43 percent of the time. The "built for law" label lowers the error rate. It does not remove it, and the firms that stay out of trouble treat the output as a draft to check, not an answer to trust.

Does legal AI still hallucinate?

Yes. In the Stanford research (Magesh and colleagues, testing tools in 2024), Lexis+ AI hallucinated about 17 percent of the time and Westlaw AI-Assisted Research about 33 percent, against roughly 43 percent for GPT-4 on its own. These were not consumer chatbots guessing from memory. They were the legal-specific research products, the ones sold to firms as the safe option. The takeaway most lawyers drew was blunt: the tools made for law still invent things, just less often than a raw chatbot.

Why does a tool built for law still make things up?

Because the engine underneath is still a language model, and language models generate text by predicting which words are likely to come next, not by looking up what is true. Legal tools shrink the problem with retrieval, often called RAG (retrieval-augmented generation): instead of answering from memory, the tool first pulls real cases and statutes, then writes its answer on top of that material. That grounding cuts the error rate sharply, which is why the legal tools beat the general chatbot. The catch is the writing step. The model can still misread a source, overstate what a case holds, or attach a citation that does not support the sentence. In one Stanford example, a system stated that a procedural rule said something it does not. The source was real. The claim about it was not.
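For readers who want to see the two steps apart, here is a toy sketch in Python. It is not how any vendor's product works. The two-rule library, the rule names, and the word-overlap ranking are all made up for illustration; real tools search full databases with far better matching. The point is only that retrieval hands the model real text, and the answer is still written afterward.

```python
import re

# Hypothetical mini-library (invented rules, not real authorities).
LIBRARY = {
    "Rule A": "A party must file a response within 21 days of service.",
    "Rule B": "The court may extend a deadline for good cause shown.",
}

def tokens(text):
    """Lowercase word set, used for a crude similarity score."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question, library, k=1):
    """Step 1: rank real sources by word overlap with the question."""
    q = tokens(question)
    ranked = sorted(
        library,
        key=lambda name: len(q & tokens(library[name])),
        reverse=True,
    )
    return ranked[:k]

def build_prompt(question, library, sources):
    """Step 2: the model is asked to write on top of the retrieved text."""
    context = "\n".join(f"[{name}] {library[name]}" for name in sources)
    return f"Answer using only these sources:\n{context}\n\nQuestion: {question}"

question = "How many days to file a response after service?"
sources = retrieve(question, LIBRARY)
print(sources)  # ['Rule A']
print(build_prompt(question, LIBRARY, sources))
```

Everything up to the prompt is grounded in real text. What the model writes in response is not guaranteed to match it, which is where a real source can end up attached to a wrong claim.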

What do these hallucinations actually look like?

They are rarely an obvious fake, and the worst kind is the one that reads correctly. They come in three shapes. First, outright fabrication: a case or citation that does not exist. Second, mischaracterization: a real case cited for a point it never made. Third, misapplied authority: a real, correctly described case that does not fit the question, because it comes from the wrong context or has been overtaken. The invented case is easy to catch once you look. The confident, cleanly formatted answer that reads exactly like good law, and is wrong, is the one that slips into a filing.

Why are lawyers starting to blame the AI vendors?

Because the marketing promised more than the tools deliver. One vendor advertised "100% hallucination-free linked legal citations" and later narrowed that to mean only the links, conceding no tool can guarantee full accuracy. Another said its product avoids hallucinations by relying on trusted content. After the Stanford results, those claims looked overstated, and as Law360 has reported, attorneys hitting bad output are now pointing at the vendors. The hard part is that the penalty does not land on the vendor. It lands on the lawyer who signed and filed the brief. Courts have suspended and fined lawyers for submitting fabricated AI citations, and the pattern is not slowing. For the background, JurisLabs has written about the AI mistake getting lawyers sanctioned and what Mata v. Avianca actually involved.

What actually lowers the risk when you set these tools up?

Configuration and a verification step do more for your risk than which brand you pick. In our experience configuring AI inside working firms, three moves matter more than the logo on the tool:

Ground it in a known source set. Point the tool at your own documents and a defined library instead of the open web, so it has less room to invent.
Keep answers scoped and short. Longer output carries more chances to be wrong; Stanford found the tool that wrote longer answers also hallucinated more, so ask narrow questions and resist the long essay.
Verify every citation before it leaves the building. Build a check step into the workflow so no AI-generated case or quote reaches a filing until a person has confirmed it says what the draft claims.
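
The check step in the third move can be as simple as a list that blocks filing until every citation has a human sign-off. A minimal sketch in Python, under loud assumptions: the pattern below only recognizes a few reporter formats, the function names are hypothetical, and the citations in the example are placeholder numbers, not real authorities. The script does not verify anything itself. It only tracks what a person has confirmed.

```python
import re

# Simplified pattern for citations like "111 F.3d 222" or "333 U.S. 444".
# An assumption for illustration; real citation formats are far more varied.
CITATION_RE = re.compile(r"\b\d{1,4} (?:U\.S\.|F\.2d|F\.3d|F\.4th) \d{1,5}\b")

def extract_citations(draft):
    """Return each distinct citation in the draft, in order of first appearance."""
    seen = []
    for match in CITATION_RE.findall(draft):
        if match not in seen:
            seen.append(match)
    return seen

def unverified(draft, verified):
    """List citations that no person has yet confirmed against the source."""
    return [c for c in extract_citations(draft) if c not in verified]

def ready_to_file(draft, verified):
    """True only when every citation found in the draft has been signed off."""
    return not unverified(draft, verified)

# Placeholder citations, not real authorities.
draft = "See 111 F.3d 222. But see 333 U.S. 444; accord 111 F.3d 222."
signed_off = {"111 F.3d 222"}

print(unverified(draft, signed_off))     # ['333 U.S. 444']
print(ready_to_file(draft, signed_off))  # False
```

A pattern like this will miss citations it does not recognize, so it is a floor, not a substitute for a person reading the draft against the sources.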

The "zero hallucinations" you sometimes see claimed is the result of that discipline, not a switch you turn on. It is also why the safe choice depends as much on setup as on the tool, which JurisLabs covers in which AI tools are actually safe for a law firm to use.

So should a small firm use legal AI at all?

Yes, with eyes open. These tools genuinely save time on research and drafting, and they often surface issues a tired reader would miss. The mistake is trusting the answer because the box says it was made for law. Treat the output as a fast first draft, verify every citation before it goes anywhere, and set the tool up so client files are handled safely from the start. Done that way, the hallucination rate stops being a threat to your name and becomes a known limit you manage. If you want to see how your current tools are configured, and where a verification step belongs in your workflow, book a 20-minute call.
