How to Ground an AI in Primary Legislation
General AI models invent the law 58% to 88% of the time (Stanford, 2024). Grounding fixes most of it. A step-by-step guide to grounding an AI in primary legislation.
- AI
- RAG
- Legislation
- WHS
- Grounding
If you ask a general AI model a specific question about the law, it will often answer confidently and be wrong. Not vaguely wrong, but specifically wrong: a section number that doesn't exist, a case that was never decided, a duty the Act doesn't contain. Stanford researchers measured this. General large language models hallucinated on verifiable legal questions between 58% and 88% of the time (Stanford RegLab, Large Legal Fictions, 2024). The fix isn't a better model. It's grounding: making the model reason from the actual text of the law instead of its memory of it.
I build AI tools for safety work, including a skill that encodes the model WHS Act 2011 and HSWA 2015 so a model can answer from the legislation rather than guess at it. I wrote up that build separately in encoding the WHS Act into an AI skill. This piece is the generalisable method underneath it: how to ground an AI in primary legislation, whatever the Act, and why you still have to check its work.
Why can't you just ask ChatGPT about the law?
Because a general model doesn't store the law; it stores the shape of it. It has read enough legislation to produce text that looks exactly like a statute, which is precisely the danger: the output is fluent, formatted and plausible, and a good share of it is invented. The Stanford team found models hallucinate about a court's core holding at least 75% of the time, and the headline rate ran from 58% for GPT-4 to 88% for Llama 2 (Stanford RegLab, 2024).
| Model | Hallucination rate on verifiable legal questions |
|---|---|
| GPT-4 | 58% |
| GPT-3.5 | 69% |
| Llama 2 (70B) | 88% |
This isn't a hypothetical risk. In 2023, a New York court sanctioned two lawyers and their firm, fining them US$5,000, after they filed a brief full of cases ChatGPT had fabricated (ABA Journal, 2023). The problem has only grown since. By mid-2026, an independent database had logged more than 1,600 court cases worldwide involving AI-hallucinated content (Damien Charlotin, AI Hallucination Cases Database, 2026). The model isn't lying. It has no concept of the difference between a real section and a convincing one, and that is exactly why you cannot let it work from memory.
What does it mean to ground an AI in legislation?
Grounding means retrieving the actual text of the law and putting it in front of the model so it answers from that text, not its training data. The technique is retrieval-augmented generation, or RAG, introduced by Lewis and colleagues in 2020, and it's now the standard way to make a model answer from authoritative documents (Lewis et al., 2020). Instead of asking "what does the model remember about section 19?", you fetch section 19, paste it into the prompt, and ask the model to reason from what's in front of it.
The effect is large when you do it well. In one study, giving a model the single document that contained the answer lifted its accuracy from 56% to 88% (Liu et al., Lost in the Middle, 2023). That is the whole bet of grounding: the model is a capable reader and a poor rememberer, so stop asking it to remember. The four steps below are how you actually do that with a piece of legislation, and the last one, verification, is the step most people skip.
Step 1: Start from the authoritative primary source
Ground against the official text, not the first PDF a search throws up. This matters more for law than almost anything else, because an out-of-date or unofficial copy will ground your model in the wrong rule. In Australia, the Federal Register of Legislation is the approved whole-of-government source, and its authorised versions are taken to be reliable by a court or tribunal unless proven otherwise (Federal Register of Legislation, 2026). That reliability is the property you want your AI to inherit.
| Jurisdiction | Where the authoritative text lives |
|---|---|
| Commonwealth (Australia) | Federal Register of Legislation (legislation.gov.au), authorised PDF versions |
| NSW and other states | The state register, for example legislation.nsw.gov.au, with point-in-time versions |
| Model WHS laws | Safe Work Australia (model Act and Regulations, not law until a jurisdiction adopts them) |
| New Zealand | New Zealand Legislation (legislation.govt.nz), for example the HSWA 2015 |
One subtlety the law adds that a normal document doesn't: it changes over time. A register publishes compilations, each showing the law as amended at a particular point in time. So ground against the version that was in force for the question you're answering, and record that version date, because "what did section 19 say" has a different answer depending on when you ask.
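A minimal sketch of picking the version in force, assuming you have already pulled the list of compilations for an Act from the register, each with a start date and an end date. `Compilation` and `in_force_on` are hypothetical names for illustration, not part of any register's API, and the dates in the usage lines are invented.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Compilation:
    """One point-in-time version of an Act (hypothetical record shape)."""
    act: str
    start: date            # first day this compilation is in force
    end: Optional[date]    # last day in force; None means it is still current


def in_force_on(compilations: list[Compilation], when: date) -> Compilation:
    """Return the compilation in force on `when`, or raise if none covers that date."""
    for c in compilations:
        if c.start <= when and (c.end is None or when <= c.end):
            return c
    raise LookupError(f"No compilation in force on {when.isoformat()}")


# Invented dates, for illustration only.
versions = [
    Compilation("Example Act", date(2020, 1, 1), date(2022, 6, 30)),
    Compilation("Example Act", date(2022, 7, 1), None),
]
print(in_force_on(versions, date(2021, 5, 1)).start)  # 2020-01-01
```

Raising instead of silently falling back to the latest version is deliberate: a question about a date the register doesn't cover should stop the pipeline, not ground the model in the wrong rule.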
Step 2: Chunk the law by provision, and keep the citation
Split the text by section or clause, not by page or token count. A retrieval system breaks a document into chunks it can search, and for legislation the natural unit is the provision: a section, a subsection, a regulation. Chunk it that way and each retrieved piece is a self-contained unit of law that still makes sense on its own. Chunk it by arbitrary length and you'll slice section 19(3) in half, retrieve the back end, and ground the model in a fragment.
Carry the citation with every chunk as metadata: the Act, the section number, and the version date from Step 1. This is the quiet step that makes the whole thing auditable later. When the model cites section 19(3)(c1), you want to trace that answer straight back to the exact chunk it came from, and the exact authorised text behind that chunk. No metadata, no audit trail.
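A minimal sketch of provision-level chunking, assuming the Act has been exported to plain text with each section starting on its own line as a number followed by its heading (for example `19 Primary duty of care`). Real register exports differ, so the `SECTION_START` pattern is an assumption to adjust, and `Chunk` and `chunk_by_section` are hypothetical names. It splits at section level; the same idea extends to subsections.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """One provision plus the citation metadata that travels with it."""
    act: str            # the Act's title
    section: str        # section number as printed, e.g. "19"
    heading: str        # section heading
    version_date: str   # compilation date from Step 1 (ISO format)
    text: str           # full text of the provision


# Assumed layout: a section starts on its own line as "<number> <heading>".
# Subsection lines such as "(1) ..." start with "(" and so do not match.
SECTION_START = re.compile(r"^(\d+[A-Z]*) +(\S.*)$", re.MULTILINE)


def chunk_by_section(act: str, version_date: str, raw: str) -> list[Chunk]:
    """Split an Act's text into one chunk per section, each carrying its citation."""
    matches = list(SECTION_START.finditer(raw))
    chunks = []
    for i, m in enumerate(matches):
        # A section runs until the next section heading, or the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(raw)
        chunks.append(Chunk(act, m.group(1), m.group(2).strip(),
                            version_date, raw[m.start():end].strip()))
    return chunks


# Invented sample text in the assumed layout, not the wording of any real Act.
sample = """18 What is reasonably practicable
In this Act, reasonably practicable means ...
19 Primary duty of care
(1) A duty holder must ensure ...
(3) Without limiting subsection (1) ...
"""
for c in chunk_by_section("Example Act", "2024-01-01", sample):
    print(c.act, c.section, c.heading, c.version_date)
```

Because every chunk carries its Act, section number and version date, anything retrieved later can be cited and traced without a second lookup.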
Step 3: Retrieve the right provisions, and make the model quote them
Now do the actual grounding: retrieve the provisions relevant to the question and instruct the model to answer only from them. Retrieval usually runs on embeddings or search, pulling the handful of sections that match the query. Then the prompt does the disciplining. Tell the model to quote the provision, cite it, and reason from the quoted text, and to say plainly when the answer isn't in what it was given.
```text
You are answering strictly from the legislation provided below.

Rules:
1. Use ONLY the sections provided. Do not rely on prior knowledge of the law.
2. Quote the exact words of the provision you rely on, and cite the section.
3. If the provided sections do not answer the question, say:
   "That is not addressed in the provided text." Do not guess.

Provided sections:
[retrieved provisions, each with Act, section number and version date]

Question: [the user's question]
```

That last rule is the one people skip, and it's the most important. A model that will say "not in the provided text" is a model that has stopped inventing. Watch the volume of context too, because models read unevenly: accuracy degrades when the relevant passage is buried in the middle of a long input, the effect Liu and colleagues named "lost in the middle" (Liu et al., 2023). Retrieve the few right provisions, not the whole Act.
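A minimal sketch of the retrieve-then-prompt step. For brevity it swaps in crude keyword overlap where a real system would use embeddings or a search index, and it caps retrieval at a small `k` to keep the context short. `Provision`, `retrieve` and `build_prompt` are hypothetical names; the rules text is the prompt above.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Provision:
    """One retrieved provision with its citation metadata (hypothetical record)."""
    act: str
    section: str
    version_date: str
    text: str


RULES = """You are answering strictly from the legislation provided below.

Rules:
1. Use ONLY the sections provided. Do not rely on prior knowledge of the law.
2. Quote the exact words of the provision you rely on, and cite the section.
3. If the provided sections do not answer the question, say:
   "That is not addressed in the provided text." Do not guess."""


def _terms(text: str) -> set[str]:
    """Lower-cased word set: a crude stand-in for embeddings."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, provisions: list[Provision], k: int = 3) -> list[Provision]:
    """Return up to k provisions ranked by word overlap with the question,
    dropping any that share no words with it."""
    q = _terms(question)
    scored = [(len(q & _terms(p.text)), p) for p in provisions]
    ranked = sorted((sp for sp in scored if sp[0] > 0),
                    key=lambda sp: sp[0], reverse=True)
    return [p for _, p in ranked[:k]]


def build_prompt(question: str, provisions: list[Provision], k: int = 3) -> str:
    """Rules first, then the few retrieved provisions with citations, then the question."""
    hits = retrieve(question, provisions, k)
    cited = [f"[{p.act}, s {p.section}, version {p.version_date}]\n{p.text}"
             for p in hits]
    return (RULES + "\n\nProvided sections:\n" + "\n\n".join(cited)
            + f"\n\nQuestion: {question}")
```

Common words like "a" and "the" inflate the overlap score, which is one reason a production system would use embeddings or a proper search index. The shape of the prompt is the part that carries over.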
Step 4: Verify every citation against the source
Check each cited provision against the authorised text, every time, because grounding reduces hallucination without removing it. This is the step the vendors wish you'd skip. When Stanford tested commercial legal AI tools grounded with retrieval, tools marketed as avoiding hallucination, it found those claims overstated: the tools still hallucinated between 17% and 33% of the time (Stanford RegLab and HAI, Hallucination-Free?, 2024).
| Tool | Hallucination rate |
|---|---|
| Lexis+ AI | 17% |
| Ask Practical Law AI | 17% |
| Westlaw AI-Assisted Research | 33% |
So verification can't rest on the model's own word. The cheap, reliable check is to confirm that every section the model cites actually exists in the authorised text and actually says what the answer claims. You can automate the existence check and the quote match. It's the same discipline I built into my open-data dashboard: the machine can describe the source, but it can never quietly invent a figure, because every figure is matched against the data before it is published. A citation that can't be traced to the source gets discarded, not shipped.
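A minimal sketch of those two automated checks, assuming the answer cites sections as `s 19` or `section 19` and puts quoted wording in double quotes, and that you hold the authorised text keyed by section number. `verify_answer` is a hypothetical helper. It checks existence and verbatim quotes only; whether the reasoning from those quotes is sound still needs a person.

```python
import re

# Assumed citation style: "s 19", "s19" or "section 19"; anything after the
# number, such as "(3)", is ignored.
CITE = re.compile(r"\b(?:[Ss]ection|[Ss])\s*(\d+[A-Z]*)")
# Assumed quoting style: straight double quotes (curly quotes are normalised first).
QUOTE = re.compile(r'"([^"]+)"')


def _norm(s: str) -> str:
    """Unify curly quotes and collapse whitespace so layout differences
    don't cause false failures."""
    s = s.replace("\u201c", '"').replace("\u201d", '"')
    return " ".join(s.split())


def verify_answer(answer: str, authorised: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means every cited section exists
    and every quote appears verbatim in one of the cited sections.

    `authorised` maps section number -> authorised text of that section.
    """
    answer = _norm(answer)
    cited = set(CITE.findall(answer))
    problems = [f"cited section {s} does not exist in the source"
                for s in sorted(cited) if s not in authorised]
    for quote in QUOTE.findall(answer):
        if not any(quote in _norm(authorised[s]) for s in cited if s in authorised):
            problems.append(f"quote not found in cited sections: {quote!r}")
    if not cited:
        problems.append("answer cites no section")
    return problems


# Invented sample text, not the authorised wording of any Act.
authorised = {"19": "A duty holder must ensure the health and safety of workers."}
print(verify_answer('Under s 19, a duty holder "must ensure the health and safety of workers".', authorised))  # []
print(verify_answer('Under s 99, a duty holder "must guarantee absolute safety".', authorised))  # two problems
```

Any non-empty result means the answer is discarded or sent back, never shipped as-is.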
Where grounding still falls short
Grounding is a strong control, not a finished one, so a competent person stays on the legal call. Even in the narrow task of summarising a document you hand it, the best models still fabricate a few percent of the time, and many sit above 10% (Vectara Hallucination Leaderboard, 2025). Three failure modes survive grounding, and you manage each the way you'd manage any control.
Retrieval can fetch the wrong provision, so the model reasons faithfully from text that doesn't answer the question. The model can misread or over-generalise from a section that is correct. And the law moves underneath you, so a perfectly grounded answer goes stale when the Act is amended, which is why the version date from Step 1 isn't optional. None of these are reasons not to ground. They're reasons to keep a human who knows the law between the output and the decision.
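A minimal sketch of a staleness check, assuming each stored answer records the start date of the compilation it was grounded on and that you periodically pull the list of compilation start dates from the register. `superseded_by` is a hypothetical helper and the dates below are invented.

```python
from datetime import date
from typing import Optional


def superseded_by(grounded_on: date, compilation_starts: list[date]) -> Optional[date]:
    """Return the start date of the first compilation newer than the one an answer
    was grounded on, or None if that compilation is still the latest."""
    newer = sorted(d for d in compilation_starts if d > grounded_on)
    return newer[0] if newer else None


# Invented dates, for illustration only.
starts = [date(2020, 1, 1), date(2022, 7, 1), date(2025, 3, 1)]
print(superseded_by(date(2022, 7, 1), starts))  # 2025-03-01
print(superseded_by(date(2025, 3, 1), starts))  # None
```

A non-None result doesn't mean the answer is wrong, only that the Act has moved since it was written and a person should re-check it against the current text.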
This is the same line I hold everywhere AI meets safety, set out in the field guide: AI belongs around the decision, not on it. Grounding doesn't transfer the duty to be right about the law to the machine. It just makes the machine a far more useful, and far more honest, research assistant.
Ground it, then check it
Grounding is the difference between an AI that performs legal knowledge and one that works from the law. The method is not exotic: get the authorised text, split it by provision, retrieve the right sections, make the model quote and cite them, and verify every citation against the source. Do that and you turn a confident fabricator into a careful reader. Skip the verification and you've just made the fabrication harder to spot.
If you want the worked build behind this, see how I encoded the WHS Act 2011 and HSWA 2015 into an AI skill. For where this fits in a wider safety practice, the field guide covers the principle and AI-assisted SWMS and risk assessments applies it to the highest-volume document on any site. If you're grounding an AI in your own legislation or standards, reach out. I'm always happy to compare notes.
Frequently asked questions
- Why does AI make up laws and cases?
- Because a general model answers from a statistical memory of its training data, not from the text of the law. It has learned what legislation sounds like, so it produces plausible section numbers and case names that do not exist. Stanford found general LLMs hallucinate on verifiable legal questions 58% to 88% of the time.
- What is grounding for legal AI?
- Grounding, usually via retrieval-augmented generation (RAG), means retrieving the relevant text of the law and placing it in the model's context so it answers from that source rather than its memory. The technique was introduced by Lewis and colleagues in 2020 and is now the standard way to make AI answer from authoritative documents.
- Does grounding stop AI hallucinating about the law?
- It reduces it sharply but does not eliminate it. In a 2024 Stanford study, purpose-built legal AI tools grounded with retrieval still hallucinated 17% to 33% of the time, despite vendor claims of being hallucination-free. Treat grounding as a strong control, not a guarantee, and verify every citation.
- Where do I get the authoritative text of a law to ground an AI?
- Use the official register, not a random PDF. In Australia, the Federal Register of Legislation is the approved whole-of-government source, and its authorised versions are treated as reliable by a court unless proven otherwise. Use the point-in-time version that was in force for the question you are asking.
- Can I trust an AI legal tool that says it is hallucination-free?
- Be sceptical. When Stanford tested tools that marketed themselves as avoiding hallucinations, it found the claims were overstated, with hallucination rates of 17% to 33%. A vendor saying 100% hallucination-free is making a marketing claim, not a measured one. Ask how it grounds answers and verify the output yourself.