May 31, 2026
What does it actually mean when an AI tool says it won't train on your data?
When an AI vendor says it will not train on your data, that sentence is almost always true and almost always incomplete. It is true for one account tier, under one contract, with one setting turned the right way. The same brand will happily train on your data on the tier most lawyers are actually using, and nothing on the screen will tell you which deal you are in.
The same brand sells you two completely different deals
Microsoft sells you Copilot twice. Its own privacy statement says of the consumer version: "In certain markets, we use conversation data to train the generative AI models in Copilot, unless you choose to opt-out of such training." In those markets, that is a default-on training posture for the tier you reach by signing in with a personal account. The enterprise version is governed by a separate contract, and the same document says so plainly: "In the event of a conflict between this Microsoft privacy statement and the terms of any agreement(s) between a customer and Microsoft for Enterprise and Developer Products, the terms of those agreement(s) will control."
Read those two lines together and the marketing claim comes into focus. "We do not train on your data" describes the enterprise deal. The consumer login you opened in thirty seconds is the other deal. They are the same brand and they are not the same product.
"Won't train on your data" is a contract, not a personality trait
The protection you are counting on is a clause, and it only protects you where the clause applies. Under OpenAI's data processing addendum, "OpenAI acts as a Data Processor on the Customer's behalf" and "will process Customer Data only in accordance with Customer Instructions." That is a real and strong commitment. It lives in the contract that governs the business and API products. It is not a property of the word OpenAI. Type the same client sentences into ChatGPT's consumer app, with no such contract attached to your account, and you are in a different regime, one the clause never reaches.
So the promise is only as good as two things: the document it lives in, and whether that document governs the account you are actually logged into.
Why the structure matters more than the brand
The risk here is not that any one tool is sinister. It is structural. The American Bar Association's first formal ethics opinion on these tools describes them this way: "GAI tools that produce new text are prediction tools that generate a statistically probable output when prompted," and notes that some are "described as self-learning, meaning they will learn from themselves as they cull more data." A self-learning tool improving itself on your inputs is the default business model, not a malfunction. The vendor turns it off for the customers who pay for it to be off.
That means the sentence "we won't train on your data" carries three hidden conditions every time: which tier you are on, which contract governs that tier, and which settings are switched on inside it. Miss any one and the promise on the homepage was never describing you.
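For readers who think in checklists, the three conditions can be written down as a small Python sketch. Everything in it is an illustrative assumption: the AccountSetup record, its field names, and the unmet_conditions helper are hypothetical and do not correspond to any vendor's API or settings screen. The only point it makes is that the promise holds when all three answers are yes, and fails if any one is no.

```python
from dataclasses import dataclass


@dataclass
class AccountSetup:
    """Hypothetical record of what you found when you checked one account."""
    on_covered_tier: bool           # the account is on the tier the promise describes
    contract_governs_account: bool  # a no-training contract actually governs this login
    settings_confirmed: bool        # the relevant in-product settings were checked


def unmet_conditions(setup: AccountSetup) -> list[str]:
    """Return the names of the conditions that are not satisfied."""
    unmet = []
    if not setup.on_covered_tier:
        unmet.append("tier")
    if not setup.contract_governs_account:
        unmet.append("contract")
    if not setup.settings_confirmed:
        unmet.append("settings")
    return unmet


def promise_applies(setup: AccountSetup) -> bool:
    """The homepage promise describes you only if nothing is unmet."""
    return not unmet_conditions(setup)


# Example: right tier, contract in place, but nobody ever looked at the settings.
firm_account = AccountSetup(
    on_covered_tier=True,
    contract_governs_account=True,
    settings_confirmed=False,
)
print(unmet_conditions(firm_account))  # ['settings']
print(promise_applies(firm_account))   # False
```

The sketch returns the list of unmet conditions rather than a bare yes or no, because the useful answer is which condition to go and fix.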
What this looks like when it goes wrong
A lawyer reads "we do not train on your data" on a vendor's homepage, feels reassured, and drafts a sensitive client letter in the free app they already had open. The homepage was describing the enterprise product. The free app was the consumer deal, the one with training on by default. Nothing on the screen flagged the difference, because the two products look identical while you type. The exposure does not announce itself. It just happens, quietly, on the tier the firm never meant to be using for real work.
The part you actually have to check
None of this is a reason to avoid AI. The protections are real. They are just conditional, and confirming the condition applies to your account is a short investigation the marketing line will never do for you: which tier you are on, which contract governs it, and the one or two settings underneath. That check is not hard. It is a step almost nobody takes, because the reassuring sentence felt like the answer.
The vendors are not lying. They are describing their best product and letting you assume it is the one you are using. Closing the gap between that promise and your actual setup, tool by tool, is exactly what a JurisLabs configuration audit does: it reads which deal you are really on and tells you where your client data is going. If you have ever pasted client text into an AI tool and trusted a line on a homepage, that is the thing worth checking. The first call is free.