GitHub Copilot Has a Class System for Privacy

GitHub’s April 24 policy update did not introduce a subtle tweak. It drew a hard line through the product. If you use Copilot as an individual on Free, Pro, or Pro+, GitHub may use your inputs, outputs, code snippets, and associated context to train its AI models unless you opt out. If you are on Business or Enterprise, GitHub says your data is protected by contract and is not used for model training.

That is a class system. Same product family. Different privacy boundary.

The technical detail that matters is the phrase “associated context”. Copilot does not work from a neat prompt box. It works from whatever you had open, selected, pasted, referenced, or discussed while asking for help. That can include private implementation details, stack traces, file names, inline comments, security notes, schema fragments, and code that never leaves your laptop in any deliberate publish step. In 2026, the context window is the product. GitHub decided that for individual plans, that context window is also training exhaust.

GitHub’s blog post frames this as normal industry practice and says internal use of Microsoft employee interaction data improved acceptance rates across multiple languages. Fine. That is a product argument. The architectural argument is different. Once a tool consumes prompts plus local code context plus generated output plus feedback loops, it stops looking like autocomplete and starts looking like a telemetry pipeline attached to your editor.

GitHub’s own docs make the split explicit.

For individual subscribers, the setting is called “Allow GitHub to use my data for AI model training”. Starting April 24, 2026, users on Free, Pro, and Pro+ are included unless they disable it. For Business and Enterprise, GitHub says customer data is not used for model training and points to the Data Protection Agreement as the boundary.

That split is the whole story. The premium feature is not better completions. The premium feature is a lawyer.

The repo-at-rest distinction is technically true and strategically slippery

The Hacker News thread on the announcement surfaced GitHub’s most important clarification: GitHub says it is not training on private repositories “at rest”. It is training on interaction data produced while you are actively using Copilot, unless your plan or settings block that use. That distinction is technically real. It is also where the product language starts doing legal work.

A private repository at rest is a directory tree on disk or a hosted repo in storage. Interaction data is what happens when you ask Copilot to explain a function, refactor a test, summarize a stack trace, or generate a migration. If the model sees a few hundred lines of code to answer that request, the difference between “repo data” and “interaction data” matters to counsel and procurement. It matters a lot less to the person whose proprietary code was inside the prompt path five seconds earlier.

This is the same move old desktop software used in the 2000s when “anonymous diagnostics” turned into product analytics by another name. The box on the installer said crash reporting. The vendor got a running transcript of how the software was used in the wild. Copilot is doing the editor-era version of that pattern, except the payload is source code context instead of stack frames.

GitHub’s control plane tells you who the real customer is

Read the docs instead of the announcement copy. The docs are where products confess.

The content exclusion feature is available for Copilot Business and Copilot Enterprise. Repository admins, organization owners, and enterprise owners can exclude files, directories, file types, and filesystem paths. That is what a real control surface looks like.

Then the same page drops the line that should stop anyone cold: content exclusion does not apply to GitHub Copilot CLI, Copilot cloud agent, or Agent mode in Copilot Chat in IDEs.

That means the higher-autonomy surfaces are the ones where the exclusion model falls apart.
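To make the gap concrete, here is a minimal sketch of the behavior the docs describe. It is an illustration, not GitHub's implementation: the surface labels, the `is_visible` function, and the use of glob patterns through `fnmatch` are all assumptions made for the example.

```python
from fnmatch import fnmatch

# Hypothetical labels for the three surfaces the docs list as not covered
# by content exclusion (CLI, cloud agent, Agent mode in IDE chat).
SURFACES_IGNORING_EXCLUSION = {"copilot_cli", "cloud_agent", "ide_agent_mode"}


def is_visible(path: str, surface: str, excluded_patterns: list[str]) -> bool:
    """Return True if `path` can reach the model on the given surface."""
    if surface in SURFACES_IGNORING_EXCLUSION:
        # The exclusion rules are never consulted on these surfaces.
        return True
    # Everywhere else, a path is hidden if any exclusion pattern matches it.
    return not any(fnmatch(path, pattern) for pattern in excluded_patterns)


rules = ["secrets/*", "*.pem"]  # example patterns, not real policy syntax
# "code_completion" is a made-up label for a surface that honors exclusion.
print(is_visible("secrets/prod.env", "code_completion", rules))  # False
print(is_visible("secrets/prod.env", "ide_agent_mode", rules))   # True
```

The same file and the same rules produce opposite answers depending only on which surface asks.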

This matters because the individual subscriber settings page says Copilot cloud agent is enabled in all repositories by default for personal accounts unless you block it. The same page says third-party coding agents can access the same repositories that Copilot cloud agent can access.

Put those pieces together and the shape of the system is obvious:

Individual plans default into model training unless the user opts out
Personal repos default into cloud agent availability unless the user opts out
Third-party coding agents inherit that repository scope
Exclusion controls do not cover CLI, cloud agent, or IDE agent mode

That is not a bug. That is product segmentation.

Enterprise customers get governance because enterprise customers bring procurement, security review, and renewal leverage. Individual developers get a privacy toggle buried in settings and blog language about helping improve the model for everyone.
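The training half of that segmentation fits in a few lines. This is a sketch of the policy as the article reads it, assuming hypothetical names (`Exposure`, `default_exposure`) and lowercase plan labels; it is not an API GitHub exposes.

```python
from dataclasses import dataclass

INDIVIDUAL_PLANS = {"free", "pro", "pro+"}
CONTRACT_PLANS = {"business", "enterprise"}


@dataclass(frozen=True)
class Exposure:
    used_for_training: bool
    governed_by: str


def default_exposure(plan: str, opted_out: bool = False) -> Exposure:
    """Model who decides whether interaction data feeds training."""
    plan = plan.lower()
    if plan in CONTRACT_PLANS:
        # The contract sets the boundary; no user action is needed.
        return Exposure(False, "data protection agreement")
    if plan in INDIVIDUAL_PLANS:
        # Included by default; only an explicit opt-out changes it.
        return Exposure(not opted_out, "user setting")
    raise ValueError(f"unknown plan: {plan!r}")


print(default_exposure("Pro"))
print(default_exposure("Pro", opted_out=True))
print(default_exposure("Enterprise"))
```

The asymmetry is in the `opted_out` parameter: it matters on one branch and is ignored on the other.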

The 150-character rule tells you where GitHub still acts cautious

GitHub’s personal policy page says that when public code matching is blocked, most Copilot products compare suggestions plus about 150 characters of surrounding code against public GitHub code and suppress near-matches.

That number is a useful tell. GitHub knows context matters. It knows small windows of adjacent code are enough to determine whether an output is too close to existing source. The company is willing to use surrounding context aggressively when the issue is copyright risk. It is equally willing to describe surrounding context vaguely when the issue is training intake.
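GitHub does not publish the matching algorithm, so the following is only a rough sketch of what a filter of this shape could look like. The even split of the 150 characters before and after the suggestion, the whitespace normalization, and the substring test standing in for "near-match" are all assumptions, as are the names `normalize` and `should_suppress`.

```python
import re

CONTEXT_CHARS = 150  # figure from the policy page; the even split below is assumed


def normalize(text: str) -> str:
    """Collapse whitespace runs so formatting differences do not hide a match."""
    return re.sub(r"\s+", " ", text).strip()


def should_suppress(before: str, suggestion: str, after: str,
                    public_corpus: list[str]) -> bool:
    """Return True if the suggestion plus nearby code appears in the corpus."""
    half = CONTEXT_CHARS // 2
    window = normalize(before[-half:] + suggestion + after[:half])
    if not window:
        return False  # nothing to compare
    return any(window in normalize(doc) for doc in public_corpus)


corpus = ["def add(a, b):\n    return a + b\n"]
print(should_suppress("def add(a, b):\n", "    return a + b", "\n", corpus))  # True
print(should_suppress("def add(x, y):\n", "    return x + y", "\n", corpus))  # False
```

Even this toy version shows why the surrounding code matters: the suggestion alone is too short to judge, and the window around it is what turns a generic line into a recognizable match.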

That asymmetry is the product strategy. Precision for liability. Abstraction for collection.

The real product being sold is workflow capture

The training value is not in isolated snippets. Public GitHub already has no shortage of snippets. The valuable part is sequence.

What file was open when the user asked for help. What they highlighted. What error they were looking at. What completion they accepted. What they rejected. What they rewrote one minute later. What the final version looked like after three attempts.

That is developer behavior data. It is closer to session replay for programming than to an old autocomplete corpus.
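One way to picture that sequence is as an ordered event log. The schema below is hypothetical; nothing here is a claim about what GitHub actually stores. It only shows why an ordered session carries more signal than any single snippet in it.

```python
from dataclasses import dataclass, field
from typing import Literal

EventKind = Literal["prompt", "suggestion", "accept", "reject", "edit"]


@dataclass
class Event:
    kind: EventKind
    seconds: float     # offset from the start of the session
    payload: str = ""  # code or text involved, if any


@dataclass
class Session:
    open_file: str
    events: list[Event] = field(default_factory=list)

    def correction_pairs(self) -> list[tuple[str, str]]:
        """Pair each suggestion with the first user edit that follows it."""
        pairs: list[tuple[str, str]] = []
        pending = None
        for event in sorted(self.events, key=lambda e: e.seconds):
            if event.kind == "suggestion":
                pending = event.payload
            elif event.kind == "edit" and pending is not None:
                pairs.append((pending, event.payload))
                pending = None
        return pairs


session = Session("billing/invoice.py", [
    Event("prompt", 0.0, "round totals to cents"),
    Event("suggestion", 1.2, "total = round(total)"),
    Event("accept", 2.0),
    Event("edit", 61.0, "total = round(total, 2)"),
])
print(session.correction_pairs())
```

The output of `correction_pairs` is the part no public repository contains: what the model proposed, next to what an expert changed it to a minute later.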

GitHub more or less says this out loud. The company wants real-world interaction data so models can better understand development workflows. Of course it does. The future moat is not raw code. It is traces of expert correction under real constraints.

This is why the plan split matters so much. Enterprise customers are expensive and politically sensitive, so their workflow capture is fenced off. Individual developers are a giant behavioral dataset with much weaker negotiating power.

If you learned to code during the browser toolbar years, this structure feels familiar. The free tier got the defaults that monetized behavior. The paid tier got the admin console. GitHub did the 2026 version with copilots, agents, and model training settings.

What to do if you use Copilot without an enterprise wrapper

Treat Copilot like any other hosted system that can see your working set.

First, disable training if you are on an individual plan and do not want your interaction data reused. The setting lives at github.com/settings/copilot/features.

Second, do not assume repository privacy equals prompt privacy. If you paste code into chat, ask for a refactor across proprietary modules, or hand the cloud agent a task, you have moved from storage concerns into interaction concerns.

Third, understand that exclusion controls are not universal. GitHub says CLI, cloud agent, and IDE Agent mode do not support content exclusion. Those are exactly the surfaces people use when they want the model to operate with the most context and autonomy.

Fourth, decide whether individual Copilot belongs anywhere near regulated, client-confidential, or security-sensitive work. The docs already answered that question for enterprises by building a different contract class around those cases.

GitHub did not discover a new truth on April 24. It formalized the old one. AI coding tools are not selling completions. They are buying workflow data, then reselling the improved system back to you. Business and Enterprise customers get to refuse the trade by default. Everybody else gets a dropdown.