Why this is different from other AI questions

Most of the AI conversation in the mission-driven sector focuses on productivity: saving time on writing, streamlining reports, automating administrative tasks. Those are real benefits, and they're worth pursuing. But when your organization serves people in vulnerable circumstances, there's a category of AI risk that productivity-focused conversations tend to skip past.

The people your organization serves often share information they wouldn't share with anyone else. Housing status, immigration details, health conditions, substance use history, domestic violence situations, financial distress. This information is shared in confidence, frequently under legal protection, and sometimes at genuine personal risk to the person disclosing it. When that information enters an AI tool, the question of where it goes and who can access it is not a theoretical concern. It is the concern.

What happens to data in AI tools

Understanding the data question requires knowing a few things about how AI tools handle inputs.

Free tiers of most AI tools may use your inputs for training by default. When you enter text into the free version of ChatGPT, for example, that input may be used to improve future models. OpenAI's terms are explicit about this. The same is generally true for free tiers of other major AI tools. This means that client information entered into a free-tier tool could, in principle, inform the tool's future responses in ways that are impossible to predict or control.

Paid tiers typically don't train on your data, but read the fine print. Most major AI providers offer business or team plans that include data processing agreements and commitments not to use your inputs for model training. These agreements matter, and they're worth the cost for any organization handling sensitive information. But the specific terms vary by provider, and "we don't train on your data" doesn't mean "we don't process or temporarily store your data." Know what you're agreeing to.

Data residency and jurisdiction matter. Where the servers are located, which country's laws govern the data, and whether the provider can be compelled to disclose information under subpoena are all relevant considerations for organizations handling legally protected data. Most major AI tools process data on US-based servers, but the specifics vary and may change.

A useful rule of thumb: if exposing the information in a data incident would trigger a breach notification, it should not go into a general-purpose AI tool, regardless of the pricing tier.

Drawing the line for your organization

The most effective approach I've seen is to establish clear categories in your AI use policy that distinguish between data types rather than trying to evaluate every possible use case individually.

Category 1: Never enters an AI tool. Client names combined with any identifying information. Case notes. Health records. Immigration status. Financial details of the people you serve. Anything covered by HIPAA, FERPA, 42 CFR Part 2, or your state's confidentiality statutes. Anything shared under a promise of confidentiality, explicit or implied.

Category 2: May enter an approved, paid-tier tool with identifying information removed. Aggregated and anonymized program data. De-identified outcome metrics. General descriptions of service patterns or community needs that couldn't be traced back to an individual. The key word here is "couldn't," not "probably wouldn't." If only a handful of people fall into a given category, the data isn't meaningfully anonymous even without names attached, because anyone who knows the community can work out who they are.

Category 3: Fine for AI tools. Internal administrative content. First drafts of communications that don't reference specific clients. Meeting notes about organizational operations. Research and planning documents. Marketing and fundraising content. General correspondence.

Most staff, once they see these categories laid out, can sort their own work into the right bucket without needing to ask. The goal is a framework simple enough to follow in the middle of a busy day, not a decision tree that requires a legal consultation for every use case.
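For teams that pull Category 2 numbers from a spreadsheet or database export, the "couldn't, not probably wouldn't" test can be partly automated. What follows is a minimal Python sketch, not a finished tool: it counts records per group and masks any group that is too small to report safely. The function name, the field names, and the threshold of 5 are all assumptions for illustration; your own policy or regulator may call for a different number.

```python
from collections import Counter

# Hypothetical threshold: "a handful" is not a number, so 5 is an
# assumption here and should be set by your own policy.
MIN_CELL_SIZE = 5


def suppress_small_cells(records, group_key, min_size=MIN_CELL_SIZE):
    """Count records per group and mask any count below min_size.

    records: iterable of dicts with names and other direct identifiers
             already removed.
    group_key: the field to aggregate on (for example, a program name).
    Returns a dict mapping each group to its count, or to the string
    "<N" when the group is too small to report safely.
    """
    counts = Counter(record[group_key] for record in records)
    return {
        group: (count if count >= min_size else f"<{min_size}")
        for group, count in counts.items()
    }


# Made-up, de-identified example rows.
rows = (
    [{"program": "housing"}] * 12
    + [{"program": "legal aid"}] * 7
    + [{"program": "youth shelter"}] * 2
)

summary = suppress_small_cells(rows, "program")
print(summary)
# {'housing': 12, 'legal aid': 7, 'youth shelter': '<5'}
```

A check like this catches the obvious cases only. It does not stop someone from subtracting the visible counts from a published total to recover a masked one, and it says nothing about free-text fields, so it supplements the category rules rather than replacing them.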

The compliance landscape

Several regulatory frameworks are directly relevant to how mission-driven organizations can use AI with client data, and the specifics depend on your work.

HIPAA applies to organizations that handle protected health information. If your organization is a covered entity or a business associate of one, entering PHI into an AI tool that doesn't have a signed Business Associate Agreement is a potential violation, regardless of the tool's privacy policy. Most general-purpose AI tools do not offer BAAs, though some enterprise-tier products are beginning to.

FERPA governs educational records. Organizations that work with schools or handle student information need to ensure that AI tools receiving that data meet FERPA's disclosure and consent requirements.

42 CFR Part 2 provides especially strict protections for substance use disorder treatment records. These records have disclosure restrictions that go beyond HIPAA, and entering them into AI tools would likely constitute an unauthorized disclosure.

State privacy laws vary widely and are changing fast. California's CCPA/CPRA, Colorado's CPA, Virginia's CDPA, and similar laws in other states may impose additional obligations depending on what data you collect and from whom. If your organization operates in multiple states, the most restrictive applicable law generally sets the floor.

None of this should be read as legal advice, and the right move is to consult with an attorney who knows your specific regulatory environment. But the general principle is clear: the legal landscape is moving toward more protection for personal data, not less, and organizations that build conservative data practices now will have less to unwind later.

What to tell your staff

The most important thing is clarity. Staff who are uncertain about the rules tend to do one of two things: avoid AI entirely (which means the organization misses real productivity gains) or use it without thinking about data boundaries (which creates risk). Neither outcome serves the organization well.

A short, focused training session that walks through your data categories with examples specific to your programs is more effective than a general policy memo. When a case manager can see a concrete example of what's okay to paste into an AI tool and what isn't, using scenarios from their own workflow, the policy becomes actionable rather than abstract.

It also helps to give staff a clear path for edge cases. "If you're not sure whether something is okay to put into an AI tool, ask [specific person] before you do" is a much more useful instruction than "use your best judgment." People's best judgment varies, and the consequences of a wrong call with sensitive data are not evenly distributed. The client whose information was shared bears the risk, not the staff member who shared it.

The self-hosted option

There's a point on the privacy spectrum that most articles about AI tools don't mention: running a model yourself. Openly available large language models like Meta's Llama and Mistral's offerings (often described as open source, though "open-weight" is the more precise term) can be installed on your own servers or a private cloud instance, which means your data never leaves infrastructure you control. No AI vendor's terms of service governing your inputs, no training on your inputs, and no open questions about data residency, because you decide where the machine sits.

For organizations handling the most sensitive categories of data, this is worth knowing about. A self-hosted model processes everything locally. Client intake summaries, case notes, program data. None of it travels to an external API. For organizations subject to strict regulatory requirements or serving populations where any external data exposure is unacceptable, this can be the difference between being able to use AI at all and not being able to.

The trade-offs are real, though. Self-hosted models require technical infrastructure that most mission-driven organizations don't have in-house: server hardware or cloud computing budget, someone who can manage the deployment, and ongoing maintenance as models are updated. The models themselves are capable but generally less polished than commercial offerings like ChatGPT or Claude, particularly for tasks that benefit from large context windows or tool integrations. And the cost structure is different: instead of a per-user subscription, you're paying for compute, which can be cheaper or more expensive depending on usage patterns.

For most organizations I work with, the honest recommendation is that a paid commercial tool with a data processing agreement covers the privacy needs well enough, and the simplicity is worth the trade-off. But for larger organizations with IT capacity, or for specific high-sensitivity use cases where no data can leave the building, self-hosted models are a legitimate option that's getting more accessible every year. It's worth knowing the option exists even if it's not the right starting point for your organization today.
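For teams with the IT capacity to experiment, here is a minimal Python sketch of what "no data can leave the building" can look like in practice: a guard that refuses to build a request unless the model endpoint is on the same machine. Everything specific in it is an assumption rather than a recommendation. The URL, port, and model name are placeholders, and the request shape follows the one used by Ollama's local HTTP API, which is one common way to run these models but not the only one.

```python
import ipaddress
import json
import urllib.request
from urllib.parse import urlparse

# Assumption: a model server running on this machine. The URL and model
# name are placeholders for illustration only.
LOCAL_ENDPOINT = "http://127.0.0.1:11434/api/generate"
MODEL_NAME = "llama3"


def is_loopback(url):
    """Return True only if the URL's host is this machine (loopback)."""
    host = urlparse(url).hostname
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal, so it is some other hostname: treat as external.
        return False


def build_request(prompt, url=LOCAL_ENDPOINT, model=MODEL_NAME):
    """Build (but do not send) a request, refusing non-local endpoints."""
    if not is_loopback(url):
        raise ValueError(f"Refusing to send data to non-local endpoint: {url}")
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_request("Summarize our volunteer onboarding checklist.")
# Sending is left commented out so the sketch runs without a server:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response)["response"])
```

The guard only recognizes the local machine. An organization running its model on a separate in-house server would replace it with an explicit allowlist of addresses it controls.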

Building trust through care

There's a broader point here that goes beyond compliance. The people your organization serves chose to trust you with sensitive information. Some of them are in situations where the wrong disclosure could affect their safety, their housing, their custody arrangements, or their legal status. Treating their data with care isn't a regulatory checkbox. It's a reflection of the same values that brought your organization into existence.

The organizations I work with that handle this well tend to talk about data privacy not as a constraint on AI adoption but as a precondition for it. Getting the data boundaries right first makes everything else possible: staff feel confident using AI tools for the tasks where they're appropriate, leadership can report to funders and boards with clarity about their practices, and the people you serve can continue to trust you with the information you need to help them.

Building an AI practice that protects client data while giving your team real productivity gains requires getting the foundations right. I help mission-driven organizations draw those lines clearly.
