PostHog opts you into AI training by default. The fine print matters.

What happened

PostHog published a blog post announcing that customer data flowing through its product analytics platform will be used to train AI models — and the setting is on by default for existing and new accounts. There is an opt-out toggle in project settings. The post frames the change as necessary to ship better AI features (anomaly detection, natural-language SQL over your event data, automated insights), and argues that opting in is the cooperative move because every customer benefits from a better-trained model.

The Hacker News thread climbed to 195 points within hours, and the top comments are not about AI capability; they are about the default. Several long-time PostHog users noted they only learned about the change because the HN post surfaced it, not because of an in-product notification or an email to account admins. A handful of EU-based customers flagged that opting customers in by default is, at minimum, in tension with GDPR's Article 6 lawful basis requirements when the data being repurposed includes personal data collected for a different original purpose.

PostHog is open-source-core and self-hostable, which complicates the framing. The opt-in default applies to PostHog Cloud customers. Self-hosters are unaffected because their data never leaves their infrastructure — a point PostHog's own response in the HN thread leans on heavily.

Why it matters

The surface story is "another vendor is training on customer data." The deeper story is that product analytics data is one of the highest-signal, lowest-consent datasets in the modern SaaS stack, and the industry has never really agreed on who owns it for downstream purposes.

Think about what actually sits in a PostHog instance. Event streams with user IDs, often with email addresses as the distinct_id. Session recordings — literal video of users interacting with your product, including any PII they typed into forms before your masking rules caught it. Feature flag evaluations tied to user cohorts. Funnel data that reveals your conversion mechanics. This is not log data. This is behavioral data about your end users, who never signed a contract with PostHog. They signed one with you.
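One way to gauge your own exposure is to check how many events carry an email address as the distinct_id. The following is a minimal sketch, assuming you have already exported events into a list of dictionaries with a "distinct_id" key; that shape and the function name audit_distinct_ids are illustrative assumptions, not a documented PostHog export format.

```python
import re
from collections import Counter

# Deliberately loose pattern: good enough to flag candidates,
# not to validate addresses.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def audit_distinct_ids(events):
    """Count events whose distinct_id looks like an email address.

    `events` is an iterable of dicts with a "distinct_id" key. This shape
    is an assumption about your export, not a PostHog-defined format.
    """
    counts = Counter()
    for event in events:
        distinct_id = str(event.get("distinct_id", ""))
        kind = "email_like" if EMAIL_RE.match(distinct_id) else "other"
        counts[kind] += 1
    return counts


# Hypothetical sample data for illustration only.
sample = [
    {"event": "signup", "distinct_id": "ada@example.com"},
    {"event": "pageview", "distinct_id": "0190c3e2-7b1a-7c4e-9f7d-1a2b3c4d5e6f"},
    {"event": "pageview", "distinct_id": "ada@example.com"},
]

print(dict(audit_distinct_ids(sample)))
```

A high share of email-like identifiers suggests that the event stream itself is personal data, before session recordings even enter the picture.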

That's where the legal chain matters. Your DPA with PostHog likely names them as a processor and you as the controller. Processors can only use data for the purposes the controller specifies. Training a general-purpose AI model on data PostHog holds as a processor is, in most readings of GDPR and the equivalent state-level US laws, a new purpose that requires either a refreshed lawful basis or a contractual amendment with affirmative customer consent, not a default toggle. PostHog's terms update supplies the contractual amendment, but a toggle that ships switched on is standing in for the affirmative consent, and that is the weak link.

Compare to the recent moves from peers. Mixpanel has stayed quiet on this front. Amplitude's terms explicitly carve out customer data from model training. Segment (Twilio) restricts it to aggregated, de-identified usage. Heap allows it but requires explicit opt-in. PostHog is the outlier here, and they're betting that the open-source credibility plus the self-host escape hatch covers the optics.

The community reaction is split along predictable lines. One camp argues PostHog has earned trust through open-sourcing the entire platform and the opt-out is real, so this is fine. The other camp argues that "opt-out is fine" is exactly the framing that got us tracking pixels and third-party cookies, and the precedent — a beloved OSS-friendly company flipping the default — is what's actually corrosive.

What this means for your stack

If you run PostHog Cloud, do three things this week. First, log in and flip the AI training toggle off until your legal team has reviewed the new terms. The setting lives in project settings under data management. Because the toggle defaults to on, doing nothing is itself a decision, and one that may be hard to defend to a regulator or to an enterprise customer auditing your subprocessor list. Second, check whether your own DPAs with your customers permit a subprocessor to use their data for AI model training; for most B2B SaaS contracts, the answer is no without an amendment. Third, audit your session recording masking rules: if you were sloppy about masking inputs because "it's just PostHog," the threat model just changed.
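Session recording masking is configured in the client SDK and is not shown here. The same audit mindset applies to free-text event properties, though, and the sketch below illustrates one way to scrub them before they leave your backend. It assumes a server-side Python capture path; the names redact_text and redact_properties are hypothetical helpers, not part of any PostHog library, and the patterns are intentionally crude.

```python
import re

# Loose patterns meant to over-redact rather than under-redact.
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")
# 13 to 19 digits, optionally separated by single spaces or hyphens
# (roughly the shape of a payment card number).
LONG_NUMBER_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def redact_text(value: str) -> str:
    """Replace email-like and card-like substrings with placeholders."""
    value = EMAIL_RE.sub("[redacted-email]", value)
    return LONG_NUMBER_RE.sub("[redacted-number]", value)


def redact_properties(properties: dict) -> dict:
    """Return a copy of `properties` with string values scrubbed.

    Non-string values are passed through unchanged; nested structures
    are not traversed in this sketch.
    """
    return {
        key: redact_text(value) if isinstance(value, str) else value
        for key, value in properties.items()
    }


# Hypothetical event properties for illustration only.
example = {
    "note": "contact ada@example.com today",
    "card": "4111 1111 1111 1111",
    "plan": "pro",
    "count": 3,
}

print(redact_properties(example))
```

Scrubbing before capture means the vendor never receives the raw value, so the outcome does not depend on which way any vendor-side toggle is set.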

If you're evaluating PostHog right now, this isn't a dealbreaker, but it changes the calculus. The right comparison is no longer PostHog vs. Mixpanel on features and price — it's PostHog Cloud vs. self-hosted PostHog vs. a competitor whose data-use terms are stricter by default. Self-hosting PostHog has gotten meaningfully easier in the last 18 months with the Helm chart and the managed-self-host offerings from cloud resellers; for any team subject to HIPAA, SOC 2 Type II with strict subprocessor controls, or EU data residency requirements, the self-host path is now the lower-friction option even if the sticker price looks worse.

For everyone else: this is a useful prompt to actually read the data-use sections of every analytics, error tracking, and observability vendor you have. Sentry, Datadog, LogRocket, FullStory, and the rest have all quietly updated terms in the last twelve months. PostHog is just the one that's loud about it.

Looking ahead

The interesting question is whether PostHog walks this back. The HN thread is exactly the kind of signal that has caused similar reversals: GitLab on telemetry in 2019, Audacity on data collection in 2021, HashiCorp on the BSL in 2023 (partially). PostHog's CEO has a track record of engaging directly in community threads and adjusting course; watch for a follow-up post within the next week. If the default flips to opt-in and the company ships a clear in-product notification flow for the change, this becomes a footnote. If it doesn't, expect a measurable bump in self-host migrations and a more cautious read on every "we're an OSS company, you can trust us" pitch for the next year.

// read from source

PostHog will train AI models with your data (opted-in by default)
