Measuring Users Without Identifying Them

The question Dick Hardt was asking in 2005 — how much do you have to reveal to participate? — was aimed at login screens. You needed an account. The account required an email address. The email address was connected to your name, maybe your employer, maybe your location. Before you could read an article or post a comment, you had already disclosed a meaningful slice of your identity.

That question has a less obvious analogue for anyone running a website today. When you measure your own traffic — how many people visited, which pages they read, where they came from — how much do you have to know about the person in order to count them? The answer, it turns out, is considerably less than most analytics tools collect by default. And the gap between what is measured and what is required for measurement is the space that privacy-first analytics occupies.

Measuring Behaviour vs. Knowing the Person

The distinction sounds simple but breaks down quickly in practice. Measuring behaviour means recording events: a page was loaded, a button was clicked, a purchase was completed. Knowing the person means attaching those events to an identity — a persistent identifier that connects today’s visit to last month’s visit, to a cross-site advertising profile, to a device fingerprint, to an email address, to a name.

Traditional web analytics tools collapsed this distinction by design. A persistent cookie assigned a unique ID to your browser. Every page load sent that ID to the analytics server along with the page URL, the timestamp, the referrer, the browser, the screen resolution, and the IP address. Over time, the tool built a longitudinal record: this specific person — not “a visitor,” this person — visited your site on these dates, read these articles, came from this search query, and left at this point. That is identification, not just measurement.

From the site owner’s perspective, the extra information looked free. More data seemed like more insight. But the cost was offloaded to the visitor, who contributed detailed behavioral data to a third-party server without being asked and, in many cases, without being aware.

What “Anonymous” Actually Means in Analytics

Privacy-first analytics tools use the word “anonymous” to describe their approach, and it is worth being precise about what they mean. Anonymity in this context is not a binary state — it is a spectrum, and the engineering choices determine where a tool sits on it.

The weakest form of anonymity is IP masking: the last octet of the visitor’s IPv4 address is zeroed or dropped before storage. This prevents geolocation down to a household, but the truncated address still narrows the visitor to one of at most 256 addresses. Combined with user-agent data and a timestamp, that is often enough to single out an individual.
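
As a rough illustration of what IP masking amounts to, here is a minimal Python sketch using the standard-library ipaddress module. The function name mask_ip is hypothetical, and the 48-bit prefix kept for IPv6 is an assumption made for the example; the text above only describes dropping the last IPv4 octet.

```python
import ipaddress


def mask_ip(address: str) -> str:
    """Zero the host portion of an IP address before it is stored.

    Assumptions for this sketch: IPv4 keeps the first 24 bits (the last
    octet is zeroed); IPv6 keeps the first 48 bits. The IPv6 prefix
    length is an illustrative choice, not something the article specifies.
    """
    ip = ipaddress.ip_address(address)
    prefix = 24 if ip.version == 4 else 48
    # strict=False lets us pass a host address and get its enclosing network.
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)


print(mask_ip("203.0.113.77"))           # 203.0.113.0
print(mask_ip("203.0.113.200"))          # 203.0.113.0 (same bucket as above)
print(mask_ip("2001:db8:abcd:1234::1"))  # 2001:db8:abcd::
```

The two IPv4 visitors collapse into the same stored value, which is the point. It is also the limit: the bucket holds at most 256 addresses, so the masked value remains a fairly strong signal when stored next to a user agent and a timestamp.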

A stronger form is cookieless measurement with no persistent cross-session identifier. Page views can be grouped within a single session, so the tool can count how many times a page was loaded and in what sequence during one visit, but it cannot link Monday’s visitor to Friday’s visitor. You get traffic counts and session flows; you lose individual longitudinal history.
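
One way a session-scoped identifier can be built without a cookie is to hash request attributes with a random salt that is thrown away every day. The sketch below assumes that approach; it is not presented as the method of any particular tool, and the class name DailyVisitorHasher and its parameters are hypothetical. Note that a key derived from IP address and user agent is still pseudonymous data for as long as the salt exists.

```python
import hashlib
import secrets
from datetime import date
from typing import Optional


class DailyVisitorHasher:
    """Derive a visitor key that is stable only within one calendar day."""

    def __init__(self) -> None:
        self._salt_day: Optional[date] = None
        self._salt: bytes = b""

    def _salt_for(self, day: date) -> bytes:
        # When the day changes, replace the salt and forget the old one,
        # so keys from earlier days can no longer be recomputed or linked.
        if day != self._salt_day:
            self._salt_day = day
            self._salt = secrets.token_bytes(32)
        return self._salt

    def visitor_key(self, ip: str, user_agent: str, site: str, day: date) -> str:
        # Join fields with a separator so ("ab", "c") and ("a", "bc") differ.
        material = "\x00".join((site, ip, user_agent)).encode("utf-8")
        return hashlib.sha256(self._salt_for(day) + material).hexdigest()


hasher = DailyVisitorHasher()
monday = date(2024, 1, 1)
friday = date(2024, 1, 5)

first = hasher.visitor_key("203.0.113.77", "ExampleBrowser/1.0", "example.org", monday)
second = hasher.visitor_key("203.0.113.77", "ExampleBrowser/1.0", "example.org", monday)
later = hasher.visitor_key("203.0.113.77", "ExampleBrowser/1.0", "example.org", friday)

print(first == second)  # same day: page views can be grouped into one visit
print(first == later)   # different day: the keys no longer match
```

The design choice that matters is discarding the old salt. If the salts were kept, the operator could recompute earlier keys and the scheme would quietly turn back into a persistent identifier.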

The strongest form — used by some tools in their default configuration — is aggregation at the collection point. Raw events are never stored individually. The tool receives a page view event and immediately increments a counter; the underlying event record is discarded. There is no data to re-identify because there is no granular data at all.
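
Aggregation at the collection point can be sketched in a few lines. This is a minimal illustration under assumed names (AggregateCollector, record, views) and an assumed counter key of day, path, and referrer channel; real tools will differ in what dimensions they keep.

```python
from collections import Counter
from datetime import date


class AggregateCollector:
    """Keep only per-day counters; the individual event is never stored."""

    def __init__(self) -> None:
        # Key: (day, path, referrer channel). Value: number of page views.
        self.page_views: Counter = Counter()

    def record(self, day: date, path: str, channel: str) -> None:
        # Increment the matching counter and return. No IP address, user
        # agent, timestamp, or visitor key is retained anywhere.
        self.page_views[(day, path, channel)] += 1

    def views(self, day: date, path: str) -> int:
        # Total views for a page on a day, summed across referrer channels.
        return sum(
            count
            for (d, p, _channel), count in self.page_views.items()
            if d == day and p == path
        )


collector = AggregateCollector()
day = date(2024, 1, 2)
collector.record(day, "/pricing", "organic")
collector.record(day, "/pricing", "organic")
collector.record(day, "/pricing", "direct")

print(collector.views(day, "/pricing"))               # 3
print(collector.page_views[(day, "/pricing", "organic")])  # 2
```

Because the only state is a set of counters, questions such as "what did this visitor do next?" cannot be answered after the fact. That is the trade the approach makes deliberately.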

Each step toward stronger anonymity trades something. Session-level analysis becomes harder. Returning visitor counts become estimates or disappear. Funnel analysis across multiple sessions is no longer possible without additional instrumentation. Whether those trade-offs are acceptable depends on what the site owner actually needs to know, and most site owners need considerably less than they have been collecting.

The Identification Problem Is Structural, Not Accidental

The surveillance architecture that grew out of the 2000s web was not an accident. As covered in the history of how online identity became a privacy problem, the user-centric identity movement proposed a world where you disclosed attributes deliberately and minimally. What the market chose instead was a system where identity was assembled passively, from behavioral signals, without any authentication event at all.

Standard analytics tools plugged into this architecture because it was economically convenient. The vendor offering free analytics was offering it in exchange for behavioral data it could use across its advertising network. The site owner got a dashboard; the advertising network got signal from every page load on every site that embedded the script. The visitor contributed to both data flows without being asked.

This is why the identity question and the analytics question are the same question at different scales. In 2005, the concern was about login intermediaries accumulating identity data. By 2015, the dominant analytics tool had become a kind of passive identity intermediary — running on every page load, no login required, assembling behavioral rather than account-based profiles.

Aggregation vs. Individual-Level Data

One of the cleaner distinctions in privacy-first analytics is between aggregated and individual-level data.

Aggregated data looks like this: 1,240 page views on Tuesday, 18% from organic search, median session length 2 minutes 40 seconds, 63% mobile. No individual record exists. You know what the audience did as a group; you know nothing about any specific visitor.

Individual-level data looks like this: visitor ID 7a3f9c visited the homepage at 14:32, then the pricing page at 14:34, then left. Same visitor returned Thursday at 11:15, went directly to the signup page, and completed registration. That record is useful for funnel analysis. It is also a behavioral profile of a specific person, linkable in principle to other data sources.

The trade-off is not that individual-level data is always wrong to collect. There are legitimate use cases: debugging a broken checkout flow, understanding where users drop off in a multi-step form, detecting fraudulent account creation patterns. The problem is that most analytics deployments collect individual-level data for every visitor as a default, regardless of whether the use case requires it, and retain that data indefinitely.

Privacy-first tools invert the default: start with aggregate collection, add individual-level tracking only where there is a specific, documented reason, and set retention limits accordingly.
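
The inverted default can be expressed as an explicit allowlist: nothing is tracked at the individual level unless a rule names the purpose and the retention period. The sketch below is one hypothetical way to encode that; the names TrackingRule, rule_for, and is_expired, the example path, and the 30-day period are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class TrackingRule:
    purpose: str          # the documented reason for individual-level data
    path_prefix: str      # where the rule applies
    retention: timedelta  # how long individual records may be kept


# Hypothetical allowlist: anything not listed here is aggregate-only.
RULES = [
    TrackingRule("debug checkout drop-off", "/checkout", timedelta(days=30)),
]


def rule_for(path: str) -> Optional[TrackingRule]:
    """Return the rule permitting individual-level tracking, if any."""
    for rule in RULES:
        if path.startswith(rule.path_prefix):
            return rule
    return None  # default: aggregate collection only


def is_expired(rule: TrackingRule, recorded_at: datetime, now: datetime) -> bool:
    """True when a stored individual record has outlived its retention."""
    return now - recorded_at > rule.retention


checkout_rule = rule_for("/checkout/payment")
print(checkout_rule is not None)        # individual-level tracking allowed here
print(rule_for("/blog/some-post"))      # None: aggregate only
```

A scheduled job could then delete every individual record for which is_expired returns true, which keeps the retention limit enforced rather than merely documented.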

Anonymous vs. Identifiable Measurement: What Changes

Dimension	Identifiable measurement	Anonymous measurement
Visitor identifier	Persistent cookie or device fingerprint, cross-session	No persistent ID, or session-scoped only
Data granularity	Individual event stream per visitor	Aggregated counts; no individual records
Cross-site linkage	Possible via third-party network	Not possible by design
Consent requirement (EU)	Required for tracking cookies and non-essential processing	Potentially exempt, depending on implementation and DPA guidance
Data retention risk	High — individual records persist; a breach exposes identifiable data	Low — aggregate counts are non-attributable
What you can measure	Journeys, cohorts, returning visitors, personalization signals	Traffic volume, page popularity, referrer channels, conversion rates
What you lose	Very little — most things are possible with enough data	Individual paths, returning visitor history, cross-session attribution
Impact of consent decline	Significant; declining visitors disappear from the data	Lower; cookieless tools are blocked or declined less often

The Sampling Problem That Affects Your Numbers

A practical argument for anonymous measurement deserves more attention: identifiable analytics tools frequently undercount. When a consent banner is shown, some visitors decline tracking, and where the banner is well designed and the choice is genuine, that share is substantial in many European markets. The analytics tool records nothing about those visitors, so the numbers in the dashboard represent only the consenting fraction.

A cookieless tool that operates without consent (subject to local regulatory guidance, which varies) counts all visitors, or very nearly all. The traffic numbers are higher and more representative. The granularity is lower, but the denominator is correct. A conversion rate calculated on complete traffic data is a different number from a conversion rate calculated on the self-selected minority who clicked “Accept All.”

Site owners making decisions about content, channel investment, and product changes are working from numbers that, in a high-decline-rate scenario, may reflect a fraction of actual behavior. The dashboard looks authoritative. It may not be.
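
A short worked example makes the denominator problem concrete. Every number below is hypothetical and chosen only to show the arithmetic: 1,000 real visitors, 400 of whom accept tracking, with consenting visitors converting somewhat more often than those who decline.

```python
def conversion_rate(conversions: int, visitors: int) -> float:
    """Conversions divided by visitors; 0.0 when there are no visitors."""
    return conversions / visitors if visitors else 0.0


# Hypothetical figures, for illustration only.
total_visitors = 1000
consenting_visitors = 400
conversions_consenting = 24
conversions_declining = 18

# What a consent-gated dashboard sees: only the consenting fraction.
dashboard_rate = conversion_rate(conversions_consenting, consenting_visitors)

# What actually happened across all visitors.
actual_rate = conversion_rate(
    conversions_consenting + conversions_declining, total_visitors
)

print(f"consent-gated dashboard: {dashboard_rate:.1%}")  # 6.0%
print(f"all visitors:            {actual_rate:.1%}")     # 4.2%
```

Under these assumed figures the dashboard reports 400 visitors instead of 1,000 and a 6.0% conversion rate instead of 4.2%. Both numbers are internally consistent, and both describe only the people who accepted tracking.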

What Privacy-First Analytics Is Not

The framing here is about identity and anonymity rather than tool comparisons, but it is worth being clear about what privacy-first analytics does not mean, because the term is used loosely.

It does not mean no data. You can measure sessions, page views, referrer sources, on-site conversion events, and geography at a regional level without identifying anyone. The data is useful; it is not surveillance-grade.
It does not automatically mean GDPR-compliant. Whether a specific tool requires consent depends on what it collects, how it processes the data, where the server is located, and the guidance of the relevant data protection authority. Some anonymous tools have been explicitly approved for cookieless use without consent banners in certain jurisdictions; others have not.
It does not mean open-source. Some privacy-first tools are commercial and closed-source. The privacy claim rests on the data model and the infrastructure architecture, not the license.
It does not mean server-side tracking is automatically better. Server-side setups can be privacy-respecting or they can move the same problematic data collection behind a first-party domain. The architecture is neutral; what matters is what data is collected and where it goes.

The Identity Question for Site Owners

Dick Hardt’s original framing was about how much you have to reveal to participate. The user was the person disclosing identity; the website was the relying party asking for it. The privacy-first analytics question inverts the relationship: as a site owner, how much do you demand to know about a visitor in order to count them?

The honest answer for most sites is: not much. You need to know that someone visited, which page they landed on, roughly where they came from, and whether they completed a goal. You do not need a persistent profile. You do not need to know that this specific browser was also on your site three months ago. You do not need to hand that data to a third-party advertising network as a condition of getting it yourself.

The parallel to treating privacy as a design feature rather than a compliance checkbox runs directly through analytics choices. An analytics setup that does not build identity profiles is not a degraded version of one that does — it is a different design choice, one that reflects a different set of values about the relationship between site owner and visitor.

If you are evaluating which measurement approach fits your site’s data needs and regulatory situation, the privacy-first analytics finder walks through your requirements and identifies setups that match — without assuming you need GA4-level data density to run a competent website.

The Protocol Layer and the Measurement Layer

The comparison between Passport, OpenID, and Facebook Connect in the authentication era maps onto the same comparison in the analytics era. You could choose the decentralized, user-respecting option — OpenID in authentication, a cookieless analytics tool in measurement. You could choose the centralized-but-convenient option that came with third-party data sharing built in. The market dynamics pushed in the same direction for the same reasons: convenience and “free” had more obvious short-term value than privacy and minimal disclosure.

What has changed is the regulatory and reputational cost of the surveillance option. Enforcement actions against analytics platforms operating across EU borders — several major data protection authorities ruled in 2022 and 2023 that IP addresses transmitted to US-based servers constituted personal data transfers requiring legal basis — made the cost of the convenient option visible. It was never actually free; the cost was externalized to users and to future regulatory exposure.

The measurement layer has the same structural problem as the authentication layer: there is a strong economic incentive to collect more than you need, hold it longer than necessary, and share it with more parties than the visitor expects. Privacy-first analytics is an engineering response to that incentive — design the tool so that the data that would be problematic is never collected at all. You cannot breach data you do not have, and you cannot be required to hand over to a regulator data that was never stored.

The principle the Identity 2.0 era articulated for authentication — minimal disclosure, user control, no unnecessary intermediaries — did not win at the protocol layer. There is a reasonable case that it is gaining ground, slowly and unevenly, at the measurement layer. The tools exist. The legal framework has moved. The question is whether site owners will make the design choice before it is made for them.