Specifically, it won't work across *sites* -- all major browsers (now) shard the...

cookiengineer · on Feb 8, 2022

Note that CNAME cloaking is also an issue here, because Firefox is the only Browser with a userspace network stack whereas Chrome/Chromium relies on the OS...which means that Chromium or Electron based Browsers cannot protect themselves from CNAME cloaked domains.

Adblockers like uBlock Origin just have a domain list of known CNAME cloaked domains, but that's not based on the DNS entries directly because there's no API for this for web/chrome extensions.

Long story short: Sharding can be tricked with CNAME cloaking.

jefftk · on Feb 8, 2022

CNAME cloaking doesn't allow cross-site tracking via the cache: it's still sharded based on the site you're visiting. Imagine a setup like:

   www.example.com/foo: page you're visiting
   tracker.example.com: CNAMEd to tracker.example
   www.example.org/bar: a page you visit on another site
   tracker.example.org: CNAMEd to tracker.example

This will use two cache partitions: one for example.com, another for example.org. ETags observed by the tracker will be different between those two cases.

cookiengineer · on Feb 8, 2022

Do you maybe have a link to the relevant source so I can take a look at it? I was always under the assumption that Firefox and Chromium only cache the resulting A/AAAA entries.

In WebKit, the DNS entries don't play an active role in enhanced tracking protection, because they only use metrics of content vs data being sent to determine the importance of subdomains.

Theoretically, upstream WebKit could be tricked by spoofing a domain that's listed in Quirks.cpp.

(e.g. microsoft.com or sony.com basically have access to all cookies, technically speaking, because their domains are hardcoded to get cross origin access)

jefftk · on Feb 8, 2022

The way ETag-based tracking works is:

* When you load a resource, the server may return an "ETag" header.

* The browser stores the resource in its cache under a key like [site, resource url]. For example [example.com, https://example.net/foo] means "https://example.net/foo" requested when visiting some page on example.com.

* When the browser tries to use the resource again and sees that it's expired, it asks the server for it, sending along the previous ETag value in the If-None-Match header so the server can avoid sending an updated copy if the resource hasn't actually changed.

* Since the original ETag value is echoed back to the server, the server can tell this is the same browser it was talking to earlier.
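The steps above can be sketched as a toy simulation. This is only an illustration under stated assumptions: `ToyTracker` and `ToyBrowser` are hypothetical names, there is no real networking, and the cache stores nothing but the ETag.

```python
import uuid


class ToyTracker:
    """Hypothetical server that hands every unknown client a fresh ETag."""

    def __init__(self):
        self.visits = {}  # ETag -> number of requests seen with that ETag

    def handle(self, if_none_match=None):
        # A known ETag echoed back in If-None-Match re-identifies the browser.
        if if_none_match in self.visits:
            self.visits[if_none_match] += 1
            return 304, if_none_match
        # Otherwise treat the client as new and assign a unique ETag.
        etag = uuid.uuid4().hex
        self.visits[etag] = 1
        return 200, etag


class ToyBrowser:
    """Hypothetical browser whose cache is keyed by (site, resource URL)."""

    def __init__(self):
        self.cache = {}  # (site, url) -> stored ETag

    def fetch(self, site, url, server):
        key = (site, url)
        # Send the stored ETag, if any, as If-None-Match.
        status, etag = server.handle(self.cache.get(key))
        self.cache[key] = etag
        return status, etag


tracker = ToyTracker()
browser = ToyBrowser()
first = browser.fetch("example.com", "https://example.net/foo", tracker)
second = browser.fetch("example.com", "https://example.net/foo", tracker)
```

Here `first` is a 200 with a new ETag and `second` is a 304 carrying the same ETag, which is what lets the server recognize the returning browser.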

Now, a few years ago, when the browser would store resources in its cache under a key like [resource url], without sharding by site, this was occasionally used like a third-party cookie. When visiting any site that is using the relevant tracker, the tracker would request https://example.net/foo, and as long as the resource had not been evicted from cache the tracker was able to reidentify users across sites. With the HTTP cache sharded by domain, however, this no longer works across sites.
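To make the difference concrete, here is a minimal sketch comparing the two keying schemes. The function names and the `etag-N` identifiers are made up for illustration, and real browsers use richer cache keys than this.

```python
def cache_key(top_level_site, resource_url, partitioned):
    # Old behaviour: key on the resource URL only.
    # Sharded behaviour: key on (site being visited, resource URL).
    if partitioned:
        return (top_level_site, resource_url)
    return (resource_url,)


def etags_seen_by_tracker(visited_sites, partitioned):
    """Return the ETag the tracker gets back on each visit, in order."""
    cache = {}
    seen = []
    next_id = 0
    for site in visited_sites:
        key = cache_key(site, "https://example.net/foo", partitioned)
        if key not in cache:
            # Cache miss: the tracker assigns a fresh identifier.
            cache[key] = f"etag-{next_id}"
            next_id += 1
        seen.append(cache[key])
    return seen


visits = ["example.com", "example.org", "example.com"]
unsharded = etags_seen_by_tracker(visits, partitioned=False)
sharded = etags_seen_by_tracker(visits, partitioned=True)
# unsharded -> ['etag-0', 'etag-0', 'etag-0']
# sharded   -> ['etag-0', 'etag-1', 'etag-0']
```

Without sharding the tracker sees one identifier on every site. With sharding it sees a different identifier per site, so it can only recognize a repeat visit to the same site.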

This seems to me to be entirely orthogonal to CNAME cloaking; where do you see that fitting in?

cookiengineer · on Feb 8, 2022

The way trackers work these days is that they do not have their own domains (fqdn) anymore.

There is a CNAME tracker.example.com and a CNAME tracker.example.org, which both point to the same tracker IP operated by a third party.

This allows them to infiltrate both site caches because - as I understand it - the sharded caches do not separate the third party domains there, as the Browser still thinks that tracker.example.com belongs to example.com.

They will probably have two separate ETag values for the same URL, but in both scenarios the cross-site protection mechanisms are basically nullified. Things like cross-site cookies aren't necessary anymore because, going by the public suffix list alone, those domains are treated as belonging to the same first-party site.

I've seen some trackers going as far as abusing the resource lifetimes (Last-Modified and Pragma/Cache related headers) where they use a timestamp far in the past (aka in the 1980s) and "reserving this millisecond" as a unique identifier for a specific client that they're tracking...in order to bypass the implementations that try to prevent this kind of tracking via HTTP headers.
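A rough sketch of how an identifier could be packed into a Last-Modified value, purely to illustrate the idea. The 1980 base date and the function names are assumptions, and since HTTP dates only have one-second resolution this sketch reserves a second per client rather than a millisecond.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime

# Assumed base date; any fixed point far in the past would do.
BASE = datetime(1980, 1, 1, tzinfo=timezone.utc)


def id_to_last_modified(client_id: int) -> str:
    """Encode a numeric client ID as an HTTP-date offset from BASE."""
    return format_datetime(BASE + timedelta(seconds=client_id), usegmt=True)


def last_modified_to_id(header_value: str) -> int:
    """Recover the client ID from an echoed If-Modified-Since value."""
    delta = parsedate_to_datetime(header_value) - BASE
    return int(delta.total_seconds())


header = id_to_last_modified(123456)
recovered = last_modified_to_id(header)
```

The browser echoes the stored date back in If-Modified-Since on revalidation, so the server can decode the same ID again. As with ETags, the stored value lives in whichever cache partition the resource was loaded in.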

jefftk · on Feb 8, 2022

No, the cache is keyed by URL (https://tracker.example.com/tracker.js vs https://tracker.example.org/tracker.js) and not by IP. So those resources will have separate cache entries with separate ETag/Last-Modified/etc. values.
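As a small illustration of that point, assuming a naive "last two labels" rule in place of the real public suffix list and a made-up IP from the documentation range:

```python
from urllib.parse import urlsplit

# Hypothetical DNS results: both cloaked hostnames resolve to the same IP.
RESOLVED_IP = {
    "tracker.example.com": "203.0.113.7",
    "tracker.example.org": "203.0.113.7",
}


def site_of(hostname):
    # Naive stand-in for a public suffix list lookup (assumption).
    return ".".join(hostname.split(".")[-2:])


def cache_key(page_url, resource_url):
    # The key uses the visited site and the resource URL; the IP never appears.
    return (site_of(urlsplit(page_url).hostname), resource_url)


key_com = cache_key("https://www.example.com/foo",
                    "https://tracker.example.com/tracker.js")
key_org = cache_key("https://www.example.org/bar",
                    "https://tracker.example.org/tracker.js")
same_ip = RESOLVED_IP["tracker.example.com"] == RESOLVED_IP["tracker.example.org"]
```

Even though `same_ip` is true, `key_com` and `key_org` differ in both the site and the URL component, so the two resources never share a cache entry.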