Browsers including Firefox, Safari, Opera, and Chrome have begun providing protections against cross-site tracking methods employing cookies and IP addresses. It’s an encouraging development, but there’s a fear it will push trackers to adopt more opaque, “stateless” tracking like browser fingerprinting, which tracks browsers by the configuration information they make visible.
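To make "stateless" concrete, here is a minimal sketch of how visible configuration alone can act as an identifier. It is an illustration only, not a technique described in the study: the attribute names, the example values, and the `fingerprint` helper are all assumptions.

```python
import hashlib
import json


def fingerprint(config: dict) -> str:
    """Hash a browser's visible configuration into a short identifier."""
    # Sort keys so the same configuration always serializes the same way.
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


# Hypothetical attributes a script could read without storing anything.
visit_a = {
    "user_agent": "ExampleBrowser/1.0",
    "screen": "1920x1080",
    "timezone": "America/Chicago",
    "fonts": ["Arial", "Helvetica"],
}
visit_b = dict(visit_a)                      # same browser, cookies cleared
other = {**visit_a, "screen": "1366x768"}    # a differently configured browser

same_browser = fingerprint(visit_a) == fingerprint(visit_b)
different_browser = fingerprint(visit_a) != fingerprint(other)
```

Because nothing is written to the browser, clearing cookies does not change the identifier; only a change in the visible configuration does.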
To combat fingerprinting in particular, researchers at the University of Iowa, Mozilla, and the University of California, Davis recently investigated a machine learning-based approach called FP-Inspector, which trains classifiers to detect fingerprinting scripts. By extracting syntactic and semantic features through a combination of static and dynamic analyses that complement each other’s limitations, FP-Inspector overcomes the coverage issues of dynamic analysis while addressing the inability of static analysis to handle obfuscation, the coauthors say.
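The complementarity the coauthors describe can be pictured with a toy feature extractor. This is a simplified sketch, not FP-Inspector's implementation: it counts substrings where a real static analysis would parse the script, and the `WATCHED_APIS` list, the function names, and the two example scripts are assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical watch list of API names associated with fingerprinting.
WATCHED_APIS = ("toDataURL", "getImageData", "AudioContext")


def static_features(source: str) -> Counter:
    """Count watched API names that appear literally in the script text."""
    return Counter({
        f"static:{api}": len(re.findall(re.escape(api), source))
        for api in WATCHED_APIS
    })


def dynamic_features(trace: list) -> Counter:
    """Count watched API calls recorded while the script actually ran."""
    return Counter(f"dynamic:{call}" for call in trace if call in WATCHED_APIS)


def combined_features(source: str, trace: list) -> Counter:
    """Merge both views; Counter addition drops features with a zero count."""
    return static_features(source) + dynamic_features(trace)


# Obfuscated script: the API name never appears whole in the source,
# but the call still shows up in the execution trace.
obfuscated = combined_features('canvas["to" + "DataURL"]()', ["toDataURL"])

# Dormant script: the call is visible in the source but never executed
# during the crawl, so the trace is empty.
dormant = combined_features("if (rare) { canvas.toDataURL(); }", [])
```

In this sketch the obfuscated script is visible only through its dynamic feature and the dormant one only through its static feature, which is why a classifier trained on both views sees more than one trained on either alone.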
Some browsers and privacy tools have tried to mitigate fingerprinting using techniques including API changes and network request blocking. But these approaches require manual analysis, and they struggle to restrict scripts served from first-party domains and dual-purpose third parties like content delivery networks. That’s because each hard-coded heuristic has to be narrowly defined to avoid false positives and continually updated to capture evolving fingerprinting and non-fingerprinting behaviors.
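A hard-coded heuristic of the kind described above might look like the following sketch. The rule itself is hypothetical and is not taken from any particular tool; it uses real Web API method names only as labels for calls a script was observed making.

```python
from typing import Set


def looks_like_canvas_fingerprinting(api_calls: Set[str]) -> bool:
    """Hypothetical rule: flag scripts that draw text and then export the canvas."""
    writes_text = "CanvasRenderingContext2D.fillText" in api_calls
    exports_image = "HTMLCanvasElement.toDataURL" in api_calls
    # Requiring both keeps the rule narrow enough to spare ordinary image export.
    return writes_text and exports_image


# A script matching the pattern the rule was written for.
classic = {"CanvasRenderingContext2D.fillText", "HTMLCanvasElement.toDataURL"}
# A variant that reads pixels through a different API and slips past the rule.
variant = {"CanvasRenderingContext2D.fillText", "CanvasRenderingContext2D.getImageData"}
# A benign charting script that only exports an image.
chart = {"HTMLCanvasElement.toDataURL"}

caught = looks_like_canvas_fingerprinting(classic)
missed = not looks_like_canvas_fingerprinting(variant)
spared = not looks_like_canvas_fingerprinting(chart)
```

The narrow definition avoids flagging the charting script, but the same narrowness lets the variant through until someone updates the rule by hand.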
To train FP-Inspector, the researchers crawled the homepages of 20,000 websites to compile a list of 17,629 websites with 153,354 distinct executing scripts. (They took the top 10,000 sites from Alexa’s Global Rank, a list of 100,000 of the web’s most-visited sites, and augmented them with a random sample of 10,000 sites from the remainder, allowing them to cover both the most popular websites and websites further down the long tail.) In experiments, they say, FP-Inspector performed well: it achieved 99.9% accuracy, detected 26% more fingerprinting scripts than manually designed heuristics, and caused half as much website breakage.
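The sampling scheme in the parenthetical can be sketched as follows. The function name, the fixed seed, and the placeholder site names are assumptions; the researchers' actual crawl tooling is not described here.

```python
import random


def build_crawl_list(ranked_sites, top_n=10_000, tail_n=10_000, seed=0):
    """Take the top-ranked sites, then add a random sample from the rest."""
    head = list(ranked_sites[:top_n])
    tail = list(ranked_sites[top_n:])
    rng = random.Random(seed)  # seeded only so the sketch is repeatable
    # Sample without replacement so no tail site is crawled twice.
    sampled = rng.sample(tail, min(tail_n, len(tail)))
    return head + sampled


# Placeholder ranking standing in for a 100,000-site popularity list.
ranked = [f"site-{rank}.example" for rank in range(1, 100_001)]
crawl_list = build_crawl_list(ranked)
```

Keeping the head intact preserves full coverage of the most popular sites, while the random draw gives every site in the remaining 90,000 the same chance of being included.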
In an effort to measure the prevalence of fingerprinting scripts on the web, the researchers applied FP-Inspector’s detection component to the top 71,112 websites ranked by Alexa. They found that over a quarter of the top 10,000 sites now deploy fingerprinting, as do 10.18% of the top 100,000 sites overall (amounting to 2,349 unique domains), and that fingerprinting is used unevenly across different categories of websites. Usage ranged from nearly 14% of news websites to just 1% of credit- and debt-related websites, a disparity the coauthors attribute to the fact that fingerprinting is common on websites relying on advertising and paywalls for monetization.
The researchers say they plan to publish the domains serving fingerprinting scripts to tracking protection lists like Disconnect and EasyPrivacy. “FP-Inspector helped uncover exploitation of several new APIs that were previously not known to be used for browser fingerprinting,” they wrote. “We plan to report the names and statistics of these APIs to privacy-oriented browser vendors and standards bodies. To foster follow-up research, we will release … [our] fingerprinting countermeasures prototype extension, list of newly discovered fingerprinting vendors, and bug reports submitted to tracking protection lists, browser vendors, and standards bodies.”