30% of the 1000 largest sites use scripts for hidden identification

A team of researchers from Mozilla, the University of Iowa and the University of California has published the results of a study of how websites use code for hidden user identification. Hidden identification means generating identifiers from indirect data about the browser, such as screen resolution, the list of supported MIME types, header-specific parameters (HTTP/2 and HTTPS), analysis of installed plugins and fonts, the availability of certain Web APIs, graphics-card-specific rendering features exposed through WebGL and Canvas, CSS manipulation, accounting for default values, network port scanning, and analysis of how the mouse and keyboard are used.
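The idea of turning indirect attributes into an identifier can be sketched as follows. This is a minimal illustration, not the method used by any script from the study: the function name `fingerprint_id`, the attribute names and all sample values are assumptions made for the example.

```python
import hashlib
import json


def fingerprint_id(attributes: dict) -> str:
    """Derive a stable identifier from indirect browser attributes."""
    # Serialize with sorted keys so the same attributes always yield the same bytes,
    # regardless of the order in which they were collected.
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical attribute values, for illustration only.
visitor = {
    "screen": "1920x1080",
    "mime_types": ["application/pdf", "text/html"],
    "fonts": ["Arial", "DejaVu Sans"],
    "webgl_renderer": "ExampleGPU",
}

print(fingerprint_id(visitor))
```

No cookie or other stored state is involved: any visitor whose browser reports the same combination of values gets the same identifier, and changing a single attribute produces a different one.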

A study of the 100,000 most popular sites by Alexa ranking showed that 9040 of them (10.18%) use code for hidden visitor identification. Among the thousand most popular sites, such code was detected in 30.60% of cases (266 sites), and among sites ranked from the thousandth to the ten thousandth, in 24.45% of cases (2010 sites). Hidden identification is mainly used in scripts provided by external services for anti-fraud and bot screening, as well as by ad networks and user tracking systems.

To identify code that performs covert identification, the researchers developed the FP-Inspector toolkit, whose code is offered under the MIT license. The toolkit uses machine learning techniques combined with static and dynamic analysis of JavaScript code. The use of machine learning is claimed to have significantly improved the accuracy of detecting covert identification code and to have found 26% more problematic scripts than manually specified heuristics.
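For contrast, a manually specified heuristic of the kind used as the baseline can be sketched as a keyword scan over script source. This is not FP-Inspector's classifier, nor the heuristic from the study: the API list is taken from names mentioned in this article, and the helper names and the threshold of two matches are arbitrary assumptions.

```python
import re

# API names mentioned in this article; the selection is illustrative, not exhaustive.
SUSPICIOUS_APIS = (
    "getLayoutMap",
    "AudioContext",
    "AudioWorklet",
    "mozRTCSessionDescription",
)


def suspicious_apis(js_source: str) -> set:
    """Return the listed API names that occur as whole words in the script source."""
    found = set()
    for name in SUSPICIOUS_APIS:
        if re.search(r"\b" + re.escape(name) + r"\b", js_source):
            found.add(name)
    return found


def looks_like_fingerprinting(js_source: str, threshold: int = 2) -> bool:
    """Flag a script when it references at least `threshold` listed APIs (assumed cutoff)."""
    return len(suspicious_apis(js_source)) >= threshold


sample = "navigator.keyboard.getLayoutMap(); const ctx = new AudioContext();"
print(looks_like_fingerprinting(sample))
```

A fixed rule like this misses obfuscated scripts and APIs that are not on the list, which is the kind of gap a classifier trained on static and dynamic features is meant to close.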

Many of the identified identification scripts were not present in the typical block lists Disconnect, Adsafe, DuckDuckGo, Justuno and EasyPrivacy. After being notified, the developers of the EasyPrivacy block list created a separate section for covert identification scripts. In addition, FP-Inspector revealed some new ways of using Web APIs for identification that had not been seen in practice before.

For example, identification was found to use information about the keyboard layout (getLayoutMap), residual data in the cache (the Performance API is used to analyze delays in returning data, which makes it possible to determine whether the user has accessed a specific domain and whether the page was opened before), permissions set in the browser (information about access to the Notification, Geolocation and Camera APIs), and the presence of specialized peripheral devices and rare sensors (gamepads, virtual reality headsets, proximity sensors). In addition, identification was observed to rely on the presence of browser-specific APIs and differences in API behavior (AudioWorklet, setTimeout, mozRTCSessionDescription), as well as on the AudioContext API to determine features of the sound system.

The study also examined how protection against hidden identification, which works by blocking network requests or restricting access to APIs, breaks the regular functionality of sites. Selectively restricting APIs only for scripts detected by FP-Inspector was shown to cause less breakage than the stricter blanket restrictions that Brave and Tor Browser apply to API calls that could potentially leak data.
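The difference between blanket and selective restriction can be sketched as two policy functions. This is a simplified model under stated assumptions, not how Brave, Tor Browser or FP-Inspector are implemented: the script URLs, the set of sensitive APIs and the function names are all hypothetical.

```python
# Hypothetical inputs: neither the URLs nor the API set come from the study.
FLAGGED_SCRIPTS = {"https://tracker.example/fp.js"}
SENSITIVE_APIS = {"AudioContext", "getLayoutMap"}


def allow_blanket(script_url: str, api: str) -> bool:
    """Blanket policy: a sensitive API is restricted for every script."""
    return api not in SENSITIVE_APIS


def allow_selective(script_url: str, api: str) -> bool:
    """Selective policy: a sensitive API is restricted only for flagged scripts."""
    return not (api in SENSITIVE_APIS and script_url in FLAGGED_SCRIPTS)


calls = [
    ("https://tracker.example/fp.js", "AudioContext"),   # flagged script
    ("https://player.example/audio.js", "AudioContext"),  # legitimate audio player
]

for url, api in calls:
    print(url, allow_blanket(url, api), allow_selective(url, api))
```

In this model the blanket policy also denies the call from the audio player, which is the kind of site breakage the selective approach avoids.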

Source: opennet.ru
