Device fingerprinting - how it works and where it fits in fraud detection?

16.6.2023

Like a human fingerprint, a device fingerprint is a unique digital representation of a particular device. As with a human fingerprint, if you have seen it before or have a database of fingerprints, you can recognize it and assign it to its original entity (in our case, a particular device).

Recognizing the individual customer in the online space is a core requirement for many use cases and is essential for online marketing. Targeted advertising is the "golden goose" of all search engines and social networks. Some web services (e.g. social networks) have it easier, as the user is happy to register and log in, providing all the details needed and identifying themselves in every interaction. Other web services (e.g. search engines) had to devise a way of identifying individual customers even without any registration or login process.

How to identify the device on the web?

In the not-so-distant past, it was possible to collect unique identifiers from connected devices (the MAC address for computers and the IMEI number for mobile phones).

A MAC address is a unique identifier of 12 hexadecimal characters, usually written as six pairs, that identifies an individual device on a network. An example of a MAC address is 00-B0-D0-63-C2-26. The first 3 pairs (Block ID), 00-B0-D0, identify the manufacturer, in our example DELL. A single manufacturer can have many Block IDs assigned, even hundreds. The remaining 3 pairs (63-C2-26) represent the device's unique ID. We say the MAC address identifies the device, but in reality it is imprinted/pre-assigned to the machine's network card/interface. So if your machine has multiple network interfaces, it will have multiple MACs, one for each. Though the MAC address is pre-assigned by the manufacturer, it is possible to spoof/alter it.
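To make the structure concrete, here is a minimal Python sketch that splits a MAC address into the Block ID and the device-specific part. The function name `split_mac` is made up for this illustration, and mapping the Block ID to a manufacturer name would need a vendor registry lookup that is not shown here.

```python
import re


def split_mac(mac: str) -> tuple[str, str]:
    """Split a MAC address into (Block ID, device-specific part)."""
    # Drop the common separators (hyphen, colon, dot) and normalise the case.
    digits = re.sub(r"[-:.]", "", mac).upper()
    if not re.fullmatch(r"[0-9A-F]{12}", digits):
        raise ValueError(f"not a valid MAC address: {mac!r}")
    # Regroup the 12 hexadecimal characters into six pairs.
    pairs = [digits[i:i + 2] for i in range(0, 12, 2)]
    return "-".join(pairs[:3]), "-".join(pairs[3:])


block_id, device_part = split_mac("00-B0-D0-63-C2-26")
print(block_id, device_part)  # 00-B0-D0 63-C2-26
```

The same function accepts the colon-separated notation (00:b0:d0:63:c2:26), which is just a different way of writing the same 12 characters.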

An IMEI number is a unique 15-digit serial number given to every mobile phone. It can be used to look up information such as the phone's manufacturer and model.
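One detail worth knowing when handling IMEIs: the last of the 15 digits is a check digit computed with the Luhn algorithm, so obviously mistyped values can be rejected before any lookup. The sketch below is a plausibility check only; the function names are made up for this illustration, and the number used is a sample value, not a reference to a real device.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    if not (number.isascii() and number.isdigit()):
        return False
    total = 0
    # Walk from the rightmost digit; double every second digit.
    for position, char in enumerate(reversed(number)):
        digit = int(char)
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as adding the two digits of the product
        total += digit
    return total % 10 == 0


def looks_like_imei(value: str) -> bool:
    """Plausibility check: 15 digits with a correct Luhn check digit."""
    return len(value) == 15 and luhn_valid(value)


print(looks_like_imei("490154203237518"))  # True
print(looks_like_imei("490154203237517"))  # False (wrong check digit)
```

Passing this check says nothing about whether the IMEI belongs to an existing phone; it only filters out malformed input.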

Identification via MAC address or IMEI number is not straightforward anymore. As the general public has become more conscious of online privacy, these IDs were the first to be blocked or made unavailable, for exactly the reason they exist: they identify the device precisely and, in the online world, allow activity to be attributed to a specific user/device.

How device fingerprinting works

When we type a particular URL into the address bar of the browser, a collection of information is sent from the user to the web server - information like language preference, browser details (so-called user agent), preference for using secured communication, preference for using compression during the data transfer, etc.
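As a rough server-side sketch in Python, the passively received part of the fingerprint could be pulled out of the request headers like this. Mapping the items above to `Accept-Language`, `User-Agent`, `Upgrade-Insecure-Requests` and `Accept-Encoding` is my reading of the description; the function name and the sample header values are made up for the illustration.

```python
# Standard HTTP request headers that correspond to the items listed above.
PASSIVE_HEADERS = (
    "User-Agent",                 # browser details
    "Accept-Language",            # language preference
    "Accept-Encoding",            # supported compression
    "Upgrade-Insecure-Requests",  # preference for secured communication
)


def passive_signals(headers: dict[str, str]) -> dict[str, str]:
    """Pick the fingerprint-relevant headers out of a request's headers."""
    # HTTP header names are case-insensitive, so compare in lower case.
    lowered = {name.lower(): value.strip() for name, value in headers.items()}
    # A missing header is recorded as an empty string; absence is a signal too.
    return {name: lowered.get(name.lower(), "") for name in PASSIVE_HEADERS}


# Hypothetical request headers, for illustration only.
request_headers = {
    "host": "example.com",
    "user-agent": "ExampleBrowser/1.0 (ExampleOS 10)",
    "accept-language": "sk-SK,sk;q=0.9,en;q=0.8",
    "accept-encoding": "gzip, deflate, br",
}
for name, value in passive_signals(request_headers).items():
    print(f"{name}: {value!r}")
```

In a real application the headers dictionary would come from whatever web framework you use; nothing here depends on a particular one.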

Other sets of information can be collected by a purpose-built script that executes after the user loads the requested webpage. Through JavaScript, a wide range of additional information can be collected.

The above diagram shows the most common ones, but tens or even hundreds of other characteristics could be gathered via the browser when a user loads a web page[1].

Some of these characteristics are not very distinctive - like OS/Platform, where only 3-4 platforms are in common use - but others can be very specific, like Canvas or the browser version and details. By combining several of the more distinctive characteristics, we can identify a particular device with very high accuracy using only a few of them.

The diagram below, with actual sample values for the given characteristics, shows how individual characteristics and their combinations narrow down the pool of possible devices: the final combination becomes extremely specific and, therefore, rare (color indicates the uniqueness of the information, from green = low to red = high). The example makes it clear that to reach (ideally) a unique fingerprint, we must combine several characteristics.

In the example above, the language preference (per my device) is set to Slovak (sk-SK). This narrows the pool of devices significantly, because Slovak is used on a relatively small number of devices (a few million), mostly by people from my home country. On the other hand, if your preferred language is English, you have narrowed the pool down only slightly, as more than 75% of users have English as a language preference.
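One way to put numbers on "narrowing the pool" is to measure each value in bits: a value shared by a fraction p of all devices carries -log2(p) bits of identifying information. The Python sketch below uses the 75% English share from the text; the Slovak share, the second characteristic and the population size are hypothetical numbers chosen only for illustration, and adding the bits together assumes the characteristics are independent, which real ones often are not.

```python
import math


def surprisal_bits(share: float) -> float:
    """Identifying information (in bits) of a value seen on `share` of devices."""
    if not 0 < share <= 1:
        raise ValueError("share must be in the interval (0, 1]")
    return -math.log2(share)


def combined_bits(shares: list[float]) -> float:
    """Total bits for several values, ASSUMING they are independent."""
    return sum(surprisal_bits(share) for share in shares)


def expected_matches(population: int, shares: list[float]) -> float:
    """Devices expected to share the whole combination (same assumption)."""
    return population * math.prod(shares)


english = 0.75     # share quoted in the text above
slovak = 0.001     # hypothetical share, for illustration only
other_char = 0.02  # hypothetical share of one more characteristic

print(f"English:  {surprisal_bits(english):.2f} bits")
print(f"Slovak:   {surprisal_bits(slovak):.2f} bits")
print(f"Combined: {combined_bits([slovak, other_char]):.2f} bits")
print(f"Expected matches among 1,000,000 devices: "
      f"{expected_matches(1_000_000, [slovak, other_char]):.0f}")
```

With these numbers English contributes well under half a bit, while the hypothetical Slovak share contributes about ten bits, which is the same contrast the paragraph above describes in words.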

Another aspect that needs to be considered is how often the characteristics change. Some characteristics change only rarely, if at all, like screen resolution or the type of graphics card; others can change more often, e.g. Browser/User Agent, which changes with every browser update. If our device fingerprint relies on such frequently changing characteristics, the same device may stop matching its earlier fingerprint, which reduces the accuracy of identifying the device over a longer period.
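One possible way to soften this effect (an assumption on my side, not a description of any specific product) is to compare characteristics one by one and let the stable ones count more than the volatile ones, instead of requiring an exact match. The characteristic names, weights and sample values in this Python sketch are all hypothetical.

```python
# Hypothetical weights: stable characteristics count more than volatile ones.
STABILITY_WEIGHTS = {
    "screen_resolution": 3.0,
    "gpu": 3.0,
    "language": 2.0,
    "user_agent": 1.0,  # changes with every browser update
}


def similarity(known: dict, seen: dict, weights: dict = STABILITY_WEIGHTS) -> float:
    """Weighted share of characteristics that match, between 0.0 and 1.0."""
    total = sum(weights.values())
    matched = sum(
        weight
        for name, weight in weights.items()
        # A characteristic missing on the known device never counts as a match.
        if known.get(name) is not None and known.get(name) == seen.get(name)
    )
    return matched / total


stored = {
    "screen_resolution": "1920x1080",
    "gpu": "ExampleGPU 1000",
    "language": "sk-SK",
    "user_agent": "ExampleBrowser/1.0",
}
# The same device after a browser update: only the user agent differs.
after_update = dict(stored, user_agent="ExampleBrowser/1.1")

print(f"{similarity(stored, after_update):.2f}")  # 0.89
```

Where to set the threshold for "same device" is a tuning question; a lower threshold tolerates more change but also raises the chance of confusing two similar devices.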

After we select the relevant characteristics into the mix, we need to create an ID representing each fingerprint. Without digging more into the technical aspect, we could use a one-way hash function (e.g. MD5 or SHA-1) to transform the collected details into a unique Hash ID.
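A minimal Python sketch of this step could look as follows. Note two choices of mine that are not from the text above: the sketch uses SHA-256 rather than MD5 or SHA-1, and it serializes the characteristics as JSON with sorted keys so that the same values always produce the same ID regardless of the order in which they were collected. The function name and the sample values are hypothetical.

```python
import hashlib
import json


def fingerprint_id(characteristics: dict[str, str]) -> str:
    """Derive a stable hash ID from the collected characteristics."""
    # Canonical form: sorted keys and fixed separators, so the same set of
    # values always serializes to exactly the same string.
    canonical = json.dumps(
        characteristics, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    # SHA-256 is used here instead of the MD5/SHA-1 mentioned in the text.
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical sample values, for illustration only.
device = {
    "language": "sk-SK",
    "platform": "ExampleOS",
    "screen_resolution": "1920x1080",
    "timezone": "Europe/Bratislava",
}
print(fingerprint_id(device))
```

Because a hash changes completely when any single input changes, an ID built this way only matches when every included characteristic is identical, which is why the choice of characteristics discussed above matters so much.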

Device ID in fraud detection

OK, so now we have the Device fingerprint/ID. How can we use it for fraud detection?

The above graph shows the increased use of bots for malicious activity. The very first item, a 108% increase in Account Takeover (ATO) attacks, is the one we want to highlight[2].

ATO is one of the most common fraud scenarios that can greatly benefit from using the Device fingerprint/ID. Imagine that you can add a straightforward condition - the user has logged in from a new device - to all your existing rules aimed at ATO: rules that look into customers' behavioral characteristics by monitoring financial and non-financial transactions, common spending patterns, etc., and try to spot anomalies and potential fraudsters. How would such a condition improve the accuracy of these rules? Quite a bit, right?
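To illustrate, here is a simplified Python sketch of such a combined rule. Everything in it is hypothetical: the `Customer` structure, the function names, and the `behaviour_anomaly` flag, which stands in for whatever your existing behavioral rules already produce.

```python
from dataclasses import dataclass, field


@dataclass
class Customer:
    customer_id: str
    # Device IDs previously seen for this customer.
    known_device_ids: set[str] = field(default_factory=set)


def is_new_device(customer: Customer, device_id: str) -> bool:
    """True if this Device ID has never been seen for the customer."""
    return device_id not in customer.known_device_ids


def ato_alert(customer: Customer, device_id: str, behaviour_anomaly: bool) -> bool:
    """Existing behavioral rule, narrowed by the new-device condition."""
    return behaviour_anomaly and is_new_device(customer, device_id)


alice = Customer("C-001", known_device_ids={"dev-aaa"})
print(ato_alert(alice, "dev-aaa", behaviour_anomaly=True))   # False
print(ato_alert(alice, "dev-zzz", behaviour_anomaly=True))   # True
print(ato_alert(alice, "dev-zzz", behaviour_anomaly=False))  # False
```

The first call shows where the accuracy gain comes from: an unusual transaction from a device the customer has used before no longer raises the ATO alert on its own. Whether you want to suppress such alerts entirely or merely lower their priority is a design decision.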

An even more striking graphic was published by Aite-Novarica and Outseer[3]. It shows how disproportionate fraud is when we consider the combination of an existing account and a new device: the volume of such transactions is very low, but the share of fraud among them is high.

Another scenario would be SIM swap fraud. Would it help if you could identify that the user has logged in from a different/new device? I bet it would. Another use case is spotting the same Device ID used by multiple customers - some False-Positives might occur, but this would certainly lead to some very interesting findings. The last one is building a blacklist/watchlist based on the Device fingerprint/ID.
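The "same Device ID used by multiple customers" check can be sketched in a few lines of Python. The input format (pairs of customer ID and Device ID taken from login events), the function name and the sample data are assumptions made for this illustration.

```python
from collections import defaultdict


def shared_devices(
    logins: list[tuple[str, str]], min_customers: int = 2
) -> dict[str, set[str]]:
    """Map each Device ID to its customers, keeping only shared devices."""
    customers_by_device: dict[str, set[str]] = defaultdict(set)
    for customer_id, device_id in logins:
        customers_by_device[device_id].add(customer_id)
    return {
        device_id: customers
        for device_id, customers in customers_by_device.items()
        if len(customers) >= min_customers
    }


# Hypothetical login events: (customer ID, Device ID).
logins = [
    ("C-001", "dev-aaa"),
    ("C-002", "dev-bbb"),
    ("C-003", "dev-bbb"),
    ("C-001", "dev-aaa"),
    ("C-004", "dev-bbb"),
]
for device_id, customers in shared_devices(logins).items():
    print(device_id, sorted(customers))  # dev-bbb ['C-002', 'C-003', 'C-004']
```

A shared family computer would show up here too, which is the kind of False-Positive mentioned above, so the output is better treated as a list of leads to review than as a verdict.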

And to tease you a bit more - imagine you can go beyond the Device fingerprint/ID and identify patterns like:

Device GPS location says the device is in a high-risk country
Multiple devices in close proximity
Device has been jailbroken
Device has multiple e-banking applications installed
Customer log-in process was anomalous (e.g. different typing pattern)
etc.

Device and behavioral biometrics solutions add an entirely new depth to the data available to the fraud analyst. Using these details and patterns, we can build new rules to cover many existing gaps, and by combining them with existing rules we could significantly reduce their False-Positives (FP). But this is a topic for some future blog :)

A few important points to remember:

A Device fingerprint/ID generated from descriptive characteristics collected from the user might not be 100% accurate, so make sure you account for this in your fraud rules, especially those deciding whether to stop the transaction on the fly (pre-authorization rules)
Fraudsters can indirectly alter the Device fingerprint/ID by altering the underlying characteristics - screen resolution, fonts, OS version, browser version, etc.
There are also tools and browser add-ons that mislead or tamper with the characteristics to purposefully manipulate the device fingerprint and increase the user's privacy
Device fingerprint/ID generally doesn't use IP details and relies solely on descriptive characteristics of the device and its installed software or components
GDPR categorizes browser fingerprints as personal data, and they have to be treated accordingly (e.g. user consent, transparency, secured access and storage, etc.)
Device fingerprint/ID provides important details about the point of origin of a given transaction; it's best utilized in conjunction with other techniques, i.e. as part of a layered approach

References:

[1] To see your device fingerprint and also explore other characteristics, visit: https://amiunique.org
[2] https://www.humansecurity.com/2023-enterprise-bot-fraud-benchmark-report
[3] https://www.outseer.com/aite-faster-payments/