Uncovering Tor users: where anonymity ends in the Darknet

Unlike conventional World Wide Web technologies, the Tor Darknet onion routing technologies give users a real chance to remain anonymous. Many users have jumped at this chance – some did so to protect themselves or out of curiosity, while others developed a false sense of impunity, and saw an opportunity to do clandestine business anonymously: selling banned goods, distributing illegal content, etc. However, further developments, such as the detention of the maker of the Silk Road site, have conclusively demonstrated that these businesses were less anonymous than most assumed.

The research was prepared in cooperation with Maria Garnaeva specifically for Securelist

Intelligence services have not disclosed any technical details of how they detained cybercriminals who created Tor sites to distribute illegal goods; in particular, they are not giving any clues how they identify cybercriminals who act anonymously. This may mean that the implementation of the Tor Darknet contains some vulnerabilities and/or configuration defects that make it possible to unmask any Tor user. In this research, we will present practical examples to demonstrate how Tor users may lose their anonymity and will draw conclusions from those examples.

How are Tor users pinned down?

The history of the Tor Darknet has seen many attempts – theoretical and practical – to identify anonymous users. All of them can be conditionally divided into two groups: attacks on the client’s side (the browser), and attacks on the connection.

Problems in Web Browsers

The leaked NSA documents tell us that intelligence services have no qualms about using exploits to Firefox, which was the basis for Tor Browser. However, as the NSA reports in its presentation, using vulnerability exploitation tools does not allow permanent surveillance over Darknet users. Exploits have a very short life cycle, so at a specific moment of time there are different versions of the browser, some containing a specific vulnerability and other not. This enables surveillance over only a very narrow spectrum of users.

Uncovering Tor users: where anonymity ends in the Darknet

The leaked NSA documents, including a review of how Tor users can be de-anonymized (Source:www.theguardian.com)

As well as these pseudo-official documents, the Tor community is also aware of other more interesting and ingenuous attacks on the client side. For instance, researchers from the Massachusetts Institute of Technology established that Flash creates a dedicated communication channel to the cybercriminal’s special server, which captures the client’s real IP address, and totally discredits the victim. However, Tor Browser’s developers reacted promptly to this problem by excluding Flash content handlers from their product.

Flash as a way to find out the victim’s real IP address (Source: http://web.mit.edu)

Another more recent method of compromising a web browser is implemented using the WebRTC DLL. This DLL is designed to arrange a video stream transmission channel supporting HTML5, and, similarly to the Flash channel described above, it used to enable the victim’s real IP address to be established. WebRTC’s so-called STUN requests are sent in plain text, thus bypassing Tor and all the ensuing consequences. However, this “shortcoming” was also promptly rectified by Tor Browser developers, so now the browser blocks WebRTC by default.

Attacks on the communication channel

Unlike browser attacks, attacks on the channel between the Tor client and a server located within or outside of the Darknet seem unconvincing. So far most of the concepts were presented by researchers in laboratory conditions and no ‘in-the-field’ proofs of concept have been yet presented.

Among these theoretical works, one fundamental text deserve a special mention – it is based on analyzing traffic employing the NetFlow protocol. The authors of the research believe that the attacker side is capable of analyzing NetFlow records on routers that are direct Tor nodes or are located near them. A NetFlow record contains the following information:

Protocol version number;
Record number;
Inbound and outgoing network interface;
Time of stream head and stream end;
Number of bytes and packets in the stream;
Address of source and destination;
Port of source and destination;
IP protocol number;
The value of Type of Service;
All flags observed during TCP connections;
Gatway address;
Masks of source and destination subnets.

In practical terms all of these identify the client.

De-anonymizing a Tor client based on traffic analysis
(Source: https://mice.cs.columbia.edu)

This kind of traffic analysis-based investigation requires a huge number of points of presence within Tor, if the attacker wants to be able to de-anonymize any user at any period of time. For this reason, these studies are of no practical interest to individual researchers unless they have a huge pool of computing resources. Also for this reason, we will take a different tack, and consider more practical methods of analyzing a Tor user’s activity.

Passive monitoring system

Every resident of the network can share his/her computing resources to arrange a Node server. A Node server is a nodal element in the Tor network that plays the role of an intermediary in a network client’s information traffic. In this Darknet, there several types of nodes: relay nodes and exit nodes. An exit node is an end link in traffic decryption operation, so they are an end point which may become the source of leaking interesting information.

Our task is very specific: we need to collect existing and, most importantly, relevant onion resources. We cannot solely rely on internal search engines and/or website catalogs, as these leave much to be required in terms of the relevance and completeness of contained information.

However, there is a straightforward solution to the problem of aggregation of relevant websites. To make a list of the onion resources that were recently visited by a Darknet user, one needs to track each instance of accessing them. As we know, an exit node is the end point of the path that encrypted packets follow within the Darknet, so we can freely intercept HTTP/HTTPS protocol packets at the moment when they are decrypted at the exit node. In other words, if the user uses the Darknet as an intermediary between his/her browser and a web resource located in the regular Internet, then Tor’s exit node is the location where packets travel in unencrypted format and can be intercepted.

We know that an HTTP packet may contain information about web resources, including onion resources that were visited earlier. This data is contained in the ‘Referrer’ request header, which may contain the URL address of the source of the request. In the regular Internet, this information helps web masters determine which search engine requests or sites direct users towards the web resource they manage.

In our case, it is enough to scan the dump of intercepted traffic with a regular expression containing the string ‘onion’.

There is a multitude of articles available on configuring exit nodes, so we will not spend time on how to configure an exit node, but instead point out a few details.

First of all, it is necessary to set an Exit Policy that allows traffic communication across all ports; this should be done in the configuration file torrc, located in the Tor installation catalog. This configuration is not a silver bullet but it does offer a chance of seeing something interesting at a non-trivial port.

>> ExitPolicy accept *:*

The field ‘Nickname’ in the torrc file does not have any special meaning to it, so the only recommendation in this case is not to use any conspicuous (e.g. ‘WeAreCapturingYourTraffic’) node names or those containing numbers (‘NodeNumber3′) that might suggest an entire network of such nodes.

After launching a Tor server, it is necessary to wait until it uploads its coordinates to the server of directories – this will help our node declare itself to all Darknet participants.

An exit node in operation

After we launched the exit mode and it began to pass Tor users’ traffic through itself, we need to launch a traffic packet sniffer and intercept the passing traffic. In this case, tshark acts as a sniffer, listening to interface #1 (occupied by Tor) and putting the dump into the file ‘dump.pcap’:

>>tshark –i 1 –w dump.pcap

Tshark intercepts packets that pass through the exit node in unencrypted format

All the above actions must be done on as many servers as possible to collect as much information of interest as possible. It should be noted that the dump grows quite quickly, and it should be regularly collected for analysis.

Thus, once you receive a huge dump, it should be analyzed for onion resources. Skimming through the dump helps to categorize all resources visited by Tor users by content type.

It should be noted that 24 hours of uninterrupted traffic interception (on a weekday) produce up to 3GB of dump for a single node. Thus, it cannot be simply opened with Wireshark – it won’t be able to process it. To analyze the dump, it must be broken into smaller files, no larger than 200 MB (this value was determined empirically). To do so, the utility ‘editcamp’ is used alongside with Wireshark:

>> editcap – c 200000 input.pcap output.pcap

In this case, 200,000 represents the number of packets in a single file.

While analyzing a dump, the task is to search for strings containing the substring “.onion”. This very likely will find Tor’s internal resources. However, such passive monitoring does not enable us to de-anonymize a user in the full sense of the word, because the researcher can only analyze those data network packets that the users make available ‘of their own will’.

Active monitoring system

To find out more about a Darknet denizen we need to provoke them into giving away some data about their environment. In other words, we need an active data collection system.

An expert at Leviathan Security discovered a multitude of exit nodes and presented a vivid example of an active monitoring system at work in the field. The nodes were different from other exit nodes in that they injected malicious code into that binary files passing through them. While the client downloaded a file from the Internet, using Tor to preserve anonymity, the malicious exit node conducted a MITM-attack and planted malicious code into the binary file being downloaded.

This incident is a good illustration of the concept of an active monitoring system; however, it is also a good illustration of its flipside: any activity at an exit node (such as traffic manipulation) is quickly and easily identified by automatic tools, and the node is promptly blacklisted by the Tor community.

Let’s start over with a clean slate

HTML5 has brought us not only WebRTC, but also the interesting tag ‘canvas’, which is designed to create bitmap images with the help of JavaScript. This tag has a peculiarity in how it renders images: each web-browser renders images differently depending on various factors, such as:

Various graphics drivers and hardware components installed on the client’s side;
Various sets of software in the operating system and various configurations of the software environment.

The parameters of rendered images can uniquely identify a web-browser and its software and hardware environment. Based on this peculiarity, a so-called fingerprint can be created. This technique is not new – it is used, for instance, by some online advertising agencies to track users’ interests. However, not all of its methods can be implemented in Tor Browser. For example, supercookies cannot be used in Tor Browser, Flash and Java is disabled by default, font use is restricted. Some other methods display notifications that may alert the user.

Thus, our first attempts at canvas fingerprinting with the help of the getImageData()function that extracts image data, were blocked by Tor Browser:

However, some loopholes are still open at this moment, with which fingerprinting in Tor can be done without inducing notifications.

By their fonts we shall know them

Tor Browser can be identified with the help of the measureText()function, which measures the width of a text rendered in canvas:

Using measureText() to measure a font size that is unique to the operating system

If the resulting font width has a unique value (it is sometimes a floating point value), then we can identify the browser, including Tor Browser. We acknowledge that in some cases the resulting font width values may be the same for different users.

It should be noted that this is not the only function that can acquire unique values. Another such function is getBoundingClientRect(),which can acquire the height and the width of the text border rectangle.

When the problem of fingerprinting users became known to the community (it is also relevant to Tor Browser users), an appropriate request was created. However, Tor Browser developers are in no haste to patch this drawback in the configuration, stating that blacklisting such functions is ineffective.

Tor developer’s official reply to the font rendering problem

Field trials

This approach was applied by a researcher nicknamed “KOLANICH”. Using both functions, measureText() and getBoundingClientRect(), he wrote a script, tested in locally in different browsers and obtained unique identifiers.

Using the same methodology, we arranged a test bed, aiming at fingerprinting Tor Browser in various software and hardware environments.

To address this problem, we used one author’s blog and embedded a JavaScript that uses the measureText() and getBoundingClientRect()functions to measure fonts rendered in the web browser of the user visiting the web-page. The script sends the measured values in a POST request to the web server which, in turn, saves this request in its logs.

A fragment of a web-server’s log with a visible Tor Browser fingerprint

At this time, we are collecting the results of this script operating. To date, all the returned values are unique. We will publish a report about the results in due course.

Possible practical implications

How can this concept be used in real world conditions to identify Tor Browser users? The JavaScript code described above can be installed on several objects that participate in data traffic in the Darknet:

Exit node. A MITM-attack is implemented, during which JavaScript code is injected into all web pages that a Darknet resident visits in the outside Web.
Internal onion resources and external web sites controlled by the attackers. For example, an attacker launches a ‘doorway’, or a web page specially crafted with a specific audience in view, and fingerprints all visitors.
Internal and external websites that are vulnerable to cross-site scripting (XSS) vulnerabilities (preferably stored XSS, but this is not essential).

Objects that could fingerprint a Tor user

The last item is especially interesting. We have scanned about 100 onion resources for web vulnerabilities (these resources were in the logs of the passive monitoring system) and filtered out ‘false positives’. Thus, we have discovered that about 30% of analyzed Darknet resources are vulnerable to cross-site scripting attacks.

All this means that arranging a farm of exit nodes is not the only method an attacker can use to de-anonymize a user. The attacker can also compromise web-sites and arrange doorways, place a JavaScript code there and collect a database of unique fingerprints.

The process of de-anonymizing a Tor user

So attackers are not restricted to injecting JavaScript code into legal websites. There are more objects where a JavaScript code can be injected, which expands the number of possible points of presence, including those within the Darknet.

Following this approach, the attacker could, in theory, find out, for instance, sites on which topics are of interest to the user with the unique fingerprint ‘c2c91d5b3c4fecd9109afe0e’, and on which sites that user logs in. As a result, the attacker knows the user’s profile on a web resource, and the user’s surfing history.

In place of a conclusion

At Tor project’s official website, the developers posted an answer to the question “why is JavaScript allowed by Tor Browser?”:

Tor Browser’s official answer to a question about JavaScript

It appears from this answer that we should not expect the developers to disable JavaScript code in TorBrowser.