This year saw the first workshop on Freedom of Communications on the Internet, co-located with USENIX Security in San Francisco. My contribution, co-authored with Ian Brown and Tulio de Souza, focused both on means of mapping censorship in greater detail and on the legal and ethical implications of doing so.
The paper was inspired by the realization that censorship at the national level need not be, and clearly often is not, applied equally across a country. The riots in Ürümqi, in Xinjiang, resulted in a blanket internet ban for that region that was not extended to the rest of China. The widely-reported shutdown of Egyptian internet service for several days during the 2011 Egyptian revolution was not experienced, at least at first, on the ISP that provided service for important financial services. The ability to filter selectively is clearly, in the view of a censor, very useful.
Even when censorship is intended to apply equally, practical considerations can cause localized discrepancies. In large-scale or complex censorship regimes total centralization may be infeasible, resulting in censorship being delegated to local authorities or organizations. These may, in turn, make different choices in how to implement filtering at the local level, with varying results.
All previous major studies of internet censorship have considered filtering at the national level, without investigating the potential for local variation. It is therefore valuable to consider what local and organizational variations in censorship can reveal about how filtering is implemented and how it affects users.
The goal of this research, then, is to determine what filtering is being applied to a given remote computer, identified by its IP address. This IP can then be geolocated with a reasonable level of accuracy using the freely-available MaxMind GeoIP Cities Lite database. To this end, we wish to obtain a view of the internet as seen by that computer. Tools such as Tor, psiphon and open virtual private networks (VPNs) provide exactly this functionality, but are unfortunately few and far between.
This problem is exacerbated when researching censorship, as redirection services are typically offered specifically to allow users to bypass censorship by routing their connections outside of a filtered area. There seems little incentive for people to offer anti-censorship tool exit points in known filtered locations. Tor, as an example, did not appear to offer any exit relays in mainland China when we conducted our experiments.
Without available services such as Tor, we began to investigate common services that allow connections to be bounced in a similar fashion. We are not interested in using these connections for any significant data transfer, and so even fragmentary information can be useful. With this perspective, DNS, IRC, FTP and others all offer potential information sources to learn how remote systems see the internet.
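Because we only need fragmentary information, even a hand-rolled DNS query suffices. As a rough illustration of what such a probe involves (a sketch, not the code we actually used), a minimal A-record query per RFC 1035 can be built with nothing but the Python standard library and sent to any open resolver:

```python
import struct

def build_query(domain, txid=0x1234):
    """Build a minimal DNS A-record query packet (RFC 1035)."""
    # Header: transaction id, flags (recursion desired),
    # 1 question, 0 answer/authority/additional records.
    header = struct.pack(">HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    # Question name: length-prefixed labels, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in domain.split(".")
    ) + b"\x00"
    # QTYPE=1 (A record), QCLASS=1 (IN).
    question = qname + struct.pack(">HH", 1, 1)
    return header + question

# To probe a remote resolver (network access required):
#   import socket
#   s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
#   s.sendto(build_query("facebook.com"), ("<resolver ip>", 53))
```

In practice a DNS library does the packet handling; the point is that each probe is a single small UDP datagram, which is what makes surveying hundreds of servers cheap.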
It was shortly after beginning to consider these systems, however, that we became concerned with the ethical issues surrounding this type of research. Ignoring more technical approaches, the simplest way to learn how an individual computer’s connection is filtered is by contacting a remote user and asking them to run censorship detection code themselves. Whilst it can be difficult to scale such an approach, a great deal of information can be gained from each experiment run in this way. Unfortunately, probing censorship on the internet almost inevitably involves deliberately triggering a censorship mechanism, by attempting to access a blocked website, by searching for a banned term, or by transmitting data containing filtered keywords. When these attempts are made through a third party’s connection, potentially without their informed consent, consideration must be given as to the level of risk to which that user is exposed.
The nature of the risk can, however, be more subtle than simply coming to the attention of government censors. The Herdict project, a web-based censorship mapping tool, functions by loading potentially blocked pages as an HTML iframe embedded in their webpage, and users report whether the embedded page is visible or not. This embedded page, which loads on screen without warning and which can involve topics such as sexuality and political or religious expression, could cause anything from minor embarrassment to serious social or legal consequences if an unsuspecting user were observed viewing it in certain cultures.
Without an easy answer to these problems, we have limited ourselves to exploring DNS-level censorship. DNS servers are widespread on the internet, are often open to the general internet, and are public services run, in general, by organizations rather than individuals. This allows us to query for sites we believe to be blocked without exposing individuals to any form of risk. Obtaining a reasonable list of DNS servers in China was simple via a request to APNIC. It would also be simple to scan known Chinese IP blocks for open DNS servers, but we felt this to be unnecessary.
With a reasonable list of several hundred DNS servers, we retrieved a list of known blocked domain names from the Herdict project and automated the process of requesting the domain name to IP mapping from the remote DNS servers, comparing the results to those we could obtain from our own unfiltered connections.
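The comparison step above can be sketched simply. The category names here are our own shorthand, and a caveat applies: CDNs and geo-aware DNS mean a remote answer can legitimately differ from ours, so a disjoint result is only suspect, not proof of poisoning:

```python
def classify_response(remote_ips, trusted_ips):
    """Classify one remote resolver's answer for a domain against
    the answer obtained over our own unfiltered connection."""
    if not remote_ips:
        return "no_answer"   # server returned nothing at all
    if set(remote_ips) & set(trusted_ips):
        return "genuine"     # at least one address agrees with ours
    # Non-empty but disjoint: possible poisoning, or merely a
    # different CDN edge; needs follow-up scanning to confirm.
    return "suspect"
```

Scanning the returned addresses for live services, as described below, is one way to promote a "suspect" answer to a confirmed fake.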
Initial results demonstrate a fair amount of DNS poisoning, with fake results reported by several servers for known blocked sites such as facebook.com, twitter.com and wujie.net (the Chinese domain for the UltraSurf anti-censorship product), as well as many others. In a number of cases, DNS servers simply reported fake IP addresses that, on scanning, did not appear to offer any services. In other cases we observed DNS servers forwarding requests to alternate DNS servers, often located in Beijing, that then returned either fake results or no results at all. A number of servers returned no results at all for well-known blocked sites. Despite this, in a good number of cases we did receive genuine, correct responses from DNS servers.
The most interesting result, at first glance, is the range of responses from the various servers. All possible behaviours, from genuine responses through faked results to no results at all, were observed. There does not, from initial examination, seem to be an obvious pattern to the distribution of these different result types. This is doubly interesting in light of the various other methods of censorship, such as deep-packet inspection and TCP resets, that are known to be employed in China and which could be expected to make DNS poisoning unnecessary.
We currently have the raw data gathered from across China and are analyzing it for interesting patterns; we will also be re-running the experiment at regular intervals in order to observe how the patterns of blocking change over time. Of current interest is whether there are significant correlations between the types of filtering employed and the geographical or organizational distribution of the servers; those DNS servers that chose to redirect our requests are also a very interesting avenue of enquiry. Of the faked results received, we have already observed that these are often redirected to a small pool of "sink" IP addresses; whether these sinks are consistent across regions or organizations is not known.
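Spotting the sink pool is a simple matter of grouping faked answers by the returned address. A sketch of that analysis (the function name and the IP addresses here are illustrative, not from our data):

```python
from collections import Counter, defaultdict

def find_sinks(poisoned, min_count=2):
    """Given (dns_server, domain, fake_ip) tuples for answers already
    confirmed fake, return the fake IPs that recur across records,
    mapped to the set of servers that returned them."""
    counts = Counter(ip for _, _, ip in poisoned)
    servers = defaultdict(set)
    for server, _, ip in poisoned:
        servers[ip].add(server)
    # An IP returned for many domains or by many servers looks
    # like a shared "sink" rather than a one-off misconfiguration.
    return {ip: sorted(servers[ip])
            for ip, n in counts.items() if n >= min_count}
```

Whether the same sinks recur across provinces or across organizations is then a matter of joining this table against the geolocation data.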
There are many interesting questions to be answered from this line of research, and China is by no means the only country worth investigating. A more general point of interest is how to learn which sites to test for filtering. We have relied, to a large extent, on both the Herdict project’s list of sites, gathered through manual reporting from users around the world, and on our own knowledge of blocked sites. Automating this process of detecting filtered sites is certainly a problem worthy of further attention.
While there are serious legal and ethical limitations to researching censorship directly in this way, means to do so are available and allow scope for much interesting work. I look forward to sharing our results in the future.