A Quick Investigation of EdgeCast CDN Blocking in China

November 18th, 2014 No comments

This morning, GreatFire.org published a story stating that EdgeCast CDN, one of the more popular content distribution networks that handles content for a number of large websites, has been blocked by the Chinese national filter. As a result, a friend emailed me asking what I thought, and pointed out that all we have are a few reports and a link to a status update from EdgeCast themselves.

As usual, my attempt to write a short email failed, and I ended up carrying out an impromptu investigation into this. With minor edits, I’ve reproduced my email detailing how I looked into this below. For reference, this was carried out from an internet connection based in Oxford, UK.

From prior work, we have evidence that China will man-in-the-middle (UDP) DNS requests for blocked sites, but leave requests for unblocked sites alone. So first, let’s pick a Chinese IP address almost at random:

joss@kafka:~$ ping baidu.cn
PING baidu.cn ( 56(84) bytes of data.

Check it’s actually in China, using the MaxMind GeoIP database:

joss@kafka:~$ geoiplookup
GeoIP Country Edition: CN, China
GeoIP City Edition, Rev 1: CN, 22, Beijing, Beijing, N/A, 39.928902, 116.388298, 0, 0

Check that it’s not a DNS server:

joss@kafka:~$ dig @ baidu.cn

; <<>> DiG 9.9.2-P2 <<>> @ baidu.cn
; (1 server found)
;; global options: +cmd
;; connection timed out; no servers could be reached

Excellent. No response. Now, perform a DNS lookup from our (presumably uncensored) connection in the UK to an edgecastcdn.net host:

joss@kafka:~$ dig edgecastcdn.net

; <<>> DiG 9.9.2-P2 <<>> edgecastcdn.net
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 635
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 2, ADDITIONAL: 5
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;edgecastcdn.net. IN A
;; ANSWER SECTION:
edgecastcdn.net. 3370 IN A

Now let’s see what happens if we look up edgecastcdn.net at the baidu.cn IP, recalling that it is not actually a DNS server:

joss@kafka:~$ dig @ edgecastcdn.net

; <<>> DiG 9.9.2-P2 <<>> @ edgecastcdn.net
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 63357
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;edgecastcdn.net. IN A
;; ANSWER SECTION:
edgecastcdn.net. 30640 IN A
;; Query time: 397 msec
;; SERVER:
;; WHEN: Tue Nov 18 12:02:53 2014
;; MSG SIZE rcvd: 64
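The test so far boils down to comparing two probes sent to an address that should not answer DNS at all: an innocuous name as a control, and the suspect name. A minimal sketch of that comparison as shell functions follows. The names `got_answer` and `probe_verdict` are invented here for illustration, and the check assumes only that dig prints ";; Got answer:" when some reply arrives.

```shell
# got_answer TRANSCRIPT
# Succeeds when the dig transcript shows that a DNS reply was received.
got_answer() {
    case "$1" in
        *';; Got answer:'*) return 0 ;;
        *) return 1 ;;
    esac
}

# probe_verdict CONTROL_TRANSCRIPT TEST_TRANSCRIPT
# Both transcripts come from queries sent to the same address, one that is
# believed not to run a DNS server.
probe_verdict() {
    if got_answer "$1"; then
        # The control was answered, so the address may be a real resolver.
        echo "inconclusive: target answers DNS itself"
    elif got_answer "$2"; then
        # Only the suspect name was answered: something on the path replied.
        echo "injected: reply received from a host that is not a DNS server"
    else
        echo "no interference observed"
    fi
}
```

It could be called as `probe_verdict "$(dig +time=5 +tries=1 @"$TARGET" baidu.cn)" "$(dig +time=5 +tries=1 @"$TARGET" edgecastcdn.net)"`, where `TARGET` is a placeholder for the address being probed.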

Interesting. We get a response, which took 397 milliseconds. Let's look up the returned IP:

joss@kafka:~$ dig -x

; <<>> DiG 9.9.2-P2 <<>> -x
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 7947
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
; IN PTR
;; AUTHORITY SECTION:
7.98.203.in-addr.arpa. 3600 IN SOA ns1.netlink.co.nz. soa.netlink.co.nz. 2008110600 7200 1200 1728000 172800
;; Query time: 767 msec
;; SERVER:
;; WHEN: Tue Nov 18 12:11:43 2014
;; MSG SIZE rcvd: 110

That doesn't look like a genuine response! A quick WHOIS:

joss@kafka:~$ whois netlink.co.nz
% New Zealand Domain Name Registry Limited
% Users confirm on submission their agreement to all published Terms
version: 5.00
query_datetime: 2014-11-19T01:12:03+13:00
domain_name: netlink.co.nz
query_status: 200 Active
domain_dateregistered: 1997-03-24T00:00:00+12:00
domain_datebilleduntil: 2014-12-01T00:00:00+13:00
domain_datelastmodified: 2014-11-01T23:37:30+13:00
domain_delegaterequested: yes
domain_signed: no
registrar_name: Vodafone New Zealand Limited (Clear)
registrar_address1: Private Bag 92161
registrar_city: Auckland
registrar_country: NZ (NEW ZEALAND)
registrar_phone: +64 508 888 800
registrar_email: registry@clear.net.nz
registrant_contact_name: NetLink
registrant_contact_address1: PO Box 5358
registrant_contact_city: Wellington
registrant_contact_country: NZ (NEW ZEALAND)
registrant_contact_phone: +64 4 9228499
registrant_contact_fax: +64 4 9228401
registrant_contact_email: dns@netlink.co.nz
admin_contact_name: Netlink Operations Centre
admin_contact_address1: PO Box 5358
admin_contact_city: Wellington
admin_contact_country: NZ (NEW ZEALAND)
admin_contact_phone: +64 4 922 8499
admin_contact_fax: +64 4 922 8401
admin_contact_email: dns@netlink.co.nz
technical_contact_name: Netlink Operations Centre
technical_contact_address1: PO Box 1762
technical_contact_address2: Wellington, New Zealand
technical_contact_phone: +64 4 495 5021
technical_contact_fax: +64 4 495 5197
technical_contact_email: dns@netlink.co.nz
ns_name_01: ns1.netlink.co.nz
ns_name_02: ns2.netlink.co.nz

I'd say that that's pretty solid evidence of interference at the network level -- the request to edgecastcdn.net appears to be redirected to a Vodafone-operated host in New Zealand. I'm interested by the DNS response time of 397 msecs for the lookup, and I have a feeling that that could reveal interesting things about the possible location of the man-in-the-middle attack. A quick whois on baidu.cn gives their genuine DNS server as being located at, so we can use that as our test IP instead:

joss@kafka:~$ dig @ baidu.cn

;; Query time: 313 msec

joss@kafka:~$ dig @ edgecastcdn.net

;; Query time: 281 msec

With one data point it seems that a reply for a censored domain is a few tens of milliseconds quicker than an uncensored one, but one data point does not science make. Here's 100 requests to each:

joss@kafka:~$ for i in $(seq 1 100); do
dig @ edgecastcdn.net | grep "Query time" | sed -e "s/.*\: \(.*\) msec/\1/" >> edgecastresults.txt;
done

joss@kafka:~$ for i in $(seq 1 100); do
dig @ baidu.cn | grep "Query time" | sed -e "s/.*\: \(.*\) msec/\1/" >> baiduresults.txt;
done

That gives us two sets of statistics that we can summarise quickly in R:

joss@kafka:~$ R
> baidu <- read.csv('baiduresults.txt')
> edgecast <- read.csv('edgecastresults.txt')
> summary( baidu )

Min. :286.0
1st Qu.:303.0
Median :306.0
Mean :323.2
3rd Qu.:309.0
Max. :652.0

> summary( edgecast )

Min. :249.0
1st Qu.:293.5
Median :306.0
Mean :314.7
3rd Qu.:309.0
Max. :622.0

The timing doesn't seem to be that different overall. I suspect that, with the Baidu DNS being in Beijing, there is a good chance of the man-in-the-middle attacks being sufficiently geographically close that it makes little difference to the output. Maybe we could pick a reasonable DNS server that isn't in Beijing and test against that.
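For a quick comparison without leaving the shell, the same kind of summary can be approximated with sort and awk. This is only a sketch: `summarise_times` is a name made up for illustration, it reports fewer statistics than R's summary(), and it assumes one timing per line, with lines reading NA (a convention assumed here for failed lookups) skipped.

```shell
# summarise_times: read one query time (msec) per line on stdin and print
# "n min median mean max". Blank lines and lines reading NA are skipped.
summarise_times() {
    awk 'NF && $1 != "NA"' | sort -n | awk '
        { v[NR] = $1; sum += $1 }
        END {
            if (NR == 0) { print "0 NA NA NA NA"; exit }
            # Input is sorted, so the median is the middle value, or the
            # mean of the two middle values for an even count.
            if (NR % 2) median = v[(NR + 1) / 2]
            else        median = (v[NR / 2] + v[NR / 2 + 1]) / 2
            printf "%d %s %s %.1f %s\n", NR, v[1], median, sum / NR, v[NR]
        }'
}
```

For example, `summarise_times < baiduresults.txt` and `summarise_times < edgecastresults.txt` give one line each that can be compared side by side. With medians this close, the mean is mostly telling us about a handful of slow outliers.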

A quick Google for 'china dns server' gives this page: https://sites.google.com/site/kiwi78/public-dns-servers, and we randomly pick one that claims to be in Chengdu.

joss@kafka:~$ dig @ baidu.cn

; <<>> DiG 9.9.2-P2 <<>> @ baidu.cn
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 7210
;; flags: qr rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 4, ADDITIONAL: 5
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;baidu.cn. IN A
;; ANSWER SECTION:
baidu.cn. 120 IN A
baidu.cn. 120 IN A
baidu.cn. 120 IN A
;; AUTHORITY SECTION:
baidu.cn. 120 IN NS ns2.baidu.com.
baidu.cn. 120 IN NS ns3.baidu.com.
baidu.cn. 120 IN NS ns4.baidu.com.
baidu.cn. 120 IN NS ns1.baidu.com.
;; ADDITIONAL SECTION:
ns1.baidu.com. 75287 IN A
ns2.baidu.com. 74049 IN A
ns3.baidu.com. 74049 IN A
ns4.baidu.com. 74049 IN A
;; Query time: 426 msec
;; SERVER:
;; WHEN: Tue Nov 18 12:29:19 2014
;; MSG SIZE rcvd: 230

joss@kafka:~/tmp/exp$ geoiplookup
GeoIP Country Edition: CN, China
GeoIP City Edition, Rev 1: CN, 32, Sichuan, Chengdu, N/A, 30.666700, 104.066704, 0, 0

That responds to our innocuous query, and seems to be in Chengdu according to our GeoIP database. Let's try the 100 requests trick on it (I won't repeat the code, as it's illustrated above):

> summary(baidu)
Min. :275.0
1st Qu.:305.0
Median :321.0
Mean :359.8
3rd Qu.:408.8
Max. :868.0

> summary(edgecast)
Min. :247
1st Qu.:307
Median :317
Mean :335
3rd Qu.:335
Max. :628

There still isn't anything particularly damning based on the timing. Of course, there are lots of issues with DNS caching to think about, and my simple shell scripts didn't check for things like timeouts or no replies, so this might not be as simple as it looks, but we've still got some interesting data with which to play. At the very least, I can support GreatFire's claim that China are doing things to the edgecastcdn domain.

The first improvement that I'll make, which will probably run tonight, is to add a `sleep 90` call in the for loop that runs the repeated requests. That should hopefully avoid the most obvious form of rate limiting.
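A rough sketch of what such a revised loop might look like is below. It pauses between requests and records an explicit NA when dig reports no query time, so that timeouts are no longer silently dropped. The function names, the NA convention, and the `+time=5 +tries=1` settings are illustrative choices rather than what was actually run.

```shell
# extract_query_time: read one dig transcript on stdin and print the reported
# query time in milliseconds, or NA when no "Query time" line is present
# (for example after a timeout).
extract_query_time() {
    awk '/^;; Query time:/ { print $4; found = 1 }
         END { if (!found) print "NA" }'
}

# sample_query_times SERVER NAME COUNT DELAY
# Sends COUNT queries for NAME to SERVER, sleeping DELAY seconds between
# them, and prints one timing (or NA) per line. All four values are supplied
# by the caller.
sample_query_times() {
    i=0
    while [ "$i" -lt "$3" ]; do
        dig +time=5 +tries=1 @"$1" "$2" | extract_query_time
        i=$((i + 1))
        # Pause between requests, but not after the final one.
        if [ "$i" -lt "$3" ]; then sleep "$4"; fi
    done
}
```

A run would then look like `sample_query_times "$SERVER" edgecastcdn.net 100 90 >> edgecastresults.txt`, with `SERVER` standing in for the resolver's address. R's read.csv treats the string NA as a missing value by default, so failed lookups show up as a count of NA's in summary() rather than distorting the timings.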

This wasn't intended to be a full and detailed investigation, but it certainly threw up some interesting findings and a confirmation of the key points of GreatFire.org's story. In the wider sense, the idea of blocking a content distribution network rather than a website itself is a bold step and has significant collateral implications, as GreatFire point out at length.

For me, looking forward, the most interesting aspects of these stories are to derive the logic and intentions behind the behaviour itself, which will require much more significant analytical and theoretical tools than simply probing what was blocked where and how. Why are these particular networks or sites chosen for filtering, and what can that tell us? How quickly are sites blocked, and unblocked, in relation to political or social events? Filtering provides a fascinating lens into the motivations and thought processes of those carrying it out. As blocking policies develop, with governments and populations becoming more comfortable with using the internet in all aspects of life, the ability to draw inferences from such overt interference in the network will be an incredibly rich seam to mine.

Categories: Censorship, China Tags:

Open Rights Group Report: “Digital Surveillance”

May 16th, 2013 No comments

The Open Rights Group have recently released “Digital Surveillance – Why the Snoopers’ Charter is the wrong approach: A call for targeted and accountable investigatory powers”. The report sets out various arguments as to why the proposed Communications Data Bill in the UK, which aims to massively extend the scope of surveillance over online communications, is a terrible, counterproductive, expensive, and unnecessary idea.

I was asked to contribute a section on the risks of the existing proposals, and some thoughts on where things should go in the future. My contribution is reproduced here, but please do go and download the full report.

Where laws intersect with technology, as is strikingly the case with surveillance, the discrepancy between the pace of technological change and the pace of legal change requires lawmakers to consider carefully the risks that arise from the future development and application of technologies. Crucially, and challengingly, it is necessary to differentiate between the limitations that exist in current technologies, and will disappear as technology develops; and those limitations that are fixed and inherent.

Information technologies, and in particular the Internet, have expanded the potential for surveillance to a degree that would have seemed fantastical in previous decades. Unprecedented levels of data can now be collected, stored, and analysed, and can be combined and controlled with an amazing degree of centrality.

The technical capabilities of the Internet not only allow this surveillance, they encourage us, through convenience, to place more and more of our lives into the spotlight. We now read news, search for information, talk to friends, organize social and business life, bank, and meet potential partners via the Internet. There is no precedent that can even approximate a model for the pervasiveness of the Internet in our lives — not the phone network, not post or telegraph, not CCTV surveillance. Equating the Internet with historical technologies when making policy is not simply wrong, it is dangerously misleading.

From the state’s perspective, the desire for surveillance is easy to understand. Such a wealth of data seems to promise an oracle allowing security services not only to investigate, but also to detect, predict and prevent crimes — and ubiquitous surveillance can, certainly, achieve some of these goals.

The sheer wealth of data that surveillance reveals, however, tips the balance decisively from its power to help towards its power to harm. Vast amounts of information can be handled by faster and faster computers, but the power and accuracy of the predictive algorithms are not so scalable — when applied blindly to entire populations the ability to identify suspicious patterns is lost in the flood and becomes either worthless or actively harmful.

Pervasive and detailed information on individuals is a powerful tool. When investigating a crime the details of a suspect’s activities, communications, and habits can be highly valuable. This tool, however, can be used just as effectively against all those individuals who are not under suspicion — blackmail, fraud, stalking, and simple invasion of privacy are all enabled by such collections of data just as effectively as the investigation of crime. Placing an entire population in handcuffs to ensure that the criminals have been caught is not an acceptable policy.

As such, any legal framework for enabling surveillance must, in the first instance, be based on the notion of targeted gathering of data on well-justified grounds. This precludes the a priori gathering and storage of data — such gathering should only occur in response to justified suspicions. Data that is found not to be useful, particularly where it concerns third parties, must be deleted quickly and verifiably. Further, there should be no institutionalised technical mechanism to surveil communications; instead, surveillance requests should be made directly to service providers who must be free to manage and control their own platforms.

As has been observed with existing laws, such as the UK’s own RIPA, surveillance powers are easily and widely abused. Strict and independent audit, therefore, both of surveillance requests and data handling should be a key feature of any proposed surveillance framework. This must, of course, be supported by stringent penalties for misuse of either powers or data. Transparency, imposed both at a legal level and by the need to interact with private organisations that control infrastructure, is the only hope to mitigate the abuses that inevitably accompany such approaches.

The technological landscape in which we find ourselves is one in which the potential for surveillance is vast and growing. Surveillance law must therefore focus on restraining risks and abuses, without being carried away by false promises of effectiveness. Minimisation, decentralisation, accountability and limitation of access are all necessary steps to ensure that the cure is not worse than the disease.

Categories: Policy, Surveillance Tags:

Chinese Internet Filtering: The Curious Case of the Florida Pet Club

December 23rd, 2012 No comments

Of the various ways to filter the internet, manipulating DNS is probably the simplest and cheapest in terms of resources. DNS, the Domain Name System, is the mapping between the human-readable URLs that we use, like https://www.pseudonymity.net, and the more machine-friendly IP addresses, like

The Chinese Golden Shield Project, or Great Firewall, famously makes use of a range of techniques. These include keyword filtering, as reported by Clayton et al., as well as active blocking of services such as Tor at the IP level, and more manual censorship and takedown on services like Weibo.

In the past year or so I’ve spent some time tinkering with exactly how China’s internet is filtered. In particular, I’ve been interested in the extent to which the system is centrally-driven, with blanket country-wide decisions and implementation, against how many of its decisions are loose and locally applied by regional authorities and ISPs.

To study this it is more or less useless to fire up a VPN, or a copy of Tor, and run network tests. Filtering conditions may vary by province, by city or by ISP. When I see a report that some site ‘is blocked in China’, my immediate response has become to ask where. On which ISP? Using what method?

Instead, we need to study internet filtering with enough resolution to capture a potentially complex and varied filtering landscape. Multiple systems across the country, on different networks, must be collated and compared. From my personal research bias and interests, I’m interested in how this can be achieved remotely. I’ve presented work on this elsewhere (see my FOCI paper), but I’m concerned with the approach of asking users to install censorship monitoring software on their systems when those users are often not technically qualified to understand the risks. I don’t genuinely understand the risks of accessing http://www.tibet.net from inside China, and I can’t justify asking another, probably less informed, user to do it on my behalf.

As a result, to date I’ve focused largely on looking at how China poisons DNS. DNS poisoning is a common and relatively simple way to stop people reaching web pages, provided that you’re happy to block the entire domain. DNS servers are relatively common, and also are usually open to requests from anywhere, meaning that it’s easy to get wide coverage at a relatively high resolution.

The short version of the results is that poisoning is rife across China, with Twitter being the most widely poisoned domain. The most common type of poisoning is to misdirect by providing an incorrect IP address, rather than to claim that a given domain does not exist.

In general the majority of this IP misdirection seems to be more or less random. Users attempting to reach sites such as http://www.tibet.net or http://www.voanews.com may find themselves redirected to computers in Korea, Azerbaijan, the US or China itself. While these addresses vary across the country, there are cases of correlation. DNS servers in different cities, operated by different ISPs, will quite often redirect to the same incorrect IP addresses. A quick analysis of these shows a range of relatively innocent-looking systems that seem to have been plucked more or less at random.

One of the more interesting examples was looking at the redirection surrounding the Tor Project’s website: https://www.torproject.org. Tor is a well-known target of filtering in China, so it isn’t surprising that they are being DNS poisoned. What was interesting, however, was finding that across China more than 14 separate servers all redirected https://www.torproject.org to a website owned by “The Pet Club”, a Florida-based pet-grooming service. When New Scientist magazine wrote a short article on this work recently, they contacted the webmaster of http://www.thepetclubfl.net to get his thoughts. The most interesting result of that conversation was that The Pet Club do experience a high volume of traffic from Chinese users, showing that the relevant IP address is not blocked. This was by no means the only example of such behaviour.
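Spotting this kind of shared redirection is mostly a counting exercise. As a rough sketch, suppose each observation is logged as a line of the form "SERVER DOMAIN ANSWER_IP". That format is hypothetical and chosen only for illustration, as is the function name `shared_answers`; it is not the format used in the study.

```shell
# shared_answers DOMAIN
# Input on stdin: one observation per line, "SERVER DOMAIN ANSWER_IP".
# Output: "COUNT ANSWER_IP" for DOMAIN, where COUNT is the number of distinct
# servers that returned that address, most widely shared first.
shared_answers() {
    awk -v d="$1" '
        # Count each (server, answer) pair only once.
        $2 == d && !seen[$1 SUBSEP $3]++ { n[$3]++ }
        END { for (ip in n) print n[ip], ip }' |
        sort -k1,1nr -k2,2
}
```

An address that heads this list for many unrelated servers, while not being a genuine address for the domain, is a candidate for the kind of coordinated redirection described above.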

If we question why this all happens we are clearly moving from facts to speculation, but here are my theories for this strange behaviour:

  1. DNS poisoning by returning incorrect IP addresses results in a connection to the (fake) site. Assuming that these are likely to be outside of China, this means that Chinese border routers will observe connections to these IP addresses, which are unlikely to be visited by average Chinese users. Therefore it will be possible to observe users trying to get to https://www.torproject.org while still blocking their attempts. Simply returning ‘no such domain’ would effectively block many users, but would not reveal the scale of the connection attempts.
  2. The correlation of IP addresses shows some level of central decision-making. My theory is certainly that a large amount of the filtering decisions come in the form of guidelines rather than strict rules, resulting in variations as the guidelines are implemented by local ISPs and organisations. Despite this, there do seem to be situations where stronger rules are sent. The Pet Club case, for example, could feasibly be explained by an instruction to ‘redirect the Tor Project’s website to thepetclubfl.net’, which was then inserted into local DNS servers.

What I find fascinating in this work is the complexity of blocking. We can study DNS redirection, IP blocking, keyword filtering, BGP manipulation, and social media takedown, but this is just a technical angle. On top of this we have variations from location to location, variation over time, the choice of what method to use for particular blocking targets, and now whether or not to send precise blocking commands or more flexible guidelines.

I have many plans to extend this work to the rest of the world, and to produce similar high-resolution testing for IP reachability and other forms of filtering while still avoiding the need to involve individuals on the ground. What this study of DNS filtering shows is that the overall story of the filtering is far more complicated than simply asking what websites are blocked in which countries.

Just to conclude: a diagram of the IP redirection poisoning seen for China as of September 2012. The scale runs from green to red, with red showing more DNS redirection for censored sites and green showing less. The size of each circle indicates the number of results available, in terms of the number of servers, for a given city.

Misleading DNS Results across China — Red shows more misdirection, green shows more genuine results. Larger circles denote more results.

Categories: Censorship, China Tags:

Workshop on Free and Open Communications on the Internet (FOCI’12)

March 31st, 2012 No comments

Following on from the fantastically interesting FOCI workshop last year, I am co-chairing this year’s FOCI workshop along with Roger Dingledine of the Tor Project. The workshop will again be co-located with USENIX Security, which is being held this year in Bellevue, Washington in August.

Although FOCI revolves around USENIX Security, and therefore by default falls on the more technical side of research, we are actively encouraging submissions from any field with something interesting to say on internet censorship. Social science, political science, law, economics, ethics, psychology — if you have something to say then send us your work!

The call for papers is here: https://www.usenix.org/conference/foci12

I hope to see you there!

Categories: Censorship, Conference Tags:

Discussing Online Privacy in the Observer, with Tom Chatfield

March 4th, 2012 No comments

I was recently approached by the Observer to take part in an email-based discussion with Tom Chatfield about online privacy and the direction that companies like Facebook and Google are taking us.

It was a lot of fun to write, over the course of a day, and there were some interesting points raised. 1000 words each isn’t enough to explore very much, but I found it surprisingly useful for clarifying my thoughts on the subject, and quite inspiring for some of the future work that is constantly buzzing around my head.

The original story on the Observer is here.

Categories: Article, Privacy Tags:

Presentation on Mapping Chinese Censorship

December 29th, 2011 No comments

I recently presented my work on censorship mapping to my colleagues at the OII, including a couple of maps with early analysis of DNS manipulation in Chinese cities.

The analysis is very preliminary, and there are considerable caveats even for the early results, but here’s the presentation:

Categories: Censorship, China Tags:

Freedom of Communication on the Internet Workshop (FOCI): Fine-Grained Censorship Mapping — Information Sources, Legality and Ethics

November 2nd, 2011 No comments

This year saw the first workshop on Freedom of Communications on the Internet, co-located with USENIX Security in San Francisco. My contribution, co-authored with Ian Brown and Tulio de Souza, focused both on the means for mapping censorship in greater detail as well as its legal and ethical implications.

The paper was inspired by the realization that censorship at the national level need not be, and clearly often is not, applied equally across a country. The riots in Ürümqi, in Xinjiang, resulted in a blanket internet ban for that region that was not extended to the rest of China. The widely-reported shutdown of Egyptian internet service for several days during the 2011 Egyptian revolution was not experienced, at least at first, on the ISP that provided service for important financial services. The ability to filter selectively is clearly, in the view of a censor, very useful.

Even when censorship is intended to apply equally, practical considerations can cause localized discrepancies. In large-scale or complex censorship regimes total centralization may be infeasible, resulting in censorship being delegated to local authorities or organizations. These may, in turn, make different choices in how to implement filtering at the local level, with varying results.

All previous major studies of internet censorship have considered filtering at the national level, without investigating the potential for local variation. It is therefore valuable to consider what local and organizational variations in censorship can reveal about how filtering is implemented and how it affects users.

The goal of this research, then, is to determine what filtering is being applied to a given remote computer, identified by its IP address. This IP can then be geolocated with a reasonable level of accuracy using the freely-available MaxMind GeoIP Cities Lite. To this end, we wish to view the internet as seen by that computer. Tools such as Tor, psiphon and open virtual private networks (VPNs) provide exactly this functionality, but are unfortunately few and far between.

This problem is exacerbated when researching censorship, as redirection services are typically offered specifically to allow users to bypass censorship by routing their connections outside of a filtered area. There seems little incentive for people to offer anti-censorship tool exit points in known filtered locations. Tor, as an example, did not appear to offer any exit relays in mainland China when we conducted our experiments.

Without available services such as Tor, we began to investigate common services that allow connections to be bounced in a similar fashion. We are not interested in using these connections for any significant data transfer, and so even fragmentary information can be useful. With this perspective, DNS, IRC, FTP and others all offer potential information sources to learn how remote systems see the internet.

It was shortly after beginning to consider these systems, however, that we became concerned with the ethical issues surrounding this type of research. Ignoring more technical approaches, the simplest way to learn how an individual computer’s connection is filtered is by contacting a remote user and asking them to run censorship detection code themselves. Whilst it can be difficult to scale such an approach, a great deal of information can be gained from each experiment run in this way. Unfortunately, probing censorship on the internet almost inevitably involves deliberately triggering a censorship mechanism, by attempting to access a blocked website, by searching for a banned term, or by transmitting data containing filtered keywords. When these attempts are made through a third party’s connection, potentially without their informed consent, consideration must be given as to the level of risk to which that user is exposed.

The nature of the risk can, however, be more subtle than simply coming to the attention of government censors. The Herdict project, a web-based censorship mapping tool, functions by loading potentially blocked pages as an HTML iframe embedded in their webpage, and users report whether the embedded page is visible or not. This embedded page, which loads on screen without warning and which can involve topics such as sexuality and political or religious expression, could cause anything from minor embarrassment to serious social or legal consequences if an unsuspecting user were observed viewing it in certain cultures.

Without an easy answer to these problems, we have limited ourselves to exploring DNS-level censorship. DNS servers are widespread on the internet, are often open to the general internet, and are public services run, in general, by organizations rather than individuals. This allows us to query for sites we believe to be blocked without exposing individuals to any form of risk. Obtaining a reasonable list of DNS servers in China was simple via a request to APNIC. It would also be simple to scan known Chinese IP blocks for open DNS servers, but we felt this to be unnecessary.

With a reasonable list of several hundred DNS servers, we retrieved a list of known blocked domain names from the Herdict project and automated the process of requesting the domain name to IP mapping from the remote DNS servers, comparing the results to those we could obtain from our own unfiltered connections.
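The comparison step in that process might be sketched as follows. This is an illustrative outline rather than the code used for the paper, and the function names and the three labels are invented here. Note that a mismatch alone is not proof of poisoning, since some sites legitimately return different addresses to different resolvers.

```shell
# ipv4_only: keep only lines that look like IPv4 addresses, dropping CNAME
# targets and dig's ";; connection timed out" diagnostics.
ipv4_only() {
    grep -E '^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$' || true
}

# classify_answer TRUSTED REMOTE
# TRUSTED: whitespace-separated addresses from an unfiltered resolver.
# REMOTE:  whitespace-separated addresses from the server under test.
classify_answer() {
    if [ -z "$2" ]; then echo "no-answer"; return; fi
    for r in $2; do
        for t in $1; do
            # Any overlap with the trusted set counts as a genuine reply.
            if [ "$r" = "$t" ]; then echo "genuine"; return; fi
        done
    done
    echo "mismatch"
}
```

For a single domain and server this could be driven by `classify_answer "$(dig +short "$DOMAIN" A | ipv4_only)" "$(dig +short +time=5 +tries=1 @"$SERVER" "$DOMAIN" A | ipv4_only)"`, where `DOMAIN` and `SERVER` are placeholders for an entry from each list.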

Initial results demonstrate a fair amount of DNS poisoning, with fake results reported by several servers for known blocked sites such as facebook.com, twitter.com and wujie.net (the Chinese domain for the UltraSurf anti-censorship product), as well as many others. In a number of cases, DNS servers simply reported fake IP addresses that, on scanning, did not appear to offer any services. In other cases we observed DNS servers forwarding requests to alternate DNS servers, often located in Beijing, that then returned either fake results or no results at all. A number of servers returned no results at all for well-known blocked sites. Despite this, in a good number of cases we did receive genuine, correct responses from DNS servers.

The most interesting result, at a first glance, is the range of responses from the various servers. All possible behaviours, from genuine responses through faked results to no results at all, were observed. There does not, from initial examination, seem to be an obvious pattern to the distribution of these different result types. This is doubly interesting in light of the various other methods of censorship, such as deep-packet inspection and TCP resets, that are known to be employed in China and which could be expected to make DNS poisoning unnecessary.

We currently have the raw data gathered from across China and are analyzing it for interesting patterns. We will also be re-running the experiment at regular intervals in order to observe how the patterns of blocking change over time. Of current interest is whether there are significant correlations between the types of filtering employed and the geographical or organizational distribution of the servers; those DNS servers that chose to redirect our requests are also a very interesting avenue of enquiry. Of the faked results received, we have already observed that these are often redirected to a small pool of “sink” IP addresses; whether these sinks are consistent across regions or organizations is not known.

There are many interesting questions to be answered from this line of research, and China is by no means the only country worth investigating. A more general point of interest is how to learn which sites to test for filtering. We have relied, to a large extent, on both the Herdict project’s list of sites, gathered through manual reporting from users around the world, and on our own knowledge of blocked sites. Automating this process of detecting filtered sites is certainly a problem worthy of further attention.

While there are serious legal and ethical limitations to researching censorship directly in this way, means to do so are available and allow scope for much interesting work. I look forward to sharing our results in the future.

Paper: Fine-Grained Censorship Mapping: Information Sources, Legality and Ethics

Categories: Censorship, China, Conference Tags:

Experiences of Chinese Internet Censorship

September 12th, 2011 2 comments

I was recently invited to speak at Dalian Technical University, in Liaoning Province in Northern China, and took the opportunity afterwards to spend three weeks travelling around China with my family. (Finally putting several years of studying Mandarin into practice, with a reasonable level of success, and having a fantastic time.)

Being in China, I couldn’t help but poke a little at the limitations imposed on my connection. Travelling with 14-month-old twins is a full-time job, albeit one that I can highly recommend, which did not leave me a great deal of time to analyse connections. I will therefore only report on my personal experiences and impressions, although the data that I did gather will hopefully be useful for a future paper based on work that I presented at FOCI’11. As such, anyone who knows a little about Chinese state-level internet censorship is unlikely to find anything new here.

In my time in China, I ran simple filtering tests on all the Internet connections to which I had access, covering locations in Beijing, Dalian, Shanghai and Hangzhou. I also took the chance to run code to test local nameservers for DNS manipulation when requesting known blocked sites.
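For anyone curious what testing a nameserver involves, the following is a minimal sketch of an A-record lookup sent to one specific server over UDP, using only the Python standard library. It is not the code I actually ran; the function names and the query ID are invented for the example, and it cuts corners that real measurement code should not. In particular, it reads only the first datagram that arrives and does not check that the response ID matches the query, so where responses are injected in transit it may well report the forged one.

```python
import socket
import struct


def build_query(domain, query_id=0x1234):
    """Build a minimal DNS query packet for an A record (RFC 1035)."""
    # Header: ID, flags (recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in domain.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def skip_name(packet, offset):
    """Return the offset just past an encoded, possibly compressed, name."""
    while True:
        length = packet[offset]
        if length == 0:
            return offset + 1
        if (length & 0xC0) == 0xC0:  # a compression pointer is two bytes
            return offset + 2
        offset += 1 + length


def parse_a_records(packet):
    """Extract IPv4 addresses from the answer section of a DNS response."""
    _id, _flags, qdcount, ancount, _ns, _ar = struct.unpack("!HHHHHH", packet[:12])
    offset = 12
    for _ in range(qdcount):
        offset = skip_name(packet, offset) + 4  # skip QTYPE and QCLASS
    addresses = []
    for _ in range(ancount):
        offset = skip_name(packet, offset)
        rtype, _rclass, _ttl, rdlength = struct.unpack(
            "!HHIH", packet[offset:offset + 10]
        )
        offset += 10
        if rtype == 1 and rdlength == 4:  # A record holding an IPv4 address
            addresses.append(socket.inet_ntoa(packet[offset:offset + 4]))
        offset += rdlength
    return addresses


def lookup(domain, nameserver, timeout=3.0):
    """Ask one nameserver for A records; return [] if nothing arrives."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(domain), (nameserver, 53))
        try:
            response, _addr = sock.recvfrom(4096)
        except socket.timeout:
            return []
    return parse_a_records(response)
```

Calling `lookup("example.com", "192.0.2.53")`, where the address is a placeholder for the nameserver under test, returns a list of address strings, or an empty list if the server stays silent. Running the same lookup against a resolver outside the filtered network gives something to compare against.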

The most notable observations from my own experiences were:

  • Secondary effects of blocking
  • Twitter and Facebook are some of the more well-known blocked sites in China. In the course of normal usage, it is simple to avoid such sites. (Chinese users, of course, have a variety of alternatives for Facebook, with Sina Weibo in place of Twitter.)

    What is more noticeable, when browsing normal websites and blogs, is the severe slowdown caused by the inability to load Twitter’s “Tweet this” and Facebook’s “Like this” buttons that are now commonly embedded on blogs and news sites. Firefox is unwilling to render the page until these load or, presumably, time out, which cripples many sites.

    (It’s worth mentioning that all connections to which I had access were relatively slow and unreliable by UK standards, adding to this effect.)

  • Tor blocking
  • Tor is a standard presence on my netbook, despite not being used for everyday browsing. As expected, the comforting green onion on my taskbar faded to a sickly yellow for my entire journey. I didn’t, sadly, have time to experiment with Tor bridges.

  • Kindle is still uncensored
  • One of the amusing censorship stories of this year has been the discovery that Twitter, and apparently all other sites, is not blocked when using the Kindle’s built-in browser. This is caused by the Kindle automatically routing all browsing requests through Amazon servers located outside of China. I had predicted that this would not be blocked in China; the number of Kindle users is too low, and the browser is just not practical for day-to-day use. Combined with the effort required to force Amazon to reroute requests, it never seemed likely that China would clamp down on the Kindle.

    As expected, browsing via the Kindle showed no evidence of blocking whatsoever.

  • DNS manipulation is widespread
  • As part of earlier research I have some very basic code to perform DNS lookups for blocked websites, retrieved from the Herdict Project, against remote nameservers. This was run remotely against a list of Chinese DNS servers to compare relative results in different parts of China.

    Being physically located in China added little to the data that I already have, except to add a number of DNS servers that weren’t in my initial list. A deeper analysis of this data, along with the data captured in my earlier experiments, is forthcoming. The few extra data points from this trip confirm only that DNS manipulation is widespread for blocked sites, alongside any other more sophisticated means to filter content.

    (I will be writing up my FOCI’11 paper here in the very near future, which will go into this work in much more detail.)

  • VPNs have a truly significant positive effect
  • On untrusted networks I use VPN software by default where at all possible, for simple security reasons. In almost every location in China, connection to the Oxford University (Cisco) VPN was possible. Where I could not connect, a poor connection was as likely a cause as anything more sinister.

    More noticeable was that, in trying to achieve anything close to my normal browsing experience, given the sites that I normally visit and the content that they include, using the VPN made a truly significant difference.

    As mentioned above, this was not simply a matter of being able to access Twitter and Facebook, both of which I rarely visit directly; nor was it a matter of my connection being dropped because I happened to type a politically sensitive term into a search engine. Instead, the most interesting aspect of directly experiencing this form of censorship was a subtle and generalised degradation of the internet: unpredictable connections, failed links, and slow loading times. All of these are a result of the interconnectivity of the web, and the assumption that cross-site links are equally available. (Wikipedia being blocked, however, was surprisingly restrictive. One interesting side effect of restrictions on connectivity is that they draw attention to your own browsing habits.)

    In summary, my brief experience of Chinese internet censorship was strikingly different to my expectation. The majority of reports, in my experience, focus on the dramatic blocks of major websites, or on heavy-handed filtering of search results. In practice I was far more struck by the continual, low-level pressure that censorship imposes on normal usage, even though, as a lǎowài, I was largely unaffected by wider social or legal concerns from trying to access blocked sites. Most notably, I was surprised by the level of collateral damage that broad-scale filtering imposes on a wide range of largely unrelated sites.

    While the internet in China is by no means unusable, the restrictions are tangible. The context of my own usage, mainly restricted to English-language websites based in the west, is unlikely to be representative of the experience of a Chinese user. My inability to meaningfully browse and engage in Chinese-language websites prevented me from experiencing the less technical aspects of filtering: self-censorship, pro- and anti-government rhetoric, selective news reporting and others.

    I can say that I was very glad to be back with a nice Clean Feed in the UK.

    Categories: Censorship, China Tags:

    Freedom of Communications on the Internet (FOCI) Workshop

    February 27th, 2011 2 comments

    I’m on my way back from the Workshop on Free and Open Communication on the Internet (FOCI) that was held in the last few days at Georgia Tech in Atlanta. Hosted by Nick Feamster, FOCI brought together a number of computer scientists, activists, lawyers and policy makers to discuss the impact of anti-censorship technologies and to think about future directions from a number of angles.

    It’s always interesting to see experts on the same topic from different fields together, and FOCI was no exception. Despite occasional diversions into policy-speak or tech-talk that left half the room baffled, I came away more impressed with how often we had managed to cross that barrier.

    The technical side of the crowd seemed to have the benefit of more time to present, and so there were thorough discussions on the nature of filtering mechanisms and their technical capabilities as well as details of anti-censorship technologies, particularly Tor. Roger Dingledine gave some interesting, if slightly statistically questionable, numbers regarding Tor usage in various countries during the recent events in the Middle East.

    An estimate from Hal Roberts, based on surveys of activist bloggers, was that 3% of worldwide internet users employed some form of anti-censorship tool, including web-based proxies. Tor’s own estimated usage figures, hampered by the difficulty of monitoring use of an anonymising tool, showed usage ranging from the tens of thousands in Egypt down to tens in Yemen. Within the Tor project, active research is focusing on more effectively calculating real usage data. (See https://metrics.torproject.org if you’re interested.)

    (Tor’s ongoing efforts to bypass filtering and to improve their system of bridges, as well as to improve the performance and security of their network, remain a seemingly endless source of interesting technical challenges.)

    On the legal and policy side it was useful to see the international perspective given substantial time, rather than predicating discussions on the First Amendment and Safe Harbor.

    What the discussion highlighted is that, despite the existence of tools such as Tor and their increasing use, censorship is a complex and multi-faceted issue. Tor has done an excellent job on the technical side in combating censorship at many levels of the stack, and has extended that to user education, social awareness and discussions with policy makers. In general, though, it seems that it is at the social level that both filtering and anti-filtering will begin to move.

    One observation that I’ve heard elsewhere is that “hard” filtering, such as China’s Golden Shield, is being extensively supplemented or replaced with “softer” filtering that aims to drown out dissenting views with waves of government-sponsored information. This can range from sponsored pro-government views, such as China’s 50 Cent Party posting on blogs, to legitimate pro-government sites. Approaching this from a technical angle is relatively ineffective, although technologies such as authentication and private access still have their role. Means to combat the resources of a major player, such as a state or government, in order to level the playing field of online debate will be an important question in the future.

    For me, one of the most important facts to come out of the day is that we need more effective ways of measuring censorship around the world, in terms of methods used, type and extent of filtering and usage of circumvention tools. Existing approaches to measuring censorship require significant human effort, and often report only relatively crude results. Improving and automating the gathering of this information raises some interesting, and very useful, open questions.

    FOCI was a good starting point for interdisciplinary work in this area, and I hope it will lead on to similar events in the future.

    Categories: Censorship, Conference Tags:

    Contentious Connections

    January 6th, 2011 No comments

    I have a comment piece in the Guardian today about network neutrality and BT’s Content Connect service. The online version is here.

    I’ll let the article stand largely by itself, whilst pleading the difficulty of putting the net neutrality debate across in 800 words whilst simultaneously linking in BT’s Content Connect.

    One point I would like to add, for anyone who finds this, is that the term “net neutrality” can be, and often is, very misleading; if you’re new to the subject then “neutrality” almost certainly means something different to what you think it means! Common terms combined with complicated technical subject matter are a recipe for disaster. Tim Wu’s excellent “Network Neutrality FAQ” should be required reading for this subject.

    The Guardian article in full:

    The desire for high-bandwidth internet services, such as internet TV, is placing ever greater demands on the internet’s infrastructure. New technologies are being developed to meet these demands, but companies are increasingly considering new business models. With its Content Connect service, BT has brought itself into conflict with a fundamental design principle of the internet, raising concerns that the drive for profit could lead to changes that will harm consumers and content producers.

    The principle in question is that of net neutrality, which broadly states that data passing over the internet should be treated equally regardless of whose data it is. From a user’s perspective this means that your ISP should not, for example, prioritise Google’s traffic to you over Facebook’s. Net neutrality is the cause of much debate and confusion. It is accepted that prioritising one type of data over another is necessary for the internet to function. An ISP will therefore give preference to voice or streaming video data, as these rely on swift delivery to be useful; however, preferentially treating one content provider’s videos over another is considered unacceptable. Differentiation of service should therefore be made solely for engineering or quality-of-service considerations, and not for commercial exploitation.

    Proponents of net neutrality, such as the UK’s Open Rights Group, argue that, by treating all content providers equally, the internet provides a level playing field that stimulates innovation and competition. If Google could pay to have their content delivered more quickly than Facebook’s they would have a significant advantage, and smaller companies could be squeezed out of the market. This could result in a higher level of market domination by large companies, and in a “tiered” internet in which access to certain content requires extra payment for premium services.

    BT’s Content Connect service is a direct response to a demand for internet TV, and works by reducing the amount of data transferred across the internet by temporarily storing popular content close to end users. From a technical perspective, this is an excellent way to improve content delivery. The controversy is the business model that drives the service. Rather than agnostically storing popular content, such as the latest digital episode of Coronation Street, access is offered by ISPs to content providers such as Google, who must pay to have their content delivered at higher speeds and qualities. This allows those providers that can afford the service a significant advantage over those who cannot, these being relegated to the slower traditional network.

    The US has recently passed legislation supporting net neutrality, although the EU has indicated that it views such laws as unnecessary “at this time”. But is net neutrality, as a principle, necessary or even desirable? Opponents have argued that, given the essentially democratic nature of the internet, market forces should be sufficient to regulate companies. If ISPs choose not to carry certain content then their customers will leave them for more content-rich providers. Indeed, by preventing commercial differentiation of services, opponents argue innovation by companies seeking profit will be stifled. BT itself has claimed to support net neutrality as a principle, but stated that “service providers should also be free to strike commercial deals should content owners want a higher quality or assured service delivery”.

    As the debate continues, there is increasing pressure from companies to maximise profit while meeting the increasing demands of users. We can hope, and must ensure, that the factors driving the development of the internet sustain it as a free and open medium of exchange, and that the drive for profit is not allowed to override this ideal.

    Categories: Article, Net Neutrality Tags: