CacheBrowser Experiment: Bypassing the Chinese Firewall Without a Proxy Using Content Caching

Today, a significant share of all content on the Internet is distributed through CDNs. At the same time, there is little research on how various censors extend their influence to such networks. Scientists at the University of Massachusetts analyzed possible methods for blocking CDN content, using the practices of the Chinese authorities as an example, and also developed a tool to bypass such blocking.

We have prepared an overview of the main conclusions and results of this experiment.

Introduction

Censorship is a global threat to freedom of speech on the Internet and to free access to information. It is possible largely because the Internet borrowed its end-to-end communication model from the telephone networks of the 1970s. This makes it possible to block access to content or user communication without much effort or cost, simply based on the IP address. The methods range from blocking the address that hosts the prohibited content to preventing users from even learning that address through DNS manipulation.

However, the development of the Internet has led to new ways of disseminating information. One of them is the use of cached content to improve performance and speed up communication. Today, CDN providers handle a significant share of the world's traffic: Akamai alone, the leader in this segment, accounts for up to 30% of global static web traffic.

A CDN is a distributed system for delivering Internet content as quickly as possible. A typical CDN consists of servers in various geographical locations that cache content and serve it to the users closest to each server. This significantly increases the speed of online communication.

In addition to improving the quality of service for end users, CDN hosting helps content creators scale their projects by reducing the load on infrastructure.

Censoring CDN content

Despite the fact that CDN traffic already makes up a significant proportion of all information transmitted over the Internet, there is still almost no research on how censors in the real world go about controlling it.

The authors of the study started by researching censoring techniques that can be applied to CDNs. Then they studied the real mechanisms used by the Chinese authorities.

First, let's talk about the possible methods of censoring and the possibility of using them to control the CDN.

Filtering by IP

This is the simplest and most inexpensive technique for censoring the Internet. Using this approach, the censor determines and blacklists the IP addresses of resources hosting prohibited content. Controlled ISPs then stop delivering packets sent to such addresses.

IP-based blocking is one of the most common methods of censoring the Internet. Most commercial network devices include features for implementing such blocking without significant computational cost.

However, this method is not very suitable for blocking CDN traffic due to some properties of the technology itself:

  • Distributed caching – To ensure the best availability and performance, CDNs cache content on a large number of edge servers in geographically dispersed locations. To filter such content by IP, the censor would need to find the addresses of all edge servers and blacklist them. This undermines the main advantage of the method: in the usual scheme, blocking a single server cuts off access to prohibited content for a large number of people at once.
  • Shared IP – Commercial CDN providers share their infrastructure (i.e. edge servers, mapping system, etc.) among many clients. Banned CDN content is therefore served from the same IP addresses as non-banned content, so any attempt at IP filtering also blocks a huge number of sites and content that are of no interest to the censors.
  • Highly dynamic IP assignment – To optimize load balancing and improve the quality of service, the mapping between edge servers and end users changes quickly and dynamically. For example, Akamai updates the returned IP addresses every minute. This makes it almost impossible to associate addresses with prohibited content.
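
The shared-IP problem can be illustrated with a small Python sketch. It is only a toy model: the site names, the edge addresses (taken from documentation-only ranges), and the `EDGE_MAP` table are assumptions made for illustration, not data from the study.

```python
# Hypothetical snapshot of which sites a CDN serves from which edge IPs.
# All names and addresses are illustrative (documentation-only ranges).
EDGE_MAP = {
    "banned.example": {"192.0.2.10", "192.0.2.11"},
    "news.example": {"192.0.2.10", "198.51.100.7"},
    "shop.example": {"192.0.2.11"},
}


def blacklist_for(banned_sites, edge_map):
    """Collect every edge IP that serves at least one banned site."""
    ips = set()
    for site in banned_sites:
        ips |= edge_map.get(site, set())
    return ips


def collateral_damage(banned_sites, edge_map):
    """Return permitted sites that lose at least one edge IP to the blacklist."""
    blocked_ips = blacklist_for(banned_sites, edge_map)
    return sorted(
        site
        for site, ips in edge_map.items()
        if site not in banned_sites and ips & blocked_ips
    )


if __name__ == "__main__":
    print(collateral_damage({"banned.example"}, EDGE_MAP))
```

In this toy table, blacklisting the two addresses that serve the banned site also affects both permitted sites, which is the collateral damage described in the list above.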

DNS interference

In addition to IP filtering, another popular form of censorship is DNS interference. Here the censor's goal is to prevent users from learning the IP addresses of resources with prohibited content at all. That is, the intervention happens at the level of domain name resolution. There are several ways to do this, including hijacking DNS connections, DNS poisoning, and blocking DNS requests for banned sites.

This is a very effective blocking method, but it can be bypassed by using non-standard DNS resolution methods, such as out-of-band channels. Therefore, censors usually combine DNS blocking with IP filtering. But as mentioned above, IP filtering is not effective for censoring CDN content.
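
As a rough illustration of why out-of-band resolution defeats DNS interference on its own, here is a toy Python model. The record table, the censored-name list, and the bogus address are all made up; no real DNS traffic is involved.

```python
# Toy model: names, addresses, and the "poisoned" answer are all made up.
REAL_RECORDS = {
    "blocked.example": "203.0.113.5",
    "allowed.example": "203.0.113.9",
}
CENSORED_NAMES = {"blocked.example"}
BOGUS_ADDRESS = "192.0.2.1"


def censored_resolve(name):
    """Resolver under the censor's control: lies about censored names."""
    if name in CENSORED_NAMES:
        return BOGUS_ADDRESS
    return REAL_RECORDS[name]


def resolve(name, out_of_band=None):
    """Prefer an address learned through an out-of-band channel, if any."""
    if out_of_band and name in out_of_band:
        return out_of_band[name]
    return censored_resolve(name)
```

With no out-of-band table, `resolve("blocked.example")` yields the bogus address; once the user has learned the real address through some other channel, the poisoned answer is never consulted. This is why DNS blocking is usually paired with IP filtering.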

URL/Keyword filtering with DPI

Modern network activity monitoring equipment can be used to analyze specific URLs and keywords in transmitted data packets. This technology is called DPI (deep packet inspection). Such systems find mentions of prohibited words and resources, after which they interfere with online communication. As a result, the packets are simply dropped.

This method is effective, but more complex and resource-intensive, since it requires reassembling all the data packets sent within certain streams.

CDN content can be protected from such filtering in the same way as "regular" content - in both cases, the use of encryption (i.e. HTTPS) helps.
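
A minimal sketch of keyword matching, and of why encryption defeats it, might look like the following. The keyword list and the request are hypothetical, and `toy_encrypt` is a single-byte XOR used purely as a stand-in for TLS; it is not real encryption.

```python
BANNED_KEYWORDS = [b"banned-page"]  # hypothetical keyword list


def dpi_allows(payload):
    """Return False (drop) if any banned keyword appears in the payload."""
    return not any(keyword in payload for keyword in BANNED_KEYWORDS)


def toy_encrypt(data, key=0x5A):
    """Stand-in for TLS: single-byte XOR. NOT real encryption, demo only."""
    return bytes(b ^ key for b in data)


# An illustrative plaintext HTTP request that mentions a banned keyword.
plain = b"GET /banned-page.html HTTP/1.1\r\nHost: news.example\r\n\r\n"

if __name__ == "__main__":
    print(dpi_allows(plain))               # plaintext: keyword is visible
    print(dpi_allows(toy_encrypt(plain)))  # obscured: keyword is not visible
```

The filter can only match what it can read: once the payload is no longer plaintext, the keyword scan finds nothing.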

In addition to searching for keywords or banned URLs, DPI tools can be used for more advanced analysis. Such methods include online or offline statistical traffic analysis and protocol identification. These methods are extremely resource-intensive, and at the moment there is no evidence that censors use them on any serious scale.

Self-censorship of CDN providers

If the censor is the state, it has every opportunity to ban from the country those CDN providers that do not obey local laws governing access to content. There is no way to resist this kind of self-censorship: if a CDN provider wants to operate in a country, it is forced to comply with local laws, even if they restrict freedom of speech.

How China censors CDN content

The Great Firewall of China is, with good reason, considered the most effective and advanced system for enforcing Internet censorship.

Research methodology

The scientists performed their experiments using a Linux node located inside China. They also had access to several computers outside the country. First, the researchers checked that the node was subject to the same censorship as other Chinese users by trying to open various prohibited sites from this machine. This confirmed that the node experienced the same level of censorship.

The list of CDN sites blocked in China was taken from GreatFire.org. Then, the method of blocking was analyzed in each case.

According to open data, the only major player in the CDN market with its own infrastructure in China is Akamai. Other providers included in the study include CloudFlare, Amazon CloudFront, EdgeCast, Fastly, and SoftLayer.

During the experiments, the researchers found the addresses of Akamai edge servers within the country and then tried to retrieve cached content through them. Prohibited content could not be accessed (an HTTP 403 Forbidden error was returned). Apparently, the company self-censors in order to keep its ability to operate in the country. At the same time, access to these resources remained open outside the country.

Providers without infrastructure in China do not use self-censorship for local users.

For the other providers, DNS filtering turned out to be the most commonly used blocking method: requests for blocked sites are resolved to incorrect IP addresses. At the same time, the firewall does not block the CDN edge servers themselves, since they store both forbidden and allowed information.

With unencrypted traffic, the authorities can block individual pages of a site using DPI. With HTTPS, they can only deny access to the entire domain, which also blocks permitted content.

In addition, China has its own CDN providers, such as ChinaCache, ChinaNetCenter, and CDNetworks. All these companies fully comply with the laws of the country and block prohibited content.

CacheBrowser: Block Bypass Tool Using CDN

As the analysis showed, it is quite difficult for censors to block CDN content. Therefore, the researchers decided to go further and develop an online blocking bypass tool that would not use proxy technology.

The main idea of the tool is that censors have to interfere with DNS to block CDNs, but downloading CDN content does not actually require domain name resolution. The user can therefore get the content they need by contacting directly the edge server on which it is already cached.

The diagram below shows the structure of the system.

[Diagram: structure of the CacheBrowser system]

Client software is installed on the user's computer, and a regular browser is used to access the content.

When a URL or a piece of content is requested, the browser sends a query to the local DNS system (LocalDNS) to obtain the IP address of the host. Regular DNS is queried only for domains that are not already in the LocalDNS database. The Scraper module continuously goes through the requested URLs and looks for potentially blocked domain names among them. The Scraper then calls the Resolver module to resolve the newly discovered blocked domains; the Resolver performs the resolution and adds an entry to LocalDNS. The browser's DNS cache is then cleared to remove any existing DNS records for the blocked domain.

If the Resolver module cannot figure out which CDN provider the domain belongs to, then it will ask the Bootstrapper module for help.
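
The interaction between these modules can be sketched roughly as follows. This is not the authors' code: the in-memory tables, the `BLOCKED_DOMAINS` set, and all function names are hypothetical, and the Bootstrapper is reduced to a placeholder that never finds anything.

```python
from urllib.parse import urlsplit

# Illustrative stand-ins for the tool's lookup tables; every entry is made up.
CUSTOMER_TO_CDN = {"blocked.example": "cdn-a"}
CDN_TO_IP = {"cdn-a": ["192.0.2.10", "192.0.2.11"]}
BLOCKED_DOMAINS = {"blocked.example", "unknown.example"}


def bootstrapper(domain):
    """Placeholder for an external lookup of a domain's CDN; returns None here."""
    return None


def resolver(domain):
    """Map a blocked domain to an edge IP via its CDN, or None if unknown."""
    cdn = CUSTOMER_TO_CDN.get(domain) or bootstrapper(domain)
    edges = CDN_TO_IP.get(cdn, [])
    return edges[0] if edges else None


def scraper(requested_urls, local_dns):
    """Find blocked domains among requested URLs and fill LocalDNS for them."""
    for url in requested_urls:
        domain = urlsplit(url).hostname
        if domain in BLOCKED_DOMAINS and domain not in local_dns:
            ip = resolver(domain)
            if ip is not None:
                local_dns[domain] = ip
    return local_dns
```

In this sketch, a blocked domain with a known CDN ends up in `local_dns` with an edge address, while a blocked domain whose CDN cannot be determined is simply left unresolved.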

How it works in practice

The client software was implemented for Linux, but it can easily be ported to Windows as well. The default browser is Mozilla Firefox. The Scraper and Resolver modules are written in Python, and the Customer-to-CDN and CDN-to-IP databases are stored in .txt files. The LocalDNS database is the usual Linux /etc/hosts file.
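
Since LocalDNS is an ordinary hosts file, its entries are plain "address, whitespace, hostname" lines. The helpers below are a hedged sketch of formatting and parsing such lines; the function names are hypothetical, and the code deliberately works on strings rather than touching the real /etc/hosts.

```python
def format_hosts_entry(ip, domain):
    """Render one hosts-file line: address, whitespace, hostname."""
    return f"{ip}\t{domain}"


def parse_hosts(text):
    """Parse hosts-file text into {hostname: ip}, skipping comments and blanks."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            table.setdefault(name, ip)  # keep the first mapping seen for a name
    return table


if __name__ == "__main__":
    sample = "127.0.0.1 localhost\n" + format_hosts_entry("192.0.2.10", "blocked.example")
    print(parse_hosts(sample))
```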

As a result, for a blocked URL on blocked.com, the script takes the IP address of the edge server from the /etc/hosts file and sends an HTTP GET request for BlockedURL.html with the following HTTP header fields:

Host: blocked.com
User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:14.0) Gecko/20100101 Firefox/14.0.1
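
The request described above can be sketched in Python with the standard socket module. This is an illustration under assumptions, not the tool's actual code: the function names, the added `Connection: close` header, and the use of plain HTTP on port 80 are choices made for this example.

```python
import socket

USER_AGENT = "Mozilla/5.0 (Windows NT 5.1; rv:14.0) Gecko/20100101 Firefox/14.0.1"


def build_request(host, path, user_agent=USER_AGENT):
    """Build a raw HTTP/1.1 GET whose Host header names the blocked site."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        f"User-Agent: {user_agent}",
        "Connection: close",  # assumption: ask the server to close when done
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def fetch_from_edge(edge_ip, host, path, port=80, timeout=10.0):
    """Send the request straight to edge_ip; no DNS lookup for host is made."""
    with socket.create_connection((edge_ip, port), timeout=timeout) as sock:
        sock.sendall(build_request(host, path))
        chunks = []
        while chunk := sock.recv(4096):
            chunks.append(chunk)
    return b"".join(chunks)
```

A call such as `fetch_from_edge(edge_ip, "blocked.com", "/BlockedURL.html")` would connect to whatever edge address was read from the hosts file. The key point is that the connection goes to the IP address, while the Host header tells the edge server which customer's content is wanted.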

The Bootstrapper module is implemented using the free tool digwebinterface.com. This DNS resolver cannot be blocked and answers DNS queries on behalf of many geographically distributed DNS servers in different network regions.

Using this tool, the researchers were able to access Facebook from their Chinese node - although the social network has long been blocked in China.

Conclusion

The experiment showed that the difficulties censors face when trying to block CDN content can be exploited to build a system for bypassing blocks. Such a tool makes it possible to bypass blocking even in China, which operates one of the most powerful online censorship systems.

Source: habr.com
