Content paths are inscrutable or let's say a word about CDN

Disclaimer:
This article contains nothing new for readers already familiar with the concept of a CDN; it is intended as a technology overview.

The first web page appeared in 1990 and was only a few kilobytes in size. Since then, content has grown both qualitatively and quantitatively. The IT ecosystem has developed to the point where modern web pages are measured in megabytes, and the demand for network bandwidth grows every year. How can content providers cover large geographic areas and give users everywhere high-speed access to information? These tasks are handled by content delivery and distribution networks, also known as Content Delivery Networks, or simply CDNs.

There is more and more "heavy" content on the Internet. At the same time, numerous studies show that users do not want to deal with web services that take longer than 4-5 seconds to load. A slow site risks losing its audience, which leads to a drop in traffic, conversion, and ultimately profit. Content Delivery Networks (CDNs), in theory, remove these problems and their consequences. But in reality, as usual, everything is decided by the details and nuances of a particular case, of which there are plenty in this area.

Where did the idea of distributed networks come from?

Let's start with a brief excursion into history and definitions. A CDN is a network of servers located in different places that provides a large number of users with access to Internet content. The idea behind distributed networks is to have several points of presence (PoPs) outside the source server. Such a system processes incoming requests faster, improving response time and the speed of data transfer.

The problem of delivering content to users became acute during the Internet boom of the mid-90s. The servers of the time, less powerful than today's flagship laptops, could hardly withstand the load and could not cope with the ever-increasing traffic. Microsoft spent hundreds of millions of dollars every year on research related to the information highway (just think of the famous 640 KB remark attributed to Bill Gates). Solving these issues required hierarchical caching, a switch from modems to fiber optics, and detailed analysis of network topology. The situation was reminiscent of an old locomotive that rushes along the rails while being modernized by every possible means to increase its speed along the way.

By the late 90s, the owners of web portals had realized that intermediary servers were needed to reduce the load and serve the required volume of requests. This is how the first CDNs appeared, distributing static content from servers geographically scattered around the world. At about the same time, the business of distributed networks emerged. Akamai, one of the largest CDN providers in the world, has been a pioneer in this field since 1998. A couple of years later, CDNs became mainstream, and revenues from content delivery and distribution amounted to tens of millions of dollars every month.

Today, we encounter a CDN every time we visit a high-traffic commercial page or use a social network. The service is provided by Amazon, Cloudflare, Akamai, and many other transnational providers. Moreover, large companies tend to run their own CDNs, which gives them a number of advantages in the speed and quality of content delivery. If Facebook had no distributed network and relied only on an origin server located in the US, users in Eastern Europe might wait much longer for their profiles to load.

A few words about CDN and streaming

The FutureSource Consulting agency analyzed the music industry and concluded that in 2023 the number of subscriptions to music streaming services will reach almost half a billion. Moreover, services will receive more than 90% of their income from streaming audio. The situation with video is similar: terms such as let's play, online concert, and online cinema have already become established in the popular lexicon. Apple, Google, YouTube, and many other companies have their own streaming services.

Early on, CDNs were used primarily for sites with static content. Static content is information that does not change depending on user actions, time, or other factors, i.e. it is not personalized. But the development of streaming video and audio services has added another common scenario for distributed networks. Intermediary servers located close to the target audience around the world provide stable access to content during peak periods and eliminate Internet bottlenecks.

How it works

The essence of all CDNs is approximately the same: use intermediaries to deliver content to the end consumer faster. It works as follows: the user sends a request to download a file; the request is received by a CDN server, which contacts the origin server once and returns the content to the user. At the same time, the CDN caches the file for a given period and serves all subsequent requests from its own cache. Optionally, CDN servers can also preload files from the source server, adjust cache expiration, compress heavy files, and more. In the ideal case, the host hands the entire flow over to the CDN node, which then uses its own resources to deliver content to users. It goes without saying that effective caching, together with distributing requests across a network rather than a single server, leads to a more balanced traffic load.
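
The request flow described above can be sketched in a few lines of Python. This is a toy model of a single CDN node, not any provider's implementation: the class name `EdgeCache`, the `fetch_from_origin` callback, and the fixed TTL are assumptions made for illustration.

```python
import time
from typing import Callable, Dict, Tuple


class EdgeCache:
    """Toy model of one CDN node that caches origin responses for a fixed TTL."""

    def __init__(
        self,
        fetch_from_origin: Callable[[str], bytes],  # hypothetical origin client
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch_from_origin
        self._ttl = ttl_seconds
        self._clock = clock
        # path -> (expiry timestamp, cached body)
        self._store: Dict[str, Tuple[float, bytes]] = {}
        self.origin_hits = 0  # how many times the origin was actually contacted

    def get(self, path: str) -> bytes:
        """Serve from cache while the entry is fresh, otherwise go to the origin."""
        now = self._clock()
        entry = self._store.get(path)
        if entry is not None and entry[0] > now:
            return entry[1]
        body = self._fetch(path)
        self.origin_hits += 1
        self._store[path] = (now + self._ttl, body)
        return body

    def preload(self, path: str) -> None:
        """Warm the cache before any user asks for the file."""
        self.get(path)


if __name__ == "__main__":
    # Stand-in for a real HTTP request to the origin server.
    def origin(path: str) -> bytes:
        return ("content of " + path).encode()

    edge = EdgeCache(origin, ttl_seconds=60)
    for _ in range(1000):
        edge.get("/static/app.js")
    print("requests served: 1000, origin contacted:", edge.origin_hits)
```

In this model a thousand user requests for the same file cost the origin a single request per TTL window, which is the offloading effect the paragraph describes.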

The second important feature of a CDN is the reduction of data transfer delays, usually measured as RTT (round-trip time). Establishing a TCP connection, starting a TLS session, downloading media content or a JS file: all of it depends on the ping. Obviously, the closer you are to the source, the faster you can get a response from it. After all, even the speed of light has its limit: about 200,000 km/s in optical fiber. This means that from Moscow to Washington the RTT will be about 75 ms, and that is before any intermediate equipment adds its own delay.
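
The 75 ms figure can be checked with simple arithmetic. The sketch below assumes a propagation speed of about 200,000 km/s in fiber and a cable path of roughly 7,500 km between Moscow and Washington; the path length is a round number picked for illustration, and real routes are longer than the straight-line distance.

```python
FIBER_SPEED_KM_PER_S = 200_000.0  # assumed speed of light in optical fiber


def min_rtt_ms(path_km: float, speed_km_per_s: float = FIBER_SPEED_KM_PER_S) -> float:
    """Lower bound on round-trip time: the signal covers the path twice."""
    return 2.0 * path_km * 1000.0 / speed_km_per_s


if __name__ == "__main__":
    # 7,500 km is an illustrative assumption, not a measured cable length.
    print(f"Moscow - Washington: at least {min_rtt_ms(7_500):.0f} ms RTT")
    # A point of presence 300 km away (hypothetical) has a far lower bound.
    print(f"Nearby PoP:          at least {min_rtt_ms(300):.0f} ms RTT")
```

This is only a physical lower bound: queuing, routing detours, and equipment along the way add to it.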

To better understand what tasks content distribution networks solve, here is a list of solutions in use today:

  • Google, Yandex, MaxCDN (free CDNs for distributing JS libraries, with more than 90 points of presence in most countries of the world);
  • Cloudinary, Cloudimage, Google (client-side optimization services and libraries: images, videos, fonts, etc.);
  • Jetpack, Incapsula, Swarmify, etc. (optimization of resources in content management systems: Bitrix, WordPress, etc.);
  • CDNVideo, StackPath, NGENIX, Megafon (CDNs for distributing static content, used as general-purpose networks);
  • Imperva, Cloudflare (solutions to speed up website loading).

The first three CDN types in the list above are designed to take over only part of the traffic from the main server. The remaining two act as full proxy servers that carry all of the traffic from the source host.

To whom and what benefits does the technology provide?

In theory, any site that sells products or services to corporate clients or individuals (B2B or B2C) can benefit from a CDN. What matters is that its target audience, i.e. the user base, is located outside the site's own geographic region. But even if this is not the case, distribution networks will help with load balancing for large volumes of content.

It's no secret that a couple of thousand simultaneous streams are enough to fill a server's channel. Broadcasting video to a wide audience will therefore inevitably create a bottleneck: the bandwidth of the Internet channel. We see the same thing when a site carries many small images that have not been combined into sprites (product previews, for example). The origin server uses one TCP connection to process any number of requests, so the downloads queue up. Adding a CDN lets you spread requests across several domains and several TCP connections, offloading the channel. Even in the saddest cases, the round-trip formula gives a value of 6-7 RTT and takes the form TCP + TLS + DNS. It is also fair to include here the delays associated with waking up the radio channel on a mobile device and transmitting the signal through cell towers.
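
To see why the number of round trips matters, it helps to multiply it by the RTT. The sketch below takes the 6-7 RTT figure from the paragraph above at face value; the two RTT values (80 ms to a distant origin, 15 ms to a nearby CDN node) are hypothetical and serve only to show the scale of the difference.

```python
def setup_delay_ms(rtt_ms: float, round_trips: int) -> float:
    """Delay before content starts flowing if each setup step costs one full RTT."""
    return rtt_ms * round_trips


if __name__ == "__main__":
    # Both RTT values are illustrative assumptions, not measurements.
    for label, rtt in (("distant origin", 80.0), ("nearby CDN node", 15.0)):
        low = setup_delay_ms(rtt, 6)
        high = setup_delay_ms(rtt, 7)
        print(f"{label}: {low:.0f}-{high:.0f} ms spent on DNS + TCP + TLS")
```

Under these assumptions the same handshake sequence costs roughly half a second against the distant origin and about a tenth of a second against the nearby node.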

Summarizing the strengths of technology for business on the Internet, experts highlight the following points:

  1. Rapid infrastructure scaling + bandwidth reduction. More servers = more points where information is stored. As a result, each point processes less traffic per unit of time, which means it can get by with less bandwidth. Optimization tools also come into play, making it possible to cope with peak loads without delays.
  2. Lower ping. We have already mentioned that people do not like to wait long on the Internet, so high ping contributes to high bounce rates. The delay can be caused by slow data processing on the server, old equipment, or simply a poorly designed network topology. Content distribution networks partly solve most of these problems. It is important to note, however, that the real benefit of the technology becomes visible only when the "consumer ping" exceeds 80-90 ms, which roughly corresponds to the distance from Moscow to New York.

  3. Data security. DDoS (distributed denial-of-service) attacks aim to bring a server down for someone's benefit. A single server is much more vulnerable to such attacks than a distributed network (taking down the infrastructure of a giant like Cloudflare is no easy task). With filtering and sensible distribution of requests across the network, artificially created obstacles for legitimate traffic can be prevented.
  4. Rapid content distribution and additional service functions. Distributing large amounts of information across a server network lets you get an offer to the end consumer quickly. You don't need to look far for examples: just remember Amazon and AliExpress.
  5. The ability to "mask" problems with the main site. There is no need to wait for DNS to update: the site can be moved to a new location while previously cached content keeps being served. This in turn can improve fault tolerance.

We have covered the benefits. Now let's look at the niches where the technology pays off.

Advertising business

Advertising is the engine of progress. And so that the engine does not burn out, it must be loaded in moderation. The advertising business, trying to keep up with the modern digital world, is running into the problem of "heavy content". Heavy here means multimedia advertising (mostly animated banners and videos) that requires high network bandwidth. A website with multimedia takes a long time to load and can freeze, testing the nerves of users. Most users leave such resources before all the available information has even loaded. Advertising companies can use CDNs to solve these problems.

Sales

E-commerce needs constant expansion of its geographic coverage. Another important point is competition, of which there is plenty in every market segment. If a website does not meet users' requirements (for instance, it takes a long time to load), it will not be popular and will not deliver consistently high conversion. A CDN shows its advantage when handling requests for data from different locations. Distributing traffic also helps smooth out traffic spikes and prevent the server failures that follow them.

Entertainment content sites

All sorts of entertainment platforms fit here, from sites for downloading movies and games to video streaming services. Although the technology is designed for static content, streaming data can also reach the user faster through relay servers. Again, CDN caching is a lifesaver for owners of large media storage portals.

Online Games

Online games deserve a section of their own. If advertising needs more bandwidth, online gaming projects are even more resource-demanding. Providers face a two-sided problem: the speed of access to servers and high gaming performance with beautiful graphics. A CDN for online games offers so-called "push zones", where developers can store games on servers located near users. This reduces the impact of access speed to the source server and therefore provides comfortable gameplay everywhere.

Why CDN is not a panacea

Despite the obvious advantages, not every business is eager to adopt the technology, and not in every situation. Why is that? Paradoxically, some of the disadvantages follow from the advantages, and a couple more are tied to deploying the network. Marketers will speak eloquently about all the advantages of the technology while forgetting to mention that they lose their meaning under a wide range of conditions. Looking at the disadvantages of CDNs in more detail, the following are worth highlighting:

  • Works only with static content. Yes, most modern sites have a low percentage of dynamic content. But where pages are personalized, a CDN cannot help (other than by taking a large amount of traffic off the origin);
  • Caching delay. Optimization is one of the main advantages of distribution networks. But when a change is made on the origin server, it takes time before the CDN re-caches it on all of its servers;
  • Mass blocking. If the CDN's IP address is banned for any reason, all sites hosted on it become unavailable;
  • In most cases, the browser will make two connections (to the origin server and to the CDN), and these are additional milliseconds of waiting;
  • The IP address stays associated with projects (including defunct ones) that were previously assigned to it. As a result, ranking by Google's search bots becomes more complicated, and it is harder to bring the site to the top during SEO promotion;
  • A CDN node is a potential point of failure. If you use CDNs, it is important to understand in advance how the system's routing works and what errors may occur while working with the site;
  • It is trite, but you have to pay for content delivery services. Costs are mostly proportional to traffic volume, so monitoring may be required to plan the budget.

An important fact: even the proximity of a CDN to the user does not guarantee low ping. A client's route may lead to a host located in another country or even on another continent. This depends on the routing policy of the particular network and its relationships with telecom operators (peering). Many large CDN providers offer several pricing plans, where the price directly affects how close the point of presence is to the target users.

If you have the resources, launch your own CDN

Dissatisfied with the policies of companies that provide content distribution services, but the business needs to expand? If possible, why not try launching your own CDN? This makes sense in the following cases:

  • The current costs of content distribution do not meet expectations and are not economically justified;
  • You need a permanent cache, without sharing the server and channel with other sites;
  • The target audience is located in a region where the CDNs available to you have no points of presence;
  • You need to personalize settings for content delivery;
  • You need to speed up the delivery of dynamic content;
  • You suspect third-party services of violating user privacy or of other illegal actions.

Running a CDN requires a domain name, multiple servers in different regions (virtual or dedicated), and a request processing tool. Do not forget about installing SSL certificates, configuring the software that serves static content (Nginx or Apache), and monitoring the entire system effectively.
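
One possible reading of a "request processing tool" is a component that decides which of your servers should answer a given client. The sketch below picks the geographically nearest point of presence by great-circle distance. It is only an illustration under stated assumptions: the PoP names and coordinates are hypothetical, the client's location is assumed to be already known (for example from a GeoIP lookup, which is not shown), and, as noted above, geographic proximity does not guarantee the lowest ping.

```python
import math
from typing import Dict, Tuple

Coordinates = Tuple[float, float]  # (latitude, longitude) in degrees

# Hypothetical points of presence; replace with your own server locations.
POPS: Dict[str, Coordinates] = {
    "pop-moscow": (55.76, 37.62),
    "pop-frankfurt": (50.11, 8.68),
    "pop-new-york": (40.71, -74.01),
}

EARTH_RADIUS_KM = 6371.0


def haversine_km(a: Coordinates, b: Coordinates) -> float:
    """Great-circle distance between two points on the Earth's surface."""
    lat1, lon1 = math.radians(a[0]), math.radians(a[1])
    lat2, lon2 = math.radians(b[0]), math.radians(b[1])
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def nearest_pop(client: Coordinates, pops: Dict[str, Coordinates] = POPS) -> str:
    """Return the name of the PoP with the smallest distance to the client."""
    return min(pops, key=lambda name: haversine_km(client, pops[name]))


if __name__ == "__main__":
    saint_petersburg = (59.94, 30.31)
    print("Client in Saint Petersburg ->", nearest_pop(saint_petersburg))
```

In practice this decision is often made at the DNS or routing level rather than in application code, and measured latency is a better criterion than distance when it is available.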

Configuring caching proxies correctly is the subject of a separate article, so we will not describe here in detail which parameters to set and where. Given the start-up costs and the time needed to deploy the network, ready-made solutions may be the more promising option. But the decision should be guided by the current situation and by planning several steps ahead.

In conclusion

A CDN is a set of additional capacities for relaying your traffic to the masses. Is it necessary for doing business on the Internet? Yes and no: it all depends on the audience the content is intended for and the goals the business owner pursues.

Regional and highly specialized projects will get more disadvantages than advantages from implementing a CDN. Requests will still reach the source server first, only now through an intermediary. Hence a dubious reduction in ping but quite definite monthly costs for the service. If you have good network equipment, you can improve your existing information security measures yourself, place your servers closer to users, and keep the resulting optimizations and profits without paying a recurring fee.

Those who really should think about intermediary servers are large companies whose infrastructure cannot cope with ever-growing traffic. A CDN proves itself as a technology that lets you quickly deploy a network across a wide geography of users, provide comfortable cloud gaming, or sell goods on a large commercial platform.

But even with a geographically wide audience, it is important to understand in advance what exactly a content distribution network is needed for. Website acceleration remains a complex task that cannot be magically solved by implementing a CDN. Do not forget about such important aspects as cross-platform support, responsiveness, and optimization of the server side, code, rendering, etc. A preliminary technical audit and adequate corrective action are still the best solution for any online project, regardless of its focus and scale.

As advertising

Right now you can order powerful servers that use the latest AMD EPYC processors. Flexible plans range from 1 CPU core to an insane 128 CPU cores, 512 GB RAM, and 4000 GB NVMe.

Source: habr.com
