Alibaba’s DragonFly – A New P2P File Distribution Framework


Most of us have heard of peer-to-peer (P2P) file sharing protocols, BitTorrent being the most popular. In late 2017, Alibaba open-sourced a file distribution framework built on the same idea. Alibaba’s DragonFly is aimed specifically at making container images easy to distribute at scale using peer-to-peer file sharing.

What Is A Container?

A container image is a software package that includes everything the software needs to run: code, packages, dependencies, and so on. Because of this, the software does not rely on libraries installed in its computing environment, which makes it easy for developers to package, run, test, and deploy code. This is useful, for example, when software is developed in one environment and deployed in another, since it allows seamless movement between development, testing, and deployment. Everything the software needs travels inside the container with it, so you do not have to install extra libraries or dependencies wherever it runs. Containers have become massively popular, with more and more companies relying on them to deploy software. Popular tools in this space include Docker, Kubernetes (a container orchestration system originally developed at Google), and Alibaba’s own Pouch.

In practice, however, the same container image often has to be downloaded to a large number of computing environments. Even if each image is small, the sheer number of environments requesting it makes the downloads hard to serve from one place, so mass distribution of container images becomes a challenge. This is where DragonFly comes in.

Alibaba’s DragonFly: P2P File Distribution For Containers

DragonFly is a general-purpose file distribution system that solves the problem of mass-distributing container images by using peer-to-peer sharing instead of having every client download from a single server. In essence, the computing environment downloading a file gets parts of it from ‘peers’, that is, other computing environments that already have those parts. This puts the bandwidth of every peer to use, so downloads are faster because you no longer rely on a single source. The result is a dramatic improvement in download efficiency.

In a network where a centralized server provides the files for download, a failure of that server affects every client connected to it. In other words, if you download a file from a single source, anything that affects that source also affects your download. That is not the case in a P2P network. In a P2P framework such as Alibaba’s DragonFly, the parts of a file are spread across many peers, so the failure of one peer does not affect the others. This results in fewer failed or inconsistent downloads.

DragonFly offers native support for Docker, the most popular container technology, and for Pouch, Alibaba’s own container technology. It also supports many other container technologies with minimal modifications. It uses a CDN (Content Delivery Network) mechanism to make sure that the same content is not downloaded repeatedly. It also makes sure that downloaded files are consistent, without requiring the user to supply any check code (such as a checksum).
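The article does not say how DragonFly performs its consistency check, so the following is only a minimal sketch of the general idea, assuming a file travels as a series of pieces and the distributor publishes a SHA-256 digest for each one. The function names `piece_digest` and `verify_pieces` are hypothetical and are not part of DragonFly.

```python
import hashlib


def piece_digest(piece: bytes) -> str:
    """Return the SHA-256 hex digest of one piece of a file."""
    return hashlib.sha256(piece).hexdigest()


def verify_pieces(pieces: list[bytes], expected_digests: list[str]) -> list[int]:
    """Return the indices of pieces whose digest does not match.

    An empty list means every piece arrived intact. The expected digests
    are assumed to come from the distributor, not from the user.
    """
    if len(pieces) != len(expected_digests):
        raise ValueError("piece count does not match digest count")
    return [
        index
        for index, (piece, expected) in enumerate(zip(pieces, expected_digests))
        if piece_digest(piece) != expected
    ]
```

A client built this way would re-request only the pieces whose indices are returned, rather than downloading the whole file again.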

DragonFly works by using a cluster manager (also called a super-node) to schedule the downloads from each peer. The file to be distributed is divided into multiple parts, which are transmitted among the peers. For each part, the cluster manager decides whether it needs to be downloaded at all by checking whether that part is already present in the computing environment that requested the file. Multiple parts are then downloaded simultaneously and put back together at the destination. The cluster manager also schedules the downloads of the parts so that the whole process is optimized.
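The steps above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not DragonFly’s implementation: peers are plain dictionaries mapping a part index to its bytes, the "cluster manager" is a single function, and the least-loaded-peer rule is just one simple scheduling policy chosen for the example. All names (`split_into_pieces`, `schedule`, `download`) are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

PIECE_SIZE = 4  # bytes; tiny on purpose so the example is easy to follow


def split_into_pieces(data: bytes, piece_size: int = PIECE_SIZE) -> list[bytes]:
    """Cut a file into fixed-size parts (the last one may be shorter)."""
    return [data[i:i + piece_size] for i in range(0, len(data), piece_size)]


def schedule(num_pieces: int, already_have: set[int],
             peers: dict[str, dict[int, bytes]]) -> dict[int, str]:
    """Toy stand-in for the cluster manager.

    Skips parts the client already holds and assigns every missing part
    to the least-loaded peer that has it.
    """
    load = {name: 0 for name in peers}
    plan: dict[int, str] = {}
    for index in range(num_pieces):
        if index in already_have:
            continue  # already present locally, no download needed
        holders = [name for name, parts in peers.items() if index in parts]
        if not holders:
            raise LookupError(f"no peer holds part {index}")
        chosen = min(holders, key=lambda name: load[name])
        load[chosen] += 1
        plan[index] = chosen
    return plan


def download(plan: dict[int, str], peers: dict[str, dict[int, bytes]],
             local: dict[int, bytes]) -> bytes:
    """Fetch the scheduled parts concurrently, then reassemble the file."""
    def fetch(item: tuple[int, str]) -> tuple[int, bytes]:
        index, peer = item
        return index, peers[peer][index]  # a network transfer in real life

    with ThreadPoolExecutor(max_workers=4) as pool:
        for index, piece in pool.map(fetch, plan.items()):
            local[index] = piece
    return b"".join(local[i] for i in sorted(local))


# Usage: the client already holds part 0; two peers hold the rest.
original = b"container-image-layer"
parts = split_into_pieces(original)
peers = {
    "peer-a": {i: p for i, p in enumerate(parts) if i % 2 == 0},
    "peer-b": dict(enumerate(parts)),
}
local = {0: parts[0]}
plan = schedule(len(parts), set(local), peers)
rebuilt = download(plan, peers, local)
```

Here the plan never includes part 0, and the remaining parts are split between the two peers, so neither one has to serve the whole file.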

By using peer-to-peer sharing, DragonFly improves download speeds considerably. Because many parts of a file are downloaded simultaneously from different peers, the rate at which the file arrives is no longer limited by the transfer rate of the source, but only by the transfer rate of the client (the environment to which the file is downloading). This also takes pressure off the source, which no longer has to push files out at an ever higher rate as more clients join.
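To make the bandwidth argument concrete, here is a small back-of-the-envelope model. It is an idealized, textbook-style lower bound on distribution time, not a measurement of DragonFly, and every number in the usage lines is a made-up assumption. The function names are hypothetical.

```python
def client_server_time(file_mb: float, n_clients: int,
                       server_up: float, client_down: float) -> float:
    """Lower bound (seconds) when one server sends the file to every client.

    The server must upload n_clients full copies, and no client can
    receive faster than its own download rate (rates in MB/s).
    """
    return max(n_clients * file_mb / server_up, file_mb / client_down)


def p2p_time(file_mb: float, n_clients: int, server_up: float,
             client_down: float, peer_up: float) -> float:
    """Lower bound (seconds) when clients also upload parts to each other.

    The source only has to send one copy, and the total upload capacity
    grows with the number of peers.
    """
    total_upload = server_up + n_clients * peer_up
    return max(
        file_mb / server_up,
        file_mb / client_down,
        n_clients * file_mb / total_upload,
    )


# Assumed numbers: 200 MB file, 1000 MB/s source, 100 MB/s per client each way.
single_source = client_server_time(200, 1000, server_up=1000, client_down=100)
peer_to_peer = p2p_time(200, 1000, server_up=1000, client_down=100, peer_up=100)
```

With these assumed numbers, a thousand clients need at least 200 seconds from a single source, while the P2P bound stays at 2 seconds, which is simply the file size divided by the client's download rate.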

DragonFly download speeds

In fact, according to Alibaba, peer-to-peer sharing brought the average download time to about 12 seconds, regardless of the size of the file. This is a huge improvement over methods such as Wget, which do not work particularly well with large files. With other file distribution systems, the download time increases with the file size, whereas Alibaba reports that it stays roughly the same with DragonFly. This is because DragonFly downloads the parts of a file in parallel, whereas other methods download them serially. As a result, the download time is largely independent of the size of the file, as shown in the accompanying figure.

While peer-to-peer networks have their advantages, they also carry some disadvantages. The main one concerns security. In a peer-to-peer network, all peers need to be connected, which makes it easy to reach every peer from any given peer. You therefore have to ensure that each peer is well protected and free of vulnerabilities; otherwise, a single compromised peer could put the entire network at risk. Extra precautions have to be taken when sharing files over a peer-to-peer network. Furthermore, backing up files takes longer in peer-to-peer networks because there is no centralized server.

That being said, you cannot underestimate the importance of this technology in file distribution and management. While the open-source version (released under the Apache 2.0 license) has some good features, Alibaba also provides an enterprise version of DragonFly with more advanced capabilities, such as network flow control, intelligent scheduling, and dynamic compression of files, to name a few. In conclusion, releasing DragonFly to the public may prove to be one of the best decisions Alibaba has made, because it makes file distribution easier and more convenient. Whether it can garner the same popularity as BitTorrent remains to be seen.
