Storacle - decentralized file storage

Before I begin, I should point to the previous article so that the context is clear.

In this article I would like to look at the layer responsible for storing files and at how anyone can use it. Storacle is a standalone library with no direct connection to music; you can use it to store files of any kind.

In the previous article I was somewhat critical of IPFS, but only in the context of the task I was solving. In general, I think that project is cool. I simply prefer being able to create different networks for different tasks. This makes it possible to organize the structure better and to reduce the load on individual nodes and on the network as a whole. Even within a single project you can, if necessary, split the network into parts by some criterion, reducing the overall load.

Storacle uses the spreadable mechanism for networking. Key features:

  • Files can be added to the vault through any node.
  • Files are saved as a whole, not in blocks.
  • Each file has its own unique content hash for further work with it.
  • Files can be duplicated for greater reliability.
  • The number of files on one node is limited only by the file system (there is one exception, discussed below).
  • The number of files in the network is limited by how many valid nodes spreadable can handle; in its second version it will be able to work with an unlimited number of nodes (more on this in another article).
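
To illustrate what a content hash means in the list above, here is a minimal sketch using Node's built-in crypto module. The choice of SHA-256 and the helper name contentHash are assumptions made for illustration only; the article does not say which algorithm storacle uses.

```javascript
const crypto = require('crypto');

// Hypothetical helper: derive an identifier from the file content only.
// SHA-256 is an assumption; storacle's actual algorithm is not stated here.
function contentHash(buffer) {
  return crypto.createHash('sha256').update(buffer).digest('hex');
}

const a = contentHash(Buffer.from('hello'));
const b = contentHash(Buffer.from('hello'));
const c = contentHash(Buffer.from('hello!'));

console.log(a === b); // true: same content gives the same hash
console.log(a === c); // false: different content gives a different hash
```

Because the identifier depends only on the content, any node can compute it for a file it receives, and the same file added twice is recognized as the same file.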

A simple example of how this works from a program:

Server:

const Node = require('storacle').Node;

(async () => {
  try {
    // Create a storage node listening on localhost:4000
    const node = new Node({
      port: 4000,
      hostname: 'localhost'
    });
    // Initialize the node
    await node.init();
  }
  catch(err) {
    console.error(err.stack);
    process.exit(1);
  }
})();

Client:

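const Client = require('storacle').Client;

(async () => {
  try {
    // Connect to the network through the node at localhost:4000
    const client = new Client({
      address: 'localhost:4000'
    });
    await client.init();
    // Store a file and receive its content hash
    const hash = await client.storeFile('./my-file');
    // Get a direct link to the file by its hash
    const link = await client.getFileLink(hash);
    // Remove the file from the storage
    await client.removeFile(hash);
  }
  catch(err) {
    console.error(err.stack);
    process.exit(1);
  }
})();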

View from the inside

Nothing fancy under the hood. Information about the number of files, their total size, and other details is stored in an in-memory database and updated whenever files are added or deleted, so there is no need to access the file system frequently. The exception is when the garbage collector is enabled, that is, when files should be rotated out once the storage reaches a certain size instead of new additions being refused. In that case the storage has to be traversed, and working with a large number of files (more than a million, say) can cause significant load. It is better to store fewer files per node and run more nodes. If the cleaner is disabled, this problem does not arise.
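
The bookkeeping described above can be sketched roughly as follows. This is not storacle's code: the class name StorageStats, its methods, and the "least recently used first" eviction order are all assumptions made for illustration.

```javascript
// Hypothetical in-memory bookkeeping: counters are updated on every add and
// remove, so "how many files, how many bytes?" never touches the file system.
class StorageStats {
  constructor() {
    this.files = new Map(); // hash -> { size, lastUsed }
    this.totalSize = 0;
  }

  add(hash, size, now = Date.now()) {
    if (this.files.has(hash)) this.remove(hash);
    this.files.set(hash, { size, lastUsed: now });
    this.totalSize += size;
  }

  remove(hash) {
    const info = this.files.get(hash);
    if (!info) return false;
    this.files.delete(hash);
    this.totalSize -= info.size;
    return true;
  }

  // The "cleaner": unlike add/remove, it has to look at every entry.
  // In this sketch that means sorting the map; per the text, the real
  // storage has to be traversed, which is the costly part.
  clean(maxSize) {
    const oldestFirst = [...this.files.entries()]
      .sort((x, y) => x[1].lastUsed - y[1].lastUsed);
    const removed = [];
    for (const [hash] of oldestFirst) {
      if (this.totalSize <= maxSize) break;
      this.remove(hash);
      removed.push(hash);
    }
    return removed;
  }
}

const stats = new StorageStats();
stats.add('aa', 400, 1);
stats.add('bb', 300, 2);
stats.add('cc', 500, 3);
const removed = stats.clean(800); // evicts 'aa', leaving 800 bytes
```

The point of the sketch is the asymmetry: add and remove are constant-time updates, while cleaning scales with the number of stored files.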

The file storage consists of 256 folders with 2 levels of nesting. Files are stored in the second-level folders. That is, with 1 million files there will be about 62,500 files in each such folder (1000000 / sqrt(256)).

Folder names are derived from the file's hash, so a file can be found quickly if its hash is known.
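
As a minimal sketch of how a path can be derived from a hash: the function name filePathFromHash, the root folder, and the defaults (one leading hash character per folder level) are assumptions for illustration, not necessarily how storacle names its folders.

```javascript
const path = require('path');

// Hypothetical layout: `levels` nested folders, each named by `chars`
// characters taken from the start of the hash. Both defaults are assumptions.
function filePathFromHash(root, hash, levels = 2, chars = 1) {
  const folders = [];
  for (let i = 0; i < levels; i++) {
    folders.push(hash.slice(i * chars, (i + 1) * chars));
  }
  // path.posix keeps the result identical on every platform
  return path.posix.join(root, ...folders, hash);
}

console.log(filePathFromHash('/storage/files', 'ab34ef'));
// /storage/files/a/b/ab34ef
```

No lookup table is needed: the hash alone determines where the file lives, which is why access by hash is fast.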

This structure was chosen to balance many different storage requirements: support for weak file systems where having many files in one folder is undesirable, fast traversal of all folders when necessary, and so on. It is a kind of golden mean.

Caching

When files are added, and also when they are retrieved, links to them are written to the cache.
Thanks to this, there is very often no need to search the entire network for a file. This speeds up link retrieval and reduces the load on the network. Caching also takes place through HTTP headers.
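
The idea of a link cache can be sketched like this. The class LinkCache, its lifetime-based expiry, and the example URL are hypothetical; storacle's real cache is configured through file.linkCache and may work differently.

```javascript
// Hypothetical link cache: remembers hash -> link for a limited time so
// repeated lookups can skip the search across the network.
class LinkCache {
  constructor(lifetimeMs) {
    this.lifetimeMs = lifetimeMs;
    this.entries = new Map(); // hash -> { link, storedAt }
  }

  set(hash, link, now = Date.now()) {
    this.entries.set(hash, { link, storedAt: now });
  }

  get(hash, now = Date.now()) {
    const entry = this.entries.get(hash);
    if (!entry) return null;
    if (now - entry.storedAt > this.lifetimeMs) {
      // Expired: the caller has to search the network again
      this.entries.delete(hash);
      return null;
    }
    return entry.link;
  }
}

const cache = new LinkCache(1000);
// Made-up URL, used only as a sample value
cache.set('ab34ef', 'http://localhost:4000/file/ab34ef', 0);
const fresh = cache.get('ab34ef', 500);  // still valid, returns the link
const stale = cache.get('ab34ef', 2000); // expired, returns null
```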

Isomorphism

The client is written in JavaScript and is isomorphic, so it can be used directly from the browser.
You can load the file https://github.com/ortexx/storacle/blob/master/dist/storacle.client.js as a script and access window.ClientStoracle, or import it through a build system, etc.

Deferred Links

Another interesting feature is the "deferred link". This is a link to a file that can be obtained synchronously, here and now, while the file itself is served once it has been found in the storage. This is very convenient when, for example, you need to show some pictures on a site: just put a deferred link in src and that's it. You can come up with many other use cases.
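
A minimal sketch of the picture use case, assuming that createRequestedFileLink() takes the file hash as its argument (the article lists the method but not its signature). The helper name imageTag is hypothetical.

```javascript
// `client` is expected to be an initialized storacle Client.
// Assumption: createRequestedFileLink() accepts the file hash.
function imageTag(client, hash) {
  // Synchronous: no await, no network search at this point
  const src = client.createRequestedFileLink(hash);
  return `<img src="${src}">`;
}
```

The markup can be produced immediately while rendering a page; the actual search for the file happens later, when the browser requests the link.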

Client API

  • async Client.prototype.storeFile() - save a file
  • async Client.prototype.getFileLink() - get a direct link to a file
  • async Client.prototype.getFileLinks() - get a list of direct links to a file from all nodes where it exists
  • async Client.prototype.getFileToBuffer() - get a file as a buffer
  • async Client.prototype.getFileToPath() - save a file to the file system
  • async Client.prototype.getFileToBlob() - get a file as a blob (for the browser version)
  • async Client.prototype.removeFile() - delete a file
  • Client.prototype.createRequestedFileLink() - create a deferred link
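
Several of these methods can be combined as in the sketch below. It assumes that getFileLinks() and getFileToBuffer() take the file hash, as getFileLink() and removeFile() do in the client example earlier; the helper name roundTrip is hypothetical.

```javascript
// `client` is expected to be an initialized storacle Client.
// Assumption: getFileLinks() and getFileToBuffer() accept the file hash.
async function roundTrip(client, filePath) {
  const hash = await client.storeFile(filePath);
  // One link per node that currently holds a copy of the file
  const links = await client.getFileLinks(hash);
  const buffer = await client.getFileToBuffer(hash);
  await client.removeFile(hash);
  return { hash, copies: links.length, size: buffer.length };
}
```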

Export files to another server

To transfer files to another node, you can:

  • Simply copy the entire storage folder along with the settings (this may not work in the future).
  • Copy only the folder with the files. In this case, you will need to run the function node.normalizeFilesInfo() once to recalculate all the data and write it to the database.
  • Use the function node.exportFiles(), which will start copying the files.

Basic node settings

When starting a storage node, you can specify all the necessary settings.
I will describe the most basic ones; the rest can be found on GitHub.

  • storage.dataSize - file folder size
  • storage.tempSize - temporary folder size
  • storage.autoCleanSize - the minimum amount of storage space to keep available. If you set this parameter, the least used files are deleted as soon as space runs short.
  • file.maxSize - maximum file size
  • file.minSize - minimum file size
  • file.preferredDuplicates - the preferred number of file duplicates in the network
  • file.mimeWhitelist - valid file types
  • file.mimeBlacklist - invalid file types
  • file.extWhitelist - valid file extensions
  • file.extBlacklist - invalid file extensions
  • file.linkCache - various link caching settings

Almost all size-related parameters can be given as either absolute or relative values.
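
A settings object using some of the options above might look like the sketch below. The option names come from the list; every value is made up, and the '50%' / '100mb' string formats, the array formats, and the meaning of the relative values are assumptions, so check the GitHub documentation for the exact syntax.

```javascript
// Hypothetical values; string formats such as '50%' and '100mb' are
// assumptions about how relative and absolute sizes are written.
const options = {
  port: 4000,
  hostname: 'localhost',
  storage: {
    dataSize: '50%',        // relative value
    tempSize: '10%',        // relative value
    autoCleanSize: '100mb'  // absolute value
  },
  file: {
    maxSize: '100mb',
    preferredDuplicates: 3,
    mimeWhitelist: ['image/png', 'image/jpeg'],
    extBlacklist: ['exe']
  }
};

// The object would then be passed to the node: new Node(options)
```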

Working via the command line

The library can be used from the command line. To do this, install it globally: npm i -g storacle. After that, you can run the necessary actions from the project directory where the node is located. For example, storacle -a storeFile -f ./file.txt -c ./config.js to add a file. All actions can be found in https://github.com/ortexx/storacle/blob/master/bin/actions.js

Why might you need it

  • If you want to create some kind of decentralized project in which you plan to store and work with files using convenient methods. For example, the music project described in the link at the beginning of the article uses storacle.
  • If you are working on any other project where files need to be stored in a distributed way. You can easily build your own closed network, configure nodes flexibly, and add new ones when needed.
  • If you just need somewhere to store your site's files and would rather not write everything yourself. This library may suit your case better than others.
  • If you have a project that works with files and you want to do all the manipulations from the browser. You can avoid writing server code.

Source: habr.com