Riak Cloud Storage. Part 1. Setting up the Riak KV

Riak CS (Cloud Storage) is easy-to-use object storage software that runs on top of Riak KV, a distributed NoSQL key-value database. Riak CS is designed to provide simple, highly available, distributed cloud storage at any scale. It can be used to build cloud architectures, both public and private, or as storage infrastructure for heavily loaded applications and services. The Riak CS API is compatible with Amazon S3 and supports reporting for a variety of use cases.

This article is a loose translation of the official manual for Riak CS version 2.1.1.

In a Riak CS storage system, three components work together, so each component must be configured to work with the others:

  • Riak (KV) – the database system that acts as the storage backend.
  • Riak CS – the cloud storage layer on top of Riak that provides the storage and API capabilities; it stores files and metadata in Riak and streams them back to end users.
  • Stanchion – manages requests involving globally unique entities, such as buckets and users, in a Riak instance; for example, creating users and creating or deleting buckets.

Additionally, you can set up an S3 client to communicate with the Riak CS system.

You should plan to have one Riak node for each Riak CS node in your system. Riak and Riak CS nodes can be run on different physical machines, but in most cases it is preferable to run one Riak node and one Riak CS node on the same physical machine. Assuming one physical machine has enough power to meet the needs of both Riak and Riak CS nodes, you will generally see better performance due to reduced network latency.

If your system consists of several nodes, configuration is primarily a matter of setting up communication between the components. Other settings, such as where and how log files are stored, have default values and only need to be changed if you want non-default behavior.

Setting up system components. Riak KV setup for CS

Since Riak CS is an application built on top of Riak, it is very important to pay attention to your Riak configuration when running Riak CS. This document is both a Riak configuration guide and a reference for the important configuration parameters.

Before setting up, make sure Riak KV and Riak CS are installed on every node in your cluster. Stanchion, in contrast, should only be installed on one node in the entire cluster.

Backends for Riak CS

The default backend used by Riak is Bitcask, but the Riak CS package includes a special backend that must be used by the Riak cluster that is part of the Riak CS system. It is a custom version of the standard Multi backend that ships with Riak.

Some of the Riak buckets used internally by Riak CS use secondary indexes, which currently require the LevelDB backend. Other parts of the Riak CS system can benefit from using the Bitcask backend. The custom Multi backend included with Riak CS takes advantage of both of these backends to achieve the best combination of performance and functionality. The following section describes how to properly configure Riak to use this Multi backend.

A backend is what Riak uses to store data. Riak KV offers several backends: Bitcask, LevelDB, Memory, and Multi.

Additionally, the storage calculation system uses Riak MapReduce to sum up the files in a bucket. This means that you must tell all Riak nodes where to find the compiled Riak CS files before calculating storage.

A few other parameters must be changed to configure a Riak node as part of a Riak CS system, such as the node's IP address and the IP address and port used for communication via Protocol Buffers. Other settings can be changed if needed. The following sections describe how to set up a Riak node to work as part of a Riak CS system.

Riak backend setup

First, edit the riak.conf or advanced.config/app.config configuration files. These files are usually located in the /etc/riak or /opt/riak/etc directory. By default, Riak uses the Bitcask backend. The first thing we need to do is change the configuration file by removing the following line:

RIAK.CONF

## Delete this line:
storage_backend = bitcask

ADVANCED.CONFIG

{riak_kv, [
    %% Delete this line:
    {storage_backend, riak_kv_bitcask_backend},
]}

APP.CONFIG

{riak_kv, [
    %% Delete this line:
    {storage_backend, riak_kv_bitcask_backend},
]}
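For riak.conf, the removal step can also be scripted. The following is only a minimal sketch: the helper name strip_default_backend is hypothetical, and it assumes the file uses plain `key = value` lines with `#` comments, as in the snippet above.

```python
def strip_default_backend(conf_text: str) -> str:
    """Return riak.conf text without any active `storage_backend = ...` line."""
    kept = []
    for line in conf_text.splitlines():
        # The key is everything before the first "="; commented-out lines
        # start with "#", so their "key" never equals "storage_backend".
        key = line.split("=", 1)[0].strip()
        if key == "storage_backend":
            continue
        kept.append(line)
    return "\n".join(kept) + "\n"
```

Calling strip_default_backend on the contents of riak.conf returns the text to write back; commented-out lines and all other settings are left untouched. Back up the original file before overwriting it.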

Next, we need to make the Riak CS modules visible to Riak and tell Riak to use the custom backend provided by Riak CS. We need to use the advanced.config or app.config file for this and add the following options:

ADVANCED.CONFIG

{eleveldb, [
    {total_leveldb_mem_percent, 30}
    ]},
{riak_kv, [
    %% Other configs
    {add_paths, ["/usr/lib/riak-cs/lib/riak_cs-2.1.1/ebin"]},
    {storage_backend, riak_cs_kv_multi_backend},
    {multi_backend_prefix_list, [{<<"0b:">>, be_blocks}]},
    {multi_backend_default, be_default},
    {multi_backend, [
        {be_default, riak_kv_eleveldb_backend, [
            {data_root, "/var/lib/riak/leveldb"}
        ]},
        {be_blocks, riak_kv_bitcask_backend, [
            {data_root, "/var/lib/riak/bitcask"}
        ]}
    ]},
    %% Other configs
]}

APP.CONFIG

{eleveldb, [
    {total_leveldb_mem_percent, 30}
    ]},
{riak_kv, [
    %% Other configs
    {add_paths, ["/usr/lib/riak-cs/lib/riak_cs-2.1.1/ebin"]},
    {storage_backend, riak_cs_kv_multi_backend},
    {multi_backend_prefix_list, [{<<"0b:">>, be_blocks}]},
    {multi_backend_default, be_default},
    {multi_backend, [
        {be_default, riak_kv_eleveldb_backend, [
            {data_root, "/var/lib/riak/leveldb"}
        ]},
        {be_blocks, riak_kv_bitcask_backend, [
            {data_root, "/var/lib/riak/bitcask"}
        ]}
    ]},
    %% Other configs
]}

It is very important to note that many of these values depend on directory layouts specific to your operating system, so adjust them accordingly. For example, the add_paths option assumes that Riak CS is installed in /usr/lib/riak-cs, while the data_root options assume that Riak stores its data under /var/lib/riak. (Note: in my case, the add_paths location was under /usr/lib64/riak-cs/.)

This configuration assumes that Riak CS is installed on the same machine as Riak. If it is not, the Riak CS package must be copied onto the machine running Riak.
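Because the add_paths value varies between systems, it can help to look for the directory before editing the configuration. The sketch below is an illustration only: the function name and the list of candidate roots are assumptions (the two roots are the ones mentioned above), and the riak_cs-*/ebin layout is inferred from the example path.

```python
import glob
import os

# Candidate install roots; both appear in the text above. Extend as needed.
DEFAULT_ROOTS = ("/usr/lib/riak-cs", "/usr/lib64/riak-cs")


def find_riak_cs_ebin(roots=DEFAULT_ROOTS):
    """Return every existing <root>/lib/riak_cs-*/ebin directory."""
    found = []
    for root in roots:
        pattern = os.path.join(root, "lib", "riak_cs-*", "ebin")
        found.extend(sorted(p for p in glob.glob(pattern) if os.path.isdir(p)))
    return found
```

If the function returns an empty list, Riak CS is probably installed somewhere else (or not at all) on this machine, and the add_paths value has to be determined by hand.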

Enabling sibling creation

Now we need to set the allow_mult parameter to true. We can add a line to the riak.conf configuration file, or to the riak_core section of advanced.config or app.config.

RIAK.CONF

buckets.default.allow_mult = true

ADVANCED.CONFIG

{riak_core, [
    %% Other configs
    {default_bucket_props, [{allow_mult, true}]},
    %% Other configs
]}

APP.CONFIG

{riak_core, [
    %% Other configs
    {default_bucket_props, [{allow_mult, true}]},
    %% Other configs
]}

This allows Riak to create siblings, which Riak CS needs in order to function. If you are connecting to Riak CS using a client library, don't worry: you won't have to resolve conflicts, since all Riak CS operations are strongly consistent by definition.

A sibling is a way of storing multiple values under the same key, which occurs when an object has different values on different nodes.

Note: allow_mult
Any Riak node that also supports Riak CS should have allow_mult set to true at all times. Riak CS will refuse to start if the value is false.

Setting the hostname and IP address

Each Riak node has a name, which can be specified with the nodename option in riak.conf. If you are using the app.config configuration file, you need to create a file named vm.args in the same directory as app.config and specify the node name with the -name flag. We recommend naming nodes in the name@host format. So if you have three nodes running on the same host 100.0.0.1, you can name them [email protected], [email protected] and [email protected], or you can give them more specific names like [email protected], [email protected] and so on. The example below changes the node name to [email protected], which works for a node running on localhost.

RIAK.CONF

 nodename = [email protected] 

VM.ARGS

 -name [email protected]

You must name all nodes before starting them and joining them to the cluster.
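Since a node has to be named before it is started, a quick sanity check of the chosen names can catch typos early. This is a rough sketch under stated assumptions: the regular expression is my own approximation of the name@host shape, not Riak's actual validation rule, and the node name used in the usage note is a made-up example.

```python
import re

# Assumed shape: a short name, "@", then a hostname or IPv4 address.
NODE_NAME = re.compile(r"[A-Za-z0-9_-]+@[A-Za-z0-9.-]+")


def is_valid_node_name(name: str) -> bool:
    """True when the whole string looks like name@host."""
    return NODE_NAME.fullmatch(name) is not None


def invalid_node_names(names):
    """Return the names that do not follow the name@host shape."""
    return [n for n in names if not is_valid_node_name(n)]
```

For instance, is_valid_node_name("node_a@100.0.0.1") is accepted, while a name without the @host part is rejected.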

Testing the configuration

Now that all the necessary node configurations have been completed, we can try to start Riak:

SHELL

 riak start 

Wait a few moments for the node to come up. Then you can test the running node:

SHELL

 riak ping

If the response is pong, then Riak is running; if the response is Node not responding to pings, then something went wrong.

If the node did not start correctly, check the erlang.log.1 file in the node's log directory to see whether the problem can be identified. One of the most common errors is invalid_storage_backend, which indicates that the path to the Riak CS library in advanced.config or app.config is incorrect (or that Riak CS is not installed on the server). If you see this error, do not work around it by changing riak_cs_kv_multi_backend to riak_kv_multi_backend.
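The start-then-ping sequence above involves a short wait, which can be automated with a retry loop. The following is a minimal sketch, not part of Riak: the function names, the retry count and the delay are assumptions, and it treats only the exact output pong as success, based on the description above.

```python
import subprocess
import time


def riak_ping(command=("riak", "ping")) -> bool:
    """Run `riak ping` and report whether it answered `pong`."""
    try:
        result = subprocess.run(command, capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        # The binary is missing or the command hung: treat as "not running".
        return False
    return result.stdout.strip() == "pong"


def wait_for_pong(ping=riak_ping, attempts=10, delay=3.0) -> bool:
    """Call ping() up to `attempts` times, sleeping `delay` seconds in between."""
    for attempt in range(attempts):
        if ping():
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Run wait_for_pong() right after riak start; if it returns False, continue with the log file as described above.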

Configuring Riak to use protocol buffers

The Riak Protocol Buffers settings are found in riak.conf or in the riak_api section of the advanced.config or app.config files located in the /etc/riak/ directory. By default, the listener is bound to IP address 127.0.0.1 and port 8087. You need to change these if you plan to run Riak and Riak CS outside of a local environment. Replace 127.0.0.1 with the IP address of the Riak node and 8087 with the appropriate port.

RIAK.CONF

 listener.protobuf.internal = 10.0.2.10:10001

ADVANCED.CONFIG

{riak_api, [
    %% Other configs
    {pb, [{"10.0.2.10", 10001}]},
    %% Other configs
]}

APP.CONFIG

{riak_api, [
    %% Other configs
    {pb, [{"10.0.2.10", 10001}]},
    %% Other configs
]}
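After changing the listener address and restarting the node, you may want to confirm that the Protocol Buffers port actually accepts connections from the machine where Riak CS will run. The sketch below only tests TCP reachability with the Python standard library; the function name is hypothetical, and it does not speak the Protocol Buffers protocol itself.

```python
import socket


def protobuf_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

With the example values above, the call would be protobuf_port_open("10.0.2.10", 10001). A False result usually points to a wrong address, a node that is not running, or a firewall in between.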

Note: The value of the listener.protobuf.internal parameter in riak.conf (or of the pb parameter in advanced.config/app.config) must match the value of riak_host in the Riak CS riak-cs.conf and the Stanchion stanchion.conf files (or riak_host in their respective advanced.config/app.config files).

A note about the port number
You may need a different port number if the port conflicts with ports used by another application, or if you are using a load balancer or proxy server.

It is also recommended to make sure that the Riak protobuf.backlog setting (or pb_backlog in advanced.config/app.config) is equal to or greater than the pool.request.size specified for Riak CS in riak-cs.conf (or request_pool_size in advanced.config/app.config).

If the value of pool.request.size has been changed in Riak CS, then the value of protobuf.backlog must also be updated in Riak.
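The two consistency rules in this section (the listener address must match riak_host, and the backlog must not be smaller than the request pool) can be checked mechanically. This is a sketch under assumptions: the function names are hypothetical, the key names are the ones quoted in the text, riak_host is assumed to be written as host:port, and a setting that is absent from a file is skipped rather than compared against a default.

```python
def parse_conf(text: str) -> dict:
    """Parse `key = value` lines, skipping blank lines and '#' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def check_cs_consistency(riak_conf: str, riak_cs_conf: str) -> list:
    """Return human-readable problems found between riak.conf and riak-cs.conf."""
    riak, cs = parse_conf(riak_conf), parse_conf(riak_cs_conf)
    problems = []

    listener, riak_host = riak.get("listener.protobuf.internal"), cs.get("riak_host")
    if listener is not None and riak_host is not None and listener != riak_host:
        problems.append(f"riak_host {riak_host} does not match listener {listener}")

    backlog, pool = riak.get("protobuf.backlog"), cs.get("pool.request.size")
    if backlog is not None and pool is not None and int(backlog) < int(pool):
        problems.append(f"protobuf.backlog {backlog} is smaller than pool.request.size {pool}")

    return problems
```

Pass the contents of both files as strings; an empty list means that none of the explicitly configured values contradict each other.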

Other Riak settings

The riak.conf and advanced.config files include other settings that control how log files are generated and where they are saved. These settings have default values and should work in most cases. For more information, we recommend reading the official documentation on configuration files.

Setting an IP address for Riak

When setting up an IP address for Riak, make sure that each Riak node has a unique IP address, whether you are running a single node or adding more nodes to the system. The Riak IP address is set in riak.conf or, if you are using the app.config file, in the vm.args configuration file, which is also located in the /etc/riak directory (or /opt/riak/etc/ on some operating systems).

Initially, the line containing Riak's IP address points to localhost:

RIAK.CONF

 nodename = [email protected]

VM.ARGS

 -name [email protected]

Replace 127.0.0.1 with your preferred IP address or hostname of the Riak node.

Performance and bandwidth settings

For performance reasons, we strongly recommend adding the following values to the Riak configuration file, riak.conf or vm.args, located in the /etc/riak/ or /opt/riak/etc directory.

RIAK.CONF

 erlang.max_ports = 65536

VM.ARGS

## This setting should already be present for recent Riak installs.
 -env ERL_MAX_PORTS 65536

Disabling JavaScript MapReduce

Using the legacy JavaScript MapReduce with any version of Riak CS is not recommended. For performance reasons, you should disable the virtual machines that run JavaScript MapReduce operations by setting the following in riak.conf, or in the riak_kv section of advanced.config or app.config:

RIAK.CONF

 javascript.map_pool_size = 0
 javascript.reduce_pool_size = 0
 javascript.hook_pool_size = 0 

ADVANCED.CONFIG

{riak_kv, [
    %% Other configs
    {map_js_vm_count, 0},
    {reduce_js_vm_count, 0},
    {hook_js_vm_count, 0}
    %% Other configs
]}

APP.CONFIG

{riak_kv, [
    %% Other configs
    {map_js_vm_count, 0},
    {reduce_js_vm_count, 0},
    {hook_js_vm_count, 0}
    %% Other configs
]}
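For the advanced.config/app.config variant, a small script can confirm that all three JavaScript VM counts are really set to zero. The sketch below is an approximation: the function names are hypothetical, and it uses a regular expression rather than a real Erlang term parser, so it assumes the settings are written as simple {name, number} tuples like those above.

```python
import re

JS_SETTINGS = ("map_js_vm_count", "reduce_js_vm_count", "hook_js_vm_count")


def js_vm_counts(config_text: str) -> dict:
    """Map each JavaScript VM setting to its integer value, or None if absent."""
    # Drop Erlang comments so that commented-out settings are ignored.
    uncommented = re.sub(r"%.*", "", config_text)
    counts = {}
    for name in JS_SETTINGS:
        match = re.search(r"\{\s*%s\s*,\s*(\d+)\s*\}" % name, uncommented)
        counts[name] = int(match.group(1)) if match else None
    return counts


def javascript_disabled(config_text: str) -> bool:
    """True only when all three settings are present and equal to 0."""
    return all(value == 0 for value in js_vm_counts(config_text).values())
```

A setting that is missing is reported as None and counts as not disabled, since the default may start JavaScript VMs.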

Next, we need to configure the remaining components of the Riak CS system.

Original guide.

Source: habr.com