Event-driven architecture improves the cost efficiency of cloud resources, because they are activated only when they are needed. There are many ways to implement it without creating additional cloud entities to act as worker applications. Today I will talk not about FaaS but about webhooks, and show a case study of event handling with object storage webhooks.
A few words about object storage and webhooks. Object storage lets you store arbitrary data in the cloud as objects, available over HTTP/HTTPS via S3 or another API (depending on the implementation). Webhooks are custom HTTP callbacks. They are usually triggered by an event, such as a code push to a repository or a comment posted on a blog. When an event occurs, the originating site sends an HTTP request to the URL configured for the webhook. As a result, events on one site can trigger actions on another (Wikipedia). When the originating site is an object storage, changes to its contents act as the events.
Examples of simple cases where such automation can be used:
Create copies of all objects in another cloud storage. Copies should be created "on the fly", whenever files are added or changed.
Automatic creation of a series of thumbnails for image files, adding watermarks to photos, and other image modifications.
Notification of the arrival of new documents (for example, a distributed accounting service uploads reports to the cloud, and financial monitoring receives notifications about new reports, then checks and analyzes them).
Slightly more complex cases involve, for example, generating a request to Kubernetes that creates a pod with the necessary containers, passes task parameters to it, and tears the pod down after processing.
As an example, we will implement a variant of task 1: changes in a Mail.ru Cloud Solutions (MCS) object storage bucket are synchronized to AWS object storage using webhooks. In a real, loaded scenario, asynchronous operation should be provided by registering webhooks in a queue, but for this training task we will do without it.
How it works
The interaction protocol is described in detail in the guide to S3 webhooks on MCS. The scheme of work has the following elements:
A publishing service, which sits on the S3 storage side and publishes HTTP requests when a webhook fires.
A webhook server, which listens for requests from the publishing service over HTTP and takes the appropriate action. The server can be written in any language; in our example, we will write it in Go.
A peculiarity of the webhook implementation in the S3 API is the registration of the webhook receiving server with the publishing service. In particular, the webhook receiving server must acknowledge its subscription to the publishing service's messages (most other webhook implementations do not require subscription confirmation).
Accordingly, the webhook receiving server must support two main operations:
respond to a publishing service request for registration confirmation,
handle incoming events.
Setting up a webhook receiver
To run the webhook receiving server, you need a Linux server. In this article, we use a virtual instance deployed on MCS as an example.
Install the necessary software and start the webhook receiving server.
ubuntu@ubuntu-basic-1-2-10gb:~$ sudo apt-get install git
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following packages were automatically installed and are no longer required:
bc dns-root-data dnsmasq-base ebtables landscape-common liblxc-common
liblxc1 libuv1 lxcfs lxd lxd-client python3-attr python3-automat
python3-click python3-constantly python3-hyperlink
python3-incremental python3-pam python3-pyasn1-modules
python3-service-identity python3-twisted python3-twisted-bin
python3-zope.interface uidmap xdelta3
Use 'sudo apt autoremove' to remove them.
Suggested packages:
git-daemon-run | git-daemon-sysvinit git-doc git-el git-email git-gui
gitk gitweb git-cvs git-mediawiki git-svn
The following NEW packages will be installed:
git
0 upgraded, 1 newly installed, 0 to remove and 46 not upgraded.
Need to get 3915 kB of archives.
After this operation, 32.3 MB of additional disk space will be used.
Get:1 http://MS1.clouds.archive.ubuntu.com/ubuntu bionic-updates/main
amd64 git amd64 1:2.17.1-1ubuntu0.7 [3915 kB]
Fetched 3915 kB in 1s (5639 kB/s)
Selecting previously unselected package git.
(Reading database ... 53932 files and directories currently installed.)
Preparing to unpack .../git_1%3a2.17.1-1ubuntu0.7_amd64.deb ...
Unpacking git (1:2.17.1-1ubuntu0.7) ...
Setting up git (1:2.17.1-1ubuntu0.7) ...
Clone the repository with the webhook receiving server:
Go to the bucket for which we will configure webhooks and click the gear icon:
Go to the Webhooks tab and click Add:
Fill in the fields:
ID - the name of the webhook.
Event - what events to send. We set the transmission of all events that occur when working with files (adding and deleting).
URL - the address of the webhook receiving server.
Filter prefix/suffix - a filter that allows you to generate webhooks only for objects whose names match certain rules. For example, in order for the webhook to work only on files with a .png extension, in Filter suffix you need to write "png".
Currently, only ports 80 and 443 are supported for accessing the webhook receiving server.
Let's press Add hook and we will see the following:
Hook added.
The webhook receiving server in the logs shows the progress of the hook registration process:
Ping() - a route that responds at the /ping URL, the simplest implementation of a liveness probe.
Webhook() - the main route, the handler for the /webhook URL, which:
confirms registration with the publishing service (the SubscriptionConfirmation function),
handles incoming webhooks (the GotRecords function).
The HmacSha256 and HmacSha256hex functions implement the HMAC-SHA256 signing algorithm, the latter with output as a string of hexadecimal digits, used to calculate the signature.
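Such helpers likely look something like the following sketch (the names match the article; the exact code is in the repository). The standard library provides everything needed:

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// HmacSha256 returns the raw HMAC-SHA256 of message under key.
func HmacSha256(key, message string) []byte {
	mac := hmac.New(sha256.New, []byte(key))
	mac.Write([]byte(message))
	return mac.Sum(nil)
}

// HmacSha256hex returns the HMAC-SHA256 of message under key as a
// lowercase hexadecimal string, the form used when computing the
// subscription confirmation signature.
func HmacSha256hex(key, message string) string {
	return hex.EncodeToString(HmacSha256(key, message))
}

func main() {
	fmt.Println(HmacSha256hex("key", "The quick brown fox jumps over the lazy dog"))
}
```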
main is the main function, handles command line options and registers URL handlers.
Command line options accepted by the server:
-port - the port the server will listen on.
-address - the IP address the server will listen on.
-script - an external program that is called for each incoming hook.
Let's take a closer look at some of the features:
// Webhook handles requests to the /webhook URL
func Webhook(w http.ResponseWriter, req *http.Request) {
    // Read body
    body, err := ioutil.ReadAll(req.Body)
    defer req.Body.Close()
    if err != nil {
        http.Error(w, err.Error(), 500)
        return
    }
    // Log request
    log.Printf("[%s] incoming HTTP request from %s\n", req.Method, req.RemoteAddr)
    // Check if we got a subscription confirmation request
    if strings.Contains(string(body),
        "\"Type\":\"SubscriptionConfirmation\"") {
        SubscriptionConfirmation(w, req, body)
    } else {
        GotRecords(w, req, body)
    }
}
This function determines whether the request is a registration confirmation or a webhook. As follows from the documentation, in the case of registration confirmation, the following JSON structure arrives in a POST request:
POST http://test.com HTTP/1.1
x-amz-sns-messages-type: SubscriptionConfirmation
content-type: application/json
{
"Timestamp":"2019-12-26T19:29:12+03:00",
"Type":"SubscriptionConfirmation",
"Message":"You have chosen to subscribe to the topic $topic. To confirm the subscription you need to response with calculated signature",
"TopicArn":"mcs2883541269|bucketA|s3:ObjectCreated:Put",
"SignatureVersion":1,
"Token":"RPE5UuG94rGgBH6kHXN9FUPugFxj1hs2aUQc99btJp3E49tA"
}
Accordingly, depending on the request, you need to decide how to process the data. I chose the entry "Type":"SubscriptionConfirmation" as the indicator, since it is present in the subscription confirmation request and absent from webhooks. Based on the presence or absence of this entry in the POST request, execution proceeds either to the SubscriptionConfirmation function or to the GotRecords function.
We will not examine the SubscriptionConfirmation function in detail; it is implemented according to the principles set out in the documentation. You can view its source code in the project's git repository.
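String matching works for this example, but a sturdier way to branch (a sketch of an alternative, not the repository's code) is to unmarshal just the Type field, so the check does not depend on the exact formatting of the JSON:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// messageType extracts only the "Type" field from the request body;
// all other fields are ignored by the partial struct.
func messageType(body []byte) string {
	var m struct {
		Type string `json:"Type"`
	}
	if err := json.Unmarshal(body, &m); err != nil {
		return ""
	}
	return m.Type
}

func main() {
	body := []byte(`{"Type":"SubscriptionConfirmation","SignatureVersion":1}`)
	if messageType(body) == "SubscriptionConfirmation" {
		fmt.Println("confirmation request")
	} else {
		fmt.Println("event records")
	}
}
```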
The GotRecords function parses the incoming request and for each Record object calls an external script (whose name was passed in the -script parameter) with the following parameters:
bucket name
object key
action:
copy - if in the original request EventName = ObjectCreated | PutObject | PutObjectCopy
delete - if in the original request EventName = ObjectRemoved | DeleteObject
Thus, if a hook arrives with a POST request as described above, and the parameter is -script=script.sh, then the script will be called as follows:
script.sh bucketA some-file-to-bucket copy
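The event-name-to-action mapping that GotRecords performs before invoking the external script can be sketched like this (a simplified illustration; the prefix form of the event names is assumed from the S3 event format, e.g. s3:ObjectCreated:Put):

```go
package main

import (
	"fmt"
	"strings"
)

// actionFor maps an S3 event name to the action argument passed to the
// external script: "copy" for object-creation events, "delete" for
// object-removal events, and "" for anything else.
func actionFor(eventName string) string {
	switch {
	case strings.HasPrefix(eventName, "s3:ObjectCreated"):
		return "copy"
	case strings.HasPrefix(eventName, "s3:ObjectRemoved"):
		return "delete"
	}
	return ""
}

func main() {
	// With -script=script.sh, a Put event on bucketA/some-file-to-bucket
	// results in a call like: script.sh bucketA some-file-to-bucket copy
	fmt.Println("script.sh", "bucketA", "some-file-to-bucket",
		actionFor("s3:ObjectCreated:Put"))
}
```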
It should be understood that this webhook receiving server is not a complete production solution, but a simplified example of a possible implementation.
Work example
Let's synchronize the files of the main bucket in MCS to the backup bucket in AWS. The main bucket is called myfiles-ash, the backup bucket is called myfiles-backup (configuring the bucket in AWS is outside the scope of this article). Accordingly, when a file is placed in the main bucket, its copy should appear in the backup; when it is removed from the main one, it should be deleted from the backup.
We will work with buckets using the awscli utility, which is compatible with both MCS cloud storage and AWS cloud storage.
ubuntu@ubuntu-basic-1-2-10gb:~$ sudo apt-get install awscli
Reading package lists... Done
Building dependency tree
Reading state information... Done
After this operation, 34.4 MB of additional disk space will be used.
Unpacking awscli (1.14.44-1ubuntu1) ...
Setting up awscli (1.14.44-1ubuntu1) ...
Let's configure access to the S3 MCS API:
ubuntu@ubuntu-basic-1-2-10gb:~$ aws configure --profile mcs
AWS Access Key ID [None]: hdywEPtuuJTExxxxxxxxxxxxxx
AWS Secret Access Key [None]: hDz3SgxKwXoxxxxxxxxxxxxxxxxxx
Default region name [None]:
Default output format [None]:
Let's configure access to the AWS S3 API:
ubuntu@ubuntu-basic-1-2-10gb:~$ aws configure --profile aws
AWS Access Key ID [None]: AKIAJXXXXXXXXXXXX
AWS Secret Access Key [None]: dfuerphOLQwu0CreP5Z8l5fuXXXXXXXXXXXXXXXX
Default region name [None]:
Default output format [None]:
Let's check the access:
To AWS:
ubuntu@ubuntu-basic-1-2-10gb:~$ aws s3 ls --profile aws
2020-07-06 08:44:11 myfiles-backup
For MCS, the same command needs an additional --endpoint-url parameter:
Let's check how it works. Through MCS web interface add the test.txt file to the myfiles-ash bucket. The logs in the console show that a request was made to the webhook server:
2020/07/06 09:43:08 [POST] incoming HTTP request from
95.163.216.92:56612
download: s3://myfiles-ash/test.txt to ../../../tmp/myfiles-ash/test.txt
upload: ../../../tmp/myfiles-ash/test.txt to
s3://myfiles-backup/test.txt
Let's check the contents of the myfiles-backup bucket in AWS:
Now, through the web interface, delete the file from the myfiles-ash bucket.
Server logs:
2020/07/06 09:44:46 [POST] incoming HTTP request from
95.163.216.92:58224
delete: s3://myfiles-backup/test.txt
Bucket content:
ubuntu@ubuntu-basic-1-2-10gb:~/s3-webhook$ aws s3 --profile aws ls
myfiles-backup
ubuntu@ubuntu-basic-1-2-10gb:~$
The file is deleted; problem solved.
Conclusion and ToDo
All the code used in this article is in my repository. It also contains example scripts and examples of computing signatures for registering webhooks.
This code is nothing more than an example of how you can use S3 webhooks in your own work. As I said at the beginning, if you plan to use such a server in production, you need at least to rewrite it for asynchronous operation: register incoming webhooks in a queue (RabbitMQ or NATS) from which worker applications can pick them up and process them. Otherwise, under a massive influx of webhooks, you may run out of server resources. Queues also let you scale the server and workers independently and handle retrying tasks after failures. It is also desirable to switch to more detailed and more standardized logging.