ืืคืืฆ'ื™ ืงืคืงื ื•ืขื™ื‘ื•ื“ ื ืชื•ื ื™ื ื‘ืกื˜ืจื™ืžื™ื ื’ ืขื Spark Streaming

ืฉืœื•ื, ื”ื‘ืจ! ื”ื™ื•ื ื ื‘ื ื” ืžืขืจื›ืช ืฉืชืขื‘ื“ ืืช ื–ืจืžื™ ื”ื”ื•ื“ืขื•ืช ืฉืœ Apache Kafka ื‘ืืžืฆืขื•ืช Spark Streaming ื•ืชื›ืชื•ื‘ ืืช ืชื•ืฆืื•ืช ื”ืขื™ื‘ื•ื“ ืœืžืกื“ ื”ื ืชื•ื ื™ื ื‘ืขื ืŸ AWS RDS.

Let's imagine that a certain credit institution gives us the task of processing incoming transactions "on the fly" across all of its branches. This could be done, for example, to instantly calculate an open currency position for the treasury, limits or financial results per transaction, and so on.

How do we implement this use case without resorting to magic and spells? Read on under the cut. Let's go!

ืืคืืฆ'ื™ ืงืคืงื ื•ืขื™ื‘ื•ื“ ื ืชื•ื ื™ื ื‘ืกื˜ืจื™ืžื™ื ื’ ืขื Spark Streaming
(ืžืงื•ืจ ืชืžื•ื ื”)

Introduction

Of course, processing large volumes of data in real time opens up many possibilities for modern systems. One of the most popular combinations for this is the tandem of Apache Kafka and Spark Streaming, where Kafka produces a stream of incoming message packets and Spark Streaming processes these packets at a given time interval.

To improve the fault tolerance of the application, we will use checkpoints. With this mechanism, when the Spark Streaming engine needs to recover lost data, it only has to go back to the last checkpoint and resume computations from there.

Architecture of the developed system

ืืคืืฆ'ื™ ืงืคืงื ื•ืขื™ื‘ื•ื“ ื ืชื•ื ื™ื ื‘ืกื˜ืจื™ืžื™ื ื’ ืขื Spark Streaming

ืจื›ื™ื‘ื™ื ื‘ืฉื™ืžื•ืฉ:

  • ืืคืืฆ'ื™ ืงืคืงื ื”ื™ื ืžืขืจื›ืช ืžื‘ื•ื–ืจืช ืœืคืจืกื•ื-ื”ืจืฉืžื” ืœื”ื•ื“ืขื•ืช. ืžืชืื™ื ื’ื ืœืฆืจื™ื›ืช ื”ื•ื“ืขื•ืช ื‘ืžืฆื‘ ืœื ืžืงื•ื•ืŸ ื•ื’ื ืœืฆืจื™ื›ืช ื”ื•ื“ืขื•ืช ืžืงื•ื•ื ืช. ื›ื“ื™ ืœืžื ื•ืข ืื•ื‘ื“ืŸ ื ืชื•ื ื™ื, ื”ื•ื“ืขื•ืช ืงืคืงื ืžืื•ื—ืกื ื•ืช ื‘ื“ื™ืกืง ื•ืžืฉื•ื›ืคืœื•ืช ื‘ืชื•ืš ื”ืืฉื›ื•ืœ. ืžืขืจื›ืช ืงืคืงื ื‘ื ื•ื™ื” ืขืœ ื’ื‘ื™ ืฉื™ืจื•ืช ื”ืกื ื›ืจื•ืŸ ZooKeeper;
  • ื”ื–ืจืžืช ืืคืืฆ'ื™ ืกืคืืจืง - ืจื›ื™ื‘ Spark ืœืขื™ื‘ื•ื“ ื ืชื•ื ื™ื ื–ื•ืจืžื™ื. ืžื•ื“ื•ืœ Spark Streaming ื‘ื ื•ื™ ื‘ืืžืฆืขื•ืช ืืจื›ื™ื˜ืงื˜ื•ืจืช ืžื™ืงืจื•-ืืฆื•ื•ื”, ืฉื‘ื” ื–ืจื ื”ื ืชื•ื ื™ื ืžืชืคืจืฉ ื›ืจืฆืฃ ืจืฆื™ืฃ ืฉืœ ืžื ื•ืช ื ืชื•ื ื™ื ืงื˜ื ื•ืช. Spark Streaming ืœื•ืงื— ื ืชื•ื ื™ื ืžืžืงื•ืจื•ืช ืฉื•ื ื™ื ื•ืžืฉืœื‘ ืื•ืชื ื‘ื—ื‘ื™ืœื•ืช ืงื˜ื ื•ืช. ื—ื‘ื™ืœื•ืช ื—ื“ืฉื•ืช ื ื•ืฆืจื•ืช ื‘ืžืจื•ื•ื—ื™ ื–ืžืŸ ืงื‘ื•ืขื™ื. ื‘ืชื—ื™ืœืช ื›ืœ ืžืจื•ื•ื— ื–ืžืŸ, ื ื•ืฆืจืช ื—ื‘ื™ืœื” ื—ื“ืฉื”, ื•ื›ืœ ื”ื ืชื•ื ื™ื ื”ืžืชืงื‘ืœื™ื ื‘ืžื”ืœืš ืื•ืชื• ืžืจื•ื•ื— ื–ืžืŸ ื ื›ืœืœื™ื ื‘ื—ื‘ื™ืœื”. ื‘ืชื•ื ื”ืžืจื•ื•ื—, ืฆืžื™ื—ืช ื”ื—ื‘ื™ืœื” ื ืขืฆืจืช. ื’ื•ื“ืœ ื”ืžืจื•ื•ื— ื ืงื‘ืข ืขืœ ื™ื“ื™ ืคืจืžื˜ืจ ื”ื ืงืจื ืžืจื•ื•ื— ืืฆื•ื•ื”;
  • Apache Spark SQL - ืžืฉืœื‘ ืขื™ื‘ื•ื“ ื™ื—ืกื™ ืขื ืชื›ื ื•ืช ืคื•ื ืงืฆื™ื•ื ืœื™ ืฉืœ Spark. ื ืชื•ื ื™ื ืžื•ื‘ื ื™ื ืคื™ืจื•ืฉื ื ืชื•ื ื™ื ืฉื™ืฉ ืœื”ื ืกื›ื™ืžื”, ื›ืœื•ืžืจ ืงื‘ื•ืฆื” ืื—ืช ืฉืœ ืฉื“ื•ืช ืขื‘ื•ืจ ื›ืœ ื”ืจืฉื•ืžื•ืช. Spark SQL ืชื•ืžืš ื‘ืงืœื˜ ืžืžื’ื•ื•ืŸ ืžืงื•ืจื•ืช ื ืชื•ื ื™ื ืžื•ื‘ื ื™ื, ื•ื‘ื–ื›ื•ืช ื”ื–ืžื™ื ื•ืช ืฉืœ ืžื™ื“ืข ืกื›ื™ืžื”, ื”ื•ื ื™ื›ื•ืœ ืœืื—ื–ืจ ื‘ื™ืขื™ืœื•ืช ืจืง ืืช ืฉื“ื•ืช ื”ืจืฉื•ืžื•ืช ื”ื ื“ืจืฉื™ื, ื•ื›ืŸ ืžืกืคืง ืžืžืฉืงื™ DataFrame API;
  • AWS RDS ื”ื•ื ืžืกื“ ื ืชื•ื ื™ื ื™ื—ืกื™ ืžื‘ื•ืกืก ืขื ืŸ ื–ื•ืœ ื™ื—ืกื™ืช, ืฉื™ืจื•ืช ืื™ื ื˜ืจื ื˜ ื”ืžืคืฉื˜ ื”ื’ื“ืจื”, ืชืคืขื•ืœ ื•ืงื ื” ืžื™ื“ื”, ื•ืžื ื•ื”ืœ ื™ืฉื™ืจื•ืช ืขืœ ื™ื“ื™ ืืžื–ื•ืŸ.

Installing and running the Kafka server

Before working with Kafka directly, make sure Java is installed, since the JVM is required to run it:

sudo apt-get update 
sudo apt-get install default-jre
java -version

Let's create a new user to work with Kafka:

sudo useradd kafka -m
sudo passwd kafka
sudo adduser kafka sudo

ืœืื—ืจ ืžื›ืŸ, ื”ื•ืจื“ ืืช ื”ื”ืคืฆื” ืžื”ืืชืจ ื”ืจืฉืžื™ ืฉืœ Apache Kafka:

wget -P /YOUR_PATH "http://apache-mirror.rbc.ru/pub/apache/kafka/2.2.0/kafka_2.12-2.2.0.tgz"

ืคืจืง ืืช ื”ืืจื›ื™ื•ืŸ ืฉื”ื•ืจื“:

tar -xvzf /YOUR_PATH/kafka_2.12-2.2.0.tgz
ln -s /YOUR_PATH/kafka_2.12-2.2.0 kafka

The next step is optional. The point is that the default settings do not let you use all of Apache Kafka's features, for example, deleting a topic, the category or group to which messages can be published. To change this, let's edit the configuration file:

vim ~/kafka/config/server.properties

ื”ื•ืกืฃ ืืช ื”ื“ื‘ืจื™ื ื”ื‘ืื™ื ืœืกื•ืฃ ื”ืงื•ื‘ืฅ:

delete.topic.enable = true
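
With this flag enabled, topics can actually be removed, either with the bundled kafka-topics.sh --delete script or, for example, from Python with kafka-python's admin client. A minimal sketch, assuming a reasonably recent kafka-python; the topic name here is purely illustrative:

from kafka.admin import KafkaAdminClient

# Removes a topic; the broker honours the request only when delete.topic.enable=true
admin = KafkaAdminClient(bootstrap_servers='localhost:9092')
admin.delete_topics(['some_obsolete_topic'])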

Before starting the Kafka server, you need to start the ZooKeeper server; we will use the helper script that ships with the Kafka distribution:

cd ~/kafka
bin/zookeeper-server-start.sh config/zookeeper.properties

ืœืื—ืจ ื”ืคืขืœืช ZooKeeper ื‘ื”ืฆืœื—ื”, ื”ืคืขืœ ืืช ืฉืจืช Kafka ื‘ืžืกื•ืฃ ื ืคืจื“:

bin/kafka-server-start.sh config/server.properties

Let's create a new topic called transaction:

bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 3 --topic transaction

ื‘ื•ืื• ื ื•ื•ื“ื ืฉื ื•ืฆืจ ื ื•ืฉื ืขื ื”ืžืกืคืจ ื”ื ื“ืจืฉ ืฉืœ ืžื—ื™ืฆื•ืช ื•ืฉื›ืคื•ืœ:

bin/kafka-topics.sh --describe --zookeeper localhost:2181

ืืคืืฆ'ื™ ืงืคืงื ื•ืขื™ื‘ื•ื“ ื ืชื•ื ื™ื ื‘ืกื˜ืจื™ืžื™ื ื’ ืขื Spark Streaming

We will skip the steps of testing the producer and consumer against the newly created topic; the details of how to test sending and receiving messages are described in the official documentation (Send some messages). So, let's move on to writing a producer in Python using the KafkaProducer API.

Writing the producer

The producer will generate random data: 100 messages every second. By random data we mean a dictionary consisting of three fields:

  • branch - the name of the credit institution's point of sale;
  • currency - the transaction currency;
  • amount - the transaction amount. The amount will be a positive number if it is a currency purchase by the bank, and a negative number if it is a sale.

The producer code looks like this:

from numpy.random import choice, randint

def get_random_value():
    # Builds one random "transaction": branch, currency and a signed amount
    new_dict = {}

    branch_list = ['Kazan', 'SPB', 'Novosibirsk', 'Surgut']
    currency_list = ['RUB', 'USD', 'EUR', 'GBP']

    new_dict['branch'] = choice(branch_list)
    new_dict['currency'] = choice(currency_list)
    new_dict['amount'] = int(randint(-100, 100))  # cast to int: numpy integers are not JSON-serializable

    return new_dict

ืœืื—ืจ ืžื›ืŸ, ื‘ืืžืฆืขื•ืช ืฉื™ื˜ืช ื”ืฉืœื™ื—ื”, ืื ื• ืฉื•ืœื—ื™ื ื”ื•ื“ืขื” ืœืฉืจืช, ืœื ื•ืฉื ืฉืื ื• ืฆืจื™ื›ื™ื, ื‘ืคื•ืจืžื˜ JSON:

from json import dumps
from kafka import KafkaProducer

producer = KafkaProducer(bootstrap_servers=['localhost:9092'],
                         value_serializer=lambda x: dumps(x).encode('utf-8'),
                         compression_type='gzip')
my_topic = 'transaction'
data = get_random_value()

try:
    future = producer.send(topic = my_topic, value = data)
    record_metadata = future.get(timeout=10)
    
    print('--> The message has been sent to a topic: '
          '{}, partition: {}, offset: {}'
          .format(record_metadata.topic,
                  record_metadata.partition,
                  record_metadata.offset))
                             
except Exception as e:
    print('--> It seems an Error occurred: {}'.format(e))

finally:
    producer.flush()
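
The snippet above sends a single message. To reach the roughly 100 messages per second mentioned earlier, the send can simply be wrapped in a loop, for example like this (a minimal sketch; the rate is approximate and ignores send latency):

from time import sleep

while True:
    for _ in range(100):
        producer.send(my_topic, value=get_random_value())
    producer.flush()   # make sure the batch actually leaves the client
    sleep(1)           # roughly 100 messages per second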

When we run the script, we get the following messages in the terminal:

ืืคืืฆ'ื™ ืงืคืงื ื•ืขื™ื‘ื•ื“ ื ืชื•ื ื™ื ื‘ืกื˜ืจื™ืžื™ื ื’ ืขื Spark Streaming

This means that everything works as intended: the producer generates and sends messages to the topic we need.
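
If you also want to double-check the topic from Python rather than with the console consumer, here is a minimal sketch using kafka-python's KafkaConsumer:

from json import loads
from kafka import KafkaConsumer

# Reads the 'transaction' topic from the beginning and prints the decoded JSON payloads
consumer = KafkaConsumer('transaction',
                         bootstrap_servers=['localhost:9092'],
                         auto_offset_reset='earliest',
                         value_deserializer=lambda m: loads(m.decode('utf-8')))

for message in consumer:
    print(message.value)   # e.g. {'branch': 'SPB', 'currency': 'USD', 'amount': -42}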
ื”ืฉืœื‘ ื”ื‘ื ื”ื•ื ื”ืชืงื ืช Spark ื•ืขื™ื‘ื•ื“ ื–ืจื ื”ื”ื•ื“ืขื•ืช ื”ื–ื”.

Installing Apache Spark

ืืคืืฆ 'ื™ ืกืคืืจืง ื”ื™ื ืคืœื˜ืคื•ืจืžืช ืžื—ืฉื•ื‘ ืืฉื›ื•ืœื•ืช ืื•ื ื™ื‘ืจืกืœื™ืช ื•ื‘ืขืœืช ื‘ื™ืฆื•ืขื™ื ื’ื‘ื•ื”ื™ื.

Spark outperforms popular implementations of the MapReduce model while supporting a wider range of computation types, including interactive queries and stream processing. Speed plays an important role when processing large amounts of data, since it is speed that lets you work interactively instead of waiting minutes or hours. One of the things that makes Spark so fast is its ability to perform computations in memory.

The framework is written in Scala, so you need to install Scala first:

sudo apt-get install scala

ื”ื•ืจื“ ืืช ื”ืคืฆืช Spark ืžื”ืืชืจ ื”ืจืฉืžื™:

wget "http://mirror.linux-ia64.org/apache/spark/spark-2.4.2/spark-2.4.2-bin-hadoop2.7.tgz"

ืคืจืง ืืช ื”ืืจื›ื™ื•ืŸ:

sudo tar xvf spark-2.4.2/spark-2.4.2-bin-hadoop2.7.tgz -C /usr/local/spark

ื”ื•ืกืฃ ืืช ื”ื ืชื™ื‘ ืœ-Spark ืœืงื•ื‘ืฅ bash:

vim ~/.bashrc

ื”ื•ืกืฃ ืืช ื”ืฉื•ืจื•ืช ื”ื‘ืื•ืช ื“ืจืš ื”ืขื•ืจืš:

export SPARK_HOME=/usr/local/spark
export PATH=$SPARK_HOME/bin:$PATH

ื”ืคืขืœ ืืช ื”ืคืงื•ื“ื” ืœืžื˜ื” ืœืื—ืจ ื‘ื™ืฆื•ืข ืฉื™ื ื•ื™ื™ื ื‘-bashrc:

source ~/.bashrc
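
A quick way to check that the installation is visible from Python is the optional findspark helper (an assumption on my part: install it with pip install findspark; running the bundled pyspark shell works just as well):

import findspark
findspark.init()              # picks up the SPARK_HOME we just exported

import pyspark
print(pyspark.__version__)    # expect something like 2.4.2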

Deploying AWS PostgreSQL

All that remains is to deploy the database into which we will load the processed data from the streams. For this we will use the AWS RDS service.

ืขื‘ื•ืจ ืืœ ืžืกื•ืฃ AWS -> AWS RDS -> ืžืกื“ื™ ื ืชื•ื ื™ื -> ืฆื•ืจ ืžืกื“ ื ืชื•ื ื™ื:
ืืคืืฆ'ื™ ืงืคืงื ื•ืขื™ื‘ื•ื“ ื ืชื•ื ื™ื ื‘ืกื˜ืจื™ืžื™ื ื’ ืขื Spark Streaming

Select PostgreSQL and click Next:
ืืคืืฆ'ื™ ืงืคืงื ื•ืขื™ื‘ื•ื“ ื ืชื•ื ื™ื ื‘ืกื˜ืจื™ืžื™ื ื’ ืขื Spark Streaming

Since this example is for educational purposes only, we will use a free server "at the minimum" (Free Tier):
ืืคืืฆ'ื™ ืงืคืงื ื•ืขื™ื‘ื•ื“ ื ืชื•ื ื™ื ื‘ืกื˜ืจื™ืžื™ื ื’ ืขื Spark Streaming

ืœืื—ืจ ืžื›ืŸ, ืฉืžื™ื ืกื™ืžื•ืŸ ื‘ื’ื•ืฉ Free Tier, ื•ืœืื—ืจ ืžื›ืŸ ื™ื•ืฆืข ืœื ื• ืื•ื˜ื•ืžื˜ื™ืช ืžื•ืคืข ืฉืœ ืžื—ืœืงืช t2.micro - ืœืžืจื•ืช ืฉื”ื™ื ื—ืœืฉื”, ื”ื™ื ื—ื™ื ืžื™ืช ื•ื“ื™ ืžืชืื™ืžื” ืœืžืฉื™ืžื” ืฉืœื ื•:
ืืคืืฆ'ื™ ืงืคืงื ื•ืขื™ื‘ื•ื“ ื ืชื•ื ื™ื ื‘ืกื˜ืจื™ืžื™ื ื’ ืขื Spark Streaming

Next come some very important settings: the database instance name, the master user name and its password. Let's name the instance myHabrTest, master user: habr, password: habr12345, and click the Next button:
ืืคืืฆ'ื™ ืงืคืงื ื•ืขื™ื‘ื•ื“ ื ืชื•ื ื™ื ื‘ืกื˜ืจื™ืžื™ื ื’ ืขื Spark Streaming

On the next page there are parameters responsible for the accessibility of our database server from the outside (Public accessibility) and for port availability:

ืืคืืฆ'ื™ ืงืคืงื ื•ืขื™ื‘ื•ื“ ื ืชื•ื ื™ื ื‘ืกื˜ืจื™ืžื™ื ื’ ืขื Spark Streaming

Let's create a new setting for the VPC security group that will allow external access to our database server over port 5432 (PostgreSQL).
In a separate browser window, go to the AWS console, to the VPC Dashboard -> Security Groups -> Create security group:
ืืคืืฆ'ื™ ืงืคืงื ื•ืขื™ื‘ื•ื“ ื ืชื•ื ื™ื ื‘ืกื˜ืจื™ืžื™ื ื’ ืขื Spark Streaming

We set the name of the security group (PostgreSQL) and a description, specify which VPC this group should be associated with, and click the Create button:
ืืคืืฆ'ื™ ืงืคืงื ื•ืขื™ื‘ื•ื“ ื ืชื•ื ื™ื ื‘ืกื˜ืจื™ืžื™ื ื’ ืขื Spark Streaming

ืžืœื ืืช ื”ื›ืœืœื™ื ื”ื ื›ื ืกื™ื ืขื‘ื•ืจ ื™ืฆื™ืื” 5432 ืขื‘ื•ืจ ื”ืงื‘ื•ืฆื” ื”ื—ื“ืฉื” ืฉื ื•ืฆืจื”, ื›ืคื™ ืฉืžื•ืฆื’ ื‘ืชืžื•ื ื” ืœืžื˜ื”. ืืชื” ืœื ื™ื›ื•ืœ ืœืฆื™ื™ืŸ ืืช ื”ื™ืฆื™ืื” ื‘ืื•ืคืŸ ื™ื“ื ื™, ืื‘ืœ ื‘ื—ืจ PostgreSQL ืžื”ืจืฉื™ืžื” ื”ื ืคืชื—ืช ืกื•ื’.

Strictly speaking, the value ::/0 means that incoming traffic is allowed to reach the server from anywhere in the world, which, canonically, is not entirely correct, but for the sake of this example let's allow ourselves to use this approach:
ืืคืืฆ'ื™ ืงืคืงื ื•ืขื™ื‘ื•ื“ ื ืชื•ื ื™ื ื‘ืกื˜ืจื™ืžื™ื ื’ ืขื Spark Streaming

We return to the browser page where "Configure advanced settings" is open and select VPC security groups -> Choose existing VPC security groups -> PostgreSQL:
ืืคืืฆ'ื™ ืงืคืงื ื•ืขื™ื‘ื•ื“ ื ืชื•ื ื™ื ื‘ืกื˜ืจื™ืžื™ื ื’ ืขื Spark Streaming

ืœืื—ืจ ืžื›ืŸ, ื‘ืืคืฉืจื•ื™ื•ืช ืžืกื“ ื”ื ืชื•ื ื™ื -> ืฉื ืžืกื“ ื”ื ืชื•ื ื™ื -> ื”ื’ื“ืจ ืืช ื”ืฉื - habrDB.

ืื ื• ื™ื›ื•ืœื™ื ืœื”ืฉืื™ืจ ืืช ืฉืืจ ื”ืคืจืžื˜ืจื™ื, ืœืžืขื˜ ื‘ื™ื˜ื•ืœ ื’ื™ื‘ื•ื™ (ืชืงื•ืคืช ืฉืžื™ืจืช ื’ื™ื‘ื•ื™ - 0 ื™ืžื™ื), ื ื™ื˜ื•ืจ ื•ืชื•ื‘ื ื•ืช ื‘ื™ืฆื•ืขื™ื, ื›ื‘ืจื™ืจืช ืžื—ื“ืœ. ืœื—ืฅ ืขืœ ื”ื›ืคืชื•ืจ ืฆื•ืจ ื‘ืกื™ืก ื ืชื•ื ื™ื:
ืืคืืฆ'ื™ ืงืคืงื ื•ืขื™ื‘ื•ื“ ื ืชื•ื ื™ื ื‘ืกื˜ืจื™ืžื™ื ื’ ืขื Spark Streaming

The stream handler

ื”ืฉืœื‘ ื”ืื—ืจื•ืŸ ื™ื”ื™ื” ืคื™ืชื•ื— ืขื‘ื•ื“ื” ืฉืœ Spark, ืฉืชืขื‘ื“ ื ืชื•ื ื™ื ื—ื“ืฉื™ื ื”ืžื’ื™ืขื™ื ืžืงืคืงื ื›ืœ ืฉืชื™ ืฉื ื™ื•ืช ื•ืชื›ื ื™ืก ืืช ื”ืชื•ืฆืื” ืœืžืกื“ ื”ื ืชื•ื ื™ื.

As noted above, checkpoints are a core mechanism in Spark Streaming that must be configured to ensure fault tolerance. We will use checkpoints, and if the procedure fails, the Spark Streaming module will only need to return to the last checkpoint and resume computations from it in order to recover the lost data.

Checkpointing is enabled by setting a directory on a fault-tolerant, reliable file system (such as HDFS, S3, etc.) in which the checkpoint information will be stored. This is done with, for example:

streamingContext.checkpoint(checkpointDirectory)

In our example we will use the following approach: if checkpointDirectory exists, the context will be recreated from the checkpoint data. If the directory does not exist (i.e. this is the first run), then functionToCreateContext is called to create a new context and configure the DStreams:

from pyspark.streaming import StreamingContext

context = StreamingContext.getOrCreate(checkpointDirectory, functionToCreateContext)
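
functionToCreateContext is defined in the job itself; here is a minimal sketch of what it might look like (the application name and the checkpoint path below are illustrative assumptions):

from pyspark import SparkConf, SparkContext
from pyspark.streaming import StreamingContext

checkpointDirectory = '/tmp/spark_checkpoint'   # any fault-tolerant path; HDFS/S3 in production

def functionToCreateContext():
    # Called only on the very first run; on restart the context is rebuilt
    # from the checkpoint data instead.
    conf = SparkConf().setAppName('habr-transaction-stream')
    sc = SparkContext(conf=conf)
    ssc = StreamingContext(sc, 2)               # 2-second batch interval
    # ... the Kafka DStream and its processing (shown below) are defined here ...
    ssc.checkpoint(checkpointDirectory)         # where to persist the checkpoint data
    return ssc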

ืื ื• ื™ื•ืฆืจื™ื ืื•ื‘ื™ื™ืงื˜ DirectStream ื›ื“ื™ ืœื”ืชื—ื‘ืจ ืœื ื•ืฉื "ื”ืขืกืงื”" ื‘ืืžืฆืขื•ืช ืฉื™ื˜ืช createDirectStream ืฉืœ ืกืคืจื™ื™ืช KafkaUtils:

from pyspark import SparkConf, SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

conf = SparkConf().setAppName('habr-transaction-stream')   # the app name is illustrative
sc = SparkContext(conf=conf)
ssc = StreamingContext(sc, 2)    # 2-second batch interval

broker_list = 'localhost:9092'
topic = 'transaction'

directKafkaStream = KafkaUtils.createDirectStream(ssc,
                                                  [topic],
                                                  {"metadata.broker.list": broker_list})

Parsing the incoming JSON data and converting it into Rows:

from pyspark.sql import Row

# Inside the per-batch function: each parsed record (a dict) becomes a Row
rowRdd = rdd.map(lambda w: Row(branch=w['branch'],
                               currency=w['currency'],
                               amount=w['amount']))

# 'spark' is the application's SparkSession (created elsewhere in the job)
testDataFrame = spark.createDataFrame(rowRdd)
testDataFrame.createOrReplaceTempView("treasury_stream")

Using Spark SQL, we perform a simple grouping and print the result to the console:

select 
    from_unixtime(unix_timestamp()) as curr_time,
    t.branch                        as branch_name,
    t.currency                      as currency_code,
    sum(amount)                     as batch_value
from treasury_stream t
group by
    t.branch,
    t.currency

We get the query text and run it through Spark SQL:

sql_query = get_sql_query()
testResultDataFrame = spark.sql(sql_query)
testResultDataFrame.show(n=5)
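
get_sql_query() is a small helper from the full source code; presumably it simply returns the query text shown above, something like:

def get_sql_query():
    # Returns the aggregation query over the treasury_stream temp view as plain text
    return """
        select
            from_unixtime(unix_timestamp()) as curr_time,
            t.branch                        as branch_name,
            t.currency                      as currency_code,
            sum(amount)                     as batch_value
        from treasury_stream t
        group by
            t.branch,
            t.currency
    """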

Then we save the resulting aggregated data to a table in AWS RDS. To save the aggregation results to a database table, we use the write method of the DataFrame object:

testResultDataFrame.write \
    .format("jdbc") \
    .mode("append") \
    .option("driver", 'org.postgresql.Driver') \
    .option("url", "jdbc:postgresql://myhabrtest.ciny8bykwxeg.us-east-1.rds.amazonaws.com:5432/habrDB") \
    .option("dbtable", "transaction_flow") \
    .option("user", "habr") \
    .option("password", "habr12345") \
    .save()

ื›ืžื” ืžื™ืœื™ื ืขืœ ื”ื’ื“ืจืช ื—ื™ื‘ื•ืจ ืœ-AWS RDS. ื™ืฆืจื ื• ืืช ื”ืžืฉืชืžืฉ ื•ื”ืกื™ืกืžื” ืขื‘ื•ืจื• ื‘ืฉืœื‘ "ืคืจื™ืกืช AWS PostgreSQL". ืขืœื™ืš ืœื”ืฉืชืžืฉ ื‘- Endpoint ื›ื›ืชื•ื‘ืช ื”ืืชืจ ืฉืœ ืฉืจืช ืžืกื“ ื”ื ืชื•ื ื™ื, ื”ืžื•ืฆื’ืช ื‘ืกืขื™ืฃ ืงื™ืฉื•ืจื™ื•ืช ื•ืื‘ื˜ื—ื”:

ืืคืืฆ'ื™ ืงืคืงื ื•ืขื™ื‘ื•ื“ ื ืชื•ื ื™ื ื‘ืกื˜ืจื™ืžื™ื ื’ ืขื Spark Streaming

To wire Spark and Kafka together correctly, the job should be run via spark-submit using the spark-streaming-kafka-0-8_2.11 artifact. In addition, we will also use an artifact for interacting with the PostgreSQL database; both are passed via --packages.

To keep the script flexible, we will also take the message server name and the topic we want to read from as input parameters.
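
Inside spark_job.py this can be handled, for example, by reading the two values from the command line (a minimal sketch; argparse would work just as well):

import sys

# spark-submit ... spark_job.py localhost:9092 transaction
broker_list, topic = sys.argv[1], sys.argv[2]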

So, it's time to launch and test the system:

spark-submit \
    --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.0.2,org.postgresql:postgresql:9.4.1207 \
    spark_job.py localhost:9092 transaction

Everything worked! As you can see in the picture below, while the application is running, new aggregation results are produced every 2 seconds, because we set the batch interval to 2 seconds when we created the StreamingContext object:

ืืคืืฆ'ื™ ืงืคืงื ื•ืขื™ื‘ื•ื“ ื ืชื•ื ื™ื ื‘ืกื˜ืจื™ืžื™ื ื’ ืขื Spark Streaming

ืœืื—ืจ ืžื›ืŸ, ืื ื• ืžื‘ืฆืขื™ื ืฉืื™ืœืชื” ืคืฉื•ื˜ื” ืœืžืกื“ ื”ื ืชื•ื ื™ื ื›ื“ื™ ืœื‘ื“ื•ืง ืืช ื ื•ื›ื—ื•ืชืŸ ืฉืœ ืจืฉื•ืžื•ืช ื‘ื˜ื‘ืœื” ื–ืจื™ืžืช_ืขืกืงื”:

ืืคืืฆ'ื™ ืงืคืงื ื•ืขื™ื‘ื•ื“ ื ืชื•ื ื™ื ื‘ืกื˜ืจื™ืžื™ื ื’ ืขื Spark Streaming

Conclusion

ืžืืžืจ ื–ื” ื‘ื—ืŸ ื“ื•ื’ืžื” ืœืขื™ื‘ื•ื“ ื–ืจื ืฉืœ ืžื™ื“ืข ื‘ืืžืฆืขื•ืช Spark Streaming ื‘ืฉื™ืœื•ื‘ ืขื Apache Kafka ื•- PostgreSQL. ืขื ืฆืžื™ื—ืช ื”ื ืชื•ื ื™ื ืžืžืงื•ืจื•ืช ืฉื•ื ื™ื, ืงืฉื” ืœื”ืคืจื™ื– ื‘ืขืจื›ื• ื”ืžืขืฉื™ ืฉืœ Spark Streaming ืœื™ืฆื™ืจืช ืกื˜ืจื™ืžื™ื ื’ ื•ืืคืœื™ืงืฆื™ื•ืช ื‘ื–ืžืŸ ืืžืช.

You can find the full source code in my repository on GitHub.

I'd be glad to discuss this article; I look forward to your comments and also hope for constructive criticism from all interested readers.

ืื ื™ ืžืื—ืœืช ืœื›ื ื”ืฆืœื—ื”!

ืชื”ืœื™ื. ื‘ืชื—ื™ืœื” ืชื•ื›ื ืŸ ืœื”ืฉืชืžืฉ ื‘ืžืกื“ ื ืชื•ื ื™ื ืžืงื•ืžื™ PostgreSQL, ืืš ืœืื•ืจ ืื”ื‘ืชื™ ืœ-AWS, ื”ื—ืœื˜ืชื™ ืœื”ืขื‘ื™ืจ ืืช ืžืกื“ ื”ื ืชื•ื ื™ื ืœืขื ืŸ. ื‘ืžืืžืจ ื”ื‘ื ื‘ื ื•ืฉื ื–ื”, ืืจืื” ื›ื™ืฆื“ ืœื™ื™ืฉื ืืช ื›ืœ ื”ืžืขืจื›ืช ื”ืžืชื•ืืจืช ืœืขื™ืœ ื‘-AWS ื‘ืืžืฆืขื•ืช AWS Kinesis ื•-AWS EMR. ืขืงื‘ื• ืื—ืจ ื”ื—ื“ืฉื•ืช!

Source: www.habr.com
