ืืขืื, ืืืืจ! ืืืึทื ื ืืืจ ืืืขืื ืืืืขื ืึท ืกืืกืืขื ืืืึธืก ืืืขื ืคึผืจืึธืฆืขืก Apache Kafka ืึธื ืืึธื ืกืืจืืื ื ืืฆื Spark Streaming ืืื ืฉืจืืึทืื ืื ืคึผืจืึทืกืขืกืื ื ืจืขืืืืืึทืื ืฆื ืื AWS RDS ืืืึธืืงื ืืึทืืึทืืืืก.
ืืื ืก ืืืึทืืืฉืึทื ืึทื ืึท ืืืืขืจ ืงืจืขืืื ืื ืกืืืืืฉืึทื ืฉืืขืื ืืื ืื ืื ืึทืจืืขื ืคืื ืคึผืจืึทืกืขืกืื ื ืื ืงืึทืืื ื ืืจืึทื ืืึทืงืฉืึทื ื "ืืืืฃ ืื ืคืืืขื" ืึทืจืืืขืจ ืึทืืข ืคืื โโืืืึทื ืฆืืืืืื. ืืขื ืงืขื ืขื ืืืื ืืขืืื ืคึฟืึทืจ ืื ืฆืื ืคืื ืคึผืื ืงื ืงืึทืืงืืึทืืืืืื ื ืึท ืขืคืขื ืขื ืงืจืึทื ืืงืืึทื ืฉืืขืืข ืคึฟืึทืจ ืื ืฉืึทืฆืงืึทืืขืจ, ืืืืึทืฅ ืึธืืขืจ ืคืื ืึทื ืฆืืขื ืจืขืืืืืึทืื ืคึฟืึทืจ ืืจืึทื ืืึทืงืฉืึทื ื, ืืื"ื ื.
ืืื ืฆื ืื ืกืืจืืืขื ื ืืขื ืคืึทื ืึธื ืื ื ืืฆื ืคืื ืืึทืืืฉ ืืื ืืึทืืืฉ ืกืคึผืขืื - ืืืืขื ืขื ืืื ืืขืจ ืื ืฉื ืืึทืื! ืืื!
ืืงืืื
ืคืื ืงืืจืก, ืคึผืจืึทืกืขืกืื ื ืึท ืืจืืืก ืกืืืข ืคืื โโืืึทืื ืืื ืคืึทืงืืืฉ ืฆืืื ืืื ืืขื ืืืืง ืึทืคึผืขืจืืื ืึทืืื ืคึฟืึทืจ ื ืืฆื ืืื ืืึธืืขืจื ืกืืกืืขืืขื. ืืืื ืขืจ ืคืื ืื ืืขืจืกื ืคืึธืืงืก ืงืึทืืืึทื ืืืฉืึทื ื ืคึฟืึทืจ ืืขื ืืื ืื ืืึทื ืืึทื ืคืื Apache Kafka ืืื Spark Streaming, ืืื Kafka ืงืจืืืืฅ ืึท ืืืึทื ืคืื ืื ืงืึทืืื ื ืึธื ืืึธื ืคึผืึทืงืืฅ, ืืื Spark Streaming ืคึผืจืึทืกืขืกืึทื ืื ืคึผืึทืงืืฅ ืืื ืึท ืืขืืขืื ืฆืืื ืืขืืึทืืขื.
ืฆื ืคืึทืจืืจืขืกืขืจื ืื ืฉืืื ืืึธืืขืจืึทื ืฅ ืคืื ืื ืึทืคึผืืึทืงืืืฉืึทื, ืืืจ ืืืขืื ื ืืฆื ืืฉืขืงืคึผืืื ืฅ. ืืื ืืขื ืืขืงืึทื ืืืึทื, ืืืขื ืื Spark Streaming ืืึธืืึธืจ ืืึทืจืฃ ืฆืืจืืงืงืจืืื ืคืึทืจืคืึทืื ืืึทืื, ืขืก ื ืึธืจ ืืึทืจืฃ ืฆื ืืืื ืฆืืจืืง ืฆื ืื ืืขืฆืืข ืืฉืขืงืคึผืืื ื ืืื ื ืขืืขื ืืื ืืืืืขืจ ืืฉืืื ืืช ืคืื ืืึธืจื.
ืึทืจืงืึทืืขืงืืฉืขืจ ืคืื ืื ืืขืืืขืืึธืคึผืขื ืกืืกืืขื
ืืื ืืฆื ืงืึทืืคึผืึธืื ืึทื ืฅ:
ืึทืคึผืึทืืฉื ืงืึทืคืงืึท ืืื ืึท ืคืื ืื ืืขืจืืขืืืืื ืึทืจืืืกืืขืื-ืึทืืึธื ืืจื ืืขืกืืืืฉืื ื ืกืืกืืขื. ืคึผืึทืกืืง ืคึฟืึทืจ ืืืืืข ืึธืคืคืืื ืข ืืื ืึธื ืืืื ืึธื ืืึธื ืงืึทื ืกืึทืืฉืึทื. ืฆื ืคืึทืจืืืึทืื ืืึทืื ืึธื ืืืขืจ, Kafka ืึทืจืืืงืืขื ืืขื ืขื ืกืืึธืจื ืืืืฃ ืืืกืง ืืื ืจืขืคึผืืืงืืืืื ืืื ืืขื ืงื ืืื. ืื ืงืึทืคืงืึท ืกืืกืืขื ืืื ืืขืืืื ืืืืฃ ืฉืคึผืืฅ ืคืื ืื ZooKeeper ืกืื ืืงืจืึทื ืึทืืืืฉืึทื ืืื ืกื;ืึทืคึผืึทืืฉื ืกืคึผืึทืจืง ืกืืจืืืื ื - ืกืคึผืึทืจืง ืงืึธืืคึผืึธื ืขื ื ืคึฟืึทืจ ืคึผืจืึทืกืขืกืื ื ืกืืจืืืื ื ืืึทืื. ืื Spark Streaming ืืึธืืืืข ืืื ืืขืืืื ืืื ืึท ืืืงืจืึธ-ืคึผืขืงื ืึทืจืงืึทืืขืงืืฉืขืจ, ืืื ืื ืืึทืื ืืืึทื ืืื ืื ืืขืจืคึผืจืึทืืึทื ืืื ืึท ืงืขืกืืืืขืจืืืง ืกืืงืืืึทื ืก ืคืื ืงืืืื ืืึทืื ืคึผืึทืงืืฅ. Spark Streaming ื ืขืื ืืึทืื ืคืื ืคืึทืจืฉืืืขื ืข ืงืืืืื ืืื ืงืึทืืืืื ื ืขืก ืืื ืงืืืื ืคึผืึทืงืึทืืืฉืึทื. ื ืื ืคึผืึทืงืึทืืืฉืึทื ืืขื ืขื ืืืฉืืคื ืืื ืจืขืืืืขืจ ืื ืืขืจืืืึทืื. ืืื ืื ืึธื ืืืื ืคืื ืืขืืขืจ ืฆืืื ืืขืืึทืืขื, ืึท ื ืืึทืข ืคึผืึทืงืึทื ืืื ืืืฉืืคื, ืืื ืงืืื ืืึทืื ืืืงืืืขื ืืขืฉืึทืก ืืขื ืืขืืึทืืขื ืืื ืึทืจืืึทื ืืขืจืขืื ื ืืื ืื ืคึผืึทืงืึทื. ืืื ืื ืกืืฃ ืคืื ืื ืืขืืึทืืขื, ืคึผืึทืงืึทื ืืจืึธืื ืกืืึทืคึผืก. ืื ืืจืืืก ืคืื ืืขื ืื ืืขืจืืืึทื ืืื ืืืฉืืืกื ืืืจื ืึท ืคึผืึทืจืึทืืขืืขืจ ืืขืจืืคื ืื ืคึผืขืงื ืืขืืึทืืขื;ืึทืคึผืึทืืฉื ืกืคึผืึทืจืง ืกืงื - ืงืึทืืืืื ื ืจืืืืืฉืึทื ืึทื ืคึผืจืึทืกืขืกืื ื ืืื Spark ืคืึทื ืืงืฉืึทื ืึทื ืคึผืจืึธืืจืึทืืืื ื. ืกืืจืึทืงืืฉืขืจื ืืึทืื ืืืื ืืึทืื ืืืึธืก ืืึธืื ืึท ืกืืฉืขืืึท, ืืึธืก ืืื, ืึท ืืืื ืืึทื ื ืคืื ืคืขืืืขืจ ืคึฟืึทืจ ืึทืืข ืจืขืงืึธืจืืก. Spark SQL ืฉืืืฆื ืึทืจืืึทื ืฉืจืืึทื ืคึฟืื ืคืึทืจืฉืืื ืกืืจืึทืงืืฉืขืจื ืืึทืื ืงืืืืื ืืื, ืืึทื ืง ืฆื ืื ืึทืืืืืืึทืืืืึทืื ืคืื ืกืืฉืขืืึท ืืื ืคึฟืึธืจืืึทืฆืืข, ืขืก ืงืขื ืขื ืืคืืฉืึทื ืืื ืฆืืจืืงืงืจืืื ืืืืื ืื ืคืืจืืื ืื ืคืขืื ืคืื ืจืขืงืึธืจืืก, ืืื ืืืื ืืื ืืึทืืึทืคืจืึทืืข ืึทืคึผืืก;AWS RDS ืืื ืึท ืืขืคืืขืจืขื ืืืืืง ืืืึธืืงื-ืืืืืจื ืจืืืืืฉืึทื ืึทื ืืึทืืึทืืืืก, ืืืขื ืกืขืจืืืืก ืืืึธืก ืกืืืคึผืืึทืคืืื ืกืขืืึทืคึผ, ืึธืคึผืขืจืึทืฆืืข ืืื ืกืงืืืืื ื, ืืื ืืื ืึทืืืื ืึทืกืืขืจื ืืืืึทื ืืืจื ืึทืืึทืืึธื.
ืื ืกืืึทืืืจื ืืื ืืืืคื ืื Kafka ืกืขืจืืืขืจ
ืืืืืขืจ ืืืจ ื ืืฆื Kafka ืืืืื, ืืืจ ืืึทืจืคึฟื ืฆื ืืึทืื ืืืืขืจ ืึทื ืืืจ ืืึธืื Java, ืืืืึทื ... JVM ืืื ืืขื ืืฆื ืคึฟืึทืจ ืึทืจืืขื:
sudo apt-get update
sudo apt-get install default-jre
java -version
ืืึธืืืจ ืฉืึทืคึฟื ืึท ื ืืึทืข ืืึทื ืืฆืขืจ ืฆื ืึทืจืืขืื ืืื Kafka:
sudo useradd kafka -m
sudo passwd kafka
sudo adduser kafka sudo
ืืขืจื ืึธื ืืจืืคืงืืคืืข ืื ืคืึทืจืฉืคึผืจืืืืื ื ืคึฟืื ืืขืจ ืืึทืึทืืืขืจ Apache Kafka ืืืขืืืืืื:
wget -P /YOUR_PATH "http://apache-mirror.rbc.ru/pub/apache/kafka/2.2.0/kafka_2.12-2.2.0.tgz"
ืืคืฉืืืกื ืื ืืึทืื ืืึธืืืื ืึทืจืงืืืื:
tar -xvzf /YOUR_PATH/kafka_2.12-2.2.0.tgz
ln -s /YOUR_PATH/kafka_2.12-2.2.0 kafka
ืืขืจ ืืืืึทืืขืจ ืฉืจืื ืืื ืึทืคึผืฉืึทื ืึทื. ืืขืจ ืคืึทืงื ืืื ืึทื ืื ืคืขืืืงืืึทื ืกืขืืืื ืืก ืืึธื ื ืื ืืึธืื ืืืจ ืฆื ื ืืฆื ืึทืืข ืื ืงืืืคึผืึทืืืืึทืืื ืคืื Apache Kafka. ืคึฟืึทืจ ืืืึทืฉืคึผืื, ืืืกืืขืงื ืึท ืืขืืข, ืงืึทืืขืืึธืจืืข, ืืจืืคึผืข ืฆื ืืืึธืก ืึทืจืืืงืืขื ืงืขื ืขื ืืืื ืืจืืืก. ืฆื ืืืืฉื ืืขื, ืืึธืื ืืื ืื ืจืขืืึทืืืจื ืื ืงืึทื ืคืืืืขืจืืืฉืึทื ืืขืงืข:
vim ~/kafka/config/server.properties
ืืืื ืื ืคืืืืขื ืืข ืฆื ืื ืกืืฃ ืคืื ืืขืจ ืืขืงืข:
delete.topic.enable = true
ืืืืืขืจ ืืืจ ืึธื ืืืืื ืื Kafka ืกืขืจืืืขืจ, ืืืจ ืืึทืจืคึฟื ืฆื ืึธื ืืืืื ืืขื ZooKeeper ืกืขืจืืืขืจ; ืืืจ ืืืขืื ื ืืฆื ืื ืึทืืืืืืขืจื ืฉืจืืคื ืืืึธืก ืงืืื ืืื ืื Kafka ืคืึทืจืฉืคึผืจืืืืื ื:
Cd ~/kafka
bin/zookeeper-server-start.sh config/zookeeper.properties
ื ืึธื ืืขืจ ืืฆืืื ืคืื ZooKeeper, ืงืึทืืขืจ ืื Kafka ืกืขืจืืืขืจ ืืื ืึท ืืึทืืื ืืขืจ ืืืึธืงืืึทื:
bin/kafka-server-start.sh config/server.properties
ืืึธืืืจ ืฉืึทืคึฟื ืึท ื ืืึทืข ืืขืืข ืืขืจืืคื ืืจืึทื ืกืึทืงืืืึธื:
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 3 --topic transaction
ืืึธืืืจ ืืึทืื ืืืืขืจ ืึทื ืึท ืืขืืข ืืื ืื ืคืืจืืื ืื ื ืืืขืจ ืคืื ืคึผืึทืจืืืฉืึทื ื ืืื ืจืขืคึผืืึทืงืืืฉืึทื ืืื ืืืฉืืคื:
bin/kafka-topics.sh --describe --zookeeper localhost:2181
ืืื ืก ืคืึทืจืคืืจื ืื ืืึธืืืึทื ืฅ ืคืื ืืขืกืืื ื ืื ืคึผืจืึธืืืฆืืจืขืจ ืืื ืงืึทื ืกืืืขืจ ืคึฟืึทืจ ืื ื ืื ืืืฉืืคื ืืขืืข. ืืขืจ ืืขืืึทืืืก ืืืขืื ืืื ืืืจ ืงืขื ืขื ืคึผืจืืืืจื ืฉืืงื ืืื ืจืืกืืืืื ื ืึทืจืืืงืืขื ืืขื ืขื ืืขืฉืจืืื ืืื ืืขืจ ืืึทืึทืืืขืจ ืืึทืงืืืืขื ืืืืฉืึทื -
ืคึผืจืึธืืืฆืืจืขืจ ืฉืจืืืื
ืืขืจ ืคึผืจืึธืืืฆืืจืขืจ ืืืขื ืืืฉืขื ืขืจืืื ืืจืึทืค - 100 ืึทืจืืืงืืขื ืืขืืขืจ ืจืืข. ืืื ืจืึทื ืืึธื ืืึทืื ืืืจ ืืืื ืขื ืึท ืืืขืจืืขืจืืื ืืืึธืก ืืืฉืืืื ืคืื ืืจืื ืคืขืืืขืจ:
- ืฆืืืืึทื - ื ืึธืืขื ืคืื ืื ืงืจืขืืื ืื ืกืืืืืฉืึทื ืก ืคืื ื ืคืื ืคืึทืจืงืืืฃ;
- ืงืจืึทื ืืงืืึทื - ืืจืึทื ืกืึทืงืืืึธื ืงืจืึทื ืืงืืึทื;
- ืกืืืข - ืืจืึทื ืกืึทืงืืืึธื ืกืืืข. ืื ืกืืืข ืืืขื ืืืื ืึท positive ื ืืืขืจ ืืืื ืขืก ืืื ืึท ืงืืืคื ืคืื ืงืจืึทื ืืงืืึทื ืืืจื ืื ืืึทื ืง, ืืื ืึท ื ืขืืึทืืืื ื ืืืขืจ ืืืื ืขืก ืืื ืึท ืคืึทืจืงืืืฃ.
ืืขืจ ืงืึธื ืคึฟืึทืจ ืื ืคึผืจืึธืืืฆืืจืขืจ ืงืืงื ืืื ืืึธืก:
from numpy.random import choice, randint
def get_random_value():
new_dict = {}
branch_list = ['Kazan', 'SPB', 'Novosibirsk', 'Surgut']
currency_list = ['RUB', 'USD', 'EUR', 'GBP']
new_dict['branch'] = choice(branch_list)
new_dict['currency'] = choice(currency_list)
new_dict['amount'] = randint(-100, 100)
return new_dict
ืืขืจื ืึธื, ืืื ืื ืฉืืงื ืืืคึฟื, ืืืจ ืฉืืงื ืึท ืึธื ืืึธื ืฆื ืื ืกืขืจืืืขืจ, ืฆื ืืขืจ ืืขืืข ืืืึธืก ืืืจ ืืึทืจืคึฟื, ืืื JSON ืคึฟืึธืจืืึทื:
from kafka import KafkaProducer
producer = KafkaProducer(bootstrap_servers=['localhost:9092'],
value_serializer=lambda x:dumps(x).encode('utf-8'),
compression_type='gzip')
my_topic = 'transaction'
data = get_random_value()
try:
future = producer.send(topic = my_topic, value = data)
record_metadata = future.get(timeout=10)
print('--> The message has been sent to a topic:
{}, partition: {}, offset: {}'
.format(record_metadata.topic,
record_metadata.partition,
record_metadata.offset ))
except Exception as e:
print('--> It seems an Error occurred: {}'.format(e))
finally:
producer.flush()
ืืืขื ืคืืืกื ืืืง ืืขื ืฉืจืืคื, ืืืจ ืืึทืงืืืขื ืื ืคืืืืขื ืืข ืึทืจืืืงืืขื ืืื ืื ืืืึธืงืืึทื:
ืืึธืก ืืืื ื ืึทื ืึทืืฅ ืึทืจืืขื ืืื ืืืจ ืืขืืืืื - ืืขืจ ืคึผืจืึธืืืฆืืจืขืจ ืืืฉืขื ืขืจืืืฅ ืืื ืกืขื ืื ืึทืจืืืงืืขื ืฆื ืื ืืขืืข ืืืึธืก ืืืจ ืืึทืจืคึฟื.
ืืขืจ ืืืืึทืืขืจ ืฉืจืื ืืื ืฆื ืื ืกืืึทืืืจื ืกืคึผืึทืจืง ืืื ืคึผืจืึธืฆืขืก ืืขื ืึธื ืืึธื ืืืึทื.
ืื ืกืืึทืืืจื Apache Spark
ืึทืคึผืึทืืฉื ืกืคึผืึทืจืง ืืื ืึท ืื ืืืืขืจืกืึทื ืืื ืืืื-ืคืึธืจืฉืืขืืื ื ืงื ืืื ืงืึทืืคึผืืืืื ื ืคึผืืึทืืคืึธืจืืข.
ืกืคึผืึทืจืง ืคึผืขืจืคืึธืจืื ืืขืกืขืจ ืืื ืคืึธืืงืก ืืืคึผืืึทืืึทื ืฅ ืคืื ืื MapReduce ืืึธืืขื, ืืฉืขืช ืขืก ืฉืืืฆื ืึท ืืจืืื ืงืืื ืคืื ืงืึทืืคึผืืืืื ื ืืืืคึผืก, ืึทืจืืึทื ืืขืจืขืื ื ืื ืืขืจืึทืงืืืื ืคึฟืจืืื ืืื ืกืืจืื ืคึผืจืึทืกืขืกืื ื. ืกืคึผืื ืคืืขืกืขืก ืึท ืืืืืืืง ืจืึธืืข ืืืขื ืคึผืจืึทืกืขืกืื ื ืืจืืืก ืึทืืึทืื ืฅ ืคืื ืืึทืื, ืืืืึทื ืืึธืก ืืื ืื ืืืืงืืึทื ืืืึธืก ืึทืืึทืื ืืืจ ืฆื ืึทืจืืขืื ืื ืืขืจืึทืงืืืืืื ืึธื ืกืคึผืขื ืืื ื ืืื ืื ืึธืืขืจ ืฉืขื ืืืืจืื. ืืืื ืขืจ ืคืื ืกืคึผืึทืจืง ืก ืืืืึทืกื ืกืืจืขื ืืงืืก ืืืึธืก ืืืื ืขืก ืึทืืื ืฉื ืขื ืืื ืื ืคืืืืงืืื ืฆื ืืืจืืคืืจื ืืืงืึธืจื ืืฉืืื ืืช.
ืืขืจ ืคืจืืืืืืขืจืง ืืื ืืขืฉืจืืื ืืื ืกืงืึทืืึท, ืึทืืื ืืืจ ืืึทืจืคึฟื ืฆื ืื ืกืืึทืืืจื ืขืก ืขืจืฉืืขืจ:
sudo apt-get install scala
ืืจืืคืงืืคืืข ืื Spark ืคืึทืจืฉืคึผืจืืืืื ื ืคืื ืืขืจ ืืึทืึทืืืขืจ ืืืขืืืืืื:
wget "http://mirror.linux-ia64.org/apache/spark/spark-2.4.2/spark-2.4.2-bin-hadoop2.7.tgz"
ืึธืคึผืืืืื ืืขื ืึทืจืงืืืื:
sudo tar xvf spark-2.4.2/spark-2.4.2-bin-hadoop2.7.tgz -C /usr/local/spark
ืืืื ืืขื ืืจื ืฆื Spark ืฆื ืื ืืึทืฉ ืืขืงืข:
vim ~/.bashrc
ืืืื ืื ืคืืืืขื ืืข ืฉืืจืืช ืืืจื ืืขื ืจืขืืึทืงืืึธืจ:
SPARK_HOME=/usr/local/spark
export PATH=$SPARK_HOME/bin:$PATH
ืืืืคื ืื ืืึทืคึฟืขื ืืื ืื ื ืึธื ืืืื ืขื ืืขืจืื ืืขื ืฆื bashrc:
source ~/.bashrc
ืืืคึผืืืืื ื AWS PostgreSQL
ืึทืืข ืืืึธืก ืืืืืื ืืื ืฆื ืฆืขืืืืงืืขื ืื ืืึทืืึทืืืืก ืืื ืืืึธืก ืืืจ ืืืขืื ืฆืืคึฟืขืืืงืขืจ ืื ืคึผืจืึทืกืขืกื ืืื ืคึฟืึธืจืืึทืฆืืข ืคึฟืื ืื ืกืืจืืื. ืคึฟืึทืจ ืืขื ืืืจ ืืืขืื ื ืืฆื ืื AWS RDS ืืื ืกื.
ืืืื ืฆื ืื AWS ืงืึทื ืกืึธืื -> AWS RDS -> ืืึทืืึทืืืืกืื -> ืฉืึทืคึฟื ืืึทืืึทืืืืก:
ืืืืกืงืืืึทืื PostgreSQL ืืื ืืื ืืืืึทืืขืจ:
ืืืืึทื ืืขืจ ืืืืฉืคึผืื ืืื ืืืืื ืคึฟืึทืจ ืืืืืื ืืงืจืืื ืฆืืืขืงื; ืืืจ ืืืขืื ื ืืฆื ืึท ืคืจืื ืกืขืจืืืขืจ "ืืื ืืื ืืืื" (ืคืจืื ืจืื):
ืืขืจื ืึธื, ืืืจ ืฉืืขืื ืึท ืืืงืขื ืืื ืื Free Tier ืืืึธืง, ืืื ืืขืจื ืึธื ืืืจ ืืืขื ืืืื ืืืืืึธืืึทืืืฉ ืืขืคึฟืื ื ืึท ืืืึทืฉืคึผืื ืคืื ืื t2.micro ืงืืึทืก - ืืึธืืฉ ืฉืืืึทื, ืขืก ืืื ืคืจืื ืืื ืืึทื ืฅ ืคึผืึทืกืืง ืคึฟืึทืจ ืืื ืืืขืจ ืึทืจืืขื:
ืืืืึทืืขืจ ืงืืืขื ืืืืขืจ ืืืืืืืง ืืืื: ืื ื ืึธืืขื ืคืื ืื ืืึทืืึทืืืืก ืืืึทืฉืคึผืื, ืื ื ืึธืืขื ืคืื ืื ืืขื ืืึทื ืืฆืขืจ ืืื ืืืื ืคึผืึทืจืึธื. ืืื ืก ื ืึธืืขื ืืขื ืืืึทืฉืคึผืื: myHabrTest, ืืขื ืืึทื ืืฆืขืจ: habr, ืคึผืึทืจืึธื: habr12345 ืืื ืืื ืืืืฃ ืื ืืืืึทืืขืจ ืงื ืขืคึผื:
ืืืืฃ ืืขืจ ืืืืึทืืขืจ ืืืึทื ืขืก ืืขื ืขื ืคึผืึทืจืึทืืขืืขืจืก ืคืึทืจืึทื ืืืืึธืจืืืขื ืคึฟืึทืจ ืื ืึทืงืกืขืกืึทืืืืืื ืคืื ืืื ืืืขืจ ืืึทืืึทืืืืก ืกืขืจืืืขืจ ืคืื ืื ืึทืจืืืก (ืฆืืืืจ ืึทืงืกืขืกืึทืืืืืื) ืืื ืคึผืึธืจื ืึทืืืืืืึทืืืืึทืื:
ืืึธืืืจ ืืึทืื ืึท ื ืืึทืข ืืึทืฉืืขืืืงื ืคึฟืึทืจ ืื VPC ืืืืขืจืืืื ืืจืืคึผืข, ืืืึธืก ืืืขื ืืึธืื ืคืื ืืจืืืกื ืืืง ืึทืงืกืขืก ืฆื ืืื ืืืขืจ ืืึทืืึทืืืืก ืกืขืจืืืขืจ ืืืจื ืคึผืึธืจื 5432 (PostgreSQL).
ืืึธืืืจ ืืืื ืฆื ืื AWS ืงืึทื ืกืึธืื ืืื ืึท ืืึทืืื ืืขืจ ืืืขืืขืจืขืจ ืคึฟืขื ืฆืืขืจ ืฆื ืื VPC ืืึทืฉืืึธืจื -> ืืืืขืจืืืื ืืจืืคึผืขืก -> ืฉืึทืคึฟื ืืืืขืจืืืื ืืจืืคึผืข ืึธืคึผืืืืืื ื:
ืืืจ ืฉืืขืื ืืขื ื ืึธืืขื ืคึฟืึทืจ ืื ืืืืขืจืืืื ืืจืืคึผืข - PostgreSQL, ืึท ืืึทืฉืจืืึทืืื ื, ืึธื ืืืืึทืื ืืื ืืืึธืก ืืืคึผืง ืื ืืจืืคึผืข ืืึธื ืืืื ืคืืจืืื ืื ืืื ืืื ืืื ืื ืฉืึทืคึฟื ืงื ืขืคึผื:
ืคึผืืึธืืืืจื ืื ืื ืืึทืื ื ืึผืืืื ืคึฟืึทืจ ืคึผืึธืจื 5432 ืคึฟืึทืจ ืื ื ืื ืืืฉืืคื ืืจืืคึผืข, ืืื ืืขืืืืื ืืื ืื ืืืื ืืื ืื. ืืืจ ืงืขื ื ื ืืฉื ืกืคึผืขืฆืืคืืฆืืจื ืื ืคึผืึธืจื ืืึทื ืืืึทืื, ืึธืืขืจ ืกืขืืขืงืืืจื PostgreSQL ืคืื ืื ืืืคึผ ืืจืึธืคึผ-ืึทืจืึธืคึผ ืจืฉืืื.
ืฉืืจืขื ื ืืขืจืขืื, ืื ืืืขืจื :: / 0 ืืืื ืื ืึทืืืืืืึทืืืืึทืื ืคืื ืื ืงืึทืืื ื ืคืึทืจืงืขืจ ืฆื ืื ืกืขืจืืืขืจ ืคืื ืึทืืข ืืืืขืจ ืื ืืืขืื, ืืืึธืก ืืื ืงืึทื ืึทื ืึทืงืื ื ืืฉื ืืขืืึทืืจืข ืืืช, ืึธืืขืจ ืฆื ืึทื ืึทืืืื ืื ืืืืฉืคึผืื, ืืึธืื ืืื ืื ืืึธืื ืืื ืฆื ื ืืฆื ืืขื ืฆืืืึทื ื:
ืืืจ ืฆืืจืืงืงืืืขื ืฆื ืื ืืืขืืขืจืขืจ ืืืึทื, ืืื ืืืจ ืืึธืื "ืงืึธื ืคืืืืจืข ืึทืืืึทื ืกืืจืืข ืกืขืืืื ืืก" ืขืคืขื ืขื ืืื ืกืขืืขืงืืืจื ืืื ืื VPC ืืืืขืจืืืื ืืจืืคึผืขืก ืึธืคึผืืืืืื ื -> ืงืืืึทืื ืืืืืกืืื ื VPC ืืืืขืจืืืื ืืจืืคึผืขืก -> PostgreSQL:
ืืืืึทืืขืจ, ืืื ืื ืืึทืืึทืืึทืกืข ืึธืคึผืฆืืขืก -> ืืึทืืึทืืึทืกืข ื ืึธืืขื -> ืฉืืขืื ืืขื ื ืึธืืขื - habrDB.
ืืืจ ืงืขื ืขื ืืึธืื ืื ืจืืขื ืคึผืึทืจืึทืืขืืขืจืก, ืืื ืื ืืืกื ืขื ืคืื ืืืกืืืืึทืืื ื ืืึทืงืึทืคึผ (ืืึทืืึทืืื ืจืืืขื ืฉืึทื ืฆืืึทื - 0 ืืขื), ืืึธื ืืืึธืจืื ื ืืื ืคืึธืจืฉืืขืืื ื ืื ืกืืืฅ, ืืืจื ืคืขืืืงืืึทื. ืืจืืงื ืืขื ืงื ืขืคึผื ืฉืึทืคึฟื ืืืืืึทืืืืก:
ืคึฟืึธืืขื ืืึทื ืืืขืจ
ืื ืืขืฆืืข ืืื ืข ืืืขื ืืืื ืื ืึทื ืืืืืงืืื ื ืคืื ืึท ืกืคึผืึทืจืง ืึทืจืืขื, ืืืึธืก ืืืขื ืคึผืจืึธืฆืขืก ื ืืึทืข ืืึทืื ืืืึธืก ืงืืืขื ืคึฟืื Kafka ืืขืืขืจ ืฆืืืื ืกืขืงืื ืืขืก ืืื ืึทืจืืึทื ืื ืจืขืืืืืึทื ืืื ืื ืืึทืืึทืืืืก.
ืืื ืืขืจืืื ื ืืืืื, ืืฉืขืงืคึผืืื ืฅ ืืขื ืขื ืึท ืืึทืจืฅ ืืขืงืึทื ืืืึทื ืืื SparkStreaming ืืืึธืก ืืืื ืืืื ืงืึทื ืคืืืืขืจื ืฆื ืขื ืฉืืจ ืฉืืื ืืึธืืขืจืึทื ืฅ. ืืืจ ืืืขืื ื ืืฆื ืืฉืขืงืคึผืืื ืฅ ืืื, ืืืื ืื ืคึผืจืึธืฆืขืืืจ ืคืืืื, ืื Spark Streaming ืืึธืืืืข ืืืขื ื ืึธืจ ืืึทืจืคึฟื ืฆื ืฆืืจืืงืงืืืขื ืฆื ืื ืืขืฆืืข ืืฉืขืงืคึผืืื ื ืืื ื ืขืืขื ืืื ืืืืืขืจ ืืฉืืื ืืช ืืขืจืคืื ืฆื ืฆืืจืืงืงืจืืื ืื ืคืึทืจืคืึทืื ืืึทืื.
ืืฉืขืงืคึผืืื ื ืงืขื ืขื ืืืื ืขื ืืืืึทืื ืืืจื ืืึทืฉืืขืืืงื ืึท ืืืขืืืืืึทืืขืจ ืืืืฃ ืึท ืฉืืื-ืืึธืืขืจืึทื ื, ืคืึทืจืืึธืืืขื ืืขืงืข ืกืืกืืขื (ืึทืืึท ืืื HDFS, S3, ืืื"ื ื) ืืื ืืืึธืก ืื ืืฉืขืงืคึผืืื ื ืืื ืคึฟืึธืจืืึทืฆืืข ืืืขื ืืืื ืกืืึธืจื. ืืึธืก ืืื ืืขืืื ืืื, ืืืฉื:
streamingContext.checkpoint(checkpointDirectory)
ืืื ืืื ืืืขืจ ืืืึทืฉืคึผืื, ืืืจ ืืืขืื ื ืืฆื ืื ืคืืืืขื ืืข ืฆืืืึทื ื, ื ืืืืื ืืืื ืืฉืขืงืคึผืืื ืืืืจืขืงืืึธืจื ืืืืืกืฅ, ืืขืจ ืงืึธื ืืขืงืกื ืืืขื ืืืื ืจืืงืจืืืืืื ืคึฟืื ืื ืืฉืขืงืคึผืืื ื ืืึทืื. ืืืื ืืขืจ ืืืขืืืืืึทืืขืจ ืืื ื ืืฉื ืขืงืกืืกืืืจื (ื"ื ืขืงืกืึทืงืืืืึทื ืคึฟืึทืจ ืื ืขืจืฉืืขืจ ืืึธื), ืืขืืึธืื functionToCreateContext ืืื ืืขืจืืคื ืฆื ืฉืึทืคึฟื ืึท ื ืืึทืข ืงืึธื ืืขืงืกื ืืื ืงืึทื ืคืืืืขืจ DStreams:
from pyspark.streaming import StreamingContext
context = StreamingContext.getOrCreate(checkpointDirectory, functionToCreateContext)
ืืืจ ืืึทืื ืึท DirectStream ืืืืคืขืฅ ืฆื ืคืึทืจืืื ืื ืฆื ืื "ืืจืึทื ืืึทืงืฉืึทื" ืืขืืข ื ืืฆื ืื createDirectStream ืืืคึฟื ืคืื ืื KafkaUtils ืืืืืืึธืืขืง:
from pyspark.streaming.kafka import KafkaUtils
sc = SparkContext(conf=conf)
ssc = StreamingContext(sc, 2)
broker_list = 'localhost:9092'
topic = 'transaction'
directKafkaStream = KafkaUtils.createDirectStream(ssc,
[topic],
{"metadata.broker.list": broker_list})
ืคึผืึทืจืกืื ื ืื ืงืึทืืื ื ืืึทืื ืืื JSON ืคึฟืึธืจืืึทื:
rowRdd = rdd.map(lambda w: Row(branch=w['branch'],
currency=w['currency'],
amount=w['amount']))
testDataFrame = spark.createDataFrame(rowRdd)
testDataFrame.createOrReplaceTempView("treasury_stream")
ื ืืฆื Spark SQL, ืืืจ ืืึทืื ืึท ืคึผืฉืื ืืจืืคึผืื ื ืืื ืืืืึทืื ืื ืจืขืืืืืึทื ืืื ืื ืงืึทื ืกืึธืื:
select
from_unixtime(unix_timestamp()) as curr_time,
t.branch as branch_name,
t.currency as currency_code,
sum(amount) as batch_value
from treasury_stream t
group by
t.branch,
t.currency
ืืึทืงืืืขื ืื ืึธื ืคึฟืจืขื ืืขืงืกื ืืื ืืืืคื ืขืก ืืืจื Spark SQL:
sql_query = get_sql_query()
testResultDataFrame = spark.sql(sql_query)
testResultDataFrame.show(n=5)
ืืื ืืขืืึธืื ืืืจ ืจืึทืืขืืืขื ืื ืจืืืึทืืืื ื ืึทืืืจืขืืึทืืขื ืืึทืื ืืื ืึท ืืืฉ ืืื AWS RDS. ืฆื ืจืึทืืขืืืขื ืื ืึทืืืจืขืืึทืืืึธื ืจืขืืืืืึทืื ืฆื ืึท ืืึทืืึทืืึทืกืข ืืืฉ, ืืืจ ืืืขืื ื ืืฆื ืื ืฉืจืืึทืื ืืืคึฟื ืคืื ืื DataFrame ืืืืคืขืฅ:
testResultDataFrame.write
.format("jdbc")
.mode("append")
.option("driver", 'org.postgresql.Driver')
.option("url","jdbc:postgresql://myhabrtest.ciny8bykwxeg.us-east-1.rds.amazonaws.com:5432/habrDB")
.option("dbtable", "transaction_flow")
.option("user", "habr")
.option("password", "habr12345")
.save()
ืขืืืขืืข ืืืขืจืืขืจ ืืืขืื ืืึทืฉืืขืืืงื ืึท ืงืฉืจ ืฆื AWS RDS. ืืืจ ืืืฉืืคื ืืขื ืืึทื ืืฆืขืจ ืืื ืคึผืึทืจืึธื ืคึฟืึทืจ ืขืก ืืื ืื "ืืืคึผืืืืื ื AWS PostgreSQL" ืฉืจืื. ืืืจ ืืึธื ื ืืฆื ืขื ืืคึผืืื ื ืืื ืื ืืึทืืึทืืืืก ืกืขืจืืืขืจ URL, ืืืึธืก ืืื ืืขืืืืื ืืื ืื ืงืึทื ืขืงืืืืืืื ืืื ืืืืขืจืืืื ืึธืคึผืืืืืื ื:
ืืื ืกืืจ ืฆื ืจืืืืืง ืคืึทืจืืื ืื ืกืคึผืึทืจืง ืืื ืงืึทืคืงืึท, ืืืจ ืืึธื ืืืืคื ืื ืึทืจืืขื ืืืจื smark-submit ื ืืฆื ืื ืึทืจืืึทืคืึทืงื spark-streaming-kafka-0-8_2.11. ืึทืืืืืืึธื ืึทืืื, ืืืจ ืืืขืื ืืืื ื ืืฆื ืึท ืึทืจืืึทืคืึทืงื ืคึฟืึทืจ ืื ืืขืจืึทืงืืื ื ืืื ืื PostgreSQL ืืึทืืึทืืืืก; ืืืจ ืืืขืื ืึทืจืืืขืจืคืืจื ืืื ืืืจื -- ืคึผืึทืงืึทืืืฉืึทื.
ืคึฟืึทืจ ืื ืืืืืืงืืื ืคืื ืื ืฉืจืืคื, ืืืจ ืืืขืื ืืืื ืึทืจืืึทื ื ืขืืขื ืืื ืึทืจืืึทื ืฉืจืืึทื ืคึผืึทืจืึทืืขืืขืจืก ืื ื ืึธืืขื ืคืื ืื ืึธื ืืึธื ืกืขืจืืืขืจ ืืื ืื ืืขืืข ืคืื โโืืืึธืก ืืืจ ืืืืื ืฆื ืืึทืงืืืขื ืืึทืื.
ืึทืืื, ืขืก ืืื ืฆืืื ืฆื ืงืึทืืขืจ ืืื ืงืึธื ืืจืึธืืืจื ืื ืคืึทื ืืงืฉืึทื ืึทืืืื ืคืื ืื ืกืืกืืขื:
spark-submit
--packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.0.2,
org.postgresql:postgresql:9.4.1207
spark_job.py localhost:9092 transaction
ืึทืืฅ ืืึธื ืืืืกืืขืึทืจืืขื! ืืื ืืืจ ืงืขื ืขื ืืขื ืืื ืื ืืืื ืืื ืื, ืืืขื ืื ืึทืคึผืืึทืงืืืฉืึทื ืืื ืคืืืกื ืืืง, ื ืืึทืข ืึทืืืจืขืืึทืืืึธื ืจืขืืืืืึทืื ืืขื ืขื ืคึผืจืึธืืืงืฆืืข ืืขืืขืจ 2 ืกืขืงืื ืืขืก, ืืืืึทื ืืืจ ืฉืืขืื ืื ืืึทืืฉืื ื ืืขืืึทืืขื ืฆื 2 ืกืขืงืื ืืขืก ืืืขื ืืืจ ืืืฉืืคื ืืขื StreamingContext ืืืืคืขืฅ:
ืืืืึทืืขืจ, ืืืจ ืืึทืื ืึท ืคึผืฉืื ืึธื ืคึฟืจืขื ืฆื ืื ืืึทืืึทืืืืก ืฆื ืงืึธื ืืจืึธืืืจื ืื ืืืึทืืืึทื ืคืื ืจืขืงืึธืจืืก ืืื ืื ืืืฉ ืืจืึทื ืกืึทืงืืืึธื_ืคืืึธืื:
ืกืึธืฃ
ืืขืจ ืึทืจืืืงื ืืึธื ืืขืงืืงื ืืืืฃ ืึท ืืืืฉืคึผืื ืคืื ืกืืจืื ืคึผืจืึทืกืขืกืื ื ืคืื ืืื ืคึฟืึธืจืืึทืฆืืข ื ืืฆื Spark Streaming ืืื ืงืึทื ืืืฉืึทื ืืงืฉืึทื ืืื Apache Kafka ืืื PostgreSQL. ืืื ืืขื ืืืึผืงืก ืคืื ืืึทืื ืคึฟืื ืคืึทืจืฉืืื ืงืืืืื, ืขืก ืืื ืฉืืืขืจ ืฆื ืึธืืืืขืจืขืกืืึทืืืื ืื ืคึผืจืึทืงืืืฉ ืืืขืจื ืคืื Spark Streaming ืคึฟืึทืจ ืงืจืืืืืื ื ืกืืจืืืื ื ืืื ืคืึทืงืืืฉ-ืฆืืื ืึทืคึผืืึทืงืืืฉืึทื ื.
ืืืจ ืงืขื ืขื ืืขืคึฟืื ืขื ืื ืคืื ืืงืืจ ืงืึธื ืืื ืืืื ืจืืคึผืึทืืึทืืึธืจื ืืืึท
ืืื ืืื ืฆืืคืจืืื ืฆื ืืืกืงืืืืจื ืืขื ืึทืจืืืงื, ืืื ืงืืง ืคืึธืจืืืก ืฆื ืืืื ืืึทืืขืจืงืื ืืขื, ืืื ืืื ืืืื ืืึธืคึฟื ืคึฟืึทืจ ืงืึทื ืกืืจืึทืงืืืื ืงืจืืืืง ืคืื ืึทืืข ืงืึทืจืื ื ืืืืขื ืขืจ.
ืืื ืืืื ืืฉื ืืืจ ืืฆืืื!
ืคึผืก. ืืืืืขืก, ืขืก ืืื ืืขืืืขื ืคึผืืึทื ื ืขื ืฆื ื ืืฆื ืึท ืืืืข PostgreSQL ืืึทืืึทืืืืก, ืึธืืขืจ ืืืืึทื ืคืื ืืืื ืืืืข ืคึฟืึทืจ AWS, ืืื ืืึทืฉืืึธืกื ืฆื ืึทืจืืืขืจืคืืจื ืื ืืึทืืึทืืืืก ืฆื ืื ืืืึธืืงื. ืืื ืืขืจ ืืืืึทืืขืจ ืึทืจืืืงื ืืืืฃ ืืขื ืืขืืข, ืืื ืืืขื ืืืืึทืื ืืื ืฆื ืื ืกืืจืืืขื ื ืื ืืื ืฆืข ืกืืกืืขื ืืืกืงืจืืืื ืืืืื ืืื AWS ื ืืฆื AWS Kinesis ืืื AWS EMR. ืืื ืื ื ืืึทืขืก!
ืืงืืจ: www.habr.com