Introduction
Amplitude has proven itself very well as a product analytics tool thanks to its easy event setup and flexible visualizations. Quite often, though, you need to build your own attribution model, cluster users, or build a dashboard in another BI system. That is only possible with raw event data from Amplitude. This article shows how to get that data with minimal programming knowledge.
Prerequisites
- A project in Amplitude in which events are already set up and statistics are being collected
- Python installed (I use version 3.8.3) and at least basic experience working with it
Instructions
Step 1. Getting the API key and secret key
To export data, you first need to get an API key and a secret key.
You can find them by following this path:
- "Manage Data" (located at the bottom left of the screen)
- Select the project you want to export data from and open it
- In the project menu that opens, select "Project settings"
- Find the API key and secret key strings, copy them, and store them in a safe place.
Alternatively, you can follow a link that generally looks like this:
analytics.amplitude.com/$$$$$$$/manage/project/********/settings,
where $$$$$$ is your organization's login in Amplitude and ****** is the project number.
Step 2. Checking that the required libraries are installed
The good news is that you most likely already have these libraries, either installed by default or downloaded earlier, but it is worth checking. Here is the full list of libraries I used while writing this article (versions are given in parentheses where relevant):
- requests (2.10.0) - sending the API request and receiving the data
- pandas (1.0.1) - reading json, building a dataframe and then writing it to a file
- zipfile - extracting files from the archive received via the API
- gzip - unpacking the json files from .gz
- os - getting the list of files in the unpacked archive
- time - optional, measures the script's run time
- tqdm - optional, for easily tracking file processing progress
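The check in step 2 can be done from Python itself. Below is a minimal sketch, assuming you only want to know which modules are missing: importlib.util.find_spec returns None for a module that cannot be found, so nothing is actually imported. REQUIRED_MODULES and missing_modules are illustrative names; the list simply mirrors the one above.

```python
import importlib.util

# Modules used by the script; zipfile, gzip, os and time ship with Python itself,
# while requests, pandas and tqdm usually have to be installed separately.
REQUIRED_MODULES = ['requests', 'pandas', 'zipfile', 'gzip', 'os', 'time', 'tqdm']


def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == '__main__':
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print('Install these first (e.g. with pip):', ', '.join(missing))
    else:
        print('All required modules are available')
```

Anything it reports can be installed with pip under the same name, for example pip install requests pandas tqdm.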
Step 3. Writing the data export script
Tip: the full export script is at the end of the article; if you like, you can take it right away and refer back to the step-by-step explanations when needed.
Step 3.1. Importing the libraries
We import all the libraries listed in step 2.
# Import libraries
import requests
import pandas as pd
import zipfile
import gzip
import os
import time
from tqdm import tqdm
Step 3.2. Sending the request to Amplitude
Record the time the script starts in variable a.
startdate and enddate set the export period and are inserted into the URL of the request sent to the Amplitude server; besides the date, you can also specify the hour by changing the value after 'T' in the request.
api_key and secret_key hold the values obtained in step 1; for security reasons, the values shown here are random strings rather than my real keys.
a = time.time()
# Start and end date parameters (YYYYMMDD; the hour goes after 'T' in the URL)
startdate = '20200627'
enddate = '20200628'
api_key = 'kldfg844203rkwekfjs9234'
secret_key = '094tfjdsfmw93mxwfek'
# Send the request to Amplitude
response = requests.get('https://amplitude.com/api/2/export?start=' + startdate + 'T0&end=' + enddate + 'T0', auth=(api_key, secret_key))
response.raise_for_status()  # stop here if Amplitude returned an error instead of data
print('1. Request sent')
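The start and end values above are a date, then 'T', then the hour. A small sketch that builds them from datetime objects and reads the keys from environment variables instead of keeping them in the script could look like this. The variable names AMPLITUDE_API_KEY and AMPLITUDE_SECRET_KEY and the helper names are hypothetical, and unlike the script, the sketch zero-pads the hour (T00 rather than T0), following the YYYYMMDDTHH layout described in Amplitude's Export API documentation.

```python
import os
from datetime import datetime


def export_params(start, end):
    """Build the start/end query parameters for an Amplitude export request.

    Each value is the date, the letter 'T', then the two-digit hour (YYYYMMDDTHH).
    """
    return {
        'start': start.strftime('%Y%m%dT%H'),
        'end': end.strftime('%Y%m%dT%H'),
    }


def load_keys():
    """Return (api_key, secret_key) read from environment variables.

    AMPLITUDE_API_KEY and AMPLITUDE_SECRET_KEY are hypothetical variable names;
    a missing variable raises KeyError.
    """
    return os.environ['AMPLITUDE_API_KEY'], os.environ['AMPLITUDE_SECRET_KEY']


if __name__ == '__main__':
    # The same period as in the article: 27 June 2020 00:00 to 28 June 2020 00:00
    print(export_params(datetime(2020, 6, 27), datetime(2020, 6, 28)))
```

The dictionary can then be passed as requests.get('https://amplitude.com/api/2/export', params=export_params(start, end), auth=load_keys()), which lets requests build the query string.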
Step 3.3. Downloading the data archive
Come up with a name for the archive and store it in the filename variable. For convenience, I include the period and a note that this is Amplitude data. Then we write the response received from Amplitude into the archive.
# Download the data archive
filename = 'period_since' + startdate + 'to' + enddate + '_amplitude_data'
with open(filename + '.zip', 'wb') as code:
    code.write(response.content)
print('2. Archive with files downloaded successfully')
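If the request fails (for example, because of wrong keys), the response body is an error message rather than an archive, and without a check the script only fails later, at extraction. A minimal guard, assuming you would rather stop at the download step, is to test the bytes with zipfile.is_zipfile before writing them to disk; save_archive is a hypothetical helper name.

```python
import io
import zipfile


def save_archive(content, path):
    """Write the export response body to `path` only if it is a zip archive.

    Raises ValueError otherwise, so an error message returned by the server is
    not saved under a .zip name and only discovered at extraction time.
    """
    if not zipfile.is_zipfile(io.BytesIO(content)):
        raise ValueError('Response is not a zip archive: %r' % content[:100])
    with open(path, 'wb') as archive:
        archive.write(content)
    return path
```

In the script, save_archive(response.content, filename + '.zip') would take the place of the with open(...) block.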
Step 3.4. Extracting the files to a folder on your computer
This is where the zipfile library comes in to extract the files. In the extractall call, write the path of the folder where you want the files to be extracted.
# Extract the files to a folder on the computer
# (replace C:\Users\... with your own folder)
with zipfile.ZipFile(filename + '.zip', 'r') as z:
    z.extractall(path=os.path.join(r'C:\Users\...', filename))
print('3. Archive extracted and written to folder ' + filename)
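In the next step you have to look up the name of the folder inside the archive (000000 stands for your project number). A sketch that extracts the archive and returns its top-level folder names, assuming the event files sit in one such folder as described in step 3.5, is below; extract_archive is a hypothetical name.

```python
import os
import zipfile


def extract_archive(zip_path, target_dir):
    """Extract the archive into target_dir and return its top-level folder names."""
    with zipfile.ZipFile(zip_path, 'r') as archive:
        archive.extractall(path=target_dir)
        # Member names always use '/' as the separator, whatever the OS
        top_level = {name.split('/')[0] for name in archive.namelist()}
    # Keep only names that became folders (skips files stored at the root)
    return sorted(name for name in top_level
                  if os.path.isdir(os.path.join(target_dir, name)))
```

The returned name can then replace '000000' when building the directory path.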
Step 3.5. Converting the json
After extracting the files from the archive, convert the .gz-compressed json files and write them into a dataframe for further work.
Note that here you again need to change the path to your own, and replace 000000 with your project number in Amplitude (or open the folder where the archive was extracted and look at the name of the folder inside).
In order:
- store the directory path in a variable;
- get the list of files in the directory;
- create an empty dataframe;
- call time.sleep(1) so that the tqdm progress bar displays correctly;
- in the loop, open each .gz file and immediately read the json with pandas, appending it to the dataframe.
# Convert json to a regular table format
# (replace C:\Users\... with your folder and 000000 with your Amplitude project number)
directory = os.path.join(r'C:\Users\...', filename, '000000')
files = os.listdir(directory)
amplitude_dataframe = pd.DataFrame()
print('File processing progress:')
time.sleep(1)
for i in tqdm(files):
    with gzip.open(os.path.join(directory, i)) as f:
        add = pd.read_json(f, lines=True)
    amplitude_dataframe = pd.concat([amplitude_dataframe, add])
time.sleep(1)
print('4. JSON files from the archive converted and written to the dataframe')
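Each .gz file holds newline-delimited JSON, one event per line, which is why the script passes lines=True to pd.read_json. The sketch below reads the same files with only the standard library (gzip and json) and collects all events in one list; read_events is a hypothetical helper, not part of the original script.

```python
import gzip
import json
import os


def read_events(directory):
    """Read every .gz file in `directory` as newline-delimited JSON.

    Each non-empty line is parsed as one event; returns a list of dicts.
    """
    events = []
    for name in sorted(os.listdir(directory)):
        if not name.endswith('.gz'):
            continue  # skip anything that is not a compressed export file
        with gzip.open(os.path.join(directory, name), 'rt', encoding='utf-8') as f:
            for line in f:
                if line.strip():
                    events.append(json.loads(line))
    return events
```

pd.DataFrame(read_events(directory)) then builds the table in a single step. This avoids calling pd.concat once per file, which copies the rows accumulated so far on every iteration and gets slower as the export grows.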
Step 3.6. Writing the dataframe to Excel
Exporting to Excel is just an example here. In many cases it is more convenient to keep working with the resulting dataframe in Python or to load the data into a data warehouse.
Here, too, you need to replace the export path with your own.
# Write the resulting table to an Excel file (replace C:\Users\... with your own folder)
amplitude_dataframe.to_excel(os.path.join(r'C:\Users\...', filename + '.xlsx'), index=False)
print('5. Dataframe written successfully to file ' + filename)
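One thing to watch when writing to Excel: a worksheet holds at most 1,048,576 rows, and a busy project can export more events than that. A minimal sketch, assuming you still want Excel output, is to split the rows into sheet-sized slices; sheet_ranges and EXCEL_MAX_DATA_ROWS are hypothetical names.

```python
# An Excel worksheet holds at most 1,048,576 rows; one of them is the header.
EXCEL_MAX_DATA_ROWS = 1_048_576 - 1


def sheet_ranges(n_rows, max_rows=EXCEL_MAX_DATA_ROWS):
    """Split n_rows into consecutive (start, stop) slices that each fit on one sheet."""
    return [(start, min(start + max_rows, n_rows))
            for start in range(0, n_rows, max_rows)]


if __name__ == '__main__':
    # Hypothetical export size: 2.5 million events -> three sheets
    print(sheet_ranges(2_500_000))
```

Each (start, stop) pair can then be written with amplitude_dataframe.iloc[start:stop].to_excel(writer, sheet_name=..., index=False) inside a with pd.ExcelWriter(path) as writer: block, using a different sheet name per slice. If Excel is not required, amplitude_dataframe.to_csv(...) avoids the limit altogether.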
Step 3.7. Measuring the script's run time
Record the current time in variable b, compute the difference and the number of whole minutes, and print the total. This is the last step.
b = time.time()
diff = b - a
minutes = diff // 60
print('Code execution took: {:.0f} minute(s)'.format(minutes))
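diff // 60 drops the remainder, so a run of 50 seconds is reported as 0 minutes. A sketch that reports minutes and seconds instead, using time.perf_counter (a clock intended for measuring intervals) in place of time.time, could look like this; format_duration is a hypothetical name.

```python
import time


def format_duration(seconds):
    """Render a duration as whole minutes plus the remaining seconds."""
    minutes, secs = divmod(int(round(seconds)), 60)
    return '{} min {} s'.format(minutes, secs)


if __name__ == '__main__':
    start = time.perf_counter()
    time.sleep(0.1)  # stands in for the export, extraction and conversion steps
    print('Code execution took:', format_duration(time.perf_counter() - start))
```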
Conclusion
You can access the table and start working with it through the amplitude_dataframe variable, which contains the data. It has about 50 columns, of which you will use these in 80% of cases: event_type - event name, event_properties - event parameters, event_time - event time, uuid - unique event ID (the user is identified by user_id, device_id and amplitude_id), user_properties - client parameters. Start with these. When you compare figures from your own calculations with the numbers in Amplitude's dashboards, remember that the system uses its own methodology for counting unique users, funnels and so on, so be sure to read the Amplitude documentation first.
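event_properties and user_properties arrive as nested JSON objects, one dict per cell. A sketch, assuming you work with the raw event dicts (for example, the parsed lines of the .gz files), that keeps a few top-level columns from the list above and spreads the nested properties into their own keys; flatten_event and the column choice are illustrative, not part of the original script.

```python
# Top-level columns kept as-is; all other top-level export columns are dropped.
KEY_COLUMNS = ('event_type', 'event_time', 'uuid')


def flatten_event(event, columns=KEY_COLUMNS):
    """Keep the key columns and spread the nested property dicts into flat keys.

    Each nested entry becomes its own key, e.g. 'event_properties.screen'.
    """
    row = {col: event.get(col) for col in columns}
    for nested in ('event_properties', 'user_properties'):
        # A missing or null properties object is treated as empty
        for key, value in (event.get(nested) or {}).items():
            row[nested + '.' + key] = value
    return row
```

pd.json_normalize does a similar flattening for a whole list of events and also uses a dot as its default separator.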
Thank you for your attention! Now you can export raw event data from Amplitude and put it to full use in your work.
Full script:
# Import libraries
import requests
import pandas as pd
import zipfile
import gzip
import os
import time
from tqdm import tqdm
a = time.time()
# Start and end date parameters
startdate = '20200627'
enddate = '20200628'
api_key = 'd988fddd7cfc0a8a'
secret_key = 'da05cf1aeb3a361a61'
# Folder for the extracted files and the Excel file (replace C:\Users\... with your own)
base_dir = r'C:\Users\...'
# Send the request to Amplitude
response = requests.get('https://amplitude.com/api/2/export?start=' + startdate + 'T0&end=' + enddate + 'T0', auth=(api_key, secret_key))
response.raise_for_status()  # stop here if Amplitude returned an error instead of data
print('1. Request sent')
# Download the data archive
filename = 'period_since' + startdate + 'to' + enddate + '_amplitude_data'
with open(filename + '.zip', 'wb') as code:
    code.write(response.content)
print('2. Archive with files downloaded successfully')
# Extract the files to a folder on the computer
with zipfile.ZipFile(filename + '.zip', 'r') as z:
    z.extractall(path=os.path.join(base_dir, filename))
print('3. Archive extracted and written to folder ' + filename)
# Convert json to a regular table format (replace 000000 with your Amplitude project number)
directory = os.path.join(base_dir, filename, '000000')
files = os.listdir(directory)
amplitude_dataframe = pd.DataFrame()
print('File processing progress:')
time.sleep(1)
for i in tqdm(files):
    with gzip.open(os.path.join(directory, i)) as f:
        add = pd.read_json(f, lines=True)
    amplitude_dataframe = pd.concat([amplitude_dataframe, add])
time.sleep(1)
print('4. JSON files from the archive converted and written to the dataframe')
# Write the resulting table to an Excel file
amplitude_dataframe.to_excel(os.path.join(base_dir, filename + '.xlsx'), index=False)
print('5. Dataframe written successfully to file ' + filename)
b = time.time()
diff = b - a
minutes = diff // 60
print('Code execution took: {:.0f} minute(s)'.format(minutes))
Source: www.habr.com