Python: a helper for finding cheap flights for those who love to travel

The author of the article whose translation we are publishing today says his goal is to show how to build a web scraper in Python with Selenium that searches for air ticket prices. The search uses flexible dates (±3 days around the specified dates). The scraper saves the search results to an Excel file and sends whoever ran the search an email summarizing what it found. The purpose of the project is to help travelers find the best deals.


If you feel lost while working through the material, take a look at this article.

What are we going to search for?

You are free to use the system described here however you like. For example, I used it to look for weekend trips and for tickets to my hometown. If you are serious about finding good fares, you can run the script on a server (a simple server for 130 rubles a month is perfectly adequate for this) and have it run once or twice a day. The search results will be sent to you by email. I also recommend setting things up so that the script saves the Excel file with the search results to a Dropbox folder, so you can view these files from anywhere at any time.

I haven't found any error fares yet, but I believe it's possible

As already mentioned, the search uses "flexible dates": the script finds offers within three days of the given dates. Although the script searches in only one direction per run, it is easy to modify it to collect data for several flight routes. With its help, you can even look for error fares.
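
The script itself never computes these dates: it relies on the -flexible suffix in the Kayak URL. Still, to make the ±3-day window concrete, here is a minimal sketch that lists the dates such a search is assumed to cover around a YYYY-MM-DD date. The function name flexible_window is hypothetical; as a side benefit, datetime.strptime rejects input that is not in YYYY-MM-DD format.

```python
from datetime import datetime, timedelta
from typing import List

def flexible_window(day: str, spread: int = 3) -> List[str]:
    """Return every date within +/- `spread` days of `day` (YYYY-MM-DD), oldest first."""
    # strptime raises ValueError for malformed input, so this also validates the format
    center = datetime.strptime(day, "%Y-%m-%d").date()
    return [(center + timedelta(days=offset)).isoformat()
            for offset in range(-spread, spread + 1)]

window = flexible_window("2019-08-21")
print(window[0], window[-1])  # 2019-08-18 2019-08-24
```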

Why do you need yet another web scraper?

When I first started web scraping, I honestly wasn't particularly interested in it. I wanted to work on projects in predictive modeling and financial analysis, and perhaps in sentiment analysis of texts. But it turned out to be very interesting to figure out how to build a program that collects data from websites. As I dug into the topic, I realized that web scraping is the "engine" of the Internet.

You may think this is too bold a statement. But consider that Google started with a web scraper that Larry Page built using Java and Python. Google's robots have been crawling the Internet ever since, trying to give their users the best answers to their questions. Web scraping has countless applications, and even if you are interested in some other area of Data Science, you will need some scraping skills to get the data you want to analyze.

I found some of the techniques used here in a wonderful book on web scraping that I recently acquired. It contains many simple examples and ideas for putting what you've learned into practice. It also has a very interesting chapter on getting past reCaptcha checks. This was news to me, since I didn't know there were special tools, and even entire services, for solving such problems.

Do you like to travel?!

Ask the simple, harmless question in the title of this section and you will usually get a positive answer, accompanied by a couple of travel stories from the person you asked. Most of us would agree that travel is a great way to immerse yourself in new cultural environments and broaden your horizons. However, if you ask someone whether they enjoy searching for airline tickets, I'm sure the answer will be far less enthusiastic. This is exactly where Python comes to the rescue.

The first task on the way to a system for finding flight information is choosing a suitable platform to take the data from. This was not easy for me, but in the end I chose Kayak. I tried Momondo, Skyscanner, Expedia and a few others, but their bot protection proved impenetrable. After several attempts, during which I had to pick out traffic lights, pedestrian crossings and bicycles to convince the systems that I was human, I decided that Kayak suited me best, even though it too starts showing checks if you load too many pages in a short time. I managed to have the bot send requests to the site at intervals of 4 to 6 hours, and everything worked fine. Kayak does cause trouble from time to time, but if it starts pestering you with checks, you need to either pass them manually and then restart the bot, or wait a few hours until the checks stop. If needed, you can easily adapt the code to another platform, and if you do, you can share it in the comments.

If you are new to web scraping and don't know why some websites fight against it, then before starting your first project in this area, do yourself a favor and search Google for "web scraping etiquette". Your experiments may end sooner than you think if you scrape carelessly.

Getting started

Here is a general overview of what our web scraper code will do:

  • Import the required libraries.
  • Open a Google Chrome tab.
  • Call a function that starts the bot, passing it the cities and dates to use when searching for tickets.
  • This function takes the first search results, sorted by "best", and clicks a button to load more results.
  • Another function collects data from the whole page and returns a data frame.
  • The two previous steps are repeated with sorting by ticket price (cheapest) and by flight duration (fastest).
  • The script user receives an email with a summary of ticket prices (the cheapest ticket and the average price), and a data frame with the results from the three sort orders mentioned above is saved as an Excel file.
  • All of the above steps run in a loop, repeating after a specified interval.

Note that every Selenium project starts with a web driver. I use Chromedriver, since I work with Google Chrome, but there are other options. PhantomJS and Firefox are also popular. After downloading the driver, place it in the appropriate folder, and it is ready to use. The first lines of our script open a new Chrome tab.

Keep in mind that I'm not trying to break new ground in finding great airfare deals here. There are far more sophisticated ways to search for such offers. I simply want to offer readers a simple but practical way to solve this problem.

Here is the code we talked about above.

from time import sleep, strftime
from random import randint
import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import smtplib
from email.mime.multipart import MIMEMultipart

# Put your own path to chromedriver here!
chromedriver_path = 'C:/{YOUR PATH HERE}/chromedriver_win32/chromedriver.exe'

driver = webdriver.Chrome(executable_path=chromedriver_path) # This command opens a Chrome window
sleep(2)

At the beginning of the code you can see the imports of the packages used throughout the project. For instance, randint is used to make the bot "sleep" for a random number of seconds before starting a new search operation. Usually no bot can do without this. If you run the code above, a Chrome window will open, which the bot will use to work with websites.
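
The original code calls sleep(randint(a, b)) inline every time. If you prefer to keep that pattern in one place, a small helper like the one below would do. random_pause and its sleep_fn parameter are hypothetical names; sleep_fn exists only so the function can be tried out without actually waiting.

```python
import time
from random import randint

def random_pause(low: int, high: int, sleep_fn=time.sleep) -> int:
    """Pause for a random whole number of seconds between low and high, inclusive."""
    seconds = randint(low, high)  # randint includes both endpoints
    sleep_fn(seconds)
    return seconds

# random_pause(8, 10) behaves like sleep(randint(8, 10)) in start_kayak()
```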

Let's run a small experiment and open kayak.com in a separate window. Choose the city you're flying from, the city you want to fly to, and the flight dates. When choosing the dates, make sure the ±3 days option is selected. I wrote the code based on what the site returns in response to such queries. If, for example, you need to search for tickets on exact dates only, you will very likely have to modify the bot's code. As I go through the code I'll explain it, but if something is unclear, let me know.

Now click the search button and look at the link in the address bar. It should resemble the link I use in the example below, where the variable kayak that stores the URL is declared and the web driver's get method is called. After you click the search button, the results should appear on the page.
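
Since the search link follows a fixed pattern, it can be assembled from the IATA city codes and the dates. The sketch below reproduces the format used in start_kayak() further down; kayak_url is a hypothetical helper name, and Kayak may have changed its URL scheme since this was written.

```python
def kayak_url(city_from: str, city_to: str, date_start: str, date_end: str) -> str:
    """Build a flexible-date Kayak search URL in the format start_kayak() uses."""
    return ('https://www.kayak.com/flights/' + city_from + '-' + city_to +
            '/' + date_start + '-flexible/' + date_end + '-flexible?sort=bestflight_a')

print(kayak_url('LIS', 'SIN', '2019-08-21', '2019-09-07'))
# https://www.kayak.com/flights/LIS-SIN/2019-08-21-flexible/2019-09-07-flexible?sort=bestflight_a
```

With the driver from the first code block open, driver.get(kayak_url(...)) would load the corresponding results page.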

When I used the get command more than two or three times within a few minutes, I was asked to pass a reCaptcha check. You can pass this check manually and keep experimenting until the system decides to show a new one. When I tested the script, the first search session always seemed to go smoothly, so if you want to experiment with the code, you only need to check on it manually from time to time and let it run, with long intervals between search sessions. And if you think about it, nobody really needs ticket prices collected at 10-minute intervals between searches.

Working with a page using XPath

So, we have opened a window and loaded the site. To get the prices and other information, we need to use XPath or CSS selectors. I decided to stick with XPath and never felt the need for CSS selectors, but you can certainly work that way too. Navigating a page with XPath can be tricky, and even with the approach I describe in this article, which involves copying the relevant identifiers from the page code, I realized that it is not actually the best way to reach the elements you need. By the way, this book gives an excellent introduction to working with pages using XPath and CSS selectors. This is what the corresponding web driver method looks like.

Let's continue working on the bot and use the program to select the cheapest tickets. In the following image, the XPath selector code is highlighted in red. To view the code, right-click the page element you are interested in and choose Inspect from the menu that appears. You can do this for any page element; its code will be shown and highlighted in the code viewer.

Viewing the page code

To see why I think copying selectors from the code is a poor approach, look at the following details.

This is what you get when you copy the code:

//*[@id="wtKI-price_aTab"]/div[1]/div/div/div[1]/div/span/span

To copy something like this, right-click the code fragment you are interested in and choose Copy > Copy XPath from the menu that appears.

This is what I used to define the Cheapest button:

cheap_results = '//a[@data-code = "price"]'

The Copy > Copy XPath command

It is quite obvious that the second option is much simpler. It looks for an a element whose data-code attribute equals price. The first option looks for an element whose id equals wtKI-price_aTab and then follows the XPath /div[1]/div/div/div[1]/div/span/span down to the target element. An XPath query like that will do the job, but only once. I can tell you right now that the id will be different the next time the page loads: the character sequence wtKI changes dynamically on every load, so code that relies on it becomes useless after the next page refresh. Take some time to understand XPath. This knowledge will serve you well.
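
To see why the copied selector breaks while the attribute-based one survives, here is a small offline sketch using the standard library's xml.etree.ElementTree, which understands a limited subset of XPath (enough for attribute predicates). The two snippets are invented, and Qx9P merely stands in for whatever new prefix the site might generate on the next load.

```python
import xml.etree.ElementTree as ET

# Two invented snapshots of the same markup; only the generated id prefix differs
first_load = ET.fromstring(
    '<html><div id="wtKI-price_aTab"><a data-code="price">Cheapest</a></div></html>')
second_load = ET.fromstring(
    '<html><div id="Qx9P-price_aTab"><a data-code="price">Cheapest</a></div></html>')

copied = ".//div[@id='wtKI-price_aTab']/a"  # tied to the auto-generated prefix
stable = ".//a[@data-code='price']"         # tied to a meaningful attribute

for page in (first_load, second_load):
    print(page.find(copied) is not None, page.find(stable) is not None)
# True True
# False True
```

Selenium evaluates full XPath 1.0 in the browser, so the same reasoning applies there with even more expressive queries.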

That said, copying XPath selectors can be useful when working with simple sites, and if you are comfortable with it, there is nothing wrong with doing so.

Now let's consider what to do if you need to get all the search results, several lines each, into a list. It's very simple. Each result sits inside an element with the class resultWrapper. All the results can be collected in a loop like the one shown below.

If you understand the above, you should have no trouble understanding most of the code we are going to look at. As this code runs, we reach what we need (namely, the element that wraps each result) through a path-describing mechanism (XPath). This is done to get the element's text and put it into an object from which the data can be read (first flight_containers is used, then flights_list).
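
The loop itself was shown as a screenshot in the original article. Below is a minimal sketch of such a loop that reuses the names from the text; it is an illustration, not a copy of the author's code. collect_results is a hypothetical name, and the string "xpath" is the value of Selenium's By.XPATH constant, which lets the function avoid importing Selenium.

```python
def collect_results(driver):
    """Return the visible text of every result card currently loaded on the page."""
    # "xpath" is the string value of Selenium's By.XPATH locator constant
    flight_containers = driver.find_elements("xpath", '//div[@class="resultWrapper"]')
    # .text is everything the page renders for one result: times, stops, price, etc.
    flights_list = [container.text for container in flight_containers]
    return flights_list

# With the driver opened earlier:
# flights_list = collect_results(driver)
# print(flights_list[:3])  # the first three results
```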

The first three results are printed, and we can clearly see everything we need. However, there are better ways to get at the information: we need to extract the data from each element separately.

Let's get to work!

The easiest function to write is the one that loads additional results, so that is where we'll start. I want to maximize the number of flights the program gets information about without making the service suspicious enough to trigger a check, so I click the Load more results button once every time the page is displayed. In this code, pay attention to the try block, which I added because the button sometimes doesn't load properly. If you run into this too, comment out the calls to this function in the start_kayak function, which we'll look at below.

# Load more results to maximize the amount of data collected
def load_more():
    try:
        more_results = '//a[@class = "moreButton"]'
        driver.find_element_by_xpath(more_results).click()
        # Printing these notes while the program runs helps me quickly see what it is doing
        print('sleeping.....')
        sleep(randint(45,60))
    except Exception:
        pass
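
load_more() clicks the button once and silently ignores any failure. If you want more results per page while keeping the same 45 to 60 second pauses, a variant could click several times and stop at the first failure. This is a sketch, not the author's code: load_more_times and its parameters are hypothetical, and catching the broad Exception is a simplification (Selenium raises more specific exceptions, such as NoSuchElementException).

```python
import time
from random import randint

def load_more_times(driver, clicks: int = 3, sleep_fn=time.sleep) -> int:
    """Click the "more results" link up to `clicks` times; return how many clicks succeeded."""
    done = 0
    for _ in range(clicks):
        try:
            driver.find_element("xpath", '//a[@class = "moreButton"]').click()
        except Exception:
            # Button missing or not clickable (e.g. everything is loaded): stop here
            break
        done += 1
        print('sleeping.....')
        sleep_fn(randint(45, 60))  # same 45-60 s pause as load_more()
    return done
```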

Now, after that lengthy discussion of this function (I sometimes get carried away), we are ready to declare a function that scrapes the page.

I have gathered most of what is needed into the following function, called page_scrape. Sometimes the data for the outbound and return legs comes back combined, so I use a simple method to separate it; for example, that is what happens the first time I use the variables section_a_list and section_b_list. The function returns a data frame, flights_df, which lets us keep the results from the different sort orders apart and combine them later.

def page_scrape():
    """This function takes care of the scraping part"""
    
    xp_sections = '//*[@class="section duration"]'
    sections = driver.find_elements_by_xpath(xp_sections)
    sections_list = [value.text for value in sections]
    section_a_list = sections_list[::2] # this is how we split the information about the two legs
    section_b_list = sections_list[1::2]
    
    # If you run into reCaptcha, you may need to take some action.
    # You will know something went wrong because the lists above are empty.
    # This if statement lets you stop the program or do something else:
    # you could pause here, pass the check manually, and then continue scraping.
    # I use SystemExit here because I want to restart testing from the very beginning.
    if section_a_list == []:
        raise SystemExit
    
    # I use the letter A for outbound flights and B for return flights
    a_duration = []
    a_section_names = []
    for n in section_a_list:
        # Get the duration and the route
        a_section_names.append(''.join(n.split()[2:5]))
        a_duration.append(''.join(n.split()[0:2]))
    b_duration = []
    b_section_names = []
    for n in section_b_list:
        # Get the duration and the route
        b_section_names.append(''.join(n.split()[2:5]))
        b_duration.append(''.join(n.split()[0:2]))

    xp_dates = '//div[@class="section date"]'
    dates = driver.find_elements_by_xpath(xp_dates)
    dates_list = [value.text for value in dates]
    a_date_list = dates_list[::2]
    b_date_list = dates_list[1::2]
    # Get the day and the weekday
    a_day = [value.split()[0] for value in a_date_list]
    a_weekday = [value.split()[1] for value in a_date_list]
    b_day = [value.split()[0] for value in b_date_list]
    b_weekday = [value.split()[1] for value in b_date_list]
    
    # Get the prices
    xp_prices = '//a[@class="booking-link"]/span[@class="price option-text"]'
    prices = driver.find_elements_by_xpath(xp_prices)
    # Strip the dollar sign and the thousands separator so int() accepts prices such as 1,234
    prices_list = [price.text.replace('$','').replace(',','') for price in prices if price.text != '']
    prices_list = list(map(int, prices_list))

    # stops is one big list: the first leg of each result is at an even index, the second at an odd one
    xp_stops = '//div[@class="section stops"]/div[1]'
    stops = driver.find_elements_by_xpath(xp_stops)
    # "nonstop" starts with 'n', so that first character is turned into '0' stops
    stops_list = [stop.text[0].replace('n','0') for stop in stops]
    a_stop_list = stops_list[::2]
    b_stop_list = stops_list[1::2]

    xp_stops_cities = '//div[@class="section stops"]/div[2]'
    stops_cities = driver.find_elements_by_xpath(xp_stops_cities)
    stops_cities_list = [stop.text for stop in stops_cities]
    a_stop_name_list = stops_cities_list[::2]
    b_stop_name_list = stops_cities_list[1::2]
    
    # carrier, departure and arrival times for both legs
    xp_schedule = '//div[@class="section times"]'
    schedules = driver.find_elements_by_xpath(xp_schedule)
    hours_list = []
    carrier_list = []
    for schedule in schedules:
        hours_list.append(schedule.text.split('\n')[0])
        carrier_list.append(schedule.text.split('\n')[1])
    # split the times and carriers between legs a (even indices) and b (odd indices)
    a_hours = hours_list[::2]
    a_carrier = carrier_list[::2]
    b_hours = hours_list[1::2]
    b_carrier = carrier_list[1::2]

    
    cols = (['Out Day', 'Out Time', 'Out Weekday', 'Out Airline', 'Out Cities', 'Out Duration', 'Out Stops', 'Out Stop Cities',
            'Return Day', 'Return Time', 'Return Weekday', 'Return Airline', 'Return Cities', 'Return Duration', 'Return Stops', 'Return Stop Cities',
            'Price'])

    flights_df = pd.DataFrame({'Out Day': a_day,
                               'Out Weekday': a_weekday,
                               'Out Duration': a_duration,
                               'Out Cities': a_section_names,
                               'Return Day': b_day,
                               'Return Weekday': b_weekday,
                               'Return Duration': b_duration,
                               'Return Cities': b_section_names,
                               'Out Stops': a_stop_list,
                               'Out Stop Cities': a_stop_name_list,
                               'Return Stops': b_stop_list,
                               'Return Stop Cities': b_stop_name_list,
                               'Out Time': a_hours,
                               'Out Airline': a_carrier,
                               'Return Time': b_hours,
                               'Return Airline': b_carrier,                           
                               'Price': prices_list})[cols]
    
    flights_df['timestamp'] = strftime("%Y%m%d-%H%M") # time the data was collected
    return flights_df

I tried to name the variables so that the code is easy to follow. Remember that variables starting with a belong to the first leg of the trip and those starting with b to the second. Let's move on to the next function.
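
The two tricks page_scrape() relies on, alternating legs and whitespace splitting, can be tried without a browser. In the sketch below, split_legs and parse_section are hypothetical names, and the sample strings are invented; the real page text may be formatted differently.

```python
def split_legs(values):
    """Outbound (a) items sit at even indices, return (b) items at odd ones."""
    return values[::2], values[1::2]

def parse_section(text):
    """Return (duration, route) from a 'section duration' text, as page_scrape() does."""
    tokens = text.split()  # no argument: splits on any whitespace, newlines included
    return ''.join(tokens[0:2]), ''.join(tokens[2:5])

# Invented sample texts for one result: outbound first, then return
a_list, b_list = split_legs(['22h 10m\nLIS - SIN', '19h 45m\nSIN - LIS'])
print(parse_section(a_list[0]))  # ('22h10m', 'LIS-SIN')
print(parse_section(b_list[0]))  # ('19h45m', 'SIN-LIS')
```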

Supporting mechanisms

We now have a function that loads additional search results and a function that processes those results. This article could end here, since these two functions give you everything you need to scrape pages you can open yourself. But we haven't yet covered some of the helper mechanisms mentioned earlier, such as the code for sending emails and a few other things. All of this is in the start_kayak function, which we'll examine now.

For this function to work, it needs the cities and dates. From this information it builds a link in the kayak variable, which takes you to a page with search results sorted by best match to the query. After the first scraping session, we work with the prices in the table at the top of the page: we find the lowest ticket price and the average price. All of this, together with the prediction the site provides, is sent by email. On the page, this table should be in the upper left corner. Incidentally, working with this table can cause an error when you search with exact dates, because in that case the table is not shown on the page.
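
The minimum/average step boils down to a little string cleanup. The sketch below is a standalone version that works on cell texts (what .text returns for each FlexMatrixCell element). matrix_stats is a hypothetical name, and skipping empty cells and removing thousands separators are assumptions about what the table may contain; it still raises ValueError if every cell is empty.

```python
def matrix_stats(cell_texts):
    """Turn price-matrix cell texts such as '$1,020' into (lowest price, average price)."""
    prices = [int(t.replace('$', '').replace(',', '')) for t in cell_texts if t.strip()]
    return min(prices), sum(prices) / len(prices)

print(matrix_stats(['$450', '$1,020', '$510']))  # (450, 660.0)
```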

def start_kayak(city_from, city_to, date_start, date_end):
    """City codes - it's the IATA codes!
    Date format -  YYYY-MM-DD"""
    
    kayak = ('https://www.kayak.com/flights/' + city_from + '-' + city_to +
             '/' + date_start + '-flexible/' + date_end + '-flexible?sort=bestflight_a')
    driver.get(kayak)
    sleep(randint(8,10))
    
    # Sometimes a pop-up appears; this try block checks for it and closes it
    try:
        xp_popup_close = '//button[contains(@id,"dialog-close") and contains(@class,"Button-No-Standard-Style close ")]'
        driver.find_elements_by_xpath(xp_popup_close)[5].click()
    except Exception as e:
        pass
    sleep(randint(60,95))
    print('loading more.....')
    
#     load_more()
    
    print('starting first scrape.....')
    df_flights_best = page_scrape()
    df_flights_best['sort'] = 'best'
    sleep(randint(60,80))
    
    # Take the lowest price from the table at the top of the page
    matrix = driver.find_elements_by_xpath('//*[contains(@id,"FlexMatrixCell")]')
    matrix_prices = [price.text.replace('$','').replace(',','') for price in matrix]
    matrix_prices = list(map(int, matrix_prices))
    matrix_min = min(matrix_prices)
    matrix_avg = sum(matrix_prices)/len(matrix_prices)
    
    print('switching to cheapest results.....')
    cheap_results = '//a[@data-code = "price"]'
    driver.find_element_by_xpath(cheap_results).click()
    sleep(randint(60,90))
    print('loading more.....')
    
#     load_more()
    
    print('starting second scrape.....')
    df_flights_cheap = page_scrape()
    df_flights_cheap['sort'] = 'cheap'
    sleep(randint(60,80))
    
    print('switching to quickest results.....')
    quick_results = '//a[@data-code = "duration"]'
    driver.find_element_by_xpath(quick_results).click()  
    sleep(randint(60,90))
    print('loading more.....')
    
#     load_more()
    
    print('starting third scrape.....')
    df_flights_fast = page_scrape()
    df_flights_fast['sort'] = 'fast'
    sleep(randint(60,80))
    
    # Stack the three frames and save them to an Excel file whose name reflects the cities and dates
    final_df = pd.concat([df_flights_cheap, df_flights_best, df_flights_fast])
    final_df.to_excel('search_backups//{}_flights_{}-{}_from_{}_to_{}.xlsx'.format(strftime("%Y%m%d-%H%M"),
                                                                                   city_from, city_to, 
                                                                                   date_start, date_end), index=False)
    print('saved df.....')
    
    # You can track how well the site's prediction matches reality
    xp_loading = '//div[contains(@id,"advice")]'
    loading = driver.find_element_by_xpath(xp_loading).text
    xp_prediction = '//span[@class="info-text"]'
    prediction = driver.find_element_by_xpath(xp_prediction).text
    print(loading + '\n' + prediction)
    
    # Sometimes the loading variable contains this string, which later breaks sending the email;
    # if that happens, replace it with "Not sure"
    weird = '¯\\_(ツ)_/¯'
    if loading == weird:
        loading = 'Not sure'
    
    username = 'YOUR_EMAIL@hotmail.com'
    password = 'YOUR PASSWORD'

    server = smtplib.SMTP('smtp.outlook.com', 587)
    server.ehlo()
    server.starttls()
    server.login(username, password)
    msg = ('Subject: Flight Scraper\n\nCheapest Flight: {}\nAverage Price: {}\n\nRecommendation: {}\n\nEnd of message'
           .format(matrix_min, matrix_avg, (loading + '\n' + prediction)))
    # Note: this MIMEMultipart object is never sent; the plain string msg above is what goes out
    message = MIMEMultipart()
    message['From'] = 'YOUR_EMAIL@hotmail.com'
    message['to'] = 'YOUR_EMAIL@hotmail.com'
    server.sendmail('YOUR_EMAIL@hotmail.com', 'YOUR_EMAIL@hotmail.com', msg)
    server.quit()
    print('sent email.....')

I tested this script with an Outlook account (hotmail.com). I haven't checked whether it works with a Gmail account; Gmail is very popular, but there are many other options. If you use a Hotmail account, all you need to do for everything to work is put your details into the code.
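
A note on the "weird" string workaround: smtplib's sendmail() encodes a str message with the ASCII codec, so any non-ASCII character in the text (such as ¯\_(ツ)_/¯) makes it fail, which is probably what that check works around. One alternative, sketched below on the assumption that you keep the same Outlook SMTP settings, is the standard library's EmailMessage, which encodes Unicode text itself. build_summary and send_summary are hypothetical names.

```python
import smtplib
from email.message import EmailMessage

def build_summary(sender, recipient, cheapest, average, advice):
    """Create the summary email; EmailMessage takes care of encoding non-ASCII text."""
    msg = EmailMessage()
    msg['Subject'] = 'Flight Scraper'
    msg['From'] = sender
    msg['To'] = recipient
    msg.set_content('Cheapest Flight: {}\nAverage Price: {}\n\nRecommendation: {}\n\nEnd of message'
                    .format(cheapest, average, advice))
    return msg

def send_summary(msg, username, password, host='smtp.outlook.com', port=587):
    """Send the message over STARTTLS, following the same steps as start_kayak()."""
    with smtplib.SMTP(host, port) as server:  # the with block closes the connection
        server.starttls()
        server.login(username, password)
        server.send_message(msg)
```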

If you want to understand exactly what specific parts of this function do, copy them and experiment with them. Experimenting with code is the only way to truly understand it.

The finished system

Now that we have built everything we discussed, we can create a simple loop that calls our functions. The script asks the user for the cities and dates. While testing, with the script being restarted over and over, you probably won't want to type this data in every time, so for the duration of testing you can comment out the corresponding lines and uncomment the ones below them, where the data the script needs is hard-coded.

city_from = input('From which city? ')
city_to = input('Where to? ')
date_start = input('Search around which departure date? Please use YYYY-MM-DD format only ')
date_end = input('Return when? Please use YYYY-MM-DD format only ')

# city_from = 'LIS'
# city_to = 'SIN'
# date_start = '2019-08-21'
# date_end = '2019-09-07'

for n in range(0,5):
    start_kayak(city_from, city_to, date_start, date_end)
    print('iteration {} was complete @ {}'.format(n, strftime("%Y%m%d-%H%M")))
    
    # Wait 4 hours
    sleep(60*60*4)
    print('sleep finished.....')
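
The loop above always waits exactly four hours. Earlier, the author mentions sending requests at intervals of 4 to 6 hours; if you want the interval itself to vary, a sketch like the following would do it. run_schedule and its parameters are hypothetical, and sleep_fn is injectable only so the loop can be tested without waiting.

```python
import time
from random import randint

def run_schedule(job, iterations=5, min_hours=4, max_hours=6, sleep_fn=time.sleep):
    """Run job() `iterations` times, waiting a random min_hours to max_hours between runs."""
    for n in range(iterations):
        job()
        print('iteration {} was complete @ {}'.format(n, time.strftime("%Y%m%d-%H%M")))
        if n < iterations - 1:  # no need to wait after the final run
            sleep_fn(randint(min_hours * 3600, max_hours * 3600))

# With the inputs collected above:
# run_schedule(lambda: start_kayak(city_from, city_to, date_start, date_end))
```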

This is what a test run of the script looks like.

A test run of the script

Results

If you've made it this far, congratulations! You now have a working web scraper, although I can already see many ways to improve it. For example, it could be integrated with Twilio to send text messages instead of emails. You could use a VPN or something similar to get results from several servers at once. There is also the recurring problem of the site checking whether the user is human, but that can be solved too. In any case, you now have a foundation you can extend as you see fit. For example, you could make the script send the Excel file to the user as an email attachment.
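
For the Excel attachment suggested above, the standard library's EmailMessage.add_attachment() is enough. The sketch assumes the message is built as an EmailMessage (not the plain string used in start_kayak()) and that path points to one of the files saved in search_backups; attach_excel is a hypothetical name.

```python
from email.message import EmailMessage
from pathlib import Path

# MIME type of .xlsx files
XLSX_TYPE = ('application', 'vnd.openxmlformats-officedocument.spreadsheetml.sheet')

def attach_excel(msg: EmailMessage, path) -> EmailMessage:
    """Attach an .xlsx file to an EmailMessage, keeping the file's own name."""
    data = Path(path).read_bytes()
    msg.add_attachment(data, maintype=XLSX_TYPE[0], subtype=XLSX_TYPE[1],
                       filename=Path(path).name)
    return msg
```

Adding an attachment turns the message into multipart/mixed, with the summary text kept as the first part.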


Source: www.habr.com
