Data Build Tool, or what a Data Warehouse and a Smoothie have in common

What principles should guide building an ideal Data Warehouse?

Focus on business value and analytics, without boilerplate code. Managing the DWH as a codebase: versioning, review, automated testing, and CI. Modularity, extensibility, open source, and community. User-friendly documentation and dependency visualization (Data Lineage).

More on all of this, and on the role of DBT in the Big Data & Analytics ecosystem, below the cut.

Hello everyone

Artemy Kozyr here. For more than 5 years I have been working with data warehouses, building ETL/ELT pipelines, and doing data analytics and visualization. I currently work at Wheely and teach the Data Engineer course at OTUS. Today I want to share an article I wrote ahead of the start of a new enrollment for the course.

Brief overview

The DBT framework covers the T in the ELT (Extract - Transform - Load) acronym.

With the arrival of productive and scalable analytical databases such as BigQuery, Redshift, and Snowflake, there is no longer any point in doing transformations outside the Data Warehouse.

DBT does not extract data from sources. Instead, it provides powerful capabilities for working with data that has already been loaded into the Storage (internal or external).

The main purpose of DBT is to take your code, compile it into SQL, and run the commands in the right order in the Storage.

DBT project structure

A project consists of directories and files of just 2 types:

  • Model (.sql) - a unit of transformation expressed as a SELECT query
  • Configuration file (.yml) - parameters, settings, tests, documentation

At a basic level, the workflow looks like this:

  • The user prepares the model code in any convenient IDE
  • The models are run from the CLI, and DBT compiles the model code into SQL
  • The compiled SQL code is executed in the Storage in a given order (graph)

Here is what a run from the CLI looks like:


Everything is SELECT

This is the killer feature of the Data Build Tool framework. DBT abstracts away all the code involved in materializing your queries in the Storage (variations of the CREATE, INSERT, UPDATE, DELETE, ALTER, GRANT, ... commands).

Each model comes down to writing a single SELECT query that defines the resulting dataset.

The transformation logic can have several levels and can consolidate data from several other models. Here is an example of a model that builds an orders data mart (f_orders):

{% set payment_methods = ['credit_card', 'coupon', 'bank_transfer', 'gift_card'] %}
 
with orders as (
 
   select * from {{ ref('stg_orders') }}
 
),
 
order_payments as (
 
   select * from {{ ref('order_payments') }}
 
),
 
final as (
 
   select
       orders.order_id,
       orders.customer_id,
       orders.order_date,
       orders.status,
       {% for payment_method in payment_methods -%}
       order_payments.{{payment_method}}_amount,
       {% endfor -%}
       order_payments.total_amount as amount
   from orders
       left join order_payments using (order_id)
 
)
 
select * from final

What is worth noting here?

First, CTEs (Common Table Expressions) are used to organize code that contains many transformations and a lot of business logic, and to make it easier to understand.

Second, the model code is a mix of SQL and Jinja, a templating language.
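The following minimal Python sketch imitates the SELECT-list lines that the Jinja for-loop in the f_orders model expands to. It only shows the rendered output. dbt itself renders templates with the Jinja engine, and `render_payment_columns` is a hypothetical helper, not part of dbt.

```python
# Minimal sketch: imitates the SELECT-list lines that the Jinja for-loop
# in the f_orders model renders to. dbt itself uses the Jinja engine.

PAYMENT_METHODS = ["credit_card", "coupon", "bank_transfer", "gift_card"]


def render_payment_columns(payment_methods):
    """Hypothetical helper: one '<method>_amount' column per payment method."""
    lines = [f"order_payments.{method}_amount," for method in payment_methods]
    # In the model, the loop is followed by the fixed total column.
    lines.append("order_payments.total_amount as amount")
    return "\n".join(lines)


print(render_payment_columns(PAYMENT_METHODS))
```

Adding a payment method then means editing the list, not copying another column line by hand.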

The model uses a for loop to generate an amount column for each payment method listed in the set expression. It also uses the ref function, which lets a model reference other models in its code:

  • At compile time, ref is resolved to a pointer to the target table or view in the Storage
  • ref lets DBT build a graph of model dependencies
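Both roles of ref fit in a few lines of Python. Each `ref('...')` call is replaced with a schema-qualified name and recorded as a dependency at the same time. This regex-based version is a simplification: real dbt resolves refs through its parsed project during Jinja rendering. The target schema `analytics` is hypothetical.

```python
import re

# Matches {{ ref('model') }} or {{ ref("model") }}, tolerating extra spaces.
REF_PATTERN = re.compile(r"\{\{\s*ref\(\s*['\"](\w+)['\"]\s*\)\s*\}\}")


def compile_refs(sql, target_schema):
    """Replace ref() calls with schema-qualified names and collect dependencies."""
    deps = sorted(set(REF_PATTERN.findall(sql)))
    compiled = REF_PATTERN.sub(lambda m: f"{target_schema}.{m.group(1)}", sql)
    return compiled, deps


model_sql = """
select * from {{ ref('stg_orders') }}
left join {{ ref('order_payments') }} using (order_id)
"""

compiled_sql, dependencies = compile_refs(model_sql, "analytics")
print(compiled_sql)
print(dependencies)  # the edges of the dependency graph for this model
```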

It is Jinja that gives DBT almost unlimited possibilities. The most commonly used features are:

  • If / else statements - branching
  • For loops
  • Variables
  • Macros - creating reusable macros

Materialization: Table, View, Incremental

A materialization strategy defines how the resulting dataset of a model is stored in the Storage.

The basic strategies are:

  • Table - a physical table in the Storage
  • View - a view, i.e. a virtual table in the Storage
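This small Python sketch shows the practical difference between the two basic strategies, with SQLite standing in for the Storage. The `materialize` helper is hypothetical and deliberately simplified. Real dbt materializations are more careful; for example, they build into a temporary relation and then swap it in.

```python
import sqlite3


def materialize(conn, name, select_sql, kind):
    """Wrap a bare SELECT in DDL, roughly what a materialization does (simplified)."""
    if kind == "table":
        conn.execute(f"drop table if exists {name}")
        conn.execute(f"create table {name} as {select_sql}")
    elif kind == "view":
        conn.execute(f"drop view if exists {name}")
        conn.execute(f"create view {name} as {select_sql}")
    else:
        raise ValueError(f"unsupported materialization: {kind}")


conn = sqlite3.connect(":memory:")
conn.execute("create table raw_orders (order_id integer, amount real)")
conn.execute("insert into raw_orders values (1, 10.0), (2, 20.0)")

select_sql = "select order_id, amount from raw_orders where amount > 0"
materialize(conn, "orders_table", select_sql, "table")
materialize(conn, "orders_view", select_sql, "view")

# New source data arrives after the build.
conn.execute("insert into raw_orders values (3, 30.0)")

table_count = conn.execute("select count(*) from orders_table").fetchone()[0]
view_count = conn.execute("select count(*) from orders_view").fetchone()[0]
print(table_count, view_count)
```

The table is a snapshot taken at build time. It is cheap to query but stays stale until the next run. The view is recomputed on every read, so it sees the new row.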

There are also more advanced materialization strategies:

  • Incremental - incremental loading (of large fact tables): new rows are appended, changed rows are updated, deleted rows are cleaned up
  • Ephemeral - the model is not materialized directly but is included as a CTE in other models
  • Any other strategies you add yourself

Beyond materialization strategies, there are optimization options for specific Storages, for example:

  • Snowflake: transient tables, merge behavior, table clustering, copying grants, secure views
  • Redshift: Distkey, Sortkey (interleaved, compound), late binding views
  • BigQuery: table partitioning and clustering, merge behavior, KMS encryption, labels and tags
  • Spark: file format (parquet, csv, json, orc, delta), partition_by, clustered_by, buckets, incremental_strategy

The following Storages are currently supported:

  • Postgres
  • Redshift
  • BigQuery
  • Snowflake
  • Presto (partially)
  • Spark (partially)
  • Microsoft SQL Server (community adapter)

Let's improve our model:

  • Make its loading incremental
  • Add distribution and sort keys for Redshift

-- Model configuration:
-- incremental loading, unique key for updating records (unique_key)
-- distribution key (dist), sort key (sort)
{{
  config(
       materialized='incremental',
       unique_key='order_id',
       dist="customer_id",
       sort="order_date"
   )
}}
 
{% set payment_methods = ['credit_card', 'coupon', 'bank_transfer', 'gift_card'] %}
 
with orders as (
 
   select * from {{ ref('stg_orders') }}
   where 1=1
   {% if is_incremental() -%}
       -- This filter is applied only on incremental runs
       and order_date >= (select max(order_date) from {{ this }})
   {%- endif %} 
 
),
 
order_payments as (
 
   select * from {{ ref('order_payments') }}
 
),
 
final as (
 
   select
       orders.order_id,
       orders.customer_id,
       orders.order_date,
       orders.status,
       {% for payment_method in payment_methods -%}
       order_payments.{{payment_method}}_amount,
       {% endfor -%}
       order_payments.total_amount as amount
   from orders
       left join order_payments using (order_id)
 
)
 
select * from final
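A Python simulation with SQLite shows roughly what an incremental run of this model does. The sketch assumes the delete+insert strategy: target rows matching the new batch on unique_key are deleted, then the batch is inserted. The strategy dbt actually picks depends on the adapter and its configuration. Table and column names follow the model above. The data is made up.

```python
import sqlite3


def run_incremental(conn, unique_key="order_id"):
    """Simplified incremental run of f_orders using a delete+insert strategy."""
    # Same idea as the is_incremental() filter: only rows at or after the latest loaded date.
    conn.execute(
        """
        create temp table batch as
        select * from stg_orders
        where order_date >= (select max(order_date) from f_orders)
        """
    )
    # Delete target rows that are about to be replaced, then insert the new batch.
    conn.execute(
        f"delete from f_orders where {unique_key} in (select {unique_key} from batch)"
    )
    conn.execute("insert into f_orders select * from batch")
    conn.execute("drop table batch")


conn = sqlite3.connect(":memory:")
conn.execute("create table stg_orders (order_id integer, order_date text, status text)")
conn.executemany(
    "insert into stg_orders values (?, ?, ?)",
    [(1, "2020-05-01", "placed"), (2, "2020-05-02", "placed")],
)
# First run (or --full-refresh): build the whole table.
conn.execute("create table f_orders as select * from stg_orders")

# The source changes: order 2 is shipped and order 3 arrives.
conn.execute("update stg_orders set status = 'shipped' where order_id = 2")
conn.execute("insert into stg_orders values (3, '2020-05-03', 'placed')")

run_incremental(conn)
rows = conn.execute("select order_id, status from f_orders order by order_id").fetchall()
print(rows)
```

The filter uses >= rather than >, so rows from the latest loaded date are reprocessed on every run. The unique_key is what keeps them from being duplicated.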

Model dependency graph

It is also called a dependency tree, or DAG (Directed Acyclic Graph).

DBT builds the graph from the configuration of all project models, or more precisely from the ref() links between models. Having the graph lets you:

  • Run models in the correct order
  • Build data marts in parallel
  • Run an arbitrary subgraph
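All three uses of the graph can be illustrated with Python's standard `graphlib` module (Python 3.9+). The project below is hypothetical: each model maps to the set of models it references with ref(). `parallel_batches` groups models into waves that could run concurrently. `upstream_of` selects a subgraph in the spirit of dbt's `+model` selector. These helpers are illustrative and are not dbt's own implementation.

```python
from graphlib import TopologicalSorter

# Hypothetical project: each model -> the set of models it ref()s.
DEPENDENCIES = {
    "stg_orders": set(),
    "stg_payments": set(),
    "order_payments": {"stg_payments"},
    "f_orders": {"stg_orders", "order_payments"},
    "dim_customers": {"stg_orders"},
}


def parallel_batches(graph):
    """Group models into waves; models within one wave do not depend on each other."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not acyclic
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


def upstream_of(graph, model):
    """The model plus everything it depends on, similar in spirit to '+model'."""
    selected, stack = set(), [model]
    while stack:
        node = stack.pop()
        if node not in selected:
            selected.add(node)
            stack.extend(graph.get(node, ()))
    return selected


print(parallel_batches(DEPENDENCIES))

selected = upstream_of(DEPENDENCIES, "f_orders")
subgraph = {name: DEPENDENCIES[name] & selected for name in selected}
print(parallel_batches(subgraph))
```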

An example of the graph visualization:

Each node of the graph is a model, and the edges of the graph are defined by ref.

Data quality and documentation

Besides generating the models themselves, DBT lets you test a number of assumptions about the resulting dataset, such as:

  • Not Null
  • Unique
  • Relationships - referential integrity (for example, customer_id in the orders table matches id in the customers table)
  • Matching a list of accepted values
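Each of these checks comes down to a query that returns the offending rows, and the test passes when the result is empty. This is how recent dbt versions evaluate tests. The Python sketch below runs the four checks on a toy SQLite database. The helper functions are hypothetical, and their SQL only approximates what dbt generates. They build SQL with f-strings, which is fine for a sketch but not for untrusted input.

```python
import sqlite3


# Each check returns the offending rows; an empty result means the test passes.
def not_null(model, column):
    return f"select * from {model} where {column} is null"


def unique(model, column):
    return (f"select {column} from {model} where {column} is not null "
            f"group by {column} having count(*) > 1")


def relationships(model, column, to, field):
    return (f"select c.{column} from {model} c left join {to} p on c.{column} = p.{field} "
            f"where c.{column} is not null and p.{field} is null")


def accepted_values(model, column, values):
    allowed = ", ".join(f"'{v}'" for v in values)
    return f"select * from {model} where {column} not in ({allowed})"


conn = sqlite3.connect(":memory:")
conn.execute("create table dim_customers (customer_id integer)")
conn.execute("insert into dim_customers values (10), (11)")
conn.execute("create table fct_orders (order_id integer, customer_id integer, status text)")
conn.executemany(
    "insert into fct_orders values (?, ?, ?)",
    [(1, 10, "placed"), (2, 11, "shipped"), (2, 99, "lost")],  # deliberately bad data
)

checks = {
    "unique_order_id": unique("fct_orders", "order_id"),
    "not_null_order_id": not_null("fct_orders", "order_id"),
    "relationships_customer_id": relationships(
        "fct_orders", "customer_id", "dim_customers", "customer_id"),
    "accepted_values_status": accepted_values(
        "fct_orders", "status",
        ["placed", "shipped", "completed", "return_pending", "returned"]),
}
results = {name: len(conn.execute(sql).fetchall()) for name, sql in checks.items()}
for name, failures in results.items():
    status = "PASS" if failures == 0 else f"FAIL ({failures} rows)"
    print(f"{name}: {status}")
```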

You can add your own tests (custom data tests), for example on the % deviation of revenue and other metrics from their values a day, a week, or a month ago. Any assumption that can be formulated as an SQL query can become a test.

This way you can catch unwanted deviations and errors in the data in the Warehouse data marts.
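Here is one such custom test: a day-over-day revenue deviation check, run against SQLite from Python. The 30% threshold, the table layout, and the data are assumptions. The date arithmetic uses SQLite's `date()` function, where Redshift would use `dateadd`.

```python
import sqlite3

MAX_DEVIATION_PCT = 30.0  # hypothetical threshold

# Custom test: returns the days whose revenue deviates from the previous day
# by more than the threshold; an empty result means the test passes.
DEVIATION_TEST_SQL = """
with daily as (
    select order_date, sum(amount) as revenue
    from fct_orders
    group by order_date
),
compared as (
    select d.order_date, d.revenue, p.revenue as prev_revenue
    from daily d
    join daily p on p.order_date = date(d.order_date, '-1 day')
)
select order_date, revenue, prev_revenue
from compared
where abs(revenue - prev_revenue) * 100.0 / prev_revenue > ?
"""

conn = sqlite3.connect(":memory:")
conn.execute("create table fct_orders (order_id integer, order_date text, amount real)")
conn.executemany(
    "insert into fct_orders values (?, ?, ?)",
    [
        (1, "2020-05-01", 60.0),
        (2, "2020-05-01", 40.0),
        (3, "2020-05-02", 110.0),
        (4, "2020-05-03", 40.0),
    ],
)

failures = conn.execute(DEVIATION_TEST_SQL, (MAX_DEVIATION_PCT,)).fetchall()
print(failures)
```

Here 2020-05-02 is within 10% of the previous day and passes. The drop on 2020-05-03 is reported.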

For documentation, DBT provides mechanisms for adding, versioning, and distributing metadata and comments at the model level and even down to individual attributes.

Here is what adding tests and documentation looks like in a configuration file:

version: 2

models:
  - name: fct_orders
    description: This table has basic information about orders, as well as some derived facts based on payments
    columns:
      - name: order_id
        tests:
          - unique # check that values are unique
          - not_null # check that there are no nulls
        description: This is a unique identifier for an order
      - name: customer_id
        description: Foreign key to the customers table
        tests:
          - not_null
          - relationships: # referential integrity check
              to: ref('dim_customers')
              field: customer_id
      - name: order_date
        description: Date (UTC) that the order was placed
      - name: status
        description: '{{ doc("orders_status") }}'
        tests:
          - accepted_values: # check against the list of accepted values
              values: ['placed', 'shipped', 'completed', 'return_pending', 'returned']

And here is what this documentation looks like on the generated website:


Macros and modules

DBT is meant to be less a collection of SQL scripts than a powerful, feature-rich way for users to build their own transformations and share them as modules.

Macros are sets of constructs and expressions that can be called like functions inside models. They let you reuse SQL across models and projects, following the DRY (Don't Repeat Yourself) engineering principle.

Macro example:

{% macro rename_category(column_name) %}
case
 when {{ column_name }} ilike  '%osx%' then 'osx'
 when {{ column_name }} ilike  '%android%' then 'android'
 when {{ column_name }} ilike  '%ios%' then 'ios'
 else 'other'
end as renamed_product
{% endmacro %}

And its usage:

{% set column_name = 'product' %}
select
 product,
 {{ rename_category(column_name) }} -- macro call
from my_table
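The same DRY idea works in plain Python: one function builds the CASE expression, and every query that needs it calls the function instead of copying the branches. This is a hypothetical counterpart of the macro above, run against SQLite. SQLite has no ILIKE, so LIKE stands in for it; LIKE is case-insensitive for ASCII by default.

```python
import sqlite3

# Hypothetical Python counterpart of the rename_category macro.
CATEGORY_PATTERNS = [("%osx%", "osx"), ("%android%", "android"), ("%ios%", "ios")]


def rename_category(column_name, alias="renamed_product"):
    """Build the CASE expression once; the first matching pattern wins."""
    whens = "\n".join(
        f"  when {column_name} like '{pattern}' then '{label}'"
        for pattern, label in CATEGORY_PATTERNS
    )
    return f"case\n{whens}\n  else 'other'\nend as {alias}"


conn = sqlite3.connect(":memory:")
conn.execute("create table my_table (product text)")
conn.executemany(
    "insert into my_table values (?)",
    [("MacOSX app",), ("Android phone",), ("iOS tablet",), ("Web",)],
)

query = f"select product, {rename_category('product')} from my_table order by rowid"
rows = conn.execute(query).fetchall()
print(rows)
```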

DBT comes with a package manager that lets users publish and reuse individual modules and macros.

This means you can install and use libraries such as:

  • dbt_utils: working with dates and times, surrogate keys, schema tests, pivot/unpivot, and more
  • Ready-made data mart templates for services such as Snowplow and Stripe
  • Libraries for specific data stores, e.g. Redshift
  • Logging - a module for logging DBT runs

The full list of packages is available on dbt hub.

More features

Here I will describe a few other interesting features and practices that my team and I use to build the Data Warehouse at Wheely.

Separating DEV - TEST - PROD runtime environments

This works even within a single DWH cluster, using different schemas. For example, with the following expression:

with source as (
 
   select * from {{ source('salesforce', 'users') }}
   where 1=1
   {% if target.name in ['dev', 'test', 'ci'] -%}
       -- non-prod targets read only the last 3 days of data
       and timestamp >= dateadd(day, -3, current_date)
   {%- endif %}
 
)

select * from source

This code says: for the dev, test, and ci environments, take only the last 3 days of data. Runs in these environments are therefore much faster and need fewer resources. When running in the prod environment, the filter condition is ignored.
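The branching logic can be mirrored in a short Python sketch, with SQLite standing in for the Warehouse. The table `users`, the column `ts`, and the fixed reference date are assumptions chosen to make the example deterministic. In the dbt model, the cut-off comes from `dateadd(day, -3, current_date)`.

```python
import sqlite3
from datetime import date, timedelta

NON_PROD_TARGETS = {"dev", "test", "ci"}  # same list as in the Jinja condition


def build_source_query(target_name, today, lookback_days=3):
    """Mirror the Jinja if-block: non-prod targets read only the last few days."""
    sql = "select * from users where 1=1"
    params = []
    if target_name in NON_PROD_TARGETS:
        sql += " and ts >= ?"
        params.append((today - timedelta(days=lookback_days)).isoformat())
    return sql, params


conn = sqlite3.connect(":memory:")
conn.execute("create table users (user_id integer, ts text)")
today = date(2020, 5, 15)
# Ten users, one per day going back from the reference date.
conn.executemany(
    "insert into users values (?, ?)",
    [(i, (today - timedelta(days=i)).isoformat()) for i in range(10)],
)

row_counts = {}
for target in ("dev", "prod"):
    sql, params = build_source_query(target, today)
    row_counts[target] = len(conn.execute(sql, params).fetchall())
print(row_counts)
```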

Materialization with alternative column encoding

Redshift is a columnar DBMS that lets you set a compression algorithm for each individual column. Choosing the optimal algorithms can reduce disk space usage by 20-50%.

The redshift.compress_table macro runs the ANALYZE COMPRESSION command and creates a new table with the recommended column encodings, the specified distribution key (dist_key), and sort keys (sort_key). It then copies the data into the new table and, if needed, drops the old copy.

Macro signature:

{{ compress_table(schema, table,
                 drop_backup=False,
                 comprows=none|Integer,
                 sort_style=none|compound|interleaved,
                 sort_keys=none|List<String>,
                 dist_style=none|all|even,
                 dist_key=none|String) }}

Logging model runs

You can attach hooks to every model run. They execute before the model starts building or right after it finishes:

   pre-hook: "{{ logging.log_model_start_event() }}"
   post-hook: "{{ logging.log_model_end_event() }}"

The logging module lets you write all the relevant metadata to a separate table. You can then use that table to audit runs and analyze bottlenecks.
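The hook mechanism can be sketched in Python. A run wrapper writes a start event before building the model and an end event after it. This is a hypothetical analog using SQLite. The real logging package defines its own audit table and macros (`log_model_start_event`, `log_model_end_event`), and its schema differs.

```python
import sqlite3
from datetime import datetime, timezone


def log_event(conn, event, model):
    """Hypothetical hook body: append one event row to the audit table."""
    conn.execute(
        "insert into audit_log (event, model, event_time) values (?, ?, ?)",
        (event, model, datetime.now(timezone.utc).isoformat()),
    )


def run_model(conn, model, select_sql):
    """Build a model as a table, wrapped in pre- and post-hooks."""
    log_event(conn, "model_start", model)  # pre-hook
    conn.execute(f"drop table if exists {model}")
    conn.execute(f"create table {model} as {select_sql}")
    log_event(conn, "model_end", model)  # post-hook


conn = sqlite3.connect(":memory:")
conn.execute("create table audit_log (event text, model text, event_time text)")
conn.execute("create table stg_orders (order_id integer)")
conn.execute("insert into stg_orders values (1), (2)")

run_model(conn, "f_orders", "select * from stg_orders")
events = conn.execute("select event, model from audit_log order by rowid").fetchall()
print(events)
```

With start and end timestamps for every model in one table, run durations and slow models can be found with ordinary SQL.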

Here is what a dashboard built on the logging data looks like in Looker:


Automating Storage maintenance

You may use extensions of the Storage's functionality, such as UDFs (User Defined Functions). If so, DBT makes it very convenient to version these functions, manage access to them, and roll out new releases automatically.

We use Python UDFs to compute hashes, extract e-mail domains, and decode bitmasks.

An example of a macro that creates a UDF in any runtime environment (dev, test, prod):

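{% macro create_udf() -%}
 
 {# The function is created in the schema of the current target, so dev, test and prod each get their own version #}
 {% set sql %}
       CREATE OR REPLACE FUNCTION {{ target.schema }}.f_sha256(mes "varchar")
           RETURNS varchar
           LANGUAGE plpythonu
           STABLE
       AS $$  
           import hashlib
           return hashlib.sha256(mes).hexdigest()
       $$
       ;
 {% endset %}
  
 {# The result is not needed, so the statement is simply executed #}
 {% do run_query(sql) %}
 
{%- endmacro %}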
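The UDF body is ordinary Python, so its logic can be tried locally before deploying it. In the sketch below, SQLite's `create_function` plays the role of CREATE FUNCTION. `f_email_domain` is a hypothetical helper for the e-mail domain use case mentioned above. Redshift's plpythonu runs Python 2.7, where `hashlib.sha256(mes)` accepts a str directly. In Python 3 the string has to be encoded first.

```python
import hashlib
import sqlite3


def f_sha256(message):
    """Hex SHA-256 of a string, like the Redshift UDF above; NULL stays NULL."""
    if message is None:
        return None
    # Python 3 hashes bytes, so the string is encoded first.
    return hashlib.sha256(message.encode("utf-8")).hexdigest()


def f_email_domain(email):
    """Hypothetical helper: the lower-cased part of an address after the last '@'."""
    if email is None or "@" not in email:
        return None
    return email.rsplit("@", 1)[1].lower()


conn = sqlite3.connect(":memory:")
# Register the Python functions so SQL can call them, analogous to CREATE FUNCTION.
conn.create_function("f_sha256", 1, f_sha256)
conn.create_function("f_email_domain", 1, f_email_domain)

row = conn.execute("select f_sha256('abc'), f_email_domain('User@Example.COM')").fetchone()
print(row)
```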

At Wheely we use Amazon Redshift, which is based on PostgreSQL. In Redshift it is important to regularly collect table statistics and reclaim disk space. The ANALYZE and VACUUM commands do this.

To handle it, the commands from the redshift_maintenance macro run every night:

{% macro redshift_maintenance() %}
 
   {# vacuumable_tables_sql is assumed to be defined elsewhere: a query returning
      table_database, table_schema and table_name for the tables to maintain #}
   {% set vacuumable_tables=run_query(vacuumable_tables_sql) %}
 
   {% for row in vacuumable_tables %}
       {% set message_prefix=loop.index ~ " of " ~ loop.length %}
 
       {%- set relation_to_vacuum = adapter.get_relation(
                                               database=row['table_database'],
                                               schema=row['table_schema'],
                                               identifier=row['table_name']
                                   ) -%}
       {# VACUUM cannot run inside a transaction block, so close any open transaction first #}
       {% do run_query("commit") %}
 
       {% if relation_to_vacuum %}
           {% set start=modules.datetime.datetime.now() %}
           {{ dbt_utils.log_info(message_prefix ~ " Vacuuming " ~ relation_to_vacuum) }}
           {# BOOST is a Redshift option that gives the vacuum more resources #}
           {% do run_query("VACUUM " ~ relation_to_vacuum ~ " BOOST") %}
           {{ dbt_utils.log_info(message_prefix ~ " Analyzing " ~ relation_to_vacuum) }}
           {% do run_query("ANALYZE " ~ relation_to_vacuum) %}
           {% set end=modules.datetime.datetime.now() %}
           {% set total_seconds = (end - start).total_seconds() | round(2)  %}
           {{ dbt_utils.log_info(message_prefix ~ " Finished " ~ relation_to_vacuum ~ " in " ~ total_seconds ~ "s") }}
       {% else %}
           {{ dbt_utils.log_info(message_prefix ~ ' Skipping relation "' ~ row.values() | join ('"."') ~ '" as it does not exist') }}
       {% endif %}
 
   {% endfor %}
 
{% endmacro %}

DBT Cloud

DBT can also be used as a service (Managed Service). It includes:

  • A web IDE for developing projects and models
  • Job configuration and scheduling
  • Simple and convenient access to logs
  • A website with your project's documentation
  • CI (Continuous Integration) integration


Conclusion

Preparing and consuming a DWH becomes as pleasant and beneficial as drinking a smoothie. DBT consists of Jinja, user extensions (modules), a compiler, an executor, and a package manager. Putting these elements together gives you a complete working environment for your Data Warehouse. There is hardly a better way to manage transformations within a DWH today.


DBT's developers follow these beliefs:

  • Code, not a GUI, is the best abstraction for expressing complex analytic logic
  • Working with data should adopt best practices from software engineering
  • Critical data infrastructure should be controlled by its user community as open source software
  • Not only analytics tools but also analytics code will increasingly become the property of the Open Source community

These core beliefs have produced a product that more than 850 companies use today. They are also the basis for many exciting extensions still to come.

For those interested, here is a video of an open lesson I gave at OTUS a few months ago: Data Build Tool for Amazon Redshift Storage.

Besides DBT and Data Warehousing, my colleagues and I teach classes on other relevant, modern topics as part of the Data Engineer course on the OTUS platform:

  • Architectural concepts for Big Data applications
  • Practice with Spark and Spark Streaming
  • Methods and tools for loading data from sources
  • Building analytical data marts in a DWH
  • NoSQL concepts: HBase, Cassandra, ElasticSearch
  • Principles of monitoring and orchestration
  • Final project: putting all the skills together with mentor support

References:

  1. DBT Documentation, Introduction - official documentation
  2. What, exactly, is dbt? - an overview article by one of DBT's authors
  3. Data Build Tool for Amazon Redshift Storage - YouTube, recording of an OTUS open lesson
  4. Getting to know Greenplum - the next open lesson, May 15, 2020
  5. Data Engineer course - OTUS
  6. Building a Mature Analytics Workflow - a look at the future of data and analytics
  7. It's time for open source analytics - the evolution of analytics and the influence of Open Source
  8. Continuous Integration and Automated Build Testing with dbtCloud - principles of building CI with DBT
  9. Getting started with DBT tutorial - practice, step-by-step instructions for self-study
  10. Jaffle shop - GitHub DBT Tutorial - GitHub, code of a training project

Learn more about the course.

Source: will.com
