Which language should you choose for working with data: R or Python? Both! Migrating from pandas to tidyverse and data.table and back

If you search the Internet for R or Python, you will find millions of articles and kilometers of discussion about which one is better, faster and more convenient for working with data. Unfortunately, all these articles and arguments are of little practical use.

The purpose of this article is to compare the basic data processing techniques in the most popular packages of both languages, and to help readers quickly pick up whichever one they do not yet know. If you write in Python, you will see how to do the same thing in R, and vice versa.

Along the way we will look at the syntax of the most popular R packages: the packages that make up the tidyverse library, plus the data.table package. We will compare their syntax with pandas, the most popular data analysis package in Python.

We will go step by step through the whole path of data analysis, from loading data to running analytical window functions, in both Python and R.

Contents

This article can also serve as a cheat sheet if you have forgotten how to perform some data processing operation in one of the packages covered here.

  1. Main syntax differences between R and Python
    1.1. Accessing package functions
    1.2. Assignment
    1.3. Indexing
    1.4. Methods and OOP
    1.5. Pipelines
    1.6. Data structures
  2. A few words about the packages we will use
    2.1. tidyverse
    2.2. data.table
    2.3. pandas
  3. Installing packages
  4. Loading data
  5. Creating dataframes
  6. Selecting the columns you need
  7. Filtering rows
  8. Grouping and aggregation
  9. Vertical union of tables (UNION)
  10. Horizontal join of tables (JOIN)
  11. Basic window functions and calculated columns
  12. Correspondence table of data processing methods in R and Python
  13. Conclusion
  14. A short survey about which package you use

If you are interested in data analysis, you may find my Telegram and YouTube channels useful. Most of the content there is devoted to the R language.

Main syntax differences between R and Python

To make it easier for you to switch from Python to R, or the other way round, here are a few key points to pay attention to.

Accessing package functions

Once a package has been loaded in R, you do not need to specify the package name to call its functions. Prefixing calls with the package name is unusual in R, but it is allowed. You also do not have to attach a package at all if you only need one of its functions: just call the function with the package name in front. The separator between package and function names in R is a double colon: package_name::function_name().

In Python, by contrast, the standard practice is to call a package's functions with the package name stated explicitly. When a package is imported it is usually given a short alias; for example, pandas is conventionally imported as pd. A package function is reached through a dot: package_name.function_name().
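
As a small illustration of the Python side, the sketch below imports a standard-library module under an alias and also pulls a single function into the namespace. The statistics module is used here only as a stand-in, so that the snippet runs without installing anything; the pattern is the same for pandas and pd.

```python
# Import a module under a short alias, the way pandas is usually imported as pd
import statistics as st

# Import a single function so it can be called without any prefix
from statistics import median

values = [2, 4, 4, 5]

# alias.function_name(): the module is named explicitly through its alias
average = st.mean(values)

# No prefix needed, because median was imported by name
middle = median(values)

print(average, middle)
```

The second form is the closest Python gets to R's habit of calling functions without a package prefix after library().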

Assignment

In R, the convention is to assign a value to an object with an arrow: obj_name <- value. A single equals sign is also allowed, but in R it is used mainly to pass values to function arguments.

In Python, assignment is always done with a single equals sign: obj_name = value.

Indexing

There are significant differences here as well. In R, indexing starts at one, and a range includes every element specified, including both endpoints.

In Python, indexing starts at zero, and the selected range does not include the last element named in the index. So the expression x[i:j] in Python does not include element j.

Negative indexing differs too. In R, x[-1] returns all elements of the vector except the first one. In Python, the same notation returns only the last element.
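
The Python indexing rules can be checked on a plain list. This is only a minimal sketch; the list x and its values are made up for the illustration.

```python
x = [10, 20, 30, 40, 50]

first = x[0]           # indexing starts at zero
middle = x[1:3]        # positions 1 and 2; position 3 is not included
last = x[-1]           # a negative index counts from the end
all_but_first = x[1:]  # everything except the first element

print(first, middle, last, all_but_first)
```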

Methods and OOP

R implements OOP in its own way; I wrote about this in the article "OOP in the R language (part 1): S3 classes". In general, R is a functional language, and everything in it is built on functions. That is why, for Excel users for example, moving to tidyverse will be easier than moving to pandas. Though that may just be my subjective opinion.

In short, objects in R do not have methods (at least for S3 classes; other OOP implementations exist, but they are much less common). There are only generic functions that process objects differently depending on their class.
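
For Python readers, the closest everyday analogue of a generic function is probably functools.singledispatch from the standard library: one function name, with the implementation chosen by the class of the first argument. The sketch below is only an analogy and does not show how R implements S3; the describe function is hypothetical.

```python
from functools import singledispatch


@singledispatch
def describe(obj):
    # Fallback used when no implementation is registered for the class
    return "generic object"


@describe.register
def _(obj: int):
    # Chosen when the argument is an int
    return "integer " + str(obj)


@describe.register
def _(obj: list):
    # Chosen when the argument is a list
    return "list of " + str(len(obj)) + " items"


results = [describe(3), describe([1, 2]), describe(2.5)]
print(results)
```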

Pipelines

Perhaps this name is not entirely accurate for pandas, but I will try to explain the idea.

To avoid saving intermediate calculations and cluttering the working environment with unnecessary objects, you can use a kind of pipeline: pass the result of one function straight to the next, without storing intermediate results.

Take the following code example, in which we store intermediate calculations in separate objects:

temp_object <- func1()
temp_object2 <- func2(temp_object)
obj <- func3(temp_object2)

We performed 3 operations in sequence, and the result of each one was saved in a separate object. In fact, we do not need these intermediate objects.

Or, even worse, but more familiar to Excel users:

obj  <- func3(func2(func1()))

In this case we did not save the intermediate results, but code with nested functions is extremely inconvenient to read.

We will look at several approaches to data processing in R, and they perform similar operations in different ways.

In the tidyverse library, pipelines are implemented with the %>% operator.

obj <- func1() %>% 
            func2() %>%
            func3()

Here we take the result of func1() and pass it as the first argument to func2(), then pass the result of that calculation as the first argument to func3(). Finally, obj <- writes the result of the whole chain to the object obj.

data.table uses chains in a similar way.

newDT <- DT[where, select|update|do, by][where, select|update|do, by][where, select|update|do, by]

In each pair of square brackets you can use the result of the previous operation.

In pandas, such operations are separated by a dot.

obj = df.fun1().fun2().fun3()

That is, we take our table df and call its method fun1(), then call fun2() on the result, and then fun3(). The final result is saved in the object obj.
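
Here is a runnable sketch of such a chain with real pandas methods in place of the placeholders fun1(), fun2() and fun3(). The small df table and its values are invented for the example. Wrapping the chain in parentheses lets each step sit on its own line.

```python
import pandas as pd

# Toy data standing in for a real table
df = pd.DataFrame({"source": ["google", "bing", "google"],
                   "sessions": [10, 3, 7]})

obj = (
    df
    .loc[lambda d: d["source"] == "google"]         # step 1: keep google rows
    .assign(doubled=lambda d: d["sessions"] * 2)    # step 2: add a column
    .sort_values("sessions")                        # step 3: sort the rows
    .reset_index(drop=True)                         # tidy up the row labels
)

print(obj)
```

Each method returns a new dataframe, which is what makes the next call in the chain possible.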

Data structures

The data structures in R and Python are similar, but they have different names.

Description | Name in R | Name in Python/pandas
Tabular structure | data.frame, data.table, tibble | DataFrame
One-dimensional list of values | Vector | Series in pandas, or a list in pure Python
Multi-level non-tabular structure | List | Dictionary (dict)

We will look at other features and syntax differences below.

A few words about the packages we will use

First, a little about the packages you will get to know in this article.

tidyverse

Official website: tidyverse.org

The tidyverse library was written by Hadley Wickham, a senior research scientist at RStudio. tidyverse consists of an impressive set of packages that simplify data processing, 5 of which are among the top 10 downloads from the CRAN repository.

The core of the library consists of the following packages: ggplot2, dplyr, tidyr, readr, purrr, tibble, stringr, forcats. Each of them is aimed at a specific task. For example, dplyr is designed for data manipulation, tidyr brings data into a tidy form, stringr simplifies working with strings, and ggplot2 is one of the most popular data visualization tools.

The advantage of tidyverse is its simple, easy-to-read syntax, which in many ways resembles the SQL query language.

data.table

Official website: r-datatable.com

The author of data.table is Matt Dowle of H2O.ai.

The first release of the library took place in 2006.

The package's syntax is not as convenient as that of tidyverse and is closer to classic dataframes in R, but it is significantly extended in functionality.

All table manipulations in this package are written inside square brackets, and if you translate data.table syntax into SQL you get something like this: data.table[ WHERE, SELECT, GROUP BY ]

The strength of this package is its speed when processing large amounts of data.

pandas

Official website: pandas.pydata.org

The name of the library comes from the econometric term "panel data", which describes multidimensional structured data sets.

The author of pandas is the American developer Wes McKinney.

When it comes to data analysis in Python, pandas has no equal. It is a very versatile, high-level package that lets you perform any manipulation on data, from loading it from any source to visualizing it.

Installing additional packages

The packages discussed in this article are not included in the base distributions of R and Python. There is one small caveat: if you installed the Anaconda distribution, you do not need to install pandas separately.

Installing packages in R

If you have opened the RStudio development environment at least once, you probably already know how to install a package in R. To install packages, use the standard command install.packages(), running it directly in R.

# install the packages
install.packages("vroom")
install.packages("readr")
install.packages("dplyr")
install.packages("data.table")

After installation, the packages need to be attached, which in most cases is done with the library() command.

# attach (import) the packages into the working environment
library(vroom)
library(readr)
library(dplyr)
library(data.table)

Installing packages in Python

If you have a plain Python installation, pandas has to be installed manually. Open a command prompt or a terminal, depending on your operating system, and enter the following command.

pip install pandas

Then we return to Python and import the installed package with the import command.

import pandas as pd

Loading data

Loading data is one of the most important steps in data analysis. Both Python and R give you extensive options for obtaining data from almost any source: local files, files on the Internet, websites and all kinds of databases.

Throughout the article we will use several datasets:

  1. Two exports from Google Analytics.
  2. The Titanic passenger dataset.

All the data is on my GitHub as csv and tsv files, and that is where we will request it from.

Loading data in R: tidyverse, vroom, readr

The tidyverse library has two packages for loading data: vroom and readr. vroom is the more modern of the two, but the packages may be merged in the future.

A quote from the official vroom documentation:

vroom vs readr
What does the release of vroom mean for readr? For now we plan to let the two packages evolve separately, but likely we will unite the packages in the future. One disadvantage to vroom's lazy reading is certain data problems can't be reported up front, so how best to unify them requires some thought.

In this article we will look at loading data with both packages:

Loading data in R: the vroom package

# install.packages("vroom")
library(vroom)

# Read the data
## vroom
ga_nov  <- vroom("https://raw.githubusercontent.com/selesnow/publications/master/data_example/r_python_data/ga_nowember.csv")
ga_dec  <- vroom("https://raw.githubusercontent.com/selesnow/publications/master/data_example/r_python_data/ga_december.csv")
titanic <- vroom("https://raw.githubusercontent.com/selesnow/publications/master/data_example/r_python_data/titanic.csv")

Loading data in R: readr

# install.packages("readr")
library(readr)

# Read the data
## readr
ga_nov  <- read_tsv("https://raw.githubusercontent.com/selesnow/publications/master/data_example/r_python_data/ga_nowember.csv")
ga_dec  <- read_tsv("https://raw.githubusercontent.com/selesnow/publications/master/data_example/r_python_data/ga_december.csv")
titanic <- read_csv("https://raw.githubusercontent.com/selesnow/publications/master/data_example/r_python_data/titanic.csv")

In the vroom package, loading is done by the function of the same name, vroom(), regardless of whether the data is in csv or tsv format. In the readr package we use a different function for each format: read_tsv() and read_csv().

Loading data in R: data.table

data.table has the fread() function for loading data.

Loading data in R: the data.table package

# install.packages("data.table")
library(data.table)

## data.table
ga_nov  <- fread("https://raw.githubusercontent.com/selesnow/publications/master/data_example/r_python_data/ga_nowember.csv")
ga_dec  <- fread("https://raw.githubusercontent.com/selesnow/publications/master/data_example/r_python_data/ga_december.csv")
titanic <- fread("https://raw.githubusercontent.com/selesnow/publications/master/data_example/r_python_data/titanic.csv")

Loading data in Python: pandas

Of the R packages, readr has the syntax closest to pandas, because pandas can request data from almost anywhere and has a whole family of read_*() functions for this.

  • read_csv()
  • read_excel()
  • read_sql()
  • read_json()
  • read_html()

There are many other functions for reading data from various formats. For our purposes, read_table() or read_csv() is enough, with the sep argument specifying the column separator.

Loading data in Python: pandas

import pandas as pd

ga_nov  = pd.read_csv("https://raw.githubusercontent.com/selesnow/publications/master/data_example/russian_text_in_r/ga_nowember.csv", sep = "\t")
ga_dec  = pd.read_csv("https://raw.githubusercontent.com/selesnow/publications/master/data_example/russian_text_in_r/ga_december.csv", sep = "\t")
titanic = pd.read_csv("https://raw.githubusercontent.com/selesnow/publications/master/data_example/russian_text_in_r/titanic.csv")
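
The column separator is easy to get wrong, so here is a small offline sketch of the same call. The inline text stands in for a tab-separated file, and its column names and values are invented for the illustration. The separator passed to sep is the tab character, written "\t".

```python
import io

import pandas as pd

# A tiny tab-separated "file" held in memory instead of being downloaded
tsv_text = (
    "date\tsource\tsessions\n"
    "2019-11-01\tgoogle\t10\n"
    "2019-11-02\tbing\t3\n"
)

# sep="\t" tells read_csv that columns are separated by tabs, not commas
ga_sample = pd.read_csv(io.StringIO(tsv_text), sep="\t")

print(ga_sample)
```

If the separator does not match the file, pandas typically reads every line into a single column, which is a quick way to spot the mistake.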

Creating dataframes

The titanic table that we loaded has a Sex field, which stores the passenger's gender code.

For a more readable presentation of the data by passenger gender, it is better to use the gender name rather than the code.

To do this, we will create a small lookup table with just 2 columns (code and gender name) and, accordingly, 2 rows.

Creating a dataframe in R: tidyverse, dplyr

In the code example below, we create the required dataframe with the tibble() function.

Creating a dataframe in R: dplyr

## dplyr
### create the lookup table
gender <- tibble(id = c(1, 2),
                 gender = c("female", "male"))

Creating a dataframe in R: data.table

## data.table
### create the lookup table
gender <- data.table(id = c(1, 2),
                    gender = c("female", "male"))

Creating a dataframe in Python: pandas

In pandas, a dataframe is created in several steps: first we create a dictionary, then we convert the dictionary into a dataframe.

Creating a dataframe in Python: pandas

# create the dictionary
gender_dict = {'id': [1, 2],
               'gender': ["female", "male"]}
# convert the dictionary into a dataframe
gender = pd.DataFrame.from_dict(gender_dict)

Selecting columns

The tables you work with can contain dozens or even hundreds of columns. For analysis, as a rule, you do not need every column in the source table.

So one of the first operations you will perform on a source table is to strip out unnecessary information and free the memory it occupies.

Selecting columns in R: tidyverse, dplyr

dplyr syntax is very similar to the SQL query language; if you are familiar with SQL, you will pick up this package quickly.

To select columns, use the select() function.

Below are code examples that select columns in the following ways:

  • By listing the names of the required columns
  • By matching column names against a regular expression
  • By data type or any other property of the data in the column

Selecting columns in R: dplyr

# Selecting the required columns
## dplyr
### select by column name
select(ga_nov, date, source, sessions)
### exclude by column name
select(ga_nov, -medium, -bounces)
### select by regular expression: columns whose names end in s
select(ga_nov, matches("s$"))
### select by condition: integer columns only
select_if(ga_nov, is.integer)

Selecting columns in R: data.table

The same operations look a little different in data.table. At the beginning of the article I described which arguments go inside the square brackets in data.table.

DT[i,j,by]

Where:
i - where, i.e. filtering by rows
j - select|update|do, i.e. selecting and transforming columns
by - grouping the data

Selecting columns in R: data.table

## data.table
### select by column name
ga_nov[ , .(date, source, sessions) ]
### exclude by column name
ga_nov[ , .SD, .SDcols = ! names(ga_nov) %like% "medium|bounces" ]
### select by regular expression
ga_nov[, .SD, .SDcols = patterns("s$")]

The .SD variable gives you access to all columns, and .SDcols filters them down to the columns you need, using regular expressions or other functions that work on column names.

Selecting columns in Python: pandas

To select columns by name in pandas, it is enough to pass a list of their names. To exclude columns by name or select them with a regular expression, use the drop() and filter() functions with the argument axis=1, which indicates that columns rather than rows should be processed.

To select fields by data type, use the select_dtypes() function, passing a list of the data types you want to the include or exclude argument.

Selecting columns in Python: pandas

# Select fields by name
ga_nov[['date', 'source', 'sessions']]
# Exclude by name
ga_nov.drop(['medium', 'bounces'], axis=1)
# Select by regular expression
ga_nov.filter(regex="s$", axis=1)
# Select numeric fields
ga_nov.select_dtypes(include=['number'])
# Select text fields
ga_nov.select_dtypes(include=['object'])
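
To try these calls without downloading anything, the sketch below applies them to a small hand-made table. The ga_sample dataframe and its values are assumptions made for the illustration; only the column names mirror the ones used above.

```python
import pandas as pd

# Toy stand-in for the Google Analytics export
ga_sample = pd.DataFrame({
    "date": ["2019-11-01", "2019-11-02"],
    "source": ["google", "bing"],
    "medium": ["organic", "cpc"],
    "sessions": [10, 3],
    "bounces": [4, 1],
})

by_name = ga_sample[["date", "source", "sessions"]]      # list of names
dropped = ga_sample.drop(["medium", "bounces"], axis=1)  # exclude by name
ends_with_s = ga_sample.filter(regex="s$", axis=1)       # names ending in s
numeric = ga_sample.select_dtypes(include=["number"])    # numeric columns

print(list(ends_with_s.columns), list(numeric.columns))
```

None of these calls modifies ga_sample; each returns a new dataframe, so assign the result if you want to keep it.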

Filtering rows

For example, the source table may contain several years of data while you only need to analyze the last month. Again, the extra rows slow down processing and clog your PC's memory.

Filtering rows in R: tidyverse, dplyr

In dplyr, rows are filtered with the filter() function. It takes a dataframe as its first argument, followed by the filtering conditions.

When writing logical expressions to filter a table, specify column names without quotes and without the table name.

When filtering by several logical expressions, combine them with the following operators:

  • & or a comma - logical AND
  • | - logical OR

Filtering rows in R: dplyr

# filtering rows
## dplyr
### filter rows by a single condition
filter(ga_nov, source == "google")
### filter by two conditions joined with logical AND
filter(ga_nov, source == "google" & sessions >= 10)
### filter by two conditions joined with logical OR
filter(ga_nov, source == "google" | sessions >= 10)

Filtering rows in R: data.table

As I wrote above, in data.table the data transformation syntax is enclosed in square brackets.

DT[i,j,by]

Where:
i - where, i.e. filtering by rows
j - select|update|do, i.e. selecting and transforming columns
by - grouping the data

Rows are filtered with the i argument, which occupies the first position inside the square brackets.

Columns are referenced in logical expressions without quotes and without the table name.

Logical expressions are combined in the same way as in dplyr, with the & and | operators.

Filtering rows in R: data.table

## data.table
### filter rows by a single condition
ga_nov[source == "google"]
### filter by two conditions joined with logical AND
ga_nov[source == "google" & sessions >= 10]
### filter by two conditions joined with logical OR
ga_nov[source == "google" | sessions >= 10]

Filtering rows in Python: pandas

Filtering rows in pandas is similar to filtering in data.table and is also done in square brackets.

Here, columns must be referenced through the name of the dataframe; the column name can then be given either in quotes inside square brackets (for example df['col_name']) or without quotes after a dot (for example df.col_name).

If you need to filter a dataframe by several conditions, each condition must be enclosed in parentheses. The conditions are combined with the & and | operators.

Filtering rows in Python: pandas

# Filtering table rows
### filter rows by a single condition
ga_nov[ ga_nov['source'] == "google" ]
### filter by two conditions joined with logical AND
ga_nov[(ga_nov['source'] == "google") & (ga_nov['sessions'] >= 10)]
### filter by two conditions joined with logical OR
ga_nov[(ga_nov['source'] == "google") | (ga_nov['sessions'] >= 10)]
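
The parentheses matter because in Python & and | bind more tightly than comparisons such as == and >=. The sketch below repeats the filters on a toy table and adds the query() method as an alternative spelling. The ga_sample data is invented for the illustration, and query() is offered here only as an option, not as the approach this article relies on.

```python
import pandas as pd

ga_sample = pd.DataFrame({
    "source": ["google", "bing", "google", "yandex"],
    "sessions": [12, 15, 4, 2],
})

# Each comparison is wrapped in parentheses before combining with & or |
both = ga_sample[(ga_sample["source"] == "google") & (ga_sample["sessions"] >= 10)]
either = ga_sample[(ga_sample["source"] == "google") | (ga_sample["sessions"] >= 10)]

# query() takes the condition as a string and lets you drop the dataframe name
same_as_both = ga_sample.query("source == 'google' and sessions >= 10")

print(len(both), len(either))
```

Inside query() the columns are written without quotes and without the table name, which is close to how filter() in dplyr and the i argument in data.table read.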

Grouping and aggregating data

Grouping and aggregation are among the most frequently used operations in data analysis.

The syntax for these operations differs in each of the packages we are looking at.

As an example, we will take the titanic dataframe and calculate the number and the average cost of tickets by cabin class.

Grouping and aggregating data in R: tidyverse, dplyr

In dplyr, the group_by() function is used for grouping and summarise() for aggregation. In fact, dplyr has a whole family of summarise_*() functions, but the purpose of this article is to compare the basic syntax, so we will not venture into that jungle.

Basic aggregation functions:

  • sum() - sum
  • min() / max() - minimum and maximum value
  • mean() - mean
  • median() - median
  • length() - count

Grouping and aggregation in R: dplyr

## dplyr
### group and aggregate rows
group_by(titanic, Pclass) %>%
  summarise(passangers = length(PassengerId),
            avg_price  = mean(Fare))

We passed the titanic table to group_by() as its first argument and then specified the Pclass field, by which the table is grouped. The result of this operation is passed by the %>% operator as the first argument to summarise(), where we added 2 more fields: passangers and avg_price. In the first, the length() function counts the tickets; in the second, the mean() function gives the average ticket price.

Grouping and aggregating data in R: data.table

In data.table, aggregation is done in the j argument, which occupies the second position inside the square brackets, and grouping is done with by or keyby, which occupy the third position.

The list of aggregation functions is the same as the one given for dplyr, because these are functions from base R.

Grouping and aggregation in R: data.table

## data.table
### group and aggregate rows
titanic[, .(passangers = length(PassengerId),
            avg_price  = mean(Fare)),
        by = Pclass]

Grouping and aggregating data in Python: pandas

Grouping in pandas is similar to dplyr, but aggregation resembles neither dplyr nor data.table.

For grouping, use the groupby() method and pass it a list of the columns by which the dataframe should be grouped.

For aggregation you can use the agg() method, which accepts a dictionary. The keys of the dictionary are the columns to aggregate, and the values are the names of the aggregation functions.

Aggregation functions:

  • sum() - sum
  • min() / max() - minimum and maximum value
  • mean() - mean
  • median() - median
  • count() - count

The reset_index() function in the example below resets the nested index that pandas sets by default after aggregating data.

The \ symbol lets you continue the statement on the next line.

Grouping and aggregation in Python: pandas

# group and aggregate the data
titanic.groupby(["Pclass"]).\
    agg({'PassengerId': 'count', 'Fare': 'mean'}).\
        reset_index()
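
A self-contained variant of the same aggregation is sketched below on a toy table. It makes two stylistic choices that are not part of the example above: the chain is wrapped in parentheses so each step can sit on its own line, and it uses pandas named aggregation so the output columns get the same names as in the R examples (with the spelling passengers). The titanic_sample data is invented for the illustration.

```python
import pandas as pd

titanic_sample = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4],
    "Pclass": [1, 1, 3, 3],
    "Fare": [80.0, 60.0, 8.0, 12.0],
})

summary = (
    titanic_sample
    .groupby("Pclass")
    # new_column=(source_column, aggregation_function)
    .agg(passengers=("PassengerId", "count"),
         avg_price=("Fare", "mean"))
    .reset_index()   # turn the Pclass index back into an ordinary column
)

print(summary)
```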

Vertical union of tables

This is an operation that combines two or more tables of the same structure. The data we loaded includes the tables ga_nov and ga_dec. These tables are identical in structure: they have the same columns, with the same data types in those columns.

They are exports from Google Analytics for November and December; in this section we will combine them into one table.

Vertical union of tables in R: tidyverse, dplyr

In dplyr you can combine 2 tables into one with the bind_rows() function, passing the tables as its arguments.

Vertical union of tables in R: dplyr

# Vertical union of tables
## dplyr
bind_rows(ga_nov, ga_dec)

Vertical union of tables in R: data.table

Nothing complicated here either: we use rbind().

Vertical union of tables in R: data.table

## data.table
rbind(ga_nov, ga_dec)

Vertical union of tables in Python: pandas

In pandas, tables are combined with the concat() function, to which you pass a list of the dataframes to combine.

Vertical union of tables in Python: pandas

# vertical union of tables
pd.concat([ga_nov, ga_dec])
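
One detail worth knowing: by default concat() keeps each table's original row labels, so the combined table can contain repeated index values. The sketch below shows this on two toy tables (nov and dec, with invented values) and uses the ignore_index argument to renumber the rows.

```python
import pandas as pd

nov = pd.DataFrame({"date": ["2019-11-30"], "sessions": [10]})
dec = pd.DataFrame({"date": ["2019-12-01", "2019-12-02"], "sessions": [7, 9]})

# Default behaviour: the original row labels are kept
stacked = pd.concat([nov, dec])

# ignore_index=True renumbers the rows from zero
renumbered = pd.concat([nov, dec], ignore_index=True)

print(stacked.index.tolist(), renumbered.index.tolist())
```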

Horizontal join of tables

This is an operation that adds columns from a second table to the first one by key. It is often used to enrich a fact table (for example, a table of sales data) with reference data (for example, the price of a product).

There are several types of joins, including inner, left, right and full (outer) joins.

In the titanic table we loaded earlier there is a Sex column, which holds the passenger's gender code:

1 - female
2 - male

We have also created a lookup table of genders. For a more readable presentation of the data on passenger gender, we need to add the gender name from the gender lookup table to the titanic table.

Horizontal join of tables in R: tidyverse, dplyr

dplyr has a whole family of functions for horizontal joins:

  • inner_join()
  • left_join()
  • right_join()
  • full_join()
  • semi_join()
  • nest_join()
  • anti_join()

The one I use most often in practice is left_join().

As their first two arguments, the functions listed above take the two tables to join; in the third argument, by, you specify the columns to join on.

Horizontal join of tables in R: dplyr

# join the tables
left_join(titanic, gender,
          by = c("Sex" = "id"))

Horizontal join of tables in R: data.table

In data.table, tables are joined by key with the merge() function.

Arguments of the merge() function in data.table

  • x, y - the tables to join
  • by - the key column, if it has the same name in both tables
  • by.x, by.y - the names of the columns to join on, if they are named differently in the two tables
  • all, all.x, all.y - the join type: all returns all rows from both tables, all.x corresponds to a LEFT JOIN (keeps all rows of the first table), and all.y corresponds to a RIGHT JOIN (keeps all rows of the second table)

Horizontal join of tables in R: data.table

# join the tables
merge(titanic, gender, by.x = "Sex", by.y = "id", all.x = TRUE)

Horizontal join of tables in Python: pandas

Just as in data.table, tables in pandas are joined with the merge() function.

Arguments of the merge() function in pandas

  • how - the join type: left, right, outer, inner
  • on - the key column, if it has the same name in both tables
  • left_on, right_on - the names of the key columns, if they are named differently in the two tables

Horizontal join of tables in Python: pandas

# join by key
titanic.merge(gender, how = "left", left_on = "Sex", right_on = "id")
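
The sketch below runs the same kind of left join on toy data to show what happens to rows without a match. The passengers table, its names and the unmatched code 3 are invented for the illustration; the gender lookup table follows the one created earlier.

```python
import pandas as pd

passengers = pd.DataFrame({"Name": ["Anna", "Boris", "Clara"],
                           "Sex": [1, 2, 3]})   # code 3 has no match
gender = pd.DataFrame({"id": [1, 2],
                       "gender": ["female", "male"]})

# A left join keeps every row of the left table
joined = passengers.merge(gender, how="left", left_on="Sex", right_on="id")

print(joined)
```

Rows of the left table that find no partner are kept, with missing values in the columns that come from the right table.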

Basic window functions and calculated columns

Window functions are close in meaning to aggregate functions and are also widely used in data analysis. Unlike aggregate functions, however, window functions do not change the number of rows in the output dataframe.

In essence, a window function splits the incoming dataframe into parts by some criterion, i.e. by the value of one or more fields, and performs arithmetic operations within each window. The result of these operations is returned on every row, i.e. without changing the total number of rows in the table.

As an example, take the titanic table. We can calculate what percentage of its cabin class's total fares each ticket accounts for.

To do this, we need to obtain, on every row, the total ticket cost for the cabin class that the ticket on that row belongs to, and then divide each ticket's fare by the total fare of all tickets in the same cabin class.
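
The difference between aggregating and windowing can be sketched in a few lines of pandas. The five-row table below is hypothetical toy data that only borrows the Pclass and Fare column names from titanic.

```python
import pandas as pd

# Hypothetical toy data: two tickets in class 1, three in class 2
titanic = pd.DataFrame({
    "Pclass": [1, 1, 2, 2, 2],
    "Fare": [80.0, 20.0, 10.0, 30.0, 10.0],
})

# Aggregation: one row per class
aggregated = titanic.groupby("Pclass")["Fare"].sum()

# Window: the class total repeated on every original row
windowed = titanic.groupby("Pclass")["Fare"].transform("sum")

# Each ticket's share of the total fare of its class
share = titanic["Fare"] / windowed

print(len(aggregated), len(windowed))
print(share.tolist())
```

The aggregated result has one row per class, while the windowed result has as many rows as the source table, which is what makes the row-by-row division possible.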

Window functions in R: tidyverse, dplyr

To add new columns without collapsing rows into groups, dplyr provides the mutate() function.

You can solve the problem described above by grouping the data by the Pclass field and summing the Fare field into a new column. Then ungroup the table and divide the Fare values by the result obtained in the previous step.

Window functions in R: dplyr

group_by(titanic, Pclass) %>%
  mutate(Pclass_cost = sum(Fare)) %>%
  ungroup() %>%
  mutate(ticket_fare_rate = Fare / Pclass_cost)

Window functions in R: data.table

The solution algorithm is the same as in dplyr: we split the table into windows by the Pclass field, put the total for the corresponding group into a new column on every row, and add a column with each ticket's share of the fares in its group.

To add new columns, data.table provides the := operator. Below is an example of solving the problem with the data.table package.

Window functions in R: data.table

# Pclass_cost does not exist yet while j is evaluated, so the share is computed from sum(Fare) directly
titanic[, c("Pclass_cost", "ticket_fare_rate") := .(sum(Fare), Fare / sum(Fare)),
        by = Pclass]

Window functions in Python: pandas

One way to add a new column in pandas is the assign() function. To sum ticket fares by cabin class without collapsing rows into groups, we will use the transform() function.

Below is an example solution that adds the same 2 columns to the titanic table.

Window functions in Python: pandas

titanic.assign(Pclass_cost=titanic.groupby('Pclass').Fare.transform('sum'),
               ticket_fare_rate=lambda x: x['Fare'] / x['Pclass_cost'])
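
A possible variation, shown here as a sketch rather than the author's method: if both new columns are written as lambdas, assign() computes them from the dataframe it is called on, so the same code keeps working in the middle of a method chain, for example after a filter. The toy data and the Fare > 0 filter are assumptions added for illustration.

```python
import pandas as pd

# Hypothetical toy data; one ticket has a zero fare and is filtered out below
titanic = pd.DataFrame({
    "Pclass": [1, 1, 2, 2, 2],
    "Fare": [80.0, 20.0, 0.0, 30.0, 10.0],
})

result = (
    titanic
    .query("Fare > 0")  # illustrative filter, not part of the original task
    .assign(
        # Both lambdas receive the already filtered dataframe
        Pclass_cost=lambda df: df.groupby("Pclass")["Fare"].transform("sum"),
        ticket_fare_rate=lambda df: df["Fare"] / df["Pclass_cost"],
    )
)

# Sanity check: the shares within each class should add up to 1
share_totals = result.groupby("Pclass")["ticket_fare_rate"].sum()
print(result)
print(share_totals)
```

The second lambda can refer to Pclass_cost because assign() evaluates its keyword arguments in order.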

Function and method correspondence table

Below is a table of correspondences between the methods for performing various data operations in the packages we have covered.

Operation | tidyverse | data.table | pandas
Loading data | vroom() / readr::read_csv() / readr::read_tsv() | fread() | read_csv()
Creating dataframes | tibble() | data.table() | dict() + from_dict()
Selecting columns | select() | argument j, second position in the square brackets | pass the list of required columns in square brackets / drop() / filter() / select_dtypes()
Filtering rows | filter() | argument i, first position in the square brackets | list the filter conditions in square brackets / filter() (by index label)
Grouping and aggregation | group_by() + summarise() | arguments j + by | groupby() + agg()
Vertical union of tables (UNION) | bind_rows() | rbind() | concat()
Horizontal join of tables (JOIN) | left_join() / *_join() | merge() | merge()
Basic window functions and adding calculated columns | group_by() + mutate() | argument j with the := operator + argument by | transform() + assign()
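
As a quick illustration of the pandas column of the table, the sketch below runs several of the listed operations on a small hypothetical dataframe. The column names follow titanic, but the values are made up for the example.

```python
import pandas as pd

# Creating a dataframe: dict + from_dict()
df = pd.DataFrame.from_dict({
    "Pclass": [1, 2, 3, 3],
    "Fare": [80.0, 30.0, 8.0, 7.5],
    "Name": ["Anna", "Boris", "Clara", "Dmitri"],
})

# Selecting columns: list in square brackets, select_dtypes(), drop()
by_name = df[["Pclass", "Fare"]]
numeric = df.select_dtypes(include="number")
without_name = df.drop(columns=["Name"])

# Filtering rows: condition in square brackets
cheap = df[df["Fare"] < 10]

# Grouping and aggregation: groupby() + agg()
per_class = df.groupby("Pclass").agg(total_fare=("Fare", "sum"))

# Vertical union: concat()
stacked = pd.concat([df, df], ignore_index=True)

print(per_class)
```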

Conclusion

The data processing implementations I described in this article may not be the most optimal ones, so I would be glad if you corrected my mistakes in the comments, or simply supplemented the article with other techniques for working with data in R / Python.

As I wrote above, the goal of the article is not to impose an opinion about which language is better, but to make it easier to learn both languages or, when necessary, to migrate between them.

If you liked the article, I would be glad to have new subscribers on my YouTube and Telegram channels.

Poll

Which of the following packages do you use in your work?

In the comments you can explain the reason for your choice.

Which data processing package do you use? (You can select several options.)

  • tidyverse: 45.2% (19 votes)

  • data.table: 33.3% (14 votes)

  • pandas: 54.8% (23 votes)

42 users voted. 9 users abstained.

Source: www.habr.com
