The R package tidyr and its new functions pivot_longer and pivot_wider

The tidyr package is part of the core of one of the most popular libraries in the R language, the tidyverse.
The main purpose of the package is to bring data into a tidy form.

Habr already has a publication devoted to this package, but it dates from 2015. I want to tell you about the most recent changes, which were announced a few days ago by the package author, Hadley Wickham.

S.J.K.: Will gather() and spread() be deprecated?

Hadley Wickham: To some extent. We will no longer recommend using these functions or fix bugs in them, but they will remain in the package in their current state.

If you are interested in data analysis, you may also be interested in my telegram and youtube channels. Most of the content is devoted to the R language.

The TidyData concept

The goal of tidyr is to help you bring data into what is called tidy form. Tidy data is data in which:

  • Each variable is in a column.
  • Each observation is a row.
  • Each value is a cell.

Data in tidy form is much simpler and more convenient to work with during analysis.

The main functions in the tidyr package

tidyr contains a set of functions designed to transform tables:

  • fill() - fills missing values in a column with the previous values;
  • separate() - splits one field into several using a separator;
  • unite() - combines several fields into one, the inverse of separate();
  • pivot_longer() - converts data from wide format to long format;
  • pivot_wider() - converts data from long format to wide format. The inverse of pivot_longer().
  • gather() (outdated) - converts data from wide format to long format;
  • spread() (outdated) - converts data from long format to wide format. The inverse of gather().
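
To make the first three functions in the list more concrete, here is a minimal sketch on a made-up table. The sales data frame and its column names are hypothetical and exist only for illustration.

```r
library(tidyr)

# Hypothetical toy table: the region is written only on the first row of each group
sales <- data.frame(
  region  = c("North", NA, "South", NA),
  period  = c("2019-Q1", "2019-Q2", "2019-Q1", "2019-Q2"),
  revenue = c(10, 12, 7, 9),
  stringsAsFactors = FALSE
)

# fill(): carry the last non-missing region down the column
filled <- fill(sales, region)

# separate(): split "2019-Q1" into two fields on the "-" separator
split_up <- separate(filled, period, into = c("year", "quarter"), sep = "-")

# unite(): the inverse operation, gluing the two fields back into one
reunited <- unite(split_up, "period", year, quarter, sep = "-")
```

After the last step the period column should look the same as in the original table, which shows that unite() undoes separate().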

A new concept for converting data from wide to long format and vice versa

Previously, gather() and spread() were used for this kind of transformation. Over the years these functions have existed, it became clear that for most users, including the package author, their names and their arguments were not obvious. That made the functions hard to find, and it was hard to remember which one converts a data frame from wide to long format and which one does the reverse.

For this reason, two important new functions designed to transform data frames have been added to tidyr.

The new functions pivot_longer() and pivot_wider() were inspired by some of the features of the cdata package, created by John Mount and Nina Zumel.

Installing the latest version, tidyr 0.8.3.9000

To install the newest version of the package, tidyr 0.8.3.9000, which contains the new functions, use the following code.

devtools::install_github("tidyverse/tidyr")

At the time of writing, these functions are available only in the dev version of the package on GitHub.

Switching to the new functions

In fact, it is not hard to port old scripts to the new functions. For clarity, I will take an example from the documentation of the old functions and show how the same operations are performed with the new pivot_*() functions.

Converting wide format to long format.

Example code from the documentation of the gather function

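# example
library(tidyr)  # provides gather(), spread(), pivot_longer(), pivot_wider()
library(dplyr)  # provides the %>% pipe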
stocks <- data.frame(
  time = as.Date('2009-01-01') + 0:9,
  X = rnorm(10, 0, 1),
  Y = rnorm(10, 0, 2),
  Z = rnorm(10, 0, 4)
)

# old
stocks_gather <- stocks %>% gather(key   = stock, 
                                   value = price, 
                                   -time)

# new
stocks_long   <- stocks %>% pivot_longer(cols      = -time, 
                                       names_to  = "stock", 
                                       values_to = "price")

Converting long format to wide format.

Example code from the documentation of the spread function

# old
stocks_spread <- stocks_gather %>% spread(key = stock, 
                                          value = price) 

# new 
stock_wide    <- stocks_long %>% pivot_wider(names_from  = "stock",
                                            values_from = "price")

In the pivot_longer() and pivot_wider() examples above, the original stocks table has no columns with the names given in the names_to and values_to arguments, so those names must be written in quotes.

A table to help you work out more easily how to switch to the new tidyr concept.

A note from the author

All the text below is an adaptation, I would even say a free translation, of the vignette on the official website of the tidyverse library.

A simple example of converting data from wide to long format

pivot_longer() - makes a dataset longer by reducing the number of columns and increasing the number of rows.

To run the examples in this article, first load the required packages:

library(tidyr)
library(dplyr)
library(readr)

Suppose we have a table with the results of a survey that (among other things) asked people about their religion and annual income:

#> # A tibble: 18 x 11
#>    religion `<$10k` `$10-20k` `$20-30k` `$30-40k` `$40-50k` `$50-75k`
#>    <chr>      <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>
#>  1 Agnostic      27        34        60        81        76       137
#>  2 Atheist       12        27        37        52        35        70
#>  3 Buddhist      27        21        30        34        33        58
#>  4 Catholic     418       617       732       670       638      1116
#>  5 Don’t k…      15        14        15        11        10        35
#>  6 Evangel…     575       869      1064       982       881      1486
#>  7 Hindu          1         9         7         9        11        34
#>  8 Histori…     228       244       236       238       197       223
#>  9 Jehovah…      20        27        24        24        21        30
#> 10 Jewish        19        19        25        25        30        95
#> # … with 8 more rows, and 4 more variables: `$75-100k` <dbl>,
#> #   `$100-150k` <dbl>, `>150k` <dbl>, `Don't know/refused` <dbl>

This table holds the respondents' religion in rows, while the income levels are spread across the column names. The number of respondents in each category is stored in the cell values at the intersection of religion and income level. To bring the table into a tidy form, it is enough to use pivot_longer():

pew %>% 
  pivot_longer(cols = -religion, names_to = "income", values_to = "count")

#> # A tibble: 180 x 3
#>    religion income             count
#>    <chr>    <chr>              <dbl>
#>  1 Agnostic <$10k                 27
#>  2 Agnostic $10-20k               34
#>  3 Agnostic $20-30k               60
#>  4 Agnostic $30-40k               81
#>  5 Agnostic $40-50k               76
#>  6 Agnostic $50-75k              137
#>  7 Agnostic $75-100k             122
#>  8 Agnostic $100-150k            109
#>  9 Agnostic >150k                 84
#> 10 Agnostic Don't know/refused    96
#> # … with 170 more rows

Arguments of pivot_longer()

  • The first argument, cols, specifies which columns need to be gathered. In this case, all columns except religion.
  • The names_to argument gives the name of the variable that will be created from the names of the columns we gathered.
  • values_to gives the name of the variable that will be created from the data stored in the cell values of the gathered columns.
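
The three arguments can be tried out on a tiny made-up table. This is only a sketch: the scores data frame and the subject and score names are hypothetical.

```r
library(tidyr)

# Hypothetical wide table: one row per student, one column per subject
scores <- data.frame(
  student = c("Ann", "Bob"),
  math    = c(90, 75),
  physics = c(85, 80),
  stringsAsFactors = FALSE
)

long_scores <- pivot_longer(
  scores,
  cols      = -student,   # gather every column except student
  names_to  = "subject",  # former column names go into this new column
  values_to = "score"     # former cell values go into this new column
)
```

Two subject columns for two students become four rows, one per student and subject.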

Specifications

This is new functionality in tidyr that was not available with the legacy functions.

A specification is a data frame in which each row corresponds to one column in the wide form of the data, and which has two special columns whose names start with a dot:

  • .name contains the original column name.
  • .value contains the name of the column that will hold the cell values.

The remaining columns of the specification describe how the new columns will represent the names of the gathered columns taken from .name.

The specification describes the metadata stored in the column name, with one row for each column and one column for each variable, combined with the column name. This definition may seem confusing right now, but it will become much clearer after a few examples.

The point of a specification is that you can retrieve, modify, and define new metadata for the data frame being transformed.

To work with specifications when converting a table from wide to long format, use the pivot_longer_spec() function.

This function takes any data frame and generates its metadata in the form described above.
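
A note for readers on a released tidyr: as far as I know, starting with version 1.0.0 the function that builds a specification is called build_longer_spec(), while pivot_longer_spec() applies a ready specification to a data frame. The sketch below relies on that assumption and uses a made-up table (wide, x_2019 and x_2020 are hypothetical names).

```r
library(tidyr)

# Hypothetical wide table: the year is encoded in the column names
wide <- data.frame(id = 1:2, x_2019 = c(1.5, 2.5), x_2020 = c(3.5, 4.5))

# Build the specification (assumes tidyr >= 1.0.0, where this step
# is done by build_longer_spec())
spec <- build_longer_spec(
  wide,
  cols         = -id,
  names_to     = "year",
  names_prefix = "x_",   # strip the prefix so only the year remains
  values_to    = "x"
)

# Apply the specification to the same data frame
long <- pivot_longer_spec(wide, spec)
```

Printing spec shows the .name and .value columns described above, plus a year column derived from the column names.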

As an example, let's take the who dataset that ships with tidyr. This dataset contains information provided by the World Health Organization on tuberculosis incidence.

who
#> # A tibble: 7,240 x 60
#>    country iso2  iso3   year new_sp_m014 new_sp_m1524 new_sp_m2534
#>    <chr>   <chr> <chr> <int>       <int>        <int>        <int>
#>  1 Afghan… AF    AFG    1980          NA           NA           NA
#>  2 Afghan… AF    AFG    1981          NA           NA           NA
#>  3 Afghan… AF    AFG    1982          NA           NA           NA
#>  4 Afghan… AF    AFG    1983          NA           NA           NA
#>  5 Afghan… AF    AFG    1984          NA           NA           NA
#>  6 Afghan… AF    AFG    1985          NA           NA           NA
#>  7 Afghan… AF    AFG    1986          NA           NA           NA
#>  8 Afghan… AF    AFG    1987          NA           NA           NA
#>  9 Afghan… AF    AFG    1988          NA           NA           NA
#> 10 Afghan… AF    AFG    1989          NA           NA           NA
#> # … with 7,230 more rows, and 53 more variables

Let's build its specification.

spec <- who %>%
  pivot_longer_spec(new_sp_m014:newrel_f65, values_to = "count")

#> # A tibble: 56 x 3
#>    .name        .value name        
#>    <chr>        <chr>  <chr>       
#>  1 new_sp_m014  count  new_sp_m014 
#>  2 new_sp_m1524 count  new_sp_m1524
#>  3 new_sp_m2534 count  new_sp_m2534
#>  4 new_sp_m3544 count  new_sp_m3544
#>  5 new_sp_m4554 count  new_sp_m4554
#>  6 new_sp_m5564 count  new_sp_m5564
#>  7 new_sp_m65   count  new_sp_m65  
#>  8 new_sp_f014  count  new_sp_f014 
#>  9 new_sp_f1524 count  new_sp_f1524
#> 10 new_sp_f2534 count  new_sp_f2534
#> # … with 46 more rows

The fields country, iso2 and iso3 are already variables. Our task is to pivot the columns from new_sp_m014 to newrel_f65.

The names of these columns store the following information:

  • The new_ prefix indicates that the column contains data on new cases of tuberculosis. The current data frame contains only new cases, so in this context the prefix carries no meaning.
  • sp/rel/sn/ep describes how the case was diagnosed.
  • m/f is the patient's gender.
  • 014/1524/2534/3544/4554/5564/65 is the patient's age range.

We can split these columns using the extract() function with a regular expression.

spec <- spec %>%
        extract(name, c("diagnosis", "gender", "age"), "new_?(.*)_(.)(.*)")

#> # A tibble: 56 x 5
#>    .name        .value diagnosis gender age  
#>    <chr>        <chr>  <chr>     <chr>  <chr>
#>  1 new_sp_m014  count  sp        m      014  
#>  2 new_sp_m1524 count  sp        m      1524 
#>  3 new_sp_m2534 count  sp        m      2534 
#>  4 new_sp_m3544 count  sp        m      3544 
#>  5 new_sp_m4554 count  sp        m      4554 
#>  6 new_sp_m5564 count  sp        m      5564 
#>  7 new_sp_m65   count  sp        m      65   
#>  8 new_sp_f014  count  sp        f      014  
#>  9 new_sp_f1524 count  sp        f      1524 
#> 10 new_sp_f2534 count  sp        f      2534 
#> # … with 46 more rows

Note that the .name column must remain unchanged, because it is our index into the column names of the original dataset.

Gender and age (the gender and age columns) have fixed, known values, so it is advisable to convert these columns to factors:

spec <-  spec %>%
            mutate(
              gender = factor(gender, levels = c("f", "m")),
              age = factor(age, levels = unique(age), ordered = TRUE)
            ) 

Finally, to apply the specification we created to the original who data frame, we use the spec argument of pivot_longer().

who %>% pivot_longer(spec = spec)

#> # A tibble: 405,440 x 8
#>    country     iso2  iso3   year diagnosis gender age   count
#>    <chr>       <chr> <chr> <int> <chr>     <fct>  <ord> <int>
#>  1 Afghanistan AF    AFG    1980 sp        m      014      NA
#>  2 Afghanistan AF    AFG    1980 sp        m      1524     NA
#>  3 Afghanistan AF    AFG    1980 sp        m      2534     NA
#>  4 Afghanistan AF    AFG    1980 sp        m      3544     NA
#>  5 Afghanistan AF    AFG    1980 sp        m      4554     NA
#>  6 Afghanistan AF    AFG    1980 sp        m      5564     NA
#>  7 Afghanistan AF    AFG    1980 sp        m      65       NA
#>  8 Afghanistan AF    AFG    1980 sp        f      014      NA
#>  9 Afghanistan AF    AFG    1980 sp        f      1524     NA
#> 10 Afghanistan AF    AFG    1980 sp        f      2534     NA
#> # … with 405,430 more rows

Everything we have just done can be shown schematically as follows:

Specification using multiple values (.value)

In the example above, the .value column of the specification contained only one value, and in most cases that is how it is.

But sometimes you need to gather data from columns whose values have different data types. With the legacy gather() function this would be quite hard to do.

The example below is taken from a vignette for the data.table package.

Let's create a training data frame.

family <- tibble::tribble(
  ~family,  ~dob_child1,  ~dob_child2, ~gender_child1, ~gender_child2,
       1L, "1998-11-26", "2000-01-29",             1L,             2L,
       2L, "1996-06-22",           NA,             2L,             NA,
       3L, "2002-07-11", "2004-04-05",             2L,             2L,
       4L, "2004-10-10", "2009-08-27",             1L,             1L,
       5L, "2000-12-05", "2005-02-28",             2L,             1L,
)
family <- family %>% mutate_at(vars(starts_with("dob")), parse_date)

#> # A tibble: 5 x 5
#>   family dob_child1 dob_child2 gender_child1 gender_child2
#>    <int> <date>     <date>             <int>         <int>
#> 1      1 1998-11-26 2000-01-29             1             2
#> 2      2 1996-06-22 NA                     2            NA
#> 3      3 2002-07-11 2004-04-05             2             2
#> 4      4 2004-10-10 2009-08-27             1             1
#> 5      5 2000-12-05 2005-02-28             2             1

Each row of the resulting data frame holds data on the children of one family. Families may have one or two children. For each child, the date of birth and gender are given, and each child's data sits in separate columns. Our task is to bring this data into the correct format for analysis.

Note that we have two variables with information about each child: gender and date of birth (columns with the dob prefix contain the date of birth, columns with the gender prefix contain the child's gender). In the expected result they should end up in separate columns. We can do this by generating a specification in which the .value column has two different values.

spec <- family %>%
  pivot_longer_spec(-family) %>%
  separate(col = name, into = c(".value", "child"))%>%
  mutate(child = parse_number(child))

#> # A tibble: 4 x 3
#>   .name         .value child
#>   <chr>         <chr>  <dbl>
#> 1 dob_child1    dob        1
#> 2 dob_child2    dob        2
#> 3 gender_child1 gender     1
#> 4 gender_child2 gender     2

So, let's go step by step through what the code above does.

  • pivot_longer_spec(-family) - creates a specification that gathers all existing columns except the family column.
  • separate(col = name, into = c(".value", "child")) - splits the name column, which contains the names of the source fields, on the underscore and puts the resulting values into the .value and child columns.
  • mutate(child = parse_number(child)) - converts the values of the child field from text to a numeric data type.

Now we can apply the resulting specification to the original data frame and bring the table into the desired form.

family %>% 
    pivot_longer(spec = spec, na.rm = TRUE)

#> # A tibble: 9 x 4
#>   family child dob        gender
#>    <int> <dbl> <date>      <int>
#> 1      1     1 1998-11-26      1
#> 2      1     2 2000-01-29      2
#> 3      2     1 1996-06-22      2
#> 4      3     1 2002-07-11      2
#> 5      3     2 2004-04-05      2
#> 6      4     1 2004-10-10      1
#> 7      4     2 2009-08-27      1
#> 8      5     1 2000-12-05      2
#> 9      5     2 2005-02-28      1

We use the na.rm = TRUE argument because the current shape of the data forces the creation of extra rows for observations that do not exist. Since family 2 has only one child, na.rm = TRUE ensures that family 2 has a single row in the output.
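
In released versions of tidyr (1.0.0 and later) the same kind of result can, to my knowledge, be reached without an explicit specification: names_to accepts the special ".value" entry, and values_drop_na plays the role of na.rm. A minimal sketch under that assumption, on a shortened two-family version of the table with dates kept as plain text:

```r
library(tidyr)

# Shortened version of the family table (two families only)
family <- data.frame(
  family        = 1:2,
  dob_child1    = c("1998-11-26", "1996-06-22"),
  dob_child2    = c("2000-01-29", NA),
  gender_child1 = c(1L, 2L),
  gender_child2 = c(2L, NA),
  stringsAsFactors = FALSE
)

children <- pivot_longer(
  family,
  cols           = -family,
  names_to       = c(".value", "child"), # first name part becomes the output column
  names_sep      = "_",                  # split "dob_child1" into "dob" and "child1"
  values_drop_na = TRUE                  # drop the row for the non-existent child
)
```

Family 2 has only one child, so the result has three rows rather than four.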

Converting data frames from long to wide format

pivot_wider() is the inverse transformation: it increases the number of columns of a data frame by reducing the number of rows.

This kind of transformation is rarely used to bring data into tidy form. However, the technique can be useful for creating pivot tables used in presentations, or for integration with other tools.

In fact, pivot_longer() and pivot_wider() are symmetrical and perform mutually inverse operations, i.e. df %>% pivot_longer(spec = spec) %>% pivot_wider(spec = spec) and df %>% pivot_wider(spec = spec) %>% pivot_longer(spec = spec) return the original df.
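
The symmetry is easy to check on a small made-up table. This sketch uses the plain names_to / names_from interface rather than a specification, and the wide, key and val names are hypothetical.

```r
library(tidyr)

# Hypothetical wide table
wide <- data.frame(id = 1:3, a = c(1, 2, 3), b = c(4, 5, 6))

# Wide -> long
long <- pivot_longer(wide, cols = -id, names_to = "key", values_to = "val")

# Long -> wide again: the same columns and values come back
back <- pivot_wider(long, names_from = key, values_from = val)
```

Note that back is a tibble while wide is a plain data frame, so the round trip restores the contents, not necessarily the class.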

The simplest example of converting a table to wide format

To demonstrate how pivot_wider() works, we will use the fish_encounters dataset, which stores information about how various stations record the movement of fish along a river.

#> # A tibble: 114 x 3
#>    fish  station  seen
#>    <fct> <fct>   <int>
#>  1 4842  Release     1
#>  2 4842  I80_1       1
#>  3 4842  Lisbon      1
#>  4 4842  Rstr        1
#>  5 4842  Base_TD     1
#>  6 4842  BCE         1
#>  7 4842  BCW         1
#>  8 4842  BCE2        1
#>  9 4842  BCW2        1
#> 10 4842  MAE         1
#> # … with 104 more rows

In most cases this table will be more informative and easier to use if the information for each station is presented in a separate column.

fish_encounters %>% pivot_wider(names_from = station, values_from = seen)

#> # A tibble: 19 x 12
#>    fish  Release I80_1 Lisbon  Rstr Base_TD   BCE   BCW  BCE2  BCW2   MAE
#>    <fct>   <int> <int>  <int> <int>   <int> <int> <int> <int> <int> <int>
#>  1 4842        1     1      1     1       1     1     1     1     1     1
#>  2 4843        1     1      1     1       1     1     1     1     1     1
#>  3 4844        1     1      1     1       1     1     1     1     1     1
#>  4 4845        1     1      1     1       1    NA    NA    NA    NA    NA
#>  5 4847        1     1      1    NA      NA    NA    NA    NA    NA    NA
#>  6 4848        1     1      1     1      NA    NA    NA    NA    NA    NA
#>  7 4849        1     1     NA    NA      NA    NA    NA    NA    NA    NA
#>  8 4850        1     1     NA     1       1     1     1    NA    NA    NA
#>  9 4851        1     1     NA    NA      NA    NA    NA    NA    NA    NA
#> 10 4854        1     1     NA    NA      NA    NA    NA    NA    NA    NA
#> # … with 9 more rows, and 1 more variable: MAW <int>

This dataset records information only when a fish was detected by a station, i.e. if a fish was not recorded by some station, that data is not in the table. This means the output will be filled with NA.

However, in this case we know that the absence of a record means the fish was not seen, so we can use the values_fill argument of pivot_wider() to fill these missing values with zeros:

fish_encounters %>% pivot_wider(
  names_from = station, 
  values_from = seen,
  values_fill = list(seen = 0)
)

#> # A tibble: 19 x 12
#>    fish  Release I80_1 Lisbon  Rstr Base_TD   BCE   BCW  BCE2  BCW2   MAE
#>    <fct>   <int> <int>  <int> <int>   <int> <int> <int> <int> <int> <int>
#>  1 4842        1     1      1     1       1     1     1     1     1     1
#>  2 4843        1     1      1     1       1     1     1     1     1     1
#>  3 4844        1     1      1     1       1     1     1     1     1     1
#>  4 4845        1     1      1     1       1     0     0     0     0     0
#>  5 4847        1     1      1     0       0     0     0     0     0     0
#>  6 4848        1     1      1     1       0     0     0     0     0     0
#>  7 4849        1     1      0     0       0     0     0     0     0     0
#>  8 4850        1     1      0     1       1     1     1     0     0     0
#>  9 4851        1     1      0     0       0     0     0     0     0     0
#> 10 4854        1     1      0     0       0     0     0     0     0     0
#> # … with 9 more rows, and 1 more variable: MAW <int>
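
The difference between the two calls is easier to see on a table small enough to read in full. A minimal sketch with made-up data (the sightings data frame is hypothetical):

```r
library(tidyr)

# Hypothetical detections: fish "B" was never recorded at station "S2"
sightings <- data.frame(
  fish    = c("A", "A", "B"),
  station = c("S1", "S2", "S1"),
  seen    = c(1L, 1L, 1L),
  stringsAsFactors = FALSE
)

# Without values_fill the missing combination becomes NA
with_na <- pivot_wider(sightings, names_from = station, values_from = seen)

# With values_fill the missing combination becomes an explicit zero
with_zero <- pivot_wider(
  sightings,
  names_from  = station,
  values_from = seen,
  values_fill = list(seen = 0L)
)
```

The fill value is written as 0L so that it has the same integer type as the seen column.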

Generating a column name from multiple source variables

Imagine we have a table containing combinations of product, country and year. To create a test data frame, you can run the following code:

df <- expand_grid(
  product = c("A", "B"), 
  country = c("AI", "EI"), 
  year = 2000:2014
) %>%
  filter((product == "A" & country == "AI") | product == "B") %>% 
  mutate(value = rnorm(nrow(.)))

#> # A tibble: 45 x 4
#>    product country  year    value
#>    <chr>   <chr>   <int>    <dbl>
#>  1 A       AI       2000 -2.05   
#>  2 A       AI       2001 -0.676  
#>  3 A       AI       2002  1.60   
#>  4 A       AI       2003 -0.353  
#>  5 A       AI       2004 -0.00530
#>  6 A       AI       2005  0.442  
#>  7 A       AI       2006 -0.610  
#>  8 A       AI       2007 -2.77   
#>  9 A       AI       2008  0.899  
#> 10 A       AI       2009 -0.106  
#> # … with 35 more rows

Our task is to widen the data frame so that there is one column of data for each combination of product and country. To do this, simply pass the names_from argument a vector with the names of the fields to be combined.

df %>% pivot_wider(names_from = c(product, country),
                 values_from = "value")

#> # A tibble: 15 x 4
#>     year     A_AI    B_AI    B_EI
#>    <int>    <dbl>   <dbl>   <dbl>
#>  1  2000 -2.05     0.607   1.20  
#>  2  2001 -0.676    1.65   -0.114 
#>  3  2002  1.60    -0.0245  0.501 
#>  4  2003 -0.353    1.30   -0.459 
#>  5  2004 -0.00530  0.921  -0.0589
#>  6  2005  0.442   -1.55    0.594 
#>  7  2006 -0.610    0.380  -1.28  
#>  8  2007 -2.77     0.830   0.637 
#>  9  2008  0.899    0.0175 -1.30  
#> 10  2009 -0.106   -0.195   1.03  
#> # … with 5 more rows
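
By default the combined names are joined with an underscore. If a different separator is wanted, pivot_wider() in released tidyr has a names_sep argument as far as I know. A sketch on a made-up subset (df_small is a hypothetical name):

```r
library(tidyr)

# Hypothetical subset: two products, one country, two years
df_small <- data.frame(
  year    = c(2000L, 2000L, 2001L, 2001L),
  product = c("A", "B", "A", "B"),
  country = c("AI", "AI", "AI", "AI"),
  value   = c(1, 2, 3, 4),
  stringsAsFactors = FALSE
)

# Column names are built from product and country, joined with "."
wide_small <- pivot_wider(
  df_small,
  names_from  = c(product, country),
  values_from = value,
  names_sep   = "."
)
```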

You can also pass a specification to pivot_wider(). But when it is passed to pivot_wider(), the specification performs the opposite conversion to pivot_longer(): the columns listed in .name are created, using values from .value and the other columns.

For this dataset, you can generate a custom specification if you want every possible combination of country and product to have its own column, not only those present in the data:

spec <- df %>% 
  expand(product, country, .value = "value") %>% 
  unite(".name", product, country, remove = FALSE)

#> # A tibble: 4 x 4
#>   .name product country .value
#>   <chr> <chr>   <chr>   <chr> 
#> 1 A_AI  A       AI      value 
#> 2 A_EI  A       EI      value 
#> 3 B_AI  B       AI      value 
#> 4 B_EI  B       EI      value

df %>% pivot_wider(spec = spec) %>% head()

#> # A tibble: 6 x 5
#>    year     A_AI  A_EI    B_AI    B_EI
#>   <int>    <dbl> <dbl>   <dbl>   <dbl>
#> 1  2000 -2.05       NA  0.607   1.20  
#> 2  2001 -0.676      NA  1.65   -0.114 
#> 3  2002  1.60       NA -0.0245  0.501 
#> 4  2003 -0.353      NA  1.30   -0.459 
#> 5  2004 -0.00530    NA  0.921  -0.0589
#> 6  2005  0.442      NA -1.55    0.594

A few advanced examples of working with the new tidyr concept

Tidying data, using the US Census income and rent dataset as an example.

The us_rent_income dataset contains median income and rent information for each state in the US for 2017 (the dataset is available in the tidycensus package).

us_rent_income
#> # A tibble: 104 x 5
#>    GEOID NAME       variable estimate   moe
#>    <chr> <chr>      <chr>       <dbl> <dbl>
#>  1 01    Alabama    income      24476   136
#>  2 01    Alabama    rent          747     3
#>  3 02    Alaska     income      32940   508
#>  4 02    Alaska     rent         1200    13
#>  5 04    Arizona    income      27517   148
#>  6 04    Arizona    rent          972     4
#>  7 05    Arkansas   income      23789   165
#>  8 05    Arkansas   rent          709     5
#>  9 06    California income      29454   109
#> 10 06    California rent         1358     3
#> # … with 94 more rows

In the form in which the data is stored in the us_rent_income dataset it is very inconvenient to work with, so we would like to create a dataset with the columns rent, rent_moe, income, income_moe. There are many ways to create this specification, but the key point is that we need to generate every combination of the values of variable and estimate/moe, and then generate the column name.

  spec <- us_rent_income %>% 
    expand(variable, .value = c("estimate", "moe")) %>% 
    mutate(
      .name = paste0(variable, ifelse(.value == "moe", "_moe", ""))
    )

#> # A tibble: 4 x 3
#>   variable .value   .name     
#>   <chr>    <chr>    <chr>     
#> 1 income   estimate income    
#> 2 income   moe      income_moe
#> 3 rent     estimate rent      
#> 4 rent     moe      rent_moe

Supplying this specification to pivot_wider() gives us the result we want:

us_rent_income %>% pivot_wider(spec = spec)

#> # A tibble: 52 x 6
#>    GEOID NAME                 income income_moe  rent rent_moe
#>    <chr> <chr>                 <dbl>      <dbl> <dbl>    <dbl>
#>  1 01    Alabama               24476        136   747        3
#>  2 02    Alaska                32940        508  1200       13
#>  3 04    Arizona               27517        148   972        4
#>  4 05    Arkansas              23789        165   709        5
#>  5 06    California            29454        109  1358        3
#>  6 08    Colorado              32401        109  1125        5
#>  7 09    Connecticut           35326        195  1123        5
#>  8 10    Delaware              31560        247  1076       10
#>  9 11    District of Columbia  43198        681  1424       17
#> 10 12    Florida               25952         70  1077        3
#> # … with 42 more rows

The World Bank

Sometimes bringing data into the desired form takes several steps.
The world_bank_pop dataset contains World Bank data on the population of each country from 2000 to 2017.

#> # A tibble: 1,056 x 20
#>    country indicator `2000` `2001` `2002` `2003`  `2004`  `2005`   `2006`
#>    <chr>   <chr>      <dbl>  <dbl>  <dbl>  <dbl>   <dbl>   <dbl>    <dbl>
#>  1 ABW     SP.URB.T… 4.24e4 4.30e4 4.37e4 4.42e4 4.47e+4 4.49e+4  4.49e+4
#>  2 ABW     SP.URB.G… 1.18e0 1.41e0 1.43e0 1.31e0 9.51e-1 4.91e-1 -1.78e-2
#>  3 ABW     SP.POP.T… 9.09e4 9.29e4 9.50e4 9.70e4 9.87e+4 1.00e+5  1.01e+5
#>  4 ABW     SP.POP.G… 2.06e0 2.23e0 2.23e0 2.11e0 1.76e+0 1.30e+0  7.98e-1
#>  5 AFG     SP.URB.T… 4.44e6 4.65e6 4.89e6 5.16e6 5.43e+6 5.69e+6  5.93e+6
#>  6 AFG     SP.URB.G… 3.91e0 4.66e0 5.13e0 5.23e0 5.12e+0 4.77e+0  4.12e+0
#>  7 AFG     SP.POP.T… 2.01e7 2.10e7 2.20e7 2.31e7 2.41e+7 2.51e+7  2.59e+7
#>  8 AFG     SP.POP.G… 3.49e0 4.25e0 4.72e0 4.82e0 4.47e+0 3.87e+0  3.23e+0
#>  9 AGO     SP.URB.T… 8.23e6 8.71e6 9.22e6 9.77e6 1.03e+7 1.09e+7  1.15e+7
#> 10 AGO     SP.URB.G… 5.44e0 5.59e0 5.70e0 5.76e0 5.75e+0 5.69e+0  4.92e+0
#> # … with 1,046 more rows, and 11 more variables: `2007` <dbl>,
#> #   `2008` <dbl>, `2009` <dbl>, `2010` <dbl>, `2011` <dbl>, `2012` <dbl>,
#> #   `2013` <dbl>, `2014` <dbl>, `2015` <dbl>, `2016` <dbl>, `2017` <dbl>

Our goal is to create a tidy dataset with each variable in its own column. It is not yet clear exactly which steps are needed, but we will start with the most obvious problem: the year is spread across multiple columns.

To fix this, use the pivot_longer() function.

pop2 <- world_bank_pop %>% 
  pivot_longer(`2000`:`2017`, names_to = "year")

#> # A tibble: 19,008 x 4
#>    country indicator   year  value
#>    <chr>   <chr>       <chr> <dbl>
#>  1 ABW     SP.URB.TOTL 2000  42444
#>  2 ABW     SP.URB.TOTL 2001  43048
#>  3 ABW     SP.URB.TOTL 2002  43670
#>  4 ABW     SP.URB.TOTL 2003  44246
#>  5 ABW     SP.URB.TOTL 2004  44669
#>  6 ABW     SP.URB.TOTL 2005  44889
#>  7 ABW     SP.URB.TOTL 2006  44881
#>  8 ABW     SP.URB.TOTL 2007  44686
#>  9 ABW     SP.URB.TOTL 2008  44375
#> 10 ABW     SP.URB.TOTL 2009  44052
#> # … with 18,998 more rows

The next step is to look at the indicator variable.

pop2 %>% count(indicator)

#> # A tibble: 4 x 2
#>   indicator       n
#>   <chr>       <int>
#> 1 SP.POP.GROW  4752
#> 2 SP.POP.TOTL  4752
#> 3 SP.URB.GROW  4752
#> 4 SP.URB.TOTL  4752

Here SP.POP.GROW is population growth, SP.POP.TOTL is total population, and SP.URB.* are the same, but only for urban areas. Let's split these values into two variables: area, the area (total or urban), and a variable holding the actual data (population or growth):

pop3 <- pop2 %>% 
  separate(indicator, c(NA, "area", "variable"))

#> # A tibble: 19,008 x 5
#>    country area  variable year  value
#>    <chr>   <chr> <chr>    <chr> <dbl>
#>  1 ABW     URB   TOTL     2000  42444
#>  2 ABW     URB   TOTL     2001  43048
#>  3 ABW     URB   TOTL     2002  43670
#>  4 ABW     URB   TOTL     2003  44246
#>  5 ABW     URB   TOTL     2004  44669
#>  6 ABW     URB   TOTL     2005  44889
#>  7 ABW     URB   TOTL     2006  44881
#>  8 ABW     URB   TOTL     2007  44686
#>  9 ABW     URB   TOTL     2008  44375
#> 10 ABW     URB   TOTL     2009  44052
#> # … with 18,998 more rows
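
The NA in the into argument is what drops the leading "SP" piece. A small sketch on two made-up rows, relying on the default separator of separate(), which splits on any run of non-alphanumeric characters (the codes data frame is hypothetical):

```r
library(tidyr)

# Two indicator codes in the same format as world_bank_pop
codes <- data.frame(
  indicator = c("SP.URB.TOTL", "SP.POP.GROW"),
  stringsAsFactors = FALSE
)

# Each code splits into three pieces on "."; NA discards the first one
parts <- separate(codes, indicator, into = c(NA, "area", "variable"))
```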

Now all that remains is to split variable into two columns:

pop3 %>% 
  pivot_wider(names_from = variable, values_from = value)

#> # A tibble: 9,504 x 5
#>    country area  year   TOTL    GROW
#>    <chr>   <chr> <chr> <dbl>   <dbl>
#>  1 ABW     URB   2000  42444  1.18  
#>  2 ABW     URB   2001  43048  1.41  
#>  3 ABW     URB   2002  43670  1.43  
#>  4 ABW     URB   2003  44246  1.31  
#>  5 ABW     URB   2004  44669  0.951 
#>  6 ABW     URB   2005  44889  0.491 
#>  7 ABW     URB   2006  44881 -0.0178
#>  8 ABW     URB   2007  44686 -0.435 
#>  9 ABW     URB   2008  44375 -0.698 
#> 10 ABW     URB   2009  44052 -0.731 
#> # … with 9,494 more rows

Contact list

One last example: imagine you have a contact list that you copied and pasted from a website:

contacts <- tribble(
  ~field, ~value,
  "name", "Jiena McLellan",
  "company", "Toyota", 
  "name", "John Smith", 
  "company", "google", 
  "email", "[email protected]",
  "name", "Huxley Ratcliffe"
)

Tabulating this list is hard because there is no variable that identifies which data belongs to which contact. We can fix this by noting that the data for each new contact starts with "name", so we can create a unique identifier and increase it by one every time the field column contains the value "name":

contacts <- contacts %>% 
  mutate(
    person_id = cumsum(field == "name")
  )
contacts

#> # A tibble: 6 x 3
#>   field   value            person_id
#>   <chr>   <chr>                <int>
#> 1 name    Jiena McLellan           1
#> 2 company Toyota                   1
#> 3 name    John Smith               2
#> 4 company google                   2
#> 5 email   [email protected]          2
#> 6 name    Huxley Ratcliffe         3
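
The trick works because a comparison yields a logical vector and cumsum() treats TRUE as 1 and FALSE as 0. The same step in base R, without dplyr (the is_new_contact name is only illustrative):

```r
# The field column from the contacts table, as a plain vector
field <- c("name", "company", "name", "company", "email", "name")

# TRUE wherever a new contact starts
is_new_contact <- field == "name"

# The running total of TRUE values numbers the contacts 1, 2, 3, ...
person_id <- cumsum(is_new_contact)
```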

Now that we have a unique ID for each contact, we can turn field and value into columns:

contacts %>% 
  pivot_wider(names_from = field, values_from = value)

#> # A tibble: 3 x 4
#>   person_id name             company email          
#>       <int> <chr>            <chr>   <chr>          
#> 1         1 Jiena McLellan   Toyota  <NA>           
#> 2         2 John Smith       google  [email protected]
#> 3         3 Huxley Ratcliffe <NA>    <NA>

Conclusion

My personal opinion is that the new tidyr concept is genuinely more intuitive, and significantly superior in functionality to the legacy functions spread() and gather(). I hope this article has helped you get to grips with pivot_longer() and pivot_wider().

Source: www.habr.com