The R package tidyr and its new functions pivot_longer and pivot_wider

The tidyr package is part of the core of one of the most popular collections of libraries for R, the tidyverse.
The main purpose of the package is to bring data into a tidy form.

Habr already has an article devoted to this package, but it dates back to 2015. I want to tell you about the most recent changes, announced a few days ago by its author, Hadley Wickham.


S.J.K.: Will gather() and spread() be deprecated?

Hadley Wickham: To some extent. We will no longer recommend using these functions or fix bugs in them, but they will remain in the package in their current state.


If you are interested in data analysis, you may also be interested in my Telegram and YouTube channels. Most of their content is devoted to the R language.

The tidy data concept

The goal of tidyr is to help you bring data into a so-called tidy form. Tidy data is data where:

  • Every variable is in its own column.
  • Every observation is a row.
  • Every value is a cell.

Data brought into tidy form is simple and convenient to work with during analysis.

Main functions of the tidyr package

tidyr contains a set of functions for reshaping tables:

  • fill() - fills missing values in a column with the previous value;
  • separate() - splits one field into several using a separator;
  • unite() - combines several fields into one; the inverse of separate();
  • pivot_longer() - converts data from wide format to long format;
  • pivot_wider() - converts data from long format to wide format; the inverse of pivot_longer();
  • gather() (outdated) - converts data from wide format to long format;
  • spread() (outdated) - converts data from long format to wide format; the inverse of gather().
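The first three helpers in the list are easiest to understand on a tiny table. Below is a minimal sketch on a made-up tibble. The column names id, code, letter and number are hypothetical and were chosen only for illustration.

```r
library(tidyr)
library(tibble)

# Hypothetical toy data: id is only recorded on the first row of each group
df <- tibble(
  id   = c(1, NA, NA, 2),
  code = c("a_1", "b_2", "c_3", "d_4")
)

# fill(): replace each NA in id with the last non-missing value above it
filled <- fill(df, id)

# separate(): split "a_1" into letter = "a" and number = "1" at the underscore
split_df <- separate(filled, code, into = c("letter", "number"), sep = "_")

# unite(): glue letter and number back into a single "code" column
joined <- unite(split_df, "code", letter, number, sep = "_")
```

Because unite() reverses separate(), joined$code ends up equal to the original df$code.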

A new concept for converting data from wide to long format and back

Previously, the functions gather() and spread() were used for this kind of conversion. Over the years these functions existed, it became clear that most users, including the package author, found the names of these functions and their arguments far from obvious. This made the functions hard to find, and hard to tell which one converts a data frame from wide to long format and which one does the reverse.

For this reason, two new and important functions for reshaping data frames have been added to tidyr.

The new functions pivot_longer() and pivot_wider() were inspired by some of the features of the cdata package, created by John Mount and Nina Zumel.

Installing the current version, tidyr 0.8.3.9000

To install the new, current version of tidyr (0.8.3.9000), which contains the new functions, use the following code.

devtools::install_github("tidyverse/tidyr")

At the time of writing, these functions are available only in the development version of the package on GitHub.
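Since this article was written, pivot_longer() and pivot_wider() have been released on CRAN as part of tidyr 1.0.0, so the GitHub installation may no longer be necessary. The sketch below checks whether the installed tidyr already exports them. The names has_tidyr and has_pivot are illustrative only.

```r
# Illustrative check: does the installed tidyr export the pivot_*() functions?
has_tidyr <- requireNamespace("tidyr", quietly = TRUE)
has_pivot <- has_tidyr &&
  all(c("pivot_longer", "pivot_wider") %in% getNamespaceExports("tidyr"))

if (has_tidyr) {
  message("tidyr ", as.character(packageVersion("tidyr")),
          if (has_pivot) ": pivot_*() available" else ": pivot_*() missing")
}
```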

Moving to the new functions

In fact, migrating old scripts to the new functions is not difficult. To make this clearer, I will take an example from the documentation of the old functions and show how the same operations are performed with the new pivot_*() functions.

Converting wide format to long format.

Example code from the gather() documentation

# example
library(tidyr)
library(dplyr)
stocks <- data.frame(
  time = as.Date('2009-01-01') + 0:9,
  X = rnorm(10, 0, 1),
  Y = rnorm(10, 0, 2),
  Z = rnorm(10, 0, 4)
)

# old: gather() stacks X, Y and Z into key/value columns
stocks_gather <- stocks %>% gather(key   = stock, 
                                   value = price, 
                                   -time)

# new: the same reshaping with pivot_longer()
stocks_long   <- stocks %>% pivot_longer(cols      = -time, 
                                       names_to  = "stock", 
                                       values_to = "price")

Converting long format to wide format.

Example code from the spread() documentation

# old
stocks_spread <- stocks_gather %>% spread(key = stock, 
                                          value = price) 

# new 
stock_wide    <- stocks_long %>% pivot_wider(names_from  = "stock",
                                            values_from = "price")

In the pivot_longer() and pivot_wider() examples above, the columns named in the names_to and values_to arguments do not exist in the original stocks table. Their names must therefore be given in quotes.
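A quick way to convince yourself that the old and new calls are interchangeable is to compare their output. The sketch below uses a fixed seed, added only for reproducibility. It sorts both results first, because gather() stacks the data column by column, while pivot_longer() keeps each original row together.

```r
library(tidyr)
library(dplyr)

set.seed(1)  # assumption: fixed seed only so the comparison is reproducible
stocks <- data.frame(
  time = as.Date("2009-01-01") + 0:9,
  X = rnorm(10, 0, 1),
  Y = rnorm(10, 0, 2),
  Z = rnorm(10, 0, 4)
)

old <- stocks %>% gather(key = stock, value = price, -time)
new <- stocks %>% pivot_longer(cols = -time, names_to = "stock", values_to = "price")

# Row order differs between the two functions, so sort both the same way
old_sorted <- old %>% arrange(stock, time)
new_sorted <- new %>% arrange(stock, time)

# Compare column by column (the container classes differ: data.frame vs tibble)
same_content <- identical(old_sorted$stock, new_sorted$stock) &&
  identical(old_sorted$time, new_sorted$time) &&
  identical(old_sorted$price, new_sorted$price)
```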

The following table makes it much easier to see how to switch to the new tidyr concept.


Author's note

All the material below is an adapted, I would even say loose, translation of the vignettes from the official tidyverse website.

A simple example of converting data from wide to long format

pivot_longer() makes a dataset longer by reducing the number of columns and increasing the number of rows.


To run the examples in this article, you first need to load the required packages:

library(tidyr)
library(dplyr)
library(readr)

Suppose we have a table with the results of a survey that (among other things) asked people about their religion and annual income:

#> # A tibble: 18 x 11
#>    religion `<$10k` `$10-20k` `$20-30k` `$30-40k` `$40-50k` `$50-75k`
#>    <chr>      <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>
#>  1 Agnostic      27        34        60        81        76       137
#>  2 Atheist       12        27        37        52        35        70
#>  3 Buddhist      27        21        30        34        33        58
#>  4 Catholic     418       617       732       670       638      1116
#>  5 Don’t k…      15        14        15        11        10        35
#>  6 Evangel…     575       869      1064       982       881      1486
#>  7 Hindu          1         9         7         9        11        34
#>  8 Histori…     228       244       236       238       197       223
#>  9 Jehovah…      20        27        24        24        21        30
#> 10 Jewish        19        19        25        25        30        95
#> # … with 8 more rows, and 4 more variables: `$75-100k` <dbl>,
#> #   `$100-150k` <dbl>, `>150k` <dbl>, `Don't know/refused` <dbl>

This table holds the respondents' religion in the rows, while the income brackets are spread across the column names. The number of respondents in each category is stored in the cell at the intersection of a religion and an income bracket. To bring the table into a tidy, correct format, it is enough to use pivot_longer():

pew %>% 
  pivot_longer(cols = -religion, names_to = "income", values_to = "count")
#> # A tibble: 180 x 3
#>    religion income             count
#>    <chr>    <chr>              <dbl>
#>  1 Agnostic <$10k                 27
#>  2 Agnostic $10-20k               34
#>  3 Agnostic $20-30k               60
#>  4 Agnostic $30-40k               81
#>  5 Agnostic $40-50k               76
#>  6 Agnostic $50-75k              137
#>  7 Agnostic $75-100k             122
#>  8 Agnostic $100-150k            109
#>  9 Agnostic >150k                 84
#> 10 Agnostic Don't know/refused    96
#> # … with 170 more rows
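The pew object used above is not created in the text. In released tidyr (1.0.0 and later), the same Pew survey table ships as the built-in dataset relig_income, so the step can be reproduced without reading any file. Below is a minimal sketch with two sanity checks: the row count and the total number of respondents.

```r
library(tidyr)

# relig_income: the Pew religion/income survey bundled with tidyr >= 1.0.0
long <- pivot_longer(relig_income,
                     cols      = -religion,
                     names_to  = "income",
                     values_to = "count")

# 18 religions x 10 income brackets should give 180 rows,
# and reshaping must not lose or duplicate any respondents
n_rows     <- nrow(long)
total_wide <- sum(relig_income[, -1])
total_long <- sum(long$count)
```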

pivot_longer() arguments

  • The first argument, cols, specifies which columns need to be collapsed. In this case, that is every column except religion.
  • The names_to argument gives the name of the variable that will be created from the names of the collapsed columns.
  • values_to gives the name of the variable that will be created from the data stored in the cells of the collapsed columns.

Specifications

This is new functionality in tidyr that was not available with the legacy functions.

A specification is a data frame in which each row corresponds to one column of the wide form of the data. It has two special columns whose names start with a dot:

  • .name contains the original column name.
  • .value contains the name of the column that will hold the cell values.

The remaining columns of the specification describe how the new variables will store the information packed into the column name from .name.

The specification describes the metadata stored in the column name, with one row for each column and one column for each variable packed into the column name. This description may seem confusing at this point, but after looking at a few examples it will become much clearer.

The point of a specification is that you can retrieve, modify and define new metadata for the data frame being reshaped.

To work with specifications when converting a table from wide to long format, use the pivot_longer_spec() function.

This function takes any data frame and generates its metadata in the way described above.
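Because a specification is just a data frame, it can also be written by hand. The sketch below uses the API of released tidyr (1.0.0 and later), which differs from the development version described here. There, the spec builder is called build_longer_spec(), and a finished spec is applied with pivot_longer_spec(data, spec). The table wide and its columns x_a, x_b and group are hypothetical.

```r
library(tidyr)
library(tibble)

# Hypothetical wide table: x measured in two groups, a and b
wide <- tibble(id = 1:2, x_a = c(1, 2), x_b = c(3, 4))

# Hand-written spec: one row per wide column to collapse
spec <- tibble(
  .name  = c("x_a", "x_b"),  # columns in the wide data
  .value = "x",              # their cells go to a column named x
  group  = c("a", "b")       # metadata extracted from each column name
)

long <- pivot_longer_spec(wide, spec)

# The same metadata can be generated instead of typed
auto_spec <- build_longer_spec(wide, cols = c(x_a, x_b),
                               names_to = "group", names_prefix = "x_",
                               values_to = "x")
```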

For example, let's take the who dataset supplied with the tidyr package. It contains World Health Organization data on tuberculosis cases.

who
#> # A tibble: 7,240 x 60
#>    country iso2  iso3   year new_sp_m014 new_sp_m1524 new_sp_m2534
#>    <chr>   <chr> <chr> <int>       <int>        <int>        <int>
#>  1 Afghan… AF    AFG    1980          NA           NA           NA
#>  2 Afghan… AF    AFG    1981          NA           NA           NA
#>  3 Afghan… AF    AFG    1982          NA           NA           NA
#>  4 Afghan… AF    AFG    1983          NA           NA           NA
#>  5 Afghan… AF    AFG    1984          NA           NA           NA
#>  6 Afghan… AF    AFG    1985          NA           NA           NA
#>  7 Afghan… AF    AFG    1986          NA           NA           NA
#>  8 Afghan… AF    AFG    1987          NA           NA           NA
#>  9 Afghan… AF    AFG    1988          NA           NA           NA
#> 10 Afghan… AF    AFG    1989          NA           NA           NA
#> # … with 7,230 more rows, and 53 more variables

Let's build its specification.

spec <- who %>%
  pivot_longer_spec(new_sp_m014:newrel_f65, values_to = "count")

#> # A tibble: 56 x 3
#>    .name        .value name        
#>    <chr>        <chr>  <chr>       
#>  1 new_sp_m014  count  new_sp_m014 
#>  2 new_sp_m1524 count  new_sp_m1524
#>  3 new_sp_m2534 count  new_sp_m2534
#>  4 new_sp_m3544 count  new_sp_m3544
#>  5 new_sp_m4554 count  new_sp_m4554
#>  6 new_sp_m5564 count  new_sp_m5564
#>  7 new_sp_m65   count  new_sp_m65  
#>  8 new_sp_f014  count  new_sp_f014 
#>  9 new_sp_f1524 count  new_sp_f1524
#> 10 new_sp_f2534 count  new_sp_f2534
#> # … with 46 more rows

The fields country, iso2, iso3 and year are already variables. Our task is to collapse the columns from new_sp_m014 to newrel_f65.

The names of these columns store the following information:

  • The prefix new_ indicates that the column contains data on new cases of tuberculosis. The current dataset contains only new cases, so this prefix carries no information in this context.
  • sp/rel/ep describes the method of diagnosis.
  • m/f is the patient's sex.
  • 014/1524/2534/3544/4554/5564/65 is the patient's age group.

We can split these columns with the extract() function, using a regular expression.

spec <- spec %>%
        extract(name, c("diagnosis", "gender", "age"), "new_?(.*)_(.)(.*)")

#> # A tibble: 56 x 5
#>    .name        .value diagnosis gender age  
#>    <chr>        <chr>  <chr>     <chr>  <chr>
#>  1 new_sp_m014  count  sp        m      014  
#>  2 new_sp_m1524 count  sp        m      1524 
#>  3 new_sp_m2534 count  sp        m      2534 
#>  4 new_sp_m3544 count  sp        m      3544 
#>  5 new_sp_m4554 count  sp        m      4554 
#>  6 new_sp_m5564 count  sp        m      5564 
#>  7 new_sp_m65   count  sp        m      65   
#>  8 new_sp_f014  count  sp        f      014  
#>  9 new_sp_f1524 count  sp        f      1524 
#> 10 new_sp_f2534 count  sp        f      2534 
#> # … with 46 more rows

Note that the .name column must remain unchanged, because it is our index into the column names of the original dataset.

Sex and age (the gender and age columns) have a fixed and known set of values, so it is a good idea to convert these columns to factors:

spec <-  spec %>%
            mutate(
              gender = factor(gender, levels = c("f", "m")),
              age = factor(age, levels = unique(age), ordered = TRUE)
            ) 

Finally, to apply the specification we created to the original who data frame, we pass it to the spec argument of pivot_longer().

who %>% pivot_longer(spec = spec)

#> # A tibble: 405,440 x 8
#>    country     iso2  iso3   year diagnosis gender age   count
#>    <chr>       <chr> <chr> <int> <chr>     <fct>  <ord> <int>
#>  1 Afghanistan AF    AFG    1980 sp        m      014      NA
#>  2 Afghanistan AF    AFG    1980 sp        m      1524     NA
#>  3 Afghanistan AF    AFG    1980 sp        m      2534     NA
#>  4 Afghanistan AF    AFG    1980 sp        m      3544     NA
#>  5 Afghanistan AF    AFG    1980 sp        m      4554     NA
#>  6 Afghanistan AF    AFG    1980 sp        m      5564     NA
#>  7 Afghanistan AF    AFG    1980 sp        m      65       NA
#>  8 Afghanistan AF    AFG    1980 sp        f      014      NA
#>  9 Afghanistan AF    AFG    1980 sp        f      1524     NA
#> 10 Afghanistan AF    AFG    1980 sp        f      2534     NA
#> # … with 405,430 more rows
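In released tidyr (1.0.0 and later), the same reshaping of who can also be done in a single call, without building a spec by hand. You give pivot_longer() the regular expression through names_pattern. In this sketch, the factor conversion from the spec version is left out for brevity.

```r
library(tidyr)

who_long <- pivot_longer(
  who,
  cols          = new_sp_m014:newrel_f65,
  names_to      = c("diagnosis", "gender", "age"),
  names_pattern = "new_?(.*)_(.)(.*)",  # same regex as in extract() above
  values_to     = "count"
)
```

Each of the 56 collapsed columns contributes one row per original row, which is why the result has 405,440 rows.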

Everything we have just done can be illustrated as follows:


Specifications with several values (.value)

In the example above, the .value column of the specification held only one value, and in most cases that is what happens.

Sometimes, however, you need to gather data from columns that hold different types of data as their values. With the legacy functions (gather() followed by spread()), this would be fairly awkward to do.

The example below is taken from the vignettes of the data.table package.

Let's create a practice data frame.

family <- tibble::tribble(
  ~family,  ~dob_child1,  ~dob_child2, ~gender_child1, ~gender_child2,
       1L, "1998-11-26", "2000-01-29",             1L,             2L,
       2L, "1996-06-22",           NA,             2L,             NA,
       3L, "2002-07-11", "2004-04-05",             2L,             2L,
       4L, "2004-10-10", "2009-08-27",             1L,             1L,
       5L, "2000-12-05", "2005-02-28",             2L,             1L,
)
family <- family %>% mutate_at(vars(starts_with("dob")), parse_date)

#> # A tibble: 5 x 5
#>   family dob_child1 dob_child2 gender_child1 gender_child2
#>    <int> <date>     <date>             <int>         <int>
#> 1      1 1998-11-26 2000-01-29             1             2
#> 2      2 1996-06-22 NA                     2            NA
#> 3      3 2002-07-11 2004-04-05             2             2
#> 4      4 2004-10-10 2009-08-27             1             1
#> 5      5 2000-12-05 2005-02-28             2             1

Each row of the resulting data frame holds the data on the children of one family. A family can have one or two children. For each child, the date of birth and sex are given, and each child's data sits in separate columns. Our task is to bring this data into the correct format for analysis.

Note that we have two variables with information about each child: sex and date of birth. Columns with the prefix dob contain the date of birth, and columns with the prefix gender contain the child's sex. The desired result is for these two variables to appear in separate columns. We can achieve this by generating a specification in which the .value column has two different values.

spec <- family %>%
  pivot_longer_spec(-family) %>%
  separate(col = name, into = c(".value", "child"))%>%
  mutate(child = parse_number(child))

#> # A tibble: 4 x 3
#>   .name         .value child
#>   <chr>         <chr>  <dbl>
#> 1 dob_child1    dob        1
#> 2 dob_child2    dob        2
#> 3 gender_child1 gender     1
#> 4 gender_child2 gender     2

Let's go step by step through what the code above does.

  • pivot_longer_spec(-family) - creates a specification that collapses all existing columns except family.
  • separate(col = name, into = c(".value", "child")) - splits the name column, which holds the original column names, at the underscore and puts the pieces into the .value and child columns.
  • mutate(child = parse_number(child)) - converts the values of the child field from text to a numeric type.

Now we can apply the resulting specification to the original data frame and bring the table into the desired form.

family %>% 
    pivot_longer(spec = spec, na.rm = TRUE)

#> # A tibble: 9 x 4
#>   family child dob        gender
#>    <int> <dbl> <date>      <int>
#> 1      1     1 1998-11-26      1
#> 2      1     2 2000-01-29      2
#> 3      2     1 1996-06-22      2
#> 4      3     1 2002-07-11      2
#> 5      3     2 2004-04-05      2
#> 6      4     1 2004-10-10      1
#> 7      4     2 2009-08-27      1
#> 8      5     1 2000-12-05      2
#> 9      5     2 2005-02-28      1

We use the argument na.rm = TRUE because the current shape of the data forces the creation of extra rows for observations that do not exist. Since family 2 has only one child, na.rm = TRUE ensures that family 2 gets only one row in the output.
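In released tidyr (1.0.0 and later), the same result no longer needs a hand-built spec. The special name .value can be passed straight to names_to, names_sep splits the column names, and the role of na.rm is played by values_drop_na. Below is a sketch that recreates the family data from above. It uses across() from dplyr 1.0 in place of mutate_at().

```r
library(tidyr)
library(dplyr)
library(readr)

family <- tibble::tribble(
  ~family,  ~dob_child1,  ~dob_child2, ~gender_child1, ~gender_child2,
       1L, "1998-11-26", "2000-01-29",             1L,             2L,
       2L, "1996-06-22",           NA,             2L,             NA,
       3L, "2002-07-11", "2004-04-05",             2L,             2L,
       4L, "2004-10-10", "2009-08-27",             1L,             1L,
       5L, "2000-12-05", "2005-02-28",             2L,             1L
)
family <- family %>% mutate(across(starts_with("dob"), parse_date))

family_long <- family %>%
  pivot_longer(
    cols           = -family,
    names_to       = c(".value", "child"),  # "dob"/"gender" become columns
    names_sep      = "_",                   # split "dob_child1" at "_"
    values_drop_na = TRUE                   # drop family 2's missing child
  ) %>%
  mutate(child = parse_number(child))       # "child1" -> 1
```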

Converting data frames from long to wide format

pivot_wider() performs the inverse transformation: it increases the number of columns of a data frame by reducing the number of rows.


This kind of transformation is rarely needed to make data tidy. However, it can be useful for creating pivot tables for presentations, or for integrating with other tools.

In fact, pivot_longer() and pivot_wider() are symmetrical, and each performs the inverse of the other. That is, df %>% pivot_longer(spec = spec) %>% pivot_wider(spec = spec) and df %>% pivot_wider(spec = spec) %>% pivot_longer(spec = spec) both return the original df.
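The round trip can be checked directly. In released tidyr (1.0.0 and later), the spec-taking variants are called pivot_longer_spec() and pivot_wider_spec(), and build_longer_spec() creates the spec. The sketch below uses a hypothetical three-row table.

```r
library(tidyr)
library(tibble)

# Hypothetical wide table
wide <- tibble(id = 1:3, a = c(1, 2, 3), b = c(4, 5, 6))

# One spec describes the mapping; it is used in both directions
spec <- build_longer_spec(wide, cols = c(a, b), names_to = "key", values_to = "val")

long <- pivot_longer_spec(wide, spec)   # 3 rows -> 6 rows
back <- pivot_wider_spec(long, spec)    # 6 rows -> 3 rows again
```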

A simple example of converting a table to wide format

To demonstrate how pivot_wider() works, we will use the fish_encounters dataset. It records how various stations detect fish moving along a river.

#> # A tibble: 114 x 3
#>    fish  station  seen
#>    <fct> <fct>   <int>
#>  1 4842  Release     1
#>  2 4842  I80_1       1
#>  3 4842  Lisbon      1
#>  4 4842  Rstr        1
#>  5 4842  Base_TD     1
#>  6 4842  BCE         1
#>  7 4842  BCW         1
#>  8 4842  BCE2        1
#>  9 4842  BCW2        1
#> 10 4842  MAE         1
#> # … with 104 more rows

This table is usually more informative and easier to use if the information for each station is presented in a separate column.

fish_encounters %>% pivot_wider(names_from = station, values_from = seen)
#> # A tibble: 19 x 12
#>    fish  Release I80_1 Lisbon  Rstr Base_TD   BCE   BCW  BCE2  BCW2   MAE
#>    <fct>   <int> <int>  <int> <int>   <int> <int> <int> <int> <int> <int>
#>  1 4842        1     1      1     1       1     1     1     1     1     1
#>  2 4843        1     1      1     1       1     1     1     1     1     1
#>  3 4844        1     1      1     1       1     1     1     1     1     1
#>  4 4845        1     1      1     1       1    NA    NA    NA    NA    NA
#>  5 4847        1     1      1    NA      NA    NA    NA    NA    NA    NA
#>  6 4848        1     1      1     1      NA    NA    NA    NA    NA    NA
#>  7 4849        1     1     NA    NA      NA    NA    NA    NA    NA    NA
#>  8 4850        1     1     NA     1       1     1     1    NA    NA    NA
#>  9 4851        1     1     NA    NA      NA    NA    NA    NA    NA    NA
#> 10 4854        1     1     NA    NA      NA    NA    NA    NA    NA    NA
#> # … with 9 more rows, and 1 more variable: MAW <int>

This dataset records information only when a fish was detected by a station. If a fish was not recorded by some station, that combination simply does not appear in the table, so the corresponding cells of the output are filled with NA.

In this case, however, we know that a missing record means the fish was not seen. We can therefore use the values_fill argument of pivot_wider() to fill these gaps with zeros:

fish_encounters %>% pivot_wider(
  names_from = station, 
  values_from = seen,
  values_fill = list(seen = 0)
)

#> # A tibble: 19 x 12
#>    fish  Release I80_1 Lisbon  Rstr Base_TD   BCE   BCW  BCE2  BCW2   MAE
#>    <fct>   <int> <int>  <int> <int>   <int> <int> <int> <int> <int> <int>
#>  1 4842        1     1      1     1       1     1     1     1     1     1
#>  2 4843        1     1      1     1       1     1     1     1     1     1
#>  3 4844        1     1      1     1       1     1     1     1     1     1
#>  4 4845        1     1      1     1       1     0     0     0     0     0
#>  5 4847        1     1      1     0       0     0     0     0     0     0
#>  6 4848        1     1      1     1       0     0     0     0     0     0
#>  7 4849        1     1      0     0       0     0     0     0     0     0
#>  8 4850        1     1      0     1       1     1     1     0     0     0
#>  9 4851        1     1      0     0       0     0     0     0     0     0
#> 10 4854        1     1      0     0       0     0     0     0     0     0
#> # … with 9 more rows, and 1 more variable: MAW <int>
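Once the gaps are filled with zeros, the wide table can be summarized directly, for example by counting how many stations detected each fish. A small sketch follows; the name stations_seen is illustrative.

```r
library(tidyr)

wide <- pivot_wider(fish_encounters,
                    names_from  = station,
                    values_from = seen,
                    values_fill = list(seen = 0))  # missing record = not seen

# Number of stations that detected each fish (every column except fish)
stations_seen <- data.frame(
  fish = wide$fish,
  n    = rowSums(wide[, -1])
)
```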

Generating column names from several variables

Suppose we have a table containing combinations of product, country and year. To generate a test data frame, you can run the following code:

df <- expand_grid(
  product = c("A", "B"), 
  country = c("AI", "EI"), 
  year = 2000:2014
) %>%
  filter((product == "A" & country == "AI") | product == "B") %>% 
  mutate(value = rnorm(nrow(.)))

#> # A tibble: 45 x 4
#>    product country  year    value
#>    <chr>   <chr>   <int>    <dbl>
#>  1 A       AI       2000 -2.05   
#>  2 A       AI       2001 -0.676  
#>  3 A       AI       2002  1.60   
#>  4 A       AI       2003 -0.353  
#>  5 A       AI       2004 -0.00530
#>  6 A       AI       2005  0.442  
#>  7 A       AI       2006 -0.610  
#>  8 A       AI       2007 -2.77   
#>  9 A       AI       2008  0.899  
#> 10 A       AI       2009 -0.106  
#> # … with 35 more rows

Our task is to widen the data frame so that there is one column for each combination of product and country. To do this, simply pass a vector with the names of the fields to combine to the names_from argument.

df %>% pivot_wider(names_from = c(product, country),
                 values_from = "value")

#> # A tibble: 15 x 4
#>     year     A_AI    B_AI    B_EI
#>    <int>    <dbl>   <dbl>   <dbl>
#>  1  2000 -2.05     0.607   1.20  
#>  2  2001 -0.676    1.65   -0.114 
#>  3  2002  1.60    -0.0245  0.501 
#>  4  2003 -0.353    1.30   -0.459 
#>  5  2004 -0.00530  0.921  -0.0589
#>  6  2005  0.442   -1.55    0.594 
#>  7  2006 -0.610    0.380  -1.28  
#>  8  2007 -2.77     0.830   0.637 
#>  9  2008  0.899    0.0175 -1.30  
#> 10  2009 -0.106   -0.195   1.03  
#> # … with 5 more rows
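By default the combined names are glued with an underscore. The names_sep argument of pivot_wider() lets you choose another separator. Below is a sketch on the same layout of df, with deterministic values instead of rnorm() so that the result does not depend on random numbers.

```r
library(tidyr)
library(dplyr)

# Same layout as df above; value is just a running number here
df <- expand_grid(
  product = c("A", "B"),
  country = c("AI", "EI"),
  year    = 2000:2014
) %>%
  filter((product == "A" & country == "AI") | product == "B") %>%
  mutate(value = seq_len(n()))

# names_sep controls how product and country are joined in the column names
wide <- df %>% pivot_wider(names_from  = c(product, country),
                           values_from = value,
                           names_sep   = ".")
```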

You can also use specifications with pivot_wider(). When passed to pivot_wider(), the specification performs the inverse of the pivot_longer() transformation: the columns listed in .name are created from the values in .value and the other columns.

For this dataset, you can generate a custom specification if you want every possible combination of country and product to get its own column, not only the combinations present in the data:

spec <- df %>% 
  expand(product, country, .value = "value") %>% 
  unite(".name", product, country, remove = FALSE)

#> # A tibble: 4 x 4
#>   .name product country .value
#>   <chr> <chr>   <chr>   <chr> 
#> 1 A_AI  A       AI      value 
#> 2 A_EI  A       EI      value 
#> 3 B_AI  B       AI      value 
#> 4 B_EI  B       EI      value

df %>% pivot_wider(spec = spec) %>% head()

#> # A tibble: 6 x 5
#>    year     A_AI  A_EI    B_AI    B_EI
#>   <int>    <dbl> <dbl>   <dbl>   <dbl>
#> 1  2000 -2.05       NA  0.607   1.20  
#> 2  2001 -0.676      NA  1.65   -0.114 
#> 3  2002  1.60       NA -0.0245  0.501 
#> 4  2003 -0.353      NA  1.30   -0.459 
#> 5  2004 -0.00530    NA  0.921  -0.0589
#> 6  2005  0.442      NA -1.55    0.594
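In released tidyr (1.0.0 and later), a ready-made spec is passed to pivot_wider_spec() rather than to a spec argument of pivot_wider(). Below is a sketch with the same spec as above and deterministic values in place of rnorm().

```r
library(tidyr)
library(dplyr)

df <- expand_grid(product = c("A", "B"), country = c("AI", "EI"), year = 2000:2014) %>%
  filter((product == "A" & country == "AI") | product == "B") %>%
  mutate(value = seq_len(n()))

# Every product/country combination, including A_EI, which has no data
spec <- df %>%
  expand(product, country, .value = "value") %>%
  unite(".name", product, country, remove = FALSE)

wide <- pivot_wider_spec(df, spec)
```

The A_EI column is created but contains only NA, matching the output shown above.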

Several examples of working with the new tidyr concept

Cleaning data, using the US Census income and rent dataset as an example.

The us_rent_income dataset contains median income and rent information for each US state for 2017 (the dataset is available in the tidycensus package).

us_rent_income
#> # A tibble: 104 x 5
#>    GEOID NAME       variable estimate   moe
#>    <chr> <chr>      <chr>       <dbl> <dbl>
#>  1 01    Alabama    income      24476   136
#>  2 01    Alabama    rent          747     3
#>  3 02    Alaska     income      32940   508
#>  4 02    Alaska     rent         1200    13
#>  5 04    Arizona    income      27517   148
#>  6 04    Arizona    rent          972     4
#>  7 05    Arkansas   income      23789   165
#>  8 05    Arkansas   rent          709     5
#>  9 06    California income      29454   109
#> 10 06    California rent         1358     3
#> # … with 94 more rows

Data stored the way it is in us_rent_income is very inconvenient to work with, so we want a dataset with the columns rent, rent_moe, income and income_moe. There are many ways to generate this specification. The key point is that we must generate every combination of the values of variable and estimate/moe, and then build the column name.

  spec <- us_rent_income %>% 
    expand(variable, .value = c("estimate", "moe")) %>% 
    mutate(
      .name = paste0(variable, ifelse(.value == "moe", "_moe", ""))
    )

#> # A tibble: 4 x 3
#>   variable .value   .name     
#>   <chr>    <chr>    <chr>     
#> 1 income   estimate income    
#> 2 income   moe      income_moe
#> 3 rent     estimate rent      
#> 4 rent     moe      rent_moe

Passing this specification to pivot_wider() gives us the result we are looking for:

us_rent_income %>% pivot_wider(spec = spec)

#> # A tibble: 52 x 6
#>    GEOID NAME                 income income_moe  rent rent_moe
#>    <chr> <chr>                 <dbl>      <dbl> <dbl>    <dbl>
#>  1 01    Alabama               24476        136   747        3
#>  2 02    Alaska                32940        508  1200       13
#>  3 04    Arizona               27517        148   972        4
#>  4 05    Arkansas              23789        165   709        5
#>  5 06    California            29454        109  1358        3
#>  6 08    Colorado              32401        109  1125        5
#>  7 09    Connecticut           35326        195  1123        5
#>  8 10    Delaware              31560        247  1076       10
#>  9 11    District of Columbia  43198        681  1424       17
#> 10 12    Florida               25952         70  1077        3
#> # … with 42 more rows
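Released tidyr (1.0.0 and later) also allows several columns in values_from, which covers this case without a hand-made spec. The column names are then built as value column + "_" + variable, so they differ from the spec version above (estimate_income instead of income). A sketch:

```r
library(tidyr)

# Both estimate and moe are spread out for each value of variable
wide <- pivot_wider(us_rent_income,
                    names_from  = variable,
                    values_from = c(estimate, moe))
```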

The World Bank

Sometimes bringing a dataset into the desired form takes several steps.
The world_bank_pop dataset contains World Bank data on the population of each country from 2000 to 2017.

#> # A tibble: 1,056 x 20
#>    country indicator `2000` `2001` `2002` `2003`  `2004`  `2005`   `2006`
#>    <chr>   <chr>      <dbl>  <dbl>  <dbl>  <dbl>   <dbl>   <dbl>    <dbl>
#>  1 ABW     SP.URB.T… 4.24e4 4.30e4 4.37e4 4.42e4 4.47e+4 4.49e+4  4.49e+4
#>  2 ABW     SP.URB.G… 1.18e0 1.41e0 1.43e0 1.31e0 9.51e-1 4.91e-1 -1.78e-2
#>  3 ABW     SP.POP.T… 9.09e4 9.29e4 9.50e4 9.70e4 9.87e+4 1.00e+5  1.01e+5
#>  4 ABW     SP.POP.G… 2.06e0 2.23e0 2.23e0 2.11e0 1.76e+0 1.30e+0  7.98e-1
#>  5 AFG     SP.URB.T… 4.44e6 4.65e6 4.89e6 5.16e6 5.43e+6 5.69e+6  5.93e+6
#>  6 AFG     SP.URB.G… 3.91e0 4.66e0 5.13e0 5.23e0 5.12e+0 4.77e+0  4.12e+0
#>  7 AFG     SP.POP.T… 2.01e7 2.10e7 2.20e7 2.31e7 2.41e+7 2.51e+7  2.59e+7
#>  8 AFG     SP.POP.G… 3.49e0 4.25e0 4.72e0 4.82e0 4.47e+0 3.87e+0  3.23e+0
#>  9 AGO     SP.URB.T… 8.23e6 8.71e6 9.22e6 9.77e6 1.03e+7 1.09e+7  1.15e+7
#> 10 AGO     SP.URB.G… 5.44e0 5.59e0 5.70e0 5.76e0 5.75e+0 5.69e+0  4.92e+0
#> # … with 1,046 more rows, and 11 more variables: `2007` <dbl>,
#> #   `2008` <dbl>, `2009` <dbl>, `2010` <dbl>, `2011` <dbl>, `2012` <dbl>,
#> #   `2013` <dbl>, `2014` <dbl>, `2015` <dbl>, `2016` <dbl>, `2017` <dbl>

Our goal is to create a tidy dataset with each variable in its own column. It is not entirely clear which steps are needed, but we will start with the most obvious problem: the year is spread across several columns.

To fix this, use pivot_longer().

pop2 <- world_bank_pop %>% 
  pivot_longer(`2000`:`2017`, names_to = "year")

#> # A tibble: 19,008 x 4
#>    country indicator   year  value
#>    <chr>   <chr>       <chr> <dbl>
#>  1 ABW     SP.URB.TOTL 2000  42444
#>  2 ABW     SP.URB.TOTL 2001  43048
#>  3 ABW     SP.URB.TOTL 2002  43670
#>  4 ABW     SP.URB.TOTL 2003  44246
#>  5 ABW     SP.URB.TOTL 2004  44669
#>  6 ABW     SP.URB.TOTL 2005  44889
#>  7 ABW     SP.URB.TOTL 2006  44881
#>  8 ABW     SP.URB.TOTL 2007  44686
#>  9 ABW     SP.URB.TOTL 2008  44375
#> 10 ABW     SP.URB.TOTL 2009  44052
#> # … with 18,998 more rows

The next step is to look at the indicator variable.

pop2 %>% count(indicator)

#> # A tibble: 4 x 2
#>   indicator       n
#>   <chr>       <int>
#> 1 SP.POP.GROW  4752
#> 2 SP.POP.TOTL  4752
#> 3 SP.URB.GROW  4752
#> 4 SP.URB.TOTL  4752

Here SP.POP.GROW is population growth and SP.POP.TOTL is total population, while SP.URB.* are the same measures for urban areas only. Let's split these values into two variables: area (total or urban) and a variable containing the actual data (population or growth):

pop3 <- pop2 %>% 
  separate(indicator, c(NA, "area", "variable"))

#> # A tibble: 19,008 x 5
#>    country area  variable year  value
#>    <chr>   <chr> <chr>    <chr> <dbl>
#>  1 ABW     URB   TOTL     2000  42444
#>  2 ABW     URB   TOTL     2001  43048
#>  3 ABW     URB   TOTL     2002  43670
#>  4 ABW     URB   TOTL     2003  44246
#>  5 ABW     URB   TOTL     2004  44669
#>  6 ABW     URB   TOTL     2005  44889
#>  7 ABW     URB   TOTL     2006  44881
#>  8 ABW     URB   TOTL     2007  44686
#>  9 ABW     URB   TOTL     2008  44375
#> 10 ABW     URB   TOTL     2009  44052
#> # … with 18,998 more rows

Now all that remains is to split the variable into two columns:

pop3 %>% 
  pivot_wider(names_from = variable, values_from = value)

#> # A tibble: 9,504 x 5
#>    country area  year   TOTL    GROW
#>    <chr>   <chr> <chr> <dbl>   <dbl>
#>  1 ABW     URB   2000  42444  1.18  
#>  2 ABW     URB   2001  43048  1.41  
#>  3 ABW     URB   2002  43670  1.43  
#>  4 ABW     URB   2003  44246  1.31  
#>  5 ABW     URB   2004  44669  0.951 
#>  6 ABW     URB   2005  44889  0.491 
#>  7 ABW     URB   2006  44881 -0.0178
#>  8 ABW     URB   2007  44686 -0.435 
#>  9 ABW     URB   2008  44375 -0.698 
#> 10 ABW     URB   2009  44052 -0.731 
#> # … with 9,494 more rows
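The three steps can also be written as one pipeline. In the sketch below, the year columns are selected as "everything except country and indicator" rather than by the range 2000 to 2017, and year is converted to an integer. It arrives as text because it came from column names. Both choices are additions for illustration, not part of the original walkthrough.

```r
library(tidyr)
library(dplyr)

tidy_pop <- world_bank_pop %>%
  # 1. move the years from the column names into a year column
  pivot_longer(cols = -c(country, indicator), names_to = "year", values_to = "value") %>%
  # 2. "SP.URB.TOTL" -> area = "URB", variable = "TOTL" (the "SP" part is dropped)
  separate(indicator, into = c(NA, "area", "variable")) %>%
  # 3. one column per measured variable
  pivot_wider(names_from = variable, values_from = value) %>%
  # 4. years came from column names, so they are text until converted
  mutate(year = as.integer(year))
```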

Contact list

As a final example, imagine you have a contact list that you copied and pasted from a website:

contacts <- tribble(
  ~field, ~value,
  "name", "Jiena McLellan",
  "company", "Toyota", 
  "name", "John Smith", 
  "company", "google", 
  "email", "john@google.com",
  "name", "Huxley Ratcliffe"
)

This list is hard to work with because there is no variable identifying which data belongs to which contact. We can fix this by noticing that the data for each new contact starts with "name". We can therefore create a unique identifier and increment it every time the field column contains the value "name":

contacts <- contacts %>% 
  mutate(
    person_id = cumsum(field == "name")
  )
contacts

#> # A tibble: 6 x 3
#>   field   value            person_id
#>   <chr>   <chr>                <int>
#> 1 name    Jiena McLellan           1
#> 2 company Toyota                   1
#> 3 name    John Smith               2
#> 4 company google                   2
#> 5 email   john@google.com          2
#> 6 name    Huxley Ratcliffe         3

Now that each contact has a unique ID, we can turn field and value into columns:

contacts %>% 
  pivot_wider(names_from = field, values_from = value)

#> # A tibble: 3 x 4
#>   person_id name             company email          
#>       <int> <chr>            <chr>   <chr>          
#> 1         1 Jiena McLellan   Toyota  <NA>           
#> 2         2 John Smith       google  john@google.com
#> 3         3 Huxley Ratcliffe <NA>    <NA>

Conclusion

In my opinion, the new tidyr concept is genuinely more intuitive and considerably more powerful than the legacy functions spread() and gather(). I hope this article has helped you get to grips with pivot_longer() and pivot_wider().

Source: www.habr.com
