The R package tidyr and its new functions pivot_longer and pivot_wider

The tidyr package is part of the core of one of the most popular libraries in the R language, the tidyverse.
The main purpose of the package is to bring data into a tidy form.

There is already a publication on Habr devoted to this package, but it dates back to 2015. I want to tell you about the most recent changes, which were announced a few days ago by the package author, Hadley Wickham.

SJK: Will gather() and spread() be deprecated?

Hadley Wickham: To some extent. We will stop recommending these functions and stop fixing bugs in them, but they will continue to exist in the package in their current state.

If you are interested in data analysis, you may also like my telegram and youtube channels. Most of the content is devoted to the R language.

The TidyData concept

The purpose of tidyr is to help you bring data into a so-called tidy form. Tidy data is data in which:

  • Each variable is in its own column.
  • Each observation is a row.
  • Each value is a cell.

Data presented in tidy form is much simpler and more convenient to work with during analysis.

Main functions included in the tidyr package

tidyr contains a set of functions designed for transforming tables:

  • fill() - fills missing values in a column with the previous values;
  • separate() - splits one field into several using a separator;
  • unite() - combines several fields into one, the inverse of separate();
  • pivot_longer() - converts data from wide format to long format;
  • pivot_wider() - converts data from long format to wide format, the inverse of pivot_longer();
  • gather() (legacy) - converts data from wide format to long format;
  • spread() (legacy) - converts data from long format to wide format, the inverse of gather().
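
The first three helpers in the list can be sketched on a toy table. The table `raw` and its columns (region, period, sales) are invented for illustration and do not come from the article.

```r
library(tidyr)

# Hypothetical toy data: region is only written on the first row of each group
raw <- tibble(
  region = c("North", NA, "South", NA),
  period = c("2019-Q1", "2019-Q2", "2019-Q1", "2019-Q2"),
  sales  = c(10, 12, 7, 9)
)

# fill(): carry the last non-missing region down into the NA cells
filled <- raw %>% fill(region)

# separate(): split "2019-Q1" into year and quarter on the "-" separator
parts <- filled %>% separate(period, into = c("year", "quarter"), sep = "-")

# unite(): the inverse step, gluing year and quarter back together
joined <- parts %>% unite("period", year, quarter, sep = "-")
```

After the last step, `joined$period` holds the same strings as `raw$period`, which is what "inverse of separate()" means in practice.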

A new concept for converting tables from wide to long format and back

Previously, the functions gather() and spread() were used for this kind of transformation. Over the years these functions have existed, it became clear that for most users, including the package author, the names of the functions and their arguments were not obvious. They were hard to find, and it was hard to remember which of the two converts a data frame from wide to long format and which does the reverse.

For this reason, two new, important functions were added to tidyr for reshaping data frames.

The new functions pivot_longer() and pivot_wider() were inspired by some of the features of the cdata package, created by John Mount and Nina Zumel.

Installing the most current version, tidyr 0.8.3.9000

To install the newest version of the package, tidyr 0.8.3.9000, in which the new functions are available, use the following code.

devtools::install_github("tidyverse/tidyr")

At the time of writing, these functions are only available in the dev version of the package on GitHub.
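
If installing from GitHub is not an option, a quick check like the one below shows whether the tidyr that is already installed exports the new functions. It is a sketch that assumes a CRAN release of tidyr (1.0.0 or later) is acceptable instead of the development version described above.

```r
# Assumption: a CRAN release (tidyr 1.0.0 or later) may be used in place of
# the GitHub development version; uncomment to install it.
# install.packages("tidyr")

# Check that the installed tidyr exports both new functions
has_pivot <- all(
  c("pivot_longer", "pivot_wider") %in% getNamespaceExports("tidyr")
)

print(packageVersion("tidyr"))
print(has_pivot)
```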

Switching to the new functions

In fact, porting old scripts to the new functions is not difficult. To make this clearer, I will take examples from the documentation of the old functions and show how the same operations are performed with the new pivot_*() functions.

Converting wide format to long format.

Example code from the gather() function documentation

# example
library(tidyr)   # gather(), spread(), pivot_longer(), pivot_wider()
library(dplyr)
stocks <- data.frame(
  time = as.Date('2009-01-01') + 0:9,
  X = rnorm(10, 0, 1),
  Y = rnorm(10, 0, 2),
  Z = rnorm(10, 0, 4)
)

# old
stocks_gather <- stocks %>% gather(key   = stock, 
                                   value = price, 
                                   -time)

# new
stocks_long   <- stocks %>% pivot_longer(cols      = -time, 
                                       names_to  = "stock", 
                                       values_to = "price")

Converting long format to wide format.

Example code from the spread() function documentation

# old
stocks_spread <- stocks_gather %>% spread(key = stock, 
                                          value = price) 

# new 
stock_wide    <- stocks_long %>% pivot_wider(names_from  = "stock",
                                            values_from = "price")

In the examples above of working with pivot_longer() and pivot_wider(), the original stocks table has no columns with the names given in the names_to and values_to arguments, so those names must be written in quotes.

The following table is the easiest way to see how to switch to the new tidyr concept.

[Image: migration table for the new tidyr functions]

Note from the author

All the text below is an adapted, I would even say free, translation of the vignette from the official tidyverse website.

A simple example of converting data from wide to long format

pivot_longer() makes a dataset longer by reducing the number of columns and increasing the number of rows.

To run the examples in this article, first load the required packages:

library(tidyr)
library(dplyr)
library(readr)

Suppose we have a table with the results of a survey that (among other things) asked people about their religion and annual income:

#> # A tibble: 18 x 11
#>    religion `<$10k` `$10-20k` `$20-30k` `$30-40k` `$40-50k` `$50-75k`
#>    <chr>      <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>
#>  1 Agnostic      27        34        60        81        76       137
#>  2 Atheist       12        27        37        52        35        70
#>  3 Buddhist      27        21        30        34        33        58
#>  4 Catholic     418       617       732       670       638      1116
#>  5 Don’t k…      15        14        15        11        10        35
#>  6 Evangel…     575       869      1064       982       881      1486
#>  7 Hindu          1         9         7         9        11        34
#>  8 Histori…     228       244       236       238       197       223
#>  9 Jehovah…      20        27        24        24        21        30
#> 10 Jewish        19        19        25        25        30        95
#> # … with 8 more rows, and 4 more variables: `$75-100k` <dbl>,
#> #   `$100-150k` <dbl>, `>150k` <dbl>, `Don't know/refused` <dbl>

This table holds the respondents' religion in the rows, while the income levels are spread across the column names. The number of respondents in each category is stored in the cells at the intersection of religion and income level. To bring the table into a tidy form, it is enough to use pivot_longer():

pew %>% 
  pivot_longer(cols = -religion, names_to = "income", values_to = "count")
#> # A tibble: 180 x 3
#>    religion income             count
#>    <chr>    <chr>              <dbl>
#>  1 Agnostic <$10k                 27
#>  2 Agnostic $10-20k               34
#>  3 Agnostic $20-30k               60
#>  4 Agnostic $30-40k               81
#>  5 Agnostic $40-50k               76
#>  6 Agnostic $50-75k              137
#>  7 Agnostic $75-100k             122
#>  8 Agnostic $100-150k            109
#>  9 Agnostic >150k                 84
#> 10 Agnostic Don't know/refused    96
#> # … with 170 more rows

Arguments of the pivot_longer() function

  • The first argument, cols, describes which columns need to be merged. In this case, all columns except religion.
  • The names_to argument gives the name of the variable that will be created from the names of the columns we merged.
  • values_to gives the name of the variable that will be created from the data stored in the cells of the merged columns.
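
The three arguments can be tried out without the pew file. The sketch below assumes a released tidyr (1.0.0 or later), which bundles this survey table under the name relig_income; treating it as the same data the article calls pew is an assumption.

```r
library(tidyr)

# relig_income ships with released tidyr; assumed here to match `pew`
income_long <- relig_income %>%
  pivot_longer(
    cols      = -religion,  # every column except religion is merged
    names_to  = "income",   # the old column names end up in this variable
    values_to = "count"     # the old cell values end up in this variable
  )

income_long
```

Each of the 10 income columns contributes one row per religion, so the result has 10 times as many rows as the input and exactly three columns.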

Specifications

This is new functionality in the tidyr package that was not available with the legacy functions.

A specification is a data frame in which each row corresponds to one column of the wide-format version of the data, and which has two special columns whose names start with a dot:

  • .name contains the original column name.
  • .value contains the name of the column that will hold the cell values.

The remaining columns of the specification describe how the names of the collapsed columns listed in .name will be represented in the new columns.

A specification describes the metadata stored in the column names, with one row for each column and one column for each variable combined in the column name. This definition may seem confusing right now, but it will become much clearer after a few examples.

The point of a specification is that you can retrieve, modify, and define new metadata for the data frame being reshaped.

To work with specifications when converting a table from wide to long format, use the pivot_longer_spec() function.

This function takes a data frame and generates its metadata in the way described above.

As an example, let's take the who dataset that ships with the tidyr package. It contains data provided by the World Health Organization on the incidence of tuberculosis.

who
#> # A tibble: 7,240 x 60
#>    country iso2  iso3   year new_sp_m014 new_sp_m1524 new_sp_m2534
#>    <chr>   <chr> <chr> <int>       <int>        <int>        <int>
#>  1 Afghan… AF    AFG    1980          NA           NA           NA
#>  2 Afghan… AF    AFG    1981          NA           NA           NA
#>  3 Afghan… AF    AFG    1982          NA           NA           NA
#>  4 Afghan… AF    AFG    1983          NA           NA           NA
#>  5 Afghan… AF    AFG    1984          NA           NA           NA
#>  6 Afghan… AF    AFG    1985          NA           NA           NA
#>  7 Afghan… AF    AFG    1986          NA           NA           NA
#>  8 Afghan… AF    AFG    1987          NA           NA           NA
#>  9 Afghan… AF    AFG    1988          NA           NA           NA
#> 10 Afghan… AF    AFG    1989          NA           NA           NA
#> # … with 7,230 more rows, and 53 more variables

Let's build its specification.

spec <- who %>%
  pivot_longer_spec(new_sp_m014:newrel_f65, values_to = "count")

#> # A tibble: 56 x 3
#>    .name        .value name        
#>    <chr>        <chr>  <chr>       
#>  1 new_sp_m014  count  new_sp_m014 
#>  2 new_sp_m1524 count  new_sp_m1524
#>  3 new_sp_m2534 count  new_sp_m2534
#>  4 new_sp_m3544 count  new_sp_m3544
#>  5 new_sp_m4554 count  new_sp_m4554
#>  6 new_sp_m5564 count  new_sp_m5564
#>  7 new_sp_m65   count  new_sp_m65  
#>  8 new_sp_f014  count  new_sp_f014 
#>  9 new_sp_f1524 count  new_sp_f1524
#> 10 new_sp_f2534 count  new_sp_f2534
#> # … with 46 more rows

The country, iso2, and iso3 fields are already variables. Our task is to pivot the columns from new_sp_m014 to newrel_f65.

The names of these columns encode the following information:

  • The new_ prefix indicates that the column contains data on new cases of tuberculosis. The current data frame only contains information on new cases, so in this context the prefix carries no meaning.
  • sp/rel/sn/ep describes the method of diagnosis.
  • m/f gives the patient's gender.
  • 014/1524/2534/3544/4554/5564/65 gives the patient's age range.

We can split these columns using the extract() function with a regular expression.

spec <- spec %>%
        extract(name, c("diagnosis", "gender", "age"), "new_?(.*)_(.)(.*)")

#> # A tibble: 56 x 5
#>    .name        .value diagnosis gender age  
#>    <chr>        <chr>  <chr>     <chr>  <chr>
#>  1 new_sp_m014  count  sp        m      014  
#>  2 new_sp_m1524 count  sp        m      1524 
#>  3 new_sp_m2534 count  sp        m      2534 
#>  4 new_sp_m3544 count  sp        m      3544 
#>  5 new_sp_m4554 count  sp        m      4554 
#>  6 new_sp_m5564 count  sp        m      5564 
#>  7 new_sp_m65   count  sp        m      65   
#>  8 new_sp_f014  count  sp        f      014  
#>  9 new_sp_f1524 count  sp        f      1524 
#> 10 new_sp_f2534 count  sp        f      2534 
#> # … with 46 more rows

Note that the .name column must remain unchanged, because it is our index into the column names of the original dataset.

Gender and age (the gender and age columns) have fixed, known values, so it is recommended to convert these columns to factors:

spec <-  spec %>%
            mutate(
              gender = factor(gender, levels = c("f", "m")),
              age = factor(age, levels = unique(age), ordered = TRUE)
            ) 

Finally, to apply the specification we created to the original who data frame, we use the spec argument of the pivot_longer() function.

who %>% pivot_longer(spec = spec)

#> # A tibble: 405,440 x 8
#>    country     iso2  iso3   year diagnosis gender age   count
#>    <chr>       <chr> <chr> <int> <chr>     <fct>  <ord> <int>
#>  1 Afghanistan AF    AFG    1980 sp        m      014      NA
#>  2 Afghanistan AF    AFG    1980 sp        m      1524     NA
#>  3 Afghanistan AF    AFG    1980 sp        m      2534     NA
#>  4 Afghanistan AF    AFG    1980 sp        m      3544     NA
#>  5 Afghanistan AF    AFG    1980 sp        m      4554     NA
#>  6 Afghanistan AF    AFG    1980 sp        m      5564     NA
#>  7 Afghanistan AF    AFG    1980 sp        m      65       NA
#>  8 Afghanistan AF    AFG    1980 sp        f      014      NA
#>  9 Afghanistan AF    AFG    1980 sp        f      1524     NA
#> 10 Afghanistan AF    AFG    1980 sp        f      2534     NA
#> # … with 405,430 more rows

Everything we have just done can be depicted schematically as follows:

[Image: diagram of the pivot_longer() transformation driven by a specification]
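
For readers on a released tidyr, the same steps can be sketched as one chain. This sketch assumes tidyr 1.0.0 or later, where the specification is built with build_longer_spec() and applied with pivot_longer_spec(); these names differ from the development version used in the article, where pivot_longer_spec() builds the specification and the spec argument of pivot_longer() applies it.

```r
library(tidyr)
library(dplyr)

# Step 1: build the specification (released-version name: build_longer_spec)
# Step 2: split the metadata hidden in the column names
# Step 3: turn gender and age into factors
spec <- who %>%
  build_longer_spec(new_sp_m014:newrel_f65, values_to = "count") %>%
  tidyr::extract(name, c("diagnosis", "gender", "age"), "new_?(.*)_(.)(.*)") %>%
  mutate(
    gender = factor(gender, levels = c("f", "m")),
    age    = factor(age, levels = unique(age), ordered = TRUE)
  )

# Step 4: apply the specification (released-version name: pivot_longer_spec)
who_long <- who %>% pivot_longer_spec(spec)
```

Each of the 56 pivoted columns contributes one row per original row, which is where the 405,440 rows in the output above come from.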

A specification using multiple values (.value)

In the example above, the .value column of the specification contained only one value; in most cases this is how it goes.

But occasionally you need to collect data from columns that hold different types of values. With the legacy spread() function this would be quite difficult.

The example below is taken from the vignette in the package documentation.

Let's create a sample data frame.

family <- tibble::tribble(
  ~family,  ~dob_child1,  ~dob_child2, ~gender_child1, ~gender_child2,
       1L, "1998-11-26", "2000-01-29",             1L,             2L,
       2L, "1996-06-22",           NA,             2L,             NA,
       3L, "2002-07-11", "2004-04-05",             2L,             2L,
       4L, "2004-10-10", "2009-08-27",             1L,             1L,
       5L, "2000-12-05", "2005-02-28",             2L,             1L,
)
family <- family %>% mutate_at(vars(starts_with("dob")), parse_date)

#> # A tibble: 5 x 5
#>   family dob_child1 dob_child2 gender_child1 gender_child2
#>    <int> <date>     <date>             <int>         <int>
#> 1      1 1998-11-26 2000-01-29             1             2
#> 2      2 1996-06-22 NA                     2            NA
#> 3      3 2002-07-11 2004-04-05             2             2
#> 4      4 2004-10-10 2009-08-27             1             1
#> 5      5 2000-12-05 2005-02-28             2             1

Each row of the resulting data frame contains data about the children of one family. Families may have one or two children. For each child there is a date of birth and a gender, and the data for each child sits in separate columns; our task is to bring this data into a format suitable for analysis.

Note that we have two variables with information about each child: gender and date of birth (columns with the dob prefix contain the date of birth, columns with the gender prefix contain the child's gender). In the result they should end up in separate columns. We can achieve this by generating a specification in which the .value column has two different values.

spec <- family %>%
  pivot_longer_spec(-family) %>%
  separate(col = name, into = c(".value", "child"))%>%
  mutate(child = parse_number(child))

#> # A tibble: 4 x 3
#>   .name         .value child
#>   <chr>         <chr>  <dbl>
#> 1 dob_child1    dob        1
#> 2 dob_child2    dob        2
#> 3 gender_child1 gender     1
#> 4 gender_child2 gender     2

So, let's go step by step through what the code above does.

  • pivot_longer_spec(-family) - creates a specification that collapses all existing columns except the family column.
  • separate(col = name, into = c(".value", "child")) - splits the name column, which contains the names of the source fields, on the underscore and places the resulting pieces into the .value and child columns.
  • mutate(child = parse_number(child)) - converts the values of the child field from text to a numeric type.

Now we can apply the resulting specification to the original data frame and bring the table into the desired form.

family %>% 
    pivot_longer(spec = spec, na.rm = TRUE)

#> # A tibble: 9 x 4
#>   family child dob        gender
#>    <int> <dbl> <date>      <int>
#> 1      1     1 1998-11-26      1
#> 2      1     2 2000-01-29      2
#> 3      2     1 1996-06-22      2
#> 4      3     1 2002-07-11      2
#> 5      3     2 2004-04-05      2
#> 6      4     1 2004-10-10      1
#> 7      4     2 2009-08-27      1
#> 8      5     1 2000-12-05      2
#> 9      5     2 2005-02-28      1

We use the na.rm = TRUE argument because the current shape of the data forces the creation of extra rows for observations that do not exist. Since family 2 has only one child, na.rm = TRUE ensures that family 2 has a single row in the output.
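
In released versions of tidyr the same result can be reached without building a specification by hand. The sketch below assumes tidyr 1.0.0 or later, where the special name ".value" can be passed directly in names_to and the argument that drops the empty rows is called values_drop_na rather than na.rm. The two-family table is a shortened, hypothetical copy of the data above.

```r
library(tidyr)
library(dplyr)

# Shortened stand-in for the family table: family 2 has only one child
family <- tribble(
  ~family, ~dob_child1,  ~dob_child2,  ~gender_child1, ~gender_child2,
  1L,      "1998-11-26", "2000-01-29", 1L,             2L,
  2L,      "1996-06-22", NA,           2L,             NA
) %>%
  mutate(
    dob_child1 = as.Date(dob_child1),
    dob_child2 = as.Date(dob_child2)
  )

family_long <- family %>%
  pivot_longer(
    cols           = -family,
    names_to       = c(".value", "child"),  # first piece names the value column
    names_sep      = "_",                   # "dob_child1" -> "dob" + "child1"
    values_drop_na = TRUE                   # drop the non-existent second child
  )

family_long
```

Here the child column keeps the text labels "child1" and "child2"; converting them to numbers is the job parse_number() does in the specification above.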

Converting data frames from long to wide format

pivot_wider() is the inverse transformation: it increases the number of columns of a data frame by reducing the number of rows.

This kind of transformation is rarely needed to bring data into tidy form. However, the technique can be useful for creating pivot tables used in presentations, or for integration with other tools.

In fact, the pivot_longer() and pivot_wider() functions are symmetrical and are inverses of each other, i.e. df %>% pivot_longer(spec = spec) %>% pivot_wider(spec = spec) and df %>% pivot_wider(spec = spec) %>% pivot_longer(spec = spec) will return the original df.
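
The round trip can be checked on a small table. This is a sketch under the assumption of a released tidyr (1.0.0 or later), where a specification is applied with pivot_longer_spec() and pivot_wider_spec() rather than through a spec argument; the table `wide` is invented for illustration.

```r
library(tidyr)

# Hypothetical wide table with two measurement columns
wide <- tibble(id = 1:2, x = c(1, 2), y = c(3, 4))

# One specification describes the reshape in both directions
spec <- build_longer_spec(
  wide,
  cols      = c(x, y),
  names_to  = "name",
  values_to = "value"
)

long <- pivot_longer_spec(wide, spec)  # wide -> long
back <- pivot_wider_spec(long, spec)   # long -> wide again

all.equal(as.data.frame(wide), as.data.frame(back))
```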

The simplest example of converting a table to wide format

To demonstrate how pivot_wider() works, we will use the fish_encounters dataset, which stores information about how different stations record the movement of fish along a river.

#> # A tibble: 114 x 3
#>    fish  station  seen
#>    <fct> <fct>   <int>
#>  1 4842  Release     1
#>  2 4842  I80_1       1
#>  3 4842  Lisbon      1
#>  4 4842  Rstr        1
#>  5 4842  Base_TD     1
#>  6 4842  BCE         1
#>  7 4842  BCW         1
#>  8 4842  BCE2        1
#>  9 4842  BCW2        1
#> 10 4842  MAE         1
#> # … with 104 more rows

In most cases this table will be more informative and easier to use if the information for each station is presented in a separate column.

fish_encounters %>% pivot_wider(names_from = station, values_from = seen)
#> # A tibble: 19 x 12
#>    fish  Release I80_1 Lisbon  Rstr Base_TD   BCE   BCW  BCE2  BCW2   MAE
#>    <fct>   <int> <int>  <int> <int>   <int> <int> <int> <int> <int> <int>
#>  1 4842        1     1      1     1       1     1     1     1     1     1
#>  2 4843        1     1      1     1       1     1     1     1     1     1
#>  3 4844        1     1      1     1       1     1     1     1     1     1
#>  4 4845        1     1      1     1       1    NA    NA    NA    NA    NA
#>  5 4847        1     1      1    NA      NA    NA    NA    NA    NA    NA
#>  6 4848        1     1      1     1      NA    NA    NA    NA    NA    NA
#>  7 4849        1     1     NA    NA      NA    NA    NA    NA    NA    NA
#>  8 4850        1     1     NA     1       1     1     1    NA    NA    NA
#>  9 4851        1     1     NA    NA      NA    NA    NA    NA    NA    NA
#> 10 4854        1     1     NA    NA      NA    NA    NA    NA    NA    NA
#> # … with 9 more rows, and 1 more variable: MAW <int>

This dataset only records information when a fish was detected by a station, i.e. if a fish was not recorded by some station, that record is simply absent from the table. This means the output will be filled with NA.

In this case, however, we know that a missing record means the fish was not seen, so we can use the values_fill argument of pivot_wider() to fill the missing values with zeros:

fish_encounters %>% pivot_wider(
  names_from = station, 
  values_from = seen,
  values_fill = list(seen = 0)
)

#> # A tibble: 19 x 12
#>    fish  Release I80_1 Lisbon  Rstr Base_TD   BCE   BCW  BCE2  BCW2   MAE
#>    <fct>   <int> <int>  <int> <int>   <int> <int> <int> <int> <int> <int>
#>  1 4842        1     1      1     1       1     1     1     1     1     1
#>  2 4843        1     1      1     1       1     1     1     1     1     1
#>  3 4844        1     1      1     1       1     1     1     1     1     1
#>  4 4845        1     1      1     1       1     0     0     0     0     0
#>  5 4847        1     1      1     0       0     0     0     0     0     0
#>  6 4848        1     1      1     1       0     0     0     0     0     0
#>  7 4849        1     1      0     0       0     0     0     0     0     0
#>  8 4850        1     1      0     1       1     1     1     0     0     0
#>  9 4851        1     1      0     0       0     0     0     0     0     0
#> 10 4854        1     1      0     0       0     0     0     0     0     0
#> # … with 9 more rows, and 1 more variable: MAW <int>

Generating a column name from multiple source variables

Imagine we have a table containing a combination of product, country, and year. To generate a test data frame, you can run the following code:

df <- expand_grid(
  product = c("A", "B"), 
  country = c("AI", "EI"), 
  year = 2000:2014
) %>%
  filter((product == "A" & country == "AI") | product == "B") %>% 
  mutate(value = rnorm(nrow(.)))

#> # A tibble: 45 x 4
#>    product country  year    value
#>    <chr>   <chr>   <int>    <dbl>
#>  1 A       AI       2000 -2.05   
#>  2 A       AI       2001 -0.676  
#>  3 A       AI       2002  1.60   
#>  4 A       AI       2003 -0.353  
#>  5 A       AI       2004 -0.00530
#>  6 A       AI       2005  0.442  
#>  7 A       AI       2006 -0.610  
#>  8 A       AI       2007 -2.77   
#>  9 A       AI       2008  0.899  
#> 10 A       AI       2009 -0.106  
#> # … with 35 more rows

Our task is to widen the data frame so that there is one column for each combination of product and country. To do this, simply pass to the names_from argument a vector containing the names of the fields to be merged.

df %>% pivot_wider(names_from = c(product, country),
                 values_from = "value")

#> # A tibble: 15 x 4
#>     year     A_AI    B_AI    B_EI
#>    <int>    <dbl>   <dbl>   <dbl>
#>  1  2000 -2.05     0.607   1.20  
#>  2  2001 -0.676    1.65   -0.114 
#>  3  2002  1.60    -0.0245  0.501 
#>  4  2003 -0.353    1.30   -0.459 
#>  5  2004 -0.00530  0.921  -0.0589
#>  6  2005  0.442   -1.55    0.594 
#>  7  2006 -0.610    0.380  -1.28  
#>  8  2007 -2.77     0.830   0.637 
#>  9  2008  0.899    0.0175 -1.30  
#> 10  2009 -0.106   -0.195   1.03  
#> # … with 5 more rows
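
The pieces of the generated column names are joined with an underscore by default. A minimal sketch of changing that, assuming a released tidyr where pivot_wider() accepts a names_sep argument; the table `sales` is a deterministic stand-in for the random test data above.

```r
library(tidyr)
library(dplyr)

# Deterministic stand-in for the article's random test data
sales <- expand_grid(
  product = c("A", "B"),
  country = c("AI", "EI"),
  year    = 2000:2001
) %>%
  mutate(value = seq_len(n()))

sales_wide <- sales %>%
  pivot_wider(
    names_from  = c(product, country),
    values_from = value,
    names_sep   = "."   # the default separator is "_"
  )

names(sales_wide)
```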

You can also apply a specification to pivot_wider(). But when passed to pivot_wider(), the specification performs the opposite transformation to pivot_longer(): the columns listed in .name are created, using the values from .value and the other columns.

For this dataset, you can generate a custom specification if you want every possible combination of country and product to have its own column, not just those present in the data:

spec <- df %>% 
  expand(product, country, .value = "value") %>% 
  unite(".name", product, country, remove = FALSE)

#> # A tibble: 4 x 4
#>   .name product country .value
#>   <chr> <chr>   <chr>   <chr> 
#> 1 A_AI  A       AI      value 
#> 2 A_EI  A       EI      value 
#> 3 B_AI  B       AI      value 
#> 4 B_EI  B       EI      value

df %>% pivot_wider(spec = spec) %>% head()

#> # A tibble: 6 x 5
#>    year     A_AI  A_EI    B_AI    B_EI
#>   <int>    <dbl> <dbl>   <dbl>   <dbl>
#> 1  2000 -2.05       NA  0.607   1.20  
#> 2  2001 -0.676      NA  1.65   -0.114 
#> 3  2002  1.60       NA -0.0245  0.501 
#> 4  2003 -0.353      NA  1.30   -0.459 
#> 5  2004 -0.00530    NA  0.921  -0.0589
#> 6  2005  0.442      NA -1.55    0.594

Several advanced examples of working with the new tidyr concept

Tidying data, using the US Census income and rent dataset as an example.

The us_rent_income dataset contains information on median income and rent for each US state in 2017 (the dataset is available in the tidycensus package).

us_rent_income
#> # A tibble: 104 x 5
#>    GEOID NAME       variable estimate   moe
#>    <chr> <chr>      <chr>       <dbl> <dbl>
#>  1 01    Alabama    income      24476   136
#>  2 01    Alabama    rent          747     3
#>  3 02    Alaska     income      32940   508
#>  4 02    Alaska     rent         1200    13
#>  5 04    Arizona    income      27517   148
#>  6 04    Arizona    rent          972     4
#>  7 05    Arkansas   income      23789   165
#>  8 05    Arkansas   rent          709     5
#>  9 06    California income      29454   109
#> 10 06    California rent         1358     3
#> # … with 94 more rows

In the form in which the data is stored in the us_rent_income dataset, it is very inconvenient to work with, so we would like to create a dataset with the columns rent, rent_moe, income, income_moe. There are many ways to create this specification, but the key point is that we need to generate every combination of the variable values and estimate/moe, and then generate the column name.

  spec <- us_rent_income %>% 
    expand(variable, .value = c("estimate", "moe")) %>% 
    mutate(
      .name = paste0(variable, ifelse(.value == "moe", "_moe", ""))
    )

#> # A tibble: 4 x 3
#>   variable .value   .name     
#>   <chr>    <chr>    <chr>     
#> 1 income   estimate income    
#> 2 income   moe      income_moe
#> 3 rent     estimate rent      
#> 4 rent     moe      rent_moe

Supplying this specification to pivot_wider() gives us the result we are looking for:

us_rent_income %>% pivot_wider(spec = spec)

#> # A tibble: 52 x 6
#>    GEOID NAME                 income income_moe  rent rent_moe
#>    <chr> <chr>                 <dbl>      <dbl> <dbl>    <dbl>
#>  1 01    Alabama               24476        136   747        3
#>  2 02    Alaska                32940        508  1200       13
#>  3 04    Arizona               27517        148   972        4
#>  4 05    Arkansas              23789        165   709        5
#>  5 06    California            29454        109  1358        3
#>  6 08    Colorado              32401        109  1125        5
#>  7 09    Connecticut           35326        195  1123        5
#>  8 10    Delaware              31560        247  1076       10
#>  9 11    District of Columbia  43198        681  1424       17
#> 10 12    Florida               25952         70  1077        3
#> # … with 42 more rows
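
If the exact column names do not matter, a specification is not strictly needed here. The sketch below assumes a released tidyr (1.0.0 or later), where values_from accepts several columns and us_rent_income ships with the package; the resulting names follow the value-then-variable pattern (estimate_income, moe_rent, and so on) rather than the names produced by the custom specification above.

```r
library(tidyr)

# Widen both value columns at once; names become <value column>_<variable>
rent_wide <- us_rent_income %>%
  pivot_wider(
    names_from  = variable,
    values_from = c(estimate, moe)
  )

names(rent_wide)
```

The specification remains the tool of choice when the output columns must carry specific names such as income and income_moe.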

The World Bank

Sometimes bringing a dataset into the desired form requires several steps.
The world_bank_pop dataset contains World Bank data on the population of each country between 2000 and 2017.

#> # A tibble: 1,056 x 20
#>    country indicator `2000` `2001` `2002` `2003`  `2004`  `2005`   `2006`
#>    <chr>   <chr>      <dbl>  <dbl>  <dbl>  <dbl>   <dbl>   <dbl>    <dbl>
#>  1 ABW     SP.URB.T… 4.24e4 4.30e4 4.37e4 4.42e4 4.47e+4 4.49e+4  4.49e+4
#>  2 ABW     SP.URB.G… 1.18e0 1.41e0 1.43e0 1.31e0 9.51e-1 4.91e-1 -1.78e-2
#>  3 ABW     SP.POP.T… 9.09e4 9.29e4 9.50e4 9.70e4 9.87e+4 1.00e+5  1.01e+5
#>  4 ABW     SP.POP.G… 2.06e0 2.23e0 2.23e0 2.11e0 1.76e+0 1.30e+0  7.98e-1
#>  5 AFG     SP.URB.T… 4.44e6 4.65e6 4.89e6 5.16e6 5.43e+6 5.69e+6  5.93e+6
#>  6 AFG     SP.URB.G… 3.91e0 4.66e0 5.13e0 5.23e0 5.12e+0 4.77e+0  4.12e+0
#>  7 AFG     SP.POP.T… 2.01e7 2.10e7 2.20e7 2.31e7 2.41e+7 2.51e+7  2.59e+7
#>  8 AFG     SP.POP.G… 3.49e0 4.25e0 4.72e0 4.82e0 4.47e+0 3.87e+0  3.23e+0
#>  9 AGO     SP.URB.T… 8.23e6 8.71e6 9.22e6 9.77e6 1.03e+7 1.09e+7  1.15e+7
#> 10 AGO     SP.URB.G… 5.44e0 5.59e0 5.70e0 5.76e0 5.75e+0 5.69e+0  4.92e+0
#> # … with 1,046 more rows, and 11 more variables: `2007` <dbl>,
#> #   `2008` <dbl>, `2009` <dbl>, `2010` <dbl>, `2011` <dbl>, `2012` <dbl>,
#> #   `2013` <dbl>, `2014` <dbl>, `2015` <dbl>, `2016` <dbl>, `2017` <dbl>

Our goal is to create a tidy dataset with each variable in its own column. It is not obvious exactly which steps are needed, but we will start with the most obvious problem: the year is spread across multiple columns.

To fix this, use the pivot_longer() function.

pop2 <- world_bank_pop %>% 
  pivot_longer(`2000`:`2017`, names_to = "year")

#> # A tibble: 19,008 x 4
#>    country indicator   year  value
#>    <chr>   <chr>       <chr> <dbl>
#>  1 ABW     SP.URB.TOTL 2000  42444
#>  2 ABW     SP.URB.TOTL 2001  43048
#>  3 ABW     SP.URB.TOTL 2002  43670
#>  4 ABW     SP.URB.TOTL 2003  44246
#>  5 ABW     SP.URB.TOTL 2004  44669
#>  6 ABW     SP.URB.TOTL 2005  44889
#>  7 ABW     SP.URB.TOTL 2006  44881
#>  8 ABW     SP.URB.TOTL 2007  44686
#>  9 ABW     SP.URB.TOTL 2008  44375
#> 10 ABW     SP.URB.TOTL 2009  44052
#> # … with 18,998 more rows

The next step is to look at the indicator variable.
pop2 %>% count(indicator)

#> # A tibble: 4 x 2
#>   indicator       n
#>   <chr>       <int>
#> 1 SP.POP.GROW  4752
#> 2 SP.POP.TOTL  4752
#> 3 SP.URB.GROW  4752
#> 4 SP.URB.TOTL  4752

Here SP.POP.GROW is population growth, SP.POP.TOTL is total population, and SP.URB.* is the same, but only for urban areas. Let's split these values into two variables: area (total or urban) and a variable holding the actual data (population or growth):

pop3 <- pop2 %>% 
  separate(indicator, c(NA, "area", "variable"))

#> # A tibble: 19,008 x 5
#>    country area  variable year  value
#>    <chr>   <chr> <chr>    <chr> <dbl>
#>  1 ABW     URB   TOTL     2000  42444
#>  2 ABW     URB   TOTL     2001  43048
#>  3 ABW     URB   TOTL     2002  43670
#>  4 ABW     URB   TOTL     2003  44246
#>  5 ABW     URB   TOTL     2004  44669
#>  6 ABW     URB   TOTL     2005  44889
#>  7 ABW     URB   TOTL     2006  44881
#>  8 ABW     URB   TOTL     2007  44686
#>  9 ABW     URB   TOTL     2008  44375
#> 10 ABW     URB   TOTL     2009  44052
#> # … with 18,998 more rows

Now all that remains is to split the variable column into two columns:

pop3 %>% 
  pivot_wider(names_from = variable, values_from = value)

#> # A tibble: 9,504 x 5
#>    country area  year   TOTL    GROW
#>    <chr>   <chr> <chr> <dbl>   <dbl>
#>  1 ABW     URB   2000  42444  1.18  
#>  2 ABW     URB   2001  43048  1.41  
#>  3 ABW     URB   2002  43670  1.43  
#>  4 ABW     URB   2003  44246  1.31  
#>  5 ABW     URB   2004  44669  0.951 
#>  6 ABW     URB   2005  44889  0.491 
#>  7 ABW     URB   2006  44881 -0.0178
#>  8 ABW     URB   2007  44686 -0.435 
#>  9 ABW     URB   2008  44375 -0.698 
#> 10 ABW     URB   2009  44052 -0.731 
#> # … with 9,494 more rows
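
The three steps can also be written as one chain. This is a sketch assuming a released tidyr in which world_bank_pop still has the year columns `2000` to `2017`; the final conversion of year to an integer is an addition for convenience and is not part of the steps above.

```r
library(tidyr)
library(dplyr)

pop_tidy <- world_bank_pop %>%
  # 1. collect the year columns into a single year variable
  pivot_longer(`2000`:`2017`, names_to = "year") %>%
  # 2. split "SP.URB.TOTL" into area ("URB") and variable ("TOTL")
  separate(indicator, c(NA, "area", "variable")) %>%
  # 3. give TOTL and GROW their own columns
  pivot_wider(names_from = variable, values_from = value) %>%
  # extra step (not in the article): store year as a number
  mutate(year = as.integer(year))

pop_tidy
```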

Contact list

One last example: imagine you have a contact list that you copied and pasted from a website:

contacts <- tribble(
  ~field, ~value,
  "name", "Jiena McLellan",
  "company", "Toyota", 
  "name", "John Smith", 
  "company", "google", 
  "email", "[email protected]",
  "name", "Huxley Ratcliffe"
)

Tabulating this list is difficult because there is no variable that identifies which data belongs to which contact. We can fix this by noting that the data for each new contact starts with the name ("name"), so we can create a unique identifier and increment it by one every time the field column contains the value "name":

contacts <- contacts %>% 
  mutate(
    person_id = cumsum(field == "name")
  )
contacts

#> # A tibble: 6 x 3
#>   field   value            person_id
#>   <chr>   <chr>                <int>
#> 1 name    Jiena McLellan           1
#> 2 company Toyota                   1
#> 3 name    John Smith               2
#> 4 company google                   2
#> 5 email   [email protected]          2
#> 6 name    Huxley Ratcliffe         3

Now that we have a unique ID for each contact, we can pivot field and value into columns:

contacts %>% 
  pivot_wider(names_from = field, values_from = value)

#> # A tibble: 3 x 4
#>   person_id name             company email          
#>       <int> <chr>            <chr>   <chr>          
#> 1         1 Jiena McLellan   Toyota  <NA>           
#> 2         2 John Smith       google  [email protected]
#> 3         3 Huxley Ratcliffe <NA>    <NA>

Conclusion

In my personal opinion, the new tidyr concept really is more intuitive, and considerably superior in functionality to the legacy functions spread() and gather(). I hope this article has helped you get to grips with pivot_longer() and pivot_wider().

Source: www.hab.com
