The R package tidyr and its new functions pivot_longer and pivot_wider

The tidyr package is part of the core of one of the most popular libraries in the R language, tidyverse.
The main purpose of the package is to bring data into a tidy form.

There is already a publication on Habr devoted to this package, but it dates back to 2015. I want to tell you about the latest changes, which were announced a few days ago by its author, Hadley Wickham.

SJK: Will gather() and spread() be deprecated?

Hadley Wickham: To some extent. We will no longer recommend using these functions or fix bugs in them, but they will remain in the package in their current state.

If you are interested in data analysis, you might also be interested in my Telegram and YouTube channels. Most of the content is devoted to the R language.

The TidyData concept

The goal of tidyr is to help you bring data into so-called tidy form. Tidy data is data where:

  • Each variable is in a column.
  • Each observation is a row.
  • Each value is a cell.

Data presented in tidy form is much simpler and more convenient to work with during analysis.

Main functions included in the tidyr package

tidyr contains a set of functions designed for transforming tables:

  • fill() - fills missing values in a column with the previous value;
  • separate() - splits one field into several using a separator;
  • unite() - combines several fields into one; the inverse of separate();
  • pivot_longer() - converts data from wide format to long format;
  • pivot_wider() - converts data from long format to wide format; the inverse of pivot_longer();
  • gather() (deprecated) - converts data from wide format to long format;
  • spread() (deprecated) - converts data from long format to wide format; the inverse of gather().
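
To make the list above more concrete, here is a minimal sketch of fill(), separate() and unite(). The sales table and its column names are made up purely for illustration; they are not part of the original article.

```r
library(tidyr)

# Hypothetical toy data: the region is only written on the first row of each group
sales <- data.frame(
  region = c("North", NA, "South", NA),
  period = c("2019-Q1", "2019-Q2", "2019-Q1", "2019-Q2"),
  amount = c(10, 12, 7, 9),
  stringsAsFactors = FALSE
)

# fill(): carry the last non-missing region downwards
filled <- fill(sales, region)

# separate(): split one field into several using a separator
split_up <- separate(filled, period, into = c("year", "quarter"), sep = "-")

# unite(): the inverse of separate(), glue the pieces back into one field
glued <- unite(split_up, "period", year, quarter, sep = "-")
```

After the last step the period column should hold the same strings it started with, which is a quick way to see that unite() undoes separate().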

A new concept for converting data from wide to long format and vice versa

Previously, the functions gather() and spread() were used for this kind of transformation. Over the years these functions have existed, it became clear that for most users, including the author of the package, the names of these functions and their arguments were not obvious. It was hard to find them and to remember which of them converts a data frame from wide to long format and which does the reverse.

For this reason, two new, important functions designed for reshaping data frames were added to tidyr.

The new functions pivot_longer() and pivot_wider() were inspired by some of the features of the cdata package, created by John Mount and Nina Zumel.

Installing the current version, tidyr 0.8.3.9000

To install the newest, current version of the tidyr package, 0.8.3.9000, in which the new functions are available, use the following code.

devtools::install_github("tidyverse/tidyr")

At the time of writing, these functions are available only in the dev version of the package on GitHub.

Migrating to the new functions

In fact, porting old scripts to the new functions is not difficult. For better understanding, I will take the examples from the documentation of the old functions and show how the same operations are performed with the new pivot_*() functions.

Converting from wide format to long format.

Example code from the documentation of the gather() function

# example
library(tidyr)  # gather(), spread(), pivot_longer(), pivot_wider()
library(dplyr)  # the %>% pipe
stocks <- data.frame(
  time = as.Date('2009-01-01') + 0:9,
  X = rnorm(10, 0, 1),
  Y = rnorm(10, 0, 2),
  Z = rnorm(10, 0, 4)
)

# old
stocks_gather <- stocks %>% gather(key   = stock, 
                                   value = price, 
                                   -time)

# new
stocks_long   <- stocks %>% pivot_longer(cols      = -time, 
                                       names_to  = "stock", 
                                       values_to = "price")

Converting from long format to wide format.

Example code from the documentation of the spread() function

# old
stocks_spread <- stocks_gather %>% spread(key = stock, 
                                          value = price) 

# new 
stock_wide    <- stocks_long %>% pivot_wider(names_from  = "stock",
                                            values_from = "price")

Because, in the examples above, the source table stocks has no columns with the names given in the names_to and values_to arguments of pivot_longer() and pivot_wider(), those names must be written in quotes.
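
One detail worth checking when porting a script is that the old and new calls return the same data. The sketch below repeats the stocks example with a fixed random seed (my addition, so the run is reproducible) and compares the two results. As far as I can tell the rows come back in a different order, so both results are sorted before the comparison.

```r
library(tidyr)
library(dplyr)

set.seed(1)  # fixed seed, an addition for reproducibility
stocks <- data.frame(
  time = as.Date("2009-01-01") + 0:9,
  X = rnorm(10, 0, 1),
  Y = rnorm(10, 0, 2),
  Z = rnorm(10, 0, 4)
)

old <- stocks %>% gather(key = stock, value = price, -time)
new <- stocks %>% pivot_longer(cols = -time, names_to = "stock", values_to = "price")

# Sort both results the same way before comparing them
old_sorted <- old %>% arrange(time, stock)
new_sorted <- new %>% arrange(time, stock)

all.equal(old_sorted$price, new_sorted$price)
```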

The table will help you quickly work out how to switch to the new tidyr concept.

[Image: table comparing the old and new functions]

Note from the author

All the text below is an adaptation, I would even say a free translation, of the vignettes from the official tidyverse website.

A simple example of converting data from wide to long format

pivot_longer() makes the data longer by reducing the number of columns and increasing the number of rows.

To run the examples presented in the article, you first need to attach the required packages:

library(tidyr)
library(dplyr)
library(readr)

Suppose we have a table with the results of a survey that (among other things) asked people about their religion and annual income:

#> # A tibble: 18 x 11
#>    religion `<$10k` `$10-20k` `$20-30k` `$30-40k` `$40-50k` `$50-75k`
#>    <chr>      <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>
#>  1 Agnostic      27        34        60        81        76       137
#>  2 Atheist       12        27        37        52        35        70
#>  3 Buddhist      27        21        30        34        33        58
#>  4 Catholic     418       617       732       670       638      1116
#>  5 Don’t k…      15        14        15        11        10        35
#>  6 Evangel…     575       869      1064       982       881      1486
#>  7 Hindu          1         9         7         9        11        34
#>  8 Histori…     228       244       236       238       197       223
#>  9 Jehovah…      20        27        24        24        21        30
#> 10 Jewish        19        19        25        25        30        95
#> # … with 8 more rows, and 4 more variables: `$75-100k` <dbl>,
#> #   `$100-150k` <dbl>, `>150k` <dbl>, `Don't know/refused` <dbl>

This table contains the respondents' religion in the rows, while the income levels are spread across the column names. The number of respondents in each category is stored in the cell values at the intersection of religion and income level. To bring the table into a tidy form, it is enough to use pivot_longer():

pew %>% 
  pivot_longer(cols = -religion, names_to = "income", values_to = "count")
#> # A tibble: 180 x 3
#>    religion income             count
#>    <chr>    <chr>              <dbl>
#>  1 Agnostic <$10k                 27
#>  2 Agnostic $10-20k               34
#>  3 Agnostic $20-30k               60
#>  4 Agnostic $30-40k               81
#>  5 Agnostic $40-50k               76
#>  6 Agnostic $50-75k              137
#>  7 Agnostic $75-100k             122
#>  8 Agnostic $100-150k            109
#>  9 Agnostic >150k                 84
#> 10 Agnostic Don't know/refused    96
#> # … with 170 more rows

Arguments of the pivot_longer() function

  • The first argument, cols, describes which columns need to be merged. In this case, all columns except religion.
  • The names_to argument gives the name of the variable that will be created from the names of the columns we merged.
  • values_to gives the name of the variable that will be created from the data stored in the cell values of the merged columns.
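
The pew table itself is not shipped with this article. As far as I know, released versions of tidyr (1.0.0 and later) bundle the same survey as the relig_income dataset, so, assuming such a version is installed, the three arguments can be tried out like this:

```r
library(tidyr)

# relig_income is assumed here to be the same survey table as `pew` above
long <- pivot_longer(
  relig_income,
  cols      = -religion,  # pivot every column except religion
  names_to  = "income",   # the old column names end up in this variable
  values_to = "count"     # the old cell values end up in this variable
)

head(long)
```

The source table has 18 rows and 10 income columns, so the long version should contain 180 rows, as in the output shown above.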

Specifications

This is new functionality in the tidyr package that was previously unavailable when working with the legacy functions.

A specification is a data frame in which each row corresponds to one column of the wide-format data frame, with two special columns whose names begin with a dot:

  • .name contains the original column name.
  • .value contains the name of the column that will hold the cell values.

The remaining columns of the specification describe how the new columns will represent the information packed into the column names listed in .name.

A specification describes the metadata stored in the column names, with one row for each column and one column for each variable encoded in the column name. This definition may seem confusing right now, but after a few examples it will become much clearer.

The point of a specification is that you can retrieve, modify, and define new metadata for the data frame being transformed.

To work with specifications when converting a table from wide to long format, use the pivot_longer_spec() function.

This function takes any data frame and generates its metadata in the form described above.

As an example, let's take the who dataset that ships with the tidyr package. This dataset contains information provided by the World Health Organization on the incidence of tuberculosis.

who
#> # A tibble: 7,240 x 60
#>    country iso2  iso3   year new_sp_m014 new_sp_m1524 new_sp_m2534
#>    <chr>   <chr> <chr> <int>       <int>        <int>        <int>
#>  1 Afghan… AF    AFG    1980          NA           NA           NA
#>  2 Afghan… AF    AFG    1981          NA           NA           NA
#>  3 Afghan… AF    AFG    1982          NA           NA           NA
#>  4 Afghan… AF    AFG    1983          NA           NA           NA
#>  5 Afghan… AF    AFG    1984          NA           NA           NA
#>  6 Afghan… AF    AFG    1985          NA           NA           NA
#>  7 Afghan… AF    AFG    1986          NA           NA           NA
#>  8 Afghan… AF    AFG    1987          NA           NA           NA
#>  9 Afghan… AF    AFG    1988          NA           NA           NA
#> 10 Afghan… AF    AFG    1989          NA           NA           NA
#> # … with 7,230 more rows, and 53 more variables

Let's build its specification.

spec <- who %>%
  pivot_longer_spec(new_sp_m014:newrel_f65, values_to = "count")

#> # A tibble: 56 x 3
#>    .name        .value name        
#>    <chr>        <chr>  <chr>       
#>  1 new_sp_m014  count  new_sp_m014 
#>  2 new_sp_m1524 count  new_sp_m1524
#>  3 new_sp_m2534 count  new_sp_m2534
#>  4 new_sp_m3544 count  new_sp_m3544
#>  5 new_sp_m4554 count  new_sp_m4554
#>  6 new_sp_m5564 count  new_sp_m5564
#>  7 new_sp_m65   count  new_sp_m65  
#>  8 new_sp_f014  count  new_sp_f014 
#>  9 new_sp_f1524 count  new_sp_f1524
#> 10 new_sp_f2534 count  new_sp_f2534
#> # … with 46 more rows

The fields country, iso2 and iso3 are already variables. Our task is to pivot the columns from new_sp_m014 to newrel_f65.

The names of these columns store the following information:

  • The prefix new_ indicates that the column contains data on new cases of tuberculosis. The current data frame contains information on new cases only, so in this context the prefix carries no meaning.
  • sp/rel/sn/ep describes the method of diagnosis.
  • m/f is the patient's gender.
  • 014/1524/2534/3544/4554/5564/65 is the patient's age range.

We can split these columns using the extract() function with a regular expression.

spec <- spec %>%
        extract(name, c("diagnosis", "gender", "age"), "new_?(.*)_(.)(.*)")

#> # A tibble: 56 x 5
#>    .name        .value diagnosis gender age  
#>    <chr>        <chr>  <chr>     <chr>  <chr>
#>  1 new_sp_m014  count  sp        m      014  
#>  2 new_sp_m1524 count  sp        m      1524 
#>  3 new_sp_m2534 count  sp        m      2534 
#>  4 new_sp_m3544 count  sp        m      3544 
#>  5 new_sp_m4554 count  sp        m      4554 
#>  6 new_sp_m5564 count  sp        m      5564 
#>  7 new_sp_m65   count  sp        m      65   
#>  8 new_sp_f014  count  sp        f      014  
#>  9 new_sp_f1524 count  sp        f      1524 
#> 10 new_sp_f2534 count  sp        f      2534 
#> # … with 46 more rows

Note that the .name column must remain unchanged, since it is our index into the column names of the original dataset.

Gender and age (the gender and age columns) have fixed, known values, so it is recommended to convert these columns to factors:

spec <-  spec %>%
            mutate(
              gender = factor(gender, levels = c("f", "m")),
              age = factor(age, levels = unique(age), ordered = TRUE)
            ) 

Finally, to apply the specification we created to the original data frame who, we need to pass it via the spec argument of the pivot_longer() function.

who %>% pivot_longer(spec = spec)

#> # A tibble: 405,440 x 8
#>    country     iso2  iso3   year diagnosis gender age   count
#>    <chr>       <chr> <chr> <int> <chr>     <fct>  <ord> <int>
#>  1 Afghanistan AF    AFG    1980 sp        m      014      NA
#>  2 Afghanistan AF    AFG    1980 sp        m      1524     NA
#>  3 Afghanistan AF    AFG    1980 sp        m      2534     NA
#>  4 Afghanistan AF    AFG    1980 sp        m      3544     NA
#>  5 Afghanistan AF    AFG    1980 sp        m      4554     NA
#>  6 Afghanistan AF    AFG    1980 sp        m      5564     NA
#>  7 Afghanistan AF    AFG    1980 sp        m      65       NA
#>  8 Afghanistan AF    AFG    1980 sp        f      014      NA
#>  9 Afghanistan AF    AFG    1980 sp        f      1524     NA
#> 10 Afghanistan AF    AFG    1980 sp        f      2534     NA
#> # … with 405,430 more rows
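
A note for readers on a released version of tidyr: as far as I know, starting with 1.0.0 the interface shown above was renamed. The specification is built with build_longer_spec(), and it is applied with pivot_longer_spec() rather than through a spec argument. Under that assumption, the same steps would look roughly like this:

```r
library(tidyr)
library(dplyr)

# Build the specification, then refine it exactly as in the steps above
spec <- who %>%
  build_longer_spec(new_sp_m014:newrel_f65, values_to = "count") %>%
  extract(name, c("diagnosis", "gender", "age"), "new_?(.*)_(.)(.*)") %>%
  mutate(
    gender = factor(gender, levels = c("f", "m")),
    age    = factor(age, levels = unique(age), ordered = TRUE)
  )

# Apply the specification to the original data frame
who_long <- who %>% pivot_longer_spec(spec)
```

The specification should still have 56 rows, one per pivoted column, and the result one row per combination of an original row and a pivoted column.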

Everything we have just done can be depicted schematically as follows:

[Image: diagram of the transformation steps]

A specification using multiple values (.value)

In the example above, the .value column of the specification contained only one value; this is usually the case.

But occasionally a situation arises where you need to collect data from columns whose values have different data types. With the legacy function gather() this would be quite difficult to do.

The example below is taken from a vignette of the data.table package.

Let's create a practice data frame.

family <- tibble::tribble(
  ~family,  ~dob_child1,  ~dob_child2, ~gender_child1, ~gender_child2,
       1L, "1998-11-26", "2000-01-29",             1L,             2L,
       2L, "1996-06-22",           NA,             2L,             NA,
       3L, "2002-07-11", "2004-04-05",             2L,             2L,
       4L, "2004-10-10", "2009-08-27",             1L,             1L,
       5L, "2000-12-05", "2005-02-28",             2L,             1L,
)
family <- family %>% mutate_at(vars(starts_with("dob")), parse_date)

#> # A tibble: 5 x 5
#>   family dob_child1 dob_child2 gender_child1 gender_child2
#>    <int> <date>     <date>             <int>         <int>
#> 1      1 1998-11-26 2000-01-29             1             2
#> 2      2 1996-06-22 NA                     2            NA
#> 3      3 2002-07-11 2004-04-05             2             2
#> 4      4 2004-10-10 2009-08-27             1             1
#> 5      5 2000-12-05 2005-02-28             2             1

Each row of the resulting data frame contains data on the children of one family. Families can have one or two children. For each child we have the date of birth and the gender, and the data for each child sits in separate columns. Our task is to bring this data into a form suitable for analysis.

Note that we have two variables with information about each child: gender and date of birth (columns with the prefix dob contain the date of birth, columns with the prefix gender contain the child's gender). In the expected result they should end up in separate columns. We can do this by generating a specification in which the .value column has two different values.

spec <- family %>%
  pivot_longer_spec(-family) %>%
  separate(col = name, into = c(".value", "child"))%>%
  mutate(child = parse_number(child))

#> # A tibble: 4 x 3
#>   .name         .value child
#>   <chr>         <chr>  <dbl>
#> 1 dob_child1    dob        1
#> 2 dob_child2    dob        2
#> 3 gender_child1 gender     1
#> 4 gender_child2 gender     2

So, let's go step by step through the actions performed by the code above.

  • pivot_longer_spec(-family) - creates a specification that pivots all existing columns except the family column.
  • separate(col = name, into = c(".value", "child")) - splits the name column, which holds the names of the source fields, at the underscore and puts the resulting values into the .value and child columns.
  • mutate(child = parse_number(child)) - converts the values of the child field from text to a numeric data type.

Now we can apply the resulting specification to the original data frame and bring the table into the desired form.

family %>% 
    pivot_longer(spec = spec, na.rm = TRUE)

#> # A tibble: 9 x 4
#>   family child dob        gender
#>    <int> <dbl> <date>      <int>
#> 1      1     1 1998-11-26      1
#> 2      1     2 2000-01-29      2
#> 3      2     1 1996-06-22      2
#> 4      3     1 2002-07-11      2
#> 5      3     2 2004-04-05      2
#> 6      4     1 2004-10-10      1
#> 7      4     2 2009-08-27      1
#> 8      5     1 2000-12-05      2
#> 9      5     2 2005-02-28      1

We use the argument na.rm = TRUE because the current shape of the data forces the creation of extra rows for non-existent observations. Since family 2 has only one child, na.rm = TRUE ensures that family 2 has a single row in the output.
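
In released versions of tidyr (1.0.0 and later, as far as I know) a separate specification is not needed for this case: the special name ".value" can be passed directly in names_to, and values_drop_na plays the role of na.rm. Below is a sketch under that assumption. The dates are parsed with base R's as.Date() instead of readr, and the child column keeps the text values child1 and child2 rather than numbers.

```r
library(tidyr)

family <- tibble::tribble(
  ~family,  ~dob_child1,  ~dob_child2, ~gender_child1, ~gender_child2,
       1L, "1998-11-26", "2000-01-29",             1L,             2L,
       2L, "1996-06-22",           NA,             2L,             NA,
       3L, "2002-07-11", "2004-04-05",             2L,             2L,
       4L, "2004-10-10", "2009-08-27",             1L,             1L,
       5L, "2000-12-05", "2005-02-28",             2L,             1L
)
family$dob_child1 <- as.Date(family$dob_child1)
family$dob_child2 <- as.Date(family$dob_child2)

children <- pivot_longer(
  family,
  cols           = -family,
  names_to       = c(".value", "child"),  # the part before "_" names the output column
  names_sep      = "_",
  values_drop_na = TRUE                   # drop the row of the non-existent second child
)
```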

Converting data frames from long to wide format

pivot_wider() is the inverse transformation: it increases the number of columns of the data frame by reducing the number of rows.

This kind of transformation is rarely used to bring data into tidy form. However, the technique can be useful for creating pivot tables for presentations, or for integration with other tools.

In fact, the functions pivot_longer() and pivot_wider() are symmetric and perform inverse operations, i.e. df %>% pivot_longer(spec = spec) %>% pivot_wider(spec = spec) and df %>% pivot_wider(spec = spec) %>% pivot_longer(spec = spec) will return the original df.
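
The symmetry is easy to check on a small table. The sketch below uses the plain names_to/values_to and names_from/values_from arguments instead of a specification, and the scores data frame is made up for the example.

```r
library(tidyr)

# Hypothetical toy table in wide format
scores <- tibble::tibble(id = 1:2, math = c(90, 75), art = c(60, 88))

# Wide to long, then back to wide again
long <- pivot_longer(scores, cols = -id, names_to = "subject", values_to = "score")
wide <- pivot_wider(long, names_from = subject, values_from = score)

# Compare the round-tripped table with the original
all.equal(as.data.frame(scores), as.data.frame(wide))
```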

A simple example of converting a table to wide format

To demonstrate how pivot_wider() works, we will use the fish_encounters dataset, which stores information about how various stations record the movement of fish along a river.

#> # A tibble: 114 x 3
#>    fish  station  seen
#>    <fct> <fct>   <int>
#>  1 4842  Release     1
#>  2 4842  I80_1       1
#>  3 4842  Lisbon      1
#>  4 4842  Rstr        1
#>  5 4842  Base_TD     1
#>  6 4842  BCE         1
#>  7 4842  BCW         1
#>  8 4842  BCE2        1
#>  9 4842  BCW2        1
#> 10 4842  MAE         1
#> # … with 104 more rows

In most cases this table will be more informative and easier to use if the information for each station is presented in a separate column.

fish_encounters %>% pivot_wider(names_from = station, values_from = seen)
#> # A tibble: 19 x 12
#>    fish  Release I80_1 Lisbon  Rstr Base_TD   BCE   BCW  BCE2  BCW2   MAE
#>    <fct>   <int> <int>  <int> <int>   <int> <int> <int> <int> <int> <int>
#>  1 4842        1     1      1     1       1     1     1     1     1     1
#>  2 4843        1     1      1     1       1     1     1     1     1     1
#>  3 4844        1     1      1     1       1     1     1     1     1     1
#>  4 4845        1     1      1     1       1    NA    NA    NA    NA    NA
#>  5 4847        1     1      1    NA      NA    NA    NA    NA    NA    NA
#>  6 4848        1     1      1     1      NA    NA    NA    NA    NA    NA
#>  7 4849        1     1     NA    NA      NA    NA    NA    NA    NA    NA
#>  8 4850        1     1     NA     1       1     1     1    NA    NA    NA
#>  9 4851        1     1     NA    NA      NA    NA    NA    NA    NA    NA
#> 10 4854        1     1     NA    NA      NA    NA    NA    NA    NA    NA
#> # … with 9 more rows, and 1 more variable: MAW <int>

This dataset records information only when a fish was detected by a station, i.e. if a fish was not recorded by some station, that observation is simply absent from the table. This means the output will be filled with NA.

However, in this case we know that the absence of a record means the fish was not seen, so we can use the values_fill argument of pivot_wider() and fill these missing values with zeros:

fish_encounters %>% pivot_wider(
  names_from = station, 
  values_from = seen,
  values_fill = list(seen = 0)
)

#> # A tibble: 19 x 12
#>    fish  Release I80_1 Lisbon  Rstr Base_TD   BCE   BCW  BCE2  BCW2   MAE
#>    <fct>   <int> <int>  <int> <int>   <int> <int> <int> <int> <int> <int>
#>  1 4842        1     1      1     1       1     1     1     1     1     1
#>  2 4843        1     1      1     1       1     1     1     1     1     1
#>  3 4844        1     1      1     1       1     1     1     1     1     1
#>  4 4845        1     1      1     1       1     0     0     0     0     0
#>  5 4847        1     1      1     0       0     0     0     0     0     0
#>  6 4848        1     1      1     1       0     0     0     0     0     0
#>  7 4849        1     1      0     0       0     0     0     0     0     0
#>  8 4850        1     1      0     1       1     1     1     0     0     0
#>  9 4851        1     1      0     0       0     0     0     0     0     0
#> 10 4854        1     1      0     0       0     0     0     0     0     0
#> # … with 9 more rows, and 1 more variable: MAW <int>

Generating column names from multiple source variables

Imagine we have a table containing a combination of product, country and year. To generate a test data frame, you can run the following code:

df <- expand_grid(
  product = c("A", "B"), 
  country = c("AI", "EI"), 
  year = 2000:2014
) %>%
  filter((product == "A" & country == "AI") | product == "B") %>% 
  mutate(value = rnorm(nrow(.)))

#> # A tibble: 45 x 4
#>    product country  year    value
#>    <chr>   <chr>   <int>    <dbl>
#>  1 A       AI       2000 -2.05   
#>  2 A       AI       2001 -0.676  
#>  3 A       AI       2002  1.60   
#>  4 A       AI       2003 -0.353  
#>  5 A       AI       2004 -0.00530
#>  6 A       AI       2005  0.442  
#>  7 A       AI       2006 -0.610  
#>  8 A       AI       2007 -2.77   
#>  9 A       AI       2008  0.899  
#> 10 A       AI       2009 -0.106  
#> # … with 35 more rows

Our task is to widen the data frame so that there is one column with data for each combination of product and country. To do this, simply pass a vector with the names of the fields to be combined to the names_from argument.

df %>% pivot_wider(names_from = c(product, country),
                 values_from = "value")

#> # A tibble: 15 x 4
#>     year     A_AI    B_AI    B_EI
#>    <int>    <dbl>   <dbl>   <dbl>
#>  1  2000 -2.05     0.607   1.20  
#>  2  2001 -0.676    1.65   -0.114 
#>  3  2002  1.60    -0.0245  0.501 
#>  4  2003 -0.353    1.30   -0.459 
#>  5  2004 -0.00530  0.921  -0.0589
#>  6  2005  0.442   -1.55    0.594 
#>  7  2006 -0.610    0.380  -1.28  
#>  8  2007 -2.77     0.830   0.637 
#>  9  2008  0.899    0.0175 -1.30  
#> 10  2009 -0.106   -0.195   1.03  
#> # … with 5 more rows

You can also apply specifications to pivot_wider(). But when passed to pivot_wider(), a specification performs the transformation opposite to pivot_longer(): the columns named in .name are created, using the values from .value and the other columns.

For this dataset, you can generate a custom specification if you want every possible combination of country and product to have its own column, not only those present in the data:

spec <- df %>% 
  expand(product, country, .value = "value") %>% 
  unite(".name", product, country, remove = FALSE)

#> # A tibble: 4 x 4
#>   .name product country .value
#>   <chr> <chr>   <chr>   <chr> 
#> 1 A_AI  A       AI      value 
#> 2 A_EI  A       EI      value 
#> 3 B_AI  B       AI      value 
#> 4 B_EI  B       EI      value

df %>% pivot_wider(spec = spec) %>% head()

#> # A tibble: 6 x 5
#>    year     A_AI  A_EI    B_AI    B_EI
#>   <int>    <dbl> <dbl>   <dbl>   <dbl>
#> 1  2000 -2.05       NA  0.607   1.20  
#> 2  2001 -0.676      NA  1.65   -0.114 
#> 3  2002  1.60       NA -0.0245  0.501 
#> 4  2003 -0.353      NA  1.30   -0.459 
#> 5  2004 -0.00530    NA  0.921  -0.0589
#> 6  2005  0.442      NA -1.55    0.594
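
For readers on a released tidyr: as far as I know, since version 1.0.0 a wide specification is applied with pivot_wider_spec() rather than through a spec argument. The sketch below repeats the example under that assumption, and replaces the random values with a simple counter (my change) so that the run is reproducible.

```r
library(tidyr)
library(dplyr)

df <- expand_grid(
  product = c("A", "B"),
  country = c("AI", "EI"),
  year = 2000:2014
) %>%
  filter((product == "A" & country == "AI") | product == "B") %>%
  mutate(value = seq_len(n()))  # deterministic stand-in for rnorm()

# One specification row per possible product/country pair, including A_EI
spec <- df %>%
  expand(product, country, .value = "value") %>%
  unite(".name", product, country, remove = FALSE)

wide <- pivot_wider_spec(df, spec)
```

The A_EI column appears in the result even though that combination never occurs in the data, so it contains only NA.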

Some more advanced examples of working with the new tidyr concept

Tidying data, using the US Census income and rent dataset as an example.

The us_rent_income dataset contains median income and rent information for each US state for 2017 (the dataset is available in the tidycensus package).

us_rent_income
#> # A tibble: 104 x 5
#>    GEOID NAME       variable estimate   moe
#>    <chr> <chr>      <chr>       <dbl> <dbl>
#>  1 01    Alabama    income      24476   136
#>  2 01    Alabama    rent          747     3
#>  3 02    Alaska     income      32940   508
#>  4 02    Alaska     rent         1200    13
#>  5 04    Arizona    income      27517   148
#>  6 04    Arizona    rent          972     4
#>  7 05    Arkansas   income      23789   165
#>  8 05    Arkansas   rent          709     5
#>  9 06    California income      29454   109
#> 10 06    California rent         1358     3
#> # … with 94 more rows

The form in which the data is stored in us_rent_income is very inconvenient to work with, so we would like to create a dataset with the columns rent, rent_moe, income and income_moe. There are many ways to create this specification, but the main point is that we need to generate every combination of the values of variable and estimate/moe, and then generate the column name.

  spec <- us_rent_income %>% 
    expand(variable, .value = c("estimate", "moe")) %>% 
    mutate(
      .name = paste0(variable, ifelse(.value == "moe", "_moe", ""))
    )

#> # A tibble: 4 x 3
#>   variable .value   .name     
#>   <chr>    <chr>    <chr>     
#> 1 income   estimate income    
#> 2 income   moe      income_moe
#> 3 rent     estimate rent      
#> 4 rent     moe      rent_moe

Passing this specification to pivot_wider() gives us the result we want:

us_rent_income %>% pivot_wider(spec = spec)

#> # A tibble: 52 x 6
#>    GEOID NAME                 income income_moe  rent rent_moe
#>    <chr> <chr>                 <dbl>      <dbl> <dbl>    <dbl>
#>  1 01    Alabama               24476        136   747        3
#>  2 02    Alaska                32940        508  1200       13
#>  3 04    Arizona               27517        148   972        4
#>  4 05    Arkansas              23789        165   709        5
#>  5 06    California            29454        109  1358        3
#>  6 08    Colorado              32401        109  1125        5
#>  7 09    Connecticut           35326        195  1123        5
#>  8 10    Delaware              31560        247  1076       10
#>  9 11    District of Columbia  43198        681  1424       17
#> 10 12    Florida               25952         70  1077        3
#> # … with 42 more rows
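
To see what the custom specification buys us, it helps to compare it with the default naming. The sketch below assumes a released tidyr (1.0.0 or later), which, as far as I know, bundles a copy of us_rent_income and accepts several columns in values_from. In that case the column names are built automatically from the value name and the variable.

```r
library(tidyr)

# No specification: both value columns are spread at once, with default names
default_wide <- pivot_wider(
  us_rent_income,
  names_from  = variable,
  values_from = c(estimate, moe)
)

names(default_wide)
```

I would expect names such as estimate_income and moe_rent here, whereas the specification above produces the shorter income, income_moe, rent and rent_moe.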

The World Bank

Sometimes bringing a dataset into the desired form requires several steps.
The world_bank_pop dataset contains World Bank data on the population of each country between 2000 and 2018.

#> # A tibble: 1,056 x 20
#>    country indicator `2000` `2001` `2002` `2003`  `2004`  `2005`   `2006`
#>    <chr>   <chr>      <dbl>  <dbl>  <dbl>  <dbl>   <dbl>   <dbl>    <dbl>
#>  1 ABW     SP.URB.T… 4.24e4 4.30e4 4.37e4 4.42e4 4.47e+4 4.49e+4  4.49e+4
#>  2 ABW     SP.URB.G… 1.18e0 1.41e0 1.43e0 1.31e0 9.51e-1 4.91e-1 -1.78e-2
#>  3 ABW     SP.POP.T… 9.09e4 9.29e4 9.50e4 9.70e4 9.87e+4 1.00e+5  1.01e+5
#>  4 ABW     SP.POP.G… 2.06e0 2.23e0 2.23e0 2.11e0 1.76e+0 1.30e+0  7.98e-1
#>  5 AFG     SP.URB.T… 4.44e6 4.65e6 4.89e6 5.16e6 5.43e+6 5.69e+6  5.93e+6
#>  6 AFG     SP.URB.G… 3.91e0 4.66e0 5.13e0 5.23e0 5.12e+0 4.77e+0  4.12e+0
#>  7 AFG     SP.POP.T… 2.01e7 2.10e7 2.20e7 2.31e7 2.41e+7 2.51e+7  2.59e+7
#>  8 AFG     SP.POP.G… 3.49e0 4.25e0 4.72e0 4.82e0 4.47e+0 3.87e+0  3.23e+0
#>  9 AGO     SP.URB.T… 8.23e6 8.71e6 9.22e6 9.77e6 1.03e+7 1.09e+7  1.15e+7
#> 10 AGO     SP.URB.G… 5.44e0 5.59e0 5.70e0 5.76e0 5.75e+0 5.69e+0  4.92e+0
#> # … with 1,046 more rows, and 11 more variables: `2007` <dbl>,
#> #   `2008` <dbl>, `2009` <dbl>, `2010` <dbl>, `2011` <dbl>, `2012` <dbl>,
#> #   `2013` <dbl>, `2014` <dbl>, `2015` <dbl>, `2016` <dbl>, `2017` <dbl>

Our goal is to create a tidy dataset with each variable in its own column. It is not clear exactly which steps are needed, but we will start with the most obvious problem: the year is spread across multiple columns.

To fix this, you need to use the pivot_longer() function.

pop2 <- world_bank_pop %>% 
  pivot_longer(`2000`:`2017`, names_to = "year")

#> # A tibble: 19,008 x 4
#>    country indicator   year  value
#>    <chr>   <chr>       <chr> <dbl>
#>  1 ABW     SP.URB.TOTL 2000  42444
#>  2 ABW     SP.URB.TOTL 2001  43048
#>  3 ABW     SP.URB.TOTL 2002  43670
#>  4 ABW     SP.URB.TOTL 2003  44246
#>  5 ABW     SP.URB.TOTL 2004  44669
#>  6 ABW     SP.URB.TOTL 2005  44889
#>  7 ABW     SP.URB.TOTL 2006  44881
#>  8 ABW     SP.URB.TOTL 2007  44686
#>  9 ABW     SP.URB.TOTL 2008  44375
#> 10 ABW     SP.URB.TOTL 2009  44052
#> # … with 18,998 more rows

The next step is to look at the indicator variable.
pop2 %>% count(indicator)

#> # A tibble: 4 x 2
#>   indicator       n
#>   <chr>       <int>
#> 1 SP.POP.GROW  4752
#> 2 SP.POP.TOTL  4752
#> 3 SP.URB.GROW  4752
#> 4 SP.URB.TOTL  4752

Here SP.POP.GROW is population growth, SP.POP.TOTL is total population, and SP.URB.* is the same, but for urban areas only. Let's split these values into two variables: area, the area (total or urban), and variable, which identifies the actual data (population or growth):

pop3 <- pop2 %>% 
  separate(indicator, c(NA, "area", "variable"))

#> # A tibble: 19,008 x 5
#>    country area  variable year  value
#>    <chr>   <chr> <chr>    <chr> <dbl>
#>  1 ABW     URB   TOTL     2000  42444
#>  2 ABW     URB   TOTL     2001  43048
#>  3 ABW     URB   TOTL     2002  43670
#>  4 ABW     URB   TOTL     2003  44246
#>  5 ABW     URB   TOTL     2004  44669
#>  6 ABW     URB   TOTL     2005  44889
#>  7 ABW     URB   TOTL     2006  44881
#>  8 ABW     URB   TOTL     2007  44686
#>  9 ABW     URB   TOTL     2008  44375
#> 10 ABW     URB   TOTL     2009  44052
#> # … with 18,998 more rows

Now all that remains is to split variable into two columns:

pop3 %>% 
  pivot_wider(names_from = variable, values_from = value)

#> # A tibble: 9,504 x 5
#>    country area  year   TOTL    GROW
#>    <chr>   <chr> <chr> <dbl>   <dbl>
#>  1 ABW     URB   2000  42444  1.18  
#>  2 ABW     URB   2001  43048  1.41  
#>  3 ABW     URB   2002  43670  1.43  
#>  4 ABW     URB   2003  44246  1.31  
#>  5 ABW     URB   2004  44669  0.951 
#>  6 ABW     URB   2005  44889  0.491 
#>  7 ABW     URB   2006  44881 -0.0178
#>  8 ABW     URB   2007  44686 -0.435 
#>  9 ABW     URB   2008  44375 -0.698 
#> 10 ABW     URB   2009  44052 -0.731 
#> # … with 9,494 more rows
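
The three steps can also be chained into a single pipeline. In the sketch below I additionally convert year to an integer at the end. This is my own addition rather than part of the original walkthrough: the year column comes from column names and is therefore text, as the <chr> type in the output above shows.

```r
library(tidyr)
library(dplyr)

pop_tidy <- world_bank_pop %>%
  pivot_longer(`2000`:`2017`, names_to = "year") %>%           # years into rows
  separate(indicator, c(NA, "area", "variable")) %>%           # drop "SP", keep the rest
  pivot_wider(names_from = variable, values_from = value) %>%  # TOTL and GROW as columns
  mutate(year = as.integer(year))                              # extra step: numeric year
```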

Contact list

One last example: suppose you have a contact list that you copied and pasted from a website:

contacts <- tribble(
  ~field, ~value,
  "name", "Jiena McLellan",
  "company", "Toyota", 
  "name", "John Smith", 
  "company", "google", 
  "email", "[email protected]",
  "name", "Huxley Ratcliffe"
)

Tabulating this list is rather difficult because there is no variable that identifies which contact each piece of data belongs to. We can fix this by noting that the data for each new contact begins with "name", so we can create a unique identifier and increment it by one each time the field column contains the value "name":

contacts <- contacts %>% 
  mutate(
    person_id = cumsum(field == "name")
  )
contacts

#> # A tibble: 6 x 3
#>   field   value            person_id
#>   <chr>   <chr>                <int>
#> 1 name    Jiena McLellan           1
#> 2 company Toyota                   1
#> 3 name    John Smith               2
#> 4 company google                   2
#> 5 email   [email protected]          2
#> 6 name    Huxley Ratcliffe         3

Now that we have a unique ID for each contact, we can turn field and value into columns:

contacts %>% 
  pivot_wider(names_from = field, values_from = value)

#> # A tibble: 3 x 4
#>   person_id name             company email          
#>       <int> <chr>            <chr>   <chr>          
#> 1         1 Jiena McLellan   Toyota  <NA>           
#> 2         2 John Smith       google  [email protected]
#> 3         3 Huxley Ratcliffe <NA>    <NA>

Conclusion

In my opinion, the new tidyr concept is more intuitive and considerably more capable than the legacy functions spread() and gather(). I hope this article has helped you get to grips with pivot_longer() and pivot_wider().

Source: www.habr.com
