Expanding nested list-columns with R (the tidyr package and the unnest family of functions)
In most cases, when you work with a response received from an API, or with any other data that has a complex tree structure, you are dealing with the JSON and XML formats.
These formats have many advantages: they store data compactly and let you avoid unnecessary duplication of information.
Their drawback is that they are hard to process and analyse. Unstructured data cannot be used in calculations, and visualisations cannot be built on it.
This article is a logical continuation of the publication "The R package tidyr and its new functions pivot_longer and pivot_wider". It will help you bring unstructured data into the familiar tabular form that suits analysis, using the tidyr package, which is part of the core of the tidyverse library, and its unnest_*() family of functions.
If you are interested in data analysis, you may like my telegram and youtube channels. Most of the content is devoted to the R language.
Rectangling is the process of bringing unstructured data with nested arrays into a two-dimensional table made up of the familiar rows and columns. tidyr has several functions that help you expand nested list-columns and bring the data into a rectangular, tabular form:
unnest_longer() takes each element of a list-column and creates a new row.
unnest_wider() takes each element of a list-column and creates a new column.
unnest_auto() determines automatically which function is the better fit: unnest_longer() or unnest_wider().
hoist() is similar to unnest_wider(), but it selects only the specified components and lets you work across several levels of nesting.
Most problems that involve bringing unstructured data with several levels of nesting into a two-dimensional table can be solved by combining the functions listed above with dplyr.
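To make the difference between these functions concrete, here is a minimal sketch on a tiny hand-written list. The data, the column names and the object names (raw, wide, long, picked) are assumptions made up for this illustration; they do not come from repurrrsive.

```r
library(tidyr)  # also re-exports tibble() and %>%

# Hypothetical toy data: two "users", each a named list
raw <- list(
  list(login = "anna", followers = 10L, langs = c("R", "SQL")),
  list(login = "ben",  followers = 3L,  langs = "Python")
)

df <- tibble(user = raw)               # one list-column, one row per user

# unnest_wider(): each named component becomes its own column
wide <- df %>% unnest_wider(user)

# unnest_longer(): each element of the list-column becomes its own row
long <- wide %>% unnest_longer(langs)

# hoist(): pull out only the requested components, at any depth
picked <- df %>% hoist(user, login = "login", first_lang = list("langs", 1))

names(wide)
nrow(long)
picked$first_lang
```

The same three calls, in different combinations, are all that the larger examples in this article rely on.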
To demonstrate these techniques we will use the repurrrsive package, which provides a number of complex, multi-level lists derived from web APIs.
Let's start with gh_users, a list that contains information about six GitHub users. First we convert the gh_users list into a tibble:
users <- tibble(user = gh_users)
This seems a little counterintuitive: why put the gh_users list into a more complex data structure? But a data frame has a big advantage: it bundles several vectors together so that everything is tracked in a single object.
Each element of users is a named list in which every element represents a column.
In this case we get a table of 30 columns, and we will not need most of them, so instead of unnest_wider() we can use hoist(). hoist() lets us pull out selected components using the same syntax as purrr::pluck():
users %>% hoist(user,
  followers = "followers",
  login = "login",
  url = "html_url"
)
#> # A tibble: 6 x 4
#> followers login url user
#> <int> <chr> <chr> <list>
#> 1 303 gaborcsardi https://github.com/gaborcsardi <named list [27]>
#> 2 780 jennybc https://github.com/jennybc <named list [27]>
#> 3 3958 jtleek https://github.com/jtleek <named list [27]>
#> 4 115 juliasilge https://github.com/juliasilge <named list [27]>
#> 5 213 leeper https://github.com/leeper <named list [27]>
#> 6 34 masalmon https://github.com/masalmon <named list [27]>
hoist() removes the named components from the user list-column, so you can think of hoist() as moving components out of the inner list of a data frame up to its top level.
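The "moving" behaviour described above can be checked directly by counting the components before and after the call. This is a small sketch on made-up data; the field names and values are assumptions for illustration only.

```r
library(tidyr)

# Hypothetical users with three components each
df <- tibble(user = list(
  list(login = "anna", followers = 10L, site = "a.example"),
  list(login = "ben",  followers = 3L,  site = "b.example")
))

hoisted <- df %>% hoist(user, login = "login")

lengths(df$user)       # components per user before hoisting
lengths(hoisted$user)  # one fewer: login now lives in its own column
hoisted$login
```

This is the same effect as in the output above, where the named lists in the user column hold 27 components after three of the 30 have been hoisted.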
GitHub repositories
We start working with the gh_repos list in the same way, by converting it to a tibble:
This time the elements of repo are lists of the repositories that belong to that user. Each repository is a separate observation, so according to the tidy data concept they should become new rows, which is why we use unnest_longer() rather than unnest_wider():
repos <- repos %>% unnest_longer(repo)
repos
#> # A tibble: 176 x 1
#> repo
#> <list>
#> 1 <named list [68]>
#> 2 <named list [68]>
#> 3 <named list [68]>
#> 4 <named list [68]>
#> 5 <named list [68]>
#> 6 <named list [68]>
#> 7 <named list [68]>
#> 8 <named list [68]>
#> 9 <named list [68]>
#> 10 <named list [68]>
#> # … with 166 more rows
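The shape change in this step is easier to see on a small list. The sketch below uses a hypothetical stand-in for gh_repos (toy_repos, with invented repository names and counts) that has the same two-level layout: one element per user, each holding that user's repositories.

```r
library(tidyr)

# Hypothetical stand-in for gh_repos
toy_repos <- list(
  list(
    list(name = "alpha", watchers_count = 5L),
    list(name = "beta",  watchers_count = 0L)
  ),
  list(
    list(name = "gamma", watchers_count = 2L)
  )
)

repos_toy  <- tibble(repo = toy_repos)            # one row per user
repos_long <- repos_toy %>% unnest_longer(repo)   # one row per repository
repos_wide <- repos_long %>% unnest_wider(repo)   # one column per field

nrow(repos_toy)
nrow(repos_long)
names(repos_wide)
```

After unnest_longer() every row holds a single named list, which is exactly the shape that unnest_wider() and hoist() expect.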
Now we can use unnest_wider() or hoist():
repos %>% hoist(repo,
  login = c("owner", "login"),
  name = "name",
  homepage = "homepage",
  watchers = "watchers_count"
)
#> # A tibble: 176 x 5
#> login name homepage watchers repo
#> <chr> <chr> <chr> <int> <list>
#> 1 gaborcsardi after <NA> 5 <named list [65]>
#> 2 gaborcsardi argufy <NA> 19 <named list [65]>
#> 3 gaborcsardi ask <NA> 5 <named list [65]>
#> 4 gaborcsardi baseimports <NA> 0 <named list [65]>
#> 5 gaborcsardi citest <NA> 0 <named list [65]>
#> 6 gaborcsardi clisymbols "" 18 <named list [65]>
#> 7 gaborcsardi cmaker <NA> 0 <named list [65]>
#> 8 gaborcsardi cmark <NA> 0 <named list [65]>
#> 9 gaborcsardi conditions <NA> 0 <named list [65]>
#> 10 gaborcsardi crayon <NA> 52 <named list [65]>
#> # … with 166 more rows
Note the use of c("owner", "login"): this lets us reach a second-level value inside the nested owner list. An alternative is to pull out the whole owner list and then use unnest_wider() to put each of its elements into a column.
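A minimal sketch of that alternative, on made-up data shaped like a single level of gh_repos (the repository names, logins and ids below are invented for illustration):

```r
library(tidyr)

# Hypothetical repositories, each with a nested owner list
repos_toy <- tibble(repo = list(
  list(name = "alpha", owner = list(login = "anna", id = 1L)),
  list(name = "beta",  owner = list(login = "anna", id = 1L))
))

owners <- repos_toy %>%
  hoist(repo, owner = "owner") %>%  # owner is still a list-column here
  unnest_wider(owner)               # now login and id are ordinary columns

names(owners)
```

The two-step form is handy when most of the owner fields are needed; for one or two fields the c("owner", "login") path is shorter.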
Instead of working out whether unnest_longer() or unnest_wider() is the right function, you can use unnest_auto(). It uses a few heuristics to pick the most suitable function for transforming the data, and prints a message about the choice it made.
got_chars has the same structure as gh_users: it is a set of named lists, where each element of the inner list describes some attribute of a Game of Thrones character. To bring got_chars into tabular form, we start by creating a data frame, as in the previous examples, and then turn each element into a separate column:
chars <- tibble(char = got_chars)
chars
#> # A tibble: 30 x 1
#> char
#> <list>
#> 1 <named list [18]>
#> 2 <named list [18]>
#> 3 <named list [18]>
#> 4 <named list [18]>
#> 5 <named list [18]>
#> 6 <named list [18]>
#> 7 <named list [18]>
#> 8 <named list [18]>
#> 9 <named list [18]>
#> 10 <named list [18]>
#> # … with 20 more rows
chars2 <- chars %>% unnest_wider(char)
chars2
#> # A tibble: 30 x 18
#> url id name gender culture born died alive titles aliases father
#> <chr> <int> <chr> <chr> <chr> <chr> <chr> <lgl> <list> <list> <chr>
#> 1 http… 1022 Theo… Male Ironbo… In 2… "" TRUE <chr … <chr [… ""
#> 2 http… 1052 Tyri… Male "" In 2… "" TRUE <chr … <chr [… ""
#> 3 http… 1074 Vict… Male Ironbo… In 2… "" TRUE <chr … <chr [… ""
#> 4 http… 1109 Will Male "" "" In 2… FALSE <chr … <chr [… ""
#> 5 http… 1166 Areo… Male Norvos… In 2… "" TRUE <chr … <chr [… ""
#> 6 http… 1267 Chett Male "" At H… In 2… FALSE <chr … <chr [… ""
#> 7 http… 1295 Cres… Male "" In 2… In 2… FALSE <chr … <chr [… ""
#> 8 http… 130 Aria… Female Dornish In 2… "" TRUE <chr … <chr [… ""
#> 9 http… 1303 Daen… Female Valyri… In 2… "" TRUE <chr … <chr [… ""
#> 10 http… 1319 Davo… Male Wester… In 2… "" TRUE <chr … <chr [… ""
#> # … with 20 more rows, and 7 more variables: mother <chr>, spouse <chr>,
#> # allegiances <list>, books <list>, povBooks <list>, tvSeries <list>,
#> # playedBy <list>
The structure of got_chars is more complex than that of gh_users, because some components of the char list are themselves lists, so the result contains list-columns (titles, aliases, allegiances, books, povBooks, tvSeries and playedBy in the output above).
What you do next depends on the goals of the analysis. Perhaps you need a row for every book and TV series in which a character appears:
chars2 %>%
  select(name, books, tvSeries) %>%
  pivot_longer(c(books, tvSeries), names_to = "media", values_to = "value") %>%
  unnest_longer(value)
#> # A tibble: 180 x 3
#> name media value
#> <chr> <chr> <chr>
#> 1 Theon Greyjoy books A Game of Thrones
#> 2 Theon Greyjoy books A Storm of Swords
#> 3 Theon Greyjoy books A Feast for Crows
#> 4 Theon Greyjoy tvSeries Season 1
#> 5 Theon Greyjoy tvSeries Season 2
#> 6 Theon Greyjoy tvSeries Season 3
#> 7 Theon Greyjoy tvSeries Season 4
#> 8 Theon Greyjoy tvSeries Season 5
#> 9 Theon Greyjoy tvSeries Season 6
#> 10 Tyrion Lannister books A Feast for Crows
#> # … with 170 more rows
Or perhaps you want to build a table that lets you match characters with their titles:
chars2 %>%
  select(name, title = titles) %>%
  unnest_longer(title)
#> # A tibble: 60 x 2
#> name title
#> <chr> <chr>
#> 1 Theon Greyjoy Prince of Winterfell
#> 2 Theon Greyjoy Captain of Sea Bitch
#> 3 Theon Greyjoy Lord of the Iron Islands (by law of the green lands)
#> 4 Tyrion Lannister Acting Hand of the King (former)
#> 5 Tyrion Lannister Master of Coin (former)
#> 6 Victarion Greyjoy Lord Captain of the Iron Fleet
#> 7 Victarion Greyjoy Master of the Iron Victory
#> 8 Will ""
#> 9 Areo Hotah Captain of the Guard at Sunspear
#> 10 Chett ""
#> # … with 50 more rows
(Note the empty "" values in the title field. They come from mistakes made when the data was entered into got_chars: characters with no titles should have a vector of length 0 in the title field, not a vector of length 1 containing an empty string.)
We can rewrite the example above using unnest_auto(). This approach is convenient for one-off analysis, but you should not rely on unnest_auto() for regular use. If the structure of your data changes, unnest_auto() may change the transformation it picks: if it initially expanded list-columns into rows with unnest_longer(), a change in the incoming data can switch the logic to unnest_wider(), and using this approach on an ongoing basis can lead to unexpected errors.
tibble(char = got_chars) %>%
  unnest_auto(char) %>%
  select(name, title = titles) %>%
  unnest_auto(title)
#> Using `unnest_wider(char)`; elements have 18 names in common
#> Using `unnest_longer(title)`; no element has names
#> # A tibble: 60 x 2
#> name title
#> <chr> <chr>
#> 1 Theon Greyjoy Prince of Winterfell
#> 2 Theon Greyjoy Captain of Sea Bitch
#> 3 Theon Greyjoy Lord of the Iron Islands (by law of the green lands)
#> 4 Tyrion Lannister Acting Hand of the King (former)
#> 5 Tyrion Lannister Master of Coin (former)
#> 6 Victarion Greyjoy Lord Captain of the Iron Fleet
#> 7 Victarion Greyjoy Master of the Iron Victory
#> 8 Will ""
#> 9 Areo Hotah Captain of the Guard at Sunspear
#> 10 Chett ""
#> # … with 50 more rows
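The caution about relying on unnest_auto() can be illustrated with two versions of the same hypothetical feed. The names v1, v2 and payload are assumptions invented for this sketch; the behaviour follows the reasons unnest_auto() itself reports in its messages ("no element has names", "elements have N names in common").

```r
library(tidyr)

# Version 1 of a hypothetical feed: every record is an unnamed vector of tags
v1 <- tibble(id = 1:2, payload = list(c("a", "b"), "c"))

# Version 2: the provider starts sending named records instead
v2 <- tibble(id = 1:2, payload = list(
  list(tag = "a", score = 1),
  list(tag = "c", score = 2)
))

out1 <- v1 %>% unnest_auto(payload)  # unnamed elements: unnest_longer() is chosen
out2 <- v2 %>% unnest_auto(payload)  # shared names: unnest_wider() is chosen

dim(out1)
dim(out2)
```

The same line of code yields a longer table in one case and a wider one in the other, so any code downstream that expects a particular set of columns would break. Calling unnest_longer() or unnest_wider() explicitly makes such a change fail loudly at the unnesting step instead.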
Geocoding with Google
Next, we will look at a more complex data structure obtained from Google's geocoding service. Caching the results is against the terms of the Google Maps API, so I will first write a simple wrapper around the API. It relies on the Google Maps API key being stored in an environment variable; if you do not have a Google Maps API key stored in your environment variables, the code chunks in this section will not run.
has_key <- !identical(Sys.getenv("GOOGLE_MAPS_API_KEY"), "")
if (!has_key) {
  message("No Google Maps API key found; code chunks will not be run")
}
# https://developers.google.com/maps/documentation/geocoding
geocode <- function(address, api_key = Sys.getenv("GOOGLE_MAPS_API_KEY")) {
  url <- "https://maps.googleapis.com/maps/api/geocode/json"
  url <- paste0(url, "?address=", URLencode(address), "&key=", api_key)
  jsonlite::read_json(url)
}
Fortunately, we can solve the problem of turning this data into tabular form step by step with tidyr functions. To make the task a little harder and more realistic, I will start by geocoding a few cities:
city <- c("Houston", "LA", "New York", "Chicago", "Springfield")
city_geo <- purrr::map(city, geocode)
I will turn the results into a tibble and, for convenience, add a column with the corresponding city name.
loc <- tibble(city = city, json = city_geo)
loc
#> # A tibble: 5 x 2
#> city json
#> <chr> <list>
#> 1 Houston <named list [2]>
#> 2 LA <named list [2]>
#> 3 New York <named list [2]>
#> 4 Chicago <named list [2]>
#> 5 Springfield <named list [2]>
The first level has the components status and results, which we can expand with unnest_wider():
loc %>%
  unnest_wider(json)
#> # A tibble: 5 x 3
#> city results status
#> <chr> <list> <chr>
#> 1 Houston <list [1]> OK
#> 2 LA <list [1]> OK
#> 3 New York <list [1]> OK
#> 4 Chicago <list [1]> OK
#> 5 Springfield <list [1]> OK
Note that results is a nested list. Most cities have 1 element (representing a unique match for the geocoding API), but Springfield has two. We can pull them into separate rows with unnest_longer():
loc %>%
  unnest_wider(json) %>%
  unnest_longer(results)
#> # A tibble: 5 x 3
#> city results status
#> <chr> <list> <chr>
#> 1 Houston <named list [5]> OK
#> 2 LA <named list [5]> OK
#> 3 New York <named list [5]> OK
#> 4 Chicago <named list [5]> OK
#> 5 Springfield <named list [5]> OK
Now they all have the same components, which can be checked with unnest_wider():
loc %>%
  unnest_wider(json) %>%
  unnest_longer(results) %>%
  unnest_wider(results)
#> # A tibble: 5 x 7
#> city address_componen… formatted_addre… geometry place_id types status
#> <chr> <list> <chr> <list> <chr> <lis> <chr>
#> 1 Houst… <list [4]> Houston, TX, USA <named … ChIJAYWN… <lis… OK
#> 2 LA <list [4]> Los Angeles, CA… <named … ChIJE9on… <lis… OK
#> 3 New Y… <list [3]> New York, NY, U… <named … ChIJOwg_… <lis… OK
#> 4 Chica… <list [4]> Chicago, IL, USA <named … ChIJ7cv0… <lis… OK
#> 5 Sprin… <list [5]> Springfield, MO… <named … ChIJP5jI… <lis… OK
We can get the latitude and longitude of each city by expanding the geometry list:
loc %>%
  unnest_wider(json) %>%
  unnest_longer(results) %>%
  unnest_wider(results) %>%
  unnest_wider(geometry)
#> # A tibble: 5 x 10
#> city address_compone… formatted_addre… bounds location location_type
#> <chr> <list> <chr> <list> <list> <chr>
#> 1 Hous… <list [4]> Houston, TX, USA <name… <named … APPROXIMATE
#> 2 LA <list [4]> Los Angeles, CA… <name… <named … APPROXIMATE
#> 3 New … <list [3]> New York, NY, U… <name… <named … APPROXIMATE
#> 4 Chic… <list [4]> Chicago, IL, USA <name… <named … APPROXIMATE
#> 5 Spri… <list [5]> Springfield, MO… <name… <named … APPROXIMATE
#> # … with 4 more variables: viewport <list>, place_id <chr>, types <list>,
#> # status <chr>
And then location, which also needs to be expanded:
loc %>%
  unnest_wider(json) %>%
  unnest_longer(results) %>%
  unnest_wider(results) %>%
  unnest_wider(geometry) %>%
  unnest_wider(location)
#> # A tibble: 5 x 11
#> city address_compone… formatted_addre… bounds lat lng location_type
#> <chr> <list> <chr> <list> <dbl> <dbl> <chr>
#> 1 Hous… <list [4]> Houston, TX, USA <name… 29.8 -95.4 APPROXIMATE
#> 2 LA <list [4]> Los Angeles, CA… <name… 34.1 -118. APPROXIMATE
#> 3 New … <list [3]> New York, NY, U… <name… 40.7 -74.0 APPROXIMATE
#> 4 Chic… <list [4]> Chicago, IL, USA <name… 41.9 -87.6 APPROXIMATE
#> 5 Spri… <list [5]> Springfield, MO… <name… 37.2 -93.3 APPROXIMATE
#> # … with 4 more variables: viewport <list>, place_id <chr>, types <list>,
#> # status <chr>
Once again, unnest_auto() simplifies the steps described above, with the same risks that a change in the structure of the incoming data can bring:
loc %>%
  unnest_auto(json) %>%
  unnest_auto(results) %>%
  unnest_auto(results) %>%
  unnest_auto(geometry) %>%
  unnest_auto(location)
#> Using `unnest_wider(json)`; elements have 2 names in common
#> Using `unnest_longer(results)`; no element has names
#> Using `unnest_wider(results)`; elements have 5 names in common
#> Using `unnest_wider(geometry)`; elements have 4 names in common
#> Using `unnest_wider(location)`; elements have 2 names in common
#> # A tibble: 5 x 11
#> city address_compone… formatted_addre… bounds lat lng location_type
#> <chr> <list> <chr> <list> <dbl> <dbl> <chr>
#> 1 Hous… <list [4]> Houston, TX, USA <name… 29.8 -95.4 APPROXIMATE
#> 2 LA <list [4]> Los Angeles, CA… <name… 34.1 -118. APPROXIMATE
#> 3 New … <list [3]> New York, NY, U… <name… 40.7 -74.0 APPROXIMATE
#> 4 Chic… <list [4]> Chicago, IL, USA <name… 41.9 -87.6 APPROXIMATE
#> 5 Spri… <list [5]> Springfield, MO… <name… 37.2 -93.3 APPROXIMATE
#> # … with 4 more variables: viewport <list>, place_id <chr>, types <list>,
#> # status <chr>
We could also look only at the first address for each city:
loc %>%
  unnest_wider(json) %>%
  hoist(results, first_result = 1) %>%
  unnest_wider(first_result) %>%
  unnest_wider(geometry) %>%
  unnest_wider(location)
#> # A tibble: 5 x 11
#> city address_compone… formatted_addre… bounds lat lng location_type
#> <chr> <list> <chr> <list> <dbl> <dbl> <chr>
#> 1 Hous… <list [4]> Houston, TX, USA <name… 29.8 -95.4 APPROXIMATE
#> 2 LA <list [4]> Los Angeles, CA… <name… 34.1 -118. APPROXIMATE
#> 3 New … <list [3]> New York, NY, U… <name… 40.7 -74.0 APPROXIMATE
#> 4 Chic… <list [4]> Chicago, IL, USA <name… 41.9 -87.6 APPROXIMATE
#> 5 Spri… <list [5]> Springfield, MO… <name… 37.2 -93.3 APPROXIMATE
#> # … with 4 more variables: viewport <list>, place_id <chr>, types <list>,
#> # status <chr>
Or use hoist() to dive several levels deep and go straight to lat and lng.
loc %>%
  hoist(json,
    lat = list("results", 1, "geometry", "location", "lat"),
    lng = list("results", 1, "geometry", "location", "lng")
  )
#> # A tibble: 5 x 4
#> city lat lng json
#> <chr> <dbl> <dbl> <list>
#> 1 Houston 29.8 -95.4 <named list [2]>
#> 2 LA 34.1 -118. <named list [2]>
#> 3 New York 40.7 -74.0 <named list [2]>
#> 4 Chicago 41.9 -87.6 <named list [2]>
#> 5 Springfield 37.2 -93.3 <named list [2]>
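For readers without an API key, the multi-level hoist() call can be tried offline on a hand-written stand-in for the response. The helper fake_response(), the place names and the coordinates are all invented for this sketch; only the nesting (results, then geometry, then location, then lat and lng) mirrors the structure shown above.

```r
library(tidyr)

# Hypothetical helper that mimics the nesting of a geocoding response
fake_response <- function(lat, lng) {
  list(
    results = list(
      list(geometry = list(location = list(lat = lat, lng = lng)))
    ),
    status = "OK"
  )
}

loc_toy <- tibble(
  city = c("Place A", "Place B"),
  json = list(fake_response(10.5, 20.5), fake_response(-1, 2))
)

# A list() path mixes names and positions: "results", then element 1, and so on
coords <- loc_toy %>%
  hoist(json,
    lat = list("results", 1, "geometry", "location", "lat"),
    lng = list("results", 1, "geometry", "location", "lng")
  )

coords[c("city", "lat", "lng")]
```

A path is written with list() rather than c() here because it combines character names with a numeric position, which c() would coerce to a single type.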
Sharla Gelfand's discography
Finally, we will look at the most complex structure: Sharla Gelfand's discography. As in the examples above, we start by converting the list into a single-column data frame and then expand it so that each component becomes a separate column. I also convert the date_added column into a proper date-time format in R.
discs <- tibble(disc = discog) %>%
  unnest_wider(disc) %>%
  mutate(date_added = as.POSIXct(strptime(date_added, "%Y-%m-%dT%H:%M:%S")))
discs
#> # A tibble: 155 x 5
#> instance_id date_added basic_information id rating
#> <int> <dttm> <list> <int> <int>
#> 1 354823933 2019-02-16 17:48:59 <named list [11]> 7496378 0
#> 2 354092601 2019-02-13 14:13:11 <named list [11]> 4490852 0
#> 3 354091476 2019-02-13 14:07:23 <named list [11]> 9827276 0
#> 4 351244906 2019-02-02 11:39:58 <named list [11]> 9769203 0
#> 5 351244801 2019-02-02 11:39:37 <named list [11]> 7237138 0
#> 6 351052065 2019-02-01 20:40:53 <named list [11]> 13117042 0
#> 7 350315345 2019-01-29 15:48:37 <named list [11]> 7113575 0
#> 8 350315103 2019-01-29 15:47:22 <named list [11]> 10540713 0
#> 9 350314507 2019-01-29 15:44:08 <named list [11]> 11260950 0
#> 10 350314047 2019-01-29 15:41:35 <named list [11]> 11726853 0
#> # … with 145 more rows
At this level we get information about when each disc was added to Sharla's discography, but we see no data about the discs themselves. For that we need to expand the basic_information column:
discs %>% unnest_wider(basic_information)
#> Column name `id` must not be duplicated.
#> Use .name_repair to specify repair.
Unfortunately, we get an error, because the basic_information list contains a component with the same name as an existing column, id. When such an error occurs, you can use names_repair = "unique" to find its cause quickly.
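A minimal sketch of that diagnostic step, using a hypothetical two-row table (discs_toy, with invented ids and titles) that has the same name clash as discs:

```r
library(tidyr)

# Hypothetical data: an outer id column and an inner id component
discs_toy <- tibble(
  id = c(1L, 2L),
  basic_information = list(
    list(id = 101L, title = "First"),
    list(id = 102L, title = "Second")
  )
)

# With names_repair = "unique" the duplicated names are renamed, not rejected
repaired <- discs_toy %>%
  unnest_wider(basic_information, names_repair = "unique")

names(repaired)
```

The repaired names show which columns collided. Once the cause is known, it is usually cleaner to drop or rename one of the duplicates before unnesting than to keep the automatically generated names.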
You can then join them back onto the original dataset as needed.
Conclusion
The core of the tidyverse library includes many useful packages united by a common philosophy of data processing.
In this article we looked at the unnest_*() family of functions, which is designed for extracting components from nested lists. The package has many other useful functions that make it easier to reshape data according to the Tidy Data concept.