unnest_auto() automatically decides whether unnest_longer() or unnest_wider() is the better fit.
hoist() is similar to unnest_wider(), but it plucks out only the components you specify and lets you reach down through several levels of nesting.
Most problems that involve bringing non-rectangular data with several levels of nesting into a two-dimensional table can be solved by combining these functions with dplyr.
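To make the difference between these functions concrete, here is a minimal sketch on a small hand-made tibble. The data, the column names (user, login, langs, org) and the object names are hypothetical and are not taken from repurrrsive.

```r
library(tidyr)

# Hypothetical toy data: one list-column holding one named list per user.
users <- tibble(user = list(
  list(login = "ann", langs = c("R", "SQL"), org = list(name = "acme")),
  list(login = "bob", langs = c("Python", "Go"), org = list(name = "initech"))
))

# unnest_wider(): every named component becomes its own column.
wide <- users %>% unnest_wider(user)

# unnest_longer(): every element of a list-column becomes its own row.
long <- wide %>% unnest_longer(langs)

# hoist(): pull out only the selected components; a character vector
# such as c("org", "name") reaches two levels down.
picked <- users %>% hoist(user, login = "login", org_name = c("org", "name"))
```

Here wide has one row per user, long has one row per user and language, and picked keeps the remaining components in the original list-column.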
To demonstrate these techniques, we will use the repurrrsive package, which provides a number of complex, multi-level lists derived from web APIs.
This time each element of the list-column is a list of the repositories that belong to one user. Each repository is a separate observation, so according to the tidy data concept they should become new rows, which is why we use unnest_longer() rather than unnest_wider():
repos <- repos %>% unnest_longer(repo)
repos
#> # A tibble: 176 x 1
#> repo
#> <list>
#> 1 <named list [68]>
#> 2 <named list [68]>
#> 3 <named list [68]>
#> 4 <named list [68]>
#> 5 <named list [68]>
#> 6 <named list [68]>
#> 7 <named list [68]>
#> 8 <named list [68]>
#> 9 <named list [68]>
#> 10 <named list [68]>
#> # … with 166 more rows
Now we can use unnest_wider() or hoist():
repos %>% hoist(repo,
login = c("owner", "login"),
name = "name",
homepage = "homepage",
watchers = "watchers_count"
)
#> # A tibble: 176 x 5
#> login name homepage watchers repo
#> <chr> <chr> <chr> <int> <list>
#> 1 gaborcsardi after <NA> 5 <named list [65]>
#> 2 gaborcsardi argufy <NA> 19 <named list [65]>
#> 3 gaborcsardi ask <NA> 5 <named list [65]>
#> 4 gaborcsardi baseimports <NA> 0 <named list [65]>
#> 5 gaborcsardi citest <NA> 0 <named list [65]>
#> 6 gaborcsardi clisymbols "" 18 <named list [65]>
#> 7 gaborcsardi cmaker <NA> 0 <named list [65]>
#> 8 gaborcsardi cmark <NA> 0 <named list [65]>
#> 9 gaborcsardi conditions <NA> 0 <named list [65]>
#> 10 gaborcsardi crayon <NA> 52 <named list [65]>
#> # … with 166 more rows
Note the use of c("owner", "login"): this lets us reach two levels deep and take a value from inside the nested owner list. An alternative approach is to pull out the whole owner list and then use unnest_wider() to put each of its elements into a column.
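The alternative just described can be sketched as follows. To keep the example self-contained it uses a tiny hand-made stand-in for repos; the object name repos_toy and all values in it are hypothetical.

```r
library(tidyr)

# Hypothetical stand-in for `repos`: one named list per repository,
# each with a nested `owner` list.
repos_toy <- tibble(repo = list(
  list(name = "pkg_a", owner = list(login = "user_a", id = 1L)),
  list(name = "pkg_b", owner = list(login = "user_a", id = 1L))
))

# Pull out the whole `owner` list first, then spread its elements
# into columns.
owners <- repos_toy %>%
  hoist(repo, owner = "owner") %>%
  unnest_wider(owner)

owners
```

The result has one column per element of owner, followed by the repo list-column with what is left of each repository.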
got_chars has a structure similar to gh_users: it is a list of named lists, where each element of the inner list describes some attribute of a Game of Thrones character. To bring got_chars into tabular form, we start by creating a data frame, just as in the previous examples, and then turn each component into a separate column:
chars <- tibble(char = got_chars)
chars
#> # A tibble: 30 x 1
#> char
#> <list>
#> 1 <named list [18]>
#> 2 <named list [18]>
#> 3 <named list [18]>
#> 4 <named list [18]>
#> 5 <named list [18]>
#> 6 <named list [18]>
#> 7 <named list [18]>
#> 8 <named list [18]>
#> 9 <named list [18]>
#> 10 <named list [18]>
#> # … with 20 more rows
chars2 <- chars %>% unnest_wider(char)
chars2
#> # A tibble: 30 x 18
#> url id name gender culture born died alive titles aliases father
#> <chr> <int> <chr> <chr> <chr> <chr> <chr> <lgl> <list> <list> <chr>
#> 1 http… 1022 Theo… Male Ironbo… In 2… "" TRUE <chr … <chr [… ""
#> 2 http… 1052 Tyri… Male "" In 2… "" TRUE <chr … <chr [… ""
#> 3 http… 1074 Vict… Male Ironbo… In 2… "" TRUE <chr … <chr [… ""
#> 4 http… 1109 Will Male "" "" In 2… FALSE <chr … <chr [… ""
#> 5 http… 1166 Areo… Male Norvos… In 2… "" TRUE <chr … <chr [… ""
#> 6 http… 1267 Chett Male "" At H… In 2… FALSE <chr … <chr [… ""
#> 7 http… 1295 Cres… Male "" In 2… In 2… FALSE <chr … <chr [… ""
#> 8 http… 130 Aria… Female Dornish In 2… "" TRUE <chr … <chr [… ""
#> 9 http… 1303 Daen… Female Valyri… In 2… "" TRUE <chr … <chr [… ""
#> 10 http… 1319 Davo… Male Wester… In 2… "" TRUE <chr … <chr [… ""
#> # … with 20 more rows, and 7 more variables: mother <chr>, spouse <chr>,
#> # allegiances <list>, books <list>, povBooks <list>, tvSeries <list>,
#> # playedBy <list>
The structure of got_chars is somewhat more complex than that of gh_users, because some components of char are themselves lists, so we end up with list-columns.
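One way to see which columns ended up as list-columns is to select them by type. The sketch below uses a small hypothetical stand-in for chars (chars_toy, with made-up values) so that it runs without repurrrsive; the same select() call should apply to chars2.

```r
library(tidyr)
library(dplyr)

# Hypothetical stand-in for `chars`: some components are scalars,
# others are vectors with several values.
chars_toy <- tibble(char = list(
  list(name = "A", alive = TRUE,
       titles = c("Lord", "Captain"), books = c("Book 1", "Book 2")),
  list(name = "B", alive = FALSE,
       titles = c("Ser", "Hand"), books = c("Book 2", "Book 3"))
))

chars_wide <- chars_toy %>% unnest_wider(char)

# Keep only the columns that are still lists after widening.
list_cols <- chars_wide %>% select(where(is.list))

list_cols
```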
What you do next depends on the purpose of the analysis. Perhaps you want a row for every book and TV series in which each character appears:
chars2 %>%
select(name, books, tvSeries) %>%
pivot_longer(c(books, tvSeries), names_to = "media", values_to = "value") %>%
unnest_longer(value)
#> # A tibble: 180 x 3
#> name media value
#> <chr> <chr> <chr>
#> 1 Theon Greyjoy books A Game of Thrones
#> 2 Theon Greyjoy books A Storm of Swords
#> 3 Theon Greyjoy books A Feast for Crows
#> 4 Theon Greyjoy tvSeries Season 1
#> 5 Theon Greyjoy tvSeries Season 2
#> 6 Theon Greyjoy tvSeries Season 3
#> 7 Theon Greyjoy tvSeries Season 4
#> 8 Theon Greyjoy tvSeries Season 5
#> 9 Theon Greyjoy tvSeries Season 6
#> 10 Tyrion Lannister books A Feast for Crows
#> # … with 170 more rows
Or perhaps you want to build a table that lets you match characters to their titles:
chars2 %>%
select(name, title = titles) %>%
unnest_longer(title)
#> # A tibble: 60 x 2
#> name title
#> <chr> <chr>
#> 1 Theon Greyjoy Prince of Winterfell
#> 2 Theon Greyjoy Captain of Sea Bitch
#> 3 Theon Greyjoy Lord of the Iron Islands (by law of the green lands)
#> 4 Tyrion Lannister Acting Hand of the King (former)
#> 5 Tyrion Lannister Master of Coin (former)
#> 6 Victarion Greyjoy Lord Captain of the Iron Fleet
#> 7 Victarion Greyjoy Master of the Iron Victory
#> 8 Will ""
#> 9 Areo Hotah Captain of the Guard at Sunspear
#> 10 Chett ""
#> # … with 50 more rows
(Note the empty "" values in the title column. They are caused by an error made when the data was entered into got_chars: characters without titles should have a vector of length 0 in the titles field, not a vector of length 1 containing an empty string.)
We could rewrite the example above using unnest_auto(). This approach is convenient for one-off analysis, but you should not rely on unnest_auto() for regular use. The problem is that if the structure of your data changes, unnest_auto() may change the transformation it applies: if it originally expanded a list-column into rows with unnest_longer(), then after a change in the structure of the incoming data it may switch to unnest_wider(), so using it routinely can lead to unexpected errors.
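The following sketch illustrates that risk on two hypothetical inputs that differ only in whether the elements of the list-column are named. The object names are made up for illustration.

```r
library(tidyr)

# Hypothetical input 1: the elements of `x` have no names.
unnamed <- tibble(x = list(1:2, 3:4))

# Hypothetical input 2: the elements of `x` are named lists.
named <- tibble(x = list(list(a = 1, b = 2), list(a = 3, b = 4)))

# The same call gives differently shaped results; unnest_auto()
# prints a message saying which function it picked.
as_rows <- unnest_auto(unnamed, x)  # expands into rows
as_cols <- unnest_auto(named, x)    # expands into columns `a` and `b`

dim(as_rows)
dim(as_cols)
```

In code that runs repeatedly, calling unnest_longer() or unnest_wider() explicitly makes a change in the input structure fail loudly instead of silently changing the shape of the output.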
tibble(char = got_chars) %>%
unnest_auto(char) %>%
select(name, title = titles) %>%
unnest_auto(title)
#> Using `unnest_wider(char)`; elements have 18 names in common
#> Using `unnest_longer(title)`; no element has names
#> # A tibble: 60 x 2
#> name title
#> <chr> <chr>
#> 1 Theon Greyjoy Prince of Winterfell
#> 2 Theon Greyjoy Captain of Sea Bitch
#> 3 Theon Greyjoy Lord of the Iron Islands (by law of the green lands)
#> 4 Tyrion Lannister Acting Hand of the King (former)
#> 5 Tyrion Lannister Master of Coin (former)
#> 6 Victarion Greyjoy Lord Captain of the Iron Fleet
#> 7 Victarion Greyjoy Master of the Iron Victory
#> 8 Will ""
#> 9 Areo Hotah Captain of the Guard at Sunspear
#> 10 Chett ""
#> # … with 50 more rows
Geocoding with Google
Next, we will look at a more complex structure: data obtained from Google's geocoding service. Caching this data is against the terms of service of the Google Maps API, so I first write a simple wrapper around the API. It relies on the Google Maps API key being stored in an environment variable; if you do not have a Google Maps API key stored in your environment variables, the code chunks shown in this section will not be run.
has_key <- !identical(Sys.getenv("GOOGLE_MAPS_API_KEY"), "")
if (!has_key) {
  message("No Google Maps API key found; code chunks will not be run")
}
# https://developers.google.com/maps/documentation/geocoding
geocode <- function(address, api_key = Sys.getenv("GOOGLE_MAPS_API_KEY")) {
  url <- "https://maps.googleapis.com/maps/api/geocode/json"
  url <- paste0(url, "?address=", URLencode(address), "&key=", api_key)
  jsonlite::read_json(url)
}
The list that this function returns is quite complex.
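To give a feel for the shape of such a response without calling the API, here is a hand-written, heavily abbreviated stand-in. The field names follow the columns that appear in the outputs later in this section; the values (including place_id) are placeholders, and a real response contains more fields.

```r
# Hypothetical, abbreviated stand-in for one geocode() response.
mock_response <- list(
  results = list(
    list(
      address_components = list(
        list(long_name = "Houston", types = list("locality"))
      ),
      formatted_address = "Houston, TX, USA",
      geometry = list(
        location = list(lat = 29.8, lng = -95.4),
        location_type = "APPROXIMATE"
      ),
      place_id = "placeholder-place-id",
      types = list("locality", "political")
    )
  ),
  status = "OK"
)

# Print the nesting, limited to the first three levels.
str(mock_response, max.level = 3)
```

The coordinates sit four levels down (results, first result, geometry, location), which is what the unnesting steps in this section work through one level at a time.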
Fortunately, we can convert this data into tabular form step by step using tidyr functions. To make the problem a little harder and more realistic, I will start by geocoding a few cities:
city <- c("Houston", "LA", "New York", "Chicago", "Springfield")
city_geo <- purrr::map(city, geocode)
I will turn the result into a tibble and, for convenience, add a column with the city name.
loc <- tibble(city = city, json = city_geo)
loc
#> # A tibble: 5 x 2
#> city json
#> <chr> <list>
#> 1 Houston <named list [2]>
#> 2 LA <named list [2]>
#> 3 New York <named list [2]>
#> 4 Chicago <named list [2]>
#> 5 Springfield <named list [2]>
The first level contains the components status and results, which we can expand with unnest_wider():
loc %>%
unnest_wider(json)
#> # A tibble: 5 x 3
#> city results status
#> <chr> <list> <chr>
#> 1 Houston <list [1]> OK
#> 2 LA <list [1]> OK
#> 3 New York <list [1]> OK
#> 4 Chicago <list [1]> OK
#> 5 Springfield <list [1]> OK
Notice that results is a list of lists, which we can spread into separate rows with unnest_longer():
loc %>%
unnest_wider(json) %>%
unnest_longer(results)
#> # A tibble: 5 x 3
#> city results status
#> <chr> <list> <chr>
#> 1 Houston <named list [5]> OK
#> 2 LA <named list [5]> OK
#> 3 New York <named list [5]> OK
#> 4 Chicago <named list [5]> OK
#> 5 Springfield <named list [5]> OK
Now they all have the same components, as unnest_wider() reveals:
loc %>%
unnest_wider(json) %>%
unnest_longer(results) %>%
unnest_wider(results)
#> # A tibble: 5 x 7
#> city address_componen… formatted_addre… geometry place_id types status
#> <chr> <list> <chr> <list> <chr> <lis> <chr>
#> 1 Houst… <list [4]> Houston, TX, USA <named … ChIJAYWN… <lis… OK
#> 2 LA <list [4]> Los Angeles, CA… <named … ChIJE9on… <lis… OK
#> 3 New Y… <list [3]> New York, NY, U… <named … ChIJOwg_… <lis… OK
#> 4 Chica… <list [4]> Chicago, IL, USA <named … ChIJ7cv0… <lis… OK
#> 5 Sprin… <list [5]> Springfield, MO… <named … ChIJP5jI… <lis… OK
We can find the latitude and longitude of each city by unnesting the geometry list:
loc %>%
unnest_wider(json) %>%
unnest_longer(results) %>%
unnest_wider(results) %>%
unnest_wider(geometry)
#> # A tibble: 5 x 10
#> city address_compone… formatted_addre… bounds location location_type
#> <chr> <list> <chr> <list> <list> <chr>
#> 1 Hous… <list [4]> Houston, TX, USA <name… <named … APPROXIMATE
#> 2 LA <list [4]> Los Angeles, CA… <name… <named … APPROXIMATE
#> 3 New … <list [3]> New York, NY, U… <name… <named … APPROXIMATE
#> 4 Chic… <list [4]> Chicago, IL, USA <name… <named … APPROXIMATE
#> 5 Spri… <list [5]> Springfield, MO… <name… <named … APPROXIMATE
#> # … with 4 more variables: viewport <list>, place_id <chr>, types <list>,
#> # status <chr>
And then location, which also needs to be unnested:
loc %>%
unnest_wider(json) %>%
unnest_longer(results) %>%
unnest_wider(results) %>%
unnest_wider(geometry) %>%
unnest_wider(location)
#> # A tibble: 5 x 11
#> city address_compone… formatted_addre… bounds lat lng location_type
#> <chr> <list> <chr> <list> <dbl> <dbl> <chr>
#> 1 Hous… <list [4]> Houston, TX, USA <name… 29.8 -95.4 APPROXIMATE
#> 2 LA <list [4]> Los Angeles, CA… <name… 34.1 -118. APPROXIMATE
#> 3 New … <list [3]> New York, NY, U… <name… 40.7 -74.0 APPROXIMATE
#> 4 Chic… <list [4]> Chicago, IL, USA <name… 41.9 -87.6 APPROXIMATE
#> 5 Spri… <list [5]> Springfield, MO… <name… 37.2 -93.3 APPROXIMATE
#> # … with 4 more variables: viewport <list>, place_id <chr>, types <list>,
#> # status <chr>
Once again, unnest_auto() makes this operation easy to express, at some risk of failure if the structure of the incoming data changes:
loc %>%
unnest_auto(json) %>%
unnest_auto(results) %>%
unnest_auto(results) %>%
unnest_auto(geometry) %>%
unnest_auto(location)
#> Using `unnest_wider(json)`; elements have 2 names in common
#> Using `unnest_longer(results)`; no element has names
#> Using `unnest_wider(results)`; elements have 5 names in common
#> Using `unnest_wider(geometry)`; elements have 4 names in common
#> Using `unnest_wider(location)`; elements have 2 names in common
#> # A tibble: 5 x 11
#> city address_compone… formatted_addre… bounds lat lng location_type
#> <chr> <list> <chr> <list> <dbl> <dbl> <chr>
#> 1 Hous… <list [4]> Houston, TX, USA <name… 29.8 -95.4 APPROXIMATE
#> 2 LA <list [4]> Los Angeles, CA… <name… 34.1 -118. APPROXIMATE
#> 3 New … <list [3]> New York, NY, U… <name… 40.7 -74.0 APPROXIMATE
#> 4 Chic… <list [4]> Chicago, IL, USA <name… 41.9 -87.6 APPROXIMATE
#> 5 Spri… <list [5]> Springfield, MO… <name… 37.2 -93.3 APPROXIMATE
#> # … with 4 more variables: viewport <list>, place_id <chr>, types <list>,
#> # status <chr>
We could also look at just the first result for each city:
loc %>%
unnest_wider(json) %>%
hoist(results, first_result = 1) %>%
unnest_wider(first_result) %>%
unnest_wider(geometry) %>%
unnest_wider(location)
#> # A tibble: 5 x 11
#> city address_compone… formatted_addre… bounds lat lng location_type
#> <chr> <list> <chr> <list> <dbl> <dbl> <chr>
#> 1 Hous… <list [4]> Houston, TX, USA <name… 29.8 -95.4 APPROXIMATE
#> 2 LA <list [4]> Los Angeles, CA… <name… 34.1 -118. APPROXIMATE
#> 3 New … <list [3]> New York, NY, U… <name… 40.7 -74.0 APPROXIMATE
#> 4 Chic… <list [4]> Chicago, IL, USA <name… 41.9 -87.6 APPROXIMATE
#> 5 Spri… <list [5]> Springfield, MO… <name… 37.2 -93.3 APPROXIMATE
#> # … with 4 more variables: viewport <list>, place_id <chr>, types <list>,
#> # status <chr>
Or use hoist() to dive several levels deep and go straight to lat and lng:
loc %>%
hoist(json,
lat = list("results", 1, "geometry", "location", "lat"),
lng = list("results", 1, "geometry", "location", "lng")
)
#> # A tibble: 5 x 4
#> city lat lng json
#> <chr> <dbl> <dbl> <list>
#> 1 Houston 29.8 -95.4 <named list [2]>
#> 2 LA 34.1 -118. <named list [2]>
#> 3 New York 40.7 -74.0 <named list [2]>
#> 4 Chicago 41.9 -87.6 <named list [2]>
#> 5 Springfield 37.2 -93.3 <named list [2]>
Sharla Gelfand's discography
Finally, we will look at the most complex structure: Sharla Gelfand's discography. As in the examples above, we start by converting the list to a single-column data frame and then widen it so that each component becomes a separate column. I also convert the date_added column to a proper date-time in R.
discs <- tibble(disc = discog) %>%
unnest_wider(disc) %>%
mutate(date_added = as.POSIXct(strptime(date_added, "%Y-%m-%dT%H:%M:%S")))
discs
#> # A tibble: 155 x 5
#> instance_id date_added basic_information id rating
#> <int> <dttm> <list> <int> <int>
#> 1 354823933 2019-02-16 17:48:59 <named list [11]> 7496378 0
#> 2 354092601 2019-02-13 14:13:11 <named list [11]> 4490852 0
#> 3 354091476 2019-02-13 14:07:23 <named list [11]> 9827276 0
#> 4 351244906 2019-02-02 11:39:58 <named list [11]> 9769203 0
#> 5 351244801 2019-02-02 11:39:37 <named list [11]> 7237138 0
#> 6 351052065 2019-02-01 20:40:53 <named list [11]> 13117042 0
#> 7 350315345 2019-01-29 15:48:37 <named list [11]> 7113575 0
#> 8 350315103 2019-01-29 15:47:22 <named list [11]> 10540713 0
#> 9 350314507 2019-01-29 15:44:08 <named list [11]> 11260950 0
#> 10 350314047 2019-01-29 15:41:35 <named list [11]> 11726853 0
#> # … with 145 more rows
At this level, we see information about when each disc was added to Sharla's discography, but no data about the discs themselves. To get that, we need to widen the basic_information column:
discs %>% unnest_wider(basic_information)
#> Column name `id` must not be duplicated.
#> Use .name_repair to specify repair.
Unfortunately, we get an error, because basic_information contains an element named id, which clashes with the existing id column. When this error occurs, you can use names_repair = "unique" to see quickly what is going on.
The problem is that basic_information repeats the id column that is also stored at the top level, so we can simply drop it.
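Both options can be sketched on a small hypothetical stand-in for discs (the name discs_toy and its values are made up), in which basic_information repeats the top-level id:

```r
library(tidyr)
library(dplyr)

# Hypothetical stand-in for `discs` with a duplicated `id`.
discs_toy <- tibble(
  id = c(1L, 2L),
  basic_information = list(
    list(id = 1L, title = "Demo"),
    list(id = 2L, title = "I")
  )
)

# Option 1: let tidyr make the clashing names unique, which shows
# where the duplicate comes from.
repaired <- discs_toy %>%
  unnest_wider(basic_information, names_repair = "unique")
names(repaired)

# Option 2: drop the redundant top-level id before widening.
flat <- discs_toy %>%
  select(-id) %>%
  unnest_wider(basic_information)
names(flat)
```

The first option is mainly a diagnostic, since the repaired names carry positional suffixes; the second gives cleaner column names when the two id values are known to be identical.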
Alternatively, we could use hoist():
discs %>%
hoist(basic_information,
title = "title",
year = "year",
label = list("labels", 1, "name"),
artist = list("artists", 1, "name")
)
#> # A tibble: 155 x 9
#> instance_id date_added title year label artist
#> <int> <dttm> <chr> <int> <chr> <chr>
#> 1 354823933 2019-02-16 17:48:59 Demo 2015 Tobi… Mollot
#> 2 354092601 2019-02-13 14:13:11 Obse… 2013 La V… Una B…
#> 3 354091476 2019-02-13 14:07:23 I 2017 La V… S.H.I…
#> 4 351244906 2019-02-02 11:39:58 Oído… 2017 La V… Rata …
#> 5 351244801 2019-02-02 11:39:37 A Ca… 2015 Kato… Ivy (…
#> 6 351052065 2019-02-01 20:40:53 Tash… 2019 High… Tashme
#> 7 350315345 2019-01-29 15:48:37 Demo 2014 Mind… Desgr…
#> 8 350315103 2019-01-29 15:47:22 Let … 2015 Not … Phant…
#> 9 350314507 2019-01-29 15:44:08 Sub … 2017 Not … Sub S…
#> 10 350314047 2019-01-29 15:41:35 Demo 2017 Pres… Small…
#> # … with 145 more rows, and 3 more variables: basic_information <list>,
#> # id <int>, rating <int>
Here I quickly extract the name of the first label and the first artist by indexing deep into the nested list.
A more systematic approach is to create separate tables for artist and label.
You can then join them back to the original dataset as needed.
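A minimal sketch of the separate-table approach for artists is shown below; a label table would follow the same pattern. The data is a hypothetical stand-in for discs (discs_toy, with made-up ids and names), and disc_id is simply a name chosen here for the join key.

```r
library(tidyr)
library(dplyr)

# Hypothetical stand-in for `discs`: the second disc has two artists.
discs_toy <- tibble(
  id = c(1L, 2L),
  basic_information = list(
    list(title = "Demo",
         artists = list(list(id = 11L, name = "Artist A"))),
    list(title = "Split",
         artists = list(list(id = 11L, name = "Artist A"),
                        list(id = 12L, name = "Artist B")))
  )
)

# One row per (disc, artist): hoist the artists list, give each
# artist its own row, then spread the artist fields into columns.
artists <- discs_toy %>%
  hoist(basic_information, artist = "artists") %>%
  select(disc_id = id, artist) %>%
  unnest_longer(artist) %>%
  unnest_wider(artist)

# Join back to the disc-level data when needed.
disc_artists <- discs_toy %>%
  hoist(basic_information, title = "title") %>%
  select(disc_id = id, title) %>%
  left_join(artists, by = "disc_id")
```

Keeping artists in their own table avoids throwing away every artist after the first, which is what indexing with 1 does.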
Conclusion
The core of the tidyverse includes many useful packages united by a common philosophy of data processing.
In this article we looked at the unnest_*() family of functions, which is designed for extracting elements from nested lists. This package contains many other useful functions that make it easier to reshape data according to the Tidy Data concept.