Hi, Habr!
Last fall, Kaggle hosted a competition on classifying hand-drawn images, Quick Draw Doodle Recognition, in which, among others, a team of R scientists took part:
This time it didn't come to winning a medal, but a lot of valuable experience was gained, so I'd like to tell the community about a number of the most interesting and useful things on Kaggle and in everyday work. Among the topics discussed: the hard life without OpenCV, JSON parsing (these examples look at integrating C++ code into R scripts or packages using Rcpp), script parameterization and dockerization of the final solution. All the code from the post, in a form suitable for execution, is available in the repository.
Contents:
1. Loading data from CSV into a MonetDB database
2. Preparing batches
3. Iterators for fetching batches from the database
4. Choosing a model architecture
5. Script parameterization
6. Dockerization of the scripts
7. Using multiple GPUs on Google Cloud
8. Conclusion
1. Loading data from CSV into a MonetDB database
The data in this competition is provided not as ready-made images but as 340 CSV files (one file per class) containing JSON arrays with point coordinates. By connecting these points with lines, we obtain the final image measuring 256x256 pixels. Each record also carries a flag indicating whether the image was correctly recognized by the classifier in use when the dataset was collected, a two-letter code of the image author's country of residence, a unique identifier, a timestamp, and a class name that matches the file name. The simplified version of the source data weighs 7.4 GB in the archive and about 20 GB after unpacking; the full data after unpacking takes up an order of magnitude more. The organizers ensured that both versions reproduced the same drawings, which made the full version redundant. In any case, storing 50 million images as graphics files or as arrays was immediately deemed unprofitable, and we decided to merge all the CSV files from the train_simplified.zip archive into a database, generating images of the required size "on the fly" for each batch.
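For reference, the drawing field of a single record holds one array per stroke, with the x coordinates listed first and the y coordinates second; an illustrative (made-up) two-stroke example:

```json
[
  [[10, 40, 90], [15, 60, 80]],
  [[120, 130], [30, 200]]
]
```

Connecting (10,15) → (40,60) → (90,80) and then (120,30) → (130,200) with line segments reproduces the doodle.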
The well-proven MonetDB was chosen as the DBMS, namely its R implementation, the MonetDBLite package, which embeds the database server in the R session. Creating the database and connecting to it takes a single command:
con <- DBI::dbConnect(drv = MonetDBLite::MonetDBLite(), Sys.getenv("DBDIR"))
We will need to create two tables: one for all the data, and one for service information about the downloaded files (useful if something fails and the process has to be restarted after several files have already been loaded):
Creating the tables
if (!DBI::dbExistsTable(con, "doodles")) {
  DBI::dbCreateTable(
    con = con,
    name = "doodles",
    fields = c(
      "countrycode" = "char(2)",
      "drawing" = "text",
      "key_id" = "bigint",
      "recognized" = "bool",
      "timestamp" = "timestamp",
      "word" = "text"
    )
  )
}
if (!DBI::dbExistsTable(con, "upload_log")) {
  DBI::dbCreateTable(
    con = con,
    name = "upload_log",
    fields = c(
      "id" = "serial",
      "file_name" = "text UNIQUE",
      "uploaded" = "bool DEFAULT false"
    )
  )
}
The fastest way to load data into the database turned out to be copying the CSV files directly with the SQL command COPY OFFSET 2 INTO tablename FROM path USING DELIMITERS ',','\n','"' NULL AS '' BEST EFFORT, where tablename is the table name and path is the path to the file. While working with the archive, it turned out that the built-in unzip implementation in R does not handle a number of files from the archive correctly, so we used the system unzip (via the getOption("unzip") option).
Function for writing data to the database
#' @title Extract and load files
#'
#' @description
#' Extracts CSV files from a ZIP archive and loads them into a database
#'
#' @param con Database connection object (class `MonetDBEmbeddedConnection`).
#' @param tablename Name of the table in the database.
#' @param zipfile Path to the ZIP archive.
#' @param filename Name of the file inside the ZIP archive.
#' @param preprocess Preprocessing function applied to the extracted file.
#'   Must accept a single argument `data` (a `data.table` object).
#'
#' @return `TRUE`.
#'
upload_file <- function(con, tablename, zipfile, filename, preprocess = NULL) {
  # Argument checks
  checkmate::assert_class(con, "MonetDBEmbeddedConnection")
  checkmate::assert_string(tablename)
  checkmate::assert_string(filename)
  checkmate::assert_true(DBI::dbExistsTable(con, tablename))
  checkmate::assert_file_exists(zipfile, access = "r", extension = "zip")
  checkmate::assert_function(preprocess, args = c("data"), null.ok = TRUE)

  # File extraction
  path <- file.path(tempdir(), filename)
  unzip(zipfile, files = filename, exdir = tempdir(),
        junkpaths = TRUE, unzip = getOption("unzip"))
  on.exit(unlink(file.path(path)))

  # Apply the preprocessing function
  if (!is.null(preprocess)) {
    .data <- data.table::fread(file = path)
    .data <- preprocess(data = .data)
    data.table::fwrite(x = .data, file = path, append = FALSE)
    rm(.data)
  }

  # SQL query for importing the CSV
  sql <- sprintf(
    "COPY OFFSET 2 INTO %s FROM '%s' USING DELIMITERS ',','\\n','\"' NULL AS '' BEST EFFORT",
    tablename, path
  )
  # Execute the query
  DBI::dbExecute(con, sql)
  # Record the successful upload in the service table
  DBI::dbExecute(con, sprintf("INSERT INTO upload_log(file_name, uploaded) VALUES('%s', true)",
                              filename))
  return(invisible(TRUE))
}
If you need to transform the table before writing it to the database, it is enough to pass in the preprocess argument a function that will transform the data.
Code for sequentially loading the data into the database:
Writing the data to the database
# List of files to write
files <- unzip(zipfile, list = TRUE)$Name
# Skip list, in case some of the files have already been loaded
to_skip <- DBI::dbGetQuery(con, "SELECT file_name FROM upload_log")[[1L]]
files <- setdiff(files, to_skip)

if (length(files) > 0L) {
  # Start the timer
  tictoc::tic()
  # Progress bar
  pb <- txtProgressBar(min = 0L, max = length(files), style = 3)
  for (i in seq_along(files)) {
    upload_file(con = con, tablename = "doodles",
                zipfile = zipfile, filename = files[i])
    setTxtProgressBar(pb, i)
  }
  close(pb)
  # Stop the timer
  tictoc::toc()
}
# 526.141 sec elapsed - copying SSD -> SSD
# 558.879 sec elapsed - copying USB -> SSD
Data loading time can vary depending on the speed characteristics of the drive used. In our case, reading and writing within one SSD, or from a flash drive (the source file) to an SSD (the database), takes less than ten minutes.
It takes a few more seconds to create a column with an integer class label and an index column (ORDERED INDEX) with the row numbers by which observations will be sampled when forming batches:
Creating extra columns and an index
message("Generate labels")
invisible(DBI::dbExecute(con, "ALTER TABLE doodles ADD label_int int"))
invisible(DBI::dbExecute(con, "UPDATE doodles SET label_int = dense_rank() OVER (ORDER BY word) - 1"))
message("Generate row numbers")
invisible(DBI::dbExecute(con, "ALTER TABLE doodles ADD id serial"))
invisible(DBI::dbExecute(con, "CREATE ORDERED INDEX doodles_id_ord_idx ON doodles(id)"))
To solve the problem of forming batches on the fly, we needed to achieve the maximum speed of extracting random rows from the doodles table. For this we used three tricks. The first was to reduce the dimensionality of the type storing the observation ID. In the original dataset the ID requires the bigint type, but the number of observations makes it possible to fit their identifiers, equal to ordinal numbers, into the int type. Lookups are much faster in this case. The second trick was to use ORDERED INDEX — we arrived at this decision empirically, after trying all the available options. The third was to use parameterized queries via PREPARE, with subsequent reuse of the prepared statement when executing a batch of queries of the same type; that said, the gain compared to a plain SELECT turned out to be within the bounds of statistical error.
The data loading process consumes no more than 450 MB of RAM. That is, the described approach lets you move datasets weighing tens of gigabytes on almost any budget hardware, including some single-board devices, which is pretty cool.
It remains to measure the speed of fetching (random) data and to evaluate scaling when sampling batches of different sizes:
Database benchmark
library(ggplot2)

set.seed(0)
# Connect to the database
con <- DBI::dbConnect(MonetDBLite::MonetDBLite(), Sys.getenv("DBDIR"))

# Function that prepares the query on the server side
prep_sql <- function(batch_size) {
  sql <- sprintf("PREPARE SELECT id FROM doodles WHERE id IN (%s)",
                 paste(rep("?", batch_size), collapse = ","))
  res <- DBI::dbSendQuery(con, sql)
  return(res)
}

# Function that fetches the data
# (`n` is assumed to be defined earlier as the total number of rows in the table)
fetch_data <- function(rs, batch_size) {
  ids <- sample(seq_len(n), batch_size)
  res <- DBI::dbFetch(DBI::dbBind(rs, as.list(ids)))
  return(res)
}
# Running the benchmark
res_bench <- bench::press(
  batch_size = 2^(4:10),
  {
    rs <- prep_sql(batch_size)
    bench::mark(
      fetch_data(rs, batch_size),
      min_iterations = 50L
    )
  }
)
# Benchmark parameters
cols <- c("batch_size", "min", "median", "max", "itr/sec", "total_time", "n_itr")
res_bench[, cols]
# batch_size min median max `itr/sec` total_time n_itr
# <dbl> <bch:tm> <bch:tm> <bch:tm> <dbl> <bch:tm> <int>
# 1 16 23.6ms 54.02ms 93.43ms 18.8 2.6s 49
# 2 32 38ms 84.83ms 151.55ms 11.4 4.29s 49
# 3 64 63.3ms 175.54ms 248.94ms 5.85 8.54s 50
# 4 128 83.2ms 341.52ms 496.24ms 3.00 16.69s 50
# 5 256 232.8ms 653.21ms 847.44ms 1.58 31.66s 50
# 6 512 784.6ms 1.41s 1.98s 0.740 1.1m 49
# 7 1024 681.7ms 2.72s 4.06s 0.377 2.16m 49
ggplot(res_bench, aes(x = factor(batch_size), y = median, group = 1)) +
geom_point() +
geom_line() +
ylab("median time, s") +
theme_minimal()
DBI::dbDisconnect(con, shutdown = TRUE)
2. Preparing batches
The whole batch-preparation process consists of the following steps:
- Parsing several JSON strings containing vectors of point coordinates.
- Drawing multi-colored lines through the point coordinates on an image of the required size (for example, 256×256 or 128×128).
- Converting the resulting images into a tensor.
In the competition's Python kernels, the problem was solved mainly using OpenCV. One of the simplest and most obvious analogues in R would look like this:
JSON-to-tensor conversion in R
r_process_json_str <- function(json, line.width = 3,
                               color = TRUE, scale = 1) {
  # JSON parsing
  coords <- jsonlite::fromJSON(json, simplifyMatrix = FALSE)
  tmp <- tempfile()
  # Remove the temporary file when the function exits
  on.exit(unlink(tmp))
  png(filename = tmp, width = 256 * scale, height = 256 * scale, pointsize = 1)
  # Empty plot
  plot.new()
  # Plot window size
  plot.window(xlim = c(256 * scale, 0), ylim = c(256 * scale, 0))
  # Line colors
  cols <- if (color) rainbow(length(coords)) else "#000000"
  for (i in seq_along(coords)) {
    lines(x = coords[[i]][[1]] * scale, y = coords[[i]][[2]] * scale,
          col = cols[i], lwd = line.width)
  }
  dev.off()
  # Convert the image into a 3-dimensional array
  res <- png::readPNG(tmp)
  return(res)
}

r_process_json_vector <- function(x, ...) {
  res <- lapply(x, r_process_json_str, ...)
  # Combine the 3-dimensional image arrays into a 4-dimensional tensor
  res <- do.call(abind::abind, c(res, along = 0))
  return(res)
}
Drawing is performed with standard R tools and saved to a temporary PNG stored in RAM (on Linux, temporary R directories live in /tmp, which is mounted in RAM). This file is then read as a three-dimensional array with numbers ranging from 0 to 1. This matters, because a more conventional BMP would be read into a raw array with hex color codes.
Let's test the result:
zip_file <- file.path("data", "train_simplified.zip")
csv_file <- "cat.csv"
unzip(zip_file, files = csv_file, exdir = tempdir(),
junkpaths = TRUE, unzip = getOption("unzip"))
tmp_data <- data.table::fread(file.path(tempdir(), csv_file), sep = ",",
select = "drawing", nrows = 10000)
arr <- r_process_json_str(tmp_data[4, drawing])
dim(arr)
# [1] 256 256 3
plot(magick::image_read(arr))
The batch itself will be formed as follows:
res <- r_process_json_vector(tmp_data[1:4, drawing], scale = 0.5)
str(res)
# num [1:4, 1:128, 1:128, 1:3] 1 1 1 1 1 1 1 1 1 1 ...
# - attr(*, "dimnames")=List of 4
# ..$ : NULL
# ..$ : NULL
# ..$ : NULL
# ..$ : NULL
This implementation seemed suboptimal to us, since forming large batches takes indecently long, and we decided to make use of our colleagues' experience by employing a powerful library, OpenCV. At the time there was no ready-made R package for it (there is none now), so a minimal implementation of the required functionality was written in C++ and integrated into the R code using Rcpp.
To solve the problem, the following packages and libraries were used:

- OpenCV for working with images and drawing lines. We used the pre-installed system libraries and header files, along with dynamic linking.

- xtensor for working with multidimensional arrays and tensors. We used the header files included in the R package of the same name. The library lets you work with multidimensional arrays in both row-major and column-major order.

- ndjson for parsing JSON. This library is used in xtensor automatically if it is present in the project.

- RcppThread for organizing multi-threaded processing of the vector of JSON strings. We used the header files provided by this package. It differs from the more popular RcppParallel in that, among other things, it has a built-in loop-interruption mechanism.
It should be noted that xtensor turned out to be a real godsend: besides having rich functionality and high performance, its developers proved very responsive, answering questions promptly and in detail. With their help we implemented the conversion of OpenCV matrices into xtensor tensors, as well as a way to combine 3-dimensional image tensors into a 4-dimensional tensor of the correct dimensionality (the batch itself).
Materials for learning Rcpp, xtensor and RcppThread
To compile files that use system header files and dynamic linking with libraries installed on the system, we used the plugin mechanism implemented in the Rcpp package. To determine the paths and flags automatically, we used the popular Linux utility pkg-config.
Implementing an Rcpp plugin for using the OpenCV library
Rcpp::registerPlugin("opencv", function() {
  # Possible package names
  pkg_config_name <- c("opencv", "opencv4")
  # Binary of the pkg-config utility
  pkg_config_bin <- Sys.which("pkg-config")
  # Check that the utility is present on the system
  checkmate::assert_file_exists(pkg_config_bin, access = "x")
  # Check that an OpenCV config for pkg-config is available
  check <- sapply(pkg_config_name,
                  function(pkg) system(paste(pkg_config_bin, pkg)))
  if (all(check != 0)) {
    stop("OpenCV config for the pkg-config not found", call. = FALSE)
  }
  pkg_config_name <- pkg_config_name[check == 0]
  list(env = list(
    PKG_CXXFLAGS = system(paste(pkg_config_bin, "--cflags", pkg_config_name),
                          intern = TRUE),
    PKG_LIBS = system(paste(pkg_config_bin, "--libs", pkg_config_name),
                      intern = TRUE)
  ))
})
As a result of the plugin's work, the following values are substituted during compilation:
Rcpp:::.plugins$opencv()$env
# $PKG_CXXFLAGS
# [1] "-I/usr/include/opencv"
#
# $PKG_LIBS
# [1] "-lopencv_shape -lopencv_stitching -lopencv_superres -lopencv_videostab -lopencv_aruco -lopencv_bgsegm -lopencv_bioinspired -lopencv_ccalib -lopencv_datasets -lopencv_dpm -lopencv_face -lopencv_freetype -lopencv_fuzzy -lopencv_hdf -lopencv_line_descriptor -lopencv_optflow -lopencv_video -lopencv_plot -lopencv_reg -lopencv_saliency -lopencv_stereo -lopencv_structured_light -lopencv_phase_unwrapping -lopencv_rgbd -lopencv_viz -lopencv_surface_matching -lopencv_text -lopencv_ximgproc -lopencv_calib3d -lopencv_features2d -lopencv_flann -lopencv_xobjdetect -lopencv_objdetect -lopencv_ml -lopencv_xphoto -lopencv_highgui -lopencv_videoio -lopencv_imgcodecs -lopencv_photo -lopencv_imgproc -lopencv_core"
The implementation code for parsing JSON and forming a batch to pass to the model is given under the spoiler. First, add a local project directory to the header-file search path (needed for ndjson):
Sys.setenv("PKG_CXXFLAGS" = paste0("-I", normalizePath(file.path("src"))))
JSON-to-tensor conversion implemented in C++
// [[Rcpp::plugins(cpp14)]]
// [[Rcpp::plugins(opencv)]]
// [[Rcpp::depends(xtensor)]]
// [[Rcpp::depends(RcppThread)]]

#include <xtensor/xjson.hpp>
#include <xtensor/xadapt.hpp>
#include <xtensor/xview.hpp>
#include <xtensor-r/rtensor.hpp>
#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include <Rcpp.h>
#include <RcppThread.h>

// Type aliases
using RcppThread::parallelFor;
using json = nlohmann::json;
using points = xt::xtensor<double,2>;     // Point coordinates extracted from JSON
using strokes = std::vector<points>;      // Drawing extracted from JSON (a set of strokes)
using xtensor3d = xt::xtensor<double, 3>; // Tensor for storing an image matrix
using xtensor4d = xt::xtensor<double, 4>; // Tensor for storing a set of images
using rtensor3d = xt::rtensor<double, 3>; // Wrapper for exporting to R
using rtensor4d = xt::rtensor<double, 4>; // Wrapper for exporting to R

// Static constants
// Image size in pixels
const static int SIZE = 256;
// Line type
// See https://en.wikipedia.org/wiki/Pixel_connectivity#2-dimensional
const static int LINE_TYPE = cv::LINE_4;
// Line width in pixels
const static int LINE_WIDTH = 3;
// Resize algorithm
// https://docs.opencv.org/3.1.0/da/d54/group__imgproc__transform.html#ga5bb5a1fea74ea38e1a5445ca803ff121
const static int RESIZE_TYPE = cv::INTER_LINEAR;

// Template for converting an OpenCV matrix into a tensor
template <typename T, int NCH, typename XT=xt::xtensor<T,3,xt::layout_type::column_major>>
XT to_xt(const cv::Mat_<cv::Vec<T, NCH>>& src) {
  // Shape of the target tensor
  std::vector<int> shape = {src.rows, src.cols, NCH};
  // Total number of elements in the array
  size_t size = src.total() * NCH;
  // Convert cv::Mat to xt::xtensor
  XT res = xt::adapt((T*) src.data, size, xt::no_ownership(), shape);
  return res;
}

// Convert JSON into a list of point coordinates
strokes parse_json(const std::string& x) {
  auto j = json::parse(x);
  // The parsing result must be an array
  if (!j.is_array()) {
    throw std::runtime_error("'x' must be JSON array.");
  }
  strokes res;
  res.reserve(j.size());
  for (const auto& a: j) {
    // Each array element must be a 2-dimensional array
    if (!a.is_array() || a.size() != 2) {
      throw std::runtime_error("'x' must include only 2d arrays.");
    }
    // Extract the vector of points
    auto p = a.get<points>();
    res.push_back(p);
  }
  return res;
}

// Line drawing
// Colors are in HSV
cv::Mat ocv_draw_lines(const strokes& x, bool color = true) {
  // Source matrix type
  auto stype = color ? CV_8UC3 : CV_8UC1;
  // Target matrix type
  auto dtype = color ? CV_32FC3 : CV_32FC1;
  auto bg = color ? cv::Scalar(0, 0, 255) : cv::Scalar(255);
  auto col = color ? cv::Scalar(0, 255, 220) : cv::Scalar(0);
  cv::Mat img = cv::Mat(SIZE, SIZE, stype, bg);
  // Number of strokes
  size_t n = x.size();
  for (const auto& s: x) {
    // Number of points in the stroke
    size_t n_points = s.shape()[1];
    for (size_t i = 0; i < n_points - 1; ++i) {
      // Starting point of the segment
      cv::Point from(s(0, i), s(1, i));
      // Ending point of the segment
      cv::Point to(s(0, i + 1), s(1, i + 1));
      // Draw the line
      cv::line(img, from, to, col, LINE_WIDTH, LINE_TYPE);
    }
    if (color) {
      // Change the line color (step the hue)
      col[0] += 180 / n;
    }
  }
  if (color) {
    // Convert the color representation to RGB
    cv::cvtColor(img, img, cv::COLOR_HSV2RGB);
  }
  // Convert the representation to float32 with the [0, 1] range
  img.convertTo(img, dtype, 1 / 255.0);
  return img;
}

// Process a JSON string and get a tensor with image data
xtensor3d process(const std::string& x, double scale = 1.0, bool color = true) {
  auto p = parse_json(x);
  auto img = ocv_draw_lines(p, color);
  if (scale != 1) {
    cv::Mat out;
    cv::resize(img, out, cv::Size(), scale, scale, RESIZE_TYPE);
    cv::swap(img, out);
    out.release();
  }
  xtensor3d arr = color ? to_xt<double,3>(img) : to_xt<double,1>(img);
  return arr;
}

// [[Rcpp::export]]
rtensor3d cpp_process_json_str(const std::string& x,
                               double scale = 1.0,
                               bool color = true) {
  xtensor3d res = process(x, scale, color);
  return res;
}

// [[Rcpp::export]]
rtensor4d cpp_process_json_vector(const std::vector<std::string>& x,
                                  double scale = 1.0,
                                  bool color = false) {
  size_t n = x.size();
  size_t dim = floor(SIZE * scale);
  size_t channels = color ? 3 : 1;
  xtensor4d res({n, dim, dim, channels});
  parallelFor(0, n, [&x, &res, scale, color](int i) {
    xtensor3d tmp = process(x[i], scale, color);
    auto view = xt::view(res, i, xt::all(), xt::all(), xt::all());
    view = tmp;
  });
  return res;
}
This code should be placed in the file src/cv_xt.cpp and compiled with the command Rcpp::sourceCpp(file = "src/cv_xt.cpp", env = .GlobalEnv); nlohmann/json.hpp from its repository is also required for it to work. The code is divided into several functions:

- to_xt — a templated function for converting an image matrix (cv::Mat) into the tensor xt::xtensor;

- parse_json — parses a JSON string, extracts the point coordinates and packs them into a vector;

- ocv_draw_lines — draws multi-colored lines from the resulting vector of points;

- process — combines the functions above and also adds the ability to scale the resulting image;

- cpp_process_json_str — a wrapper over the process function that exports the result into an R object (a multidimensional array);

- cpp_process_json_vector — a wrapper over cpp_process_json_str that allows processing a vector of strings in multi-threaded mode.
To draw multi-colored lines, the HSV color model was used, followed by conversion to RGB. Let's test the result:
arr <- cpp_process_json_str(tmp_data[4, drawing])
dim(arr)
# [1] 256 256 3
plot(magick::image_read(arr))
Comparing the speed of the R and C++ implementations
res_bench <- bench::mark(
  r_process_json_str(tmp_data[4, drawing], scale = 0.5),
  cpp_process_json_str(tmp_data[4, drawing], scale = 0.5),
  check = FALSE,
  min_iterations = 100
)
# Benchmark parameters
cols <- c("expression", "min", "median", "max", "itr/sec", "total_time", "n_itr")
res_bench[, cols]
# expression min median max `itr/sec` total_time n_itr
# <chr> <bch:tm> <bch:tm> <bch:tm> <dbl> <bch:tm> <int>
# 1 r_process_json_str 3.49ms 3.55ms 4.47ms 273. 490ms 134
# 2 cpp_process_json_str 1.94ms 2.02ms 5.32ms 489. 497ms 243
library(ggplot2)
# Running the benchmark
res_bench <- bench::press(
  batch_size = 2^(4:10),
  {
    .data <- tmp_data[sample(seq_len(.N), batch_size), drawing]
    bench::mark(
      r_process_json_vector(.data, scale = 0.5),
      cpp_process_json_vector(.data, scale = 0.5),
      min_iterations = 50,
      check = FALSE
    )
  }
)
res_bench[, cols]
# expression batch_size min median max `itr/sec` total_time n_itr
# <chr> <dbl> <bch:tm> <bch:tm> <bch:tm> <dbl> <bch:tm> <int>
# 1 r 16 50.61ms 53.34ms 54.82ms 19.1 471.13ms 9
# 2 cpp 16 4.46ms 5.39ms 7.78ms 192. 474.09ms 91
# 3 r 32 105.7ms 109.74ms 212.26ms 7.69 6.5s 50
# 4 cpp 32 7.76ms 10.97ms 15.23ms 95.6 522.78ms 50
# 5 r 64 211.41ms 226.18ms 332.65ms 3.85 12.99s 50
# 6 cpp 64 25.09ms 27.34ms 32.04ms 36.0 1.39s 50
# 7 r 128 534.5ms 627.92ms 659.08ms 1.61 31.03s 50
# 8 cpp 128 56.37ms 58.46ms 66.03ms 16.9 2.95s 50
# 9 r 256 1.15s 1.18s 1.29s 0.851 58.78s 50
# 10 cpp 256 114.97ms 117.39ms 130.09ms 8.45 5.92s 50
# 11 r 512 2.09s 2.15s 2.32s 0.463 1.8m 50
# 12 cpp 512 230.81ms 235.6ms 261.99ms 4.18 11.97s 50
# 13 r 1024 4s 4.22s 4.4s 0.238 3.5m 50
# 14 cpp 1024 410.48ms 431.43ms 462.44ms 2.33 21.45s 50
ggplot(res_bench, aes(x = factor(batch_size), y = median,
group = expression, color = expression)) +
geom_point() +
geom_line() +
ylab("median time, s") +
theme_minimal() +
scale_color_discrete(name = "", labels = c("cpp", "r")) +
theme(legend.position = "bottom")
As you can see, the speedup turned out to be very significant, and it is not possible to catch up with C++ code by parallelizing the R code.
3. Iterators for fetching batches from the database
R has a well-deserved reputation for processing data that fits in RAM, while Python is more characterized by iterative data processing, which makes it easy and natural to implement out-of-core computations (computations using external memory). A classic example relevant to the described problem is deep neural networks trained by gradient descent, where the gradient is approximated at each step from a small portion of the observations, a mini-batch.
Deep learning frameworks written in Python have special classes implementing iterators over data: tables, images in folders, binary formats, and so on. You can use the ready-made options or write your own for specific tasks. In R we can take advantage of all the features of the Python library keras with its various backends via the package of the same name, which in turn works on top of the reticulate package. The latter deserves a separate long article; it not only lets you run Python code from R, but also passes objects between R and Python sessions, automatically performing all the necessary type conversions.
We got rid of the need to keep all the data in RAM by using MonetDBLite, and all the "neural network" work will be done by the original code in Python; we only have to write an iterator over the data, since nothing ready-made exists for such a situation in either R or Python. There are essentially only two requirements for it: it must return batches in an endless loop and save its state between iterations (the latter is implemented in R in the simplest way via closures). Previously, it was required to explicitly convert R arrays into numpy arrays inside the iterator, but the current version of the keras package does this itself.
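The two requirements just mentioned (an endless loop plus state preserved between calls) can be sketched outside R as well; here is a minimal, purely illustrative C++ analogue of such a closure-based generator (not part of the original solution):

```cpp
#include <functional>
#include <vector>

// Build a generator that returns batches in an endless loop and keeps
// its position between calls: the counter lives in the captured state,
// just like a variable captured by an R closure.
std::function<std::vector<int>()> make_generator(std::vector<int> ids,
                                                 std::size_t batch_size) {
    std::size_t pos = 0;  // state preserved between calls
    return [ids, batch_size, pos]() mutable {
        std::vector<int> batch;
        batch.reserve(batch_size);
        for (std::size_t k = 0; k < batch_size; ++k) {
            batch.push_back(ids[pos]);
            // Wrap around: the generator never runs out of batches
            pos = (pos + 1) % ids.size();
        }
        return batch;
    };
}
```

Each call to the returned object advances `pos` exactly as the R closure below advances its counter via `<<-`, wrapping around at the end of an "epoch".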
The iterator for the training and validation data turned out as follows:
Iterator for the training and validation data
train_generator <- function(db_connection = con,
                            samples_index,
                            num_classes = 340,
                            batch_size = 32,
                            scale = 1,
                            color = FALSE,
                            imagenet_preproc = FALSE) {
  # Argument checks
  checkmate::assert_class(con, "DBIConnection")
  checkmate::assert_integerish(samples_index)
  checkmate::assert_count(num_classes)
  checkmate::assert_count(batch_size)
  checkmate::assert_number(scale, lower = 0.001, upper = 5)
  checkmate::assert_flag(color)
  checkmate::assert_flag(imagenet_preproc)

  # Shuffle, so that used batch indexes can be taken in order
  dt <- data.table::data.table(id = sample(samples_index))
  # Assign batch numbers
  dt[, batch := (.I - 1L) %/% batch_size + 1L]
  # Keep only complete batches and set the index
  dt <- dt[, if (.N == batch_size) .SD, keyby = batch]
  # Set the counter
  i <- 1
  # Number of batches
  max_i <- dt[, max(batch)]

  # Prepare the statement for fetching data
  sql <- sprintf(
    "PREPARE SELECT drawing, label_int FROM doodles WHERE id IN (%s)",
    paste(rep("?", batch_size), collapse = ",")
  )
  res <- DBI::dbSendQuery(con, sql)

  # Analogue of keras::to_categorical
  to_categorical <- function(x, num) {
    n <- length(x)
    m <- numeric(n * num)
    m[x * n + seq_len(n)] <- 1
    dim(m) <- c(n, num)
    return(m)
  }

  # Closure
  function() {
    # Start a new epoch
    if (i > max_i) {
      dt[, id := sample(id)]
      data.table::setkey(dt, batch)
      # Reset the counter
      i <<- 1
      max_i <<- dt[, max(batch)]
    }
    # IDs for fetching the data
    batch_ind <- dt[batch == i, id]
    # Fetch the data
    batch <- DBI::dbFetch(DBI::dbBind(res, as.list(batch_ind)), n = -1)
    # Increment the counter
    i <<- i + 1
    # JSON parsing and array preparation
    batch_x <- cpp_process_json_vector(batch$drawing, scale = scale, color = color)
    if (imagenet_preproc) {
      # Rescale from the [0, 1] interval to [-1, 1]
      batch_x <- (batch_x - 0.5) * 2
    }
    batch_y <- to_categorical(batch$label_int, num_classes)
    result <- list(batch_x, batch_y)
    return(result)
  }
}
The function takes as input a variable with a database connection, the numbers of the rows to use, the number of classes, the batch size, the scale (scale = 1 corresponds to rendering images of 256x256 pixels, scale = 0.5 — 128x128 pixels), a color flag (with color = FALSE rendering is in grayscale, while with color = TRUE each stroke is drawn in a new color) and a preprocessing flag for networks pre-trained on imagenet. The latter is needed to rescale pixel values from the [0, 1] interval to [-1, 1], which was used when training the supplied keras models.
The outer function contains the argument type checks, a data.table table with randomly shuffled row numbers from samples_index and batch numbers, a counter and the maximum number of batches, as well as the SQL expression for fetching data from the database. In addition, we defined a fast analogue of the keras::to_categorical() function inside. We used almost all of the data for training, leaving half a percent for validation, so the epoch size was limited by the steps_per_epoch parameter when calling keras::fit_generator(), and the condition if (i > max_i) only triggered for the validation iterator.
In the inner function, the row indexes for the next batch are retrieved, records are fetched from the database (with the batch counter incremented), the JSON is parsed (by the cpp_process_json_vector() function written in C++) and arrays corresponding to the images are created. Then one-hot vectors with class labels are created, and the arrays with pixel values and the labels are combined into a list, which becomes the return value. To speed up the work, we used the creation of indexes in data.table tables and modification by reference — without these data.table features it is hard to imagine working effectively with any significant amount of data in R.
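The column-major indexing trick behind the fast to_categorical() analogue is easy to check in isolation; a small illustrative sketch (0-based indexing here, not the author's code):

```cpp
#include <vector>

// Column-major one-hot encoding: observation i (0-based) with integer label
// x[i] sets the element at row i, column x[i] of an n x num matrix stored
// column-by-column, i.e. the single linear index x[i] * n + i. This is the
// same trick as the R analogue of keras::to_categorical, shifted to
// 1-based indexing there.
std::vector<double> to_categorical(const std::vector<int>& x, int num) {
    int n = static_cast<int>(x.size());
    std::vector<double> m(static_cast<std::size_t>(n) * num, 0.0);
    for (int i = 0; i < n; ++i) {
        m[static_cast<std::size_t>(x[i]) * n + i] = 1.0;  // row i, column x[i]
    }
    return m;
}
```

One assignment per observation fills the whole matrix, which is why the R version needs no loops at all.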
The results of speed measurements on a Core i5 laptop are as follows:
Iterator benchmark
library(Rcpp)
library(keras)
library(ggplot2)
source("utils/rcpp.R")
source("utils/keras_iterator.R")
con <- DBI::dbConnect(drv = MonetDBLite::MonetDBLite(), Sys.getenv("DBDIR"))
ind <- seq_len(DBI::dbGetQuery(con, "SELECT count(*) FROM doodles")[[1L]])
num_classes <- DBI::dbGetQuery(con, "SELECT max(label_int) + 1 FROM doodles")[[1L]]
# ΠΠ½Π΄Π΅ΠΊΡΡ Π΄Π»Ρ ΠΎΠ±ΡΡΠ°ΡΡΠ΅ΠΉ Π²ΡΠ±ΠΎΡΠΊΠΈ
train_ind <- sample(ind, floor(length(ind) * 0.995))
# ΠΠ½Π΄Π΅ΠΊΡΡ Π΄Π»Ρ ΠΏΡΠΎΠ²Π΅ΡΠΎΡΠ½ΠΎΠΉ Π²ΡΠ±ΠΎΡΠΊΠΈ
val_ind <- ind[-train_ind]
rm(ind)
# ΠΠΎΡΡΡΠΈΡΠΈΠ΅Π½Ρ ΠΌΠ°ΡΡΡΠ°Π±Π°
scale <- 0.5
# Run the benchmark
res_bench <- bench::press(
batch_size = 2^(4:10),
{
it1 <- train_generator(
db_connection = con,
samples_index = train_ind,
num_classes = num_classes,
batch_size = batch_size,
scale = scale
)
bench::mark(
it1(),
min_iterations = 50L
)
}
)
# Benchmark parameters
cols <- c("batch_size", "min", "median", "max", "itr/sec", "total_time", "n_itr")
res_bench[, cols]
# batch_size min median max `itr/sec` total_time n_itr
# <dbl> <bch:tm> <bch:tm> <bch:tm> <dbl> <bch:tm> <int>
# 1 16 25ms 64.36ms 92.2ms 15.9 3.09s 49
# 2 32 48.4ms 118.13ms 197.24ms 8.17 5.88s 48
# 3 64 69.3ms 117.93ms 181.14ms 8.57 5.83s 50
# 4 128 157.2ms 240.74ms 503.87ms 3.85 12.71s 49
# 5 256 359.3ms 613.52ms 988.73ms 1.54 30.5s 47
# 6 512 884.7ms 1.53s 2.07s 0.674 1.11m 45
# 7 1024 2.7s 3.83s 5.47s 0.261 2.81m 44
ggplot(res_bench, aes(x = factor(batch_size), y = median, group = 1)) +
geom_point() +
geom_line() +
ylab("median time, s") +
theme_minimal()
DBI::dbDisconnect(con, shutdown = TRUE)
If you have a sufficient amount of RAM, you can seriously speed up database operation by moving it into that same RAM (32 GB is enough for our task). On Linux, a partition is mounted by default at /dev/shm, occupying up to half of the RAM. You can allocate more by editing /etc/fstab to get a record like tmpfs /dev/shm tmpfs defaults,size=25g 0 0. Be sure to reboot and check the result by running the command df -h.
The iterator for the test data looks much simpler, since the test dataset fits entirely in RAM:
Test data iterator
test_generator <- function(dt,
batch_size = 32,
scale = 1,
color = FALSE,
imagenet_preproc = FALSE) {
# Argument checks
checkmate::assert_data_table(dt)
checkmate::assert_count(batch_size)
checkmate::assert_number(scale, lower = 0.001, upper = 5)
checkmate::assert_flag(color)
checkmate::assert_flag(imagenet_preproc)
# Assign batch numbers
dt[, batch := (.I - 1L) %/% batch_size + 1L]
data.table::setkey(dt, batch)
i <- 1
max_i <- dt[, max(batch)]
# Closure
function() {
batch_x <- cpp_process_json_vector(dt[batch == i, drawing],
scale = scale, color = color)
if (imagenet_preproc) {
# Rescale from the interval [0, 1] to the interval [-1, 1]
batch_x <- (batch_x - 0.5) * 2
}
result <- list(batch_x)
i <<- i + 1
return(result)
}
}
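The closure pattern used by both generators can be shown without the database and C++ parts. A minimal sketch (all names are illustrative, with a plain vector standing in for the drawings):

```r
# Minimal sketch of the closure-based iterator pattern used above:
# `i` lives in the enclosing environment and is advanced with `<<-`
# on every call, so each call returns the next batch.
make_batch_iterator <- function(x, batch_size = 2L) {
  i <- 1L
  max_i <- ceiling(length(x) / batch_size)
  function() {
    if (i > max_i) stop("no more batches")
    idx <- seq.int((i - 1L) * batch_size + 1L,
                   min(i * batch_size, length(x)))
    i <<- i + 1L
    x[idx]
  }
}

it <- make_batch_iterator(1:5, batch_size = 2L)
it()  # 1 2
it()  # 3 4
it()  # 5
```

keras::fit_generator() drives exactly this kind of function: it simply calls the iterator once per training step.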
4. Choosing the model architecture
The first architecture used was mobilenet v1. Its keras version in R only works with tensors of shape (batch, height, width, 3), that is, the number of channels cannot be changed. There is no such limitation in Python, so we rushed ahead and wrote our own implementation of this architecture, following the original article (without the dropout present in the keras version):
Mobilenet v1 architecture
library(keras)
top_3_categorical_accuracy <- custom_metric(
name = "top_3_categorical_accuracy",
metric_fn = function(y_true, y_pred) {
metric_top_k_categorical_accuracy(y_true, y_pred, k = 3)
}
)
layer_sep_conv_bn <- function(object,
filters,
alpha = 1,
depth_multiplier = 1,
strides = c(2, 2)) {
# NB! depth_multiplier != resolution multiplier
# https://github.com/keras-team/keras/issues/10349
layer_depthwise_conv_2d(
object = object,
kernel_size = c(3, 3),
strides = strides,
padding = "same",
depth_multiplier = depth_multiplier
) %>%
layer_batch_normalization() %>%
layer_activation_relu() %>%
layer_conv_2d(
filters = filters * alpha,
kernel_size = c(1, 1),
strides = c(1, 1)
) %>%
layer_batch_normalization() %>%
layer_activation_relu()
}
get_mobilenet_v1 <- function(input_shape = c(224, 224, 1),
num_classes = 340,
alpha = 1,
depth_multiplier = 1,
optimizer = optimizer_adam(lr = 0.002),
loss = "categorical_crossentropy",
metrics = c("categorical_crossentropy",
top_3_categorical_accuracy)) {
inputs <- layer_input(shape = input_shape)
outputs <- inputs %>%
layer_conv_2d(filters = 32, kernel_size = c(3, 3), strides = c(2, 2), padding = "same") %>%
layer_batch_normalization() %>%
layer_activation_relu() %>%
layer_sep_conv_bn(filters = 64, strides = c(1, 1)) %>%
layer_sep_conv_bn(filters = 128, strides = c(2, 2)) %>%
layer_sep_conv_bn(filters = 128, strides = c(1, 1)) %>%
layer_sep_conv_bn(filters = 256, strides = c(2, 2)) %>%
layer_sep_conv_bn(filters = 256, strides = c(1, 1)) %>%
layer_sep_conv_bn(filters = 512, strides = c(2, 2)) %>%
layer_sep_conv_bn(filters = 512, strides = c(1, 1)) %>%
layer_sep_conv_bn(filters = 512, strides = c(1, 1)) %>%
layer_sep_conv_bn(filters = 512, strides = c(1, 1)) %>%
layer_sep_conv_bn(filters = 512, strides = c(1, 1)) %>%
layer_sep_conv_bn(filters = 512, strides = c(1, 1)) %>%
layer_sep_conv_bn(filters = 1024, strides = c(2, 2)) %>%
layer_sep_conv_bn(filters = 1024, strides = c(1, 1)) %>%
layer_global_average_pooling_2d() %>%
layer_dense(units = num_classes) %>%
layer_activation_softmax()
model <- keras_model(
inputs = inputs,
outputs = outputs
)
model %>% compile(
optimizer = optimizer,
loss = loss,
metrics = metrics
)
return(model)
}
The drawbacks of this approach are obvious. I want to test many models, but I certainly do not want to rewrite each architecture by hand. We were also deprived of the opportunity to use the weights of models pretrained on imagenet. As usual, studying the documentation helped. The function get_config() lets you obtain a description of the model in a form suitable for editing (base_model_conf$layers is a regular R list), and the function from_config() performs the reverse conversion into a model object:
base_model_conf <- get_config(base_model)
base_model_conf$layers[[1]]$config$batch_input_shape[[4]] <- 1L
base_model <- from_config(base_model_conf)
Now it is not hard to write a universal function for obtaining any of the supplied keras models, with or without weights trained on imagenet:
Function for loading ready-made architectures
get_model <- function(name = "mobilenet_v2",
input_shape = NULL,
weights = "imagenet",
pooling = "avg",
num_classes = NULL,
optimizer = keras::optimizer_adam(lr = 0.002),
loss = "categorical_crossentropy",
metrics = NULL,
color = TRUE,
compile = FALSE) {
# Argument checks
checkmate::assert_string(name)
checkmate::assert_integerish(input_shape, lower = 1, upper = 256, len = 3)
checkmate::assert_count(num_classes)
checkmate::assert_flag(color)
checkmate::assert_flag(compile)
# Get the object from the keras package
model_fun <- get0(paste0("application_", name), envir = asNamespace("keras"))
# Check that the object exists in the package
if (is.null(model_fun)) {
stop("Model ", shQuote(name), " not found.", call. = FALSE)
}
base_model <- model_fun(
input_shape = input_shape,
include_top = FALSE,
weights = weights,
pooling = pooling
)
# If the image is not colour, change the input dimensionality
if (!color) {
base_model_conf <- keras::get_config(base_model)
base_model_conf$layers[[1]]$config$batch_input_shape[[4]] <- 1L
base_model <- keras::from_config(base_model_conf)
}
predictions <- keras::get_layer(base_model, "global_average_pooling2d_1")$output
predictions <- keras::layer_dense(predictions, units = num_classes, activation = "softmax")
model <- keras::keras_model(
inputs = base_model$input,
outputs = predictions
)
if (compile) {
keras::compile(
object = model,
optimizer = optimizer,
loss = loss,
metrics = metrics
)
}
return(model)
}
When using single-channel images, no pretrained weights are used. This could be fixed: using the get_weights() function, obtain the model weights as a list of R arrays, change the dimension of the first element of this list (by taking one colour channel or averaging all three), and then load the weights back into the model with the set_weights() function. We never added this functionality, because by that point it was already clear that it was more productive to work with colour images.
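The channel-averaging idea just described (which the authors did not implement) can be sketched on a plain array; in the real case, get_weights() and set_weights() would supply and receive such arrays:

```r
# Sketch of averaging the input-channel dimension of a first-layer
# convolution kernel (shape: height x width x in_channels x filters).
# A random array stands in for real weights from get_weights().
set.seed(42)
w_rgb <- array(rnorm(3 * 3 * 3 * 8), dim = c(3, 3, 3, 8))

# Average over the input-channel dimension (margin 3):
w_gray <- apply(w_rgb, c(1, 2, 4), mean)
dim(w_gray) <- c(3, 3, 1, 8)  # restore the 4-d kernel shape

dim(w_gray)  # 3 3 1 8
```

The resulting single-channel kernel could then be written back with set_weights() after the input shape has been changed as shown above.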
We ran most of our experiments using mobilenet versions 1 and 2, as well as resnet34. More modern architectures such as SE-ResNeXt performed well in this competition. Unfortunately, we did not have ready-made implementations at our disposal, and we did not write our own (but we certainly will).
5. Script parameterization
For convenience, all the code for starting training was designed as a single script, parameterized using docopt as follows:
doc <- '
Usage:
train_nn.R --help
train_nn.R --list-models
train_nn.R [options]
Options:
-h --help Show this message.
-l --list-models List available models.
-m --model=<model> Neural network model name [default: mobilenet_v2].
-b --batch-size=<size> Batch size [default: 32].
-s --scale-factor=<ratio> Scale factor [default: 0.5].
-c --color Use color lines [default: FALSE].
-d --db-dir=<path> Path to database directory [default: Sys.getenv("db_dir")].
-r --validate-ratio=<ratio> Validate sample ratio [default: 0.995].
-n --n-gpu=<number> Number of GPUs [default: 1].
'
args <- docopt::docopt(doc)
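One thing worth remembering when consuming the result: option values arrive as character strings and must be converted explicitly before use. Below is a hypothetical args list, shaped like what docopt::docopt() returns for `train_nn.R -m resnet50 -b 64`:

```r
# Hypothetical parse result; docopt returns option values as strings
# and flags as logicals.
args <- list(
  model          = "resnet50",
  `batch-size`   = "64",
  `scale-factor` = "0.5",
  color          = FALSE
)

batch_size <- as.integer(args[["batch-size"]])
scale      <- as.numeric(args[["scale-factor"]])

batch_size  # 64
scale       # 0.5
```

Forgetting this conversion is a common source of keras errors, since a character "64" is not a valid batch size.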
The docopt package makes it possible to run commands of the form Rscript bin/train_nn.R -m resnet50 -c -d /home/andrey/doodle_db or ./bin/train_nn.R -m resnet50 -c -d /home/andrey/doodle_db if the file train_nn.R is executable (this command will start training the resnet50 model on three-colour images measuring 128x128 pixels; the database must be located in the folder /home/andrey/doodle_db). You can add the learning rate, the optimizer type, and any other customizable parameters to the list. While preparing this publication, an issue came to light with using the mobilenet_v2 architecture from the current version of keras in R.
This approach made it possible to significantly speed up experiments with different models compared with the more traditional launching of scripts in RStudio (we note that there is a package offering a possible alternative).
6. Dockerization of scripts
We used Docker to ensure portability of the model-training environment between team members and for rapid deployment in the cloud. You can begin getting acquainted with this tool, unusual as it is for an R programmer, with an introductory series of posts or a video course.
Docker lets you both build your own images from scratch and use other images as a base for building your own. After analysing the available options, we came to the conclusion that installing the NVIDIA and CUDA+cuDNN drivers and the Python libraries is a fairly voluminous part of the image, and we decided to take the official image tensorflow/tensorflow:1.12.0-gpu as the base, adding the necessary R packages on top.
The final Dockerfile looked like this:
dockerfile
FROM tensorflow/tensorflow:1.12.0-gpu
MAINTAINER Artem Klevtsov <[email protected]>
SHELL ["/bin/bash", "-c"]
ARG LOCALE="en_US.UTF-8"
ARG APT_PKG="libopencv-dev r-base r-base-dev littler"
ARG R_BIN_PKG="futile.logger checkmate data.table rcpp rapidjsonr dbi keras jsonlite curl digest remotes"
ARG R_SRC_PKG="xtensor RcppThread docopt MonetDBLite"
ARG PY_PIP_PKG="keras"
ARG DIRS="/db /app /app/data /app/models /app/logs"
RUN source /etc/os-release && \
    echo "deb https://cloud.r-project.org/bin/linux/ubuntu ${UBUNTU_CODENAME}-cran35/" > /etc/apt/sources.list.d/cran35.list && \
    apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E084DAB9 && \
    add-apt-repository -y ppa:marutter/c2d4u3.5 && \
    add-apt-repository -y ppa:timsc/opencv-3.4 && \
    apt-get update && \
    apt-get install -y locales && \
    locale-gen ${LOCALE} && \
    apt-get install -y --no-install-recommends ${APT_PKG} && \
    ln -s /usr/lib/R/site-library/littler/examples/install.r /usr/local/bin/install.r && \
    ln -s /usr/lib/R/site-library/littler/examples/install2.r /usr/local/bin/install2.r && \
    ln -s /usr/lib/R/site-library/littler/examples/installGithub.r /usr/local/bin/installGithub.r && \
    echo 'options(Ncpus = parallel::detectCores())' >> /etc/R/Rprofile.site && \
    echo 'options(repos = c(CRAN = "https://cloud.r-project.org"))' >> /etc/R/Rprofile.site && \
    apt-get install -y $(printf "r-cran-%s " ${R_BIN_PKG}) && \
    install.r ${R_SRC_PKG} && \
    pip install ${PY_PIP_PKG} && \
    mkdir -p ${DIRS} && \
    chmod 777 ${DIRS} && \
    rm -rf /tmp/downloaded_packages/ /tmp/*.rds && \
    rm -rf /var/lib/apt/lists/*
COPY utils /app/utils
COPY src /app/src
COPY tests /app/tests
COPY bin/*.R /app/
ENV DBDIR="/db"
ENV CUDA_HOME="/usr/local/cuda"
ENV PATH="/app:${PATH}"
WORKDIR /app
VOLUME /db
VOLUME /app
CMD bash
For convenience, the lists of packages used were placed in variables; the bulk of the written scripts is copied into the container during assembly. We also changed the command shell to /bin/bash for ease of using the contents of /etc/os-release. This avoided having to hard-code the OS version.
In addition, a small bash script was written that allows you to launch a container with various commands. For example, these could be scripts for training neural networks previously placed inside the container, or a command shell for debugging and monitoring the operation of the container:
Container launch script
#!/bin/sh
DBDIR=${PWD}/db
LOGSDIR=${PWD}/logs
MODELDIR=${PWD}/models
DATADIR=${PWD}/data
ARGS="--runtime=nvidia --rm -v ${DBDIR}:/db -v ${LOGSDIR}:/app/logs -v ${MODELDIR}:/app/models -v ${DATADIR}:/app/data"
if [ -z "$1" ]; then
CMD="Rscript /app/train_nn.R"
elif [ "$1" = "bash" ]; then
ARGS="${ARGS} -ti"
else
CMD="Rscript /app/train_nn.R $@"
fi
docker run ${ARGS} doodles-tf ${CMD}
If this bash script is run without parameters, the script train_nn.R is called inside the container with default values; if the first positional argument is "bash", the container starts interactively with a command shell. In all other cases, the values of the positional arguments are substituted: CMD="Rscript /app/train_nn.R $@".
It is worth noting that the directories with the data and the database, as well as the directory for saving trained models, are mounted inside the container from the host system, which lets you access the results of the scripts without unnecessary manipulations.
7. Using multiple GPUs on Google Cloud
One of the features of the competition was very noisy data (see the title picture, borrowed from @Leigh.plt on the ODS slack). Large batches help to combat this, and after experiments on a PC with 1 GPU, we decided to master training models on several GPUs in the cloud. We used Google Cloud.
The most interesting code fragment concerns using multiple GPUs. First, the model is created on the CPU using a context manager, just as in Python:
with(tensorflow::tf$device("/cpu:0"), {
model_cpu <- get_model(
name = model_name,
input_shape = input_shape,
weights = weights,
metrics = c(top_3_categorical_accuracy),
compile = FALSE
)
})
Then the uncompiled (this is important) model is copied onto the given number of available GPUs, and only after that is it compiled:
model <- keras::multi_gpu_model(model_cpu, gpus = n_gpu)
keras::compile(
object = model,
optimizer = keras::optimizer_adam(lr = 0.0004),
loss = "categorical_crossentropy",
metrics = c(top_3_categorical_accuracy)
)
The classic approach of freezing all layers except the last, training the last layer, then unfreezing and retraining the entire model could not be implemented on several GPUs.
Training was monitored without using tensorboard; we limited ourselves to recording logs and saving models with informative names after each epoch:
Callbacks
# Log file name template
log_file_tmpl <- file.path("logs", sprintf(
"%s_%d_%dch_%s.csv",
model_name,
dim_size,
channels,
format(Sys.time(), "%Y%m%d%H%M%OS")
))
# Model file name template
model_file_tmpl <- file.path("models", sprintf(
"%s_%d_%dch_{epoch:02d}_{val_loss:.2f}.h5",
model_name,
dim_size,
channels
))
callbacks_list <- list(
keras::callback_csv_logger(
filename = log_file_tmpl
),
keras::callback_early_stopping(
monitor = "val_loss",
min_delta = 1e-4,
patience = 8,
verbose = 1,
mode = "min"
),
keras::callback_reduce_lr_on_plateau(
monitor = "val_loss",
factor = 0.5, # halve the lr
patience = 4,
verbose = 1,
min_delta = 1e-4,
mode = "min"
),
keras::callback_model_checkpoint(
filepath = model_file_tmpl,
monitor = "val_loss",
save_best_only = FALSE,
save_weights_only = FALSE,
mode = "min"
)
)
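A side note on the templates above: only the sprintf() placeholders are expanded in R; the {epoch:02d} and {val_loss:.2f} parts are left verbatim for the keras checkpoint callback to fill in at save time. A quick check with hypothetical values:

```r
# %s/%d are filled by sprintf(); {epoch:02d} and {val_loss:.2f} stay
# literal and are expanded later by callback_model_checkpoint().
# The values below are hypothetical.
model_name <- "mobilenet_v2"
dim_size   <- 128L
channels   <- 1L

model_file_tmpl <- file.path("models", sprintf(
  "%s_%d_%dch_{epoch:02d}_{val_loss:.2f}.h5",
  model_name, dim_size, channels
))

model_file_tmpl
# "models/mobilenet_v2_128_1ch_{epoch:02d}_{val_loss:.2f}.h5"
```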
8. Instead of a conclusion
A number of problems that we encountered have not yet been overcome:
- keras has no ready-made function for automatically searching for the optimal learning rate (an analogue of lr_finder in the fast.ai library); with some effort, third-party implementations can be ported to R, for example this one;
- as a consequence of the previous point, it was not possible to select the correct training speed when using several GPUs;
- there is a shortage of modern neural network architectures, especially those pretrained on imagenet;
- there is no one cycle policy and no discriminative learning rates (cosine annealing was implemented at our request, thanks skydan).
What useful things were learned from this competition:
- On relatively low-powered hardware, you can work with decent (many times the size of RAM) volumes of data without pain. The data.table package saves memory thanks to in-place modification of tables, which avoids copying them, and when used correctly its capabilities almost always demonstrate the highest speed among all the tools known to us for scripting languages. Saving data in a database lets you, in many cases, not think at all about the need to squeeze the entire dataset into RAM.
- Slow functions in R can be replaced with fast ones in C++ using the Rcpp package. If, in addition, you use RcppThread or RcppParallel, you get cross-platform multi-threaded implementations, so there is no need to parallelize the code at the R level.
- The Rcpp package can be used without serious knowledge of C++; the required minimum is outlined here. Header files for a number of cool C++ libraries such as xtensor are available on CRAN, that is, an infrastructure is forming for implementing projects with ready-made high-performance C++ code in R. An additional convenience is syntax highlighting and a static C++ code analyzer in RStudio.
- docopt lets you run self-contained scripts with parameters. This is convenient for use on a remote server, including under docker. In RStudio, it is inconvenient to conduct many-hour experiments training neural networks, and installing the IDE on the server itself is not always justified.
- Docker ensures code portability and reproducibility of results between developers with different versions of the OS and libraries, as well as ease of running on servers. You can launch the entire training pipeline with just one command.
- Google Cloud is a budget-friendly way to experiment on expensive hardware, but you need to choose configurations carefully.
- Measuring the speed of individual code fragments is very useful, especially when combining R and C++, and with the bench package it is also very easy.
Overall this experience was very rewarding, and we continue to work on resolving some of the issues raised.
Source: www.habr.com