Hi, Habr!
Last autumn Kaggle hosted Quick Draw Doodle Recognition, a competition on classifying hand-drawn pictures, and a team of R users took part among others. We will not describe the competition in detail; a recent publication already does that. No medals were won this time, but we gained a lot of valuable experience, and we would like to tell the community about the things that proved most interesting and useful on Kaggle and in everyday work. The topics include the hard life without OpenCV, JSON parsing (these examples show how to integrate C++ code into R scripts or packages with Rcpp), script parameterization, and dockerization of the final solution. All code from this post is available in runnable form in the repository.
Contents:
1. Efficiently loading data from CSV into MonetDB
2. Preparing batches
3. Iterators for unloading batches from the database
4. Choosing a model architecture
5. Script parameterization
6. Dockerization of scripts
7. Using multiple GPUs on Google Cloud
8. Instead of a conclusion
1. Efficiently loading data from CSV into the MonetDB database
The data in this competition is provided not as ready-made images but as 340 CSV files (one file per class) containing JSONs with point coordinates. Connecting these points with lines gives the final image of 256x256 pixels. Each record also carries a flag showing whether the picture was correctly recognized by the classifier in use when the dataset was collected, the two-letter country code of the author, a unique identifier, a timestamp, and a class name that matches the file name. The simplified version of the original data weighs 7.4 GB in the archive and about 20 GB after unpacking; the full data takes 240 GB after unpacking. The organizers guaranteed that both versions reproduce the same drawings, so the full version is redundant. In any case, storing 50 million images as graphic files or as arrays was immediately judged impractical, and we decided to merge all CSV files from the train_simplified.zip archive into a database and then generate images of the required size "on the fly" for each batch.
The well-proven MonetDB was chosen as the DBMS, namely its implementation for R in the form of the MonetDBLite package. The package includes an embedded version of the database server and lets you start the server directly from an R session and work with it there. Creating the database and connecting to it are done with a single command:
con <- DBI::dbConnect(drv = MonetDBLite::MonetDBLite(), Sys.getenv("DBDIR"))
We need to create two tables: one for all the data and one for service information about the uploaded files (useful if something goes wrong and the process has to be resumed after several files have already been loaded):
Creating tables
if (!DBI::dbExistsTable(con, "doodles")) {
DBI::dbCreateTable(
con = con,
name = "doodles",
fields = c(
"countrycode" = "char(2)",
"drawing" = "text",
"key_id" = "bigint",
"recognized" = "bool",
"timestamp" = "timestamp",
"word" = "text"
)
)
}
if (!DBI::dbExistsTable(con, "upload_log")) {
DBI::dbCreateTable(
con = con,
name = "upload_log",
fields = c(
"id" = "serial",
"file_name" = "text UNIQUE",
"uploaded" = "bool DEFAULT false"
)
)
}
The fastest way to load the data into the database turned out to be copying the CSV files directly with SQL, using the command COPY OFFSET 2 INTO tablename FROM path USING DELIMITERS ',','\n','"' NULL AS '' BEST EFFORT, where tablename is the table name and path is the path to the file. While working with the archive we found that the built-in unzip implementation in R handles a number of files from the archive incorrectly, so we used the system unzip (via the getOption("unzip") parameter).
Function for writing to the database
#' @title Extract and upload files
#'
#' @description
#' Extracts CSV files from a ZIP archive and loads them into the database.
#'
#' @param con Database connection object (class `MonetDBEmbeddedConnection`).
#' @param tablename Name of the table in the database.
#' @param zipfile Path to the ZIP archive.
#' @param filename Name of the file inside the ZIP archive.
#' @param preprocess Preprocessing function applied to the extracted file.
#'   Must take a single argument `data` (a `data.table` object).
#'
#' @return `TRUE`.
#'
upload_file <- function(con, tablename, zipfile, filename, preprocess = NULL) {
# Check the arguments
checkmate::assert_class(con, "MonetDBEmbeddedConnection")
checkmate::assert_string(tablename)
checkmate::assert_string(filename)
checkmate::assert_true(DBI::dbExistsTable(con, tablename))
checkmate::assert_file_exists(zipfile, access = "r", extension = "zip")
checkmate::assert_function(preprocess, args = c("data"), null.ok = TRUE)
# Extract the file
path <- file.path(tempdir(), filename)
unzip(zipfile, files = filename, exdir = tempdir(),
junkpaths = TRUE, unzip = getOption("unzip"))
on.exit(unlink(file.path(path)))
# Apply the preprocessing function
if (!is.null(preprocess)) {
.data <- data.table::fread(file = path)
.data <- preprocess(data = .data)
data.table::fwrite(x = .data, file = path, append = FALSE)
rm(.data)
}
# SQL query that imports the CSV into the database
# (the newline and double-quote delimiters are escaped inside the R string)
sql <- sprintf(
"COPY OFFSET 2 INTO %s FROM '%s' USING DELIMITERS ',','\\n','\"' NULL AS '' BEST EFFORT",
tablename, path
)
# Execute the query
DBI::dbExecute(con, sql)
# Record the successful upload in the service table
DBI::dbExecute(con, sprintf("INSERT INTO upload_log(file_name, uploaded) VALUES('%s', true)",
filename))
return(invisible(TRUE))
}
If the table needs to be transformed before it is written to the database, pass a function that transforms the data in the preprocess argument.
Code for loading the data into the database sequentially:
Writing data to the database
# List of files to write
files <- unzip(zipfile, list = TRUE)$Name
# Files to skip if some of them have already been uploaded
to_skip <- DBI::dbGetQuery(con, "SELECT file_name FROM upload_log")[[1L]]
files <- setdiff(files, to_skip)
if (length(files) > 0L) {
# Start the timer
tictoc::tic()
# Progress bar
pb <- txtProgressBar(min = 0L, max = length(files), style = 3)
for (i in seq_along(files)) {
upload_file(con = con, tablename = "doodles",
zipfile = zipfile, filename = files[i])
setTxtProgressBar(pb, i)
}
close(pb)
# Stop the timer
tictoc::toc()
}
# 526.141 sec elapsed - copying SSD->SSD
# 558.879 sec elapsed - copying USB->SSD
Loading time may vary with the speed of the drive used. In our case, reading and writing within a single SSD, or from a flash drive (source file) to an SSD (database), takes less than 10 minutes.
A few more seconds are needed to create a column with an integer class label and an index column (ORDERED INDEX) with row numbers, which is used to select observations when batches are created:
Creating additional columns and an index
message("Generate labels")
invisible(DBI::dbExecute(con, "ALTER TABLE doodles ADD label_int int"))
invisible(DBI::dbExecute(con, "UPDATE doodles SET label_int = dense_rank() OVER (ORDER BY word) - 1"))
message("Generate row numbers")
invisible(DBI::dbExecute(con, "ALTER TABLE doodles ADD id serial"))
invisible(DBI::dbExecute(con, "CREATE ORDERED INDEX doodles_id_ord_idx ON doodles(id)"))
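As a rough illustration of what the label_int column holds, the same encoding can be sketched in base R. The class names below are made up for the example; only the idea (a 0-based dense rank of the alphabetically ordered class name) comes from the SQL above.

```r
# Hypothetical class names; the real table has 340 distinct values in `word`
word <- c("cat", "airplane", "cat", "zebra", "airplane")

# Equivalent of dense_rank() OVER (ORDER BY word) - 1:
# every distinct word gets its 0-based position in the sorted list of classes
label_int <- match(word, sort(unique(word))) - 1L
label_int
# [1] 1 0 1 2 0
```

Zero-based labels are convenient later on, because they can be used directly as column offsets when building one-hot vectors.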
To form batches on the fly, we needed to extract random rows from the doodles table as fast as possible. We used 3 tricks for this. The first was to reduce the size of the type that stores the observation ID. In the original dataset the ID requires the bigint type, but the number of observations lets us fit their identifiers, equal to the ordinal row number, into the int type. Lookups are much faster this way. The second trick was to use ORDERED INDEX; we arrived at this decision empirically, after trying all the available options. The third was to use parameterized queries. The idea is to execute the PREPARE command once and then reuse the prepared expression for a whole series of queries of the same type, but in practice the gain over a simple SELECT turned out to be within statistical error.
The data loading process consumes no more than 450 MB of RAM. In other words, the described approach lets you handle datasets weighing tens of gigabytes on almost any budget hardware, including some single-board computers, which is pretty cool.
All that remains is to measure the speed of extracting (random) data and to evaluate how it scales when batches of different sizes are sampled:
Database benchmark
library(ggplot2)
set.seed(0)
# Connect to the database
con <- DBI::dbConnect(MonetDBLite::MonetDBLite(), Sys.getenv("DBDIR"))
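# Total number of rows; fetch_data() below samples ids from seq_len(n)
n <- DBI::dbGetQuery(con, "SELECT count(*) FROM doodles")[[1L]]
# Function that prepares the query on the server side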
prep_sql <- function(batch_size) {
sql <- sprintf("PREPARE SELECT id FROM doodles WHERE id IN (%s)",
paste(rep("?", batch_size), collapse = ","))
res <- DBI::dbSendQuery(con, sql)
return(res)
}
# Function that fetches the data
fetch_data <- function(rs, batch_size) {
ids <- sample(seq_len(n), batch_size)
res <- DBI::dbFetch(DBI::dbBind(rs, as.list(ids)))
return(res)
}
# Run the benchmark
res_bench <- bench::press(
batch_size = 2^(4:10),
{
rs <- prep_sql(batch_size)
bench::mark(
fetch_data(rs, batch_size),
min_iterations = 50L
)
}
)
# Benchmark parameters
cols <- c("batch_size", "min", "median", "max", "itr/sec", "total_time", "n_itr")
res_bench[, cols]
# batch_size min median max `itr/sec` total_time n_itr
# <dbl> <bch:tm> <bch:tm> <bch:tm> <dbl> <bch:tm> <int>
# 1 16 23.6ms 54.02ms 93.43ms 18.8 2.6s 49
# 2 32 38ms 84.83ms 151.55ms 11.4 4.29s 49
# 3 64 63.3ms 175.54ms 248.94ms 5.85 8.54s 50
# 4 128 83.2ms 341.52ms 496.24ms 3.00 16.69s 50
# 5 256 232.8ms 653.21ms 847.44ms 1.58 31.66s 50
# 6 512 784.6ms 1.41s 1.98s 0.740 1.1m 49
# 7 1024 681.7ms 2.72s 4.06s 0.377 2.16m 49
ggplot(res_bench, aes(x = factor(batch_size), y = median, group = 1)) +
geom_point() +
geom_line() +
ylab("median time, s") +
theme_minimal()
DBI::dbDisconnect(con, shutdown = TRUE)
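One way to read the table above is to convert the median times into rows fetched per second. The sketch below only re-uses the medians printed in the benchmark output (converted to seconds); it does not touch the database.

```r
# Batch sizes and median fetch times (seconds) copied from the output above
batch_size <- 2^(4:10)
median_s <- c(0.05402, 0.08483, 0.17554, 0.34152, 0.65321, 1.41, 2.72)

# Rows fetched per second for each batch size
rows_per_sec <- batch_size / median_s
round(rows_per_sec)
```

On these numbers the throughput stays in the range of roughly 300 to 400 rows per second for every batch size, so fetch time grows about linearly with the batch size.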
2. Preparing batches
The whole batch preparation process consists of the following steps:
- Parsing several JSONs containing vectors of strings with point coordinates.
- Drawing colored lines from the point coordinates on an image of the required size (for example, 256×256 or 128×128).
- Converting the resulting images into a tensor.
Among the Python kernels in the competition, this task was mostly solved with OpenCV. One of the simplest and most obvious analogues in R looks like this:
Implementing the JSON-to-tensor conversion in R
r_process_json_str <- function(json, line.width = 3,
color = TRUE, scale = 1) {
# Parse the JSON
coords <- jsonlite::fromJSON(json, simplifyMatrix = FALSE)
tmp <- tempfile()
# Remove the temporary file when the function exits
on.exit(unlink(tmp))
png(filename = tmp, width = 256 * scale, height = 256 * scale, pointsize = 1)
# Empty plot
plot.new()
# Size of the plot window
plot.window(xlim = c(256 * scale, 0), ylim = c(256 * scale, 0))
# Line colors
cols <- if (color) rainbow(length(coords)) else "#000000"
for (i in seq_along(coords)) {
lines(x = coords[[i]][[1]] * scale, y = coords[[i]][[2]] * scale,
col = cols[i], lwd = line.width)
}
dev.off()
# Convert the image into a 3-dimensional array
res <- png::readPNG(tmp)
return(res)
}
r_process_json_vector <- function(x, ...) {
res <- lapply(x, r_process_json_str, ...)
# Combine the 3-dimensional image arrays into a 4-dimensional tensor
res <- do.call(abind::abind, c(res, along = 0))
return(res)
}
Drawing is done with standard R tools and saved to a temporary PNG stored in RAM (on Linux, R temporary directories live in /tmp, which is mounted in RAM). The file is then read as a three-dimensional array with numbers in the range from 0 to 1. This matters because the more conventional BMP would be read into a raw array with hex color codes.
Let's test the result:
zip_file <- file.path("data", "train_simplified.zip")
csv_file <- "cat.csv"
unzip(zip_file, files = csv_file, exdir = tempdir(),
junkpaths = TRUE, unzip = getOption("unzip"))
tmp_data <- data.table::fread(file.path(tempdir(), csv_file), sep = ",",
select = "drawing", nrows = 10000)
arr <- r_process_json_str(tmp_data[4, drawing])
dim(arr)
# [1] 256 256 3
plot(magick::image_read(arr))
The batch itself is formed as follows:
res <- r_process_json_vector(tmp_data[1:4, drawing], scale = 0.5)
str(res)
# num [1:4, 1:128, 1:128, 1:3] 1 1 1 1 1 1 1 1 1 1 ...
# - attr(*, "dimnames")=List of 4
# ..$ : NULL
# ..$ : NULL
# ..$ : NULL
# ..$ : NULL
This implementation seemed suboptimal to us, since forming large batches takes an indecently long time, so we decided to draw on our colleagues' experience and use the powerful OpenCV library. At that time there was no ready-made R package for it (and there is none now), so a minimal implementation of the required functionality was written in C++ and integrated into the R code with Rcpp.
The following packages and libraries were used to solve the problem:
- OpenCV for working with images and drawing lines. We used the preinstalled system libraries and header files, as well as dynamic linking.
- xtensor for working with multidimensional arrays and tensors. We used the header files included in the R package of the same name. The library allows working with multidimensional arrays in both row-major and column-major order.
- ndjson for parsing JSON. This library is used in xtensor automatically if it is present in the project.
- RcppThread for organizing multi-threaded processing of a vector of JSON strings. We used the header files provided by this package. It differs from the more popular RcppParallel in, among other things, its built-in loop interrupt mechanism.
It is worth noting that xtensor turned out to be a godsend: besides its extensive functionality and high performance, its developers proved quite responsive and answered our questions promptly and in detail. With their help we implemented the conversion of OpenCV matrices into xtensor tensors, as well as a way to combine 3-dimensional image tensors into a 4-dimensional tensor of the correct shape (the batch itself).
Materials for learning Rcpp, xtensor and RcppThread
To compile files that use system headers and dynamic linking with libraries installed on the system, we used the plugin mechanism implemented in the Rcpp package. To find paths and flags automatically, we used the popular Linux utility pkg-config.
Implementing an Rcpp plugin for using the OpenCV library
Rcpp::registerPlugin("opencv", function() {
# Possible names of the package
pkg_config_name <- c("opencv", "opencv4")
# Binary of the pkg-config utility
pkg_config_bin <- Sys.which("pkg-config")
# Check that the utility is available on the system
checkmate::assert_file_exists(pkg_config_bin, access = "x")
# Check that an OpenCV configuration file for pkg-config exists
check <- sapply(pkg_config_name,
function(pkg) system(paste(pkg_config_bin, pkg)))
if (all(check != 0)) {
stop("OpenCV config for the pkg-config not found", call. = FALSE)
}
pkg_config_name <- pkg_config_name[check == 0]
list(env = list(
PKG_CXXFLAGS = system(paste(pkg_config_bin, "--cflags", pkg_config_name),
intern = TRUE),
PKG_LIBS = system(paste(pkg_config_bin, "--libs", pkg_config_name),
intern = TRUE)
))
})
As a result of the plugin's work, the following values are substituted during compilation:
Rcpp:::.plugins$opencv()$env
# $PKG_CXXFLAGS
# [1] "-I/usr/include/opencv"
#
# $PKG_LIBS
# [1] "-lopencv_shape -lopencv_stitching -lopencv_superres -lopencv_videostab -lopencv_aruco -lopencv_bgsegm -lopencv_bioinspired -lopencv_ccalib -lopencv_datasets -lopencv_dpm -lopencv_face -lopencv_freetype -lopencv_fuzzy -lopencv_hdf -lopencv_line_descriptor -lopencv_optflow -lopencv_video -lopencv_plot -lopencv_reg -lopencv_saliency -lopencv_stereo -lopencv_structured_light -lopencv_phase_unwrapping -lopencv_rgbd -lopencv_viz -lopencv_surface_matching -lopencv_text -lopencv_ximgproc -lopencv_calib3d -lopencv_features2d -lopencv_flann -lopencv_xobjdetect -lopencv_objdetect -lopencv_ml -lopencv_xphoto -lopencv_highgui -lopencv_videoio -lopencv_imgcodecs -lopencv_photo -lopencv_imgproc -lopencv_core"
The code that parses JSON and forms a batch for passing to the model is given below. First, add the local project directory to the header search path (needed for ndjson):
Sys.setenv("PKG_CXXFLAGS" = paste0("-I", normalizePath(file.path("src"))))
Implementing the JSON-to-tensor conversion in C++
// [[Rcpp::plugins(cpp14)]]
// [[Rcpp::plugins(opencv)]]
// [[Rcpp::depends(xtensor)]]
// [[Rcpp::depends(RcppThread)]]
#include <xtensor/xjson.hpp>
#include <xtensor/xadapt.hpp>
#include <xtensor/xview.hpp>
#include <xtensor-r/rtensor.hpp>
#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include <Rcpp.h>
#include <RcppThread.h>
// Type aliases
using RcppThread::parallelFor;
using json = nlohmann::json;
using points = xt::xtensor<double,2>; // Point coordinates of one stroke extracted from JSON
using strokes = std::vector<points>; // All strokes of one drawing extracted from JSON
using xtensor3d = xt::xtensor<double, 3>; // Tensor holding the matrix of one image
using xtensor4d = xt::xtensor<double, 4>; // Tensor holding a set of images
using rtensor3d = xt::rtensor<double, 3>; // Wrapper for export to R
using rtensor4d = xt::rtensor<double, 4>; // Wrapper for export to R
// Static constants
// Image size in pixels
const static int SIZE = 256;
// Line type
// See https://en.wikipedia.org/wiki/Pixel_connectivity#2-dimensional
const static int LINE_TYPE = cv::LINE_4;
// Line width in pixels
const static int LINE_WIDTH = 3;
// Resize algorithm
// https://docs.opencv.org/3.1.0/da/d54/group__imgproc__transform.html#ga5bb5a1fea74ea38e1a5445ca803ff121
const static int RESIZE_TYPE = cv::INTER_LINEAR;
// Template for converting an OpenCV matrix into a tensor
template <typename T, int NCH, typename XT=xt::xtensor<T,3,xt::layout_type::column_major>>
XT to_xt(const cv::Mat_<cv::Vec<T, NCH>>& src) {
// Shape of the target tensor
std::vector<int> shape = {src.rows, src.cols, NCH};
// Total number of elements in the array
size_t size = src.total() * NCH;
// Convert cv::Mat to xt::xtensor
XT res = xt::adapt((T*) src.data, size, xt::no_ownership(), shape);
return res;
}
// Convert JSON into a list of point coordinates
strokes parse_json(const std::string& x) {
auto j = json::parse(x);
// The parsing result must be an array
if (!j.is_array()) {
throw std::runtime_error("'x' must be JSON array.");
}
strokes res;
res.reserve(j.size());
for (const auto& a: j) {
// Every element of the array must be a 2-dimensional array
if (!a.is_array() || a.size() != 2) {
throw std::runtime_error("'x' must include only 2d arrays.");
}
// Extract the vector of points
auto p = a.get<points>();
res.push_back(p);
}
return res;
}
// Draw the lines
// Colors are in HSV
cv::Mat ocv_draw_lines(const strokes& x, bool color = true) {
// Source matrix type
auto stype = color ? CV_8UC3 : CV_8UC1;
// Resulting matrix type
auto dtype = color ? CV_32FC3 : CV_32FC1;
auto bg = color ? cv::Scalar(0, 0, 255) : cv::Scalar(255);
auto col = color ? cv::Scalar(0, 255, 220) : cv::Scalar(0);
cv::Mat img = cv::Mat(SIZE, SIZE, stype, bg);
// Number of strokes
size_t n = x.size();
for (const auto& s: x) {
// Number of points in the stroke
size_t n_points = s.shape()[1];
// `i + 1 < n_points` avoids unsigned underflow when a stroke has no points
for (size_t i = 0; i + 1 < n_points; ++i) {
// Start point of the segment
cv::Point from(s(0, i), s(1, i));
// End point of the segment
cv::Point to(s(0, i + 1), s(1, i + 1));
// Draw the line segment
cv::line(img, from, to, col, LINE_WIDTH, LINE_TYPE);
}
if (color) {
// Change the line color (shift the hue)
col[0] += 180 / n;
}
}
if (color) {
// Convert the color representation to RGB
cv::cvtColor(img, img, cv::COLOR_HSV2RGB);
}
// Convert the representation to float32 in the range [0, 1]
img.convertTo(img, dtype, 1 / 255.0);
return img;
}
// Process the JSON and obtain a tensor with the image data
xtensor3d process(const std::string& x, double scale = 1.0, bool color = true) {
auto p = parse_json(x);
auto img = ocv_draw_lines(p, color);
if (scale != 1) {
cv::Mat out;
cv::resize(img, out, cv::Size(), scale, scale, RESIZE_TYPE);
cv::swap(img, out);
out.release();
}
xtensor3d arr = color ? to_xt<double,3>(img) : to_xt<double,1>(img);
return arr;
}
// [[Rcpp::export]]
rtensor3d cpp_process_json_str(const std::string& x,
double scale = 1.0,
bool color = true) {
xtensor3d res = process(x, scale, color);
return res;
}
// [[Rcpp::export]]
rtensor4d cpp_process_json_vector(const std::vector<std::string>& x,
double scale = 1.0,
bool color = false) {
size_t n = x.size();
size_t dim = floor(SIZE * scale);
size_t channels = color ? 3 : 1;
xtensor4d res({n, dim, dim, channels});
parallelFor(0, n, [&x, &res, scale, color](int i) {
xtensor3d tmp = process(x[i], scale, color);
auto view = xt::view(res, i, xt::all(), xt::all(), xt::all());
view = tmp;
});
return res;
}
This code should be placed in the file src/cv_xt.cpp and compiled with the command Rcpp::sourceCpp(file = "src/cv_xt.cpp", env = .GlobalEnv); it also requires nlohmann/json.hpp from the repository. The code is split into several functions:
- to_xt: a template function for converting an image matrix (cv::Mat) into an xt::xtensor tensor;
- parse_json: parses a JSON string and extracts the point coordinates, packing them into a vector;
- ocv_draw_lines: draws multicolored lines from the resulting vector of points;
- process: combines the functions above and adds the ability to scale the resulting image;
- cpp_process_json_str: a wrapper around process that exports the result to an R object (a multidimensional array);
- cpp_process_json_vector: the vectorized counterpart of cpp_process_json_str, which calls process for every element of a character vector in multi-threaded mode.
To draw multicolored lines, the HSV color model was used, followed by conversion to RGB. Let's test the result:
arr <- cpp_process_json_str(tmp_data[4, drawing])
dim(arr)
# [1] 256 256 3
plot(magick::image_read(arr))
Comparing the speed of the R and C++ implementations
res_bench <- bench::mark(
r_process_json_str(tmp_data[4, drawing], scale = 0.5),
cpp_process_json_str(tmp_data[4, drawing], scale = 0.5),
check = FALSE,
min_iterations = 100
)
# Benchmark parameters
cols <- c("expression", "min", "median", "max", "itr/sec", "total_time", "n_itr")
res_bench[, cols]
# expression min median max `itr/sec` total_time n_itr
# <chr> <bch:tm> <bch:tm> <bch:tm> <dbl> <bch:tm> <int>
# 1 r_process_json_str 3.49ms 3.55ms 4.47ms 273. 490ms 134
# 2 cpp_process_json_str 1.94ms 2.02ms 5.32ms 489. 497ms 243
library(ggplot2)
# Run the benchmark
res_bench <- bench::press(
batch_size = 2^(4:10),
{
.data <- tmp_data[sample(seq_len(.N), batch_size), drawing]
bench::mark(
r_process_json_vector(.data, scale = 0.5),
cpp_process_json_vector(.data, scale = 0.5),
min_iterations = 50,
check = FALSE
)
}
)
res_bench[, cols]
# expression batch_size min median max `itr/sec` total_time n_itr
# <chr> <dbl> <bch:tm> <bch:tm> <bch:tm> <dbl> <bch:tm> <int>
# 1 r 16 50.61ms 53.34ms 54.82ms 19.1 471.13ms 9
# 2 cpp 16 4.46ms 5.39ms 7.78ms 192. 474.09ms 91
# 3 r 32 105.7ms 109.74ms 212.26ms 7.69 6.5s 50
# 4 cpp 32 7.76ms 10.97ms 15.23ms 95.6 522.78ms 50
# 5 r 64 211.41ms 226.18ms 332.65ms 3.85 12.99s 50
# 6 cpp 64 25.09ms 27.34ms 32.04ms 36.0 1.39s 50
# 7 r 128 534.5ms 627.92ms 659.08ms 1.61 31.03s 50
# 8 cpp 128 56.37ms 58.46ms 66.03ms 16.9 2.95s 50
# 9 r 256 1.15s 1.18s 1.29s 0.851 58.78s 50
# 10 cpp 256 114.97ms 117.39ms 130.09ms 8.45 5.92s 50
# 11 r 512 2.09s 2.15s 2.32s 0.463 1.8m 50
# 12 cpp 512 230.81ms 235.6ms 261.99ms 4.18 11.97s 50
# 13 r 1024 4s 4.22s 4.4s 0.238 3.5m 50
# 14 cpp 1024 410.48ms 431.43ms 462.44ms 2.33 21.45s 50
ggplot(res_bench, aes(x = factor(batch_size), y = median,
group = expression, color = expression)) +
geom_point() +
geom_line() +
ylab("median time, s") +
theme_minimal() +
scale_color_discrete(name = "", labels = c("cpp", "r")) +
theme(legend.position = "bottom")
As you can see, the speed gain turned out to be very significant, and it is not possible to catch up with the C++ code by parallelizing the R code.
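To put a number on the gain, the medians from the batch benchmark above can be divided directly. This sketch only re-uses the figures printed in that table (converted to milliseconds) and runs nothing else.

```r
# Median times in milliseconds for batch sizes 16 ... 1024, copied from the table
r_ms <- c(53.34, 109.74, 226.18, 627.92, 1180, 2150, 4220)
cpp_ms <- c(5.39, 10.97, 27.34, 58.46, 117.39, 235.6, 431.43)

# How many times faster the C++ implementation is at each batch size
speedup <- r_ms / cpp_ms
round(speedup, 1)
```

On these measurements the C++ version is roughly 8 to 11 times faster for every batch size, whereas the single-string benchmark showed a gain of less than 2 times. Most of the difference therefore comes from multi-threaded processing of the whole vector.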
3. Iterators for unloading batches from the database
R has a well-deserved reputation as a language for processing data that fits in RAM, while Python is more associated with iterative data processing, which makes it easy to implement out-of-core computations (computations that use external memory). A classic example of such computations, and the relevant one for our task, is deep neural networks trained by gradient descent, where the gradient is approximated at each step on a small portion of observations, or mini-batch.
Deep learning frameworks written in Python have special classes that implement iterators over data: tables, pictures in folders, binary formats and so on. You can use ready-made options or write your own for specific tasks. In R we can use all the features of the Python library keras with its various backends through the package of the same name, which in turn works on top of the reticulate package. The latter deserves a separate long article; it not only lets you run Python code from R, but also transfers objects between R and Python sessions, performing all the necessary type conversions automatically.
We got rid of the need to keep all the data in RAM by using MonetDBLite, and all the "neural network" work is done by the original Python code, so we only have to write an iterator over the data, since nothing ready-made exists for this situation in either R or Python. There are essentially only two requirements for it: it must return batches in an endless loop and keep its state between iterations (the latter is implemented most simply in R with closures). Previously, R arrays had to be converted explicitly into numpy arrays inside the iterator, but the current version of the keras package does this itself.
The iterator for training and validation data turned out as follows:
Iterator for training and validation data
train_generator <- function(db_connection = con,
samples_index,
num_classes = 340,
batch_size = 32,
scale = 1,
color = FALSE,
imagenet_preproc = FALSE) {
# Check the arguments (the connection is passed in as `db_connection`)
checkmate::assert_class(db_connection, "DBIConnection")
checkmate::assert_integerish(samples_index)
checkmate::assert_count(num_classes)
checkmate::assert_count(batch_size)
checkmate::assert_number(scale, lower = 0.001, upper = 5)
checkmate::assert_flag(color)
checkmate::assert_flag(imagenet_preproc)
# Shuffle so that batch indices can be taken and consumed in order
dt <- data.table::data.table(id = sample(samples_index))
# Assign batch numbers
dt[, batch := (.I - 1L) %/% batch_size + 1L]
# Keep only complete batches and index by batch number
dt <- dt[, if (.N == batch_size) .SD, keyby = batch]
# Initialize the counter
i <- 1
# Number of batches
max_i <- dt[, max(batch)]
# Prepare the statement used to fetch the data
sql <- sprintf(
"PREPARE SELECT drawing, label_int FROM doodles WHERE id IN (%s)",
paste(rep("?", batch_size), collapse = ",")
)
res <- DBI::dbSendQuery(db_connection, sql)
# Analogue of keras::to_categorical
to_categorical <- function(x, num) {
n <- length(x)
m <- numeric(n * num)
m[x * n + seq_len(n)] <- 1
dim(m) <- c(n, num)
return(m)
}
# Closure
function() {
# Start a new epoch
if (i > max_i) {
dt[, id := sample(id)]
data.table::setkey(dt, batch)
# Reset the counter
i <<- 1
max_i <<- dt[, max(batch)]
}
# IDs of the rows to fetch
batch_ind <- dt[batch == i, id]
# Fetch the data
batch <- DBI::dbFetch(DBI::dbBind(res, as.list(batch_ind)), n = -1)
# Increment the counter
i <<- i + 1
# Parse the JSON and prepare the array
batch_x <- cpp_process_json_vector(batch$drawing, scale = scale, color = color)
if (imagenet_preproc) {
# Rescale from the interval [0, 1] to the interval [-1, 1]
batch_x <- (batch_x - 0.5) * 2
}
batch_y <- to_categorical(batch$label_int, num_classes)
result <- list(batch_x, batch_y)
return(result)
}
}
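The two requirements on the iterator (an endless stream of batches and state kept between calls) can be seen in isolation in a stripped-down sketch. The function make_batch_iterator and its arguments are hypothetical names for illustration; it returns batches of ids only and leaves out the database, the image rendering and data.table.

```r
# Minimal closure-based iterator: a sketch, not the competition code
make_batch_iterator <- function(ids, batch_size) {
  # Only complete batches are served, as in the iterator above
  n_batches <- length(ids) %/% batch_size
  stopifnot(n_batches >= 1)
  shuffled <- ids[sample.int(length(ids))]
  i <- 1
  function() {
    # Start a new epoch: reshuffle and reset the counter
    if (i > n_batches) {
      shuffled <<- ids[sample.int(length(ids))]
      i <<- 1
    }
    from <- (i - 1) * batch_size + 1
    batch <- shuffled[from:(from + batch_size - 1)]
    # The counter lives in the enclosing environment and survives between calls
    i <<- i + 1
    batch
  }
}

set.seed(1)
it <- make_batch_iterator(ids = 1:10, batch_size = 4)
b1 <- it()  # first batch of the first epoch
b2 <- it()  # second batch, disjoint from the first
b3 <- it()  # the epoch is exhausted, so a new shuffle starts here
```

The `<<-` assignments are what make the returned function stateful: each call advances `i` in the enclosing environment instead of creating a local copy.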
The function takes as input a variable holding the database connection, the indices of the rows to use, the number of classes, the batch size, the scale (scale = 1 corresponds to rendering 256x256-pixel images, scale = 0.5 to 128x128 pixels), a color flag (color = FALSE renders in grayscale, while with color = TRUE each stroke is drawn in a new color), and a preprocessing flag for networks pretrained on imagenet. The last one is needed to rescale pixel values from the interval [0, 1] to the interval [-1, 1], which was used when training the models supplied with keras.
The outer function contains argument type checks, a data.table with randomly shuffled row indices from samples_index and the batch numbers, a counter and the maximum number of batches, and the SQL expression for fetching data from the database. In addition, we defined a fast analogue of keras::to_categorical() inside it. We used almost all the data for training and left half a percent for validation, so the epoch size was limited by the steps_per_epoch parameter in the call to keras::fit_generator(), and the condition if (i > max_i) only ever fired for the validation iterator.
The inner function extracts the row indices for the next batch, fetches the records from the database while incrementing the batch counter, parses the JSON (the cpp_process_json_vector() function, written in C++), and creates the arrays corresponding to the images. Then one-hot vectors with the class labels are created, and the arrays with pixel values and with labels are combined into a list, which is the return value. To speed things up, we used indexes on data.table tables and modification by reference. Without these data.table features it is hard to imagine working efficiently with any significant amount of data in R.
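The fast analogue of keras::to_categorical() is mentioned in the text but not shown in this part of it. A minimal sketch of how such a helper might look in base R, assuming zero-based integer labels (which the label_int column and the max(label_int) + 1 class count suggest). The name to_categorical_fast is hypothetical and is not the authors' implementation:

```r
# Hypothetical helper: one-hot encode zero-based integer labels.
to_categorical_fast <- function(labels, num_classes) {
  labels <- as.integer(labels)
  stopifnot(all(labels >= 0L), all(labels < num_classes))
  n <- length(labels)
  # Allocate the whole matrix once, then set the ones with a single
  # vectorised assignment through a two-column (row, column) index matrix.
  out <- matrix(0, nrow = n, ncol = num_classes)
  out[cbind(seq_len(n), labels + 1L)] <- 1
  out
}

# Three samples with classes 0, 2 and 1 out of 3 classes
y <- to_categorical_fast(c(0L, 2L, 1L), num_classes = 3L)
```

Each row of y contains exactly one 1, in the column whose number is the label plus one.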
The results of speed measurements on a laptop with a Core i5 look as follows:
Iterator benchmark
library(Rcpp)
library(keras)
library(ggplot2)
source("utils/rcpp.R")
source("utils/keras_iterator.R")
con <- DBI::dbConnect(drv = MonetDBLite::MonetDBLite(), Sys.getenv("DBDIR"))
ind <- seq_len(DBI::dbGetQuery(con, "SELECT count(*) FROM doodles")[[1L]])
num_classes <- DBI::dbGetQuery(con, "SELECT max(label_int) + 1 FROM doodles")[[1L]]
# Indices for the training set
train_ind <- sample(ind, floor(length(ind) * 0.995))
# Indices for the validation set
val_ind <- ind[-train_ind]
rm(ind)
# Scale factor
scale <- 0.5
# Run the benchmark
res_bench <- bench::press(
  batch_size = 2^(4:10),
  {
    it1 <- train_generator(
      db_connection = con,
      samples_index = train_ind,
      num_classes = num_classes,
      batch_size = batch_size,
      scale = scale
    )
    bench::mark(
      it1(),
      min_iterations = 50L
    )
  }
)
# Benchmark parameters
cols <- c("batch_size", "min", "median", "max", "itr/sec", "total_time", "n_itr")
res_bench[, cols]
#   batch_size      min   median      max `itr/sec` total_time n_itr
#        <dbl> <bch:tm> <bch:tm> <bch:tm>     <dbl>   <bch:tm> <int>
# 1         16     25ms  64.36ms   92.2ms    15.9        3.09s    49
# 2         32   48.4ms 118.13ms 197.24ms     8.17       5.88s    48
# 3         64   69.3ms 117.93ms 181.14ms     8.57       5.83s    50
# 4        128  157.2ms 240.74ms 503.87ms     3.85      12.71s    49
# 5        256  359.3ms 613.52ms 988.73ms     1.54       30.5s    47
# 6        512  884.7ms    1.53s    2.07s     0.674      1.11m    45
# 7       1024     2.7s    3.83s    5.47s     0.261      2.81m    44
ggplot(res_bench, aes(x = factor(batch_size), y = median, group = 1)) +
  geom_point() +
  geom_line() +
  ylab("median time, s") +
  theme_minimal()
DBI::dbDisconnect(con, shutdown = TRUE)
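The table reports time per batch, which grows with the batch size by construction. To compare batch sizes on an equal footing it may help to convert the medians to images per second. A small sketch using only the median values copied from the table above; the data frame and its column names are introduced here purely for illustration:

```r
# Median time per batch in seconds, copied from the benchmark table.
bench_tbl <- data.frame(
  batch_size = 2^(4:10),
  median_s   = c(0.06436, 0.11813, 0.11793, 0.24074, 0.61352, 1.53, 3.83)
)

# Throughput: images per second = batch size / median time per batch.
bench_tbl$images_per_sec <- bench_tbl$batch_size / bench_tbl$median_s

# Batch size with the highest throughput in this particular run.
best <- bench_tbl$batch_size[which.max(bench_tbl$images_per_sec)]
print(bench_tbl)
```

On these numbers the throughput ranges from roughly 250 to 540 images per second and is highest for a batch size of 64. A single laptop run is too noisy to read much more into it.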
If you have enough RAM, you can seriously speed up the database by moving it into that same RAM (32 GB is enough for our task). On Linux, the /dev/shm partition is mounted by default and takes up to half of the RAM. You can allocate more by editing /etc/fstab to get an entry like tmpfs /dev/shm tmpfs defaults,size=25g 0 0. Be sure to reboot and check the result by running df -h.
The iterator for the test data looks much simpler, since the test dataset fits entirely in RAM:
Iterator for test data
test_generator <- function(dt,
                           batch_size = 32,
                           scale = 1,
                           color = FALSE,
                           imagenet_preproc = FALSE) {
  # Check arguments
  checkmate::assert_data_table(dt)
  checkmate::assert_count(batch_size)
  checkmate::assert_number(scale, lower = 0.001, upper = 5)
  checkmate::assert_flag(color)
  checkmate::assert_flag(imagenet_preproc)
  # Assign batch numbers
  dt[, batch := (.I - 1L) %/% batch_size + 1L]
  data.table::setkey(dt, batch)
  i <- 1
  max_i <- dt[, max(batch)]
  # Closure
  function() {
    batch_x <- cpp_process_json_vector(dt[batch == i, drawing],
                                       scale = scale, color = color)
    if (imagenet_preproc) {
      # Rescale from the interval [0, 1] to the interval [-1, 1]
      batch_x <- (batch_x - 0.5) * 2
    }
    result <- list(batch_x)
    i <<- i + 1
    return(result)
  }
}
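The batch numbering in test_generator() relies on integer division of the zero-based row number by the batch size. The same arithmetic can be checked in base R without data.table. The values of n_rows and batch_size below are arbitrary illustration values:

```r
n_rows <- 10L
batch_size <- 4L

# Same formula as in test_generator(): rows 1..4 -> batch 1, rows 5..8 -> batch 2, ...
batch <- (seq_len(n_rows) - 1L) %/% batch_size + 1L
max_i <- max(batch)

# Row indices belonging to each batch; the last batch may be incomplete.
batches <- split(seq_len(n_rows), batch)
```

Note that the closure shown above computes max_i but never compares i with it, so the caller presumably has to request exactly max_i batches, for example through the number of prediction steps.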
4. Choosing the model architecture
The first architecture we used was mobilenet v1. When we tried to use it with single-channel images, it turned out that the input tensor must always have the shape (batch, height, width, 3), that is, the number of channels cannot be changed. Python has no such limitation, so we went ahead and wrote our own implementation of this architecture, following the original paper (without the dropout that is present in the keras version):
Mobilenet v1 architecture
library(keras)
top_3_categorical_accuracy <- custom_metric(
  name = "top_3_categorical_accuracy",
  metric_fn = function(y_true, y_pred) {
    metric_top_k_categorical_accuracy(y_true, y_pred, k = 3)
  }
)
layer_sep_conv_bn <- function(object,
                              filters,
                              alpha = 1,
                              depth_multiplier = 1,
                              strides = c(2, 2)) {
  # NB! depth_multiplier != resolution multiplier
  # https://github.com/keras-team/keras/issues/10349
  layer_depthwise_conv_2d(
    object = object,
    kernel_size = c(3, 3),
    strides = strides,
    padding = "same",
    depth_multiplier = depth_multiplier
  ) %>%
    layer_batch_normalization() %>%
    layer_activation_relu() %>%
    layer_conv_2d(
      filters = filters * alpha,
      kernel_size = c(1, 1),
      strides = c(1, 1)
    ) %>%
    layer_batch_normalization() %>%
    layer_activation_relu()
}
get_mobilenet_v1 <- function(input_shape = c(224, 224, 1),
                             num_classes = 340,
                             alpha = 1,
                             depth_multiplier = 1,
                             optimizer = optimizer_adam(lr = 0.002),
                             loss = "categorical_crossentropy",
                             metrics = c("categorical_crossentropy",
                                         top_3_categorical_accuracy)) {
  inputs <- layer_input(shape = input_shape)
  outputs <- inputs %>%
    layer_conv_2d(filters = 32, kernel_size = c(3, 3), strides = c(2, 2), padding = "same") %>%
    layer_batch_normalization() %>%
    layer_activation_relu() %>%
    layer_sep_conv_bn(filters = 64, strides = c(1, 1)) %>%
    layer_sep_conv_bn(filters = 128, strides = c(2, 2)) %>%
    layer_sep_conv_bn(filters = 128, strides = c(1, 1)) %>%
    layer_sep_conv_bn(filters = 256, strides = c(2, 2)) %>%
    layer_sep_conv_bn(filters = 256, strides = c(1, 1)) %>%
    layer_sep_conv_bn(filters = 512, strides = c(2, 2)) %>%
    layer_sep_conv_bn(filters = 512, strides = c(1, 1)) %>%
    layer_sep_conv_bn(filters = 512, strides = c(1, 1)) %>%
    layer_sep_conv_bn(filters = 512, strides = c(1, 1)) %>%
    layer_sep_conv_bn(filters = 512, strides = c(1, 1)) %>%
    layer_sep_conv_bn(filters = 512, strides = c(1, 1)) %>%
    layer_sep_conv_bn(filters = 1024, strides = c(2, 2)) %>%
    layer_sep_conv_bn(filters = 1024, strides = c(1, 1)) %>%
    layer_global_average_pooling_2d() %>%
    layer_dense(units = num_classes) %>%
    layer_activation_softmax()
  model <- keras_model(
    inputs = inputs,
    outputs = outputs
  )
  model %>% compile(
    optimizer = optimizer,
    loss = loss,
    metrics = metrics
  )
  return(model)
}
The drawbacks of this approach are obvious. We want to test many models, but we do not want to rewrite each architecture by hand. We were also deprived of the ability to use the weights of models pretrained on imagenet. As usual, studying the documentation helped. The get_config() function returns a description of the model in a form suitable for editing (base_model_conf$layers is an ordinary R list), and the from_config() function performs the reverse conversion into a model object:
base_model_conf <- get_config(base_model)
base_model_conf$layers[[1]]$config$batch_input_shape[[4]] <- 1L
base_model <- from_config(base_model_conf)
Now it is not hard to write a universal function that returns any of the models supplied with keras, with or without weights trained on imagenet:
Function for loading ready-made architectures
get_model <- function(name = "mobilenet_v2",
                      input_shape = NULL,
                      weights = "imagenet",
                      pooling = "avg",
                      num_classes = NULL,
                      optimizer = keras::optimizer_adam(lr = 0.002),
                      loss = "categorical_crossentropy",
                      metrics = NULL,
                      color = TRUE,
                      compile = FALSE) {
  # Check arguments
  checkmate::assert_string(name)
  checkmate::assert_integerish(input_shape, lower = 1, upper = 256, len = 3)
  checkmate::assert_count(num_classes)
  checkmate::assert_flag(color)
  checkmate::assert_flag(compile)
  # Get the model constructor from the keras package
  model_fun <- get0(paste0("application_", name), envir = asNamespace("keras"))
  # Check that the object exists in the package
  if (is.null(model_fun)) {
    stop("Model ", shQuote(name), " not found.", call. = FALSE)
  }
  base_model <- model_fun(
    input_shape = input_shape,
    include_top = FALSE,
    weights = weights,
    pooling = pooling
  )
  # If the image is not in color, change the input dimensionality
  if (!color) {
    base_model_conf <- keras::get_config(base_model)
    base_model_conf$layers[[1]]$config$batch_input_shape[[4]] <- 1L
    base_model <- keras::from_config(base_model_conf)
  }
  predictions <- keras::get_layer(base_model, "global_average_pooling2d_1")$output
  predictions <- keras::layer_dense(predictions, units = num_classes, activation = "softmax")
  model <- keras::keras_model(
    inputs = base_model$input,
    outputs = predictions
  )
  if (compile) {
    keras::compile(
      object = model,
      optimizer = optimizer,
      loss = loss,
      metrics = metrics
    )
  }
  return(model)
}
When single-channel images are used, the pretrained weights are not used. This could be fixed: use get_weights() to obtain the model weights as a list of R arrays, change the dimensions of the first element of that list (by taking one color channel or by averaging all three), and then load the weights back into the model with set_weights(). We never added this functionality, because by that stage it was already clear that working with color images was more productive.
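The authors did not implement this weight conversion, so the following is only a sketch of the array manipulation involved. It assumes that the first element returned by get_weights() is a convolution kernel laid out as height x width x input channels x filters. A random array stands in for the real kernel, and all names are illustrative:

```r
set.seed(42)
# Hypothetical stand-in for the first-layer kernel of a pretrained model:
# 3x3 spatial size, 3 input channels (RGB), 32 filters.
kernel_rgb <- array(rnorm(3 * 3 * 3 * 32), dim = c(3, 3, 3, 32))

# Average over the channel axis (the third dimension).
kernel_gray <- apply(kernel_rgb, c(1, 2, 4), mean)

# apply() drops the averaged axis; restore it as a singleton channel dimension.
dim(kernel_gray) <- c(3, 3, 1, 32)
```

With a real model, kernel_gray would replace the first element of the weight list before that list is passed to set_weights() on the single-channel model.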
We ran most of the experiments with mobilenet versions 1 and 2, as well as resnet34. More modern architectures such as SE-ResNeXt performed well in this competition. Unfortunately, we had no ready-made implementations at our disposal, and we did not write our own (but we definitely will).
5. Script parameterization
For convenience, all the code for launching training was packaged as a single script, parameterized with docopt as follows:
doc <- '
Usage:
  train_nn.R --help
  train_nn.R --list-models
  train_nn.R [options]

Options:
  -h --help                    Show this message.
  -l --list-models             List available models.
  -m --model=<model>           Neural network model name [default: mobilenet_v2].
  -b --batch-size=<size>       Batch size [default: 32].
  -s --scale-factor=<ratio>    Scale factor [default: 0.5].
  -c --color                   Use color lines [default: FALSE].
  -d --db-dir=<path>           Path to database directory [default: Sys.getenv("db_dir")].
  -r --validate-ratio=<ratio>  Validate sample ratio [default: 0.995].
  -n --n-gpu=<number>          Number of GPUs [default: 1].
'
args <- docopt::docopt(doc)
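Option values parsed from the command line normally arrive as character strings, so they have to be converted before they reach the model code. The sketch below shows one way the input shape could be derived from the options. The args list is a hand-written stand-in for the value returned by docopt::docopt(doc). Its element names, and the way dim_size and channels are computed, are assumptions rather than the authors' code:

```r
# Hypothetical stand-in for the parsed command-line arguments.
args <- list(
  model = "resnet50",
  `batch-size` = "32",
  `scale-factor` = "0.5",
  color = TRUE
)

batch_size <- as.integer(args[["batch-size"]])
scale <- as.numeric(args[["scale-factor"]])

# scale = 1 corresponds to 256x256 images, so the side length is 256 * scale.
dim_size <- as.integer(256 * scale)
# Color drawings use three channels, grayscale drawings use one.
channels <- if (isTRUE(args[["color"]])) 3L else 1L
input_shape <- c(dim_size, dim_size, channels)
```

With the default scale factor of 0.5 and the -c flag this gives 128x128 three-channel input.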
The docopt package lets scripts be launched with simple commands such as Rscript bin/train_nn.R -m resnet50 -c -d /home/andrey/doodle_db or ./bin/train_nn.R -m resnet50 -c -d /home/andrey/doodle_db, if the file train_nn.R is executable (this command starts training the resnet50 model on three-color images of 128x128 pixels; the database must be located in the folder /home/andrey/doodle_db). The learning rate, the optimizer type, and any other configurable parameters can be added to the list. While preparing this publication, it turned out that the mobilenet_v2 architecture from the current version of keras cannot be used from R because of changes that the R package does not yet account for; we are waiting for a fix.
This approach significantly sped up experiments with different models compared with the more traditional way of running scripts in RStudio (a dedicated package is a possible alternative).
6. Dockerizing the scripts
We used Docker to make the model-training environment portable between team members and to deploy it quickly in the cloud. This tool is relatively unusual for an R programmer, and it is best to start with introductory material.
Docker lets you both create your own images from scratch and use other images as a base for your own. After analyzing the available options, we concluded that installing the NVIDIA drivers, CUDA + cuDNN, and the Python libraries is a fairly sizable part of the image, so we decided to take the official tensorflow/tensorflow:1.12.0-gpu image as a base and add the necessary R packages to it.
The final Dockerfile turned out like this:
Dockerfile
FROM tensorflow/tensorflow:1.12.0-gpu
MAINTAINER Artem Klevtsov <[email protected]>
SHELL ["/bin/bash", "-c"]
ARG LOCALE="en_US.UTF-8"
ARG APT_PKG="libopencv-dev r-base r-base-dev littler"
ARG R_BIN_PKG="futile.logger checkmate data.table rcpp rapidjsonr dbi keras jsonlite curl digest remotes"
ARG R_SRC_PKG="xtensor RcppThread docopt MonetDBLite"
ARG PY_PIP_PKG="keras"
ARG DIRS="/db /app /app/data /app/models /app/logs"
RUN source /etc/os-release && \
    echo "deb https://cloud.r-project.org/bin/linux/ubuntu ${UBUNTU_CODENAME}-cran35/" > /etc/apt/sources.list.d/cran35.list && \
    apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E084DAB9 && \
    add-apt-repository -y ppa:marutter/c2d4u3.5 && \
    add-apt-repository -y ppa:timsc/opencv-3.4 && \
    apt-get update && \
    apt-get install -y locales && \
    locale-gen ${LOCALE} && \
    apt-get install -y --no-install-recommends ${APT_PKG} && \
    ln -s /usr/lib/R/site-library/littler/examples/install.r /usr/local/bin/install.r && \
    ln -s /usr/lib/R/site-library/littler/examples/install2.r /usr/local/bin/install2.r && \
    ln -s /usr/lib/R/site-library/littler/examples/installGithub.r /usr/local/bin/installGithub.r && \
    echo 'options(Ncpus = parallel::detectCores())' >> /etc/R/Rprofile.site && \
    echo 'options(repos = c(CRAN = "https://cloud.r-project.org"))' >> /etc/R/Rprofile.site && \
    apt-get install -y $(printf "r-cran-%s " ${R_BIN_PKG}) && \
    install.r ${R_SRC_PKG} && \
    pip install ${PY_PIP_PKG} && \
    mkdir -p ${DIRS} && \
    chmod 777 ${DIRS} && \
    rm -rf /tmp/downloaded_packages/ /tmp/*.rds && \
    rm -rf /var/lib/apt/lists/*
COPY utils /app/utils
COPY src /app/src
COPY tests /app/tests
COPY bin/*.R /app/
ENV DBDIR="/db"
ENV CUDA_HOME="/usr/local/cuda"
ENV PATH="/app:${PATH}"
WORKDIR /app
VOLUME /db
VOLUME /app
CMD bash
For convenience, the packages used were moved into variables; the bulk of the scripts we wrote is copied into the container at build time. We also changed the command shell to /bin/bash to make it easier to use the contents of /etc/os-release. This avoided the need to specify the OS version in the code.
In addition, we wrote a small bash script that launches the container with various commands. For example, these can be scripts for training neural networks that were previously placed inside the container, or a command shell for debugging and monitoring the container:
Script for launching the container
#!/bin/sh
DBDIR=${PWD}/db
LOGSDIR=${PWD}/logs
MODELDIR=${PWD}/models
DATADIR=${PWD}/data
ARGS="--runtime=nvidia --rm -v ${DBDIR}:/db -v ${LOGSDIR}:/app/logs -v ${MODELDIR}:/app/models -v ${DATADIR}:/app/data"
if [ -z "$1" ]; then
    CMD="Rscript /app/train_nn.R"
elif [ "$1" = "bash" ]; then
    ARGS="${ARGS} -ti"
else
    CMD="Rscript /app/train_nn.R $@"
fi
docker run ${ARGS} doodles-tf ${CMD}
If this bash script is run without parameters, the train_nn.R script is called inside the container with default values; if the first positional argument is "bash", the container starts interactively with a command shell. In all other cases the values of the positional arguments are substituted: CMD="Rscript /app/train_nn.R $@".
It is worth noting that the directories with the source data and the database, as well as the directory for saving trained models, are mounted into the container from the host system, which gives access to the results of the scripts without extra manipulation.
7. Using multiple GPUs in Google Cloud
One of the features of the competition was very noisy data (see the title image, borrowed from @Leigh.plt from the ODS slack). Large batches help to combat this, and after experiments on a PC with 1 GPU we decided to master training models on several GPUs in the cloud. We used GoogleCloud (with the database moved to dev/shm).
The most interesting part is the code fragment responsible for using multiple GPUs. First, the model is created on the CPU using a context manager, just like in Python:
with(tensorflow::tf$device("/cpu:0"), {
  model_cpu <- get_model(
    name = model_name,
    input_shape = input_shape,
    weights = weights,
    metrics = c(top_3_categorical_accuracy),
    compile = FALSE
  )
})
Then the uncompiled (this is important) model is copied to the given number of available GPUs, and only after that is it compiled:
model <- keras::multi_gpu_model(model_cpu, gpus = n_gpu)
keras::compile(
  object = model,
  optimizer = keras::optimizer_adam(lr = 0.0004),
  loss = "categorical_crossentropy",
  metrics = c(top_3_categorical_accuracy)
)
The classic technique of freezing all layers except the last one, training the last layer, then unfreezing and fine-tuning the whole model could not be implemented for multiple GPUs.
Training was monitored without tensorboard; we limited ourselves to writing logs and saving models with informative names after each epoch:
Callbacks
# Log file name template
log_file_tmpl <- file.path("logs", sprintf(
  "%s_%d_%dch_%s.csv",
  model_name,
  dim_size,
  channels,
  format(Sys.time(), "%Y%m%d%H%M%OS")
))
# Model file name template
model_file_tmpl <- file.path("models", sprintf(
  "%s_%d_%dch_{epoch:02d}_{val_loss:.2f}.h5",
  model_name,
  dim_size,
  channels
))
callbacks_list <- list(
  keras::callback_csv_logger(
    filename = log_file_tmpl
  ),
  keras::callback_early_stopping(
    monitor = "val_loss",
    min_delta = 1e-4,
    patience = 8,
    verbose = 1,
    mode = "min"
  ),
  keras::callback_reduce_lr_on_plateau(
    monitor = "val_loss",
    factor = 0.5, # halve the learning rate
    patience = 4,
    verbose = 1,
    min_delta = 1e-4,
    mode = "min"
  ),
  keras::callback_model_checkpoint(
    filepath = model_file_tmpl,
    monitor = "val_loss",
    save_best_only = FALSE,
    save_weights_only = FALSE,
    mode = "min"
  )
)
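To see what the "informative names" look like, the model file template can be expanded by hand. In the sketch below the values of model_name, dim_size and channels are hypothetical. The {epoch:02d} and {val_loss:.2f} placeholders are left untouched by sprintf() and are filled in later by the checkpoint callback, which is imitated here with plain string substitution for an assumed epoch 3 and validation loss 0.8731:

```r
# Hypothetical run parameters
model_name <- "resnet50"
dim_size <- 128L
channels <- 3L

model_file_tmpl <- file.path("models", sprintf(
  "%s_%d_%dch_{epoch:02d}_{val_loss:.2f}.h5",
  model_name, dim_size, channels
))

# Imitate the substitution the checkpoint callback performs after an epoch.
example_file <- sub("{epoch:02d}", sprintf("%02d", 3L), model_file_tmpl, fixed = TRUE)
example_file <- sub("{val_loss:.2f}", sprintf("%.2f", 0.8731), example_file, fixed = TRUE)
```

Here model_file_tmpl is "models/resnet50_128_3ch_{epoch:02d}_{val_loss:.2f}.h5" and example_file is "models/resnet50_128_3ch_03_0.87.h5".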
8. In lieu of a conclusion
A number of problems we ran into have not yet been overcome:
- keras has no ready-made function for automatically finding the optimal learning rate (an analogue of lr_finder in the fast.ai library); with some effort, third-party implementations can be ported to R;
- as a consequence of the previous point, we could not choose the right learning rate when using multiple GPUs;
- there is a shortage of modern neural network architectures, especially ones pretrained on imagenet;
- there is no one cycle policy and no discriminative learning rates (cosine annealing was implemented at our request, thank you).
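For reference, two of the schedules named in this list are easy to write down. The sketch below shows the textbook formulas in base R, not the authors' code and not the API of any package: a learning-rate range test steps through exponentially spaced rates, and cosine annealing decays the rate from lr_max to lr_min along half a cosine period. The function names are made up for illustration:

```r
# Learning-rate range test: exponentially spaced rates, one per training step.
lr_range <- function(lr_min, lr_max, n_steps) {
  exp(seq(log(lr_min), log(lr_max), length.out = n_steps))
}

# Cosine annealing: lr_max at step 0, lr_min at step n_steps.
cosine_annealing <- function(step, n_steps, lr_max, lr_min = 0) {
  lr_min + 0.5 * (lr_max - lr_min) * (1 + cos(pi * step / n_steps))
}

lrs <- lr_range(1e-6, 1e-1, n_steps = 6)
lr_start <- cosine_annealing(0, n_steps = 10, lr_max = 0.002)
lr_mid <- cosine_annealing(5, n_steps = 10, lr_max = 0.002)
lr_end <- cosine_annealing(10, n_steps = 10, lr_max = 0.002)
```

In a range test the loss is recorded for each rate in lrs, and a rate somewhat below the point where the loss starts to diverge is usually chosen.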
What useful things we took away from this competition:
- On relatively low-powered hardware you can work painlessly with decent volumes of data (many times larger than the RAM). The data.table package saves memory through in-place modification of tables, which avoids copying them, and when used correctly it is almost always the fastest of all the tools for scripting languages known to us. Storing data in a database means that in many cases you do not have to think about squeezing the whole dataset into RAM.
- Slow functions in R can be replaced with fast ones in C++ using the Rcpp package. If you additionally use RcppThread or RcppParallel, you get cross-platform multithreaded implementations, so there is no need to parallelize the code at the R level.
- The Rcpp package can be used without serious knowledge of C++; the necessary minimum is documented. Header files for a number of excellent C libraries such as xtensor are available on CRAN, that is, an infrastructure is forming for projects that integrate ready-made high-performance C++ code into R. An additional convenience is syntax highlighting and a static C++ code analyzer in RStudio.
- docopt lets you run self-contained scripts with parameters. This is convenient on a remote server, including under docker. In RStudio it is inconvenient to run many-hour experiments with neural network training, and installing the IDE itself on a server is not always justified.
- Docker ensures code portability and reproducibility of results between developers with different versions of the OS and libraries, as well as ease of launching on servers. The whole training pipeline can be launched with a single command.
- Google Cloud is a budget-friendly way to experiment on expensive hardware, but you need to choose configurations carefully.
- Measuring the speed of individual code fragments is very useful, especially when combining R and C++, and with the bench package it is also very easy.
Overall, this experience was very rewarding, and we continue to work on resolving some of the issues raised.
Source: www.habr.com