Quick Draw Doodle Recognition: how to make friends with R, C++ and neural networks


Hello, Habr!

Last fall, Kaggle hosted a competition for classifying hand-drawn images, Quick Draw Doodle Recognition, in which, among others, a team of R users took part: Artem Klevtsov, Philipp Upravitelev and Andrey Ogurtsov. We will not describe the competition in detail here; that has already been done in a recent publication.

Medal farming didn't work out this time, but a lot of valuable experience was gained, so I'd like to tell the community about some of the most interesting and useful things, both on Kaggle and in everyday work. Among the topics covered: the hard life without OpenCV, JSON parsing (these examples illustrate integrating C++ code into R scripts or packages using Rcpp), parameterization of scripts, and dockerization of the final solution. All the code from this post is available in a repository in a form suitable for running.

Contents:

  1. Efficiently loading data from CSV into a MonetDB database
  2. Preparing batches
  3. Iterators for unloading batches from the database
  4. Choosing a model architecture
  5. Script parameterization
  6. Dockerizing the scripts
  7. Using multiple GPUs in Google Cloud
  8. In lieu of a conclusion

1. Efficiently loading data from CSV into a MonetDB database

The data in this competition is provided not as ready-made images but as 340 CSV files (one file per class) containing JSON with point coordinates. By connecting these points with lines we obtain a final image of 256x256 pixels. Each record also carries a label indicating whether the image was correctly recognized by the classifier in use at the time the data was collected, a two-letter code of the country of residence of the image's author, a unique identifier, a timestamp, and a class name matching the file name. The simplified version of the source data weighs 7.4 GB in the archive and about 20 GB after unpacking; the full data takes 240 GB after unpacking. The organizers ensured that both versions reproduce the same drawings, so the full version is redundant. In any case, storing 50 million images as graphic files or as arrays was immediately deemed unprofitable, and we decided to merge all the CSV files from the train_simplified.zip archive into a database, generating images of the required size "on the fly" for each batch.
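To make the stroke format concrete, here is a minimal parsing sketch; the JSON string below is a made-up two-stroke example in the simplified format, not a record from the dataset:

```r
library(jsonlite)

# Hypothetical drawing: a JSON array of strokes,
# each stroke being a pair of vectors [x-coordinates, y-coordinates]
drawing <- '[[[0, 128, 255], [0, 64, 0]], [[10, 200], [250, 250]]]'
strokes <- fromJSON(drawing, simplifyMatrix = FALSE)

length(strokes)            # number of strokes: 2
length(strokes[[1]][[1]])  # points in the first stroke: 3
```

Connecting consecutive (x, y) pairs within each stroke with line segments reproduces the drawing.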

As the DBMS, a well-proven system was chosen: MonetDB, namely its implementation for R as the MonetDBLite package. The package includes an embedded version of the database server, which lets you start the server directly from an R session and work with it there. Creating a database and connecting to it are done with a single command:

con <- DBI::dbConnect(drv = MonetDBLite::MonetDBLite(), Sys.getenv("DBDIR"))

We needed to create two tables: one for all the data, the other for service information about the files already loaded (useful if something goes wrong and the process has to be resumed after loading a few files):

Creating the tables

if (!DBI::dbExistsTable(con, "doodles")) {
  DBI::dbCreateTable(
    con = con,
    name = "doodles",
    fields = c(
      "countrycode" = "char(2)",
      "drawing" = "text",
      "key_id" = "bigint",
      "recognized" = "bool",
      "timestamp" = "timestamp",
      "word" = "text"
    )
  )
}

if (!DBI::dbExistsTable(con, "upload_log")) {
  DBI::dbCreateTable(
    con = con,
    name = "upload_log",
    fields = c(
      "id" = "serial",
      "file_name" = "text UNIQUE",
      "uploaded" = "bool DEFAULT false"
    )
  )
}

The fastest way to load data into the database turned out to be copying the CSV files directly with the SQL command COPY OFFSET 2 INTO tablename FROM path USING DELIMITERS ',','\n','"' NULL AS '' BEST EFFORT, where tablename is the table name and path is the path to the file. While working with the archive, it turned out that the unzip implementation built into R does not handle a number of files from the archive correctly, so we used the system unzip (selected via the getOption("unzip") option).

Function for writing to the database

#' @title Extracting and loading files
#'
#' @description
#' Extracts CSV files from a ZIP archive and loads them into the database
#'
#' @param con Database connection object (class `MonetDBEmbeddedConnection`).
#' @param tablename Name of the table in the database.
#' @param zipfile Path to the ZIP archive.
#' @param filename Name of the file inside the ZIP archive.
#' @param preprocess Preprocessing function that will be applied to the extracted file.
#'   Must accept a single argument `data` (a `data.table` object).
#'
#' @return `TRUE`.
#'
upload_file <- function(con, tablename, zipfile, filename, preprocess = NULL) {
  # Check arguments
  checkmate::assert_class(con, "MonetDBEmbeddedConnection")
  checkmate::assert_string(tablename)
  checkmate::assert_string(filename)
  checkmate::assert_true(DBI::dbExistsTable(con, tablename))
  checkmate::assert_file_exists(zipfile, access = "r", extension = "zip")
  checkmate::assert_function(preprocess, args = c("data"), null.ok = TRUE)

  # Extract the file
  path <- file.path(tempdir(), filename)
  unzip(zipfile, files = filename, exdir = tempdir(), 
        junkpaths = TRUE, unzip = getOption("unzip"))
  on.exit(unlink(file.path(path)))

  # Apply the preprocessing function
  if (!is.null(preprocess)) {
    .data <- data.table::fread(file = path)
    .data <- preprocess(data = .data)
    data.table::fwrite(x = .data, file = path, append = FALSE)
    rm(.data)
  }

  # SQL query for importing the CSV
  sql <- sprintf(
    "COPY OFFSET 2 INTO %s FROM '%s' USING DELIMITERS ',','\\n','\"' NULL AS '' BEST EFFORT",
    tablename, path
  )
  # Execute the query
  DBI::dbExecute(con, sql)

  # Record the successful upload in the service table
  DBI::dbExecute(con, sprintf("INSERT INTO upload_log(file_name, uploaded) VALUES('%s', true)",
                              filename))

  return(invisible(TRUE))
}

If you need to transform the table before writing it to the database, it is enough to pass a function to the preprocess argument that will transform the data.
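As a sketch (the filtering function below is hypothetical, not part of the original pipeline), such a preprocessing function might drop drawings that were not recognized at collection time:

```r
library(data.table)

# Hypothetical preprocessing function: keep only recognized drawings
keep_recognized <- function(data) {
  data[recognized == TRUE]
}

# It would be passed along as:
# upload_file(con, "doodles", zipfile, filename, preprocess = keep_recognized)

# Quick check on a toy table
dt <- data.table(key_id = 1:3, recognized = c(TRUE, FALSE, TRUE))
nrow(keep_recognized(dt))  # 2
```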

The code for loading the data into the database:

Writing the data to the database

# List of files to write
files <- unzip(zipfile, list = TRUE)$Name

# List of exclusions, in case some files have already been loaded
to_skip <- DBI::dbGetQuery(con, "SELECT file_name FROM upload_log")[[1L]]
files <- setdiff(files, to_skip)

if (length(files) > 0L) {
  # Start the timer
  tictoc::tic()
  # Progress bar
  pb <- txtProgressBar(min = 0L, max = length(files), style = 3)
  for (i in seq_along(files)) {
    upload_file(con = con, tablename = "doodles", 
                zipfile = zipfile, filename = files[i])
    setTxtProgressBar(pb, i)
  }
  close(pb)
  # Stop the timer
  tictoc::toc()
}

# 526.141 sec elapsed - copying SSD->SSD
# 558.879 sec elapsed - copying USB->SSD

Loading time may vary depending on the performance characteristics of the drive used. In our case, reading and writing within a single SSD, or from a flash drive (the source archive) to an SSD (the database), takes less than 10 minutes.

It takes a few more seconds to create a column with integer class labels and an index column (ORDERED INDEX) with the row numbers by which observations will be sampled when creating batches:

Creating the columns and the index

message("Generate labels")
invisible(DBI::dbExecute(con, "ALTER TABLE doodles ADD label_int int"))
invisible(DBI::dbExecute(con, "UPDATE doodles SET label_int = dense_rank() OVER (ORDER BY word) - 1"))

message("Generate row numbers")
invisible(DBI::dbExecute(con, "ALTER TABLE doodles ADD id serial"))
invisible(DBI::dbExecute(con, "CREATE ORDERED INDEX doodles_id_ord_idx ON doodles(id)"))

To solve the problem of forming batches on the fly, we needed to achieve the maximum speed of extracting random rows from the doodles table. For this we used three tricks. The first was to reduce the dimensionality of the type storing the observation ID. In the original dataset the ID requires a bigint, but the number of observations makes it possible to fit their identifiers, equal to the ordinal number, into an int. Lookups are much faster in this case. The second trick was to use an ORDERED INDEX; we arrived at this decision empirically, after trying all the available options. The third was to use parameterized queries. The essence of the method is to execute the PREPARE command once and then reuse the prepared statement when executing a batch of queries of the same kind; in practice, though, the gain compared to a plain SELECT turned out to be within statistical error.

The data loading process consumes no more than 450 MB of RAM. That is, the described approach lets you move datasets weighing tens of gigabytes on almost any budget hardware, including some single-board devices, which is quite nice.

All that remains is to measure the speed of retrieving (random) data and evaluate the scaling when sampling batches of different sizes:

Database benchmark

library(ggplot2)

set.seed(0)
# Connect to the database
con <- DBI::dbConnect(MonetDBLite::MonetDBLite(), Sys.getenv("DBDIR"))

# Total number of rows in the table (used by fetch_data() below)
n <- DBI::dbGetQuery(con, "SELECT count(*) FROM doodles")[[1L]]

# Function that prepares the query on the server side
prep_sql <- function(batch_size) {
  sql <- sprintf("PREPARE SELECT id FROM doodles WHERE id IN (%s)",
                 paste(rep("?", batch_size), collapse = ","))
  res <- DBI::dbSendQuery(con, sql)
  return(res)
}

# Function that fetches the data
fetch_data <- function(rs, batch_size) {
  ids <- sample(seq_len(n), batch_size)
  res <- DBI::dbFetch(DBI::dbBind(rs, as.list(ids)))
  return(res)
}

# Run the measurement
res_bench <- bench::press(
  batch_size = 2^(4:10),
  {
    rs <- prep_sql(batch_size)
    bench::mark(
      fetch_data(rs, batch_size),
      min_iterations = 50L
    )
  }
)
# Benchmark parameters
cols <- c("batch_size", "min", "median", "max", "itr/sec", "total_time", "n_itr")
res_bench[, cols]

#   batch_size      min   median      max `itr/sec` total_time n_itr
#        <dbl> <bch:tm> <bch:tm> <bch:tm>     <dbl>   <bch:tm> <int>
# 1         16   23.6ms  54.02ms  93.43ms     18.8        2.6s    49
# 2         32     38ms  84.83ms 151.55ms     11.4       4.29s    49
# 3         64   63.3ms 175.54ms 248.94ms     5.85       8.54s    50
# 4        128   83.2ms 341.52ms 496.24ms     3.00      16.69s    50
# 5        256  232.8ms 653.21ms 847.44ms     1.58      31.66s    50
# 6        512  784.6ms    1.41s    1.98s     0.740       1.1m    49
# 7       1024  681.7ms    2.72s    4.06s     0.377      2.16m    49

ggplot(res_bench, aes(x = factor(batch_size), y = median, group = 1)) +
  geom_point() +
  geom_line() +
  ylab("median time, s") +
  theme_minimal()

DBI::dbDisconnect(con, shutdown = TRUE)

(Figure: median random-read time from the database vs. batch size)

2. Preparing batches

The whole batch preparation process consists of the following steps:

  1. Parsing several JSONs containing vectors of strings with point coordinates.
  2. Drawing colored lines from the point coordinates on an image of the required size (for example, 256×256 or 128×128).
  3. Converting the resulting images into a tensor.

In the competition, among the Python kernels, the problem was mostly solved with OpenCV. One of the simplest and most obvious analogues in R looks like this:

Implementing the JSON-to-tensor conversion in R

r_process_json_str <- function(json, line.width = 3, 
                               color = TRUE, scale = 1) {
  # Parse the JSON
  coords <- jsonlite::fromJSON(json, simplifyMatrix = FALSE)
  tmp <- tempfile()
  # Delete the temporary file when the function exits
  on.exit(unlink(tmp))
  png(filename = tmp, width = 256 * scale, height = 256 * scale, pointsize = 1)
  # Empty plot
  plot.new()
  # Plot window extent
  plot.window(xlim = c(256 * scale, 0), ylim = c(256 * scale, 0))
  # Line colors
  cols <- if (color) rainbow(length(coords)) else "#000000"
  for (i in seq_along(coords)) {
    lines(x = coords[[i]][[1]] * scale, y = coords[[i]][[2]] * scale, 
          col = cols[i], lwd = line.width)
  }
  dev.off()
  # Convert the image into a 3-dimensional array
  res <- png::readPNG(tmp)
  return(res)
}

r_process_json_vector <- function(x, ...) {
  res <- lapply(x, r_process_json_str, ...)
  # Combine the 3-dimensional image arrays into a 4-dimensional tensor
  res <- do.call(abind::abind, c(res, along = 0))
  return(res)
}

Drawing is performed with standard R tools and saved to a temporary PNG stored in RAM (on Linux, R's temporary directories live in /tmp, which is mounted in RAM). This file is then read as a three-dimensional array with numbers ranging from 0 to 1. This matters, because a more conventional BMP would be read into a raw array with hex color codes.

Let's test the result:

zip_file <- file.path("data", "train_simplified.zip")
csv_file <- "cat.csv"
unzip(zip_file, files = csv_file, exdir = tempdir(), 
      junkpaths = TRUE, unzip = getOption("unzip"))
tmp_data <- data.table::fread(file.path(tempdir(), csv_file), sep = ",", 
                              select = "drawing", nrows = 10000)
arr <- r_process_json_str(tmp_data[4, drawing])
dim(arr)
# [1] 256 256   3
plot(magick::image_read(arr))

(Figure: the cat doodle rendered by the R implementation)

The batch itself is formed as follows:

res <- r_process_json_vector(tmp_data[1:4, drawing], scale = 0.5)
str(res)
 # num [1:4, 1:128, 1:128, 1:3] 1 1 1 1 1 1 1 1 1 1 ...
 # - attr(*, "dimnames")=List of 4
 #  ..$ : NULL
 #  ..$ : NULL
 #  ..$ : NULL
 #  ..$ : NULL

This implementation seemed suboptimal to us, since forming large batches takes indecently long, and we decided to make use of our colleagues' experience by employing the powerful OpenCV library. At the time there was no ready-made package for R (there is none now either), so a minimal implementation of the required functionality was written in C++ with integration into R code using Rcpp.

The following packages and libraries were used to solve the problem:

  1. OpenCV for working with images and drawing lines. We used pre-installed system libraries and header files, along with dynamic linking.

  2. xtensor for working with multidimensional arrays and tensors. We used the header files included in the R package of the same name. The library lets you work with multidimensional arrays stored in both row-major and column-major order.

  3. ndjson for parsing JSON. This library is used by xtensor automatically if it is present in the project.

  4. RcppThread for multithreaded processing of a vector of JSONs. We used the header files provided by this package. Unlike the more popular RcppParallel, this package, among other things, has a built-in loop interruption mechanism.

It's worth noting that xtensor turned out to be a godsend: besides its extensive functionality and high performance, its developers proved very responsive and answered questions promptly and in detail. With their help we managed to implement conversions of OpenCV matrices into xtensor tensors, as well as a way to combine 3-dimensional image tensors into a 4-dimensional tensor of the correct dimensionality (the batch itself).

Materials for learning Rcpp, xtensor and RcppThread

https://thecoatlessprofessor.com/programming/unofficial-rcpp-api-documentation

https://docs.opencv.org/4.0.1/d7/dbd/group__imgproc.html

https://xtensor.readthedocs.io/en/latest/

https://xtensor.readthedocs.io/en/latest/file_loading.html#loading-json-data-into-xtensor

https://cran.r-project.org/web/packages/RcppThread/vignettes/RcppThread-vignette.pdf

To compile files that use system header files and dynamic linking against libraries installed on the system, we used the plugin mechanism built into the Rcpp package. To find the paths and flags automatically, we used the popular Linux utility pkg-config.

Implementation of the Rcpp plugin for using the OpenCV library

Rcpp::registerPlugin("opencv", function() {
  # Possible package names
  pkg_config_name <- c("opencv", "opencv4")
  # Binary of the pkg-config utility
  pkg_config_bin <- Sys.which("pkg-config")
  # Check that the utility is present on the system
  checkmate::assert_file_exists(pkg_config_bin, access = "x")
  # Check for an OpenCV settings file for pkg-config
  check <- sapply(pkg_config_name, 
                  function(pkg) system(paste(pkg_config_bin, pkg)))
  if (all(check != 0)) {
    stop("OpenCV config for the pkg-config not found", call. = FALSE)
  }

  pkg_config_name <- pkg_config_name[check == 0]
  list(env = list(
    PKG_CXXFLAGS = system(paste(pkg_config_bin, "--cflags", pkg_config_name), 
                          intern = TRUE),
    PKG_LIBS = system(paste(pkg_config_bin, "--libs", pkg_config_name), 
                      intern = TRUE)
  ))
})

As a result of the plugin's work, the following values are substituted in during compilation:

Rcpp:::.plugins$opencv()$env

# $PKG_CXXFLAGS
# [1] "-I/usr/include/opencv"
#
# $PKG_LIBS
# [1] "-lopencv_shape -lopencv_stitching -lopencv_superres -lopencv_videostab -lopencv_aruco -lopencv_bgsegm -lopencv_bioinspired -lopencv_ccalib -lopencv_datasets -lopencv_dpm -lopencv_face -lopencv_freetype -lopencv_fuzzy -lopencv_hdf -lopencv_line_descriptor -lopencv_optflow -lopencv_video -lopencv_plot -lopencv_reg -lopencv_saliency -lopencv_stereo -lopencv_structured_light -lopencv_phase_unwrapping -lopencv_rgbd -lopencv_viz -lopencv_surface_matching -lopencv_text -lopencv_ximgproc -lopencv_calib3d -lopencv_features2d -lopencv_flann -lopencv_xobjdetect -lopencv_objdetect -lopencv_ml -lopencv_xphoto -lopencv_highgui -lopencv_videoio -lopencv_imgcodecs -lopencv_photo -lopencv_imgproc -lopencv_core"

The implementation code for parsing the JSON and forming a batch to send to the model is given under the spoiler. First, add a local project directory to the header file search path (needed for ndjson):

Sys.setenv("PKG_CXXFLAGS" = paste0("-I", normalizePath(file.path("src"))))

Implementing the JSON-to-tensor conversion in C++

// [[Rcpp::plugins(cpp14)]]
// [[Rcpp::plugins(opencv)]]
// [[Rcpp::depends(xtensor)]]
// [[Rcpp::depends(RcppThread)]]

#include <xtensor/xjson.hpp>
#include <xtensor/xadapt.hpp>
#include <xtensor/xview.hpp>
#include <xtensor-r/rtensor.hpp>
#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include <Rcpp.h>
#include <RcppThread.h>

// Type aliases
using RcppThread::parallelFor;
using json = nlohmann::json;
using points = xt::xtensor<double,2>;     // Point coordinates extracted from the JSON
using strokes = std::vector<points>;      // The strokes extracted from the JSON
using xtensor3d = xt::xtensor<double, 3>; // Tensor for storing an image matrix
using xtensor4d = xt::xtensor<double, 4>; // Tensor for storing a set of images
using rtensor3d = xt::rtensor<double, 3>; // Wrapper for exporting to R
using rtensor4d = xt::rtensor<double, 4>; // Wrapper for exporting to R

// Static constants
// Image size in pixels
const static int SIZE = 256;
// Line type
// See https://en.wikipedia.org/wiki/Pixel_connectivity#2-dimensional
const static int LINE_TYPE = cv::LINE_4;
// Line width in pixels
const static int LINE_WIDTH = 3;
// Resize algorithm
// https://docs.opencv.org/3.1.0/da/d54/group__imgproc__transform.html#ga5bb5a1fea74ea38e1a5445ca803ff121
const static int RESIZE_TYPE = cv::INTER_LINEAR;

// Template for converting an OpenCV matrix into a tensor
template <typename T, int NCH, typename XT=xt::xtensor<T,3,xt::layout_type::column_major>>
XT to_xt(const cv::Mat_<cv::Vec<T, NCH>>& src) {
  // Dimensions of the target tensor
  std::vector<int> shape = {src.rows, src.cols, NCH};
  // Total number of elements in the array
  size_t size = src.total() * NCH;
  // Conversion of cv::Mat into xt::xtensor
  XT res = xt::adapt((T*) src.data, size, xt::no_ownership(), shape);
  return res;
}

// Converting JSON into a list of point coordinates
strokes parse_json(const std::string& x) {
  auto j = json::parse(x);
  // The parsing result must be an array
  if (!j.is_array()) {
    throw std::runtime_error("'x' must be JSON array.");
  }
  strokes res;
  res.reserve(j.size());
  for (const auto& a: j) {
    // Each array element must be a 2-dimensional array
    if (!a.is_array() || a.size() != 2) {
      throw std::runtime_error("'x' must include only 2d arrays.");
    }
    // Extract the vector of points
    auto p = a.get<points>();
    res.push_back(p);
  }
  return res;
}

// Drawing the lines
// Colors are in HSV
cv::Mat ocv_draw_lines(const strokes& x, bool color = true) {
  // Initial matrix type
  auto stype = color ? CV_8UC3 : CV_8UC1;
  // Final matrix type
  auto dtype = color ? CV_32FC3 : CV_32FC1;
  auto bg = color ? cv::Scalar(0, 0, 255) : cv::Scalar(255);
  auto col = color ? cv::Scalar(0, 255, 220) : cv::Scalar(0);
  cv::Mat img = cv::Mat(SIZE, SIZE, stype, bg);
  // Number of lines
  size_t n = x.size();
  for (const auto& s: x) {
    // Number of points in the line
    size_t n_points = s.shape()[1];
    for (size_t i = 0; i < n_points - 1; ++i) {
      // Stroke start point
      cv::Point from(s(0, i), s(1, i));
      // Stroke end point
      cv::Point to(s(0, i + 1), s(1, i + 1));
      // Draw the line
      cv::line(img, from, to, col, LINE_WIDTH, LINE_TYPE);
    }
    if (color) {
      // Shift the line color
      col[0] += 180 / n;
    }
  }
  if (color) {
    // Convert the color representation to RGB
    cv::cvtColor(img, img, cv::COLOR_HSV2RGB);
  }
  // Convert the format to float32 with range [0, 1]
  img.convertTo(img, dtype, 1 / 255.0);
  return img;
}

// Processing the JSON and obtaining a tensor with the image data
xtensor3d process(const std::string& x, double scale = 1.0, bool color = true) {
  auto p = parse_json(x);
  auto img = ocv_draw_lines(p, color);
  if (scale != 1) {
    cv::Mat out;
    cv::resize(img, out, cv::Size(), scale, scale, RESIZE_TYPE);
    cv::swap(img, out);
    out.release();
  }
  xtensor3d arr = color ? to_xt<double,3>(img) : to_xt<double,1>(img);
  return arr;
}

// [[Rcpp::export]]
rtensor3d cpp_process_json_str(const std::string& x, 
                               double scale = 1.0, 
                               bool color = true) {
  xtensor3d res = process(x, scale, color);
  return res;
}

// [[Rcpp::export]]
rtensor4d cpp_process_json_vector(const std::vector<std::string>& x, 
                                  double scale = 1.0, 
                                  bool color = false) {
  size_t n = x.size();
  size_t dim = floor(SIZE * scale);
  size_t channels = color ? 3 : 1;
  xtensor4d res({n, dim, dim, channels});
  parallelFor(0, n, [&x, &res, scale, color](int i) {
    xtensor3d tmp = process(x[i], scale, color);
    auto view = xt::view(res, i, xt::all(), xt::all(), xt::all());
    view = tmp;
  });
  return res;
}

This code should be placed in the file src/cv_xt.cpp and compiled with the command Rcpp::sourceCpp(file = "src/cv_xt.cpp", env = .GlobalEnv); nlohmann/json.hpp from the repository is also required. The code is divided into several functions:

  • to_xt — a templated function for converting an image matrix (cv::Mat) into an xt::xtensor tensor;

  • parse_json — parses a JSON string, extracts the point coordinates and packs them into a vector;

  • ocv_draw_lines — draws multi-colored lines from the resulting vector of points;

  • process — combines the functions above and adds the ability to scale the resulting image;

  • cpp_process_json_str — a wrapper over the process function that exports the result into an R object (a multidimensional array);

  • cpp_process_json_vector — a wrapper over the process function that lets you process a vector of strings in multithreaded mode.

To draw multi-colored lines, the HSV color model was used, with subsequent conversion to RGB. Let's test the result:

arr <- cpp_process_json_str(tmp_data[4, drawing])
dim(arr)
# [1] 256 256   3
plot(magick::image_read(arr))

(Figure: the doodle rendered by the C++ implementation)
Comparing the speed of the R and C++ implementations

res_bench <- bench::mark(
  r_process_json_str(tmp_data[4, drawing], scale = 0.5),
  cpp_process_json_str(tmp_data[4, drawing], scale = 0.5),
  check = FALSE,
  min_iterations = 100
)
# Benchmark parameters
cols <- c("expression", "min", "median", "max", "itr/sec", "total_time", "n_itr")
res_bench[, cols]

#   expression                min     median       max `itr/sec` total_time  n_itr
#   <chr>                <bch:tm>   <bch:tm>  <bch:tm>     <dbl>   <bch:tm>  <int>
# 1 r_process_json_str     3.49ms     3.55ms    4.47ms      273.      490ms    134
# 2 cpp_process_json_str   1.94ms     2.02ms    5.32ms      489.      497ms    243

library(ggplot2)
# Run the measurement
res_bench <- bench::press(
  batch_size = 2^(4:10),
  {
    .data <- tmp_data[sample(seq_len(.N), batch_size), drawing]
    bench::mark(
      r_process_json_vector(.data, scale = 0.5),
      cpp_process_json_vector(.data,  scale = 0.5),
      min_iterations = 50,
      check = FALSE
    )
  }
)

res_bench[, cols]

#    expression   batch_size      min   median      max `itr/sec` total_time n_itr
#    <chr>             <dbl> <bch:tm> <bch:tm> <bch:tm>     <dbl>   <bch:tm> <int>
#  1 r                   16   50.61ms  53.34ms  54.82ms    19.1     471.13ms     9
#  2 cpp                 16    4.46ms   5.39ms   7.78ms   192.      474.09ms    91
#  3 r                   32   105.7ms 109.74ms 212.26ms     7.69        6.5s    50
#  4 cpp                 32    7.76ms  10.97ms  15.23ms    95.6     522.78ms    50
#  5 r                   64  211.41ms 226.18ms 332.65ms     3.85      12.99s    50
#  6 cpp                 64   25.09ms  27.34ms  32.04ms    36.0        1.39s    50
#  7 r                  128   534.5ms 627.92ms 659.08ms     1.61      31.03s    50
#  8 cpp                128   56.37ms  58.46ms  66.03ms    16.9        2.95s    50
#  9 r                  256     1.15s    1.18s    1.29s     0.851     58.78s    50
# 10 cpp                256  114.97ms 117.39ms 130.09ms     8.45       5.92s    50
# 11 r                  512     2.09s    2.15s    2.32s     0.463       1.8m    50
# 12 cpp                512  230.81ms  235.6ms 261.99ms     4.18      11.97s    50
# 13 r                 1024        4s    4.22s     4.4s     0.238       3.5m    50
# 14 cpp               1024  410.48ms 431.43ms 462.44ms     2.33      21.45s    50

ggplot(res_bench, aes(x = factor(batch_size), y = median, 
                      group =  expression, color = expression)) +
  geom_point() +
  geom_line() +
  ylab("median time, s") +
  theme_minimal() +
  scale_color_discrete(name = "", labels = c("cpp", "r")) +
  theme(legend.position = "bottom") 

(Figure: median batch-preparation time for the R and C++ implementations vs. batch size)

As you can see, the speedup proved to be very significant, and it is not possible to catch up with C++ code by parallelizing R code.

3. Iterators for unloading batches from the database

R has a well-deserved reputation as a language for processing data that fits into RAM, while Python is more associated with iterative data processing, which makes it easy and natural to implement out-of-core computations (computations using external memory). A classic example relevant to our problem is deep neural networks trained by gradient descent, with the gradient approximated at each step on a small portion of the observations, or mini-batch.
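For reference, one such mini-batch step over a batch $B$ updates the parameters $\theta$ as:

```latex
\theta_{t+1} = \theta_t - \frac{\eta}{|B|} \sum_{i \in B} \nabla_\theta L_i(\theta_t)
```

where $\eta$ is the learning rate and $L_i$ is the loss on observation $i$; only the $|B|$ observations of the current batch need to be in memory at once, which is exactly what the iterators below provide.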

Deep learning frameworks written in Python have special classes implementing iterators over data: tables, images in folders, binary formats, etc. You can use the ready-made options or write your own for specific tasks. In R we can take advantage of all the features of the Python library keras with its various backends via the package of the same name, which in turn works on top of the reticulate package. The latter deserves a separate long article; it not only lets you run Python code from R, but also pass objects between R and Python sessions, automatically performing all the necessary type conversions.

We got rid of the need to store all the data in RAM by using MonetDBLite, and all the "neural network" work will be performed by the original code in Python, so we only had to write an iterator over the data, since nothing ready-made exists for such a situation in either R or Python. There are essentially only two requirements for it: it must return batches in an endless loop and save its state between iterations (the latter is implemented in R in the simplest way using closures). Previously, it was necessary to explicitly convert R arrays into numpy arrays inside the iterator, but the current version of the keras package does this itself.
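A minimal illustration of how a closure preserves state between calls (just the mechanism, unrelated to the actual data):

```r
# A closure: the inner function captures `i` from the enclosing
# environment and updates it with `<<-` on every call
make_counter <- function() {
  i <- 0L
  function() {
    i <<- i + 1L
    i
  }
}

counter <- make_counter()
counter()  # 1
counter()  # 2 -- the state survived between calls
```

The generator below uses exactly this trick to keep its batch counter and shuffled indices alive between calls.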

The iterator for the training and validation data turned out as follows:

Iterator for the training and validation data

train_generator <- function(db_connection = con,
                            samples_index,
                            num_classes = 340,
                            batch_size = 32,
                            scale = 1,
                            color = FALSE,
                            imagenet_preproc = FALSE) {
  # Check arguments
  checkmate::assert_class(db_connection, "DBIConnection")
  checkmate::assert_integerish(samples_index)
  checkmate::assert_count(num_classes)
  checkmate::assert_count(batch_size)
  checkmate::assert_number(scale, lower = 0.001, upper = 5)
  checkmate::assert_flag(color)
  checkmate::assert_flag(imagenet_preproc)

  # Shuffle so that used batch indices can be taken and removed in order
  dt <- data.table::data.table(id = sample(samples_index))
  # Assign batch numbers
  dt[, batch := (.I - 1L) %/% batch_size + 1L]
  # Keep only complete batches, keyed by batch
  dt <- dt[, if (.N == batch_size) .SD, keyby = batch]
  # Initialize the counter
  i <- 1
  # Number of batches
  max_i <- dt[, max(batch)]

  # Prepare the statement for fetching a batch
  sql <- sprintf(
    "PREPARE SELECT drawing, label_int FROM doodles WHERE id IN (%s)",
    paste(rep("?", batch_size), collapse = ",")
  )
  res <- DBI::dbSendQuery(db_connection, sql)

  # Analogue of keras::to_categorical
  to_categorical <- function(x, num) {
    n <- length(x)
    m <- numeric(n * num)
    m[x * n + seq_len(n)] <- 1
    dim(m) <- c(n, num)
    return(m)
  }

  # Замыкание
  function() {
    # Начинаем новую эпоху
    if (i > max_i) {
      dt[, id := sample(id)]
      data.table::setkey(dt, batch)
      # Reset the counter
      i <<- 1
      max_i <<- dt[, max(batch)]
    }

    # IDs of the rows to fetch
    batch_ind <- dt[batch == i, id]
    # Fetch the data
    batch <- DBI::dbFetch(DBI::dbBind(res, as.list(batch_ind)), n = -1)

    # Increment the counter
    i <<- i + 1

    # Parse JSON and prepare the arrays
    batch_x <- cpp_process_json_vector(batch$drawing, scale = scale, color = color)
    if (imagenet_preproc) {
      # Rescale from [0, 1] to [-1, 1]
      batch_x <- (batch_x - 0.5) * 2
    }

    batch_y <- to_categorical(batch$label_int, num_classes)
    result <- list(batch_x, batch_y)
    return(result)
  }
}

The function takes as arguments a database connection, the numbers of the rows to use, the number of classes, the batch size, the scale (scale = 1 corresponds to rendering images at 256x256 pixels, scale = 0.5 to 128x128 pixels), a color flag (with color = FALSE the drawings are rendered in grayscale, with color = TRUE every stroke is drawn in a new color) and a preprocessing flag for networks pre-trained on imagenet. The latter is needed to rescale pixel values from the interval [0, 1] to [-1, 1], as was done when the bundled keras models were trained.

The outer function contains argument type checks, a data.table with shuffled row numbers from samples_index and batch numbers, a counter and the maximum number of batches, as well as a SQL statement for fetching data from the database. On top of that, we defined a fast analogue of the keras::to_categorical() function. We used almost all of the data for training, keeping half a percent for validation, so the epoch size was limited by the steps_per_epoch parameter passed to keras::fit_generator(), and the condition if (i > max_i) only ever fired for the validation iterator.
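The fast to_categorical() analogue relies on R's column-major matrix layout: for n labels, element x[i] * n + i of the flat vector lands in row i, column x[i] + 1 (labels are 0-based). Here is a standalone run of the same function, reproduced for illustration:

```r
# The same fast one-hot encoder as in train_generator(), run standalone.
# Labels are 0-based; the matrix is filled column-major, so index
# x[i] * n + i addresses row i, column x[i] + 1.
to_categorical <- function(x, num) {
  n <- length(x)
  m <- numeric(n * num)
  m[x * n + seq_len(n)] <- 1
  dim(m) <- c(n, num)
  return(m)
}

to_categorical(c(0, 2, 1), num = 3)
#      [,1] [,2] [,3]
# [1,]    1    0    0
# [2,]    0    0    1
# [3,]    0    1    0
```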

The inner function fetches the row indices for the next batch, unloads the corresponding records from the database while incrementing the batch counter, parses the JSON (the cpp_process_json_vector() function, written in C++) and builds the arrays corresponding to the images. Then one-hot vectors with the class labels are created, and the arrays of pixel values and the labels are combined into a list, which is the return value. To speed things up, we relied on keyed indexing in data.table tables and modification by reference: without these data.table "chips" it is hard to imagine working effectively with any substantial amount of data in R.
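The batch layout used in the outer function is just integer division over row numbers (.I in data.table is the row index). A base-R sketch of the same two steps, numbering the batches and then dropping the incomplete tail batch:

```r
# Base-R sketch of the batch numbering from train_generator().
batch_size <- 3L
row_num <- 1:8                                # row numbers, i.e. .I in data.table
batch <- (row_num - 1L) %/% batch_size + 1L
batch
# [1] 1 1 1 2 2 2 3 3   (the last batch is incomplete)

# Keep only complete batches, the analogue of
# dt[, if (.N == batch_size) .SD, keyby = batch]
keep <- ave(batch, batch, FUN = length) == batch_size
batch[keep]
# [1] 1 1 1 2 2 2
```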

The results of speed measurements on a Core i5 laptop are as follows:

Iterator benchmark

library(Rcpp)
library(keras)
library(ggplot2)

source("utils/rcpp.R")
source("utils/keras_iterator.R")

con <- DBI::dbConnect(drv = MonetDBLite::MonetDBLite(), Sys.getenv("DBDIR"))

ind <- seq_len(DBI::dbGetQuery(con, "SELECT count(*) FROM doodles")[[1L]])
num_classes <- DBI::dbGetQuery(con, "SELECT max(label_int) + 1 FROM doodles")[[1L]]

# Indices for the training sample
train_ind <- sample(ind, floor(length(ind) * 0.995))
# Indices for the validation sample
val_ind <- ind[-train_ind]
rm(ind)
# Scale factor
scale <- 0.5

# Run the measurement
res_bench <- bench::press(
  batch_size = 2^(4:10),
  {
    it1 <- train_generator(
      db_connection = con,
      samples_index = train_ind,
      num_classes = num_classes,
      batch_size = batch_size,
      scale = scale
    )
    bench::mark(
      it1(),
      min_iterations = 50L
    )
  }
)
# Benchmark parameters
cols <- c("batch_size", "min", "median", "max", "itr/sec", "total_time", "n_itr")
res_bench[, cols]

#   batch_size      min   median      max `itr/sec` total_time n_itr
#        <dbl> <bch:tm> <bch:tm> <bch:tm>     <dbl>   <bch:tm> <int>
# 1         16     25ms  64.36ms   92.2ms     15.9       3.09s    49
# 2         32   48.4ms 118.13ms 197.24ms     8.17       5.88s    48
# 3         64   69.3ms 117.93ms 181.14ms     8.57       5.83s    50
# 4        128  157.2ms 240.74ms 503.87ms     3.85      12.71s    49
# 5        256  359.3ms 613.52ms 988.73ms     1.54       30.5s    47
# 6        512  884.7ms    1.53s    2.07s     0.674      1.11m    45
# 7       1024     2.7s    3.83s    5.47s     0.261      2.81m    44

ggplot(res_bench, aes(x = factor(batch_size), y = median, group = 1)) +
    geom_point() +
    geom_line() +
    ylab("median time, s") +
    theme_minimal()

DBI::dbDisconnect(con, shutdown = TRUE)

(Figure: median batch generation time as a function of batch size.)

If you have enough RAM, you can seriously speed up database operation by moving the database into that same RAM (32 GB is enough for our task). On Linux, the /dev/shm partition is mounted by default, taking up to half of the available RAM. You can allocate more by editing /etc/fstab to get an entry like tmpfs /dev/shm tmpfs defaults,size=25g 0 0. Be sure to reboot and check the result by running the df -h command.

The iterator for the test data is much simpler, since the test dataset fits entirely into RAM:

Iterator for the test data

test_generator <- function(dt,
                           batch_size = 32,
                           scale = 1,
                           color = FALSE,
                           imagenet_preproc = FALSE) {

  # Argument checks
  checkmate::assert_data_table(dt)
  checkmate::assert_count(batch_size)
  checkmate::assert_number(scale, lower = 0.001, upper = 5)
  checkmate::assert_flag(color)
  checkmate::assert_flag(imagenet_preproc)

  # Assign batch numbers
  dt[, batch := (.I - 1L) %/% batch_size + 1L]
  data.table::setkey(dt, batch)
  i <- 1
  max_i <- dt[, max(batch)]

  # Closure
  function() {
    batch_x <- cpp_process_json_vector(dt[batch == i, drawing], 
                                       scale = scale, color = color)
    if (imagenet_preproc) {
      # Rescale from [0, 1] to [-1, 1]
      batch_x <- (batch_x - 0.5) * 2
    }
    result <- list(batch_x)
    i <<- i + 1
    return(result)
  }
}

4. Choosing the model architecture

The first architecture used was mobilenet v1, whose particulars are discussed in this post. It ships as part of the standard keras distribution and, accordingly, is available from the R package of the same name. But when we tried to use it with single-channel images, a strange thing turned up: the input tensor must always have the shape (batch, height, width, 3), i.e. the number of channels cannot be changed. There is no such limitation in Python, so we rushed ahead and wrote our own implementation of this architecture, following the original article (without the dropout present in that version):

Mobilenet v1 architecture

library(keras)

top_3_categorical_accuracy <- custom_metric(
    name = "top_3_categorical_accuracy",
    metric_fn = function(y_true, y_pred) {
         metric_top_k_categorical_accuracy(y_true, y_pred, k = 3)
    }
)

layer_sep_conv_bn <- function(object, 
                              filters,
                              alpha = 1,
                              depth_multiplier = 1,
                              strides = c(2, 2)) {

  # NB! depth_multiplier !=  resolution multiplier
  # https://github.com/keras-team/keras/issues/10349

  layer_depthwise_conv_2d(
    object = object,
    kernel_size = c(3, 3), 
    strides = strides,
    padding = "same",
    depth_multiplier = depth_multiplier
  ) %>%
  layer_batch_normalization() %>% 
  layer_activation_relu() %>%
  layer_conv_2d(
    filters = filters * alpha,
    kernel_size = c(1, 1), 
    strides = c(1, 1)
  ) %>%
  layer_batch_normalization() %>% 
  layer_activation_relu() 
}

get_mobilenet_v1 <- function(input_shape = c(224, 224, 1),
                             num_classes = 340,
                             alpha = 1,
                             depth_multiplier = 1,
                             optimizer = optimizer_adam(lr = 0.002),
                             loss = "categorical_crossentropy",
                             metrics = c("categorical_crossentropy",
                                         top_3_categorical_accuracy)) {

  inputs <- layer_input(shape = input_shape)

  outputs <- inputs %>%
    layer_conv_2d(filters = 32, kernel_size = c(3, 3), strides = c(2, 2), padding = "same") %>%
    layer_batch_normalization() %>% 
    layer_activation_relu() %>%
    layer_sep_conv_bn(filters = 64, strides = c(1, 1)) %>%
    layer_sep_conv_bn(filters = 128, strides = c(2, 2)) %>%
    layer_sep_conv_bn(filters = 128, strides = c(1, 1)) %>%
    layer_sep_conv_bn(filters = 256, strides = c(2, 2)) %>%
    layer_sep_conv_bn(filters = 256, strides = c(1, 1)) %>%
    layer_sep_conv_bn(filters = 512, strides = c(2, 2)) %>%
    layer_sep_conv_bn(filters = 512, strides = c(1, 1)) %>%
    layer_sep_conv_bn(filters = 512, strides = c(1, 1)) %>%
    layer_sep_conv_bn(filters = 512, strides = c(1, 1)) %>%
    layer_sep_conv_bn(filters = 512, strides = c(1, 1)) %>%
    layer_sep_conv_bn(filters = 512, strides = c(1, 1)) %>%
    layer_sep_conv_bn(filters = 1024, strides = c(2, 2)) %>%
    layer_sep_conv_bn(filters = 1024, strides = c(1, 1)) %>%
    layer_global_average_pooling_2d() %>%
    layer_dense(units = num_classes) %>%
    layer_activation_softmax()

    model <- keras_model(
      inputs = inputs,
      outputs = outputs
    )

    model %>% compile(
      optimizer = optimizer,
      loss = loss,
      metrics = metrics
    )

    return(model)
}

The drawbacks of this approach are obvious. I wanted to test a lot of models, but, on the contrary, I did not want to rewrite each architecture by hand. We were also deprived of the opportunity to use the weights of models pre-trained on imagenet. As usual, studying the documentation helped. The get_config() function lets you obtain a description of a model in a form suitable for editing (base_model_conf$layers is an ordinary R list), while the from_config() function performs the reverse conversion into a model object:

base_model_conf <- get_config(base_model)
base_model_conf$layers[[1]]$config$batch_input_shape[[4]] <- 1L
base_model <- from_config(base_model_conf)

Now it is easy to write a universal function that obtains any of the bundled keras models, with or without weights trained on imagenet:

Function for loading ready-made architectures

get_model <- function(name = "mobilenet_v2",
                      input_shape = NULL,
                      weights = "imagenet",
                      pooling = "avg",
                      num_classes = NULL,
                      optimizer = keras::optimizer_adam(lr = 0.002),
                      loss = "categorical_crossentropy",
                      metrics = NULL,
                      color = TRUE,
                      compile = FALSE) {
  # Argument checks
  checkmate::assert_string(name)
  checkmate::assert_integerish(input_shape, lower = 1, upper = 256, len = 3)
  checkmate::assert_count(num_classes)
  checkmate::assert_flag(color)
  checkmate::assert_flag(compile)

  # Get the object from the keras package
  model_fun <- get0(paste0("application_", name), envir = asNamespace("keras"))
  # Check that the object exists in the package
  if (is.null(model_fun)) {
    stop("Model ", shQuote(name), " not found.", call. = FALSE)
  }

  base_model <- model_fun(
    input_shape = input_shape,
    include_top = FALSE,
    weights = weights,
    pooling = pooling
  )

  # If the image is not color, change the input dimensionality
  if (!color) {
    base_model_conf <- keras::get_config(base_model)
    base_model_conf$layers[[1]]$config$batch_input_shape[[4]] <- 1L
    base_model <- keras::from_config(base_model_conf)
  }

  predictions <- keras::get_layer(base_model, "global_average_pooling2d_1")$output
  predictions <- keras::layer_dense(predictions, units = num_classes, activation = "softmax")
  model <- keras::keras_model(
    inputs = base_model$input,
    outputs = predictions
  )

  if (compile) {
    keras::compile(
      object = model,
      optimizer = optimizer,
      loss = loss,
      metrics = metrics
    )
  }

  return(model)
}

When single-channel images are used, the pre-trained weights are not applied. This could be fixed: use the get_weights() function to obtain the model weights as a list of R arrays, change the dimensionality of the first element of this list (by taking one color channel or averaging all three), and then load the weights back into the model with the set_weights() function. We never added this functionality, because by that point it was already clear that working with color images was more productive.
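The conversion described above, which we never actually implemented, can be sketched with plain R arrays (the function name here is hypothetical): the first convolution kernel of an imagenet model has shape (kh, kw, 3, filters), and averaging over the third dimension yields a (kh, kw, 1, filters) kernel suitable for grayscale input.

```r
# Hypothetical sketch of adapting RGB first-layer weights to grayscale input;
# this was described in the text but never added to the competition code.
rgb_to_gray_weights <- function(w) {
  stopifnot(length(dim(w)) == 4, dim(w)[3] == 3)
  w_gray <- apply(w, c(1, 2, 4), mean)            # average over the channel dim
  dim(w_gray) <- c(dim(w)[1:2], 1L, dim(w)[4])    # restore the 4d kernel shape
  w_gray
}

# A toy kernel with the shape of a typical first conv layer: (3, 3, 3, 2)
w <- array(seq_len(3 * 3 * 3 * 2), dim = c(3, 3, 3, 2))
dim(rgb_to_gray_weights(w))
# [1] 3 3 1 2
```

With keras, the converted first element of the weights list would then be written back via set_weights(); taking a single channel instead of the mean is the other option mentioned above.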

We carried out most of the experiments with mobilenet versions 1 and 2, as well as with resnet34. More modern architectures such as SE-ResNeXt performed well in this competition. Unfortunately, we did not have ready-made implementations at our disposal, and we did not write our own (but we certainly will).

5. Parameterizing the scripts

For convenience, all the code for launching training was designed as a single script, parameterized with docopt as follows:

doc <- '
Usage:
  train_nn.R --help
  train_nn.R --list-models
  train_nn.R [options]

Options:
  -h --help                   Show this message.
  -l --list-models            List available models.
  -m --model=<model>          Neural network model name [default: mobilenet_v2].
  -b --batch-size=<size>      Batch size [default: 32].
  -s --scale-factor=<ratio>   Scale factor [default: 0.5].
  -c --color                  Use color lines [default: FALSE].
  -d --db-dir=<path>          Path to database directory [default: Sys.getenv("db_dir")].
  -r --validate-ratio=<ratio> Validate sample ratio [default: 0.995].
  -n --n-gpu=<number>         Number of GPUs [default: 1].
'
args <- docopt::docopt(doc)

The docopt package is an implementation of http://docopt.org/ for R. With its help, scripts are launched with simple commands like Rscript bin/train_nn.R -m resnet50 -c -d /home/andrey/doodle_db or ./bin/train_nn.R -m resnet50 -c -d /home/andrey/doodle_db if the file train_nn.R is executable (this command will start training the resnet50 model on three-channel color images of 128x128 pixels; the database must be located in the directory /home/andrey/doodle_db). You can add the learning rate, the optimizer type and any other customizable parameters to the list. While preparing this publication, it turned out that the mobilenet_v2 architecture from the current version of keras cannot be used from R because of changes not yet accounted for in the R package; we are waiting for this to be fixed.

This approach made it possible to significantly speed up experiments with different models compared to the more traditional launching of scripts from RStudio (we note the tfruns package as a possible alternative). But the main advantage is the ability to easily manage script launches in Docker or simply on a server, without installing RStudio for that.

6. Dockerizing the scripts

We used Docker to ensure portability of the training environment between team members and for rapid deployment in the cloud. You can start getting acquainted with this tool, which is relatively unusual for an R programmer, with this series of publications or this video course.

Docker lets you both create your own images from scratch and use other images as a basis for building your own. Analyzing the available options, we came to the conclusion that installing the NVIDIA drivers, CUDA+cuDNN and the Python libraries is a rather voluminous part of the image, so we decided to take the official tensorflow/tensorflow:1.12.0-gpu image as the basis and add the necessary R packages to it.

The final Dockerfile looked like this:

Dockerfile

FROM tensorflow/tensorflow:1.12.0-gpu

MAINTAINER Artem Klevtsov <[email protected]>

SHELL ["/bin/bash", "-c"]

ARG LOCALE="en_US.UTF-8"
ARG APT_PKG="libopencv-dev r-base r-base-dev littler"
ARG R_BIN_PKG="futile.logger checkmate data.table rcpp rapidjsonr dbi keras jsonlite curl digest remotes"
ARG R_SRC_PKG="xtensor RcppThread docopt MonetDBLite"
ARG PY_PIP_PKG="keras"
ARG DIRS="/db /app /app/data /app/models /app/logs"

RUN source /etc/os-release && \
    echo "deb https://cloud.r-project.org/bin/linux/ubuntu ${UBUNTU_CODENAME}-cran35/" > /etc/apt/sources.list.d/cran35.list && \
    apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E084DAB9 && \
    add-apt-repository -y ppa:marutter/c2d4u3.5 && \
    add-apt-repository -y ppa:timsc/opencv-3.4 && \
    apt-get update && \
    apt-get install -y locales && \
    locale-gen ${LOCALE} && \
    apt-get install -y --no-install-recommends ${APT_PKG} && \
    ln -s /usr/lib/R/site-library/littler/examples/install.r /usr/local/bin/install.r && \
    ln -s /usr/lib/R/site-library/littler/examples/install2.r /usr/local/bin/install2.r && \
    ln -s /usr/lib/R/site-library/littler/examples/installGithub.r /usr/local/bin/installGithub.r && \
    echo 'options(Ncpus = parallel::detectCores())' >> /etc/R/Rprofile.site && \
    echo 'options(repos = c(CRAN = "https://cloud.r-project.org"))' >> /etc/R/Rprofile.site && \
    apt-get install -y $(printf "r-cran-%s " ${R_BIN_PKG}) && \
    install.r ${R_SRC_PKG} && \
    pip install ${PY_PIP_PKG} && \
    mkdir -p ${DIRS} && \
    chmod 777 ${DIRS} && \
    rm -rf /tmp/downloaded_packages/ /tmp/*.rds && \
    rm -rf /var/lib/apt/lists/*

COPY utils /app/utils
COPY src /app/src
COPY tests /app/tests
COPY bin/*.R /app/

ENV DBDIR="/db"
ENV CUDA_HOME="/usr/local/cuda"
ENV PATH="/app:${PATH}"

WORKDIR /app

VOLUME /db
VOLUME /app

CMD bash

For convenience, the packages to install were put into variables; the bulk of the scripts we wrote is copied inside the container during assembly. We also changed the shell to /bin/bash for the convenience of sourcing /etc/os-release. This avoided having to hard-code the OS version.

Additionally, a small bash script was written that makes it possible to launch a container with various commands. For example, these can be the scripts for training neural networks placed inside the container earlier, or a command shell for debugging and monitoring the container's operation:

Script for launching the container

#!/bin/sh

DBDIR=${PWD}/db
LOGSDIR=${PWD}/logs
MODELDIR=${PWD}/models
DATADIR=${PWD}/data
ARGS="--runtime=nvidia --rm -v ${DBDIR}:/db -v ${LOGSDIR}:/app/logs -v ${MODELDIR}:/app/models -v ${DATADIR}:/app/data"

if [ -z "$1" ]; then
    CMD="Rscript /app/train_nn.R"
elif [ "$1" = "bash" ]; then
    ARGS="${ARGS} -ti"
else
    CMD="Rscript /app/train_nn.R $@"
fi

docker run ${ARGS} doodles-tf ${CMD}

If this bash script is run without parameters, the train_nn.R script will be called inside the container with default values; if the first positional argument is "bash", the container will start interactively with a command shell. In all other cases, the values of the positional arguments are substituted: CMD="Rscript /app/train_nn.R $@".

It is worth noting that the directories with the source data and the database, as well as the directory for saving trained models, are mounted inside the container from the host system, which makes it possible to access the scripts' results without unnecessary manipulations.

7. Using multiple GPUs on Google Cloud

One of the features of the competition was the very noisy data (see the title picture, borrowed from @Leigh.plt on the ODS slack). Large batches help fight this, and after experimenting on a PC with 1 GPU we decided to master training models on several GPUs in the cloud. We used GoogleCloud (a good guide to the basics) because of the large selection of available configurations, the reasonable prices and the $300 bonus. Out of greed, I ordered a 4xV100 instance with an SSD and a ton of RAM, and that was a big mistake. Such a machine eats money quickly; without a proven pipeline you can go broke experimenting. For educational purposes it is better to take the K80. But the large amount of RAM came in handy: the cloud SSD did not impress with its performance, so the database was moved to dev/shm.

Of greatest interest is the code fragment responsible for using multiple GPUs. First, the model is created on the CPU using a context manager, just as in Python:

with(tensorflow::tf$device("/cpu:0"), {
  model_cpu <- get_model(
    name = model_name,
    input_shape = input_shape,
    weights = weights,
    metrics = c(top_3_categorical_accuracy),
    compile = FALSE
  )
})

Then the uncompiled model (this is important) is copied onto a given number of available GPUs, and only after that is it compiled:

model <- keras::multi_gpu_model(model_cpu, gpus = n_gpu)
keras::compile(
  object = model,
  optimizer = keras::optimizer_adam(lr = 0.0004),
  loss = "categorical_crossentropy",
  metrics = c(top_3_categorical_accuracy)
)

However, the classic approach (freezing all layers except the last, training the last layer, then unfreezing and retraining the whole model) could not be implemented for multiple GPUs.

Training was monitored without tensorboard; we limited ourselves to logging and saving models with informative names after each epoch:

Callbacks

# Log file name template
log_file_tmpl <- file.path("logs", sprintf(
  "%s_%d_%dch_%s.csv",
  model_name,
  dim_size,
  channels,
  format(Sys.time(), "%Y%m%d%H%M%OS")
))
# Model file name template
model_file_tmpl <- file.path("models", sprintf(
  "%s_%d_%dch_{epoch:02d}_{val_loss:.2f}.h5",
  model_name,
  dim_size,
  channels
))

callbacks_list <- list(
  keras::callback_csv_logger(
    filename = log_file_tmpl
  ),
  keras::callback_early_stopping(
    monitor = "val_loss",
    min_delta = 1e-4,
    patience = 8,
    verbose = 1,
    mode = "min"
  ),
  keras::callback_reduce_lr_on_plateau(
    monitor = "val_loss",
    factor = 0.5, # halve the lr
    patience = 4,
    verbose = 1,
    min_delta = 1e-4,
    mode = "min"
  ),
  keras::callback_model_checkpoint(
    filepath = model_file_tmpl,
    monitor = "val_loss",
    save_best_only = FALSE,
    save_weights_only = FALSE,
    mode = "min"
  )
)

8. Instead of a conclusion

A number of problems we encountered have not been overcome yet:

  • keras has no ready-made function for automatically finding the optimal learning rate (an analogue of lr_finder in the fast.ai library); with some effort, third-party implementations can be ported to R, for example this one;
  • as a consequence of the previous point, it proved impossible to select the right training speed when using several GPUs;
  • there is a shortage of modern neural network architectures, especially those pre-trained on imagenet;
  • no one cycle policy and discriminative learning rates (cosine annealing was implemented at our request, thanks skeydan).

What useful things were learned from this competition:

  • On relatively low-powered hardware you can work with decent volumes of data (many times the size of RAM) without pain. The data.table package saves memory thanks to in-place modification of tables, which avoids copying them, and when used correctly its capabilities almost always demonstrate the highest speed among all the tools known to us for scripting languages. Saving data in a database allows you, in many cases, not to think at all about the need to squeeze the entire dataset into RAM.
  • Slow functions in R can be replaced with fast ones in C++ using the Rcpp package. If, in addition, you use RcppThread or RcppParallel, you get cross-platform multi-threaded implementations, so there is no need to parallelize the code at the R level.
  • The Rcpp package can be used without serious knowledge of C++; the required minimum is outlined here. Header files for a number of cool C++ libraries like xtensor are available on CRAN, i.e. an infrastructure is forming for implementing projects that integrate ready-made high-performance C++ code into R. An additional convenience is syntax highlighting and a static C++ code analyzer in RStudio.
  • docopt lets you run self-contained scripts with parameters. This is convenient for use on a remote server, incl. under docker. In RStudio it is inconvenient to conduct many-hour experiments with neural network training, and installing the IDE on the server itself is not always justified.
  • Docker ensures code portability and reproducibility of results between developers with different versions of the OS and libraries, as well as ease of execution on servers. The whole training pipeline can be launched with a single command.
  • Google Cloud is a budget-friendly way to experiment on expensive hardware, but you have to choose configurations carefully.
  • Measuring the speed of individual code fragments is very useful, especially when combining R and C++, and with the bench package it is also very easy.

On the whole, this experience was very rewarding, and we are continuing to work on resolving some of the issues raised.

Source: www.habr.com
