Neural networks in computer vision are developing rapidly, and many problems are still far from solved. To keep up with your field, it is enough to follow influencers on Twitter and read relevant papers on arXiv.org. But we had the chance to attend the International Conference on Computer Vision (ICCV) 2019, which was held in South Korea this year. Now we want to share with Habr readers what we saw and learned.
There were many of us there from Yandex: self-driving car developers, researchers, and people who work on CV tasks in our services. Here, though, we want to present the somewhat subjective view of our own team, the Machine Intelligence Laboratory (Yandex MILAB). Our other colleagues probably saw the conference from their own angle.
What does the lab do?
We run experimental projects on generating images and music for entertainment purposes. We are especially interested in neural networks that let you modify content supplied by the user (for photos, this task is called image manipulation).
There are many scientific conferences, but the top ones stand out: the so-called A* conferences, where papers on the most interesting and important technologies are usually published. There is no exact list of A* conferences; here is an approximate and incomplete one: NeurIPS (formerly NIPS), ICML, SIGIR, WWW, WSDM, KDD, ACL, CVPR, ICCV, ECCV. The last three specialize in CV.
ICCV at a glance: posters, tutorials, workshops, booths
The conference accepted 1075 papers and had 7500 attendees. 103 people came from Russia, and there were papers from employees of Yandex, Skoltech, Samsung AI Center Moscow and Samara University. Not many top researchers visited ICCV this year, but Alexey (Alyosha) Efros, for example, was there, and he always draws a crowd.
At all such conferences, papers are presented as posters (
Here are some works from Russia
Tutorials let you dive into a specific subject area; they resemble a university lecture. A tutorial is given by one person, usually without discussing specific papers. An example of a good tutorial (
Workshops, by contrast, are about papers. Usually these are works on a narrow topic, talks by lab heads about their students' latest work, or papers that were not accepted to the main conference.
Sponsor companies come to ICCV with booths. This year Google, Facebook, Amazon and many other international companies came, as well as a large number of startups, both Korean and Chinese. There were especially many startups specializing in data labeling. Booths host talks, and you can pick up merchandise and ask questions. For recruiting, the sponsor companies throw parties. You can get in if you convince the recruiters that you are interesting and could pass their interviews. If you have published a paper (or, better still, presented one), or have started or finished a PhD, that is a plus, but sometimes you can get an invitation right at the booth by asking the company's engineers interesting questions.
Trends
The conference lets you take in the whole CV field at once. By the number of posters on a given topic, you can judge how hot that topic is. Some conclusions suggest themselves from the keywords alone:
Zero-shot, one-shot, few-shot, self-supervised and semi-supervised: new approaches to long-studied tasks
People are learning to use data more efficiently. For example, in
3D and 360°
Problems that are largely solved for images (segmentation, detection) need further research for 3D models and panoramic video. We saw many papers on converting RGB and RGB-D to 3D. Some problems, such as human pose estimation, are solved more naturally by moving to 3D models. But there is still no consensus on how exactly to represent 3D models: as a mesh, a point cloud, voxels or an SDF. Here is another option:
For panoramas, convolutions on the sphere are being actively developed (see
Pose detection and human motion prediction
Progress in pose estimation came first in 2D; now the focus has shifted to working with multiple cameras and in 3D. For example, you can even detect a skeleton through a wall by tracking changes in the Wi-Fi signal as it passes through the human body.
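To make the multi-camera step more concrete, here is a minimal sketch of how two views can turn a 2D keypoint into a 3D point. It is our own illustration, not a method from any of the papers: we assume each camera already gives us a ray (camera center plus direction through the detected keypoint), and the function name `triangulate_midpoint` and the camera setup are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def triangulate_midpoint(center1, dir1, center2, dir2):
    """Estimate a 3D joint position from two camera rays.

    Each ray starts at a camera center and points through the 2D keypoint
    detected in that camera's image (ray directions are assumed to be given).
    Returns the midpoint of the shortest segment between the two rays.
    """
    w = [p - q for p, q in zip(center1, center2)]
    a, b, c = dot(dir1, dir1), dot(dir1, dir2), dot(dir2, dir2)
    d, e = dot(dir1, w), dot(dir2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are (almost) parallel; depth is undefined")
    t1 = (b * e - c * d) / denom  # position of the closest point along ray 1
    t2 = (a * e - b * d) / denom  # position of the closest point along ray 2
    p1 = [p + t1 * v for p, v in zip(center1, dir1)]
    p2 = [q + t2 * v for q, v in zip(center2, dir2)]
    return [(x + y) / 2 for x, y in zip(p1, p2)]


# Hypothetical setup: two cameras 1 m apart, both seeing the same wrist.
wrist = triangulate_midpoint([0, 0, 0], [0.5, 0.2, 2.0],
                             [1, 0, 0], [-0.5, 0.2, 2.0])
print(wrist)
```

With noisy detections the two rays no longer intersect, which is why the sketch returns the midpoint of the closest approach rather than an exact intersection.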
A lot of work has been done on hand keypoint detection. New datasets have appeared, including ones based on videos of dialogues between two people, so you can now predict hand gestures from the audio or the text of a conversation! Similar progress has been made in gaze estimation.
One can also single out a large cluster of works on human motion prediction (for example,
Manipulating people in photos and videos, virtual fitting rooms
The main trend is changing face images according to interpretable parameters. Ideas: a deepfake from a single picture, changing the expression based on a rendering of the face (
Generation from sketches/graphs
The idea "let the network generate something based on prior experience" has evolved into a different one: "let's show the network which option we are interested in."
One of Adobe's 25 papers at ICCV combines two GANs: one completes the sketch for the user, the other generates a photorealistic image from the sketch (
Graphs were hardly used in image generation before, but now they have been turned into a container of knowledge about the scene. A Best Paper Honorable Mention at ICCV also went to the paper
Re-identification of people and cars, counting crowd size (!)
Many papers are devoted to tracking people and to re-identifying people and cars. But what surprised us was a whole group of papers on crowd counting, all of them from China.
Facebook, by contrast, anonymizes photos. And it does so in an interesting way: it trains a neural network to generate a face without unique details, one that is similar, but not similar enough to be correctly identified by face recognition systems.
Protection against adversarial attacks
As computer vision applications spread in the real world (self-driving cars, face recognition), the question of how reliable such systems are comes up more and more often. To use CV to the full, you need to be sure the system is robust to adversarial attacks, which is why there were no fewer papers on defending against attacks than on the attacks themselves. There was also a lot of work on explaining network predictions (saliency maps) and on measuring confidence in the result.
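For readers who have not met these terms, here is a small sketch of what an attack and a saliency map look like on the simplest possible model. It is our own toy illustration, not code from any ICCV paper: the "network" is a logistic regression, the attack is the well-known fast gradient sign method (FGSM), and the weights and inputs are made-up numbers.

```python
import math


def predict(weights, bias, x):
    """Probability of the positive class for a toy logistic-regression 'network'."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def input_gradient(weights, bias, x, label):
    """Gradient of the cross-entropy loss with respect to the input x."""
    p = predict(weights, bias, x)
    return [(p - label) * w for w in weights]


def saliency(weights, bias, x, label):
    """Gradient-magnitude saliency: how strongly each input affects the loss."""
    return [abs(g) for g in input_gradient(weights, bias, x, label)]


def fgsm_attack(weights, bias, x, label, eps):
    """Fast gradient sign method: one step of size eps that increases the loss."""
    grad = input_gradient(weights, bias, x, label)
    return [xi + eps * ((g > 0) - (g < 0)) for xi, g in zip(x, grad)]


# Made-up model and input, for illustration only.
weights, bias = [2.0, -1.0], 0.0
clean = [1.0, 0.5]
adversarial = fgsm_attack(weights, bias, clean, label=1, eps=0.5)
print(predict(weights, bias, clean), predict(weights, bias, adversarial))
print(saliency(weights, bias, clean, label=1))
```

The same input gradient serves both purposes: its sign tells the attacker which way to nudge each input, and its magnitude is the crudest form of a saliency map. Real defenses and explanation methods are considerably more involved.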
Combined tasks
In most single-objective tasks, the room for improving quality is almost exhausted; one of the new directions for pushing quality further is to train neural networks to solve several related problems at the same time. Examples:
- action prediction + optical flow prediction,
- video representation + language representation (
-
There were even papers on segmentation, detection and re-identification of animals!
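The joint-training idea from this section can be sketched in a few lines. This is a deliberately tiny illustration under our own assumptions, not an architecture from any paper: one shared linear feature feeds two scalar "heads", and the two task losses are simply added with a hypothetical weight `weight_b`.

```python
def forward(params, x):
    shared_w, head_a, head_b = params
    feature = sum(w * xi for w, xi in zip(shared_w, x))  # shared representation
    return feature, head_a * feature, head_b * feature


def joint_loss(params, x, target_a, target_b, weight_b=1.0):
    """Sum of the two task losses; weight_b balances task B against task A."""
    _, pred_a, pred_b = forward(params, x)
    return (pred_a - target_a) ** 2 + weight_b * (pred_b - target_b) ** 2


def train_step(params, x, target_a, target_b, lr=0.01, weight_b=1.0):
    """One gradient-descent step on the joint loss."""
    shared_w, head_a, head_b = params
    feature, pred_a, pred_b = forward(params, x)
    err_a, err_b = pred_a - target_a, pred_b - target_b
    # The shared weights receive gradient from both tasks at once.
    grad_feature = 2 * err_a * head_a + 2 * weight_b * err_b * head_b
    new_shared = [w - lr * grad_feature * xi for w, xi in zip(shared_w, x)]
    new_head_a = head_a - lr * 2 * err_a * feature
    new_head_b = head_b - lr * 2 * weight_b * err_b * feature
    return new_shared, new_head_a, new_head_b


# Made-up numbers, for illustration only.
params = ([0.1, 0.1], 1.0, 1.0)
x, target_a, target_b = [1.0, 2.0], 1.0, 2.0
before = joint_loss(params, x, target_a, target_b)
params = train_step(params, x, target_a, target_b)
after = joint_loss(params, x, target_a, target_b)
print(before, after)
```

The point of the sketch is the `grad_feature` line: because both heads read the same feature, each task acts as an extra training signal for the other.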
Highlights
Almost all of the papers were known in advance, since their texts were already on arXiv.org. So presenting works such as Everybody Dance Now, FUNIT and Image2StyleGAN looked rather odd: these are very useful works, but by no means new. It seems the classic process of scientific publication is breaking down here, because science is moving too fast.
It is very hard to pick the best works: there are many of them, and the topics differ. Several papers received
We want to highlight the works that are interesting from the image manipulation point of view, since this is our topic. They turned out to be fresh and interesting for us (we do not claim to be objective).
SinGAN (best paper award) and InGAN
SinGAN:
InGAN:
A development of the Deep Image Prior idea by Dmitry Ulyanov, Andrea Vedaldi and Victor Lempitsky. Instead of training a GAN on a dataset, the networks learn from patches of one and the same image in order to memorize its internal statistics. The trained network lets you edit and animate photos (SinGAN) or generate new images of any size from the textures of the original image while preserving local structure (InGAN).
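To show what "learning from patches of a single image" means in terms of data, here is a rough sketch of how one image can be turned into a multi-scale set of training patches. It illustrates the general idea only and is not the SinGAN or InGAN pipeline; the helper names, the 2x average-pooling pyramid and the patch size are our assumptions.

```python
def downsample2x(image):
    """Halve the resolution of a grayscale image by averaging 2x2 blocks."""
    rows, cols = len(image) // 2, len(image[0]) // 2
    return [[(image[2 * r][2 * c] + image[2 * r][2 * c + 1]
              + image[2 * r + 1][2 * c] + image[2 * r + 1][2 * c + 1]) / 4
             for c in range(cols)] for r in range(rows)]


def pyramid(image, min_size=4):
    """Return versions of the image ordered from coarsest to finest."""
    levels = [image]
    while min(len(levels[-1]), len(levels[-1][0])) // 2 >= min_size:
        levels.append(downsample2x(levels[-1]))
    return levels[::-1]


def patches(image, size=3):
    """All overlapping size x size patches of the image."""
    rows, cols = len(image), len(image[0])
    return [[row[c:c + size] for row in image[r:r + size]]
            for r in range(rows - size + 1)
            for c in range(cols - size + 1)]


# A synthetic 16x16 gradient stands in for the single training photo.
photo = [[(r * 16 + c) / 255 for c in range(16)] for r in range(16)]
for level in pyramid(photo):
    print(len(level), "x", len(level[0]), "->", len(patches(level)), "patches")
```

Even one small image yields hundreds of overlapping patches across scales, which is what makes training on a single picture feasible at all.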
Seeing What a GAN Cannot Generate
Image-generating neural networks usually take a random noise vector as input. In a trained network, the input vectors form a space in which small movements lead to small changes in the image. Using optimization, you can solve the inverse problem: find a suitable input vector for a real-world image. The author shows that it is almost never possible to find a perfectly matching image inside the network. Some objects in the image are simply not generated (apparently because these objects vary too much).
The author hypothesizes that a GAN does not cover the whole space of images, but only a subset riddled with holes, like cheese. When we try to find real-world photos in it, we always fail, because the GAN still generates images that are not quite real. The gap between real and generated images can be closed only by changing the weights of the network, that is, by retraining it for a specific photo.
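The two steps described above, inversion and then per-image fine-tuning, can be sketched on a toy model. This is only an illustration under strong assumptions of our own: the "generator" is a linear map with a weight matrix, the "photo" is a three-pixel vector, and both steps use plain gradient descent. None of it is the author's code.

```python
def generate(weights, z):
    """Toy linear 'generator': one output pixel per row of the weight matrix."""
    return [sum(w * zj for w, zj in zip(row, z)) for row in weights]


def sq_error(weights, z, photo):
    return sum((g - p) ** 2 for g, p in zip(generate(weights, z), photo))


def invert(weights, photo, steps=300, lr=0.05):
    """Step 1: search for the latent vector whose output is closest to the photo."""
    z = [0.0] * len(weights[0])
    for _ in range(steps):
        residual = [g - p for g, p in zip(generate(weights, z), photo)]
        grad = [2 * sum(residual[i] * weights[i][j] for i in range(len(weights)))
                for j in range(len(z))]
        z = [zj - lr * gj for zj, gj in zip(z, grad)]
    return z


def finetune(weights, z, photo, steps=300, lr=0.05):
    """Step 2: keep z fixed and adapt the generator weights to this one photo."""
    weights = [row[:] for row in weights]
    for _ in range(steps):
        residual = [g - p for g, p in zip(generate(weights, z), photo)]
        for i, row in enumerate(weights):
            for j in range(len(row)):
                row[j] -= lr * 2 * residual[i] * z[j]
    return weights


# The third pixel is a 'hole': this generator can never produce it from any z.
generator_weights = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
photo = [1.0, 2.0, 0.5]
z = invert(generator_weights, photo)
tuned = finetune(generator_weights, z, photo)
print(sq_error(generator_weights, z, photo), sq_error(tuned, z, photo))
```

In the toy example the best latent vector still leaves a residual error, because the third pixel lies outside what the generator can produce; only after the weights are adapted does the error vanish.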
Once the network has been fine-tuned for a specific photo, you can try various manipulations on that image. In one example, a window was added to the photo, and the network additionally generated reflections on the kitchen set. This means that even after fine-tuning on the photo, the network has not lost its ability to see relationships between the objects in the scene.
GANalyze: Toward Visual Definitions of Cognitive Image Properties
With the approach from this work, you can visualize and analyze what a neural network has learned. The authors propose training a GAN to create images for which the network produces specified predictions. The paper uses several networks as examples, including MemNet, which predicts how memorable a photo is. It turned out that, to be more memorable, the object in a photo should:
- be closer to the center,
- have a rounder or squarer shape and a simple structure,
- be on a uniform background,
- contain expressive eyes (at least for dog photos),
- be brighter, more saturated and, in some cases, redder.
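As a rough picture of how such a steering setup can work, here is a toy sketch. It is our own simplified reading of the idea rather than the authors' implementation: we look for a latent direction `theta` such that moving by `alpha` along it changes the scorer's output by about `alpha`. The linear `assessor` (a stand-in for a scorer such as MemNet), the toy `generator` and all numbers are assumptions.

```python
import random


def assessor(image):
    """Hypothetical stand-in for a scorer such as MemNet: a fixed linear score."""
    return sum(w * p for w, p in zip([0.5, -0.2, 0.1], image))


def generator(z):
    """Toy 'generator': the image is just the latent vector scaled by 2."""
    return [2.0 * v for v in z]


def learn_direction(steps=2000, lr=0.05, seed=0):
    """Fit theta so that a shift of alpha raises the score by about alpha."""
    rng = random.Random(seed)
    theta = [0.0, 0.0, 0.0]
    eps = 1e-4
    for _ in range(steps):
        z = [rng.gauss(0, 1) for _ in range(3)]
        alpha = rng.uniform(-1, 1)

        def loss(direction):
            shifted = [zi + alpha * di for zi, di in zip(z, direction)]
            target = assessor(generator(z)) + alpha
            return (assessor(generator(shifted)) - target) ** 2

        # Numerical gradient, so assessor and generator stay black boxes.
        grad = []
        for j in range(3):
            plus, minus = theta[:], theta[:]
            plus[j] += eps
            minus[j] -= eps
            grad.append((loss(plus) - loss(minus)) / (2 * eps))
        theta = [t - lr * g for t, g in zip(theta, grad)]
    return theta


theta = learn_direction()
z = [0.3, -1.2, 0.7]
moved = [zi + 0.5 * ti for zi, ti in zip(z, theta)]
print(assessor(generator(moved)) - assessor(generator(z)))
```

Once such a direction is learned, generating the same image at several values of `alpha` and comparing the results is what makes it possible to read off properties like the ones listed above.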
Liquid Warping GAN: A Unified Framework for Human Motion Imitation, Appearance Transfer and Novel View Synthesis
A pipeline for generating photos of people from a single photo. The authors show successful examples of transferring one person's motion to another, transferring clothes between people and generating new views of a person, all from one photograph. Unlike previous works, the conditions are set not with 2D keypoints (pose) but with a 3D body mesh (pose + shape). The authors also worked out how to transfer information from the original image to the generated one (Liquid Warping Block). The results look decent, but the output resolution is only 256x256. For comparison, vid2vid, which appeared a year earlier, can generate at 2048x1024, but it needs as much as 10 minutes of video footage as a dataset.
FSGAN: Subject Agnostic Face Swapping and Reenactment
At first glance there is nothing unusual here: a deepfake of more or less normal quality. But the main achievement of the work is swapping faces from a single picture. Previous works, by contrast, required training on many photos of a specific person. The pipeline turned out to be cumbersome (reenactment and segmentation, view interpolation, inpainting, blending) and full of technical hacks, but the result is worth it.
Detecting the Unexpected via Image Resynthesis
How can a self-driving car understand that an object has suddenly appeared in front of it that does not belong to any semantic segmentation class? There are several methods, but the authors propose a new, intuitive algorithm that works better than its predecessors. Semantic segmentation is predicted from the input road image. It is fed to a GAN (pix2pixHD), which tries to restore the original image from the semantic map alone. Anomalies that do not fall into any of the segments will differ significantly between the original and the generated image. The three images (original, segmentation and reconstruction) are then fed to another network, which predicts the anomalies. The dataset for this was generated from the well-known Cityscapes dataset by altering classes in the semantic segmentation. Interestingly, in this setting a dog standing in the middle of the road but correctly segmented (meaning there is a class for it) is not an anomaly, since the system was able to recognize it.
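The flow of the algorithm described above can be sketched as follows. This is a simplified illustration, not the authors' implementation: the segmentation model and the GAN are passed in as hypothetical callables, and the learned anomaly-prediction network is replaced by a plain thresholded pixel difference.

```python
def discrepancy_map(original, resynthesized):
    """Per-pixel absolute difference between two grayscale images."""
    return [[abs(a - b) for a, b in zip(row_o, row_r)]
            for row_o, row_r in zip(original, resynthesized)]


def detect_anomalies(image, segment, resynthesize, threshold=0.2):
    """Segment, rebuild the image from labels only, and flag what was not restored.

    `segment` and `resynthesize` stand in for the segmentation network and
    the GAN; the threshold replaces the learned discrepancy network.
    """
    labels = segment(image)
    rebuilt = resynthesize(labels)
    diff = discrepancy_map(image, rebuilt)
    return [[d > threshold for d in row] for row in diff]


# Toy stand-ins: two classes, 'sky' (bright) and 'road' (dark).
def toy_segment(image):
    return [[1 if pixel >= 0.5 else 0 for pixel in row] for row in image]


def toy_resynthesize(labels):
    return [[0.8 if label == 1 else 0.2 for label in row] for row in labels]


# The 0.55 pixel is an object that matches neither class well.
scene = [[0.8, 0.8, 0.8],
         [0.2, 0.55, 0.2],
         [0.2, 0.2, 0.2]]
mask = detect_anomalies(scene, toy_segment, toy_resynthesize)
print(mask)
```

The key property survives even in the toy version: anything the segmentation can name is rebuilt faithfully and stays unflagged, while an object with no matching class is rebuilt as something else and shows up in the difference.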
Conclusion
Before the conference, it is important to know what your scientific interests are, which talks you want to attend and whom you want to talk to. Then everything will be much more productive.
ICCV is, first and foremost, networking. You realize that there are top institutes and top scientists, you begin to understand how it all works, and you get to know people. The papers themselves you can read on arXiv; by the way, it is great that you do not have to travel anywhere to get the knowledge.
In addition, at the conference you can dive deep into topics that are far from your own and see the trends. And, of course, write out a list of papers to read. If you are a student, this is a chance to meet a potential supervisor; if you are from industry, a new employer; and if you are a company, a chance to show yourself.
Source: www.habr.com