NVIDIA open-sources StyleGAN3, a machine learning system for face synthesis

NVIDIA has published the source code of StyleGAN3, a machine learning system based on generative adversarial networks (GAN) for synthesizing realistic images of human faces. The code is written in Python using the PyTorch framework and is distributed under the NVIDIA Source Code License, which prohibits commercial use.

Pre-trained models are also available for download. They were trained on the Flickr-Faces-HQ (FFHQ) collection, which contains 70 thousand high-quality (1024x1024) PNG images of human faces. There are also models built from the AFHQv2 (photos of animal faces) and MetFaces (human faces taken from classical portrait paintings) collections. Development focuses on faces, but the system can be trained to generate any kind of object, such as landscapes and cars. Tools are also provided for training the neural network on your own image collections. Running it requires one or more NVIDIA graphics cards (Tesla V100 or A100 GPUs are recommended) with at least 12 GB of memory, PyTorch 1.9, and the CUDA 11.1+ toolkit. A separate tool has been developed for detecting whether a face was artificially generated.
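Before training or generating, it can help to check the environment against the requirements quoted here (PyTorch 1.9, CUDA 11.1+, at least 12 GB of memory). The sketch below is a hypothetical helper, not part of the StyleGAN3 repository; it assumes the 12 GB figure refers to GPU memory and takes the version strings and memory size as plain arguments so it runs without PyTorch installed.

```python
import re

# Hypothetical thresholds mirroring the quoted requirements.
# Assumption: the 12 GB figure applies to GPU memory.
MIN_TORCH = (1, 9)
MIN_CUDA = (11, 1)
MIN_GPU_MEM_GB = 12

def parse_version(text):
    """Extract the leading (major, minor) pair from strings like '1.9.0+cu111'."""
    match = re.match(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return int(match.group(1)), int(match.group(2))

def check_requirements(torch_version, cuda_version, gpu_mem_bytes):
    """Return a list of problems found; an empty list means every check passed."""
    problems = []
    if parse_version(torch_version) < MIN_TORCH:
        problems.append(f"PyTorch {torch_version} is older than 1.9")
    if cuda_version is None:
        # CPU-only PyTorch builds report no CUDA version at all.
        problems.append("no CUDA toolkit detected")
    elif parse_version(cuda_version) < MIN_CUDA:
        problems.append(f"CUDA {cuda_version} is older than 11.1")
    if gpu_mem_bytes < MIN_GPU_MEM_GB * 1024**3:
        gib = gpu_mem_bytes / 1024**3
        problems.append(f"GPU has {gib:.1f} GB, need {MIN_GPU_MEM_GB} GB")
    return problems
```

With PyTorch installed, the arguments could come from `torch.__version__`, `torch.version.cuda` (which is `None` on CPU-only builds), and `torch.cuda.get_device_properties(0).total_memory`.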

The system lets you synthesize new faces by interpolating the features of several faces and combining their traits, and lets you adjust the final image for age, gender, hair length, smile, nose shape, skin color, glasses, and camera angle. The generator treats an image as a collection of styles: it automatically separates fine details (freckles, hair, glasses) from high-level attributes (pose, gender, age-related changes) and lets you combine them in any form, with the dominant properties set through weight coefficients. The result is images that are practically indistinguishable from real photographs.
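The two operations described here, interpolating between faces and mixing styles with a weight coefficient, can be illustrated on toy latent vectors. This is a simplified sketch under stated assumptions, not StyleGAN3's API: the latents are short hypothetical lists, the mapping and synthesis networks are omitted, and `crossover` and `weight` are illustrative parameter names. Early (coarse) layers take face A's latent, and later (fine) layers are pulled toward face B's latent by the weight.

```python
def lerp(a, b, t):
    """Linear interpolation between two latent vectors: t=0 gives a, t=1 gives b."""
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]

def mix_styles(w_a, w_b, num_layers, crossover, weight=1.0):
    """Build one latent per synthesis layer (hypothetical simplification).

    Layers before `crossover` keep face A's latent (coarse, high-level attributes);
    later layers blend toward face B's latent by `weight` (finer detail).
    """
    per_layer = []
    for layer in range(num_layers):
        if layer < crossover:
            per_layer.append(list(w_a))
        else:
            per_layer.append(lerp(w_a, w_b, weight))
    return per_layer

# Usage: two toy 2-D latents, four layers, blend the last two layers halfway.
face_a = [0.0, 0.0]
face_b = [2.0, 4.0]
layers = mix_styles(face_a, face_b, num_layers=4, crossover=2, weight=0.5)
```

Sweeping `t` in `lerp` from 0 to 1 gives a morph between the two faces, while moving `crossover` changes which attributes each face contributes.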

The first version of StyleGAN was published in 2019, followed in 2020 by the improved StyleGAN2, which raised image quality and eliminated some artifacts. However, the system remained static, i.e. it did not allow realistic animation or natural facial movement. The main goal in developing StyleGAN3 was to adapt the technology for use in animation and video.

StyleGAN3 uses a redesigned alias-free image-generation architecture and offers new neural network training scenarios. It includes new utilities for interactive visualization (visualizer.py), spectral analysis (avg_spectra.py), and video generation (gen_video.py). The implementation also reduces memory consumption and speeds up training.
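Spectral analysis of the kind avg_spectra.py is named for helps reveal aliasing: energy appearing at frequencies where it should not be. The sketch below is a one-dimensional analogue under simplifying assumptions; it is not the tool's implementation, which works on 2-D images. It computes the power spectrum of each signal with a direct discrete Fourier transform and averages the spectra bin by bin.

```python
import cmath
import math

def power_spectrum(signal):
    """|DFT|^2 of a 1-D real signal, using a direct O(N^2) transform."""
    n = len(signal)
    spectrum = []
    for k in range(n):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))
        spectrum.append(abs(acc) ** 2)
    return spectrum

def average_spectrum(signals):
    """Average the power spectra of equally long signals, frequency bin by bin."""
    spectra = [power_spectrum(s) for s in signals]
    count = len(spectra)
    return [sum(values) / count for values in zip(*spectra)]
```

For a pure cosine completing two cycles over 8 samples, the energy lands only in bins 2 and 6 (the positive and negative frequency); energy in any other bin would indicate distortion such as aliasing.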

A key feature of the StyleGAN3 architecture is the shift to interpreting all signals in the neural network as continuous processes. This makes it possible, when generating parts of an image, to work with relative positions that are attached to the surface of the depicted objects rather than to the absolute coordinates of individual pixels. In StyleGAN and StyleGAN2, the binding to pixels during generation caused problems in dynamic rendering: for example, when the image moved, small details such as wrinkles and hairs fell out of step and appeared to move separately from the rest of the face. StyleGAN3 solves these problems, making the technology suitable for video generation.
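The difference between pixel-bound and continuous signals can be shown with a 1-D toy example of a sub-pixel shift. This is an illustration under simplifying assumptions, not StyleGAN3's code; the network itself relies on carefully designed low-pass filtering in its layers rather than this routine. `fourier_shift` treats the samples as a band-limited continuous signal, so a 0.4-sample shift moves the content exactly. `snap_shift` can only move content in whole pixels, so small motions are lost, which is analogous to details "sticking" to the pixel grid.

```python
import cmath
import math

def fourier_shift(samples, shift):
    """Translate a band-limited periodic signal by a possibly fractional number of samples.

    Each frequency component is phase-shifted, so the result equals the
    underlying continuous function evaluated at t - shift.
    """
    n = len(samples)
    coeffs = [sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples))
              for k in range(n)]
    shifted = []
    for t in range(n):
        value = 0
        for k, c in enumerate(coeffs):
            freq = k if k < n / 2 else k - n  # signed frequency keeps the phase correct
            value += c * cmath.exp(2j * math.pi * freq * (t - shift) / n)
        shifted.append((value / n).real)
    return shifted

def snap_shift(samples, shift):
    """Pixel-bound alternative: round the shift to whole samples, then move the grid."""
    n = len(samples)
    step = round(shift)
    return [samples[(t - step) % n] for t in range(n)]
```

With a cosine sampled on 16 points and a shift of 0.4, `fourier_shift` matches the shifted continuous function to within floating-point error, while `snap_shift` rounds the shift to zero and leaves the signal where it was.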

Separately, NVIDIA and Microsoft have announced MT-NLG, described as the largest language model, based on a deep neural network with the "transformer" architecture. The model has 530 billion parameters and was trained on a cluster of 4480 GPUs (560 DGX A100 servers, each with eight A100 80GB GPUs). Applications include natural-language processing tasks such as predicting the completion of unfinished sentences, answering questions, reading comprehension, drawing natural-language inferences, and disambiguating word meanings.
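A quick back-of-the-envelope calculation shows why a model of this size needs a cluster. The server and GPU counts come from the figures above; the 2 bytes per parameter (16-bit weights) and 1 GB = 10^9 bytes are assumptions made for this sketch.

```python
import math

SERVERS = 560
GPUS_PER_SERVER = 8
GPU_MEMORY_GB = 80
PARAMETERS = 530e9
BYTES_PER_PARAMETER = 2  # assumption: weights stored in 16-bit precision

total_gpus = SERVERS * GPUS_PER_SERVER
weights_gb = PARAMETERS * BYTES_PER_PARAMETER / 1e9  # assumption: 1 GB = 10**9 bytes
min_gpus_for_weights = math.ceil(weights_gb / GPU_MEMORY_GB)

print(total_gpus)            # 4480
print(weights_gb)            # 1060.0
print(min_gpus_for_weights)  # 14
```

Even the bare weights, about 1060 GB, would not fit on fewer than 14 such GPUs; training also has to hold gradients, optimizer state, and activations, so the real memory requirement is considerably higher than this lower bound.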

Source: opennet.ru
