The Large Hadron Collider and Odnoklassniki

Continuing the topic of machine learning competitions on Habr, we would like to introduce readers to two more platforms. They are certainly not as huge as Kaggle, but they definitely deserve attention.

Personally, I don't like Kaggle too much, for several reasons:

  • first, competitions there often last several months, and active participation takes a lot of effort;
  • second, public solutions. Kaggle adherents advise treating them with the calm of Tibetan monks, but in reality it is rather a shame when something you have been working towards for a month or two suddenly turns out to be handed to everyone on a silver platter.

Fortunately, machine learning competitions are also held on other platforms, and a couple of them are discussed below.

IDAO
Official language: English.
Organizers: Yandex, Sberbank, HSE.
Dates: online round Jan 15 - Feb 11, 2019; on-site final Apr 4-6, 2019.
Task: using a set of data about a particle in the Large Hadron Collider (trajectory, energy and other rather complex physical parameters), determine whether it is a muon or not. Two tasks were derived from this statement:
- in one you only had to submit your predictions,
- in the other you had to submit the complete code and the model for prediction, and execution was subject to fairly strict limits on running time and memory use.
Metric: a complex custom metric, something like ROC-AUC.
Prizes, first stage: T-shirts for N places and passage to the second stage, where accommodation and meals were paid for during the competition.
Prizes, second stage: ??? (for certain reasons I was not present at the award ceremony and could not find out what the prizes were in the end). Laptops were promised to all members of the winning team.
Community: official Telegram group, ~190 participants, communication in English; questions had to wait several days for an answer.
Baselines: the organizers provided two baseline solutions, simple and advanced. The simple one required less than 16 GB of RAM, and the advanced one did not fit into 16 GB. At the same time, looking ahead a little, the participants were not able to significantly outperform the advanced solution. There were no difficulties running these solutions. It should be noted that the advanced example contained a comment with a hint about where to start improving the solution.

SNA Hackathon 2019
Official language: Russian.
Organizers: Mail.ru Group.
Dates: online from February 7 to March 15; offline from March 30 to April 1.
Task: for the SNA Hackathon, logs of content shown from open groups in users' news feeds in February-March 2018 were collected. The test set contains the last week and a half of March. Each log entry contains information about what was shown and to whom, as well as how the user reacted to the content: rated it, commented on it, ignored it, or hid it from the feed. The essence of the SNA Hackathon tasks is to rank the feed for each user of the Odnoklassniki social network, raising as high as possible the posts that will receive a "class". At the online stage the task was divided into 3 parts:
1. rank posts by various collaborative features
2. rank posts by the images they contain
3. rank posts by the text they contain
Metric: ROC-AUC averaged over users.
Prizes, first stage: T-shirts for the 100 best participants and passage to the second stage, where travel to Moscow, accommodation and meals during the competition were paid for. Also, closer to the end of the first stage, prizes were announced for the best in each of the three first-stage tasks: each winner received an RTX-series TI video card!
Prizes, second stage: this was a team stage, with teams of 2 to 5 people:
1st place - 300 rubles
2nd place - 200 rubles
3rd place - 100 rubles
jury prize - 100 rubles
Community: official Telegram group, ~1500 participants, active discussion of the tasks between participants and organizers.
Baselines: basic solutions were provided for each task, and participants surpassed them easily. In the first days of the competition, participants ran into several difficulties: first, the data was provided in Apache Parquet format, and not every combination of Python and the parquet package worked without errors. The second difficulty was downloading the images from the Mail.ru cloud; at the moment there is no easy way to download a large amount of data at once. As a result, these problems held participants up for a couple of days.
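
The SNA Hackathon metric, ROC-AUC averaged over users, can be illustrated with a minimal sketch. The organizers' exact implementation is not given in this article, so the function names below are hypothetical and skipping users who have only one class of reaction is an assumption.

```python
from collections import defaultdict


def roc_auc(labels, scores):
    """ROC-AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None  # AUC is undefined when only one class is present
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_auc_by_user(user_ids, labels, scores):
    """Average the per-user ROC-AUC (assumption: single-class users are skipped)."""
    by_user = defaultdict(lambda: ([], []))
    for user, y, s in zip(user_ids, labels, scores):
        by_user[user][0].append(y)
        by_user[user][1].append(s)
    aucs = [roc_auc(ys, ss) for ys, ss in by_user.values()]
    aucs = [a for a in aucs if a is not None]
    return sum(aucs) / len(aucs)


# Toy feed log: user "c" has only positives and does not contribute.
users = ["a", "a", "a", "b", "b", "c"]
labels = [1, 0, 0, 1, 0, 1]
scores = [0.9, 0.2, 0.95, 0.3, 0.6, 0.5]
result = mean_auc_by_user(users, labels, scores)
print(result)
```

The pairwise loop is quadratic per user, which is fine for a toy example; on real logs a rank-based implementation would be the more practical choice.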

IDAO. First stage

The task was to classify particles as muon/non-muon according to their characteristics. The key feature of this task was the presence of a weight column in the training data, which the organizers themselves interpreted as confidence in the answer for that row. The problem was that quite a few rows contained negative weights.

After thinking for a few minutes about the line with the hint (the hint simply drew attention to this property of the weight column) and plotting a graph, we decided to check 3 options:

1) invert the target for rows with negative weights (and adjust the weights accordingly)
2) shift the weights by the minimum value so that they start from 0
3) do not use the row weights

The third option turned out to be the worst, while the first two improved the result; the best was option No.
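
The three weight-handling options listed above can be sketched on toy arrays. The data here is invented for illustration, and reading "adjust the weights accordingly" as taking the absolute value is an assumption.

```python
import numpy as np

# Toy labels and weights; the real column names and values may differ.
label = np.array([1, 0, 1, 0, 1])
weight = np.array([2.0, -1.5, 0.5, 3.0, -0.2])

# Option 1: flip the target where the weight is negative
# and (assumption) use the absolute value of the weight.
negative = weight < 0
label_flipped = np.where(negative, 1 - label, label)
weight_abs = np.abs(weight)

# Option 2: shift all weights so that the smallest one becomes 0.
weight_shifted = weight - weight.min()

# Option 3: ignore the weights, which is the same as giving every row weight 1.
weight_none = np.ones_like(weight)

print(label_flipped, weight_abs, weight_shifted, weight_none)
```

Each pair of label and weight arrays would then be passed to the model as the training target and the sample weights.
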
Our next step was to look at the missing values. The organizers gave us data that had already been preprocessed; it contained a number of missing values, and they had been replaced with -9999.

We found missing values in the MatchedHit_{X,Y,Z}[N] and MatchedHit_D{X,Y,Z}[N] columns, and only when N=2 or 3. As we understand it, some particles did not pass through all 4 detectors and stopped at either the 3rd or the 4th plate. The data also contained Lextra_{X,Y}[N] columns, which apparently describe the same thing as MatchedHit_{X,Y,Z}[N], but using some kind of extrapolation. These meager guesses suggested that Lextra_{X,Y}[N] could be substituted for the missing values in MatchedHit_{X,Y,Z}[N] (for the X and Y coordinates only). MatchedHit_Z[N] was filled well with the median. These manipulations allowed us to reach 1st intermediate place on both tasks.
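
A minimal sketch of this imputation with pandas follows. The column names are taken from the text, while the helper name `fill_matched_hits`, its `plates` parameter and the toy values are hypothetical.

```python
import pandas as pd

MISSING = -9999  # placeholder the organizers used for missing values


def fill_matched_hits(df, plates=(2, 3)):
    """Fill missing MatchedHit X/Y from Lextra and missing Z with the median."""
    df = df.copy()
    for n in plates:
        for axis in ("X", "Y"):
            hit = f"MatchedHit_{axis}[{n}]"
            extra = f"Lextra_{axis}[{n}]"
            # Keep the measured hit where present, else take the Lextra value.
            df[hit] = df[hit].where(df[hit] != MISSING, df[extra])
        z = f"MatchedHit_Z[{n}]"
        # Median over the rows where Z is actually present.
        median = df.loc[df[z] != MISSING, z].median()
        df[z] = df[z].where(df[z] != MISSING, median)
    return df


# Toy frame with a single plate index to keep the example short.
toy = pd.DataFrame({
    "MatchedHit_X[2]": [10.0, MISSING, 30.0],
    "MatchedHit_Y[2]": [MISSING, 5.0, 6.0],
    "MatchedHit_Z[2]": [100.0, MISSING, 120.0],
    "Lextra_X[2]": [11.0, 21.0, 31.0],
    "Lextra_Y[2]": [4.0, 5.5, 6.5],
})
filled = fill_matched_hits(toy, plates=(2,))
print(filled)
```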

Considering that nothing was awarded for winning the first stage, we could have stopped there, but we carried on, drew some pretty pictures and came up with new features.

For example, we found that if we plot the intersection points of a particle with each of the four detector plates, the points on each plate are grouped into 5 rectangles with an aspect ratio of 4 to 5, centered at the point (0,0), and there are no points in the first rectangle.

Plate / rectangle   1         2           3           4           5
Plate 1             500x625   1000x1250   2000x2500   4000x5000   8000x10000
Plate 2             520x650   1040x1300   2080x2600   4160x5200   8320x10400
Plate 3             560x700   1120x1400   2240x2800   4480x5600   8960x11200
Plate 4             600x750   1200x1500   2400x3000   4800x6000   9600x12000

Having determined these dimensions, we added 4 new categorical features for each particle: the number of the rectangle in which it crosses each plate.
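
One way to compute such a rectangle number is sketched below. It assumes the table gives the full width x height of rectangles centered at (0, 0), with X along the first dimension, and it returns 0 for points outside the largest rectangle; the function name is hypothetical.

```python
# (width, height) of the 5 nested rectangles per plate, copied from the table.
RECT_SIZES = {
    1: [(500, 625), (1000, 1250), (2000, 2500), (4000, 5000), (8000, 10000)],
    2: [(520, 650), (1040, 1300), (2080, 2600), (4160, 5200), (8320, 10400)],
    3: [(560, 700), (1120, 1400), (2240, 2800), (4480, 5600), (8960, 11200)],
    4: [(600, 750), (1200, 1500), (2400, 3000), (4800, 6000), (9600, 12000)],
}


def rectangle_number(x, y, plate):
    """Return the number (1-5) of the smallest rectangle containing (x, y).

    Returns 0 if the point lies outside all five rectangles (assumption).
    """
    for number, (width, height) in enumerate(RECT_SIZES[plate], start=1):
        if abs(x) <= width / 2 and abs(y) <= height / 2:
            return number
    return 0


print(rectangle_number(300, 100, plate=1))
```

Applied to the hit coordinates on each of the 4 plates, this yields the 4 categorical features described above.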

We also noticed that particles scatter sideways from the center, and the idea arose to somehow evaluate the "quality" of this scattering. Ideally, one could probably come up with some kind of "ideal" parabola depending on the entry point and estimate the deviation from it, but we limited ourselves to an "ideal" straight line. Having built such ideal straight lines for each entry point, we were able to calculate the standard deviation of each particle's trajectory from this line. Since the average deviation was 152 for target = 1 and 390 for target = 0, we tentatively assessed this feature as a good one. And indeed, it immediately made it to the top of the most useful features.

We were delighted and added the deviation of each of the 4 intersection points from the ideal straight line as 4 additional features for each particle (and they also worked well).
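
The article does not say how the "ideal" straight line was constructed, so the following is only a sketch under a strong assumption: the line runs from the origin through the hit on the first plate and is extrapolated to the other plates. The function name and the toy coordinates are hypothetical.

```python
import numpy as np


def line_deviations(x, y, z):
    """Deviation of each hit from a straight line through the first hit.

    x, y, z: arrays of shape (n_particles, 4) with the hit coordinates
    on the 4 plates. Assumption: the line starts at the origin.
    """
    x, y, z = (np.asarray(a, dtype=float) for a in (x, y, z))
    scale = z / z[:, [0]]          # distance of each plate relative to the first
    dx = x - x[:, [0]] * scale     # offset from the extrapolated line in X
    dy = y - y[:, [0]] * scale     # offset from the extrapolated line in Y
    per_plate = np.hypot(dx, dy)   # one deviation per plate
    rms = np.sqrt((per_plate ** 2).mean(axis=1))  # one value per particle
    return per_plate, rms


# Particle 0 flies straight; particle 1 is deflected on the last plate.
x = [[1, 2, 3, 4], [1, 2, 3, 7]]
y = [[2, 4, 6, 8], [0, 0, 0, 4]]
z = [[10, 20, 30, 40], [10, 20, 30, 40]]
per_plate, rms = line_deviations(x, y, z)
print(per_plate, rms)
```

With this particular construction the deviation on the first plate is always zero, so it would carry no information; the line used in the competition was presumably defined differently.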

The links to scientific papers on the topic of the competition, provided by the organizers, suggested that we were far from the first to solve this problem and that some kind of specialized software might exist. Having found a repository on GitHub where the IsMuonSimple, IsMuon and IsMuonLoose methods were implemented, we ported them to our code with minor modifications. The methods themselves were very simple: for example, if the energy is below a certain threshold, it is not a muon, otherwise it is a muon. Such simple features obviously could not give a gain when gradient boosting is used, so we also added a signed "distance" to the threshold. These features improved the result slightly as well. Perhaps, by analyzing the existing methods more thoroughly, it would have been possible to find stronger ones and add them to the features.

At the end of the competition we slightly tweaked the "quick" solution for the second task; in the end it differed from the baseline in the following ways:

  1. In rows with a negative weight the target was inverted
  2. Missing values in MatchedHit_{X,Y,Z}[N] were filled in
  3. Depth was reduced to 7
  4. Learning rate was reduced to 0.1 (it was 0.19)

As a result, we tried more features (not very successfully), tuned the parameters and trained catboost, lightgbm and xgboost, tried different blends of the predictions, and before the private leaderboard was opened we were confidently winning on the second task and were among the leaders on the first.

After the private leaderboard was opened, we were in 10th place on the first task and 3rd on the second. All the leaders got shuffled, and the score on the private part was higher than on the public leaderboard. It seems that the data was poorly stratified (or, for example, there were no rows with negative weights in the private part), and this was a little frustrating.

SNA Hackathon 2019 - Texts. First stage

The task was to rank user posts on the Odnoklassniki social network based on the text they contained; in addition to the text, there were a few more characteristics of each post (language, owner, date and time of creation, date and time of viewing).

As classical approaches to working with text, I would single out two options:

  1. Mapping each word into an n-dimensional vector space so that similar words have similar vectors (read more in our article), and then either taking the average word vector for the text or using mechanisms that take the relative position of words into account (CNN, LSTM/GRU).
  2. Using models that can work with whole sentences right away. For example, Bert. In theory, this approach should work better.
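
The "average word vector" variant of the first approach can be sketched as follows. The tiny embedding table is invented for illustration; in practice the vectors would come from a pre-trained model such as FastText.

```python
import numpy as np

# Toy 3-dimensional embeddings (hypothetical values, for illustration only).
EMBEDDINGS = {
    "cat": np.array([1.0, 0.0, 0.0]),
    "dog": np.array([0.0, 1.0, 0.0]),
    "runs": np.array([0.0, 0.0, 1.0]),
}
DIM = 3


def mean_text_vector(text):
    """Average the vectors of the known words; zeros if none are known."""
    vectors = [EMBEDDINGS[w] for w in text.lower().split() if w in EMBEDDINGS]
    if not vectors:
        return np.zeros(DIM)
    return np.mean(vectors, axis=0)


print(mean_text_vector("Cat dog"))
```

Averaging discards word order, which is exactly what the CNN and LSTM/GRU variants mentioned above try to recover.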

Since this was my first experience with texts, it would be wrong to teach anyone, so I will teach myself. Here is the advice I would give myself at the start of the competition:

  1. Before rushing to train something, look at the data! In addition to the text itself, the data had several columns, and it was possible to squeeze much more out of them than I did. The simplest thing is to apply mean target encoding to some of the columns.
  2. Don't train on all the data! There was a lot of data (about 17 million rows), and it was not at all necessary to use all of it to test hypotheses. Training and preprocessing were quite slow, and I clearly would have had time to test more interesting hypotheses.
  3. <Controversial advice> No need to look for a killer model. I spent a long time figuring out Elmo and Bert, hoping they would immediately take me to a high place, and in the end I used FastText embeddings pre-trained for the Russian language. I could not get a better score with Elmo, and I did not have time to figure out Bert.
  4. <Controversial advice> No need to look for one killer feature. Looking at the data, I noticed that about 1 percent of the texts do not actually contain any text! But they contained links to some resources, and I wrote a simple parser that opened the site and pulled out the title and description. It seemed like a good idea, but then I got carried away, decided to parse all the links for all the texts, and again lost a lot of time. None of this gave a significant improvement in the final result (although I did figure out stemming, for example).
  5. Classic features work. We google, for example, "text features kaggle", read and add everything. TF-IDF gave an improvement, as did statistical features such as text length, word count and the amount of punctuation.
  6. If there are DateTime columns, it is worth splitting them into several separate features (hours, days of the week, etc.). Which features to extract should be analyzed using graphs or some metrics. Here, on a whim, I did everything correctly and extracted the necessary features, but a proper analysis would not have hurt (for example, as we did in the final).
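
Several of these tips (DateTime splitting, simple text statistics, mean target encoding) can be sketched with pandas. The column names and the toy rows are hypothetical; the competition data had its own schema.

```python
import pandas as pd

# Toy posts (hypothetical schema, for illustration only).
posts = pd.DataFrame({
    "text": ["Hello world!", "", "Short post, with commas"],
    "lang": ["en", "ru", "en"],
    "created_at": pd.to_datetime(
        ["2018-03-01 08:15", "2018-03-03 23:40", "2018-03-05 12:00"]
    ),
    "liked": [1, 0, 0],
})

# Tip 6: split the DateTime column into separate features.
posts["hour"] = posts["created_at"].dt.hour
posts["weekday"] = posts["created_at"].dt.dayofweek  # Monday = 0

# Tip 5: simple statistical features of the text.
posts["n_chars"] = posts["text"].str.len()
posts["n_words"] = posts["text"].str.split().str.len()
posts["n_punct"] = posts["text"].str.count(r"[^\w\s]")

# Tip 1: mean target encoding of a categorical column.
# Computed on the same rows here for brevity; in practice the means
# should come from training folds only, to avoid target leakage.
posts["lang_te"] = posts.groupby("lang")["liked"].transform("mean")

print(posts.drop(columns=["text", "created_at"]))
```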

By the end of the competition I had trained one keras model with convolution over words and another based on LSTM and GRU. Both used pre-trained FastText embeddings for the Russian language (I tried a number of other embeddings, but these worked best). After averaging the predictions, I took the final 7th place out of 76 participants.

After the first stage, an article was published by Nikolai Anokhin, who took second place (he participated out of competition). Up to a certain point his solution repeated mine, but he went further thanks to the query-key-value attention mechanism.

Second stage OK & IDAO

The second stages of the two competitions took place almost back to back, so I decided to look at them together.

First, my newly acquired team and I ended up in the impressive office of the Mail.ru company, where our task was to combine the models from the three tracks of the first stage: texts, images and collaborative. A little over 2 days were allotted for this, which turned out to be very little. In fact, we were only able to repeat our first-stage results without getting any gain from the merge. In the end we took 5th place, but we were not able to use the text model. Judging by the solutions of other participants, it would have been worth trying to cluster the texts and add them to the collaborative model. A side effect of this stage was new impressions, meeting and talking with cool participants and organizers, and a severe lack of sleep, which may have affected the result of the IDAO final stage.

The task at the IDAO 2019 final stage was to predict the waiting time for an order for Yandex taxi drivers at the airport. At stage 2, 3 tasks = 3 airports were defined. For each airport, minute-by-minute data on the number of taxi orders over six months was given. The test data was the following month, together with minute-by-minute data on orders for the preceding 2 weeks. There was little time (1.5 days), the task was quite specific, and only one person from the team came to the competition; the result was a sad place near the end of the table. Interesting ideas included attempts to use external data: weather, traffic jams and Yandex taxi order statistics. Although the organizers did not say which airports these were, many participants assumed that they were Sheremetyevo, Domodedovo and Vnukovo. Although this assumption was refuted after the competition, features built, for example, from Moscow weather data improved the results both on validation and on the leaderboard.

Conclusion

  1. ML competitions are cool and interesting! Here you will find a use for data analysis skills and for cunning models and techniques, and plain common sense is welcome too.
  2. ML is already a huge body of knowledge that seems to be growing exponentially. I set myself the goal of getting acquainted with different areas (signals, images, tables, text) and have already realized how much there is to study. For example, after these competitions I decided to study: clustering algorithms, advanced techniques for working with gradient boosting libraries (in particular, working with CatBoost on the GPU), capsule networks, and the query-key-value attention mechanism.
  3. Not by Kaggle alone! There are many other competitions where it is easier to get at least a T-shirt, and where the chances of other prizes are higher.
  4. Communicate! There is already a large community in the field of machine learning and data analysis, there are thematic groups in Telegram and Slack, and serious people from Mail.ru, Yandex and other companies answer questions and help both beginners and those continuing their path in this field.
  5. I advise everyone who was inspired by the previous point to visit datafest, a major free conference in Moscow, which will take place on May 10-11.

Source: www.habr.com
