I have published the project on GitHub.
It is a simple GPU hash table capable of hundreds of millions of insertions per second. On my laptop's NVIDIA GTX 1060, the code inserts 64 million randomly generated key/value pairs in about 210 ms and deletes 32 million of those pairs in about 64 ms.
That works out to roughly 300 million insertions per second and 500 million deletions per second on a laptop.
The table is written in CUDA, although the same technique can be applied in HLSL or GLSL. To achieve high performance on a GPU, the implementation has several restrictions:
- Only 32-bit keys and 32-bit values are handled.
- The hash table has a fixed size.
- That size must be a power of two.
A simple sentinel value must be reserved to mark empty keys and values (in the code here it is 0xffffffff).
Lock-free hash table
The hash table uses open addressing. Each element of the table is a KeyValue:
struct KeyValue
{
    uint32_t key;
    uint32_t value;
};
The table size is a power of two rather than a prime number, because applying a pow2/AND mask takes a single fast instruction, whereas the modulo operator is much slower. This matters for linear probing: during a linear table lookup the slot index must be wrapped at every slot, so the cost of a modulo operation would be paid again at each step.
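The wrap-around step can be sketched in a few lines of host-side C++. The helper name next_slot and its capacity parameter are illustrative and not part of the original code; the sketch only assumes that the capacity is a power of two.

```cpp
#include <cassert>
#include <cstdint>

// Advance to the next slot, wrapping at the end of the table.
// For a power-of-two capacity, (x & (capacity - 1)) equals (x % capacity),
// but compiles to a single AND instead of a division.
inline uint32_t next_slot(uint32_t slot, uint32_t capacity)
{
    assert(capacity != 0 && (capacity & (capacity - 1)) == 0); // power of two only
    return (slot + 1) & (capacity - 1);
}
```

With a capacity of 8, slot 7 wraps back to slot 0, exactly as (7 + 1) % 8 would.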
The table stores only the key and value of each element, not the hash of the key. Since the table holds only 32-bit keys, the hash is very quick to compute. The code uses a Murmur3 hash, which performs only a few shifts, XORs, and multiplications.
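The shifts, XORs, and multiplications mentioned above match the well-known 32-bit Murmur3 finalizer. The following is a minimal host-side sketch under that assumption; the function name hash_key and the explicit capacity parameter are hypothetical, and the original project may structure this differently.

```cpp
#include <cassert>
#include <cstdint>

// Murmur3 32-bit finalizer (fmix32), followed by a mask that maps the result
// to a slot index. Assumes capacity is a power of two.
inline uint32_t hash_key(uint32_t k, uint32_t capacity)
{
    assert(capacity != 0 && (capacity & (capacity - 1)) == 0);
    k ^= k >> 16;
    k *= 0x85ebca6bu;
    k ^= k >> 13;
    k *= 0xc2b2ae35u;
    k ^= k >> 16;
    return k & (capacity - 1);
}
```

Because the final step is a mask, the result is always a valid slot index for the given capacity.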
The hash table uses a lock-free technique that does not depend on memory ordering. Even if some writes are reordered relative to other writes, the hash table still remains in a valid state. This is discussed below. The technique works well on GPUs, where thousands of threads run concurrently.
The keys and values in the hash table are initialized to empty.
The code can be modified to handle 64-bit keys and values. Keys require atomic read, write, and compare-and-swap operations, and values require atomic read and write operations. Fortunately, in CUDA, reads and writes of 32-bit and 64-bit values are atomic as long as they are naturally aligned.
Hash table states
Each key/value pair in the hash table is in one of four states:
- The key and value are both empty. The hash table is initialized in this state.
- The key has been written, but the value has not been written yet. If another thread reads the data at this moment, it returns empty. This is fine: the same thing would have happened if the other thread had run slightly earlier, and this is a concurrent data structure.
- Both the key and the value have been written.
- The value is visible to other threads, but the key is not yet. This can happen because the CUDA programming model has a weakly ordered memory model. This is fine: in any case the key is still empty, even though the value no longer is.
An important nuance is that once a key has been written to a slot, it never moves, even when the key is deleted. This is discussed below.
The hash table code works even with weakly ordered memory models, in which the order of memory reads and writes is unknown. As you look at insertion, lookup, and deletion in the hash table, keep in mind that every key/value pair is in one of the four states described above.
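As an illustration only, the four states can be written down as a small classifier over a snapshot of one slot. The enum SlotState and the function classify are hypothetical names introduced for this sketch; the sentinel 0xffffffff is the one the article's code uses.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t kEmpty = 0xffffffffu; // sentinel for "no key" / "no value"

struct KeyValue
{
    uint32_t key;
    uint32_t value;
};

enum class SlotState { Empty, KeyOnly, KeyAndValue, ValueOnly };

// Classify a snapshot of one slot as seen by some reader thread.
inline SlotState classify(const KeyValue& kv)
{
    const bool has_key = kv.key != kEmpty;
    const bool has_value = kv.value != kEmpty;
    if (has_key)
        return has_value ? SlotState::KeyAndValue : SlotState::KeyOnly;
    return has_value ? SlotState::ValueOnly : SlotState::Empty;
}
```

Note that a deleted entry looks exactly like KeyOnly to a reader: the key is present and the value is empty, so a lookup returns empty in both cases.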
Inserting into the hash table
The CUDA function that inserts key/value pairs into the hash table looks like this:
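__device__ void gpu_hashtable_insert(KeyValue* hashtable, uint32_t key, uint32_t value)
{
    uint32_t slot = hash(key);
    while (true)
    {
        // Claim the slot if it is empty; atomicCAS returns the key that was there before.
        uint32_t prev = atomicCAS(&hashtable[slot].key, kEmpty, key);
        if (prev == kEmpty || prev == key)
        {
            // The slot was empty or already holds this key: store the value.
            hashtable[slot].value = value;
            break;
        }
        // The slot belongs to another key: probe the next one, wrapping around.
        slot = (slot + 1) & (kHashTableCapacity-1);
    }
}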
To insert a key, the code iterates over the hash table array, starting at the hash of the key being inserted. At each slot it performs an atomic compare-and-swap that compares the key in that slot with empty. If they match, the key in the slot is updated to the inserted key, and the slot's original key is returned. If that original key was empty or equal to the inserted key, the code has found the right slot and writes the inserted value into it.
If a single kernel call of gpu_hashtable_insert() contains several elements with the same key, any of their values may end up written to the key's slot. This is considered fine: one of the key/value writes during the call will succeed, but because everything happens in parallel across multiple threads, we cannot predict which memory write will be the last.
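The same claim-then-write pattern can be tried on the CPU. The sketch below is a host-side analogue, not the article's CUDA code: std::atomic's compare_exchange_strong stands in for atomicCAS, the Table and Slot types are hypothetical, and the multiplicative hash is just a stand-in for the Murmur3 hash. It assumes the table never becomes completely full.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <vector>

constexpr uint32_t kEmpty = 0xffffffffu;

struct Slot
{
    std::atomic<uint32_t> key{kEmpty};
    std::atomic<uint32_t> value{kEmpty};
};

struct Table
{
    std::vector<Slot> slots;
    uint32_t mask;
    explicit Table(uint32_t capacity) : slots(capacity), mask(capacity - 1)
    {
        assert(capacity != 0 && (capacity & (capacity - 1)) == 0);
    }
};

// Stand-in hash for this sketch (multiplicative hashing), not the article's Murmur3.
inline uint32_t hash_key(uint32_t key) { return key * 2654435761u; }

// Safe to call from several threads at once: the key is claimed with a CAS,
// and the value is written afterwards with a plain atomic store.
void insert(Table& t, uint32_t key, uint32_t value)
{
    uint32_t slot = hash_key(key) & t.mask;
    while (true)
    {
        uint32_t expected = kEmpty;
        // On failure, 'expected' receives the key currently stored in the slot.
        bool claimed = t.slots[slot].key.compare_exchange_strong(
            expected, key, std::memory_order_relaxed);
        if (claimed || expected == key)
        {
            t.slots[slot].value.store(value, std::memory_order_relaxed);
            return;
        }
        slot = (slot + 1) & t.mask;
    }
}

uint32_t lookup(const Table& t, uint32_t key)
{
    uint32_t slot = hash_key(key) & t.mask;
    while (true)
    {
        uint32_t k = t.slots[slot].key.load(std::memory_order_relaxed);
        if (k == key) return t.slots[slot].value.load(std::memory_order_relaxed);
        if (k == kEmpty) return kEmpty;
        slot = (slot + 1) & t.mask;
    }
}
```

Called sequentially, a second insert of the same key simply overwrites the value. Called from several threads with the same key, whichever store lands last determines the final value, which mirrors the behavior described above.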
Hash table lookup
The code that looks up a key:
__device__ uint32_t gpu_hashtable_lookup(KeyValue* hashtable, uint32_t key)
{
    uint32_t slot = hash(key);
    while (true)
    {
        // Found the key: return whatever value is currently stored with it.
        if (hashtable[slot].key == key)
        {
            return hashtable[slot].value;
        }
        // Reached an empty slot: the key is not in the table.
        if (hashtable[slot].key == kEmpty)
        {
            return kEmpty;
        }
        slot = (slot + 1) & (kHashTableCapacity - 1);
    }
}
To find the value of a key stored in the table, we iterate over the array, starting at the hash of the key we are looking for. At each slot we check whether the key is the one we want and, if so, return its value. We also check whether the key is empty and, if so, stop the search.
If the key cannot be found, the code returns the empty value.
All of these lookups can run concurrently with insertions and deletions. Every pair in the table is in one of the four states described above.
Deleting from the hash table
The code that deletes a key:
__device__ void gpu_hashtable_delete(KeyValue* hashtable, uint32_t key, uint32_t value)
{
    uint32_t slot = hash(key);
    while (true)
    {
        // Found the key: clear its value but leave the key in place.
        if (hashtable[slot].key == key)
        {
            hashtable[slot].value = kEmpty;
            return;
        }
        // Reached an empty slot: the key was never inserted.
        if (hashtable[slot].key == kEmpty)
        {
            return;
        }
        slot = (slot + 1) & (kHashTableCapacity - 1);
    }
}
Deletion is done in an unusual way: we leave the key in the table and mark its value (not the key itself) as empty. The code is very similar to lookup(), except that when a matching key is found, its value is set to empty.
As mentioned above, once a key has been written to a slot, it never moves. Even when an element is deleted from the table, the key stays in place and its value simply becomes empty. This means the write to the slot's value does not need to be atomic: whether the current value is empty or not, it ends up empty either way.
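To make the effect of leaving keys in place concrete, here is a small single-threaded C++ sketch. It deliberately uses an identity hash (key & mask) so the slot positions can be traced by hand; that hash, the helper names, and the claimed_slots counter are assumptions made for the example, and the table is assumed never to fill up completely.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr uint32_t kEmpty = 0xffffffffu;

struct KeyValue
{
    uint32_t key = kEmpty;
    uint32_t value = kEmpty;
};

inline uint32_t mask_of(const std::vector<KeyValue>& t)
{
    return static_cast<uint32_t>(t.size()) - 1; // size must be a power of two
}

void insert(std::vector<KeyValue>& t, uint32_t key, uint32_t value)
{
    const uint32_t mask = mask_of(t);
    for (uint32_t s = key & mask;; s = (s + 1) & mask)
    {
        if (t[s].key == kEmpty || t[s].key == key)
        {
            t[s].key = key;
            t[s].value = value;
            return;
        }
    }
}

uint32_t lookup(const std::vector<KeyValue>& t, uint32_t key)
{
    const uint32_t mask = mask_of(t);
    for (uint32_t s = key & mask; t[s].key != kEmpty; s = (s + 1) & mask)
        if (t[s].key == key) return t[s].value;
    return kEmpty;
}

// Deletion clears only the value; the key keeps its slot.
void erase(std::vector<KeyValue>& t, uint32_t key)
{
    const uint32_t mask = mask_of(t);
    for (uint32_t s = key & mask; t[s].key != kEmpty; s = (s + 1) & mask)
        if (t[s].key == key) { t[s].value = kEmpty; return; }
}

// Number of slots whose key is set, including deleted entries.
std::size_t claimed_slots(const std::vector<KeyValue>& t)
{
    std::size_t n = 0;
    for (const KeyValue& kv : t) n += (kv.key != kEmpty);
    return n;
}
```

In an 8-slot table, keys 1 and 9 both map to slot 1, so 9 lands in slot 2. After erasing key 1, slot 1 still holds the key, so the probe chain to key 9 stays intact, and re-inserting key 1 reuses its old slot. The flip side is that claimed_slots never decreases.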
Resizing the hash table
To resize the hash table, create a larger table and insert the non-empty elements of the old table into it. I did not implement this feature because I wanted to keep the sample code simple. Moreover, in CUDA programs memory allocation is often done in host code rather than in a CUDA kernel.
Concurrency
The function snippets above, gpu_hashtable_insert(), _lookup(), and _delete(), process one key/value pair at a time. The kernel versions of gpu_hashtable_insert(), _lookup(), and _delete() shown below process an array of pairs in parallel, with each pair handled by a separate GPU thread:
// CPU code to invoke the CUDA kernel on the GPU
uint32_t threadblocksize = 1024;
uint32_t gridsize = (numkvs + threadblocksize - 1) / threadblocksize;
gpu_hashtable_insert_kernel<<<gridsize, threadblocksize>>>(hashtable, kvs, numkvs);

// GPU code to process numkvs key/values in parallel
__global__ void gpu_hashtable_insert_kernel(KeyValue* hashtable, const KeyValue* kvs, unsigned int numkvs)
{
    unsigned int threadid = blockIdx.x*blockDim.x + threadIdx.x;
    // The last block may have more threads than remaining items.
    if (threadid < numkvs)
    {
        gpu_hashtable_insert(hashtable, kvs[threadid].key, kvs[threadid].value);
    }
}
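The grid-size expression in the host code is a ceiling division: it rounds up so that every pair gets a thread, and the bounds check in the kernel discards the surplus threads of the last block. A minimal sketch of that arithmetic, with the hypothetical helper name grid_size:

```cpp
#include <cassert>
#include <cstdint>

// Number of thread blocks needed to cover num_items with block_size threads each.
// Assumes num_items + block_size - 1 does not overflow 32 bits.
inline uint32_t grid_size(uint32_t num_items, uint32_t block_size)
{
    assert(block_size != 0);
    return (num_items + block_size - 1) / block_size;
}
```

For example, 1025 pairs with 1024 threads per block need two blocks, and 1023 threads of the second block do nothing.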
The lock-free hash table supports concurrent insertions, lookups, and deletions. Because key/value pairs are always in one of the four states and keys never move, the table remains correct even when different kinds of operations are mixed.
However, if a batch of insertions and deletions is processed in parallel and the input array of pairs contains duplicate keys, we cannot predict which pair will "win", that is, be written to the hash table last. Suppose we call the insertion code with the input array A/0 B/1 A/2 C/3 A/4. When the code finishes, the pairs B/1 and C/3 are guaranteed to be in the table, but any one of A/0, A/2, or A/4 may appear there. This may or may not be a problem; it depends on the application. You may know in advance that the input array contains no duplicate keys, or you may not care which value is written last.
If it is a problem, you need to split the duplicate pairs across different CUDA kernel calls. In CUDA, all operations of one kernel call always complete before the next kernel call begins (at least within a single stream; kernels in different streams run in parallel). In the example above, if you call one kernel with A/0 B/1 A/2 C/3 and then another with A/4, the key A ends up with the value 4.
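One possible way to do that splitting on the host is sketched below. It is not from the original project: the function split_into_batches is a hypothetical helper that places the i-th occurrence of each key into batch i, so no batch contains a key twice and the per-key order of the input is preserved.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

struct KeyValue
{
    uint32_t key;
    uint32_t value;
};

// Split the input so that each batch is free of duplicate keys.
// Launching one kernel per batch, in order and in the same stream,
// makes the last occurrence of every key the one that wins.
std::vector<std::vector<KeyValue>> split_into_batches(const std::vector<KeyValue>& input)
{
    std::vector<std::vector<KeyValue>> batches;
    std::unordered_map<uint32_t, std::size_t> seen; // key -> occurrences so far
    for (const KeyValue& kv : input)
    {
        const std::size_t round = seen[kv.key]++;
        if (round == batches.size()) batches.emplace_back();
        batches[round].push_back(kv);
    }
    return batches;
}
```

With A, B, and C encoded as keys 0, 1, and 2, the input A/0 B/1 A/2 C/3 A/4 yields three batches: A/0 B/1 C/3, then A/2, then A/4. This is more conservative than the two-kernel split in the text, which only isolates the final A/4, but it makes the final value of every key deterministic.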
Now let's consider whether the lookup() and delete() functions should use a plain or a volatile pointer to the array of pairs in the hash table. The CUDA documentation states:
The compiler is free to optimize reads and writes to global or shared memory ... These optimizations can be disabled using the volatile keyword: ... any reference to this variable compiles to an actual memory read or write instruction.
Correctness does not require volatile. If a thread uses a cached value from an earlier read, it is working with slightly stale information. That information still comes from a correct state of the hash table at some moment during the kernel call. If you need the latest information, you can use a volatile pointer, but performance then drops slightly: in my tests, when deleting 32 million elements, the rate fell from 500 million deletions per second to 450 million deletions per second.
Performance
In the test that inserts 64 million elements and then deletes 32 million of them, there is effectively no contest between std::unordered_map and the GPU hash table:
std::unordered_map spent 70,691 ms inserting and deleting the elements and then freeing the unordered_map (freeing millions of elements takes a long time because unordered_map performs many memory allocations). To be fair, std::unordered_map has entirely different constraints. It runs on a single CPU thread, supports keys and values of any size, performs well at high load factors, and shows stable performance after many deletions.
The GPU hash table, including communication between the GPU and the program, took 984 ms. This includes the time spent allocating and freeing the table's memory (a one-time allocation of 1 GB, which takes a while in CUDA), inserting and deleting elements, and iterating over them. All copies to and from GPU memory are also counted.
The hash table itself took 271 ms. This includes the time the GPU spent inserting and deleting elements, but not the time spent copying memory and iterating over the resulting table. This is the relevant result if the GPU table lives for a long time, or if the hash table resides entirely in GPU memory (for example, a hash table built for use by other GPU code rather than by the CPU).
The GPU hash table achieves high performance thanks to high throughput and massive parallelism.
Limitations
The hash table architecture has several issues to be aware of:
- Linear probing suffers from clustering, so keys end up placed less than ideally in the table.
- Keys are not removed by the delete function and clutter the table over time.
As a result, hash table performance can degrade gradually, especially if the table lives for a long time and sees many insertions and deletions. One way to mitigate these drawbacks is to rehash into a new table with a fairly low load factor, filtering out the deleted keys during the rehash.
To illustrate these issues, I use the code above to create a table with 128 million elements and insert 4 million elements at a time in a loop until 124 million slots are filled (a load factor of about 0.96). Here is the table of results; each row is one CUDA kernel call that inserts 4 million new elements into the same hash table.
Load factor    Time to insert 4 million elements
0.00           11.608448 ms (361.314798 million keys/s)
0.03           11.751424 ms (356.918799 million keys/s)
0.06           11.942592 ms (351.205515 million keys/s)
0.09           12.081120 ms (347.178429 million keys/s)
0.12           12.242560 ms (342.600233 million keys/s)
0.16           12.396448 ms (338.347235 million keys/s)
0.19           12.533024 ms (334.660176 million keys/s)
0.22           12.703328 ms (330.173626 million keys/s)
0.25           12.884512 ms (325.530693 million keys/s)
0.28           13.033472 ms (321.810182 million keys/s)
0.31           13.239296 ms (316.807174 million keys/s)
0.34           13.392448 ms (313.184256 million keys/s)
0.37           13.624000 ms (307.861434 million keys/s)
0.41           13.875520 ms (302.280855 million keys/s)
0.44           14.126528 ms (296.909756 million keys/s)
0.47           14.399328 ms (291.284699 million keys/s)
0.50           14.690304 ms (285.515123 million keys/s)
0.53           15.039136 ms (278.892623 million keys/s)
0.56           15.478656 ms (270.973402 million keys/s)
0.59           15.985664 ms (262.379092 million keys/s)
0.62           16.668673 ms (251.627968 million keys/s)
0.66           17.587200 ms (238.486174 million keys/s)
0.69           18.690048 ms (224.413765 million keys/s)
0.72           20.278816 ms (206.831789 million keys/s)
0.75           22.545408 ms (186.038058 million keys/s)
0.78           26.053312 ms (160.989275 million keys/s)
0.81           31.895008 ms (131.503463 million keys/s)
0.84           42.103294 ms (99.619378 million keys/s)
0.87           61.849056 ms (67.815164 million keys/s)
0.90           105.695999 ms (39.682713 million keys/s)
0.94           240.204636 ms (17.461378 million keys/s)
Performance degrades as the load factor grows. In most cases this is undesirable. If an application inserts elements into a table and then discards it (for example, when counting the words in a book), this is not a problem. But if the application uses a long-lived hash table (for example, a graphics editor that stores the non-empty parts of images, where the user frequently inserts and deletes information), this behavior can become a problem.
I measured the probe depth of the hash table after 64 million insertions (load factor 0.5). The average depth was 0.4774, so most keys were either in the best possible slot or one slot away from it. The maximum depth was 60.
I then measured the probe depth of a table with 124 million insertions (load factor 0.97). The average depth was already 10.1757, and the maximum depth was 6474 (!!). Linear probing performance drops sharply at high load factors.
It is best to keep this hash table's load factor low. That, however, buys performance at the cost of memory consumption. Fortunately, with 32-bit keys and values this can be justified. In the example above, if a table with 128 million elements is kept at a load factor of 0.25, it can hold no more than 32 million elements, and the remaining 96 million slots are wasted: at 8 bytes per pair, that is 768 MB of lost memory.
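The memory figure above is easy to re-derive. The sketch below does the arithmetic with a hypothetical helper, wasted_bytes, and assumes that "million" here means 2^20 slots, which is what makes 96 million slots of 8 bytes come out to exactly 768 MB.

```cpp
#include <cassert>
#include <cstdint>

// Bytes held by slots that stay unused when only `used` of `capacity` slots are filled.
inline uint64_t wasted_bytes(uint64_t capacity, uint64_t used, uint64_t bytes_per_slot)
{
    assert(used <= capacity);
    return (capacity - used) * bytes_per_slot;
}
```

For a capacity of 128 * 2^20 slots, 32 * 2^20 used slots, and 8 bytes per pair, the function returns 768 * 2^20 bytes. The whole table occupies 128 * 2^20 * 8 bytes, that is 1 GB, which agrees with the allocation size mentioned in the performance section.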
Note that this is a loss of GPU memory, which is a more precious resource than system memory. Although most modern desktop graphics cards that support CUDA have at least 4 GB of memory (at the time of writing, the NVIDIA 2080 Ti has 11 GB), losing that much is still not the wisest decision.
Later I will write in more detail about building GPU hash tables that have no problems with probe depth, and about ways to reuse deleted slots.
Measuring probe depth
To determine the probe depth of a key, subtract the key's hash (its ideal table index) from its actual table index:
// get_key_index() -> index of key in hash table
uint32_t probelength = (get_key_index(key) - hash(key)) & (hashtablecapacity-1);
Thanks to the magic of two's complement binary numbers and the fact that the hash table capacity is a power of two, this approach works even when the key index has wrapped around to the beginning of the table. Take a key that hashes to 1 but is inserted into slot 3. For a table of capacity 4 we get (3 - 1) & 3, which equals 2.
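The wrapped case is worth checking as well. In the sketch below, probe_length is a hypothetical standalone version of the expression above that takes the two indices directly; it relies on unsigned subtraction wrapping modulo 2^32 and on the capacity being a power of two.

```cpp
#include <cassert>
#include <cstdint>

// Distance from the ideal slot to the actual slot, measured in the probing direction.
inline uint32_t probe_length(uint32_t actual_index, uint32_t ideal_index, uint32_t capacity)
{
    assert(capacity != 0 && (capacity & (capacity - 1)) == 0);
    // If actual_index < ideal_index the subtraction wraps, and the mask
    // reduces the result to the correct distance around the table.
    return (actual_index - ideal_index) & (capacity - 1);
}
```

For capacity 4, a key that hashes to 1 and sits in slot 3 has depth 2. A key that hashes to 3 and wrapped around into slot 1 also has depth 2, because it passed through slots 3, 0, and 1.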
Conclusion
If you have questions or comments, feel free to email me.
This code was written with inspiration from these excellent articles:
- The World's Simplest Lock-Free Hash Table
- A Lock-Free Wait-Free Hash Table
In the future I will continue to write about GPU hash table implementations and analyze their performance. My plans include chaining, Robin Hood hashing, and cuckoo hashing using atomic operations in GPU-friendly data structures.
Source: habr.com