ํ—ˆํ”„๋งŒ ์••์ถ• ์•Œ๊ณ ๋ฆฌ์ฆ˜

์ฝ”์Šค ์‹œ์ž‘ ์ „ "๊ฐœ๋ฐœ์ž๋ฅผ ์œ„ํ•œ ์•Œ๊ณ ๋ฆฌ์ฆ˜" ๋‹ค๋ฅธ ์œ ์šฉํ•œ ์ž๋ฃŒ์˜ ๋ฒˆ์—ญ์„ ์ค€๋น„ํ–ˆ์Šต๋‹ˆ๋‹ค.

ํ—ˆํ”„๋งŒ ์ฝ”๋”ฉ์€ ํŒŒ์ผ ์••์ถ•์˜ ๊ธฐ๋ณธ ์•„์ด๋””์–ด๋ฅผ ๊ณต์‹ํ™”ํ•˜๋Š” ๋ฐ์ดํ„ฐ ์••์ถ• ์•Œ๊ณ ๋ฆฌ์ฆ˜์ž…๋‹ˆ๋‹ค. ์ด ๊ธฐ์‚ฌ์—์„œ๋Š” ๊ณ ์ • ๋ฐ ๊ฐ€๋ณ€ ๊ธธ์ด ์ธ์ฝ”๋”ฉ, ๊ณ ์œ ํ•˜๊ฒŒ ๋””์ฝ”๋”ฉ ๊ฐ€๋Šฅํ•œ ์ฝ”๋“œ, ์ ‘๋‘์‚ฌ ๊ทœ์น™ ๋ฐ Huffman ํŠธ๋ฆฌ ๊ตฌ์ถ•์— ๋Œ€ํ•ด ์„ค๋ช…ํ•ฉ๋‹ˆ๋‹ค.

์šฐ๋ฆฌ๋Š” ๊ฐ ๋ฌธ์ž๊ฐ€ 0๊ณผ 1์˜ ์‹œํ€€์Šค๋กœ ์ €์žฅ๋˜๊ณ  8๋น„ํŠธ๋ฅผ ์ฐจ์ง€ํ•œ๋‹ค๋Š” ๊ฒƒ์„ ์•Œ๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค. ๊ฐ ๋ฌธ์ž๊ฐ€ ๋™์ผํ•œ ๊ณ ์ • ๋น„ํŠธ ์ˆ˜๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ์ €์žฅํ•˜๊ธฐ ๋•Œ๋ฌธ์— ์ด๋ฅผ ๊ณ ์ • ๊ธธ์ด ์ธ์ฝ”๋”ฉ์ด๋ผ๊ณ  ํ•ฉ๋‹ˆ๋‹ค.

ํ…์ŠคํŠธ๊ฐ€ ์ฃผ์–ด์กŒ๋‹ค๊ณ  ๊ฐ€์ •ํ•ด ๋ด…์‹œ๋‹ค. ๋‹จ์ผ ๋ฌธ์ž๋ฅผ ์ €์žฅํ•˜๋Š” ๋ฐ ํ•„์š”ํ•œ ๊ณต๊ฐ„์„ ์–ด๋–ป๊ฒŒ ์ค„์ผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๊นŒ?

์ฃผ์š” ์•„์ด๋””์–ด๋Š” ๊ฐ€๋ณ€ ๊ธธ์ด ์ธ์ฝ”๋”ฉ์ž…๋‹ˆ๋‹ค. ํ…์ŠคํŠธ์˜ ์ผ๋ถ€ ๋ฌธ์ž๊ฐ€ ๋‹ค๋ฅธ ๋ฌธ์ž๋ณด๋‹ค ๋” ์ž์ฃผ ๋ฐœ์ƒํ•œ๋‹ค๋Š” ์‚ฌ์‹ค์„ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค(์—ฌ๊ธฐ๋ฅผ ๋ณด์•„๋ผ.) ๋” ์ ์€ ๋น„ํŠธ๋กœ ๋™์ผํ•œ ๋ฌธ์ž ์‹œํ€€์Šค๋ฅผ ๋‚˜ํƒ€๋‚ด๋Š” ์•Œ๊ณ ๋ฆฌ์ฆ˜์„ ๊ฐœ๋ฐœํ•ฉ๋‹ˆ๋‹ค. ๊ฐ€๋ณ€ ๊ธธ์ด ์ธ์ฝ”๋”ฉ์—์„œ๋Š” ์ฃผ์–ด์ง„ ํ…์ŠคํŠธ์— ์–ผ๋งˆ๋‚˜ ์ž์ฃผ ๋‚˜ํƒ€๋‚˜๋Š”์ง€์— ๋”ฐ๋ผ ๋ฌธ์ž์— ๊ฐ€๋ณ€ ๋น„ํŠธ ์ˆ˜๋ฅผ ํ• ๋‹นํ•ฉ๋‹ˆ๋‹ค. ๊ฒฐ๊ตญ ์ผ๋ถ€ ๋ฌธ์ž๋Š” 1๋น„ํŠธ๋งŒ ๊ฑธ๋ฆด ์ˆ˜๋„ ์žˆ๊ณ  ๋‹ค๋ฅธ ๋ฌธ์ž๋Š” 2๋น„ํŠธ, 3๋น„ํŠธ ์ด์ƒ ๊ฑธ๋ฆด ์ˆ˜๋„ ์žˆ์Šต๋‹ˆ๋‹ค. ๊ฐ€๋ณ€ ๊ธธ์ด ์ธ์ฝ”๋”ฉ์˜ ๋ฌธ์ œ๋Š” ์‹œํ€€์Šค์˜ ํ›„์† ๋””์ฝ”๋”ฉ์—๋งŒ ์žˆ์Šต๋‹ˆ๋‹ค.

์–ด๋–ป๊ฒŒ ๋น„ํŠธ ์‹œํ€€์Šค๋ฅผ ์•Œ๋ฉด ๋ช…ํ™•ํ•˜๊ฒŒ ๋””์ฝ”๋”ฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๊นŒ?

์„ ์„ ๊ณ ๋ คํ•˜์‹ญ์‹œ์˜ค "์•„๋ฐ•๋‹ต". 8์ž์ด๋ฉฐ ๊ณ ์ • ๊ธธ์ด๋ฅผ ์ธ์ฝ”๋”ฉํ•  ๋•Œ ์ €์žฅํ•˜๋ ค๋ฉด 64๋น„ํŠธ๊ฐ€ ํ•„์š”ํ•ฉ๋‹ˆ๋‹ค. ๊ธฐํ˜ธ ์ฃผํŒŒ์ˆ˜ "a", "b", "c" ะธ "NS" ๊ฐ๊ฐ 4, 2, 1, 1๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค. ์ƒ์ƒํ•ด ๋ด…์‹œ๋‹ค "์•„๋ฐ•๋‹ต" ๋” ์ ์€ ๋น„ํŠธ, ์‚ฌ์‹ค์„ ์‚ฌ์šฉํ•˜์—ฌ "์—" ๋ณด๋‹ค ์ž์ฃผ ๋ฐœ์ƒ "NS"๊ณผ "NS" ๋ณด๋‹ค ์ž์ฃผ ๋ฐœ์ƒ "์”จ" ะธ "NS". ์ฝ”๋”ฉ๋ถ€ํ„ฐ ์‹œ์ž‘ํ•˜์ž "์—" 0๊ณผ ๊ฐ™์€ XNUMX๋น„ํŠธ, "NS" 11๋น„ํŠธ ์ฝ”๋“œ 100์„ ํ• ๋‹นํ•˜๊ณ  011๋น„ํŠธ XNUMX๊ณผ XNUMX์„ ์‚ฌ์šฉํ•˜์—ฌ ์ธ์ฝ”๋”ฉํ•ฉ๋‹ˆ๋‹ค. "์”จ" ะธ "NS".

๊ฒฐ๊ณผ์ ์œผ๋กœ ๋‹ค์Œ์„ ์–ป์„ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

a
0

b
11

c
100

d
011

๊ทธ๋ž˜์„œ ๋ผ์ธ "์•„๋ฐ•๋‹ต" ์šฐ๋ฆฌ๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์ด ์ธ์ฝ”๋”ฉํ•ฉ๋‹ˆ๋‹ค 00110100011011 (0|0|11|0|100|011|0|11)์œ„์˜ ์ฝ”๋“œ๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ. ๊ทธ๋Ÿฌ๋‚˜ ์ฃผ์š” ๋ฌธ์ œ๋Š” ๋””์ฝ”๋”ฉ์— ์žˆ์Šต๋‹ˆ๋‹ค. ๋ฌธ์ž์—ด์„ ํ•ด๋…ํ•˜๋ ค๊ณ  ํ•  ๋•Œ 00110100011011, ๋‹ค์Œ๊ณผ ๊ฐ™์ด ๋‚˜ํƒ€๋‚ผ ์ˆ˜ ์žˆ์œผ๋ฏ€๋กœ ๋ชจํ˜ธํ•œ ๊ฒฐ๊ณผ๋ฅผ ์–ป์Šต๋‹ˆ๋‹ค.

0|011|0|100|011|0|11    adacdab
0|0|11|0|100|0|11|011   aabacabd
0|011|0|100|0|11|0|11   adacabab 

...
๊ธฐํƒ€

์ด๋Ÿฌํ•œ ๋ชจํ˜ธ์„ฑ์„ ํ”ผํ•˜๊ธฐ ์œ„ํ•ด ์ธ์ฝ”๋”ฉ์ด ๋‹ค์Œ๊ณผ ๊ฐ™์€ ๊ฐœ๋…์„ ์ถฉ์กฑํ•˜๋Š”์ง€ ํ™•์ธํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค. ์ ‘๋‘์‚ฌ ๊ทœ์น™, ์ด๋Š” ์ฝ”๋“œ๊ฐ€ ํ•˜๋‚˜์˜ ๊ณ ์œ ํ•œ ๋ฐฉ์‹์œผ๋กœ๋งŒ ๋””์ฝ”๋”ฉ๋  ์ˆ˜ ์žˆ์Œ์„ ์˜๋ฏธํ•ฉ๋‹ˆ๋‹ค. ์ ‘๋‘์‚ฌ ๊ทœ์น™์€ ์–ด๋–ค ์ฝ”๋“œ๋„ ๋‹ค๋ฅธ ์ฝ”๋“œ์˜ ์ ‘๋‘์‚ฌ๊ฐ€ ๋˜์ง€ ์•Š๋„๋ก ํ•ฉ๋‹ˆ๋‹ค. ์ฝ”๋“œ๋ž€ ํŠน์ • ๋ฌธ์ž๋ฅผ ๋‚˜ํƒ€๋‚ด๋Š” ๋ฐ ์‚ฌ์šฉ๋˜๋Š” ๋น„ํŠธ๋ฅผ ์˜๋ฏธํ•ฉ๋‹ˆ๋‹ค. ์œ„์˜ ์˜ˆ์—์„œ 0 ์ ‘๋‘์‚ฌ์ž…๋‹ˆ๋‹ค 011, ์ ‘๋‘์‚ฌ ๊ทœ์น™์„ ์œ„๋ฐ˜ํ•ฉ๋‹ˆ๋‹ค. ๋”ฐ๋ผ์„œ ์ฝ”๋“œ๊ฐ€ ์ ‘๋‘์‚ฌ ๊ทœ์น™์„ ์ถฉ์กฑํ•˜๋ฉด ๊ณ ์œ ํ•˜๊ฒŒ ๋””์ฝ”๋”ฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค(๋ฐ˜๋Œ€์˜ ๊ฒฝ์šฐ๋„ ๋งˆ์ฐฌ๊ฐ€์ง€).

์œ„์˜ ์˜ˆ๋ฅผ ๋‹ค์‹œ ์‚ดํŽด๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค. ์ด๋ฒˆ์—๋Š” ๊ธฐํ˜ธ์— ํ• ๋‹นํ•ฉ๋‹ˆ๋‹ค. "a", "b", "c" ะธ "NS" ์ ‘๋‘์‚ฌ ๊ทœ์น™์„ ๋งŒ์กฑํ•˜๋Š” ์ฝ”๋“œ.

a
0

b
10

c
110

d
111

์ด ์ธ์ฝ”๋”ฉ์„ ์‚ฌ์šฉํ•˜๋ฉด ๋ฌธ์ž์—ด "์•„๋ฐ•๋‹ต" ๋‹ค์Œ๊ณผ ๊ฐ™์ด ์ธ์ฝ”๋”ฉ๋ฉ๋‹ˆ๋‹ค. 00100100011010 (0|0|10|0|100|011|0|10). ํ•˜์ง€๋งŒ, 00100100011010 ์ด๋ฏธ ๋ช…ํ™•ํ•˜๊ฒŒ ๋””์ฝ”๋”ฉํ•˜๊ณ  ์›๋ž˜ ๋ฌธ์ž์—ด๋กœ ๋Œ์•„๊ฐˆ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. "์•„๋ฐ•๋‹ต".

ํ—ˆํ”„๋งŒ ์ฝ”๋”ฉ

๊ฐ€๋ณ€ ๊ธธ์ด ์ธ์ฝ”๋”ฉ ๋ฐ ์ ‘๋‘์‚ฌ ๊ทœ์น™์„ ๋‹ค๋ฃจ์—ˆ์œผ๋ฏ€๋กœ ์ด์ œ Huffman ์ธ์ฝ”๋”ฉ์— ๋Œ€ํ•ด ์ด์•ผ๊ธฐํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค.

์ด ๋ฐฉ๋ฒ•์€ ์ด์ง„ ํŠธ๋ฆฌ ์ƒ์„ฑ์„ ๊ธฐ๋ฐ˜์œผ๋กœ ํ•ฉ๋‹ˆ๋‹ค. ์—ฌ๊ธฐ์—์„œ ๋…ธ๋“œ๋Š” ์ตœ์ข… ๋˜๋Š” ๋‚ด๋ถ€์ผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์ฒ˜์Œ์— ๋ชจ๋“  ๋…ธ๋“œ๋Š” ๊ธฐํ˜ธ ์ž์ฒด์™€ ํ•ด๋‹น ๊ฐ€์ค‘์น˜(์ฆ‰, ๋ฐœ์ƒ ๋นˆ๋„)๋ฅผ ๋‚˜ํƒ€๋‚ด๋Š” ๋ฆฌํ”„(ํ„ฐ๋ฏธ๋„)๋กœ ๊ฐ„์ฃผ๋ฉ๋‹ˆ๋‹ค. ๋‚ด๋ถ€ ๋…ธ๋“œ๋Š” ์บ๋ฆญํ„ฐ์˜ ๊ฐ€์ค‘์น˜๋ฅผ ํฌํ•จํ•˜๋ฉฐ ๋‘ ๊ฐœ์˜ ํ•˜์œ„ ๋…ธ๋“œ๋ฅผ ์ฐธ์กฐํ•ฉ๋‹ˆ๋‹ค. ์ผ๋ฐ˜ ํ•ฉ์˜์— ์˜ํ•ด ๋น„ํŠธ ยซ0ยป ์™ผ์ชฝ ๋ถ„๊ธฐ ๋‹ค์Œ์„ ๋‚˜ํƒ€๋ƒ…๋‹ˆ๋‹ค. ยซ1ยป - ์˜ค๋ฅธ์ชฝ์œผ๋กœ. ์ „์ฒด ํŠธ๋ฆฌ์—์„œ N ๋‚˜๋ญ‡์žŽ๊ณผ N-1 ๋‚ด๋ถ€ ๋…ธ๋“œ. ํ—ˆํ”„๋งŒ ํŠธ๋ฆฌ๋ฅผ ๊ตฌ์„ฑํ•  ๋•Œ ์ตœ์ ์˜ ๊ธธ์ด ์ฝ”๋“œ๋ฅผ ์–ป๊ธฐ ์œ„ํ•ด ์‚ฌ์šฉํ•˜์ง€ ์•Š๋Š” ๊ธฐํ˜ธ๋ฅผ ๋ฒ„๋ฆฌ๋Š” ๊ฒƒ์ด ์ข‹์Šต๋‹ˆ๋‹ค.

์šฐ์„  ์ˆœ์œ„ ๋Œ€๊ธฐ์—ด์„ ์‚ฌ์šฉํ•˜์—ฌ ์ฃผํŒŒ์ˆ˜๊ฐ€ ๊ฐ€์žฅ ๋‚ฎ์€ ๋…ธ๋“œ์— ๊ฐ€์žฅ ๋†’์€ ์šฐ์„  ์ˆœ์œ„๊ฐ€ ๋ถ€์—ฌ๋˜๋Š” Huffman ํŠธ๋ฆฌ๋ฅผ ๊ตฌ์ถ•ํ•ฉ๋‹ˆ๋‹ค. ๊ตฌ์„ฑ ๋‹จ๊ณ„๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค.

  1. ๊ฐ ์บ๋ฆญํ„ฐ์— ๋Œ€ํ•œ ๋ฆฌํ”„ ๋…ธ๋“œ๋ฅผ ์ƒ์„ฑํ•˜๊ณ  ์šฐ์„ ์ˆœ์œ„ ๋Œ€๊ธฐ์—ด์— ์ถ”๊ฐ€ํ•ฉ๋‹ˆ๋‹ค.
  2. ๋Œ€๊ธฐ์—ด์— ์‹œํŠธ๊ฐ€ ๋‘ ๊ฐœ ์ด์ƒ ์žˆ๋Š” ๊ฒฝ์šฐ ๋‹ค์Œ์„ ์ˆ˜ํ–‰ํ•ฉ๋‹ˆ๋‹ค.
    • ๋Œ€๊ธฐ์—ด์—์„œ ์šฐ์„  ์ˆœ์œ„๊ฐ€ ๊ฐ€์žฅ ๋†’์€(๊ฐ€์žฅ ๋‚ฎ์€ ๋นˆ๋„) ๋‘ ๋…ธ๋“œ๋ฅผ ์ œ๊ฑฐํ•ฉ๋‹ˆ๋‹ค.
    • ์ด ๋‘ ๋…ธ๋“œ๊ฐ€ ์ž์‹์ด ๋  ์ƒˆ ๋‚ด๋ถ€ ๋…ธ๋“œ๋ฅผ ๋งŒ๋“ค๊ณ  ๋ฐœ์ƒ ๋นˆ๋„๋Š” ์ด ๋‘ ๋…ธ๋“œ์˜ ๋นˆ๋„์˜ ํ•ฉ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค.
    • ์šฐ์„  ์ˆœ์œ„ ๋Œ€๊ธฐ์—ด์— ์ƒˆ ๋…ธ๋“œ๋ฅผ ์ถ”๊ฐ€ํ•ฉ๋‹ˆ๋‹ค.
  3. ๋‚จ์€ ์œ ์ผํ•œ ๋…ธ๋“œ๋Š” ๋ฃจํŠธ๊ฐ€ ๋  ๊ฒƒ์ด๋ฉฐ ์ด๊ฒƒ์œผ๋กœ ํŠธ๋ฆฌ ๊ตฌ์„ฑ์ด ์™„๋ฃŒ๋ฉ๋‹ˆ๋‹ค.

๋ฌธ์ž๋กœ๋งŒ ๊ตฌ์„ฑ๋œ ํ…์ŠคํŠธ๊ฐ€ ์žˆ๋‹ค๊ณ  ์ƒ์ƒํ•ด๋ณด์‹ญ์‹œ์˜ค. "a", "b", "c", "d" ะธ "๊ณผ", ๋ฐœ์ƒ ๋นˆ๋„๋Š” ๊ฐ๊ฐ 15, 7, 6, 6 ๋ฐ 5์ž…๋‹ˆ๋‹ค. ์•„๋ž˜๋Š” ์•Œ๊ณ ๋ฆฌ์ฆ˜์˜ ๋‹จ๊ณ„๋ฅผ ๋ฐ˜์˜ํ•˜๋Š” ๊ทธ๋ฆผ์ž…๋‹ˆ๋‹ค.

ํ—ˆํ”„๋งŒ ์••์ถ• ์•Œ๊ณ ๋ฆฌ์ฆ˜

ํ—ˆํ”„๋งŒ ์••์ถ• ์•Œ๊ณ ๋ฆฌ์ฆ˜

ํ—ˆํ”„๋งŒ ์••์ถ• ์•Œ๊ณ ๋ฆฌ์ฆ˜

ํ—ˆํ”„๋งŒ ์••์ถ• ์•Œ๊ณ ๋ฆฌ์ฆ˜

ํ—ˆํ”„๋งŒ ์••์ถ• ์•Œ๊ณ ๋ฆฌ์ฆ˜

๋ฃจํŠธ์—์„œ ๋ ๋…ธ๋“œ๊นŒ์ง€์˜ ๊ฒฝ๋กœ๋Š” ํ•ด๋‹น ๋ ๋…ธ๋“œ์™€ ๊ด€๋ จ๋œ ๋ฌธ์ž์— ํ•ด๋‹นํ•˜๋Š” ์ตœ์ ์˜ ์ ‘๋‘์‚ฌ ์ฝ”๋“œ(ํ—ˆํ”„๋งŒ ์ฝ”๋“œ๋ผ๊ณ ๋„ ํ•จ)๋ฅผ ์ €์žฅํ•ฉ๋‹ˆ๋‹ค.

ํ—ˆํ”„๋งŒ ์••์ถ• ์•Œ๊ณ ๋ฆฌ์ฆ˜
ํ—ˆํ”„๋งŒ ํŠธ๋ฆฌ

์•„๋ž˜์—์„œ C++ ๋ฐ Java๋กœ Huffman ์••์ถ• ์•Œ๊ณ ๋ฆฌ์ฆ˜์„ ๊ตฌํ˜„ํ•œ ๊ฒƒ์„ ๋ณผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

#include <iostream>
#include <string>
#include <queue>
#include <unordered_map>
using namespace std;

// A Tree node
struct Node
{
	char ch;
	int freq;
	Node *left, *right;
};

// Function to allocate a new tree node
Node* getNode(char ch, int freq, Node* left, Node* right)
{
	Node* node = new Node();

	node->ch = ch;
	node->freq = freq;
	node->left = left;
	node->right = right;

	return node;
}

// Comparison object to be used to order the heap
struct comp
{
	bool operator()(Node* l, Node* r)
	{
		// highest priority item has lowest frequency
		return l->freq > r->freq;
	}
};

// traverse the Huffman Tree and store Huffman Codes
// in a map.
void encode(Node* root, string str,
			unordered_map<char, string> &huffmanCode)
{
	if (root == nullptr)
		return;

	// found a leaf node
	if (!root->left && !root->right) {
		huffmanCode[root->ch] = str;
	}

	encode(root->left, str + "0", huffmanCode);
	encode(root->right, str + "1", huffmanCode);
}

// traverse the Huffman Tree and decode the encoded string
void decode(Node* root, int &index, string str)
{
	if (root == nullptr) {
		return;
	}

	// found a leaf node
	if (!root->left && !root->right)
	{
		cout << root->ch;
		return;
	}

	index++;

	if (str[index] =='0')
		decode(root->left, index, str);
	else
		decode(root->right, index, str);
}

// Builds Huffman Tree and decode given input text
void buildHuffmanTree(string text)
{
	// count frequency of appearance of each character
	// and store it in a map
	unordered_map<char, int> freq;
	for (char ch: text) {
		freq[ch]++;
	}

	// Create a priority queue to store live nodes of
	// Huffman tree;
	priority_queue<Node*, vector<Node*>, comp> pq;

	// Create a leaf node for each character and add it
	// to the priority queue.
	for (auto pair: freq) {
		pq.push(getNode(pair.first, pair.second, nullptr, nullptr));
	}

	// do till there is more than one node in the queue
	while (pq.size() != 1)
	{
		// Remove the two nodes of highest priority
		// (lowest frequency) from the queue
		Node *left = pq.top(); pq.pop();
		Node *right = pq.top();	pq.pop();

		// Create a new internal node with these two nodes
		// as children and with frequency equal to the sum
		// of the two nodes' frequencies. Add the new node
		// to the priority queue.
		int sum = left->freq + right->freq;
		pq.push(getNode('', sum, left, right));
	}

	// root stores pointer to root of Huffman Tree
	Node* root = pq.top();

	// traverse the Huffman Tree and store Huffman Codes
	// in a map. Also prints them
	unordered_map<char, string> huffmanCode;
	encode(root, "", huffmanCode);

	cout << "Huffman Codes are :n" << 'n';
	for (auto pair: huffmanCode) {
		cout << pair.first << " " << pair.second << 'n';
	}

	cout << "nOriginal string was :n" << text << 'n';

	// print encoded string
	string str = "";
	for (char ch: text) {
		str += huffmanCode[ch];
	}

	cout << "nEncoded string is :n" << str << 'n';

	// traverse the Huffman Tree again and this time
	// decode the encoded string
	int index = -1;
	cout << "nDecoded string is: n";
	while (index < (int)str.size() - 2) {
		decode(root, index, str);
	}
}

// Huffman coding algorithm
int main()
{
	string text = "Huffman coding is a data compression algorithm.";

	buildHuffmanTree(text);

	return 0;
}

import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

// A Tree node
class Node
{
	char ch;
	int freq;
	Node left = null, right = null;

	Node(char ch, int freq)
	{
		this.ch = ch;
		this.freq = freq;
	}

	public Node(char ch, int freq, Node left, Node right) {
		this.ch = ch;
		this.freq = freq;
		this.left = left;
		this.right = right;
	}
};

class Huffman
{
	// traverse the Huffman Tree and store Huffman Codes
	// in a map.
	public static void encode(Node root, String str,
							  Map<Character, String> huffmanCode)
	{
		if (root == null)
			return;

		// found a leaf node
		if (root.left == null && root.right == null) {
			huffmanCode.put(root.ch, str);
		}


		encode(root.left, str + "0", huffmanCode);
		encode(root.right, str + "1", huffmanCode);
	}

	// traverse the Huffman Tree and decode the encoded string
	public static int decode(Node root, int index, StringBuilder sb)
	{
		if (root == null)
			return index;

		// found a leaf node
		if (root.left == null && root.right == null)
		{
			System.out.print(root.ch);
			return index;
		}

		index++;

		if (sb.charAt(index) == '0')
			index = decode(root.left, index, sb);
		else
			index = decode(root.right, index, sb);

		return index;
	}

	// Builds Huffman Tree and huffmanCode and decode given input text
	public static void buildHuffmanTree(String text)
	{
		// count frequency of appearance of each character
		// and store it in a map
		Map<Character, Integer> freq = new HashMap<>();
		for (int i = 0 ; i < text.length(); i++) {
			if (!freq.containsKey(text.charAt(i))) {
				freq.put(text.charAt(i), 0);
			}
			freq.put(text.charAt(i), freq.get(text.charAt(i)) + 1);
		}

		// Create a priority queue to store live nodes of Huffman tree
		// Notice that highest priority item has lowest frequency
		PriorityQueue<Node> pq = new PriorityQueue<>(
										(l, r) -> l.freq - r.freq);

		// Create a leaf node for each character and add it
		// to the priority queue.
		for (Map.Entry<Character, Integer> entry : freq.entrySet()) {
			pq.add(new Node(entry.getKey(), entry.getValue()));
		}

		// do till there is more than one node in the queue
		while (pq.size() != 1)
		{
			// Remove the two nodes of highest priority
			// (lowest frequency) from the queue
			Node left = pq.poll();
			Node right = pq.poll();

			// Create a new internal node with these two nodes as children 
			// and with frequency equal to the sum of the two nodes
			// frequencies. Add the new node to the priority queue.
			int sum = left.freq + right.freq;
			pq.add(new Node('', sum, left, right));
		}

		// root stores pointer to root of Huffman Tree
		Node root = pq.peek();

		// traverse the Huffman tree and store the Huffman codes in a map
		Map<Character, String> huffmanCode = new HashMap<>();
		encode(root, "", huffmanCode);

		// print the Huffman codes
		System.out.println("Huffman Codes are :n");
		for (Map.Entry<Character, String> entry : huffmanCode.entrySet()) {
			System.out.println(entry.getKey() + " " + entry.getValue());
		}

		System.out.println("nOriginal string was :n" + text);

		// print encoded string
		StringBuilder sb = new StringBuilder();
		for (int i = 0 ; i < text.length(); i++) {
			sb.append(huffmanCode.get(text.charAt(i)));
		}

		System.out.println("nEncoded string is :n" + sb);

		// traverse the Huffman Tree again and this time
		// decode the encoded string
		int index = -1;
		System.out.println("nDecoded string is: n");
		while (index < sb.length() - 2) {
			index = decode(root, index, sb);
		}
	}

	public static void main(String[] args)
	{
		String text = "Huffman coding is a data compression algorithm.";

		buildHuffmanTree(text);
	}
}

์ฐธ๊ณ  : ์ž…๋ ฅ ๋ฌธ์ž์—ด์ด ์‚ฌ์šฉํ•˜๋Š” ๋ฉ”๋ชจ๋ฆฌ๋Š” 47 * 8 = 376๋น„ํŠธ์ด๊ณ  ์ธ์ฝ”๋”ฉ๋œ ๋ฌธ์ž์—ด์€ 194๋น„ํŠธ์ž…๋‹ˆ๋‹ค. ๋ฐ์ดํ„ฐ๋Š” ์•ฝ 48% ์••์ถ•๋ฉ๋‹ˆ๋‹ค. ์œ„์˜ C++ ํ”„๋กœ๊ทธ๋žจ์—์„œ ๋ฌธ์ž์—ด ํด๋ž˜์Šค๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ํ”„๋กœ๊ทธ๋žจ์„ ์ฝ์„ ์ˆ˜ ์žˆ๋„๋ก ์ธ์ฝ”๋”ฉ๋œ ๋ฌธ์ž์—ด์„ ์ €์žฅํ•ฉ๋‹ˆ๋‹ค.

ํšจ์œจ์ ์ธ ์šฐ์„  ์ˆœ์œ„ ํ ๋ฐ์ดํ„ฐ ๊ตฌ์กฐ๋Š” ์‚ฝ์ž…๋งˆ๋‹ค ํ•„์š”ํ•˜๊ธฐ ๋•Œ๋ฌธ์— O(๋กœ๊ทธ(N)) ํ•˜์ง€๋งŒ ์™„์ „ํ•œ ์ด์ง„ ํŠธ๋ฆฌ์—์„œ N ์žŽ์‚ฌ๊ท€ 2N-1 ๋…ธ๋“œ์ด๊ณ  Huffman ํŠธ๋ฆฌ๋Š” ์™„์ „ํ•œ ์ด์ง„ ํŠธ๋ฆฌ์ด๋ฉฐ ์•Œ๊ณ ๋ฆฌ์ฆ˜์€ ๋‹ค์Œ์—์„œ ์‹คํ–‰๋ฉ๋‹ˆ๋‹ค. O(Nlog(N)) ์‹œ๊ฐ„, ์–ด๋””์„œ N - ์บ๋ฆญํ„ฐ.

์ถœ์ฒ˜ :

en.wikipedia.org/wiki/Huffman_coding
en.wikipedia.org/wiki/Variable-length_code
www.youtube.com/watch?v=5wRPin4oxCo

๊ณผ์ •์— ๋Œ€ํ•ด ์ž์„ธํžˆ ์•Œ์•„๋ณด์„ธ์š”.

์ถœ์ฒ˜ : habr.com

์ฝ”๋ฉ˜ํŠธ๋ฅผ ์ถ”๊ฐ€