Introduction
It all started with a short script that was supposed to combine address information: employee emails, taken from the mailing list's user list, and employee job titles, taken from the HR department's database. Both lists were exported to Unicode text files in UTF-8 encoding and saved with Unix line endings.
Contents of mail.txt
Иванов Андрей;[email protected]
Contents of buhg.txt
Иванова Алла;маляр
Ёлкина Алла;крановщица
Иванов Андрей;слесарь
Абаканов Михаил;маляр
To merge the files, I sorted them with the Unix sort command and fed them to the Unix join program, which unexpectedly failed with an error:
$> sort buhg.txt > buhg.srt
$> sort mail.txt > mail.srt
$> join buhg.srt mail.srt > result
join: buhg.srt:4: is not sorted: Иванов Андрей;слесарь
䞊ã¹æ¿ãçµæãç®ã§èŠããšãäžè¬ã«äžŠã¹æ¿ãã¯æ£ããããšãããããŸããããç·æ§ãšå¥³æ§ã®å§ãäžèŽããå Žåã女æ§ã®å§ãç·æ§ã®å§ãããåã«æ¥ãŸãã
$> sort buhg.txt
Абаканов Михаил;маляр
Ёлкина Алла;крановщица
Иванова Алла;маляр
Иванов Андрей;слесарь
It looks like either a glitch in Unicode sorting or a manifestation of feminism in the sorting algorithm. The former is, of course, more plausible.
Let's put join aside for now and focus on sort. Let's try to solve the problem by the scientific method of trial and error. First, let's change the locale from en_US to ru_RU. For sorting it would be enough to set the LC_COLLATE environment variable, but let's not waste time on trifles:
$> LANG=ru_RU.UTF-8 sort buhg.txt
Абаканов Михаил;маляр
Ёлкина Алла;крановщица
Иванова Алла;маляр
Иванов Андрей;слесарь
Nothing changed.
Let's try re-encoding the file into a single-byte encoding:
$> iconv -f UTF-8 -t KOI8-R buhg.txt \
| LANG=ru_RU.KOI8-R sort \
| iconv -f KOI8-R -t UTF8
Again, nothing changed.
There is nothing left to do but search the internet for a solution. There is nothing about Russian surnames specifically, but there are plenty of questions about other sorting oddities.
The answer is the same everywhere: use the "C" program locale and you will be happy. Let's try it!
$> LANG=C sort buhg.txt
Ёлкина Алла;крановщица
Абаканов Михаил;маляр
Иванов Андрей;слесарь
Иванова Алла;адвокат
Something changed. The Ivanovs lined up in the correct order, although Yolkina (Ёлкина) drifted off somewhere. Let's return to the original problem:
$> LANG=C sort buhg.txt > buhg.srt
$> LANG=C sort mail.txt > mail.srt
$> LANG=C join buhg.srt mail.srt > result
The method from the internet worked without errors, even though Yolkina ended up in first place.
The problem seems to be solved, but just in case, let's try another Russian encoding, Windows CP1251:
$> iconv -f UTF-8 -t CP1251 buhg.txt \
| LANG=ru_RU.CP1251 sort \
| iconv -f CP1251 -t UTF8
Strangely, the sort result matches that of the "C" locale, and accordingly the whole example runs without errors. Some kind of mysticism.
I don't like mysticism in programming, because it usually masks errors. I'll have to figure out seriously how sort works and what LC_COLLATE affects.
At the end I'll try to answer these questions:
- why were the women's surnames sorted incorrectly?
- why did LANG=ru_RU.CP1251 turn out to be equivalent to LANG=C?
- why do sort and join have different ideas about the order of sorted strings?
- why do all my examples contain errors?
- finally, how do you sort strings to your own liking?
Sorting in Unicode
The first stop is Unicode technical report No. 10.
Collation, that is, "comparison" of strings, is the basis of any sorting algorithm. The algorithms themselves may differ ("bubble", "merge", "quick"), but all of them use pairwise string comparison to determine the order in which the strings appear.
Sorting strings in a natural language is a rather complex problem. Even in the simplest single-byte encodings, as soon as the alphabet differs in any way from the English Latin alphabet, the order of its letters no longer matches the order of the numeric values that encode them. For example, in the German alphabet the letter Ö stands between O and P, while in the CP850 encoding it falls between ÿ and Ü.
We can abstract away from a specific encoding and consider "ideal" characters arranged in some order, as Unicode does. Encodings such as UTF-8, UTF-16 or single-byte KOI8-R (when a limited subset of Unicode is enough) give different numeric representations of the characters, but refer to the same elements of the base table.
It turns out that even when building a character table from scratch, we cannot assign it a universal character order. Alphabets of different nations that use the same letters may order them differently. For example, French treats Æ as a ligature and sorts it as the string AE, while in Norwegian Æ is a separate letter that comes after Z. By the way, besides ligatures like Æ, there are letters written with several characters. Thus the Czech alphabet has the letter Ch, which stands between H and I.
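Besides differences between alphabets, there are other national traditions that affect sorting. In particular, the question arises in what order words consisting of uppercase and lowercase letters should appear in a dictionary. Sorting can also be affected by punctuation. Spanish puts an inverted question mark at the beginning of an interrogative sentence ("¿Te gusta la música?", "Do you like music?"). Obviously, interrogative sentences should not be grouped into a separate cluster outside the alphabet in this case, but how should strings with other punctuation marks be sorted?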
I won't go into detail about sorting strings in languages very different from European ones. Note that in languages written right to left or top to bottom, the characters of a string are most likely stored in reading order, and even non-alphabetic writing systems have their own ways of ordering strings character by character. For example, hieroglyphs can be ordered by their shape.
Based on these features, the basic requirements for comparing strings using Unicode tables were formulated:
- string comparison does not depend on the position of characters in the code table;
- sequences of characters that form a single character are reduced to canonical form (A followed by a combining ring above is the same as Å);
- when strings are compared, a character is considered in the context of the string and, if necessary, combined with its neighbors into one comparison unit (Ch in Czech) or split into several (Æ in French);
- all national features (alphabet, uppercase/lowercase, punctuation, order of writing types) must be configurable, down to assigning the order manually (emoji);
- comparison matters not only for sorting but in many other places, for example when specifying ranges of strings (such as expanding the letter range {A..z} in bash);
- comparison must be reasonably fast.
In addition, the authors of the report formulated comparison properties that algorithm developers should not rely on:
- the comparison algorithm should not require a separate character set for each language (Russian and Ukrainian share most Cyrillic characters);
- comparison should not depend on the order of characters in the Unicode tables;
- a string's weight must not be an attribute of the string, because the same string can have different weights in different cultural contexts;
- string weights can change when strings are concatenated or split (x < y does not imply xz < yz);
- different strings with the same weight are considered equal from the sorting algorithm's point of view. An additional order can be introduced among such strings, but it may degrade performance;
- repeated sorting may swap strings that have the same weight. Stability is a property of a particular sorting algorithm, not of the string comparison algorithm (see the previous item);
- sorting rules may change over time as cultural traditions are refined or change.
It is also stipulated that the comparison algorithm knows nothing about the semantics of the strings it processes. So strings consisting only of digits should not be compared as numbers, and in lists of English names the article should not be dropped (The Beatles).
To satisfy all these requirements, a multi-level (in fact, four-level) table-driven sorting algorithm is proposed.
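First, the characters of the string are reduced to canonical form and grouped into comparison units. Each comparison unit is assigned several weights, one for each comparison level. The weights of comparison units are elements of ordered sets (integers, in this case) that can be compared as greater or smaller. The special value IGNORED (0x0) means that the unit does not take part in the comparison at the corresponding level. String comparison can be repeated several times, using the weights of the corresponding level. At each level, the weights of the comparison units of the two strings are compared one after another.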
Implementations of the algorithm for different national traditions may use different weight values, but the Unicode standard includes a basic weight table, the "Default Unicode Collation Element Table" (DUCET). Note that setting the LC_COLLATE variable in effect selects the weight table used by the string comparison function.
The DUCET weights are organized as follows:
- at the first level, all letters are reduced to the same case, diacritics are discarded, and punctuation marks (not all of them) are ignored;
- at the second level, only diacritics are taken into account;
- at the third level, only case is taken into account;
- at the fourth level, only punctuation marks are taken into account.
Comparison is done in several passes: first the first-level weights are compared; if they match, the comparison is repeated with the second-level weights, then possibly with the third and fourth. The comparison ends as soon as the strings contain corresponding comparison units with different weights. Strings that have equal weights at all four levels are considered equal.
This algorithm (which includes many more technical details) gave report No. 10 its name: "Unicode Collation Algorithm" (UCA).
This is where the sorting behavior in our example starts to make a little more sense. It would be nice to compare it with the Unicode standard.
Implementation
There is a special test set for UCA implementations. Checking by hand that strings are sorted correctly according to DUCET is quite tedious, but fortunately there are reference implementations in libraries for working with Unicode.
The website of one such library, ICU from IBM, has a demo page; sorting our test file there gives:
Абаканов Михаил;маляр
Ёлкина Алла;крановщица
Иванов Андрей;слесарь
Иванова Алла;адвокат
By the way, the ICU website clearly explains, with examples, how the comparison algorithm handles punctuation. Unicode helped, but to find the reason for the strange behavior of sort in Linux we will have to look elsewhere.
Sorting in glibc
A quick look at the source code of the sort utility from GNU Coreutils showed that in the utility itself, localization comes down to printing the current value of the LC_COLLATE variable when it runs in debug mode:
$ sort --debug buhg.txt > buhg.srt
sort: using ‘en_US.UTF8’ sorting rules
Strings are compared with the standard strcoll function, which means that everything interesting happens in the glibc library.
The glibc project wiki has a page devoted to string comparison, and the most interesting information there is in its links.
Having gathered all the information about the algorithm and the auxiliary tables, we can return to the original problem and figure out how to sort strings correctly in the Russian locale.
ISO 14651 / 14652
察象ãšãªãããŒãã«ã®ãœãŒã¹ã³ãŒã CTT ã»ãšãã©ã®ãã£ã¹ããªãã¥ãŒã·ã§ã³ã§ Linux ãã£ã¬ã¯ããªã«ãããŸã /usr/share/i18n/locales/ã ããŒãã«èªäœã¯ãã¡ã€ã«å ã«ãããŸã iso14651_t1_commonã 次ã«ãããã¯ãã¡ã€ã«ãã£ã¬ã¯ãã£ãã§ã iso14651_t1_common ãã³ã㌠ãã¡ã€ã«ã«å«ãŸããŠãã iso14651_t1ã次ã«ã以äžãå«ãåœå ãã¡ã€ã«ã«å«ãŸããŸãã en_US О ru_RUã ã»ãšãã©ã®ãã£ã¹ããªãã¥ãŒã·ã§ã³ã§ã¯ Linux ãã¹ãŠã®ãœãŒã¹ ãã¡ã€ã«ã¯åºæ¬ã€ã³ã¹ããŒã«ã«å«ãŸããŠããŸãããããããååšããªãå Žåã¯ããã£ã¹ããªãã¥ãŒã·ã§ã³ããè¿œå ã®ããã±ãŒãžãã€ã³ã¹ããŒã«ããå¿ èŠããããŸãã
The structure of the iso14651_t1 file may look terribly verbose, with unclear rules for constructing names, but on closer inspection everything is quite simple. The structure is described in the ISO 14652 standard, a copy of which can be downloaded from the web.
The file is structured as follows.
By default, the \ character is used as the escape character, and everything on a line after the # character is a comment. Both characters can be redefined, which is what the new version of the table does:
escape_char /
comment_char %
The file contains tokens such as <U0430> or <U1D7D1>, where the part after U is a Unicode code point written in hexadecimal, i.e. its value in the UCS-4 (UTF-32) encoding. All other elements in angle brackets (such as <MIN> or <BASE>) are treated as plain string constants that have little meaning outside their context.
The LC_COLLATE line indicates that the data describing string comparison begins after it.
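First, names are declared for the weights in the comparison table and for character combinations. Strictly speaking, these two kinds of names belong to two different entities, but in the actual file they are mixed together. Weight names are declared with the collating-symbol keyword, because Unicode characters that have the same weights are treated as equivalent characters during comparison.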
In the current version of the file this section is about 900 lines long in total. I picked examples from several places to show how arbitrary the names are, along with several kinds of syntax:
LC_COLLATE
collating-symbol <RES-1>
collating-symbol <BLK>
collating-symbol <MIN>
collating-symbol <WIDE>
...
collating-symbol <ARABIC>
collating-symbol <ETHPC>
collating-symbol <OSMANYA>
...
collating-symbol <S1D000>..<S1D35F>
collating-symbol <SFFFF> % Guaranteed largest symbol value. Keep at end of this list
...
collating-element <U0413_0301> from "<U0413><U0301>"
collating-element <U0413_0341> from "<U0413><U0341>"
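- collating-symbol registers a string such as <MIN> in the table of weight names;
- collating-symbol <S1D000>..<S1D35F> registers a sequence of names made of the prefix S and hexadecimal suffixes from 1D000 to 1D35F;
- the FFFF in collating-symbol <SFFFF> looks like a large unsigned hexadecimal integer, but it is just a name that happens to look like one;
- a name such as <U0413> denotes a code point in UCS-4 encoding;
- collating-element <U0413_0301> from "<U0413><U0301>" registers a new name for a pair of Unicode code points.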
Once the weight names are defined, the actual weights are specified. Since only greater/less relations matter in comparison, the weights are determined simply by the order in which the names are listed: "lighter" weights come first, then "heavier" ones. Recall that each Unicode character is assigned four different weights. Here they are merged into a single ordered sequence. In theory any symbolic name could be used at any of the four levels, but the comments show that the developers mentally split the names into levels.
% Symbolic weight assignments
% Third-level weight assignments
<RES-1>
<BLK>
<MIN>
<WIDE>
...
% Second-level weight assignments
<BASE>
<LOWLINE> % COMBINING LOW LINE
<PSILI> % COMBINING COMMA ABOVE
<DASIA> % COMBINING REVERSED COMMA ABOVE
...
% First-level weight assignments
<S0009> % HORIZONTAL TABULATION
<S000A> % LINE FEED
<S000B> % VERTICAL TABULATION
...
<S0434> % CYRILLIC SMALL LETTER DE
<S0501> % CYRILLIC SMALL LETTER KOMI DE
<S0452> % CYRILLIC SMALL LETTER DJE
<S0503> % CYRILLIC SMALL LETTER KOMI DJE
<S0453> % CYRILLIC SMALL LETTER GJE
<S0499> % CYRILLIC SMALL LETTER ZE WITH DESCENDER
<S0435> % CYRILLIC SMALL LETTER IE
<S04D7> % CYRILLIC SMALL LETTER IE WITH BREVE
<S0454> % CYRILLIC SMALL LETTER UKRAINIAN IE
<S0436> % CYRILLIC SMALL LETTER ZHE
Finally, the actual weight table.
The weights section is enclosed between lines with the order_start and order_end keywords. Additional options of order_start determine the direction in which strings are scanned at each comparison level; the default is forward. The body of the section consists of lines containing a character code and its four weights. The character code can be given as the character itself, as a code point or as a previously defined symbolic name. Weights can likewise be given as symbolic names, code points or the characters themselves. When a code point or a character is used, its weight equals the numeric value of the code point (its position in the Unicode table). Characters that are not listed explicitly are (as far as I understand) treated as if they were assigned a primary weight matching their position in the Unicode table. The special weight value IGNORE means that the character is ignored at the corresponding comparison level.
To show the structure of the table, I picked three fairly obvious fragments:
- characters that are ignored completely;
- characters equivalent to the digit three at the first two levels;
- the beginning of the Cyrillic alphabet, which contains no diacritics and is therefore sorted mainly by the first and third levels.
order_start forward;forward;forward;forward,position
<U0000> IGNORE;IGNORE;IGNORE;IGNORE % NULL (in 6429)
<U0001> IGNORE;IGNORE;IGNORE;IGNORE % START OF HEADING (in 6429)
<U0002> IGNORE;IGNORE;IGNORE;IGNORE % START OF TEXT (in 6429)
...
<U0033> <S0033>;<BASE>;<MIN>;<U0033> % DIGIT THREE
<UFF13> <S0033>;<BASE>;<WIDE>;<UFF13> % FULLWIDTH DIGIT THREE
<U2476> <S0033>;<BASE>;<COMPAT>;<U2476> % PARENTHESIZED DIGIT THREE
<U248A> <S0033>;<BASE>;<COMPAT>;<U248A> % DIGIT THREE FULL STOP
<U1D7D1> <S0033>;<BASE>;<FONT>;<U1D7D1> % MATHEMATICAL BOLD DIGIT THREE
...
<U0430> <S0430>;<BASE>;<MIN>;<U0430> % CYRILLIC SMALL LETTER A
<U0410> <S0430>;<BASE>;<CAP>;<U0410> % CYRILLIC CAPITAL LETTER A
<U04D1> <S04D1>;<BASE>;<MIN>;<U04D1> % CYRILLIC SMALL LETTER A WITH BREVE
<U0430_0306> <S04D1>;<BASE>;<MIN>;<U04D1> % CYRILLIC SMALL LETTER A WITH BREVE
...
<U0431> <S0431>;<BASE>;<MIN>;<U0431> % CYRILLIC SMALL LETTER BE
<U0411> <S0431>;<BASE>;<CAP>;<U0411> % CYRILLIC CAPITAL LETTER BE
<U0432> <S0432>;<BASE>;<MIN>;<U0432> % CYRILLIC SMALL LETTER VE
<U0412> <S0432>;<BASE>;<CAP>;<U0412> % CYRILLIC CAPITAL LETTER VE
...
order_end
Now we can return to sorting the examples from the beginning of the article. The catch lies in this part of the weight table:
<U0020> IGNORE;IGNORE;IGNORE;<U0020> % SPACE
<U0021> IGNORE;IGNORE;IGNORE;<U0021> % EXCLAMATION MARK
<U0022> IGNORE;IGNORE;IGNORE;<U0022> % QUOTATION MARK
...
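As you can see, in this table ASCII punctuation marks (including the space) are almost always ignored when strings are compared. The only exception is strings that match in everything except punctuation at corresponding positions. For the comparison algorithm, the strings from my example (after sorting) look like this: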
АбакановМихаилмаляр
ЁлкинаАллакрановщица
ИвановаАлламаляр
ИвановАндрейслесарь
Considering that in the weight table Russian capital letters come after lowercase ones (at the third level they are "heavier" than lowercase), the sorting looks perfectly correct.
When LC_COLLATE=C is set, a special table that specifies byte-by-byte comparison is loaded:
static const uint32_t collseqwc[] =
{
8, 1, 8, 0x0, 0xff,
/* 1st-level table */
6 * sizeof (uint32_t),
/* 2nd-level table */
7 * sizeof (uint32_t),
/* 3rd-level table */
L'\x00', L'\x01', L'\x02', L'\x03', L'\x04', L'\x05', L'\x06', L'\x07',
L'\x08', L'\x09', L'\x0a', L'\x0b', L'\x0c', L'\x0d', L'\x0e', L'\x0f',
...
L'\xf8', L'\xf9', L'\xfa', L'\xfb', L'\xfc', L'\xfd', L'\xfe', L'\xff'
};
Since in Unicode the code point of Ё comes before that of А, the strings are sorted accordingly.
Text and binary tables
String comparison is obviously an extremely frequent operation, and parsing CTT tables is a rather costly procedure. To speed up access, the tables are compiled into binary form with the localedef command.
The localedef command takes as parameters a file with the table of national characteristics (option -i), in which all characters are represented as Unicode code points, and a file mapping Unicode code points to the characters of a specific encoding (option -f). As a result, it creates binary files for the locale whose name is given as the last parameter.
glibc supports two binary file formats: "traditional" and "modern".
In the traditional format, the locale name is the name of a subdirectory of /usr/lib/locale/. That subdirectory stores the binary files LC_COLLATE, LC_CTYPE, LC_TIME and so on. The LC_IDENTIFICATION file contains the formal name of the locale (which may differ from the directory name) and comments.
The modern format stores all locales in a single archive, /usr/lib/locale/locale-archive, which is mapped into the virtual memory of every process that uses glibc. In the modern format the locale name undergoes some normalization: only digits and letters, converted to lowercase, remain in the encoding name. So ru_RU.KOI8-R is stored as ru_RU.koi8r.
Input files are searched for in the current directory, as well as in /usr/share/i18n/locales/ for CTT files and in /usr/share/i18n/charmaps/ for encoding files.
For example, the command
localedef -i ru_RU -f MAC-CYRILLIC ru_RU.MAC-CYRILLIC
compiles the file /usr/share/i18n/locales/ru_RU with the encoding file /usr/share/i18n/charmaps/MAC-CYRILLIC.gz and saves the result to /usr/lib/locale/locale-archive under the name ru_RU.maccyrillic.
If you set LANG=en_US.UTF-8, glibc looks for the locale binaries in the following files and directories, in this order:
/usr/lib/locale/locale-archive
/usr/lib/locale/en_US.UTF-8/
/usr/lib/locale/en_US/
/usr/lib/locale/en.UTF-8/
/usr/lib/locale/en/
If a locale exists in both the traditional and the modern format, the modern one takes precedence.
You can list the compiled locales with the locale -a command.
Preparing your own comparison table
Armed with this knowledge, you can now create your own ideal string comparison table. This table should compare Russian letters correctly, including the letter Ё, and at the same time take punctuation into account in accordance with the ASCII table.
Preparing your own sort table takes two steps: editing the weight table and compiling it into binary form with the localedef command.
To adjust a comparison table with minimal editing, the ISO 14652 format provides sections for adjusting the weights of an existing table. Such a section starts with the reorder-after keyword, which indicates the position after which the replacement takes place, and ends with the reorder-end line. If several parts of the table need fixing, a separate section is created for each of them.
I copied new versions of the iso14651_t1_common and ru_RU files from the glibc repository into my home directory ~/.local/share/i18n/locales/ and slightly edited the LC_COLLATE section in ru_RU. The new versions of the files are fully compatible with my version of glibc. If you want to use older versions of the files, you will need to change the symbolic names and the place in the table where the replacement starts.
LC_COLLATE
% Copy the template from ISO/IEC 14651
copy "iso14651_t1"
reorder-after <U000D>
<U0020> <S0020>;<BASE>;<MIN>;<U0020> % SPACE
<U0021> <S0021>;<BASE>;<MIN>;<U0021> % EXCLAMATION MARK
<U0022> <S0022>;<BASE>;<MIN>;<U0022> % QUOTATION MARK
...
<U007D> <S007D>;<BASE>;<MIN>;<U007D> % RIGHT CURLY BRACKET
<U007E> <S007E>;<BASE>;<MIN>;<U007E> % TILDE
reorder-end
END LC_COLLATE
Properly, you should also change the fields in LC_IDENTIFICATION so that they refer to the ru_MY locale, but in my example this was not necessary, because I excluded the locale-archive archive from the locale search.
localedef can work with the files in my folder through the I18NPATH variable, which adds a directory to search for input files, and the directory for the binary files can be given as a path containing slashes:
$> I18NPATH=~/.local/share/i18n localedef -i ru_RU -f UTF-8 ~/.local/lib/locale/ru_MY.UTF-8
POSIX allows LANG to contain an absolute path, starting with a slash, to a directory with locale files, but glibc on Linux resolves all locale paths relative to a base directory, which can be overridden with the LOCPATH variable. After setting LOCPATH=~/.local/lib/locale/, all localization files are looked up only in my folder. The locale archive is ignored when LOCPATH is set.
And now the decisive test:
$> LANG=ru_MY.UTF-8 LOCPATH=~/.local/lib/locale/ sort buhg.txt
Абаканов Михаил;маляр
Ёлкина Алла;крановщица
Иванов Андрей;слесарь
Иванова Алла;адвокат
Hooray! It worked!
A few errors
I have already answered the questions about string sorting raised at the beginning, but a couple of questions about errors, visible and invisible, remain.
Let's return to the original problem.
Both sort and join use the same string comparison functions from glibc. So how did join report a sorting error on lines that sort had sorted in the en_US.UTF-8 locale? The answer is simple: sort compares whole strings, while join compares only the key, which by default is the beginning of the string up to the first whitespace character. In my example, the order of the first words of the lines did not match the order of the whole lines, hence the error message.
The "C" locale guarantees that in sorted strings the initial substrings up to the first space are sorted too, but that only masks the error. It is possible to pick data (people with the same surname but different first names) that would produce a wrong merge result without any error message. If we want join to merge file lines by full name, the right way is to specify the field separator explicitly and to sort by the key field rather than by the whole line. Then the merge works correctly and no locale produces errors:
$> sort -t ';' -k 1,1 buhg.txt > buhg.srt
$> sort -t ';' -k 1,1 mail.txt > mail.srt
$> join -t ';' buhg.srt mail.srt > result
The example that ran "successfully" in the CP1251 encoding contains another error. The fact is that in all the Linux distributions I know of, the packages do not include a compiled ru_RU.CP1251 locale. If the compiled locale is not found, sort silently falls back to byte-by-byte comparison, which is what we observed.
By the way, there is one more small glitch related to the unavailability of compiled locales. The command LOCPATH=/tmp locale -a lists all the locales in locale-archive, yet with LOCPATH set, these locales are unavailable to every program (including locale itself):
$> LOCPATH=/tmp locale -a | grep en_US
locale: Cannot set LC_CTYPE to default locale: No such file or directory
locale: Cannot set LC_MESSAGES to default locale: No such file or directory
locale: Cannot set LC_COLLATE to default locale: No such file or directory
en_US
en_US.iso88591
en_US.iso885915
en_US.utf8
$> LC_COLLATE=en_US.UTF-8 sort --debug
sort: using ‘en_US.UTF-8’ sorting rules
$> LOCPATH=/tmp LC_COLLATE=en_US.UTF-8 sort --debug
sort: using simple byte comparison
Summary
If you are a programmer used to thinking of strings as sequences of bytes, your choice is LC_COLLATE=C.
If you are a linguist or a dictionary compiler, you had better compile your own locale.
If you are an ordinary user, just get used to the fact that ls -a lists files starting with a dot mixed in with files starting with a letter, while Midnight Commander, which sorts names with its own internal functions, puts files starting with a dot at the top of the list.
Source: habr.com