🍣=🍺 Powered by Rabbit 2.1.6
🍣=🍺
Masahiro Tomita
MyNA meetup
2015/04/22

About me
Masahiro Tomita
http://tmtms.hatenablog.com
http://twitter.com/tmtms
https://github.com/tmtm
Added Japanese charsets to MySQL 3.21
Wrote the Ruby bindings for MySQL

About me
My most-retweeted tweet

About me
My most-bookmarked blog post

About me
Lives in northern Nagano Prefecture
Head of the Japan MySQL User Group
Head in name only
I was told "Say something once in a while, will you!! (#゚Д゚)"
so here I am, talking

The 🍣=🍺 problem

As far as MySQL is concerned, 🍣 and 🍺 are the same

By the way, so are 🍛 and 💩 (and so on)

Apparently this is not a problem in PostgreSQL
http://soudai1025.blogspot.jp/2015/03/postgresqlunicode-6.html

Why?

kamipo++
Japanese developers' views on utf8_unicode_ci
http://blog.kamipo.net/entry/2015/03/08/145045
MySQL and the Unicode Collation Algorithm (UCA)
http://blog.kamipo.net/entry/2015/03/17/103457
MySQL and the sushi-beer problem
http://blog.kamipo.net/entry/2015/03/23/093052

Character data in MySQL has a Charset and a Collation

Charset

In other words, the character encoding

How characters are represented as bytes

Charset: utf8mb4
"A" = 41
"あ" = E3 81 82
"🍣" = F0 9F 8D A3
"🍺" = F0 9F 8D BA
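
The byte values on this slide can be reproduced outside MySQL. The following is a minimal Python sketch, assuming that utf8mb4 stores the standard UTF-8 encoding; the helper name `utf8_hex` is made up for this illustration and is not part of MySQL or of the original slides.

```python
def utf8_hex(ch: str) -> str:
    """Return the UTF-8 bytes of a string as space-separated uppercase hex."""
    return " ".join(f"{b:02X}" for b in ch.encode("utf-8"))


# The four sample characters from the slide.
for ch in ["A", "あ", "🍣", "🍺"]:
    print(ch, "=", utf8_hex(ch))
```

Note that 🍣 and 🍺 differ only in the last of their four bytes, so at the charset level they are clearly different characters; whether they compare as equal is decided by the collation.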

Collation

The rules for comparing and ordering characters

List of Collations
mysql> show collation;
+--------------------------+----------+-----+---------+----------+---------+
| Collation                | Charset  | Id  | Default | Compiled | Sortlen |
+--------------------------+----------+-----+---------+----------+---------+
| big5_chinese_ci          | big5     |   1 | Yes     | Yes      |       1 |
| big5_bin                 | big5     |  84 |         | Yes      |       1 |
| dec8_swedish_ci          | dec8     |   3 | Yes     | Yes      |       1 |
| dec8_bin                 | dec8     |  69 |         | Yes      |       1 |
| cp850_general_ci         | cp850    |   4 | Yes     | Yes      |       1 |
| cp850_bin                | cp850    |  80 |         | Yes      |       1 |
| hp8_english_ci           | hp8      |   6 | Yes     | Yes      |       1 |
| hp8_bin                  | hp8      |  72 |         | Yes      |       1 |
| koi8r_general_ci         | koi8r    |   7 | Yes     | Yes      |       1 |
| koi8r_bin                | koi8r    |  74 |         | Yes      |       1 |
| latin1_german1_ci        | latin1   |   5 |         | Yes      |       1 |
| latin1_swedish_ci        | latin1   |   8 | Yes     | Yes      |       1 |
| latin1_danish_ci         | latin1   |  15 |         | Yes      |       1 |
| latin1_german2_ci        | latin1   |  31 |         | Yes      |       2 |
| latin1_bin               | latin1   |  47 |         | Yes      |       1 |
| latin1_general_ci        | latin1   |  48 |         | Yes      |       1 |
| latin1_general_cs        | latin1   |  49 |         | Yes      |       1 |

Each Charset has its own Collations

Collations for utf8mb4
26 in total
mysql> show collation like 'utf8mb4%';
+------------------------+---------+-----+---------+----------+---------+
| Collation              | Charset | Id  | Default | Compiled | Sortlen |
+------------------------+---------+-----+---------+----------+---------+
| utf8mb4_general_ci     | utf8mb4 |  45 | Yes     | Yes      |       1 |
| utf8mb4_bin            | utf8mb4 |  46 |         | Yes      |       1 |
| utf8mb4_unicode_ci     | utf8mb4 | 224 |         | Yes      |       8 |
| utf8mb4_icelandic_ci   | utf8mb4 | 225 |         | Yes      |       8 |
| utf8mb4_latvian_ci     | utf8mb4 | 226 |         | Yes      |       8 |
| utf8mb4_romanian_ci    | utf8mb4 | 227 |         | Yes      |       8 |
| utf8mb4_slovenian_ci   | utf8mb4 | 228 |         | Yes      |       8 |
| utf8mb4_polish_ci      | utf8mb4 | 229 |         | Yes      |       8 |
| utf8mb4_estonian_ci    | utf8mb4 | 230 |         | Yes      |       8 |
| utf8mb4_spanish_ci     | utf8mb4 | 231 |         | Yes      |       8 |
| utf8mb4_swedish_ci     | utf8mb4 | 232 |         | Yes      |       8 |
| utf8mb4_turkish_ci     | utf8mb4 | 233 |         | Yes      |       8 |
| utf8mb4_czech_ci       | utf8mb4 | 234 |         | Yes      |       8 |
| utf8mb4_danish_ci      | utf8mb4 | 235 |         | Yes      |       8 |
| utf8mb4_lithuanian_ci  | utf8mb4 | 236 |         | Yes      |       8 |
| utf8mb4_slovak_ci      | utf8mb4 | 237 |         | Yes      |       8 |
| utf8mb4_spanish2_ci    | utf8mb4 | 238 |         | Yes      |       8 |
| utf8mb4_roman_ci       | utf8mb4 | 239 |         | Yes      |       8 |
| utf8mb4_persian_ci     | utf8mb4 | 240 |         | Yes      |       8 |
| utf8mb4_esperanto_ci   | utf8mb4 | 241 |         | Yes      |       8 |
| utf8mb4_hungarian_ci   | utf8mb4 | 242 |         | Yes      |       8 |
| utf8mb4_sinhala_ci     | utf8mb4 | 243 |         | Yes      |       8 |
| utf8mb4_german2_ci     | utf8mb4 | 244 |         | Yes      |       8 |
| utf8mb4_croatian_ci    | utf8mb4 | 245 |         | Yes      |       8 |
| utf8mb4_unicode_520_ci | utf8mb4 | 246 |         | Yes      |       8 |
| utf8mb4_vietnamese_ci  | utf8mb4 | 247 |         | Yes      |       8 |
+------------------------+---------+-----+---------+----------+---------+

Collations for utf8mb4
utf8mb4_general_ci
utf8mb4_bin
utf8mb4_unicode_ci
utf8mb4_unicode_520_ci
utf8mb4_<language>_ci
(there is no utf8mb4_japanese_ci)

utf8mb4_general_ci
The default collation for the utf8mb4 charset
Does not distinguish ASCII upper and lower case (A=a)
Does not distinguish emoji (🍣=🍺)

utf8mb4_bin
varchar(99) binary
Distinguishes every character (A≠a, 🍣≠🍺)
If PostgreSQL-like behavior is what you want, this will do

utf8mb4_unicode_ci
Unicode Collation Algorithm 4.0.0
http://www.unicode.org/reports/tr10/
http://dev.mysql.com/doc/refman/5.6/en/charset-unicode-sets.html
Does not distinguish ASCII upper and lower case (A=a)
Does not distinguish emoji (🍣=🍺)
Does not distinguish hiragana from katakana, kana with and without voicing marks, or full-width from half-width (は=ば=ぱ=ハ=バ=パ=ハ)

utf8mb4_unicode_520_ci
Unicode Collation Algorithm 5.2.0
Does not distinguish ASCII upper and lower case (A=a)
Distinguishes emoji (🍣≠🍺)
Does not distinguish hiragana from katakana, kana with and without voicing marks, or full-width from half-width (は=ば=ぱ=ハ=バ=パ=ハ)

The ハハ=パパ=ババ problem ("mom" = "dad" = "granny")
Who benefits from this?
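
One way to picture why は, ば and ぱ collapse into one: in Unicode, the voiced kana decompose into the base kana plus a combining voicing mark. A comparison that only looks at the base letter and ignores marks cannot tell them apart. The Python sketch below only illustrates that idea with the standard `unicodedata` module; that the unicode_ci collations ignore such marks is my reading of the behavior shown on the slides, not something this code proves, and `base_and_marks` is a hypothetical helper name.

```python
import unicodedata


def base_and_marks(ch: str):
    """Split one kana into its base letter and its combining marks via NFD."""
    decomposed = unicodedata.normalize("NFD", ch)
    marks = [f"U+{ord(c):04X}" for c in decomposed[1:]]
    return decomposed[0], marks


# All three share the base letter は; only the combining mark differs.
for ch in "はばぱ":
    print(ch, base_and_marks(ch))
```

Hiragana versus katakana and full-width versus half-width are not covered by this decomposition; those equivalences come from the collation's weight tables.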

utf8mb4_*_ci
Collation      A : a    🍣 : 🍺    は : ぱ
general          =         =         ≠
bin              ≠         ≠         ≠
unicode          =         =         =
unicode_520      =         ≠         =

What we really wanted
Collation      A : a    🍣 : 🍺    は : ぱ
general          =         =         ≠
bin              ≠         ≠         ≠
unicode          =         =         =
unicode_520      =         ≠         =
japanese         =         ≠         ≠

S-someone, please
make utf8mb4_japanese_ci
(;´Д`)

Bonus

Whether two characters are treated as the same
can be checked with weight_string()

utf8mb4_general_ci
mysql> select hex(weight_string('🍣' collate utf8mb4_general_ci));
+----------------------------------------------------+
| hex(weight_string('?' collate utf8mb4_general_ci)) |
+----------------------------------------------------+
| FFFD                                               |
+----------------------------------------------------+

mysql> select hex(weight_string('🍺' collate utf8mb4_general_ci));
+----------------------------------------------------+
| hex(weight_string('?' collate utf8mb4_general_ci)) |
+----------------------------------------------------+
| FFFD                                               |
+----------------------------------------------------+

utf8mb4_unicode_520_ci
mysql> select hex(weight_string('🍣' collate utf8mb4_unicode_520_ci));
+--------------------------------------------------------+
| hex(weight_string('?' collate utf8mb4_unicode_520_ci)) |
+--------------------------------------------------------+
| FBC3F363                                               |
+--------------------------------------------------------+

mysql> select hex(weight_string('🍺' collate utf8mb4_unicode_520_ci));
+--------------------------------------------------------+
| hex(weight_string('?' collate utf8mb4_unicode_520_ci)) |
+--------------------------------------------------------+
| FBC3F37A                                               |
+--------------------------------------------------------+
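
The two weights above differ only in their last byte, and both match what the implicit-weight formula from UTS #10 yields for a code point that has no entry in the collation tables (emoji such as 🍣 were added to Unicode after version 5.2.0). The Python sketch below recomputes them under that assumption; the function name `implicit_weights` is hypothetical, and the 0xFBC0 base is the one UTS #10 gives for code points outside the CJK ideograph ranges.

```python
def implicit_weights(code_point: int) -> str:
    """Implicit primary weights for a code point without a table entry.

    The first weight is the 0xFBC0 base plus the high bits of the code point;
    the second is the low 15 bits with the top bit set.
    """
    first = 0xFBC0 + (code_point >> 15)
    second = (code_point & 0x7FFF) | 0x8000
    return f"{first:04X}{second:04X}"


print(implicit_weights(ord("🍣")))  # U+1F363 -> FBC3F363
print(implicit_weights(ord("🍺")))  # U+1F37A -> FBC3F37A
```

Because the second weight carries the low bits of the code point, every such character gets its own weight, which is why 🍣≠🍺 under utf8mb4_unicode_520_ci while utf8mb4_general_ci maps both to FFFD.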

Bonus 2

パ and パ
In utf8_unicode_ci, "パ" = "ハ" = "ハ"
"パ" is one character; "パ" is two characters
'パ' LIKE 'パ' => false
'パ' = 'パ' => true
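
The one-character versus two-character difference on this slide is easy to confirm outside MySQL. The Python sketch below counts code points with the standard library; it says nothing about how MySQL implements LIKE, and the variable names are only for illustration.

```python
import unicodedata

full = "\u30d1"        # full-width パ: KATAKANA LETTER PA
half = "\uff8a\uff9f"  # half-width パ: HALFWIDTH KATAKANA LETTER HA
                       # followed by HALFWIDTH KATAKANA SEMI-VOICED SOUND MARK

# One code point versus two.
print(len(full), len(half))

# Compatibility normalization (NFKC) folds the half-width pair into the
# single full-width character.
print(unicodedata.normalize("NFKC", half) == full)
```

A matcher that walks the pattern one character at a time sees one character on the left and two on the right, which fits the LIKE result on the slide, whereas = compares the collation weights of the whole strings.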

Apparently = and LIKE are different
Per the SQL standard, LIKE performs matching on a per-character basis, thus it can produce results different from the = comparison operator
http://dev.mysql.com/doc/refman/5.6/en/string-comparison-functions.html#operator_like

The end
