HTTP: The Definitive Guide
(Ch. 16: Internationalization)
아키텍트를 꿈꾸는 사람들 (People Who Dream of Becoming Architects)
Cecil
Contents
• HTTP support for international content
• Internationalized URIs
• Other considerations
HTTP for multilingual content
Accept-Charset: iso-8859-1, utf-8
Accept-Language: fr, en;q=0.8
... passes along a list of supported content encodings in the Accept-Encoding header. If the HTTP request does not contain an Accept-Encoding header, a server can assume that the client will accept any encoding (equivalent to passing Accept-Encoding: *).
[Figure: example of Accept-Encoding in an HTTP transaction. Request message: "GET /logo.gif HTTP/1.1" with "Accept-encoding: gzip". Response message: "HTTP/1.1 200 OK" with "Content-type: image/gif" and "Content-encoding: gzip". The body is gzipped on the way out and gunzipped by the client.]
[Figure: request message "GET /bigfile.html HTTP/1.1" from the client and response message "HTTP/1.1 200 OK", annotated with the response headers below.]
Content-Type: text/html; charset=utf-8 (encoding scheme)
Content-Language: fr (language tag)
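The q-values in a header such as "Accept-Language: fr, en;q=0.8" rank the client's preferences. The following is a minimal Python sketch of how a server might read them; the function name parse_accept_language is hypothetical, and the sketch assumes a well-formed header and treats a missing q parameter as 1.0.

```python
def parse_accept_language(header: str) -> list[tuple[str, float]]:
    """Return (language tag, quality) pairs, highest quality first."""
    prefs = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        tag = parts[0]
        if not tag:
            continue  # skip empty list items
        q = 1.0  # assumption: a missing q parameter means full preference
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                q = float(value)
        prefs.append((tag, q))
    # sorted() is stable, so tags with equal q keep their header order
    return sorted(prefs, key=lambda pair: pair[1], reverse=True)


print(parse_accept_language("fr, en;q=0.8"))  # [('fr', 1.0), ('en', 0.8)]
```

With the header from the slide, French is preferred and English is an acceptable fallback.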
Language encoding
How characters are encoded into bits, and how bits are decoded back into characters.
Charset: the combination of a particular coded character set and a particular character encoding scheme.
... only with transporting the character data and the associated language and charset labels. The presentation of the character shapes is handled by the user’s graphics display software (browser, operating system, fonts), as shown in Figure 16-2c.
The wrong charset gives the wrong characters.
Figure 16-2. HTTP “charset” combines a character encoding scheme and a coded character set
(a) Decode using encoding scheme: the data bits ...11100001 decode, using iso-8859-6’s encoding, to character code 225.
(b) Find character using coded character set: in the iso-8859-6 coded character set, code 225 is the unique character "ARABIC LETTER FEH" (65 is LATIN CAPITAL LETTER A, 66 is LATIN CAPITAL LETTER B, 224 is ARABIC TATWEEL, 226 is ARABIC LETTER QAF, 227 is ARABIC LETTER KAF).
(c) Find display shape using fonts and formatting software: fonts and presentation logic turn the character into a glyph.
The MIME charset tag describes the combination of character encoding scheme and coded character set mapping.
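The three steps of Figure 16-2 can be tried out with Python's standard codecs. This is only an illustrative sketch: it decodes the figure's data bits (value 225) once with iso-8859-6 and once with iso-8859-1, which is chosen here as an arbitrary "wrong" charset for comparison.

```python
import unicodedata

data = bytes([0b11100001])  # the data bits from Figure 16-2 (value 225)

for charset in ("iso-8859-6", "iso-8859-1"):
    char = data.decode(charset)  # steps (a) and (b): bits -> code -> character
    print(charset, "->", unicodedata.name(char))

# iso-8859-6 -> ARABIC LETTER FEH
# iso-8859-1 -> LATIN SMALL LETTER A WITH ACUTE
```

The same byte yields two different characters, which is why the charset label has to travel with the content.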
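Major character sets
• US-ASCII
  • American Standard Code for Information Interchange; the most widely used character set
  • Uses only code values 0 to 127
• ISO-8859
  • Extensions of US-ASCII that use the high bit to add the characters needed for international writing
• UCS (Universal Character Set)
  • Represents all of the world's characters in a single coded character set
  • The basic set consists of about 50,000 characters
  • Has an extended code space for millions of characters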
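Character encoding schemes
• Fixed width: 8-bit
  • Represents each coded character with a fixed number of bits
  • Fast to process, but may waste space
• Variable width (nonmodal): UTF-8
  • Uses different numbers of bits for different character code numbers
  • Commonly used characters can take fewer bits
• Variable width (modal): iso-2022-jp
  • Uses special escape patterns to switch between modes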
Nonmodal (UTF-8) vs. modal (iso-2022-jp)
8-bit
The 8-bit fixed-width identity encoding simply encodes each character code with its
corresponding 8-bit value. It supports only character sets with a code range of 256
characters. The iso-8859 family of character sets uses the 8-bit identity encoding.
UTF-8
UTF-8 is a popular character encoding scheme designed for UCS (UTF stands for
“UCS Transformation Format”). UTF-8 uses a nonmodal, variable-length encoding
for the character code values, where the leading bits of the first byte tell the length of
the encoded character in bytes, and any subsequent byte contains six bits of code
value (see Table 16-2).
If the first encoded byte has a high bit of 0, the length is just 1 byte, and the remaining 7 bits contain the character code. This has the nice result of ASCII compatibility
(but not iso-8859 compatibility, because iso-8859 uses the high bit).
For example, character code 90 (ASCII “Z”) would be encoded as 1 byte (01011010),
while code 5073 (13-bit binary value 1001111010001) would be encoded into 3 bytes:
11100001 10001111 10010001
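The bit patterns described above can be checked with a short Python sketch. The function name utf8_encode is hypothetical, and the sketch covers only the 1- to 4-byte patterns of Table 16-2; in practice Python's built-in str.encode("utf-8") does this work.

```python
def utf8_encode(code: int) -> bytes:
    """Encode one character code with the 1- to 4-byte UTF-8 patterns."""
    if code < 0:
        raise ValueError("character codes are non-negative")
    if code < 0x80:       # up to 7 bits: 0ccccccc
        return bytes([code])
    if code < 0x800:      # up to 11 bits: 110ccccc 10cccccc
        return bytes([0xC0 | (code >> 6), 0x80 | (code & 0x3F)])
    if code < 0x10000:    # up to 16 bits: 1110cccc 10cccccc 10cccccc
        return bytes([0xE0 | (code >> 12),
                      0x80 | ((code >> 6) & 0x3F),
                      0x80 | (code & 0x3F)])
    if code < 0x200000:   # up to 21 bits: 11110ccc 10cccccc 10cccccc 10cccccc
        return bytes([0xF0 | (code >> 18),
                      0x80 | ((code >> 12) & 0x3F),
                      0x80 | ((code >> 6) & 0x3F),
                      0x80 | (code & 0x3F)])
    raise ValueError("5- and 6-byte forms are not covered by this sketch")


for code in (90, 5073):
    print(code, " ".join(f"{b:08b}" for b in utf8_encode(code)))

# 90 01011010
# 5073 11100001 10001111 10010001
```

Both results match the worked example in the text and agree with Python's own encoder for the same code points.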
iso-2022-jp
iso-2022-jp is a widely used encoding for Japanese Internet documents. iso-2022-jp is
a variable-length, modal encoding, with all values less than 128 to prevent problems
with non–8-bit-clean software.
The encoding context always is set to one of four predefined character sets. Special
“escape sequences” shift from one set to another. iso-2022-jp initially uses the US-ASCII
character set, but it can switch to the JIS X 0201 (JIS-Roman) character set or
the much larger JIS X 0208-1978 and JIS X 0208-1983 character sets using 3-byte
escape sequences.
Table 16-2. UTF-8 variable-width, nonmodal encoding
Character code bits | Byte 1   | Byte 2   | Byte 3   | Byte 4   | Byte 5   | Byte 6
0–7                 | 0ccccccc | -        | -        | -        | -        | -
8–11                | 110ccccc | 10cccccc | -        | -        | -        | -
12–16               | 1110cccc | 10cccccc | 10cccccc | -        | -        | -
17–21               | 11110ccc | 10cccccc | 10cccccc | 10cccccc | -        | -
22–26               | 111110cc | 10cccccc | 10cccccc | 10cccccc | 10cccccc | -
27–31               | 1111110c | 10cccccc | 10cccccc | 10cccccc | 10cccccc | 10cccccc
UTF-8: the leading bits of the first byte indicate the length of the encoded character.
iso-2022-jp: escape sequences set the context to one of four predefined character sets.
The escape sequences are shown in Table 16-3. In practice, Japanese text begins with
“ESC $ @” or “ESC $ B” and ends with “ESC ( B” or “ESC ( J”.
When in the US-ASCII or JIS-Roman modes, a single byte is used per character.
When using the larger JIS X 0208 character set, two bytes are used per character
code. The encoding restricts the bytes sent to be between 33 and 126.
Table 16-3. iso-2022-jp character set switching escape sequences
Escape sequence | Resulting coded character set | Bytes per code
ESC ( B         | US-ASCII                      | 1
ESC ( J         | JIS X 0201-1976 (JIS Roman)   | 1
ESC $ @         | JIS X 0208-1978               | 2
ESC $ B         | JIS X 0208-1983               | 2
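The mode switching in Table 16-3 can be observed with Python's built-in "iso2022_jp" codec. This is a small sketch on an arbitrary sample string (not taken from the slides); it checks for the escape sequences rather than relying on the exact byte values of the Japanese characters.

```python
ESC = b"\x1b"

text = "abc日本語"  # sample string: three ASCII letters, then three kanji
encoded = text.encode("iso2022_jp")

print(encoded)
# The ASCII prefix needs no escape because US-ASCII is the initial mode.
print(encoded.startswith(b"abc" + ESC + b"$B"))  # switch to JIS X 0208-1983
print(encoded.endswith(ESC + b"(B"))             # switch back to US-ASCII
print(all(b < 128 for b in encoded))             # every byte is 7-bit
```

Each kanji takes two bytes while the codec is in the JIS X 0208 mode, so the 6-character string becomes 3 + 3 + 6 + 3 = 15 bytes.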
Language tags
• Regional languages (as in “sgn-US-MA” for Martha’s Vineyard sign language)
• Standardized nonvariant languages (e.g., “i-navajo”)
• Nonstandard languages (e.g., “x-snowboarder-slang”)
Subtags
Language tags have one or more parts, separated by hyphens, called subtags:
• The first subtag is called the primary subtag. The values are standardized.
• The second subtag is optional and follows its own naming standard.
• Any trailing subtags are unregistered.
The primary subtag contains only letters (A–Z). Subsequent subtags can contain letters
or numbers, up to eight characters in length. An example is shown in Figure 16-9.
Capitalization
All tags are case-insensitive: the tags “en” and “eN” are equivalent. However, lowercasing
conventionally is used to represent general languages, while uppercasing is
used to signify particular countries. For example, “fr” means all languages classified
as French, while “FR” signifies the country France.
IANA Language Tag Registrations
The values of the first and second language subtags are defined by various standards
Figure 16-9. Language tags are separated into subtags
[Figure: the tag "sgn-US-MA" (Martha’s Vineyard sign language) split into its first subtag "sgn" (sign language), second subtag "US" (America), and third subtag "MA" (Massachusetts regional variant).]
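A language tag is a short, standardized string that names a language.
• First subtag: a language token from the ISO 639 standard language set
• Second subtag: a code chosen from the ISO 3166 country code and region standard set
• Third subtag: for extensions; no special rules
• ex) en-US, en-GB …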
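The subtag rules above can be illustrated with a short Python sketch. The helper names split_language_tag and same_language_tag are hypothetical, and the sketch checks only the character classes described in the text (letters in the primary subtag, letters or digits afterwards); it does not consult the ISO 639 or ISO 3166 registries.

```python
import re

PRIMARY = re.compile(r"[A-Za-z]+")    # primary subtag: letters only
SUBTAG = re.compile(r"[A-Za-z0-9]+")  # later subtags: letters or digits


def split_language_tag(tag: str) -> list[str]:
    """Split a language tag on hyphens and check each subtag's characters."""
    subtags = tag.split("-")
    if not PRIMARY.fullmatch(subtags[0]):
        raise ValueError(f"bad primary subtag: {subtags[0]!r}")
    for subtag in subtags[1:]:
        if not SUBTAG.fullmatch(subtag):
            raise ValueError(f"bad subtag: {subtag!r}")
    return subtags


def same_language_tag(a: str, b: str) -> bool:
    """Language tags are case-insensitive, so compare them lowercased."""
    return a.lower() == b.lower()


print(split_language_tag("sgn-US-MA"))  # ['sgn', 'US', 'MA']
print(same_language_tag("en", "eN"))    # True
```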
Internationalized URIs
URIs consist only of US-ASCII characters so that identifiers stay readable and shareable.
URI escapes: a way to safely insert reserved characters and other unsupported characters into a URI (using the % character).
... filenames that contain international characters. This is incorrect and may cause
problems with some applications.
Figure 16-10. URI characters are transported as escaped code bytes but processed unescaped
[Figure: three views of the same URI.
Conceptual characters (external form: email, web, billboard, radio): "Big Sale at Joe’s", http://www.joes-hardware.com/big%20sale.txt
URI code bytes (what you enter and send, in current character set): ... o=111 m=109 /=47 b=98 i=105 g=103 %=37 2=50 0=48 s=115 ...
Unescaped ASCII code bytes (what you process, in US-ASCII character set): ... 111 109 47 98 105 103 32 115 ...]
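The escaping shown in Figure 16-10 can be reproduced with Python's standard urllib.parse module. This is a minimal sketch using the path from the figure; the variable names are illustrative.

```python
from urllib.parse import quote, unquote

path = "/big sale.txt"
escaped = quote(path)          # the space becomes the escape "%20"
unescaped = unquote(escaped)   # what the receiver processes

print(escaped)                                 # /big%20sale.txt
print(list(escaped.encode("ascii"))[:8])       # [47, 98, 105, 103, 37, 50, 48, 115]
print(list(unescaped.encode("ascii"))[:6])     # [47, 98, 105, 103, 32, 115]
```

The byte values match the figure: the escape travels as the three bytes 37, 50, 48 and is processed as the single byte 32.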
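Other considerations
• HTTP headers
  • Must consist only of characters from the US-ASCII character set
• Dates
  • Using the correct GMT date format is recommended
• Domain names
  • Internationalized Domain Names (IDN)
  • Most web browsers support Punycode
  • Punycode: a method for converting a Unicode string into characters that can be used in host names
  • ex) 한글.com -> xn--bj0bj06e.com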
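The Punycode conversion in the last bullet can be tried with Python's built-in "idna" codec. This is only a sketch: that codec implements the older IDNA 2003 rules, which is an assumption that suffices for this example but may differ from what current browsers apply.

```python
host = "한글.com"

ascii_host = host.encode("idna")       # Unicode host name -> ASCII form
print(ascii_host)                      # b'xn--bj0bj06e.com'
print(ascii_host.decode("idna"))       # back to the Unicode form
```

Only the non-ASCII label is rewritten; the "com" label is already ASCII and passes through unchanged.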
Q&A
References
• David Gourley, Brian Totty, Marjorie Sayer, Sailu Reddy,
Anshu Aggarwal. HTTP 완벽 가이드 (HTTP: The Definitive Guide, Korean edition; translated by 이응준 and 정상일).
Mapo-gu, Seoul: 인사이트 (Insight), 2014
