Tuesday, January 31, 2012

Announcing the Unicode Standard, Version 6.1

Mountain View, January 31, 2012. The Unicode Consortium announces the release of Version 6.1 of the Unicode Standard, continuing Unicode's long-term commitment to support the full diversity of languages around the world. This latest version adds characters to support additional languages of China, other Asian countries, and Africa. It also addresses educational needs in the Arabic-speaking world. A total of 732 new characters have been added. For full details, see http://www.unicode.org/versions/Unicode6.1.0/.

This version of the Standard also brings technical improvements to support implementers. Changes to property values and their aliases mean that properties now have easy-to-specify labels. Combined with a new script extensions property, the new labels make regular expressions more straightforward to write and easier to validate.

Over 200 new Standardized Variants have been added for emoji characters, allowing implementations to distinguish preferred display styles between text and emoji styles. For example:

  26FA FE0E   TENT, text style
  26FA FE0F   TENT, emoji style
  26FD FE0E   FUEL PUMP, text style
  26FD FE0F   FUEL PUMP, emoji style
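
As a rough illustration of how an implementation might build or inspect such sequences, the following Python sketch appends a variation selector to a base character. The helper names `with_style` and `code_points` are hypothetical and not part of any Unicode API. Whether the two styles actually render differently depends on the fonts and rendering engine in use.

```python
# U+FE0E and U+FE0F are the variation selectors used in the sequences above.
TEXT_STYLE = "\uFE0E"   # VARIATION SELECTOR-15: request text presentation
EMOJI_STYLE = "\uFE0F"  # VARIATION SELECTOR-16: request emoji presentation


def with_style(base: str, emoji: bool) -> str:
    """Return base followed by the selector for the requested style (hypothetical helper)."""
    return base + (EMOJI_STYLE if emoji else TEXT_STYLE)


def code_points(text: str) -> str:
    """Format a string as space-separated hexadecimal code points."""
    return " ".join(f"{ord(ch):04X}" for ch in text)


tent_text = with_style("\u26FA", emoji=False)
tent_emoji = with_style("\u26FA", emoji=True)
print(code_points(tent_text))   # 26FA FE0E
print(code_points(tent_emoji))  # 26FA FE0F
```

Each styled string is two code points long, so code that counts or compares characters may need to treat the base character and its selector as a unit.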

Among the notable property changes and additions in Unicode 6.1 are two new line break property values, which improve the line-breaking behavior of Hebrew and Japanese text. Segmentation behavior was also improved for Thai, Lao, and similar languages.

Two other important Unicode specifications are maintained in synchrony with the Unicode Standard, and have updates for Version 6.1. These will be finalized in February:
  • UTS #10, Unicode Collation Algorithm
  • UTS #46, Unicode IDNA Compatibility Processing

Friday, January 6, 2012

Release candidate for Unicode 6.1 character data

Because Unicode is at the foundation of all modern software using text, it is important to verify that problems are not introduced with new versions. If your implementation uses Unicode data, please download and test the final release candidate of the Unicode 6.1 data (UCD) with your implementation now. Please note that the Unicode Collation Algorithm (UCA) and the Unicode IDNA Compatibility Processing are correlated with version 6.1; if you have an implementation of them, please check the data below as well.

That data can be found in:
  1. Unicode
    1. http://unicode.org/Public/6.1.0/ucd/ (data, semicolon-delimited)
    2. http://unicode.org/Public/6.1.0/ucdxml/ (data, xml)
    3. http://www.unicode.org/reports/tr44/proposed.html (documentation)
  2. UCA
    1. http://unicode.org/Public/UCA/6.1.0/ (data)
    2. http://www.unicode.org/reports/tr10/proposed.html (documentation)
  3. IDNA compatibility
    1. http://unicode.org/Public/idna/6.1.0/ (data)
    2. http://www.unicode.org/reports/tr46/proposed.html (documentation)
Note that at this point in the process, no substantive changes can be made unless:
  1. a problem is found in carrying out the actions directed by the Unicode Technical Committee for the release, or
  2. an editorial problem is found in the data comments or documentation.
The Unicode Consortium is planning to move up the release date of Unicode 6.1 (UCD and UAXes) to January instead of February, so any final comments should be made by January 6th. You can send your comments using the Contact Form (http://www.unicode.org/reporting.html).
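
For implementers testing against the semicolon-delimited UCD files listed above, a minimal Python sketch of reading one record in the UnicodeData.txt layout might look like the following. The `UcdRecord` type and `parse_unicode_data_line` function are hypothetical names, and the sample line was written out by hand for illustration, so it should be checked against the actual file.

```python
from typing import NamedTuple


class UcdRecord(NamedTuple):
    """A few fields from one UnicodeData.txt record (hypothetical type)."""
    code_point: int
    name: str
    general_category: str


def parse_unicode_data_line(line: str) -> UcdRecord:
    """Parse one semicolon-delimited line in the UnicodeData.txt layout."""
    fields = line.rstrip("\n").split(";")
    # Field 0 is the code point in hex, field 1 the character name,
    # field 2 the General_Category value.
    return UcdRecord(int(fields[0], 16), fields[1], fields[2])


# Hand-written sample line, assumed to match the real entry for U+26FA.
sample = "26FA;TENT;So;0;ON;;;;;N;;;;;"
record = parse_unicode_data_line(sample)
print(f"U+{record.code_point:04X} {record.name} ({record.general_category})")
```

This sketch handles only ordinary single-character lines. UnicodeData.txt also contains paired "First"/"Last" entries that describe ranges, which a real parser would need to expand or handle separately.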

The draft code charts for Unicode 6.1 have also been updated. We encourage users to check the code charts carefully to verify correctness of the new characters added to Unicode 6.1 and to ensure that there are no regressions in glyph shapes for previously encoded characters. For links to the charts, see http://unicode.org/versions/beta.html.

Tuesday, December 13, 2011

Two New Public Review Issues: UTR #36, UTS #39

The Unicode Technical Committee has posted two new issues for public
review and comment. Details are on the following web page:

http://www.unicode.org/review/

Review periods for the new items close on January 30, 2012.

Please see the page for links to discussion and relevant documents.
Briefly, the new issues are:

Issue #208 Proposed Update UTR #36: Unicode Security Considerations
http://www.unicode.org/review/pri208/

This UTR is being prepared for an update to bring the IDNA 2008
references up to date. Public review and comment is invited on this draft.

Issue #209 Proposed Update Unicode Technical Standard #39 Unicode
Security Mechanisms
http://www.unicode.org/review/pri209/

This UTS is being prepared for an update to align with Unicode 6.1.
Public review and comment is invited on this draft.


To supply feedback on these issues, see
http://www.unicode.org/review/#feedback .

CLDR v21 Milestone 2 available for testing

Milestone releases of CLDR provide an opportunity to test a snapshot of the next version of CLDR; they are not intended for use in production. CLDR v21 is not a data submission release; instead, the CLDR group is engaged in improving tools, and making specific changes to data.

Note that the CLDR v21 release is intended to support Unicode 6.1, and depends on some new Unicode 6.1 property values for grapheme break and line break. This Milestone 2 release depends on values from the beta versions of Unicode 6.1 data files.

New additions in this Milestone 2 release include:
  • Changes to the segmentation data to match Unicode 6.1. The behaviors associated with the former "th" grapheme break tailoring and "he" line break tailoring have been moved into the root behavior, so those tailorings are no longer necessary and have been deleted.
  • Two new calendar element structures needed for support of the Chinese lunar calendar (and other calendars such as the Hindu lunar calendars); for more information see http://cldr.unicode.org/development/development-process/design-proposals/chinese-calendar-support:
    • Addition of the <monthPatterns> element structure to indicate how to modify standard month names to mark intercalary leap months, as well as (for some calendars) months adjacent to leap months and combined months. This is supported via the standard month pattern characters 'M' and 'L', so the pattern character 'l' (SMALL LETTER L) formerly provided as a way to mark leap months has been deprecated (it was never supported by underlying data).
    • Addition of the <cyclicNameSets> element structure to support cyclic names for years (and other calendar entities in some calendars).
  • A new "ar_001" locale for Modern Standard Arabic as the default content for "ar". This will permit the "ar_EG" locale (formerly the default content for "ar") to use some Egypt-specific names.
  • Addition of codes for South Sudan
  • Other specific data fixes such as for Ukrainian collation, Ewe day periods, various metazones, and some specific translation errors.
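
To make the <monthPatterns> addition above more concrete, here is a minimal Python sketch of how a leap-month pattern could be applied to a standard month name. The nested element names, the pattern value "{0}bis", the month name "Month4", and the function `leap_month_name` are all assumptions made for illustration; consult the draft LDML specification for the actual structure and data.

```python
import xml.etree.ElementTree as ET

# Hypothetical LDML fragment; the nesting and the "{0}bis" value are assumed.
LDML_FRAGMENT = """
<monthPatterns>
  <monthPatternContext type="format">
    <monthPatternWidth type="wide">
      <monthPattern type="leap">{0}bis</monthPattern>
    </monthPatternWidth>
  </monthPatternContext>
</monthPatterns>
"""


def leap_month_name(fragment: str, month_name: str) -> str:
    """Apply the 'leap' month pattern, if present, to a standard month name."""
    root = ET.fromstring(fragment)
    pattern = root.find(".//monthPattern[@type='leap']")
    if pattern is None or pattern.text is None:
        # No leap pattern available: fall back to the unmodified name.
        return month_name
    # "{0}" marks where the standard month name is substituted.
    return pattern.text.replace("{0}", month_name)


print(leap_month_name(LDML_FRAGMENT, "Month4"))  # Month4bis
```

The point of the design is that leap months reuse the standard month names plus a pattern, rather than requiring a second full set of month names.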

Highlights in the Milestone 1 release (Sept. 29) included:
  • Work in support of pending -t- extension in BCP47
  • Deprecation of 'commonlyUsed' element in timezone names
  • Removal of "whole-locale" aliases (data for constructing is in supplementaldata.xml)
  • First cut at incorporating European Ordering Rules (EOR)

The data is available from SVN under "tag/release-21-d02" as described in
The full list of changes in this milestone is
The current draft LDML specification is

Thursday, October 6, 2011

UAX #15 and Chapter 3 of Unicode Standard Updated Again for Unicode 6.1 Beta

The text of Chapter 3, Conformance, of the Unicode Standard core specification and the Version 6.1.0 Proposed Update for UAX #15, Unicode Normalization Forms, have been updated for the ongoing beta review for Version 6.1.0. These latest changes move the example code for various normative Hangul-related algorithms into the immediate vicinity of the definition of those algorithms in the core specification. Please see PRI #191 and PRI #206 for details: http://www.unicode.org/review/pri191/ and http://www.unicode.org/review/pri206/
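
For readers unfamiliar with the Hangul-related algorithms mentioned here, the following Python sketch shows the arithmetic decomposition of a precomposed Hangul syllable into its conjoining jamo. The constants are those used by the algorithm in the core specification; the function name `decompose_hangul` is hypothetical, and this is an illustration rather than the sample code from the standard.

```python
import unicodedata

# Constants used by the Hangul syllable decomposition algorithm.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT   # 588 syllables per leading consonant
S_COUNT = L_COUNT * N_COUNT   # 11172 precomposed syllables in total


def decompose_hangul(syllable: str) -> str:
    """Decompose one precomposed Hangul syllable into conjoining jamo."""
    s_index = ord(syllable) - S_BASE
    if not 0 <= s_index < S_COUNT:
        return syllable  # not a precomposed Hangul syllable
    lead = L_BASE + s_index // N_COUNT
    vowel = V_BASE + (s_index % N_COUNT) // T_COUNT
    trail = T_BASE + s_index % T_COUNT
    result = chr(lead) + chr(vowel)
    if trail != T_BASE:
        # A trailing consonant is present only when the index is non-zero.
        result += chr(trail)
    return result


# Cross-check against the normalization support in the standard library.
assert decompose_hangul("\uD55C") == unicodedata.normalize("NFD", "\uD55C")
```

Because the mapping is purely arithmetic, an implementation does not need a lookup table for the 11,172 precomposed syllables.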

Please also note that the public review periods for many open issues, especially the 6.1 beta review, will be closing on October 24 ahead of the November UTC meeting.

New version of UTR #45 published

A new version of UTR #45, U-Source Ideographs, has been published (http://www.unicode.org/reports/tr45/tr45-5.html). This version corrects the syntax of U-source identifiers to match ISO/IEC 10646 data files. It also introduces a new syntax with "UCI" labels to identify U-sources for unified ideographs which have been "orphaned" of their original sources.

Wednesday, October 5, 2011

Proposed Draft UTR #50, Unicode Properties for Vertical Text Layout now available

The Unicode Technical Committee has posted a new issue for public
review and comment. Details are on the following web page:

http://www.unicode.org/review/pri207/

Review period for the new item closes on October 24, 2011.

Please see the page for links to discussion and relevant documents. Briefly, the new issue is:

PRI #207, Proposed Draft UTR #50, Unicode Properties for Vertical Text Layout

The layout of Japanese text follows different conventions than the layout of Western texts. Many of the requirements are described in the W3C Working Group Note "Requirements for Japanese Text Layout". This new proposed draft technical report describes two Unicode character properties which can be used to implement those requirements.

This is a moderated Public Review Issue. An unmoderated discussion takes place on the designated forum:
http://www.unicode.org/forum/viewforum.php?f=35
The moderator of this PRI will summarize the discussion and provide it as formal feedback to the Unicode Technical Committee. Towards the end of the PRI, formal feedback will also be accepted from other parties if they feel that the summary does not accurately reflect their concerns.

Friday, September 23, 2011

PRI #185 has been modified

PRI #185, "Extension of UBA for improved display of URL/IRIs", has been modified following discussion in the Unicode Technical Committee, based on feedback received. It is also now a moderated public review issue, to allow for extended informal discussion of the issues. For details, see http://www.unicode.org/review/pri185/. A forum has been created for this PRI, and further discussion should take place there rather than via the email lists. See http://www.unicode.org/forum/viewforum.php?f=34