Friday, February 26, 2021

Unicode 14.0 Alpha Review

The repertoire for Unicode 14.0 is now open for early review and comment. During alpha review the repertoire is reasonably mature and stable, but is not yet completely locked down. Discussion regarding whether certain characters should be removed from the repertoire for publication is welcome. Character names and code point assignments are reasonably firm, but suggestions for improvement may still be entertained.

This early review is provided so that reviewers may consider character repertoire issues before beta review begins (currently scheduled to start in June 2021). Once beta review begins, the repertoire, code points, and character names will all be locked down and will no longer be subject to change.

Feedback for the alpha review should be reported under PRI #428 using the Unicode contact form by April 12, 2021.


Over 140,000 characters are available for adoption to help the Unicode Consortium’s work on digitally disadvantaged languages

[badge]

Wednesday, February 24, 2021

Enhancements to Unicode Regular Expressions

A Proposed Update UTS #18, Unicode Regular Expressions, is now available for review and feedback.

Regular expressions are a key tool in software development. Back in 2000, few regular expression engines supported Unicode, even at a basic level. UTS #18 set out to raise the bar, describing how regular expression engines could be adapted to deal with Unicode correctly and completely. Since that time, major programming languages and libraries have adopted level 1 features (supporting all Unicode literals, basic character properties, subtraction, intersection, ...), and some also adopted some level 2 features (full character properties, grapheme clusters, ...).

A major enhancement to UTS #18 in 2020 focused on the addition of Character Classes with strings. The initial impetus for this was to handle emoji effectively in browsers, as most emoji consist of more than one code point. Supporting strings directly in character classes frees programs from having to download large amounts of data or handle complicated syntax. Using a property like RGI_Emoji allows a regular expression to match both single code points such as "😁" and multi-code-point strings such as "🇫🇷". This extension to strings is also important for internationalization. For example, the alphabets used by many languages contain multi-code-point strings, so this extension allows them to be handled easily.
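
As a rough illustration of why strings in character classes matter, the sketch below uses only Python's standard re module. That module has no RGI_Emoji property, so the alternation here is a hypothetical stand-in built from a hand-picked two-item list; it is not the UTS #18 syntax, and a real engine would draw the set from the Unicode emoji data.

```python
import re

grin = "\U0001F601"               # 😁, a single code point
flag_fr = "\U0001F1EB\U0001F1F7"  # 🇫🇷, two regional indicator code points

# A classic character class is a set of single code points, so it can only
# match the flag one half at a time.
code_point_class = re.compile("[" + grin + flag_fr + "]")

# Assumed stand-in for a class with strings: an alternation over the same
# items, longest first so multi-code-point strings are tried before singles.
emoji_strings = sorted([grin, flag_fr], key=len, reverse=True)
string_class = re.compile("|".join(re.escape(s) for s in emoji_strings))

text = "Hi " + grin + " from " + flag_fr

per_code_point = code_point_class.findall(text)  # flag is split into halves
per_string = string_class.findall(text)          # flag is kept as one match

print(len(per_code_point), len(per_string))
```

With the code-point class the flag comes back as two separate matches, while the string-based alternation returns it as a single unit, which is the behavior a property of strings is meant to provide without hand-built lists.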

Additional enhancements are in progress this year, based on working with members of the ECMAScript committee, including more clarifications, better guidance on implementation, and addressing some tricky issues dealing with complementing (inverting) Character Classes. The end goal of all of these enhancements in 2020 and 2021 is to significantly raise the level of Unicode support in programming languages and libraries.

For more information, see https://www.unicode.org/review/pri427/.


Tuesday, February 2, 2021

Unicode Consortium looking to hire an Executive Director

Since its founding, the Unicode Consortium has grown and expanded its charter and scope. We’re embarking on a new chapter in the evolution of the Consortium by initiating the search for a leader with proven executive talents to fill the newly-created position of Executive Director. Learn more: https://www.unicode.org/consortium/edappinfo.html


Thursday, January 28, 2021

Unicode Consortium Elects New Directors to its Board

The Consortium is pleased to announce the following Board of Directors election results from its annual Members’ meeting:

Elected to new 3-year board terms:

Brent Getlin, Director of Product Development and General Manager, Fonts and Type, Adobe, Inc.
Brent is the Director of Product Development and General Manager for Adobe Fonts and Type at Adobe. Previously, Brent managed Adobe's mobile gaming engineering and Macromedia Flash video encoder. Brent holds a Bachelor of Science degree in Computer Engineering from Southern Methodist University.

Teresa Marshall, VP of Globalization and Localization, Salesforce, Inc.
As VP of Globalization and Localization, Teresa drives globalization efforts across Salesforce, including internationalization, international product management and localization. She started her career as a German linguist and has held program and operational management positions at a number of Silicon Valley companies as well as academic positions in the field of language translation. Teresa holds an MA in Translation and Interpreting from the Monterey Institute of International Studies.

Re-elected to another 3-year term on the board:

David Singer, Apple, Inc.
David Singer is the senior engineer who coordinates standards activity for software engineering at Apple. In this role, he serves directly in both technical roles (multimedia systems at MPEG and 3GPP) and strategic roles (Advisory Committee and Advisory Board at the W3C, past Blu-ray Director), and indirectly oversees Apple’s involvement in a wide range of standards bodies and consortia, including ITU-T and ITU-R, SMPTE, and INCITS. David holds a BA and PhD from the University of Cambridge, England.

Newly elected to a 2-year term:

Dr. Mark Davis, Google, Inc.
Dr. Mark Davis co-founded the Unicode project and has been the president of the Unicode Consortium since its incorporation in 1991. Having held positions at IBM and Apple, Mark joined Google in 2006 where he has been working on software internationalization focusing on effective and secure use of Unicode (especially in the index and search pipeline), the software internationalization libraries (including ICU), and stable international identifiers.

“We also wish to thank retiring directors Marypat Meuli and James Robertson for their combined many years of service to the Consortium as board members,” said Davis.


Tuesday, January 26, 2021

Salesforce Joins as Full Member of the Unicode Consortium

The Unicode Consortium is pleased to announce that Salesforce has joined as a full member.

“Salesforce is pleased to join the Unicode Consortium and advance the ability for software to reach people in their native and local languages,” said Teresa Marshall, VP, Globalization and Localization, Salesforce.

In addition to Salesforce joining as a full member, Marshall was also elected to the Consortium’s Board of Directors. At Salesforce, Marshall drives globalization efforts, including internationalization, international product management and localization. She started her career as a German linguist and has held program and operational management positions at a number of Silicon Valley companies, as well as academic positions in the field of language translation. Teresa holds an MA in Translation and Interpreting from the Monterey Institute of International Studies.

“We are delighted to have Salesforce join as our newest full member,” said Mark Davis, President, Unicode Consortium. “As platforms grow to serve a growing international user base, it becomes increasingly important to invest in and develop standards that allow efficient support of local languages. Unicode exists for precisely that purpose.”

Full members of the Consortium have a vote in all technical committees, and in the governance of the Consortium. A full list of Consortium members can be found here: https://home.unicode.org/membership/members/

Friday, December 18, 2020

Unicode 2020 Bulldog Award

The 2020 Unicode Bulldog Award recognizes Kristi Lee for her significant contributions to the work of the Unicode Consortium’s CLDR Technical Committee. Upon joining the CLDR committee as Microsoft’s representative, Kristi quickly focused on improving CLDR development and release processes, enabling the CLDR team to work far more efficiently and effectively. This has improved the functionality of the CLDR Survey Tool, and thus better serves the users of CLDR releases. Among many other improvements, she instituted and organized periodic CLDR face-to-face meetings where the team can focus on strategic planning. Through all these efforts, Kristi has brought strong leadership to enable more streamlined development and a better focus on future directions.

In 2020, Kristi was formally made Vice Chair of the CLDR Technical Committee, a role she had effectively filled for some time!

Thursday, October 29, 2020

ICU 68 Released

Unicode® ICU 68 has just been released. ICU 68 updates to CLDR 38 locale data with many additions and corrections. ICU 68 brings support for locale-dependent smart unit preferences (road distance, temperature, etc.), implements locale ID canonicalization conformant with CLDR, and includes many other bug fixes and enhancements.

ICU is a software library widely used by products and other libraries to support the world's languages, implementing both the latest version of the Unicode Standard and of the Unicode locale data (CLDR).

For details, please see http://site.icu-project.org/download/68.


Wednesday, October 28, 2020

Unicode CLDR Language Data v38 released

The final release of Unicode CLDR version 38 is now available. Unicode CLDR provides an update to the key building blocks for software supporting the world’s languages. CLDR data is used by all major software systems (including all mobile phones) for their software internationalization and localization, adapting software to the conventions of different languages.

CLDR v38 focused on enhancing support for existing locales: support for units of measurement in inflected languages (phase 1), and annotations (names and search keywords) for many more non-emoji symbols (~400) as well as for Emoji v13.1. This version also has substantially higher coverage for the following languages (in order of completeness): Norwegian Nynorsk, Hausa, Igbo, Breton, Quechua, Yoruba, Fulah (Adlam script), Chakma, Asturian, Sanskrit, and Dogri.

The Survey Tool has improved performance and now offers structured forum requests to improve coordination among translators. We would like to thank the 393 language experts who contributed to this release.

There are some changes that affect existing specifications and data: for example, the plural rules for French changed to add a new category; the specification for using aliases is more rigorous, and some alias data has changed — along with the specification for handling locale identifier canonicalization. For more information, see Migration.

The overall changes to the data items were:

Added      Deleted    Changed
155,131    33,805     45,895

See additional details in the CLDR v38 Release note.

