Diacritics in Unicode
Reinhold Heuvelmann
Ç↔C+◌̧
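The equivalence on the title slide can be checked with Python's standard unicodedata module. This is a minimal sketch, not part of the original slides; the helper names to_nfc and to_nfd are made up for illustration.

```python
import unicodedata

# The two spellings from the title slide.
COMPOSED = "\u00C7"      # LATIN CAPITAL LETTER C WITH CEDILLA
DECOMPOSED = "C\u0327"   # LATIN CAPITAL LETTER C + COMBINING CEDILLA


def to_nfc(text: str) -> str:
    """Return the composed normalization form (NFC) of text."""
    return unicodedata.normalize("NFC", text)


def to_nfd(text: str) -> str:
    """Return the decomposed normalization form (NFD) of text."""
    return unicodedata.normalize("NFD", text)


if __name__ == "__main__":
    # The raw strings differ in length and content ...
    print(len(COMPOSED), len(DECOMPOSED))
    # ... but each one converts into the other.
    print(to_nfd(COMPOSED) == DECOMPOSED)
    print(to_nfc(DECOMPOSED) == COMPOSED)
```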
Code Charts
http://www.unicode.org/charts/ , keyword "combining"
– http://www.unicode.org/charts/PDF/U0300.pdf
– http://www.unicode.org/charts/PDF/U1AB0.pdf
– http://www.unicode.org/charts/PDF/U1DC0.pdf
– http://www.unicode.org/charts/PDF/U20D0.pdf
– http://www.unicode.org/charts/PDF/UFE20.pdf
| 10 | Diacritics in Unicode | Datenbezieher-Workshop, 30 May 2017
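The five charts listed above correspond to the five "combining" blocks of the standard. As a rough sketch, the following looks up which of these blocks a mark belongs to; the table COMBINING_BLOCKS and the helpers combining_block and list_marks are illustrative names, and the ranges are simply those of the five charts.

```python
import unicodedata

# Block ranges of the five code charts (U0300, U1AB0, U1DC0, U20D0, UFE20).
COMBINING_BLOCKS = [
    (0x0300, 0x036F, "Combining Diacritical Marks"),
    (0x1AB0, 0x1AFF, "Combining Diacritical Marks Extended"),
    (0x1DC0, 0x1DFF, "Combining Diacritical Marks Supplement"),
    (0x20D0, 0x20FF, "Combining Diacritical Marks for Symbols"),
    (0xFE20, 0xFE2F, "Combining Half Marks"),
]


def combining_block(ch):
    """Return the name of the combining block containing ch, or None."""
    cp = ord(ch)
    for start, end, name in COMBINING_BLOCKS:
        if start <= cp <= end:
            return name
    return None


def list_marks(text):
    """List (code point, character name, block) for every mark in text."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"), combining_block(ch))
        for ch in text
        # General categories Mn, Mc and Me all start with "M".
        if unicodedata.category(ch).startswith("M")
    ]


if __name__ == "__main__":
    for entry in list_marks("C\u0327 and x\u20D7"):
        print(entry)
```

Marks from other scripts (for example Hebrew points) fall outside these five blocks, so combining_block returns None for them.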
Stacking Sequences, Example 1
http://www.unicode.org/versions/Unicode9.0.0/ch02.pdf
Stacking Sequences, Example 2
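The slide images are not part of this transcript, so the following is only a sketch of why the order of stacked marks matters, with sequences chosen for illustration rather than taken from the slides. Marks with different canonical combining classes (here: above vs. below the base letter) may be typed in either order and are reordered by normalization; marks of the same class stack outward from the base, so swapping them gives a different text.

```python
import unicodedata


def nfd(text: str) -> str:
    """Return the canonically decomposed and reordered form of text."""
    return unicodedata.normalize("NFD", text)


def combining_classes(text: str):
    """Return the canonical combining class of each character in text."""
    return [unicodedata.combining(ch) for ch in text]


# Different classes: diaeresis (above, class 230) and dot below (class 220).
ABOVE_THEN_BELOW = "a\u0308\u0323"
BELOW_THEN_ABOVE = "a\u0323\u0308"

# Same class: diaeresis and acute both sit above the base (class 230).
DIAERESIS_THEN_ACUTE = "a\u0308\u0301"
ACUTE_THEN_DIAERESIS = "a\u0301\u0308"

if __name__ == "__main__":
    print(combining_classes(ABOVE_THEN_BELOW))
    # Canonically equivalent: normalization sorts the marks by class.
    print(nfd(ABOVE_THEN_BELOW) == nfd(BELOW_THEN_ABOVE))
    # Not equivalent: the order of same-class marks is significant.
    print(nfd(DIAERESIS_THEN_ACUTE) == nfd(ACUTE_THEN_DIAERESIS))
```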
From the FAQ on
Characters and Combining Marks
Q: Why are new combinations of Latin letters with
diacritical marks not suitable for addition to Unicode?
A: There are several reasons. First, Unicode encodes many
diacritical marks, and the combinations can already be
produced, as noted in the answers to some questions
above. If precomposed equivalents were added, the
number of multiple spellings would be increased, and
decompositions would need to be defined and
maintained for them, adding to the complexity of
existing decomposition tables in implementations.
...
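The "multiple spellings" mentioned in the answer can be made concrete with a letter that already exists in precomposed form. The sketch below uses ǖ (u with diaeresis and macron) as an example picked for illustration; the name canonically_equal is hypothetical.

```python
import unicodedata

# Three canonically equivalent spellings of the same letter.
SPELLINGS = [
    "\u01D6",         # precomposed: u with diaeresis and macron
    "\u00FC\u0304",   # u with diaeresis + combining macron
    "u\u0308\u0304",  # u + combining diaeresis + combining macron
]


def canonically_equal(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFD."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


if __name__ == "__main__":
    # As raw code point sequences the spellings are all different ...
    print(len(set(SPELLINGS)))
    # ... yet every pair is canonically equivalent.
    print(all(canonically_equal(SPELLINGS[0], s) for s in SPELLINGS))
```

Every additional precomposed letter would add at least one more entry to a list like this, which is the cost the FAQ answer describes.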
From the FAQ on
Characters and Combining Marks
...
Finally, normalization form NFC (the composed form
favored for use on the Web) is frozen—no new letter
combinations can be added to it. Therefore, the normalized
NFC representation of any new precomposed letters would
still use decomposed sequences, which can already be
expressed by combining character sequences in Unicode.
Nothing would be gained by adding the letter with
diacritical mark as a precomposed character; on the
contrary, adding such a letter would add one or more
multiple spellings to be reckoned with, incrementally
complicating all Unicode implementations for no net gain.
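A practical consequence is that NFC output can still contain combining marks: a combination without a precomposed code point stays a sequence. The sketch below contrasts the two cases; the sample n with diaeresis is an assumption chosen for illustration, as is the helper name nfc.

```python
import unicodedata


def nfc(text: str) -> str:
    """Return the NFC form of text."""
    return unicodedata.normalize("NFC", text)


# A precomposed code point exists for C with cedilla (U+00C7).
HAS_PRECOMPOSED = "C\u0327"
# No precomposed code point exists for n with diaeresis.
NO_PRECOMPOSED = "n\u0308"

if __name__ == "__main__":
    # Composes into a single code point.
    print([f"U+{ord(ch):04X}" for ch in nfc(HAS_PRECOMPOSED)])
    # Stays a base letter followed by a combining mark, even in NFC.
    print([f"U+{ord(ch):04X}" for ch in nfc(NO_PRECOMPOSED)])
```

Software that consumes NFC data therefore cannot assume one code point per letter.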
Combinations, Combinations
https://en.wikipedia.org/wiki/List_of_precomposed_Latin_characters_in_Unicode
As Applied by the DNB
– currently no character set conversion when bibliographic
data is created
– "garbage in - garbage out"
– MARC 21 is neutral with regard to Unicode composed
vs. decomposed
– Where to start, where to stop?
– Tools are available:
"uconv -f utf-8 -t utf-8 -x NFC [file]"
(with thanks to Johann Rolschewski)
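Where ICU's uconv is not installed, roughly the same conversion can be sketched with Python's standard library. This is an illustration under stated assumptions, not the DNB's tooling: the function name normalize_file_to_nfc and the file names are hypothetical, and Python's Unicode data version may differ from the one in a given ICU release.

```python
import unicodedata


def normalize_file_to_nfc(source, target) -> bool:
    """Write an NFC-normalized copy of a UTF-8 text file.

    Returns True if normalization changed the content.
    """
    # newline="" keeps the original line endings untouched.
    with open(source, encoding="utf-8", newline="") as src:
        text = src.read()
    normalized = unicodedata.normalize("NFC", text)
    with open(target, "w", encoding="utf-8", newline="") as dst:
        dst.write(normalized)
    return normalized != text


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    changed = normalize_file_to_nfc("records.txt", "records.nfc.txt")
    print("content changed" if changed else "already in NFC")
```

Either tool is suited to text serializations. Binary MARC (ISO 2709) records store byte lengths in the leader and directory, so they would have to be rewritten with a MARC-aware tool rather than normalized as plain text.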
Thank you
r.heuvelmann@dnb.de
MARC 21 Specifications for Record Structure, Character Sets, and
Exchange Media, CHARACTER SETS AND ENCODING OPTIONS:
Part 3: Unicode Encoding Environment
https://www.loc.gov/marc/specifications/speccharucs.html
Part 4: Conversion Between Environments
https://www.loc.gov/marc/specifications/speccharconversion.html
Assessment of Options for Handling Full Unicode Character
Encodings in MARC21
https://www.loc.gov/marc/marbi/2004/2004-report01.pdf
https://www.loc.gov/marc/marbi/2005/2005-report01.pdf
