
      
       
      
     
      
       Processing ODF
         Lars Oppermann
           Software Engineer
             Sun Microsystems

      
       Office Productivity Documents
         Traditionally, office documents are confined to applications.
         We use office documents to express and share important information.
         Office documents are an important part of collaboration.
         Office documents have been hard to process without the application in which they were created.

      
       Better: Open File Formats and ODF
         Standardized, free to use for anyone
         Easy to access programmatically
         Based on existing standards
           Zip, XML
           HTML, CSS, SVG, MathML etc.
         Programming platforms support the base technologies
         Well-known and well-understood vocabularies

      
       OpenDocument Format (ODF)
         Specification available at http://www.oasis-open.org
         Get “OpenDocument Essentials” by J. David Eisenberg!
         Look at documents and experiment
           Unzip and use your favorite text editor
           Edit the document in OpenOffice and see what changes

      
       Looking into an ODF file
         Basic anatomy of an ODF file
           Zip container
             Manifest, mimetype and streams
             content.xml
             meta.xml
             styles.xml
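
The anatomy above can be explored with nothing more than a zip library. The following is a minimal sketch in Python, assuming only the standard `zipfile` module. The helper names `build_sample_package` and `describe_package` are hypothetical, and the XML bodies are placeholders rather than valid ODF content.

```python
import io
import zipfile

ODT_MIMETYPE = "application/vnd.oasis.opendocument.text"


def build_sample_package():
    """Build a tiny ODF-like zip in memory (placeholder data, not a complete document)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # The mimetype stream is written first and stored uncompressed.
        zf.writestr("mimetype", ODT_MIMETYPE, compress_type=zipfile.ZIP_STORED)
        # Placeholder bodies: real streams hold full XML documents.
        zf.writestr("content.xml", "<content/>")
        zf.writestr("meta.xml", "<meta/>")
        zf.writestr("styles.xml", "<styles/>")
        zf.writestr("META-INF/manifest.xml", "<manifest/>")
    return buf.getvalue()


def describe_package(data):
    """Return the mimetype and the list of streams found in a package."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return {
            "mimetype": zf.read("mimetype").decode("ascii"),
            "streams": zf.namelist(),
            "mimetype_stored": zf.getinfo("mimetype").compress_type == zipfile.ZIP_STORED,
        }
```

For a file on disk, `zipfile.ZipFile("some-document.odt")` can be opened the same way; the file name here is only an example.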

      
       Building a search engine
         Adding ODF support to Apache Lucene
           Wrapping ODF XML as Lucene documents
           Scanning a directory and building an index
         Searching for document content and metadata
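
The idea of indexing content and metadata as separate fields can be illustrated without Lucene. The sketch below is not the Lucene-based engine from the talk: it swaps in a toy inverted index built with the Python standard library, and it indexes in-memory packages instead of scanning a directory. All function names (`make_package`, `extract_fields`, `build_index`, `search`) are hypothetical.

```python
import io
import re
import zipfile
import xml.etree.ElementTree as ET
from collections import defaultdict

OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
DC_NS = "http://purl.org/dc/elements/1.1/"


def make_package(title, body):
    """Hypothetical helper: build a minimal ODF-like package in memory.

    The strings are inserted without XML escaping, so keep them plain.
    """
    content = (
        f'<office:document-content xmlns:office="{OFFICE_NS}" xmlns:text="{TEXT_NS}">'
        f"<office:body><office:text><text:p>{body}</text:p></office:text></office:body>"
        "</office:document-content>"
    )
    meta = (
        f'<office:document-meta xmlns:office="{OFFICE_NS}" xmlns:dc="{DC_NS}">'
        f"<office:meta><dc:title>{title}</dc:title></office:meta>"
        "</office:document-meta>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("mimetype", "application/vnd.oasis.opendocument.text")
        zf.writestr("content.xml", content)
        zf.writestr("meta.xml", meta)
    return buf.getvalue()


def extract_fields(package):
    """Pull one 'content' field and one 'title' field out of a package."""
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        content = ET.fromstring(zf.read("content.xml"))
        meta = ET.fromstring(zf.read("meta.xml"))
    return {
        "content": " ".join(content.itertext()),
        "title": meta.findtext(f".//{{{DC_NS}}}title", default=""),
    }


def build_index(packages):
    """Map (field, token) pairs to the set of document names containing them."""
    index = defaultdict(set)
    for name, package in packages.items():
        for field, value in extract_fields(package).items():
            for token in re.findall(r"\w+", value.lower()):
                index[(field, token)].add(name)
    return index


def search(index, field, term):
    """Return the sorted names of documents whose field contains the term."""
    return sorted(index.get((field, term.lower()), set()))
```

Keeping the field name in the index key is what lets a query distinguish a hit in the document body from a hit in its metadata.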

      
       ODF search engine (cont.)

      
       Direct XML Processing
         Available in most programming environments
         Widely used and understood
         Flexible
         Low level of abstraction
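
As an illustration of working directly on the XML, the sketch below extracts heading and paragraph text from a `content.xml` string with Python's standard `xml.etree.ElementTree`. The sample document and the function name `paragraphs` are assumptions made for this example. Only the `text:h` and `text:p` elements are considered.

```python
import xml.etree.ElementTree as ET

OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

# Made-up sample standing in for the content.xml stream of a text document.
SAMPLE_CONTENT = f"""<office:document-content
    xmlns:office="{OFFICE_NS}" xmlns:text="{TEXT_NS}">
  <office:body><office:text>
    <text:h text:outline-level="1">Report</text:h>
    <text:p>Hello <text:span>ODF</text:span> world</text:p>
  </office:text></office:body>
</office:document-content>"""


def paragraphs(content_xml):
    """Return the plain text of every heading and paragraph, in document order."""
    root = ET.fromstring(content_xml)
    wanted = {f"{{{TEXT_NS}}}h", f"{{{TEXT_NS}}}p"}
    result = []
    for element in root.iter():
        if element.tag in wanted:
            # itertext() also collects text inside child elements such as text:span.
            result.append("".join(element.itertext()))
    return result
```

This shows both sides of the trade-off: the code is short and portable, but the caller has to know element names and namespaces.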

      
       XSLT
         Good for processing content on the XML level
           Conversion
           Extraction
           Merging
         Complicated if more abstraction is needed
           Items that have semantics beyond the XML infoset need special attention, e.g. style inheritance.
         Some infrastructure needed to work on packages
           Using the “flat” (single-file) representation of ODF may be an alternative
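
Style inheritance is an example of semantics that the XML tree alone does not express: a style's effective properties depend on its `style:parent-style-name` chain. The following is a rough sketch of resolving that chain in Python. The `<styles>` wrapper element, the sample styles, and the function name `effective_text_properties` are assumptions. The sketch ignores style families and default styles, which a real implementation would have to handle.

```python
import xml.etree.ElementTree as ET

STYLE_NS = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"
FO_NS = "urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0"

# Made-up sample; the <styles> wrapper is a simplification for this sketch.
SAMPLE_STYLES = f"""<styles xmlns:style="{STYLE_NS}" xmlns:fo="{FO_NS}">
  <style:style style:name="Standard" style:family="paragraph">
    <style:text-properties fo:font-size="12pt" fo:color="#000000"/>
  </style:style>
  <style:style style:name="Heading" style:family="paragraph"
               style:parent-style-name="Standard">
    <style:text-properties fo:font-size="16pt" fo:font-weight="bold"/>
  </style:style>
</styles>"""


def effective_text_properties(styles_xml, style_name):
    """Merge text properties along the parent chain; children override parents."""
    root = ET.fromstring(styles_xml)
    by_name = {
        style.get(f"{{{STYLE_NS}}}name"): style
        for style in root.iter(f"{{{STYLE_NS}}}style")
    }
    # Walk from the requested style up to its root ancestor.
    chain, seen, current = [], set(), style_name
    while current in by_name and current not in seen:  # 'seen' guards against cycles
        seen.add(current)
        chain.append(by_name[current])
        current = by_name[current].get(f"{{{STYLE_NS}}}parent-style-name")
    props = {}
    for style in reversed(chain):  # ancestors first, so later updates override
        for text_props in style.findall(f"{{{STYLE_NS}}}text-properties"):
            for key, value in text_props.attrib.items():
                props[key.split("}")[1]] = value  # drop the namespace part
    return props
```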

      
       Frameworks and Toolkits
         More abstraction
         Hide XML, expose more ODF semantics
           Style inheritance, style management
           Page templates
           Links, references and footnotes
         Bridge ODF and language paradigms
           Use platform specific interface conventions
           Platform specific containers and collections
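
To make "bridging ODF and language paradigms" concrete, here is a small sketch of a wrapper that hides the XML and exposes paragraphs through Python's own sequence protocol. The class name `TextDocument` and the sample content are hypothetical. This is not the API of odf4j, AODL, or any other toolkit named in these slides.

```python
import xml.etree.ElementTree as ET
from collections.abc import Sequence

OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

# Made-up sample standing in for a content.xml stream.
SAMPLE_CONTENT = f"""<office:document-content
    xmlns:office="{OFFICE_NS}" xmlns:text="{TEXT_NS}">
  <office:body><office:text>
    <text:p>First</text:p>
    <text:p>Second</text:p>
  </office:text></office:body>
</office:document-content>"""


class TextDocument(Sequence):
    """Read-only view of a document's paragraphs as a Python sequence of strings."""

    def __init__(self, content_xml):
        root = ET.fromstring(content_xml)
        self._paragraphs = list(root.iter(f"{{{TEXT_NS}}}p"))

    def __len__(self):
        return len(self._paragraphs)

    def __getitem__(self, index):
        if isinstance(index, slice):
            return ["".join(p.itertext()) for p in self._paragraphs[index]]
        return "".join(self._paragraphs[index].itertext())
```

Because the class implements `__len__` and `__getitem__`, the `Sequence` base class supplies iteration, `in`, `index` and `count`, so callers can use ordinary language idioms such as `for paragraph in doc` without touching the XML.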

      
       odftoolkit Project

      
       Odf4j

      
       Odf4j (cont.)
         ODF support for the Java platform
         Supports ODF at various levels of abstraction:
           Package
           XML
           Object Model
         No automatic XML/OO mapping
           Instead, implements semantics defined in the ODF specification that are not part of the schema
         Tries to bridge ODF, XML and platform paradigms

      
       AODL
         ODF for the .NET platform
         Design goals similar to those of odf4j
         Implemented in C#
         Offers an object model for the main ODF document types
         Includes experimental PDF and HTML generators

      
       AODL demo application

      
       ODF-DOM
         Alternative to custom object model
           Interfaces derived from org.w3c.dom.Element et al.
           Allows closer integration with other XML tools, e.g. parsers/serializers, transformers etc.
           Integration of high-level APIs not as natural as with custom object model, e.g. text portions

      
       Limitations
         Information that is derived by rendering is not normally available at the file format level
           Page/line numbers, list numbering
           Computed fields
         Derived information can be persisted by the rendering application
         Processing chains should not rely on derived values if the document could have been modified by a non-rendering application
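
One persisted derived value is the word count that applications may store in the `meta:document-statistic` element of `meta.xml`. The sketch below compares that stored value with a count recomputed from `content.xml`. The function names are hypothetical, and the word-splitting rule (whitespace-separated tokens in `text:p` elements) is an assumption. Applications count words differently, so a mismatch is only a hint that the stored value may be stale.

```python
import re
import xml.etree.ElementTree as ET

OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
META_NS = "urn:oasis:names:tc:opendocument:xmlns:meta:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def stored_word_count(meta_xml):
    """Return the persisted meta:word-count, or None if it is absent."""
    root = ET.fromstring(meta_xml)
    stat = root.find(f".//{{{META_NS}}}document-statistic")
    if stat is None:
        return None
    value = stat.get(f"{{{META_NS}}}word-count")
    return int(value) if value is not None else None


def recomputed_word_count(content_xml):
    """Count whitespace-separated tokens in all text:p elements (a simplification)."""
    root = ET.fromstring(content_xml)
    words = 0
    for paragraph in root.iter(f"{{{TEXT_NS}}}p"):
        words += len(re.findall(r"\S+", "".join(paragraph.itertext())))
    return words


def statistics_look_stale(meta_xml, content_xml):
    """True if a stored word count exists and disagrees with the recomputed one."""
    stored = stored_word_count(meta_xml)
    return stored is not None and stored != recomputed_word_count(content_xml)
```

Values such as page counts cannot be recomputed this way at all, since they only exist after layout.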

      
       Future Work
         Harmonize toolkits
         Coherent programming experience for ODF on all platforms
         Create higher level tools on top of frameworks

      
       Future Work (cont.)
         New metadata mechanism
           TC subcommittee finished proposal
           Currently being reviewed
         Enables new ways to extend ODF
           Based on semantic web technologies
           RDF, OWL
           Integrates content and metadata

      
       Links
         OpenDocument TC
           http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=office
         OpenDocument Essentials
           http://books.evc-cit.info/
         odftoolkit project (odf4j, AODL)
           http://odftoolkit.openoffice.org
         ODF Perl module
           http://search.cpan.org/dist/OpenOffice-OODoc/

      
       
      
     
      
       Processing ODF
         Lars Oppermann
           [email_address]
