pdf.js
Julian Viereck
  @jviereck
Overview
• What is pdf.js
• How PDF is structured
• Processing in pdf.js
• Images & Fonts
• Problems
• Todo
• Demo
What is pdf.js

• building a faithful & efficient PDF renderer
• HTML5 technology experiment
• no native code
• secure (web sandbox)
• Mozilla Labs Project - Open Source
How PDF is structured
 Header       PDF version
 Body         sequence of objects
 [Objects]    fonts, drawing cmds, images,
              words, bookmarks, form fields
 xRef Table   mapping objID ➟ byte offset
 Trailer      root objID, xRef byte offset
 PDF file     root obj = ref to pages catalog
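A minimal sketch of the first step a reader of this layout has to take: locating the xRef byte offset stored at the end of the file. It relies on the PDF convention that the offset follows a `startxref` keyword just before `%%EOF`. The names `findXRefOffset` and `toBytes`, the 1024-byte search window, and the sample bytes are assumptions made for illustration; this is not code from pdf.js.

```javascript
// Hypothetical helper, not part of pdf.js: find the xRef table offset
// recorded after the `startxref` keyword near the end of a PDF file.
function findXRefOffset(bytes) {
  // Only the tail of the file is needed; 1024 bytes is an assumed window.
  const start = Math.max(0, bytes.length - 1024);
  let tail = '';
  for (let i = start; i < bytes.length; i++) {
    tail += String.fromCharCode(bytes[i]);
  }
  const keyword = tail.lastIndexOf('startxref');
  if (keyword === -1) return -1;
  const match = /^startxref\s+(\d+)/.exec(tail.slice(keyword));
  return match ? parseInt(match[1], 10) : -1;
}

// Tiny hand-made stand-in for a real file, just to exercise the function.
function toBytes(str) {
  const out = new Uint8Array(str.length);
  for (let i = 0; i < str.length; i++) out[i] = str.charCodeAt(i) & 0xff;
  return out;
}

const sample = toBytes('%PDF-1.4\n...objects...\nstartxref\n1234\n%%EOF\n');
console.log(findXRefOffset(sample)); // prints 1234
```

With that offset in hand, a reader can jump to the xRef table and then look up the byte offset of any object by its ID.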
Processing in pdf.js
• get plain Uint8Array via XHR2, build Stream
• new PDFDoc(stream): read xRef, root object
• page = PDFDoc.getPage(N)
• page.startRendering(graphics)
 • read & convert all PDF cmds ➟ IR, the Internal Representation (PartialEvaluator)
 • load required objects (fonts, images)
 • graphics.executeIR(IR) (CanvasGraphics)
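The first bullet (fetch the file as a Uint8Array, then wrap it in a stream) could look roughly like the sketch below. `ByteStream` and `fetchBytes` are hypothetical names chosen for this example and are not the pdf.js API; the XHR part assumes a browser that supports `responseType = 'arraybuffer'`.

```javascript
// Hypothetical sketch: `ByteStream` and `fetchBytes` are illustrative names,
// not the pdf.js API.
class ByteStream {
  constructor(bytes) {
    this.bytes = bytes; // Uint8Array holding the whole file
    this.pos = 0;       // read cursor
  }
  // Return the next byte, or -1 at end of data.
  getByte() {
    return this.pos < this.bytes.length ? this.bytes[this.pos++] : -1;
  }
  // Return a view of up to `n` bytes and advance the cursor.
  getBytes(n) {
    const end = Math.min(this.pos + n, this.bytes.length);
    const chunk = this.bytes.subarray(this.pos, end);
    this.pos = end;
    return chunk;
  }
}

// Browser-only: fetch a file as raw bytes with XMLHttpRequest Level 2.
function fetchBytes(url, onLoad) {
  const xhr = new XMLHttpRequest();
  xhr.open('GET', url);
  xhr.responseType = 'arraybuffer';
  xhr.onload = () => onLoad(new ByteStream(new Uint8Array(xhr.response)));
  xhr.send();
}

// Works without a network: wrap the bytes of "%PDF" and read them back.
const stream = new ByteStream(new Uint8Array([0x25, 0x50, 0x44, 0x46]));
const header = String.fromCharCode(...stream.getBytes(4));
console.log(header, stream.getByte()); // prints %PDF -1
```

Keeping a cursor on top of one shared Uint8Array means sub-ranges can be handed out with `subarray` without copying the file.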
1. page = PDFDoc.getPage(2)
   ➟ obj#3
2. page.startRendering(...)
   ➟ obj#4, obj#5

 3 0 obj
 <<
  /Type /Page
  /MediaBox [0 0 612 792]
  /Resources 4 0 R
  /Parent 2 0 R
  /Contents 5 0 R
 >>
 endobj

 5 0 obj
 <<
  /Length 8 0 R
 >>
 stream
   /GS1 gs
   /F0 12 Tf
   BT
     100 700 Td
     (Hello World!) Tj
   ET
   50 600 m
   400 600 l
   S
 endstream
 endobj

 (stream maybe encoded!)
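To make the "➟ obj#4, obj#5" step concrete, here is a small sketch that pulls the indirect references (`N G R`) out of the page dictionary shown on the slide. It is a regex scan over the dictionary text, written only for illustration; `findReferences` is a made-up name, and a real parser would tokenize the dictionary properly instead.

```javascript
// Illustrative only: a regex scan, not a real PDF parser.
const pageDict = [
  '<<',
  '/Type /Page',
  '/MediaBox [0 0 612 792]',
  '/Resources 4 0 R',
  '/Parent 2 0 R',
  '/Contents 5 0 R',
  '>>',
].join('\n');

// Collect entries of the form "/Key <objNum> <generation> R".
function findReferences(dictText) {
  const refs = {};
  const pattern = /\/(\w+)\s+(\d+)\s+(\d+)\s+R\b/g;
  let m;
  while ((m = pattern.exec(dictText)) !== null) {
    refs[m[1]] = { num: Number(m[2]), gen: Number(m[3]) };
  }
  return refs;
}

const refs = findReferences(pageDict);
console.log(refs.Resources.num, refs.Contents.num); // prints 4 5
```

Each object number found this way is then looked up in the xRef table to get the byte offset of the referenced object.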
5 0 obj + xRef, catalog, resources ➟ PartialEvaluator ➟ IR Form ➟ CanvasGraphics

 5 0 obj                            IR Form
 <<
  /Length 8 0 R
 >>
 stream
   /GS1 gs                          setGState: [ LW: 10 ]
   /F0 12 Tf                        dependency: [ font0 ]
                                    setFont: font0, 12
   BT                               beginText
     100 700 Td                     moveText: 100, 700
     (Hello World!) Tj              showText: “Hello World!”
   ET                               endText
   50 600 m                         moveTo: 50, 600
   400 600 l                        lineTo: 400, 600
   S                                stroke
 endstream
 endobj
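The translation on this slide can be sketched in a few lines. This is an illustration of the idea rather than the pdf.js PartialEvaluator: the operator table covers only the operators shown, `tokenize`, `toIR`, `executeIR` and `recorder` are hypothetical names, and the sketch does not resolve resources (so `/GS1` stays a name instead of becoming `[ LW: 10 ]`, and no `dependency` entry is emitted for the font).

```javascript
// Map of PDF operators to IR function names, limited to those on the slide.
const OPS = {
  gs: 'setGState', Tf: 'setFont', BT: 'beginText', Td: 'moveText',
  Tj: 'showText', ET: 'endText', m: 'moveTo', l: 'lineTo', S: 'stroke',
};

// Split a content stream into tokens: (strings), /names, numbers, operators.
function tokenize(src) {
  return src.match(/\((?:\\.|[^\\)])*\)|\/[^\s()\/]+|[^\s()\/]+/g) || [];
}

// Operands precede their operator, so collect them until an operator appears.
function toIR(src) {
  const ir = [];
  let args = [];
  for (const tok of tokenize(src)) {
    if (Object.prototype.hasOwnProperty.call(OPS, tok)) {
      ir.push({ fn: OPS[tok], args });
      args = [];
    } else if (tok[0] === '(') {
      args.push(tok.slice(1, -1)); // string literal (escapes left untouched)
    } else if (tok[0] === '/') {
      args.push(tok.slice(1));     // name, e.g. F0
    } else {
      args.push(Number(tok));      // number
    }
  }
  return ir;
}

// Replay the IR against any object that has one method per IR function.
function executeIR(ir, graphics) {
  for (const { fn, args } of ir) graphics[fn](...args);
}

const content =
  '/GS1 gs /F0 12 Tf BT 100 700 Td (Hello World!) Tj ET 50 600 m 400 600 l S';
const ir = toIR(content);

// Stand-in for CanvasGraphics that records calls instead of drawing.
const calls = [];
const recorder = new Proxy({}, {
  get: (_, name) => (...args) => calls.push(`${String(name)}(${args.join(', ')})`),
});
executeIR(ir, recorder);
console.log(calls.join('\n'));
```

Separating "build the IR" from "execute the IR" is what lets the first half run somewhere other than the main thread, since the IR is plain data.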
Images
• JPEG streams:
 • DOMImg.src = 'data:image/jpeg;base64,'
    + window.btoa(bytesToString(bytes));
• If not JPEG stream:
 • read bytes, convert to colorspace
 • imgData = canvas.getImageData()
 • fillWithPixelData(bytes, imgData)
 • canvas.putImageData(imgData)
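Both image paths can be sketched as below. Assumptions: the non-JPEG bytes are already 8-bit RGB triples (real PDFs use many colorspaces), and `fillWithPixelData` here takes the raw RGBA array (what `imgData.data` would be) rather than the `imgData` object named on the slide. The `Buffer` branch exists only so the sketch also runs outside a browser.

```javascript
// Convert raw bytes to a "binary string" (one char per byte), as btoa expects.
function bytesToString(bytes) {
  let s = '';
  for (let i = 0; i < bytes.length; i++) s += String.fromCharCode(bytes[i]);
  return s;
}

// JPEG path: hand the untouched stream bytes to the browser as a data URL.
function jpegDataUrl(bytes) {
  const binary = bytesToString(bytes);
  // btoa is a browser global; the Buffer branch is only a fallback for Node.
  const base64 = typeof btoa === 'function'
    ? btoa(binary)
    : Buffer.from(binary, 'binary').toString('base64');
  return 'data:image/jpeg;base64,' + base64;
}

// Non-JPEG path, assuming the bytes are already 8-bit RGB triples:
// expand them into the RGBA layout used by ImageData.data.
function fillWithPixelData(rgb, rgba) {
  for (let src = 0, dst = 0; src + 2 < rgb.length; src += 3, dst += 4) {
    rgba[dst] = rgb[src];
    rgba[dst + 1] = rgb[src + 1];
    rgba[dst + 2] = rgb[src + 2];
    rgba[dst + 3] = 255; // fully opaque
  }
}

// Two RGB pixels (red, blue) into a 2x1 RGBA buffer.
const rgba = new Uint8ClampedArray(2 * 4);
fillWithPixelData(new Uint8Array([255, 0, 0, 0, 0, 255]), rgba);
console.log(Array.from(rgba));

// The first three bytes of a JPEG stream, just to show the URL shape.
const url = jpegDataUrl(new Uint8Array([0xff, 0xd8, 0xff]));
console.log(url);
```

In a browser, the URL would be assigned to an image element's `src`, and the RGBA buffer would come from and go back to a canvas 2D context via `getImageData` and `putImageData`.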
Fonts
• There are lots of different font formats!
 • fonts are converted to OpenType
 • use CSS:
      @font-face { font-family:'font0'; src:url(data:font/opentype;base64, ...); }
• some fonts can’t be converted :(
 • use drawing commands?
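A sketch of how the CSS rule from this slide might be assembled once a font has been converted to OpenType bytes. `fontFaceRule` and `installFont` are hypothetical names, the four sample bytes are only the `OTTO` signature of a CFF-flavoured OpenType file and not a usable font, and the conversion step itself is out of scope here.

```javascript
// Build an @font-face rule that embeds the font bytes as a base64 data URL.
function fontFaceRule(fontName, openTypeBytes) {
  let binary = '';
  for (let i = 0; i < openTypeBytes.length; i++) {
    binary += String.fromCharCode(openTypeBytes[i]);
  }
  // btoa is a browser global; the Buffer branch is only a fallback for Node.
  const base64 = typeof btoa === 'function'
    ? btoa(binary)
    : Buffer.from(binary, 'binary').toString('base64');
  return `@font-face { font-family:'${fontName}'; ` +
         `src:url(data:font/opentype;base64,${base64}); }`;
}

// Browser-only: make the rule active by appending it to the document.
function installFont(rule) {
  const style = document.createElement('style');
  style.textContent = rule;
  document.head.appendChild(style);
}

// Placeholder bytes ("OTTO"), used here only to show the shape of the rule.
const rule = fontFaceRule('font0', new Uint8Array([0x4f, 0x54, 0x54, 0x4f]));
console.log(rule);
```

After the rule is installed, canvas text drawn with `font0` as the font family uses the embedded font once the browser has finished loading it.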
Problems
(platform = browser + OS)
• No way to detect font is loaded (hacks)
• Font width (wrong on some platforms)
• Subpixel font size depending on platform
• Text selection
• Printing
• Speed
 • use workers (objects passed via postMessage lose their shape)
 • partial rendering
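One reading of the "lose shape" remark is that an object sent to or from a worker arrives as plain data, without its prototype or methods. The sketch below imitates that copy without starting a worker: it uses `structuredClone` where available and falls back to a JSON round trip, which is an approximation of what `postMessage` does. `Glyph` and `copyLikePostMessage` are invented for this example.

```javascript
// A small class standing in for any object with behaviour attached.
class Glyph {
  constructor(code, width) {
    this.code = code;
    this.width = width;
  }
  scaledWidth(size) {
    return (this.width * size) / 1000;
  }
}

// Approximate the copy an object goes through when passed via postMessage.
function copyLikePostMessage(value) {
  return typeof structuredClone === 'function'
    ? structuredClone(value)
    : JSON.parse(JSON.stringify(value));
}

const sent = new Glyph(72, 722);
const received = copyLikePostMessage(sent);

// The data survives the copy, the class identity and methods do not.
console.log(received.width);
console.log(received instanceof Glyph);
console.log(typeof received.scaledWidth);

// One workaround: rebuild the object from its plain data on the other side.
const revived = new Glyph(received.code, received.width);
console.log(revived.scaledWidth(12));
```

This is one reason to exchange plain arrays and records between threads and to rebuild richer objects only where they are needed.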
Todo
• more font work, printing, speed
• support more of the rendering spec
• explore using SVG
• PDF forms, “advanced PDF features”
• infrastructure: automated testing, requireJS
• test more PDFs (need your help!)
Demo
Contact
Github:
 https://github.com/andreasgal/pdf.js
Mailing list:
 https://groups.google.com/group/mozilla.dev.pdf-js/topics
IRC:
 irc.mozilla.org #pdfjs

2011 09-pdfjs
