Generic compression, decompression, archive library for 7zip: py7zr
Hiroshi Miura
https://github.com/miurahr
Popular compression formats for data science
Name       Born  Algorithm and strategy                                     Python
lzo        2005  Improved LZ77                                              python-lzo (**)
quicklz    2006  Improved LZ77, speed                                       python-quicklz (**)
brotli     2009  Improved LZ77, Huffman coding, 2nd-order context modeling  python-brotli (**)
lz4        2011  Improved LZ77, speed                                       python-lz4 (**)
snappy     2011  Improved LZ77, speed                                       python-snappy (**)
Zstandard  2015  Improved LZ77, speed, entropy coding                       python-zstandard (**)
(**) Binding to a C library
Popular data compression and archiving formats
Name   Born  Compression algorithm                 Tools            Python
TAR    1979  None                                  GNU tar          tarfile
ZIP    1989  Deflate (bzip2, LZMA, PPMd *)         PKZIP, WinZip    zipfile
GZIP   1992  Deflate                               GNU gzip, zlib   gzip
xz     1996  LZMA, LZMA2                           XZ Utils, 7-Zip  lzma
Bzip2  1996  RLE, BWT, MTF, Huffman coding, delta  Bzip2, 7-Zip     bz2
7zip   1999  LZMA, LZMA2, Bzip2, PPMd, Deflate     7-Zip, p7zip     py7zr
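The Python column above names standard-library modules for several of these formats. As a minimal sketch (the sample payload and the labels in `codecs` are made up for illustration), the same bytes can be round-tripped through `gzip`, `bz2`, and `lzma` to compare output sizes:

```python
import bz2
import gzip
import lzma

# Illustrative payload: repetitive text, so every codec can shrink it.
data = b"py7zr reads and writes 7z archives. " * 200

# Label -> (compress, decompress) pairs from the standard library.
codecs = {
    "gzip (Deflate)": (gzip.compress, gzip.decompress),
    "bz2 (BWT + Huffman)": (bz2.compress, bz2.decompress),
    "lzma (xz container, LZMA2)": (lzma.compress, lzma.decompress),
}

sizes = {}
for name, (compress, decompress) in codecs.items():
    packed = compress(data)
    # Each codec must restore the original bytes exactly.
    assert decompress(packed) == data
    sizes[name] = len(packed)
    print(f"{name}: {len(data)} -> {len(packed)} bytes")
```

The ratios depend heavily on the input, so this only shows the calling convention, not a benchmark.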
Pure Python 7zip library - py7zr
● Uses the lzma module in the Python standard library (Python 3.3 and later)
– The Python 3.7 lzma module supports LZMA, LZMA2, BCJ, and Delta
– No support for the BCJ2 and PPMd compression algorithms
● 7-zip compression and decompression in pure Python
– Supports UNIX extensions for file permissions, compatible with p7zip
● Quality
– CI/CD and coverage with azure-pipelines and Travis CI
– Static type checks with mypy
– Documentation
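The filter support in the standard `lzma` module can be tried directly. The following is a rough sketch, not py7zr's own code: it chains a Delta filter with LZMA2 in raw format (no container). The filter parameters and the sample data are assumptions chosen for illustration.

```python
import lzma

# Filter chain: Delta first, then LZMA2. Parameter values are illustrative.
filters = [
    {"id": lzma.FILTER_DELTA, "dist": 4},
    {"id": lzma.FILTER_LZMA2, "preset": 6},
]

# Sample data: 32-bit little-endian counters, where a 4-byte delta helps.
data = b"".join(i.to_bytes(4, "little") for i in range(0, 40000, 4))

# FORMAT_RAW emits only the filtered stream, so the reader needs the same chain.
compressor = lzma.LZMACompressor(format=lzma.FORMAT_RAW, filters=filters)
packed = compressor.compress(data) + compressor.flush()

decompressor = lzma.LZMADecompressor(format=lzma.FORMAT_RAW, filters=filters)
restored = decompressor.decompress(packed)

assert restored == data
print(f"{len(data)} -> {len(packed)} bytes")
```

Because a raw stream carries no header, the filter chain has to be stored somewhere else, which is the kind of information an archive header can hold.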
Usage
import py7zr
# Open the archive for reading
sf = py7zr.SevenZipFile("sample.7z", "r")
# List the archive members, then extract everything into ./tmp
sf.list()
sf.extractall(path="tmp")
sf.close()
$ pip install py7zr
$ py7zr l sample.7z
Inside py7zr: Class design
a) archiveinfo package: holds the classes that represent 7zip header structures
b) py7zr package: provides the compression/decompression APIs
c) compressor package: implements the compression algorithms
Inside py7zr: design patterns
● Uses the Observer pattern
● Minimal memory footprint for compression/decompression
● Uses a small number of file descriptors
● Supports selective decompression of files from an archive
● Not implemented yet: progress display
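To illustrate how the Observer pattern and selective decompression could fit together, here is a generic sketch. Every name in it (`ExtractObserver`, `ProgressLog`, `FakeArchive`, `file_extracted`) is hypothetical and is not taken from py7zr; the archive is a plain dict standing in for real 7z data.

```python
from typing import Dict, List, Optional, Protocol, Set


class ExtractObserver(Protocol):
    """Hypothetical listener interface (not py7zr's API)."""

    def file_extracted(self, name: str, size: int) -> None: ...


class ProgressLog:
    """Observer that records one line per extracted file."""

    def __init__(self) -> None:
        self.lines: List[str] = []

    def file_extracted(self, name: str, size: int) -> None:
        self.lines.append(f"{name}: {size} bytes")


class FakeArchive:
    """Stand-in for an archive reader; members live in a dict."""

    def __init__(self, members: Dict[str, bytes]) -> None:
        self._members = members
        self._observers: List[ExtractObserver] = []

    def attach(self, observer: ExtractObserver) -> None:
        self._observers.append(observer)

    def extract(self, targets: Optional[Set[str]] = None) -> Dict[str, bytes]:
        extracted = {}
        for name, payload in self._members.items():
            # Selective extraction: skip members that were not requested.
            if targets is not None and name not in targets:
                continue
            extracted[name] = payload
            # Notify every observer; the archive knows nothing about display.
            for observer in self._observers:
                observer.file_extracted(name, len(payload))
        return extracted


archive = FakeArchive({"a.txt": b"hello", "b.bin": b"\x00" * 1024})
log = ProgressLog()
archive.attach(log)
result = archive.extract(targets={"b.bin"})
print(log.lines)
```

With this separation, a progress display would be one more observer attached to the reader, with no change to the extraction loop.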
Introduction to py7zr
Inside py7zr
● Multi-threaded compression/decompression for large archive files
– The lzma library in the Python core is not thread-safe
– A separate LZMADecompressor object is created for each thread
● Unit tests and file extraction tests
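The one-decompressor-per-thread idea can be sketched with the standard library alone. This is an assumption-laden illustration rather than py7zr's implementation: the helper `decompress_block`, the block contents, and the worker count are invented for the example, and the blocks use the xz container produced by `lzma.compress`.

```python
import lzma
from concurrent.futures import ThreadPoolExecutor


def decompress_block(packed: bytes) -> bytes:
    # Create a fresh decompressor inside the worker so that no
    # LZMADecompressor instance is ever shared between threads.
    decompressor = lzma.LZMADecompressor()
    return decompressor.decompress(packed)


# Eight independent blocks, each compressed on its own.
blocks = [bytes([i]) * 50000 for i in range(8)]
packed_blocks = [lzma.compress(block) for block in blocks]

# pool.map preserves input order, so results line up with `blocks`.
with ThreadPoolExecutor(max_workers=4) as pool:
    restored = list(pool.map(decompress_block, packed_blocks))

assert restored == blocks
print(f"decompressed {len(restored)} blocks")
```

This only works when the blocks were compressed independently, since each decompressor starts with empty state.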
Copyright and license
● py7zr is distributed under the GNU Lesser General Public License (LGPL) version 2.1 or later
● Copyrights
– 2019 Hiroshi Miura
– pylzma copyright (c) 2004-2015 Joachim Bauch
– 7-Zip copyright (c) 1999-2010 Igor Pavlov
Active community
● Community development in the GitHub project https://github.com/miurahr/py7zr
● As usual, forks and pull requests are welcome.
● Decompression is currently beta quality; compression is alpha.
● The compression implementation is under active development.
