PyPy - is it ready for production

is it ready for production?

Mark Rees
Group CTO
Censof Holdings Berhad

pypy & me

not affiliated with pypy team
have followed its development since
2004
use cpython and jython at work
used ironpython for small projects
the question:
would pypy improve performance of
some of our workloads?
i am a manager who still wants to be a
programmer, so i did the analysis

pypy
what is pypy?
- RPython translation toolchain, a framework for
generating dynamic programming language
implementations
- an implementation of Python in Python using the
framework
history
- first sprint 2003, EU project from 2004 – 2007
- open source project from 2007
https://bitbucket.org/pypy
- pypy 1.4 first release suitable for “production”
12/2010

pypy

want to know more about pypy
- http://pypy.org/
- david beazley pycon 2012 keynote
http://goo.gl/5PXFQ
- how the pypy jit works http://goo.gl/dKgFp
- why pypy by example http://goo.gl/vpQyJ

production ready – a definition
http://programmers.stackexchange.com/questions/61726/define-production-ready

it runs
it satisfies the project requirements
its design was well thought out
it's stable
it's maintainable
it's scalable
it's documented

it works with the python modules we use
it is as fast or faster than cpython

pypy – does it run?

of course, it runs

See http://pypy.readthedocs.org/en/latest/cpython_differences.html
for differences between PyPy and CPython
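
Because behaviour can differ between the two interpreters, it helps to know which one a script is actually running under. A minimal sketch using the standard library `platform` module; the helper names `interpreter_info` and `running_on_pypy` are made up for this example:

```python
import platform


def interpreter_info():
    """Return (implementation name, Python version) of the running interpreter."""
    # python_implementation() returns names such as 'CPython', 'PyPy',
    # 'Jython' or 'IronPython'.
    return platform.python_implementation(), platform.python_version()


def running_on_pypy():
    """True when the code is executing under PyPy."""
    return platform.python_implementation() == "PyPy"


if __name__ == "__main__":
    name, version = interpreter_info()
    print("%s %s (pypy: %s)" % (name, version, running_on_pypy()))
```

A check like this is best kept to benchmarks and diagnostics; application code that branches on the interpreter is harder to maintain.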

pypy – other production criteria
does it satisfy the project requirements
- yes
was its design well thought out
- I would assume so
is it stable
- yes
is it maintainable
- 7 out of 10
is it scalable
- stackless & greenlets built in
is it documented
- cpython docs for functionality, rpython toolchain 8 out
of 10

pypy – does it work with the modules we use

standard library modules supported:
__builtin__, __pypy__, _ast, _bisect, _codecs, _collections, _ffi, _hashlib,
_io, _locale, _lsprof, _md5, _minimal_curses, _multiprocessing, _random,
_rawffi, _sha, _socket, _sre, _ssl, _warnings, _weakref, _winreg, array,
binascii, bz2, cStringIO, clr, cmath, cpyext, crypt, errno, exceptions,
fcntl, gc, imp, itertools, marshal, math, mmap, operator, oracle, parser,
posix, pyexpat, select, signal, struct, symbol, sys, termios, thread, time,
token, unicodedata, zipimport, zlib

these modules are supported but written in
python:
cPickle, _csv, ctypes, datetime, dbm, _functools, grp, pwd, readline,
resource, sqlite3, syslog, tputil

many python libs are known to work, like:
ctypes, django, pyglet, sqlalchemy, PIL. See
https://bitbucket.org/pypy/compatibility/wiki/Home for a more
exhaustive list.
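
A quick first pass at "does it work with the modules we use" is to try importing each dependency under the interpreter being evaluated. The sketch below assumes nothing about the real dependency list; the names in `wanted` are placeholders and `check_imports` is a hypothetical helper. A clean import only shows the module loads, not that it behaves correctly.

```python
import importlib


def check_imports(module_names):
    """Map each module name to None if it imports, else to the error text."""
    results = {}
    for name in module_names:
        try:
            importlib.import_module(name)
        except Exception as exc:  # ImportError or any failure raised at import time
            results[name] = "%s: %s" % (type(exc).__name__, exc)
        else:
            results[name] = None
    return results


if __name__ == "__main__":
    # Hypothetical list; replace with the modules your own code depends on.
    wanted = ["csv", "sqlite3", "json", "not_a_real_module_xyz"]
    for name, error in sorted(check_imports(wanted).items()):
        print("%-24s %s" % (name, "ok" if error is None else error))
```

Run the same script once with `python` and once with `pypy` and compare the two reports.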

pypy – does it work with the modules we use
pypy c-api support is beta, worked most of
the time but failed with reportlab:
Fatal error in cpyext, CPython compatibility layer, calling PySequence_GetItem
Either report a bug or consider not using this particular extension
<OpErrFmt object at 0x7f1e89587e88>
RPython traceback:
  File "module_cpyext_api_2.c", line 51963, in PySequence_GetItem
  File "module_cpyext_pyobject.c", line 1071, in BaseCpyTypedescr_realize
  File "objspace_std_objspace.c", line 3396, in allocate_instance__W_ObjectObject
  File "objspace_std_typeobject.c", line 3010, in W_TypeObject_check_user_subclass
Segmentation fault
But this was the only compatibility issue we
had running all of our python code under
pypy, and we could fall back to the pure python
reportlab extensions anyway.
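
The general shape of such a fallback is "try the C-accelerated module, otherwise use the pure python one". The sketch below is not reportlab's own mechanism: `_hypothetical_c_accel` is an invented module name and `json` merely stands in for a pure python implementation.

```python
import importlib


def import_first_available(candidates):
    """Return the first module in `candidates` that imports cleanly."""
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    raise ImportError("none of %r could be imported" % (candidates,))


# '_hypothetical_c_accel' stands in for a C extension that is not installed;
# 'json' stands in for the pure python fallback.
impl = import_first_available(["_hypothetical_c_accel", "json"])
print(impl.__name__)
```

This only helps when the extension fails with an ImportError. A segmentation fault like the one above cannot be caught from python, so in that case the C extension has to be left out of the candidate list or not installed at all.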

pypy – does it run as fast as cpython

but!

http://speed.pypy.org/

pypy django benchmark
import time

from django.template import Context, Template

DJANGO_TMPL = Template("""<table>
{% for row in table %}
<tr>{% for col in row %}<td>{{ col|escape }}</td>{% endfor %}</tr>
{% endfor %}
</table>
""")

def test_django(count):
    # Python 2 code: a 150 x 150 table of integers.
    table = [xrange(150) for _ in xrange(150)]
    context = Context({"table": table})

    # Warm up Django.
    DJANGO_TMPL.render(context)
    DJANGO_TMPL.render(context)

    # Time each render separately and return the list of timings.
    times = []
    for _ in xrange(count):
        t0 = time.time()
        data = DJANGO_TMPL.render(context)
        t1 = time.time()
        times.append(t1 - t0)
    return times
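
The warm-up calls matter more under a jit than under cpython, because the first iterations include tracing and compilation time. The same pattern can be pulled out into a reusable helper. This is a sketch, not part of the pypy benchmark suite; `time_with_warmup` and `workload` are names invented here, and `workload` is only a stand-in for the template render.

```python
import time


def time_with_warmup(func, count, warmup=2):
    """Call func `warmup` times untimed, then return `count` timings in seconds."""
    for _ in range(warmup):
        func()
    times = []
    for _ in range(count):
        t0 = time.time()
        func()
        t1 = time.time()
        times.append(t1 - t0)
    return times


def workload():
    # Hypothetical stand-in for DJANGO_TMPL.render(context).
    return sum(i * i for i in range(10000))


if __name__ == "__main__":
    samples = time_with_warmup(workload, count=5)
    print("average: %.6f s" % (sum(samples) / len(samples)))
```

Two warm-up calls mirror the benchmark above; whether that is enough for the jit to settle depends on the workload, so it is worth looking at the individual timings rather than only the average.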

my csv to xml benchmark
import csv

def bench(data, output):
    f = open(data, 'rb')
    fn = ['age', …]  # remaining field names elided on the slide
    reader = csv.DictReader(f, fn)
    writer = SAXWriter(output)  # SAXWriter is not shown on the slide
    writer.start_doc()
    writer.start_tag('data')
    try:
        # One <row> element per csv record, one child element per field.
        for row in reader:
            writer.start_tag('row')
            for key in row.keys():
                writer.tag(key.replace(' ', '_'), body=row[key])
            writer.end_tag('row')
    finally:
        f.close()
    writer.end_tag('data')
    writer.end_doc()
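
The slide does not show `SAXWriter`. To make the shape of the benchmark concrete, here is a hypothetical minimal writer that offers the five methods `bench` calls. It is an assumption about the interface only, not the class used in the real benchmark, and it is written for Python 3.

```python
import io
from xml.sax.saxutils import escape


class SAXWriter(object):
    """Hypothetical minimal writer exposing the methods bench() calls."""

    def __init__(self, output):
        self.output = output

    def start_doc(self):
        self.output.write('<?xml version="1.0" encoding="utf-8"?>\n')

    def end_doc(self):
        self.output.write("\n")

    def start_tag(self, name):
        self.output.write("<%s>" % name)

    def end_tag(self, name):
        self.output.write("</%s>" % name)

    def tag(self, name, body=""):
        # escape() replaces &, < and > so the body is safe as element text.
        self.start_tag(name)
        self.output.write(escape(body))
        self.end_tag(name)


if __name__ == "__main__":
    out = io.StringIO()
    writer = SAXWriter(out)
    writer.start_doc()
    writer.start_tag("data")
    writer.start_tag("row")
    writer.tag("age", body="42 & counting")
    writer.end_tag("row")
    writer.end_tag("data")
    writer.end_doc()
    print(out.getvalue())
```

The benchmark spends its time in many small method calls and string writes like these, which is the kind of pure python loop a tracing jit tends to handle well.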

my pypy benchmarks
https://bitbucket.org/hexdump42/pypy-benchmarks

average execution time (in seconds)

benchmark      cpython 2.7.3   pypy-jit 1.9               pypy-jit nightly
bm_csv2xml     88.26/94.04     28.89  (3.0549 x faster)   28.96  (3.3723 x faster)
bm_csv         1.54/1.65       5.89   (3.8122 x slower)   5.78   (3.5025 x slower)
bm_openpyxml   1.31/1.21       3.26   (2.4871 x slower)   3.15   (2.6051 x slower)
bm_xhtml2pdf   1.91/1.95       3.27   (1.7155 x slower)   4.22   (2.1637 x slower)
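
The slides do not say how the "x faster / x slower" figures were derived. A plausible reading, assumed here, is the larger average divided by the smaller one. The helper `relative_speed` is invented for this sketch.

```python
def relative_speed(cpython_seconds, pypy_seconds):
    """Return (ratio, 'faster' | 'slower') describing pypy relative to cpython."""
    if pypy_seconds <= cpython_seconds:
        return cpython_seconds / pypy_seconds, "faster"
    return pypy_seconds / cpython_seconds, "slower"


if __name__ == "__main__":
    # Averages taken from the bm_csv2xml and bm_csv rows (pypy-jit 1.9 column).
    for name, cpy, pypy in [("bm_csv2xml", 88.26, 28.89), ("bm_csv", 1.54, 5.89)]:
        ratio, direction = relative_speed(cpy, pypy)
        print("%-12s %.2f x %s" % (name, ratio, direction))
```

Ratios computed from the rounded averages land close to, but not exactly on, the published figures, presumably because those were calculated from unrounded timings.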

my pypy benchmarks

max memory use

benchmark      cpython 2.7.3   pypy-jit 1.9                pypy-jit nightly
bm_interp      5412/5248       12556   (2.32 x larger)     21880   (4.1692 x larger)
bm_csv2xml     7048/7064       55180   (7.8292 x larger)   55232   (7.8188 x larger)
bm_csv         5812/5180       52200   (8.9814 x larger)   52176   (10.0726 x larger)
bm_openpyxml   12656/12656     77252   (6.1040 x larger)   80428   (6.3549 x larger)
bm_xhtml2pdf   48880/34884     236792  (4.8444 x larger)   101376  (2.906 x larger)
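
The slides do not state how max memory use was measured. One way to get a comparable number on Unix, assumed here and not necessarily the method used for the figures above, is the peak resident set size reported by the standard `resource` module. `peak_memory` is a name invented for this sketch.

```python
import resource


def peak_memory():
    """Peak resident set size of this process, as reported by getrusage."""
    # ru_maxrss is in kilobytes on Linux and in bytes on macOS.
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


if __name__ == "__main__":
    before = peak_memory()
    data = [str(i) for i in range(500000)]  # allocate something measurable
    after = peak_memory()
    print("peak before: %d, after: %d" % (before, after))
```

Calling this at the end of a benchmark run under each interpreter gives one figure per run; the `resource` module is Unix-only.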

what is the pypy jit doing?
https://bitbucket.org/pypy/jitviewer/

modified csv pypy benchmarks


benchmark        cpython 2.7.3   pypy-jit 1.9               pypy-jit nightly
bm_csv2xml_mod   88.25/90.02     23.65  (3.7315 x faster)   23.86  (3.7728 x faster)
bm_csv_mod       1.62/1.69       1.89   (0.8571 x slower)   1.72   (0.9825 x slower)

is pypy ready for production

1. it runs
2. it satisfies the project requirements
3. its design was well thought out
4. it's stable
5. it's maintainable
6. it's scalable
7. it's documented
8. it works with the python modules we use
9. it is as fast or faster than cpython

some other reasons to consider pypy

cffi – foreign function interface for python
- http://cffi.readthedocs.org/
pypy version of numpy
py3k version of pypy
check out the STM/AME project

http://www.pypy.org/howtohelp.html

contact details

Mark Rees
mark at censof dot com
+Mark Rees
@hexdump42
hex-dump.blogspot.com

http://www.slideshare.net/hexdump42/pypy-is-it-ready-for-production
