SOFA Statistics: Developing and releasing a Python open source application
Grant Paton-Simpson
Paton-Simpson & Associates Ltd, Auckland, New Zealand
sofastatistics.com

Overview
Introducing the SOFA Statistics application
How SOFA works with SQL databases
Using HTML for output (via wxWebKit)
Experience with existing statistics modules
wxPython GUI toolkit (especially the grid widget)
The release process (especially packaging)
In 30 minutes flat out!

Introducing SOFA Statistics
SOFA stands for Statistics Open For All.
A cross-platform desktop application for making report tables from data (database, spreadsheet, or directly entered), producing charts, and running key statistical tests.
The slogan is “ease of use, learn as you go, and beautiful output”.
May be useful for specialist statisticians, but the emphasis is on supporting non-specialists and learning statisticians.
Currently at version 0.8.10 and pushing on towards a version 1.0 release.

SOFA Architecture
Linking, not importing: SOFA connects directly to SQLite, MySQL, MS Access, PostgreSQL and SQL Server databases.
SOFA scripts, generated from the GUI or written by hand (available for automation).
HTML output (spreadsheet-friendly).

Working with SQL
Not using an abstraction layer: wrote my own code using the MySQLdb module etc.
Already experienced with SQL.
Want control over the information I get about data configuration (e.g. character set).
Want control over how I interact with the databases, for performance reasons.
Adding other databases, e.g. Oracle, is a matter of copying an existing module and changing the implementation.
SQL databases do things very differently, e.g.:
SQLite and data type integrity (SQLite uses dynamic typing, so a column's declared type does not restrict what it stores)
PostgreSQL and SUM(expression)
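
The slide names these differences without showing code. As a hedged illustration, the sketch below uses Python's built-in `sqlite3` module to show SQLite's dynamic typing, plus a helper that writes a conditional count in a form that should also run on PostgreSQL, which (unlike SQLite and MySQL) has no `SUM()` over a boolean. Whether this is exactly the `SUM(expression)` issue SOFA ran into is an assumption, and `count_where_sql` is a hypothetical name; in a one-module-per-database design, small helpers like this are what each copied module would reimplement.

```python
import sqlite3

def count_where_sql(condition):
    """Return a 'count rows where condition holds' SQL expression.

    SQLite and MySQL treat comparisons as 0/1 integers, so SUM(age > 30)
    works there, but PostgreSQL has no SUM(boolean). Wrapping the condition
    in CASE gives an expression all three accept.
    NB: condition is pasted into the SQL, so it must come from trusted code.
    """
    return f"SUM(CASE WHEN {condition} THEN 1 ELSE 0 END)"

conn = sqlite3.connect(":memory:")

# 1. SQLite's dynamic typing: the declared type is only an affinity,
#    so a non-numeric string is stored as TEXT in an INTEGER column.
conn.execute("CREATE TABLE survey (id INTEGER PRIMARY KEY, age INTEGER)")
conn.executemany("INSERT INTO survey (age) VALUES (?)", [(42,), ("unknown",)])
stored_types = [t for (t,) in conn.execute("SELECT typeof(age) FROM survey ORDER BY id")]
print(stored_types)  # ['integer', 'text']

# 2. A conditional count written so that it would also run on PostgreSQL.
conn.execute("CREATE TABLE scores (age INTEGER)")
conn.executemany("INSERT INTO scores VALUES (?)", [(25,), (35,), (45,)])
(over_30,) = conn.execute(f"SELECT {count_where_sql('age > 30')} FROM scores").fetchone()
print(over_30)  # 2
```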

HTML Report Table Output
Tree-based for each dimension (rows, columns)
Created an artificial limit of 5000 cells
Scales linearly, and Python is not the bottleneck
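
The slide gives the design but not the code. The sketch below is a hedged illustration under assumptions: each dimension is a balanced tree of labelled nodes, and `DimNode`, `check_size` and `col_header_rows` are hypothetical names, not SOFA's. A node's column span is its leaf count, and the number of body cells is row leaves times column leaves, so a cap such as the 5000-cell limit can be checked before any HTML is produced. Rendering visits each cell once, which fits the linear scaling noted above.

```python
from dataclasses import dataclass, field
from html import escape

MAX_CELLS = 5000  # the artificial limit mentioned on the slide

@dataclass
class DimNode:
    """One label in a row or column dimension, with optional nested labels."""
    label: str
    children: list = field(default_factory=list)

    def leaves(self):
        """Number of leaf labels at or below this node = columns it spans."""
        if not self.children:
            return 1
        return sum(child.leaves() for child in self.children)

def check_size(row_tree, col_tree, limit=MAX_CELLS):
    """Return the number of body cells, refusing tables over the limit."""
    n_cells = row_tree.leaves() * col_tree.leaves()
    if n_cells > limit:
        raise ValueError(f"table would have {n_cells} cells (limit {limit})")
    return n_cells

def col_header_rows(col_tree):
    """One <tr> per tree level below the root; assumes a balanced tree."""
    rows = []
    level = col_tree.children
    while level:
        cells = "".join(
            f'<th colspan="{node.leaves()}">{escape(node.label)}</th>'
            for node in level
        )
        rows.append(f"<tr>{cells}</tr>")
        level = [child for node in level for child in node.children]
    return rows

cols = DimNode("cols", [
    DimNode("Male", [DimNode("Freq"), DimNode("%")]),
    DimNode("Female", [DimNode("Freq"), DimNode("%")]),
])
rows = DimNode("rows", [DimNode(f"Age group {i}") for i in range(1, 6)])
print(check_size(rows, cols))  # 20
for tr in col_header_rows(cols):
    print(tr)
```

Unbalanced trees (where one branch is shallower than another) would need `rowspan` handling as well, which this sketch leaves out.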

Statistics Modules
Looked at using existing libraries but ended up using a modified subset of their code.
This was not my preferred approach. Benefits of plugging in a module:
saving time and effort;
less risk (the results of statistical algorithms can diverge wildly because small floating-point errors compound and multiply);
any issues found and fixed can help everyone.
Reasons for creating my own code (often based on existing code):
the standard code didn't return results separated from formatting;
no option of using decimal instead of floating-point maths;
half-baked code in some places;
keeping the installer file size down.
But I do use existing libraries to test my code against (using nosetests).
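
SOFA's own statistics code isn't shown on the slide. The sketch below, with hypothetical names `sample_variance` and `test_variance_matches_stdlib`, illustrates three of the points under stated assumptions: returning a bare number leaves formatting to the caller, the same function accepts `decimal.Decimal` values instead of floats, and the result is checked against an existing library. Here that library is Python's standard `statistics` module, standing in for whichever libraries SOFA actually tests against; the test uses plain `assert`, the style nosetests collects.

```python
import math
import statistics
from decimal import Decimal

def sample_variance(values):
    """Two-pass sample variance.

    Returns a bare number (no formatting), and works unchanged with
    float or Decimal inputs because it only uses +, -, ** and /.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

def test_variance_matches_stdlib():
    """Check our result against an existing library (here the stdlib)."""
    data = [2, 4, 4, 4, 5, 5, 7, 9]
    assert math.isclose(sample_variance([float(x) for x in data]),
                        statistics.variance(data))
    dec_data = [Decimal(x) for x in data]
    diff = sample_variance(dec_data) - statistics.variance(dec_data)
    assert abs(diff) < Decimal("1e-20")

# Why decimal can matter: binary floats can't represent 0.1 exactly.
print(sum([0.1] * 10) == 1.0)           # False
print(sum([Decimal("0.1")] * 10) == 1)  # True
test_variance_matches_stdlib()
```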

Demonstration of key functionality
Background for discussion of the GUI toolkit and other topics
NB: still lots more functionality to be added
Switch over to Jaunty (Ubuntu 9.04)

Plans
Interactive visualisations using Matplotlib (and wxWebKit), e.g. showing how a t-test works (ideas from Statistics Without Tears), such as the impact of altering your sample size
Output charting using Raphael JS (SVG & JS)
Mac OS X package, and more flexible packaging
Add the ability to import from Calc and SPSS
Other databases, e.g. Oracle, DB2, Interbase
Increase test coverage
More languages, e.g. French, Spanish, German
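
The sample-size idea can be previewed numerically before any visualisation exists. This is a minimal sketch, not the planned Matplotlib tool: `one_sample_t` is a hypothetical helper and the sample values are made up. Repeating the same sample keeps the mean difference fixed and the spread similar, so the one-sample statistic t = (mean - mu0) / (s / sqrt(n)) grows roughly with the square root of n.

```python
import math
import statistics

def one_sample_t(values, mu0):
    """One-sample t statistic: (mean - mu0) / (s / sqrt(n))."""
    n = len(values)
    standard_error = statistics.stdev(values) / math.sqrt(n)
    return (statistics.mean(values) - mu0) / standard_error

sample = [4, 6, 5, 7, 3]  # made-up values: mean 5, compared against mu0 = 4
for copies in (1, 2, 4):
    data = sample * copies  # same mean and similar spread, larger n
    print(len(data), round(one_sample_t(data, 4), 3))
# prints: 5 1.414, then 10 2.121, then 20 3.082
```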

Plans cont ...
Overall, not trying to compete with R (or RKWard).
The slogan is “ease of use, learn as you go, and beautiful output”.
May be useful for specialist statisticians, but the emphasis is on supporting non-specialists and learning statisticians.
Focus on making the most common needs easy to satisfy.
Plugin extensions for the rest.

wxPython GUI Toolkit
Cross-platform and native widgets (shown on Ubuntu with the Dust theme, and on Windows XP)

Documentation & Community
wxPython in Action: great book
Mailing list (with Robin Dunn a regular contributor)
Lots of online documentation (but googling and integrating different ideas is often required)
There is a GUI for making GUIs, but I prefer handcoding: it's clean, I can reuse code across different forms, and I can delegate parts of the GUI, e.g. to database plugin modules.
Lots of sophisticated, configurable widgets. I was able to make a data entry table work the way I thought it should, e.g. a new row has a column label of …, and specific behaviour when tabbing.
Focus on grid control

wx.Grid
With flexibility and power comes some complexity to handle.
Only 1100 lines of code to make the data grid you saw in the demonstration (including validation, the ability to add new rows, edit values etc.).
It may be sensible to have more lines of documentation than code in some modules.
Resolving issues can take you to the edge of what is known/documented:
Had data entry working like clockwork in Ubuntu.
Then found out that Windows intercepted Tabs and Returns before they could be exposed and reacted to.
But there was a solution.

Example of wx.Grid code
The user clicks on a cell to edit a value, and we bind to that event. Now we can grab the control... and respond to its key-down event. Now we're away again :-)

```python
import wx
import wx.grid

# In the set-up code: react when the grid creates a cell editor.
self.frame.Bind(wx.grid.EVT_GRID_EDITOR_CREATED, self.OnGridEditorCreated)
# …

def OnGridEditorCreated(self, event):
    """
    Need to bind KeyDown to the control itself e.g. a choice control.
    wx.WANTS_CHARS makes it work.
    """
    control = event.GetControl()
    # Ask to receive Tab/Return key events ourselves instead of letting
    # the platform swallow them (the Windows problem described above).
    control.WindowStyle |= wx.WANTS_CHARS
    control.Bind(wx.EVT_KEY_DOWN, self.OnGridKeyDown)
    event.Skip()

# …

def OnGridKeyDown(self, event):
    keycode = event.GetKeyCode()
    if keycode in (wx.WXK_TAB, wx.WXK_RETURN):
        ...  # etc: custom Tab/Return handling (not shown on the slide)
```

Extending the grid further
Custom controls:
Option of label display, e.g. “Male” not 1
Conditional formatting, e.g. all values > 1000 in red
Choice of toolkit is very important. Can it support what you want to do, or will you hit a wall?
If I wanted to display sparklines or pie charts as cells in a table … could I?
It's hard to know whether it was a good choice until you're already committed.
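
Both custom features come down to small display rules applied per cell. Here is a hedged sketch in plain Python (no wx import, so it runs anywhere), with hypothetical names `VALUE_LABELS`, `display_value` and `text_colour`. In wxPython, rules like these could be called from a custom `wx.grid.GridTableBase` subclass's `GetValue` and `GetAttr` methods, though that wiring is not shown and is not necessarily how SOFA does it.

```python
# Hypothetical field names and labels, for illustration only.
VALUE_LABELS = {"gender": {1: "Male", 2: "Female"}}

def display_value(field_name, raw, show_labels=True):
    """Text to show in a cell: the value label if one exists and labels are on."""
    if show_labels:
        return VALUE_LABELS.get(field_name, {}).get(raw, str(raw))
    return str(raw)

def text_colour(raw, threshold=1000):
    """Conditional formatting rule: numbers above threshold are shown in red."""
    if isinstance(raw, (int, float)) and raw > threshold:
        return "red"
    return None  # None means: use the grid's default colour

print(display_value("gender", 1))                     # Male
print(display_value("gender", 1, show_labels=False))  # 1
print(text_colour(1500), text_colour(20))             # red None
```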

Release process
Lots of steps to get 100% right. My steps are:
1. Do preparatory clean-up (debugging off, demo database tidied).
2. Make sure I have translations for any new strings I've added.
3. Make and test the new deb and Windows packages. I use VirtualBox to give me identical install environments each time.
4. Add the new files to SourceForge (I wanted to consolidate downloads to help me measure usage).
5. Add a new release to Launchpad and Freshmeat, complete with updated release notes and change log (I used Bazaar to push to Launchpad, so I can browse my commit comments).
6. Make announcements in both Launchpad and Freshmeat.
7. Update the project homepage to account for the new download location and new features.
8. Add a blog item to the project site.
9. Update the release version and release date on Wikipedia.
10. Revisit any important threads commenting on open source statistics packages.
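
Step 2 can be partly automated. A minimal sketch, assuming the translations are kept in gettext `.po` files (the slide doesn't say how SOFA stores them): the hypothetical `untranslated_msgids` helper lists source strings whose `msgstr` is still empty. GNU gettext's `msgfmt --statistics` gives a similar count from the command line.

```python
def _unquote(s):
    """Strip the surrounding double quotes from a PO string literal."""
    s = s.strip()
    return s[1:-1] if len(s) >= 2 and s[0] == s[-1] == '"' else s

def untranslated_msgids(po_text):
    """Return msgids whose msgstr is empty.

    Simplified: handles msgid/msgstr pairs and continuation lines, but not
    escape sequences, and plural entries (msgstr[n]) are reported as untranslated.
    """
    entries = []  # [msgid, msgstr] pairs in file order
    field = None  # where continuation lines go: 0 = msgid, 1 = msgstr
    for raw in po_text.splitlines():
        line = raw.strip()
        if line.startswith("msgid "):
            entries.append([_unquote(line[len("msgid "):]), ""])
            field = 0
        elif line.startswith("msgstr ") and entries:
            entries[-1][1] = _unquote(line[len("msgstr "):])
            field = 1
        elif line.startswith('"') and entries and field is not None:
            entries[-1][field] += _unquote(line)
        else:
            field = None  # comment, blank line, msgctxt etc.
    # msgid "" is the PO header, not a user-facing string
    return [msgid for msgid, msgstr in entries if msgid and not msgstr]

sample_po = r'''msgid ""
msgstr ""
"Language: fr\n"

msgid "Open"
msgstr "Ouvrir"

msgid "Close"
msgstr ""

msgid "Long label"
msgstr ""
"Libellé long"
'''
print(untranslated_msgids(sample_po))  # ['Close']
```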

Making a Debian Package (for Ubuntu)
Initially very daunting: where do you start?
Found a ShowMeDo video by Austrian developer Horst Jens. His example was a Python project, so the requirements were very similar.
Ended up with very detailed step-by-step guidelines for packaging SOFA Statistics.
NB: the application is installed for all users, not a given user. Files are put into /usr/share/pyshared/sofa/...
Any files needed by an individual user are transferred during first use of the application, to /home/username/sofa/...
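
The first-use step can be sketched in a few lines of Python. This is an assumption about how it might work, not SOFA's code: `ensure_user_files` and its `template_dir` argument are hypothetical, and the per-user folder is taken to be `~/sofa`, which matches /home/username/sofa on Linux. On Windows, `os.path.expanduser("~")` typically resolves to the user's profile folder (under Documents and Settings on Windows XP).

```python
import os
import shutil

def default_user_dir():
    """Per-user folder, e.g. /home/username/sofa on Linux."""
    return os.path.join(os.path.expanduser("~"), "sofa")

def ensure_user_files(template_dir, user_dir=None):
    """Copy per-user files from the shared install on first use.

    Returns True if files were copied, False if the user folder already existed.
    """
    user_dir = user_dir or default_user_dir()
    if os.path.isdir(user_dir):
        return False  # not the first run: leave the user's files alone
    shutil.copytree(template_dir, user_dir)  # also creates missing parent folders
    return True
```

Checking for the folder rather than a flag file keeps the logic simple; the trade-off is that a user who deletes one file inside it won't get that file back automatically.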

NSIS Windows Installer
Nullsoft Scriptable Install System: free, and used by Firefox, OpenOffice etc.
Weird language: a cross between PHP and assembler.
Plenty of documentation etc., but it's best to start and then extend.
Issue: file size, because the installer includes MySQLdb, NumPy, wxPython, SQLite and Python.
Put the program in Program Files and user files in Documents and Settings\username\sofa\...

Final Thoughts
Running an open source project can be very satisfying.
Lots of new learning.
Lots to do. Not just glamour coding: someone has to “take out the trash”.
Phenomenal resources available in the open source world: Bazaar, Loggerhead, nosetests, etc.
Hands up if you have ever considered running one (or are doing so)!