Big Data is frustrating

Processing
Storing
Indexing
Searching
             Photo by JavierPsilocybin flickr.com/santoposmoderno/4116782554
Parsing Big XML Docs?
use a stream reader*

  *in PHP we used XMLReader
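
A minimal sketch of the stream-reader idea, assuming a feed made of repeated `<product>` elements (the feed layout and the `iter_items` helper are hypothetical, not taken from the talk). The talk used PHP's XMLReader; this uses Python's standard-library `iterparse`, which is a comparable incremental parser, to show the same pattern: handle one record at a time instead of loading the whole document.

```python
import io
import xml.etree.ElementTree as ET


def iter_items(source, tag):
    """Yield one dict per <tag> element without building the full tree first."""
    # iterparse reads the input incrementally; an "end" event fires
    # once an element and all of its children have been parsed.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            yield {child.tag: (child.text or "").strip() for child in elem}
            # Drop the record's children and text once it has been handled.
            # The emptied element itself stays attached to the root.
            elem.clear()


# Hypothetical feed layout, for illustration only.
SAMPLE_FEED = b"""<feed>
  <product><name>Lettuce</name><price>0.80</price></product>
  <product><name>Tomato</name><price>1.20</price></product>
</feed>"""

products = list(iter_items(io.BytesIO(SAMPLE_FEED), "product"))
print(products)
```

In real use, `source` would be a file path or an open file object rather than an in-memory buffer, and each record would be processed as it is yielded rather than collected into a list.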
Think About Storage




           Photo by itonys flickr.com/adstone/4549679025
Remember:
•DB size = Data + Indexes

•Indexes slow INSERTs

•Optimise your queries!
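
The first and third points above can be checked with a small experiment. This is a rough sketch that uses SQLite from Python's standard library as a stand-in, since it needs no server; the table, column, and index names are made up, and MySQL will report different sizes and plan formats. It measures the database size before and after adding an index, and asks the engine how it plans to run the same query in each case.

```python
import os
import sqlite3
import tempfile


def db_bytes(conn):
    """Return the database size in bytes (page count * page size)."""
    pages = conn.execute("PRAGMA page_count").fetchone()[0]
    page_size = conn.execute("PRAGMA page_size").fetchone()[0]
    return pages * page_size


def plan(conn, sql, params=()):
    """Return the readable steps of SQLite's query plan for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row holds the human-readable description.
    return " | ".join(row[-1] for row in rows)


with tempfile.TemporaryDirectory() as tmp:
    conn = sqlite3.connect(os.path.join(tmp, "demo.db"))
    # Hypothetical table, for illustration only.
    conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO records (name) VALUES (?)",
        ((f"item-{i:06d}",) for i in range(20000)),
    )
    conn.commit()

    query = "SELECT id FROM records WHERE name = ?"
    size_before = db_bytes(conn)
    plan_before = plan(conn, query, ("item-000042",))

    # Adding an index costs space, but changes how the lookup is executed.
    conn.execute("CREATE INDEX idx_records_name ON records (name)")
    conn.commit()

    size_after = db_bytes(conn)
    plan_after = plan(conn, query, ("item-000042",))
    conn.close()

print("bytes without index:", size_before)
print("bytes with index:   ", size_after)
print("plan without index: ", plan_before)
print("plan with index:    ", plan_after)
```

The same habit carries over to MySQL, where `EXPLAIN` plays the role of `EXPLAIN QUERY PLAN`: check the plan before and after each index, and keep only the indexes that earn their space.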
Use a dedicated search application
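
As a toy illustration of why a dedicated search application answers quickly, the sketch below builds an inverted index: a map from each term to the documents that contain it, prepared ahead of query time. It is not how any particular search engine is implemented; the sample documents and the `build_index` and `search` helpers are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    # Start from the first term's matches and intersect with the rest.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical documents, for illustration only.
DOCS = {
    1: "Crisp iceberg lettuce",
    2: "Vine tomatoes",
    3: "Little gem lettuce, twin pack",
}

index = build_index(DOCS)
print(search(index, "lettuce"))
print(search(index, "gem lettuce"))
```

The expensive work happens once, when the index is built; each search is then a handful of set lookups and intersections rather than a scan over every record.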
Thanks
Simon Hamp @simonhamp
Founder, Flipstorm

flipstorm.co.uk
lesslettuce.co.uk

Editor's Notes

  • #2: Every part of dealing with large quantities of data is annoying:
      - Processing must be fast and accurate
      - Storing must be flexible and safe
      - Indexing must be fast and beneficial
      - Searching still needs to be lightning quick
  • #3: LessLettuce relies on XML feeds, some of them hundreds of MB in size
      - Couldn't use SimpleXML, because it loads the whole document into memory
      - We used XMLReader instead. Look out for my article in .net Mag soon!
  • #4: We use MySQL. It's still good for big data!
      - Choose the right storage engine
      - Get your data structure right first
      - Tweak your server to optimise operations
      - How will you recover GBs of data in a crash situation?
      - Test, test, test!
  • #5: The LessLettuce live DB currently has ~20 million records
      - This takes up ~7GB of space
      - Optimising queries is crucial
  • #6: We used Sphinx. It was easy to deploy (set up in an afternoon) and talked directly to the database
      - Takes about 20 minutes to do a fairly complex full index
      - All searches return in hundredths of a second
      - Sphinx Rocks!!!