Looking into the past -
feature extraction from
historic maps using Python,
OpenCV and PostGIS.
ESRC ADRC-S
• Administrative Data Research Centre – Scotland (ADRC-S)
• part of the Administrative Data Research Network (ADRN)
• An ESRC Data Investment
• 12 ADRC-S Work Packages
• EDINA working on WP5 - Provision of Geocoding and Georeferencing
tools
What and Why?
• Prof(s) Chris Dibben and Jamie Pearce from UoE GeoSciences
• Effects of past environmental conditions on (longitudinal) population
cohorts
• Trains – where did they run in the past, and which populations lived
alongside them and their air pollution?
• Urban - did past populations live in predominantly urban or rural locales,
and were these same populations experiencing urbanisation?
• Industry - where were particular types of (polluting) industry located?
• Greenspace and Bluespace – e.g. Parks and Water
Historic Maps – a record of
past landscapes
• ADRC's remit is (all) of Scotland.
• Manual capture (digitising) of features from historic maps not going to
scale given resources available.
• Chris and Jamie's challenge to EDINA – is it possible to automagically
capture features from historic maps?
• Historic maps in Digimap historic
• For the purpose of this work we are using (higher quality) full colour scans
of historic maps provided by Chris Fleet @ NLS
• Mainly been looking at 2 map series provided by NLS
• http://maps.nls.uk/geo/explore/#zoom=15&lat=55.9757&lon=-3.1799&layers=168
• http://maps.nls.uk/geo/explore/#zoom=15&lat=55.9757&lon=-3.1799&layers=10
Environment
• Linux (Ubuntu)
• Python (3)
• Virtualenv – isolated Python environments
• PyCharm Python IDE (Community Edition)
• OpenCV – Computer Vision / Image Processing / Image Analysis
• PostgreSQL - Datastore
• PostGIS – Spatial query (analysis) engine
• QGIS – Desktop GIS / PostGIS data viewer
• (a bit of) ArcGIS for ArcScan (Line vectorization)
OpenCV
OpenCV (Open Source Computer Vision) is a library of programming
functions mainly aimed at real-time computer vision
Python Libraries used
• numpy - numpy (array) data structures central to all other libraries where
we are manipulating image / raster datasets via python
• cv2 - python interface to OpenCV
• Shapely – (GEOS based) package for manipulation and analysis of
planar geometric objects.
• Fiona – (F)ile (i)nput (o)utput (n)o (a)nalysis. An alternative API to OGR to
access and write vector GIS datasets e.g. Shapefiles / GeoJSON.
• Rasterio – Raster (i)nput (o)utput. Rasterio is to raster GIS datasets as
Fiona is to vector GIS datasets.
• Snaql – Keep (templated) SQL query blocks separate from python code
and render (with context) the query block when needed.
Assuming PostGIS, if you add in a map renderer like mapnik, then this lot
gives you everything needed to do geospatial data analysis (raster and
vector), data conversion, data management and map automation.
Python OpenCV Demo
• Load image
• Changing colourspaces – convert colour image to greyscale
• Threshold image – partition greyscale image into bilevel foreground
(white) and background (black) regions to simplify things.
• Finding image contours. Contour (lines) separate foreground regions
from background regions. Having traced contours we can describe
shape/size etc of foreground regions and relationship between
regions.
• Finding patterns / classifying features
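The demo steps above can be sketched without any dependencies. This is an illustrative stand-in, not the demo code from the talk: with OpenCV the same steps would normally go through `cv2.cvtColor`, `cv2.threshold` and `cv2.findContours`, whereas here a 4-connected flood fill labels the foreground regions in place of contour tracing. The helper names, the tiny test image and the threshold level of 128 are all assumptions.

```python
from collections import deque


def to_grey(image):
    """Convert rows of (R, G, B) tuples to rows of 0-255 grey values."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
            for row in image]


def threshold(grey, level=128):
    """Bilevel image: 1 (foreground) where a pixel is brighter than level."""
    return [[1 if v > level else 0 for v in row] for row in grey]


def label_regions(binary):
    """Return one list of (row, col) pixels per connected foreground region."""
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for r in range(rows):
        for c in range(cols):
            if not binary[r][c] or seen[r][c]:
                continue
            queue, region = deque([(r, c)]), []
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and binary[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            regions.append(region)
    return regions


def describe(region):
    """Size and bounding box (xmin, ymin, xmax, ymax) of one region."""
    ys = [p[0] for p in region]
    xs = [p[1] for p in region]
    return {"area": len(region), "bbox": (min(xs), min(ys), max(xs), max(ys))}


# Hypothetical 6 x 5 test image: two white blobs on a black background.
W, K = (255, 255, 255), (0, 0, 0)
image = [
    [K, K, K, K, K, K],
    [K, W, W, K, K, K],
    [K, W, W, K, W, K],
    [K, K, K, K, W, K],
    [K, K, K, K, K, K],
]
regions = label_regions(threshold(to_grey(image)))
shapes = [describe(r) for r in regions]
```

Once each region has an area and a bounding box, the "finding patterns" step reduces to filtering on those measurements.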
Apply similar processes to
historic maps to extract
geographic features
(1) Water features (Bluespace)
(2) Railways
(3) Urban Form / Change
#15759 – extract 'bluespace'
(1) Water features (Bluespace)
Rivers / Canals / inland water shown as
blue lines or stippled blue areas.
Find contours – each
stipple mark / line forms a
contour
Threshold to isolate blue
pixels
Contours form a
hierarchy. Parents that
hold child contours are
water regions.
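A minimal sketch of the parent/child rule, assuming the contours were traced with a tree-style retrieval mode. OpenCV describes each contour with a hierarchy row `[next, previous, first_child, parent]`, where -1 means "none"; in the Python bindings the array is usually shaped (1, N, 4), so `hierarchy[0]` would be passed in. The function names, the `min_children` parameter and the hand-made hierarchy are hypothetical.

```python
def count_children(hierarchy):
    """Count direct children per contour from [next, prev, child, parent] rows."""
    counts = [0] * len(hierarchy)
    for _next, _prev, _first_child, parent in hierarchy:
        if parent != -1:
            counts[parent] += 1
    return counts


def water_region_candidates(hierarchy, min_children=1):
    """Indices of contours that enclose at least min_children other contours."""
    counts = count_children(hierarchy)
    return [i for i, n in enumerate(counts) if n >= min_children]


# Hand-made example: contour 0 is a blue outline holding stipples 1, 2 and 3;
# contour 4 is an isolated mark with nothing inside it.
hierarchy = [
    [4, -1, 1, -1],
    [2, -1, -1, 0],
    [3, 1, -1, 0],
    [-1, 2, -1, 0],
    [-1, 0, -1, -1],
]
candidates = water_region_candidates(hierarchy)
```

Raising `min_children` is one way to ignore outlines that happen to enclose a single stray mark.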
Method 2
Process breaks down when water regions are
not entirely bound by blue lines or broken by
other features (bridges).
So (alternative method) find every individual
stipple and then forming groups of these gives
water regions.
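One way to form groups of stipples is single-link clustering on their centre points: any two stipples closer than a gap threshold end up in the same group. The sketch below uses a small union-find for this. It is an assumed approach rather than the method used in the talk, and the coordinates, `max_gap` and the minimum group size of 2 are made up for illustration.

```python
from math import hypot


def group_by_proximity(points, max_gap):
    """Group points so that neighbours within max_gap share a group."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the group root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, (x1, y1) in enumerate(points):
        for j in range(i + 1, len(points)):
            x2, y2 = points[j]
            if hypot(x2 - x1, y2 - y1) <= max_gap:
                parent[find(j)] = find(i)

    groups = {}
    for i, p in enumerate(points):
        groups.setdefault(find(i), []).append(p)
    return list(groups.values())


# Hypothetical stipple centres (pixels): two clusters and one stray mark.
stipples = [(0, 0), (3, 0), (6, 1), (40, 40), (43, 41), (100, 5)]
groups = group_by_proximity(stipples, max_gap=5)
water_regions = [g for g in groups if len(g) >= 2]
```

The pairwise loop is quadratic in the number of stipples, so a full map sheet would probably need a spatial index (or the grouping done in PostGIS) instead.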
Apply either of these methods of
capturing blue stippled regions
to other stippled regions e.g.
green stippled regions (parks -
greenspace)
Change - old Edinburgh quarries change to
shopping centres or from bluespace to
greenspace!
Chris@NLS provided James
Reid with 6 NLS OS 25K 1937-
61 sheets.
First a diversion - threshold
by colour separation
In QGIS digitised polygons
covering groups of features of
interest so we can explore
values of RGB in the underlying
pixels and use to inform colour
separation processing.
Load the training polygons and NLS 3 band
raster into PostGIS and do spatial analysis to
find pixel values in each polygon.
Calculate aggregate
min/max values of
RGB (BGR in
opencv!) across each
feature group and use
these in OpenCV
Python algorithm to
do colour separation
on the source 25k
image. More pre/post
processing needed.
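The min/max idea can be sketched in a few lines. With OpenCV the masking step would normally be a single `cv2.inRange` call on the image array; the pure-Python version below does the same per-pixel test so the logic is visible. Pixels are written in BGR order to match OpenCV. The sample values and function names are invented for illustration and are not taken from the NLS sheets.

```python
def channel_ranges(samples):
    """Per-channel (lower, upper) bounds over sampled (B, G, R) pixels."""
    lower = tuple(min(px[i] for px in samples) for i in range(3))
    upper = tuple(max(px[i] for px in samples) for i in range(3))
    return lower, upper


def in_range(image, lower, upper):
    """Mask with 255 where every channel lies within its bounds, else 0."""
    return [
        [255 if all(lo <= v <= hi for v, lo, hi in zip(px, lower, upper)) else 0
         for px in row]
        for row in image
    ]


# Hypothetical pixels sampled under the "main road" training polygons (BGR).
main_road_samples = [(60, 140, 230), (70, 150, 240), (65, 145, 235)]
lower, upper = channel_ranges(main_road_samples)

# Hypothetical 2 x 2 image: two orange pixels, one grey, one black.
image = [
    [(62, 142, 232), (200, 200, 200)],
    [(0, 0, 0), (70, 150, 240)],
]
mask = in_range(image, lower, upper)
```

A plain min/max box is sensitive to outliers in the training pixels, which may be one reason more pre/post processing is needed.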
Pixels corresponding to (grey)
buildings
Pixels corresponding to (black)
important buildings (and railway
lines)
Pixels corresponding to
(orange) main roads
(2) Extracting Railways
Source 1:25,000 NLS Historic Map “black” pixels extracted after
running colour separation process.
Isolates dashes in railway lines (but
also text/buildings)
From dashes to (railway) lines
So do contour tracing and apply
size/shape constraints to isolate the
dashes in the railway lines only.
Join up neighbouring dash
candidates to form railway lines
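A rough sketch of both steps, assuming each traced contour has already been reduced to a centre point plus the length and width of its rotated bounding rectangle. The size limits, the gap threshold and the candidate values are hypothetical. The chaining is deliberately simple: it starts from a dash and keeps hopping to the nearest unused dash within `max_gap`, so it only grows a line in one direction from its starting dash.

```python
from math import hypot


def is_dash(length, width, min_len=6, max_len=14, max_width=4):
    """Size/shape test: dashes are short, thin marks."""
    return min_len <= length <= max_len and width <= max_width


def chain_dashes(centres, max_gap):
    """Greedily join dash centres into lines by nearest-neighbour hops."""
    unused = list(centres)
    lines = []
    while unused:
        line = [unused.pop(0)]
        while True:
            x, y = line[-1]
            near = [p for p in unused if hypot(p[0] - x, p[1] - y) <= max_gap]
            if not near:
                break
            nxt = min(near, key=lambda p: hypot(p[0] - x, p[1] - y))
            unused.remove(nxt)
            line.append(nxt)
        lines.append(line)
    return lines


# Hypothetical candidates as (cx, cy, length, width) in pixels.
candidates = [
    (10, 10, 10, 2),
    (25, 11, 10, 2),
    (40, 13, 9, 2),
    (55, 16, 10, 3),
    (200, 200, 30, 20),   # a building: too large
    (120, 80, 3, 3),      # a speck of text: too short
]
dashes = [(cx, cy) for cx, cy, length, width in candidates
          if is_dash(length, width)]
lines = chain_dashes(dashes, max_gap=20)
railway_lines = [line for line in lines if len(line) >= 3]
```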
Complications…
Process needs refining
to cope with noisier,
more complicated
regions of the map
Not helped that some
small buildings exhibit
similar size/shape
characteristics as
dashes in railway
lines.
A refinement might be
to introduce a look-
ahead constraint that
minimises change in
line direction as
candidates are
grouped since railway
lines don't make sharp
90 degree turns.
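The suggested refinement could look something like the sketch below: when extending a line, only accept a candidate dash if the change in heading stays under a limit, and prefer the candidate with the smallest turn. This is an untested idea from the slide expressed as code, not an implemented part of the process; `max_turn`, `max_gap` and the coordinates are assumptions.

```python
from math import atan2, degrees, hypot


def turn_angle(a, b, c):
    """Absolute change in heading, in degrees, going a -> b and then b -> c."""
    h1 = atan2(b[1] - a[1], b[0] - a[0])
    h2 = atan2(c[1] - b[1], c[0] - b[0])
    d = abs(degrees(h2 - h1)) % 360
    return min(d, 360 - d)


def next_dash(line, candidates, max_gap, max_turn=30):
    """Pick the nearby candidate that bends the line least, or None."""
    a, b = line[-2], line[-1]
    ok = [c for c in candidates
          if hypot(c[0] - b[0], c[1] - b[1]) <= max_gap
          and turn_angle(a, b, c) <= max_turn]
    return min(ok, key=lambda c: turn_angle(a, b, c)) if ok else None


# A line heading east, with three hypothetical candidates for the next dash.
line = [(0, 0), (10, 0)]
candidates = [(20, 1), (10, 10), (19, -8)]
chosen = next_dash(line, candidates, max_gap=15)
```

Here the candidate at (10, 10) would need a 90 degree turn and (19, -8) a turn of roughly 42 degrees, so only (20, 1) survives.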
All lines
captured
from
different
historic NLS
ca1900
Map series
Left with lines corresponding
to hatched building regions
Spatial
analysis
(3) Urban Form / Change
Current building footprints
held in OS MasterMap
Lines from historic map
selected as corresponding to
hatched building areas
overlain against OSMM
building footprints
New vs Old (Buildings)
The locale of the
Fort public housing
project.
West Bowling
Green Street &
Bowling Green
Street
Examples of
change in
Edinburgh between
ca1900 and today
All change
Discrete building areas
Dissolve
is_building = Yes / No
Overlay a 100m x 100m
sampling grid
% Building = Higher
% Building = Lower
A measure of urbanness
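A minimal sketch of the % Building measure for one grid cell, assuming the building polygons have already been dissolved so they do not overlap. In PostGIS this would typically be built from `ST_Intersection` and `ST_Area`; the version below instead clips each polygon to the cell with Sutherland-Hodgman clipping and measures it with the shoelace formula. Function names and coordinates (in metres) are hypothetical.

```python
def area(poly):
    """Shoelace area of a polygon given as a list of (x, y) vertices."""
    pairs = zip(poly, poly[1:] + poly[:1])
    return abs(sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in pairs)) / 2


def clip(poly, xmin, ymin, xmax, ymax):
    """Sutherland-Hodgman clip of a polygon to an axis-aligned cell."""
    def clip_edge(pts, inside, intersect):
        out = []
        for p, q in zip(pts, pts[1:] + pts[:1]):
            if inside(q):
                if not inside(p):
                    out.append(intersect(p, q))
                out.append(q)
            elif inside(p):
                out.append(intersect(p, q))
        return out

    def at_x(x):
        return lambda p, q: (x, p[1] + (q[1] - p[1]) * (x - p[0]) / (q[0] - p[0]))

    def at_y(y):
        return lambda p, q: (p[0] + (q[0] - p[0]) * (y - p[1]) / (q[1] - p[1]), y)

    edges = (
        (lambda p: p[0] >= xmin, at_x(xmin)),
        (lambda p: p[0] <= xmax, at_x(xmax)),
        (lambda p: p[1] >= ymin, at_y(ymin)),
        (lambda p: p[1] <= ymax, at_y(ymax)),
    )
    for inside, intersect in edges:
        if not poly:
            break
        poly = clip_edge(poly, inside, intersect)
    return poly


def percent_building(buildings, x0, y0, size=100.0):
    """Percentage of the cell with lower-left corner (x0, y0) that is built on."""
    covered = sum(area(clip(b, x0, y0, x0 + size, y0 + size)) for b in buildings)
    return 100.0 * covered / (size * size)


# Two hypothetical footprints; the second straddles the cell edge at x = 100.
buildings = [
    [(10, 10), (60, 10), (60, 50), (10, 50)],
    [(80, 20), (120, 20), (120, 70), (80, 70)],
]
pct_a = percent_building(buildings, 0, 0)
pct_b = percent_building(buildings, 100, 0)
```

Running the same function over every cell of the sampling grid gives the higher/lower % Building surface shown on the slide.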
1. All lines pulled
from NLS historic map
sheet. No intelligence
about what each line
represents.
Spaghetti!
2. Form groups of hatch
lines.
Criteria for group
membership are: spatial
proximity; direction
(azimuth);
lines are spatially
disjoint; lines are parallel
to one another.
3. Final set of line
groups. These
correspond to building
footprint. Other lines
from the historic map did
not meet group
membership criteria and
thus make no further
contribution to analysis.
4. Derive a pseudo
building polygon for
each group.
Could place an MBR
around them but
instead...
5. … form a Convex
Hull around the lines to
provide a polygon for
this group. For the
historic maps this is the
equivalent of the
building footprint
provided by the OS
MasterMap data.
6. Repeat the % Building
analysis for the complete
set of convex hull
polygons formed from all
groups of hatch lines.
From hatch lines to buildings
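Steps 2 to 5 can be sketched as follows. This is a simplified illustration: it groups lines only on midpoint proximity and similar azimuth (the disjointness check is left out), then wraps each group in a convex hull built with Andrew's monotone chain. With Shapely, the `convex_hull` property of a geometry would do the hull step. Thresholds, the minimum group size of 3 and the sample lines are assumptions.

```python
from math import atan2, degrees, hypot


def azimuth(line):
    """Direction of a segment in degrees, folded into [0, 180)."""
    (x1, y1), (x2, y2) = line
    return degrees(atan2(y2 - y1, x2 - x1)) % 180


def midpoint(line):
    (x1, y1), (x2, y2) = line
    return ((x1 + x2) / 2, (y1 + y2) / 2)


def similar(a, b, max_dist, max_angle):
    """True if two lines are roughly parallel and close together."""
    diff = abs(azimuth(a) - azimuth(b))
    diff = min(diff, 180 - diff)
    (ax, ay), (bx, by) = midpoint(a), midpoint(b)
    return diff <= max_angle and hypot(ax - bx, ay - by) <= max_dist


def group_lines(lines, max_dist=3.0, max_angle=5.0):
    """Merge each line into every existing group it is similar to."""
    groups = []
    for line in lines:
        hits, rest = [], []
        for g in groups:
            match = any(similar(line, m, max_dist, max_angle) for m in g)
            (hits if match else rest).append(g)
        groups = rest + [[line] + [m for g in hits for m in g]]
    return groups


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    def half(seq):
        out = []
        for p in seq:
            while len(out) >= 2 and cross(out[-2], out[-1], p) <= 0:
                out.pop()
            out.append(p)
        return out

    lower, upper = half(pts), half(reversed(pts))
    return lower[:-1] + upper[:-1]


# Three parallel hatch lines plus two unrelated lines (hypothetical values).
lines = [
    ((0, 0), (4, 4)), ((2, 0), (6, 4)), ((4, 0), (8, 4)),
    ((50, 50), (50, 60)),
    ((5, 0), (5, 4)),
]
groups = group_lines(lines)
hatch_groups = [g for g in groups if len(g) >= 3]
hulls = [convex_hull([pt for line in g for pt in line]) for g in hatch_groups]
```

The resulting hull polygons can then be fed into the same % Building calculation as the modern footprints.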
End product would be a grid describing % building (built-up) across each 100m x
100m standard grid square in ca1900. Data could be aggregated upwards e.g. to
produce a 1km x 1km grid. Using the same sampling grid could compute the same
measure for modern data (I've used OS MasterMap but other OS OpenData could
be used). Could then calculate + / - change between ca1900 and today / other
time periods for which historic maps are available.
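As a rough illustration of these output products, the sketch below stores each grid as a dictionary keyed by (column, row) cell index, computes the + / - change between two dates, and aggregates 100m cells upwards by averaging. The data structure, the function names and the choice to treat missing cells as 0 % building are all assumptions.

```python
def change(then, now):
    """Per-cell difference in % building (now minus then); missing cells = 0."""
    cells = sorted(set(then) | set(now))
    return {c: now.get(c, 0.0) - then.get(c, 0.0) for c in cells}


def aggregate(grid, factor=10):
    """Average fine cells into coarser ones, e.g. 100m -> 1km with factor=10."""
    sums = {}
    for (col, row), pct in grid.items():
        key = (col // factor, row // factor)
        sums[key] = sums.get(key, 0.0) + pct
    # Divide by the full number of fine cells so absent cells count as 0 %.
    return {key: total / (factor * factor) for key, total in sums.items()}


# Hypothetical % building values for three 100m cells.
grid_1900 = {(0, 0): 10.0, (1, 0): 30.0}
grid_today = {(0, 0): 50.0, (1, 0): 20.0, (2, 0): 5.0}

delta = change(grid_1900, grid_today)
coarse_today = aggregate(grid_today, factor=2)
```

Because both dates share the same sampling grid, the change calculation is a simple cell-by-cell subtraction.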
Output data products
Process repeated for
whole of Edinburgh
using all 19 NLS map
sheets – urban form
of Edinburgh ca1900.
Scaling up
Same 100m x 100m
grid across Edinburgh
as a whole in ca1900

