PRESENTATION: Vectorising analog seismograms by techniques of machine learning for automated discriminating of seismic signal traces
PRESENTER: Polina Lemenkova
DATE: 10.XII.2021
PLACE: Université Libre de Bruxelles, École polytechnique de Bruxelles (Brussels Faculty of Engineering), Laboratory of Image Synthesis and Analysis (LISA).
SUPERVISORS: Prof. Dr. Olivier DEBEIR (ULB) and Dr. Thomas LECOCQ (Royal Observatory of Belgium, Department of Seismology and Gravimetry, co-supervisor)
1
Part 1.
Introduction.
Key Facts on Seismograms.
Project Objectives and Goals.
Data and Instruments.
Multi-Disciplinary Approaches.
2
• Study object => old historical seismograms, scanned in TIFF format, from the archives of the Royal Observatory of Belgium (ROB), Department of Seismology & Gravimetry.
• Study area: Uccle station (see map).
• Study problem => to digitise the large archive of old paper-based seismograms from ROB quickly, accurately and automatically.
3
Research Object and Problem
Data and Instrument
Various types of seismometers are used in geophysics. In this study we used archived seismograms recorded in 1954 by the Galitzine seismometer at the Uccle station.
The dataset currently includes 145 images from 1 January 1954 to 12 March 1954.
The period will be gradually extended to cover the last 70 years as further seismograms are scanned.
The current images are monochrome (B/W); other images might be scanned in colour.
Some of the images are well preserved; others show distortions and defects visible on the aged paper.
4
Instrument used for data capture in 1954:
Horizontal Galitzine seismometer located in UCC.
Image source: courtesy of ROB. Photo: Raphaël S. M. De Plaen
Research Questions
5
6
PROBLEMS ARE CAUSED BY TECHNIQUES OF OLD SEISMOGRAM RECORDING + TIME (SPOTS, BLURS, BROKEN PAPER, ETC.)
Examples of the raw data: paper-based seismograms
Empty records between the lines of seismic traces with enlarged
fragment of seismogram. Here: UCC19540106Gal_N_0811.TIFF
Partially spotted image caused by storage, with enlarged fragment
of seismogram. Here: UCC19540107Gal_N_0815.TIFF
Continuous noisy dark background with blurred traces => lack of contrast for image recognition. Here: UCC19540108Gal_N_0815.TIFF
Overlapped traces => problems for recognition of trace direction
during vectorising. Here: UCC19540112Gal_E_0750.TIFF
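The file names above follow a regular pattern, so their parts can be extracted before batch processing. The sketch below is only an illustration: the slides confirm the date part (UCC19540109Gal_E_0812.tif is labelled January 9, 1954), while reading the remaining fields as station code, instrument, component and a four-digit time is an assumption, and the function name is hypothetical.

```python
import re
from datetime import datetime

# Assumed layout: station, date (YYYYMMDD), instrument, component, four digits.
# Only the date field is confirmed by the slides; the other names are guesses.
FILENAME_RE = re.compile(
    r"^(?P<station>[A-Z]{3})(?P<date>\d{8})(?P<instrument>[A-Za-z]+)"
    r"_(?P<component>[A-Z])_(?P<hhmm>\d{4})\.(?P<ext>tiff?|mat)$",
    re.IGNORECASE,
)


def parse_seismogram_name(name):
    """Split a seismogram file name into its (assumed) fields."""
    match = FILENAME_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected file name: {name}")
    fields = match.groupdict()
    # Convert the eight date digits into a real date object.
    fields["date"] = datetime.strptime(fields["date"], "%Y%m%d").date()
    return fields


info = parse_seismogram_name("UCC19540106Gal_N_0811.TIFF")
print(info["station"], info["date"], info["instrument"], info["component"], info["hhmm"])
```

Raising an error on unexpected names keeps files with a different naming scheme from being silently mislabelled.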
• Manual digitising cannot provide accurate and rapid data processing for building a large digitised dataset of archived seismograms
• Seismic data in such volumes cannot be processed manually and require automation and programming approaches
• We need to process the large archives of seismic data from ROB efficiently and quickly, yet accurately and precisely
• We need to analyse data with minimal human labour to derive information on earthquakes and ground motion
7
Relevance, Importance and Research Tasks
Example of the digitised seismograms using DigitSeis
So far there are no integrated studies of digitising large volumes of seismograms by ML methods. Only a few software tools exist (e.g. DigitSeis, SKATE, Teseo).
8
Interdisciplinary Nature of Project
• Complexity of geophysical data processing
requires integrated multi-disciplinary
approaches
• Applying ML to digitising seismograms brings new possibilities and benefits in seismology.
• Opportunities of ML =>> accurate and rapid digitising of the scanned images, rapid processing of historical seismograms, improved techniques of automated signal recognition and data interpretation.
• We need to handle seismic data by ML techniques and advanced software.
• Therefore, our project presents a multi-disciplinary approach of ML applications to seismic data processing
9
Project Motivation, Strengths and Challenges
Old scanned raster seismogram (TIFF file)
Fragment of the vectorised output (DigitSeis)
10
Various Approaches in One Study: Overlapping Disciplines
Our project presents interdisciplinary research combining overlapping scientific clusters and engineering disciplines (image processing, geophysics, ML and data science).
The multi-disciplinary project integrates 3 major scientific clusters, with several disciplines as sub-sections, for vectorising seismograms:
1. Image Processing, Pattern Recognition,
Computer Science, Programming, ML
2. Earth Observation data (ROB, Uccle archive),
Geophysics and Seismology, Geology,
Earthquake Engineering
3. Data Science, Data Analysis, Signal Processing
Algorithms of Digitising & Vectorising
11
Goals and Objectives of my PhD Project
Activities Towards Achieving Project Goals
12
Part 2.
Application of DigitSeis Software for
Vectorising Seismograms.
13
14
Workflow of DigitSeis Software for Vectorising
15
Examples of marking time gaps (minutes/hours) on seismograms in Cytomine
Seismogram processed by DigitSeis
Examples of the annotation classes on the raw data: scanned analog seismograms from the Uccle station.
Fragment of the resulting digitised output
(enlarged) showing seismic traces (horizontal
curve lines) and 1-minute time gaps (small
vertical dashed lines)
16
Examples of the identified time gaps on the raw TIFF images
• Enlarged fragment of image
• Time gaps indicating minutes breaking the trace line
• Zoomed segment showing the tiny white gaps that separate the trace segments from each other
Original scanned seismogram
(UCC19540116Gal_E_0820.tif)
17
Identifying time gaps on seismogram using DigitSeis
• Identifying time marks on seismograms by measuring time gap between records. Here: UCC19540119Gal_N_0825.tif
Indicating time marks on seismograms as -22 and preparing image for classification
18
Identifying noise and annotations on seismogram using DigitSeis
• Classified seismogram showing the identified object categories.
• Traces are white vector lines, while noise appears as red-coloured objects, automatically recognised (here: handwritten annotations)
Small region analysis used for defining a smaller area of interest for closer examination of a border region of the seismogram
19
Digitised segments of the trace lines in DigitSeis
• Classified image showing yellow segments of the identified trace (enlarged fragment).
• Here: example of the file UCC19540109Gal_E_0812.tif
• Classified seismogram with traces saved in binary format 0-1.
• Here: example of the file UCC19540109Gal_E_0812.tif (January 9, 1954).
20
Digitised traces after classification in DigitSeis
• Some time gaps (upper left part of the image) were not recognised automatically between the trace and the dark background.
• In these cases, gaps required manual correction to identify time intervals.
• Enlarged view of the automatically recognised digitised traces, displayed by lines of various colours.
• Zero-lines for each trace are visualised as cyan-coloured dashed lines, numbered from top to bottom.
• Vertical yellow dashes are time gaps
21
Identified traces for selective correction and re-digitising using Correct Trace mode
• Identified wrong vector direction of a line crossing individual traces
• Detected misclassifications, which caused erroneous digitising.
• The gaps on the zero-lines (small yellow boxes) show gaps that already existed on the old paper in the original image.
22
Identified traces for selective correction and re-digitising using Correct Trace mode
• Overlap of trace lines unrecognised during digitising: one segment of a trace went steeply downwards and merged with another trace
• Enlarged view of the manually corrected entangled traces. Correcting misclassified traces with wrong direction based on the colour and geometric characteristics of pixels.
23
Identified traces for selective correction and re-digitising using Correct Trace mode
• Merging the trace initially broken into three separate parts (three small yellow boxes)
• Reclassification of the selected segment and digitising the centroid of the trace line (purple-coloured).
Correcting trace for the selected segments
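Digitising the centroid of a trace line can be illustrated on a small binary (0-1) mask like the one mentioned for the classified seismograms. The sketch below is a simplified stand-in, not the DigitSeis algorithm: it assumes the mask holds a single isolated trace, and the function name is hypothetical.

```python
def trace_centroids(mask):
    """Per column, return the mean row index of pixels equal to 1 (None if empty)."""
    n_cols = len(mask[0])
    centroids = []
    for col in range(n_cols):
        # Rows in this column that belong to the trace.
        rows = [r for r, line in enumerate(mask) if line[col] == 1]
        centroids.append(sum(rows) / len(rows) if rows else None)
    return centroids


# Toy 5 x 4 mask: 1 = trace pixel, 0 = background; column 2 is a gap.
mask = [
    [0, 0, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 0, 0, 0],
]
centerline = trace_centroids(mask)
print(centerline)  # [2.0, 1.5, None, 2.5]
```

Columns without trace pixels return None, so gaps such as minute marks stay visible in the output instead of being bridged.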
24
Seismogram image with adjusted timing. Here: UCC19540311Gal_E_0727.mat
• Timing setup using the time display increment
• Small yellow vertical dashed lines: minute marks
• Time markers at 1-minute intervals on each 30-minute trace.
25
Example of the digitised image with minute time gaps
Here: fragment of UCC19540311Gal_E_0727.mat
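Once minute marks are located, any horizontal pixel position on a trace can be converted to a time by interpolating between neighbouring marks. The sketch below shows this under stated assumptions: the mark positions are invented, mark i is taken to fall i minutes after the trace start, and the start time itself is hypothetical.

```python
from bisect import bisect_right
from datetime import datetime, timedelta


def pixel_to_time(x, minute_marks, trace_start):
    """Linearly interpolate the time at pixel column x between minute marks.

    minute_marks: sorted x positions; mark i is assumed to fall i minutes
    after trace_start.
    """
    if not minute_marks[0] <= x <= minute_marks[-1]:
        raise ValueError("x lies outside the marked part of the trace")
    # Index of the mark to the left of x (clamped so a right neighbour exists).
    i = min(bisect_right(minute_marks, x) - 1, len(minute_marks) - 2)
    left, right = minute_marks[i], minute_marks[i + 1]
    minutes = i + (x - left) / (right - left)
    return trace_start + timedelta(minutes=minutes)


marks = [100, 160, 220, 281]           # invented pixel positions of minute marks
start = datetime(1954, 3, 11, 7, 27)   # hypothetical start time of the trace
print(pixel_to_time(190, marks, start))  # 1954-03-11 07:28:30
```

Interpolating between adjacent marks, rather than assuming a constant pixel-per-minute rate, tolerates small variations in drum speed or paper distortion.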
26
Validating Results of MATLAB File in Python: Post-Processing
Checking digitising results using Python (Matplotlib library).
Blue dots show the starting positions of the hour segments.
Green dots show the minute marks.
Red dots show the noise and edge dots.
Correctly identified time gaps checked with Python’s Matplotlib
Quality control for time gaps: missed marks in unrecognised segments.
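A simple quality check for missed minute marks compares each spacing between detected marks with the typical spacing. The sketch below is an assumption about how such a check could look, not the script used in the project; the positions are invented and the function name is hypothetical.

```python
from statistics import median


def find_missed_marks(marks):
    """Return (left, right, n_missing) where a spacing spans several intervals."""
    spacings = [b - a for a, b in zip(marks, marks[1:])]
    typical = median(spacings)
    missed = []
    for left, right, spacing in zip(marks, marks[1:], spacings):
        # How many typical intervals fit into this spacing.
        n_intervals = round(spacing / typical)
        if n_intervals > 1:
            missed.append((left, right, n_intervals - 1))
    return missed


marks = [100, 160, 221, 342, 401, 460]  # invented x positions of minute marks
print(find_missed_marks(marks))  # [(221, 342, 1)]
```

Using the median as the reference spacing keeps the check robust when a few marks are missing.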
27
Statistical Plotting of Data Frequency in .mat File processed by Python
28
Research Approach of DigitSeis:
Major Steps of Seismic Data Processing
• Machine Learning (ML) and Deep Learning (DL) in vectorising analog seismograms
• ML & DL: automatic and intelligent data analysis: detecting trace lines using threshold parameters
• Image processing: segmentation, classification of seismograms (separating lines from noise)
• Data visualisation and plotting
• Data analysis and interpretation
• Advanced methods => solve the problem of efficient processing of large volumes of old scanned files (TIFFs) for geophysical modelling and data interpretation in seismology research
• Developing new advanced ML algorithms to digitise seismograms and convert them to vector format automatically
29
Methodology of Project
Part 3.
Using Cytomine Workspace for
Storing, Viewing and Analysing Data.
30
31
Why Use Cytomine for Processing the Seismostorm Project?
32
Cytomine for data storage, sharing and analysis
View of the Seismostorm project and file browsing system
Content of files in the Seismostorm project in Cytomine
• The workspace containing the seismic dataset is shared by users (collaborators of Seismostorm)
• Navigating in Cytomine =>> paths and hierarchical structure of the project
Cytomine is an image analysis workspace to contain, organise, visualise, annotate and analyse images.
• Data were placed in the Cytomine environment, developed by the ULiège team.
• We uploaded our TIFF images into our project.
• Originally designed as a tool for biomedical image processing, Cytomine is adopted in this study for geophysical data processing using seismograms.
• The dataset contains 145 files recorded in 1954 by the Galitzine seismometer.
33
Creating ontologies in Cytomine for objects recognition
View of the Seismostorm project and file browsing system
Hour ticks, minute ticks and various categories detected as object classes on the images
Examples of the detected and annotated object classes on the scanned seismograms
• Ontologies generated in the Seismostorm project in Cytomine make it possible to classify shapes for automated recognition
• Segments, start hour ticks and flares detected as object classes on the scanned images
34
Examples of detecting cases in seismograms in Cytomine
Hour ticks on the seismograms recorded by the
seismometer drum
Examples of the detected and annotated object classes on the scanned seismograms
Segments separated as fragments on
the trace lines
35
Examples of marking time gaps (minutes/hours) on seismograms in Cytomine
Manual ticks for the start hours on the
partially spotted image
Examples of the annotation classes on the raw data: scanned analog seismograms from the Uccle station.
Manual hour marks for handwritten
annotations on the old scanned image
36
Examples of marking time gaps (minutes/hours) on seismograms in Cytomine
Flares detected on the old scanned
raster images of the analog
seismograms
Examples of the annotation classes on the raw data: scanned analog seismograms from the Uccle station.
Minute marks detected, recognised and classified using ‘ontologies’ of Cytomine on the TIFF files
Part 4.
Using Python for Automatic Data
Processing
37
Why Python in Vectorising Seismograms?
38
ML for Vectorising Seismograms: a Workflow in Python
39
The workflow for digitising seismograms in Python includes several steps:
• Defining Region of Interest (ROI)
• Selecting threshold parameters (radius of pixels, percentage of contrast)
• Sampling several approaches with varied parameters
• Processing the full ROI after testing parameters and selecting the optimal ones (e.g. pixel size 30, radius 85%)
• Vectorising (executing Python script)
• Exporting the results to the HDFS format
40
Python-based digitising of raster image (1)
Automated vectorising of seismograms was performed in several steps.
First, a Python script fetched the low-resolution images from Cytomine for use in the workflow.
Workflow for vectorising in Python, Matplotlib library (slide 1/10)
Enlarged fragment of the vectorised
segments of the trace lines
41
Workflow for vectorising in Python, Matplotlib library (slide 2/10)
Python-based digitising of raster image (2)
Second, the hour gaps were detected from the repetition of gaps (double gaps located close to the first minute of the hour).
Above: view of the seismogram with indicated hour gaps.
Right: enlarged fragment.
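The double-gap rule for hour marks can be sketched as a search for two gaps that lie much closer together than the usual minute spacing. This is a minimal illustration under assumptions: gap positions are invented, the `ratio` cut-off is a hypothetical parameter, and the project's own script may use a different criterion.

```python
from statistics import median


def find_double_gaps(gaps, ratio=0.5):
    """Return pairs of neighbouring gaps closer than ratio * typical spacing."""
    spacings = [b - a for a, b in zip(gaps, gaps[1:])]
    typical = median(spacings)
    return [
        (a, b)
        for a, b, spacing in zip(gaps, gaps[1:], spacings)
        if spacing < ratio * typical
    ]


gaps = [40, 100, 160, 175, 235, 295]  # invented x positions of detected gaps
print(find_double_gaps(gaps))  # [(160, 175)]
```

Each returned pair marks a candidate hour gap; the remaining gaps are treated as ordinary minute marks.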
42
Third, the lines with double vectorisation (overlapping time periods) were processed.
Workflow for vectorising in Python, Matplotlib library (slide 3/10)
Python-based digitising of raster image (3)
43
Workflow for vectorising in Python, Matplotlib library (slide 4/10)
Python-based digitising of raster image (4)
Left: Example of the digitised traces in Python.
Above: Example of a misclassified line, which was vectorised several times as belonging to neighbouring hour segments (e.g. hour 1 and hour 2).
44
Region of Interest: Automatic Detection (slide 1/2)
ROI detection was performed by setting a threshold for contrasting pixels on the images. As a result, the mask included only the ROI between the red dashed lines (upper left image). The histograms show the values of pixels excluded from the ROI (those above the red line on the graphs).
Images can be processed in Python in both horizontal and vertical orientation (image on the right).
Workflow for vectorising in Python, Matplotlib library (slide 5/10)
45
Region of Interest: Automatic Detection (slide 2/2)
Defining ROI (between the red dashed lines) and enlarged fragment. Below: 2 histograms showing the distribution of pixels and those deleted (above the red dashed line). Right: enlarged fragment of the digitised seismogram.
Workflow for vectorising in Python, Matplotlib library (slide 6/10).
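Threshold-based ROI detection can be sketched on a tiny greyscale image stored as nested lists. The sketch assumes that columns whose mean intensity lies above the threshold are bright margins to be excluded, which is one reading of "those above the red line"; the threshold value and function name are hypothetical.

```python
def detect_roi(image, threshold):
    """Return (first, last) column indices whose mean intensity is <= threshold."""
    n_rows = len(image)
    n_cols = len(image[0])
    # Mean grey value of every column.
    col_means = [sum(row[c] for row in image) / n_rows for c in range(n_cols)]
    inside = [c for c, mean in enumerate(col_means) if mean <= threshold]
    if not inside:
        return None
    return inside[0], inside[-1]


# Toy image: bright margins (about 240-255) around a darker recorded area.
image = [
    [250, 240, 30, 60, 40, 245, 255],
    [255, 235, 50, 20, 35, 250, 250],
    [245, 245, 40, 40, 45, 240, 255],
]
print(detect_roi(image, threshold=200))  # (2, 4)
```

Applying the same function to the transposed image gives the row limits, which covers both horizontal and vertical orientation.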
46
Defining optimal parameters for the line thickness and radius of pixels (1)
The thickness of the trace lines was defined by a series of trial tests with varied parameters.
A radius of 30 pixels was found to be optimal for the given image (it may vary in other cases).
Above: image with tested line thickness from 17 to 34 and radius of 50.
Below: image with tested thickness of the trace line from 14 to 26 pixels (upper row) and 20 to 38 pixels (lower row) and radius of 40, 50 and 60 for each corresponding row (downwards).
The changed line thickness is visible in all trial cases (yellow-coloured horizontal lines).
Workflow for vectorising in Python, Matplotlib library (slide 7/10)
47
Continued testing of the parameters for a line with spots and a seismogram with blurred contrast of lines against the background
Workflow for vectorising in Python, Matplotlib library (slide 8/10)
Defining optimal parameters for the line thickness and radius of pixels (2)
48
Buffering minute intervals for the one-minute gaps completed for the whole seismogram
Workflow for vectorising in Python, Matplotlib library (slide 9/10)
Buffering parameters for the time gaps
Buffering of missing data: minute and hour gaps
49
Workflow for vectorising in Python, Matplotlib library (slide 10/10)
Seismogram vectorised by Python overlain on the original image and uploaded in Cytomine
Example of the vectorised trace segments (red lines) overlaid on the spotted image
Enlarged fragment with visible distinct traces
Enlarged fragment with visible time gaps
50
Summary of Project Milestones and Approaches
51
Conclusion: Research Connections and Structure
Data: The challenge of big data in seismic studies: massive volumes of historical seismograms from ROB exist and present a source of information. These old archive data must be processed, digitised and ‘revitalised’.
Methods: Our project focuses on developing automated methods of vectorising seismograms with minimised human interaction and maximised use of ML in trace vectorisation.
People: Human interaction is necessary for the whole workflow: archiving and processing data, organising the project, developing algorithms, executing scripts, visualising graphics, testing methods and trials, interpreting results.
52
Thank you for your attention!
Looking forward to your questions!

Vectorising analog seismograms by techniques of machine learning for automated discriminating of seismic signal traces

  • 1. Polina Lemenkova PRESENTATION PLACE DATE PRESENTER 10.XII.2021 Prof. Dr. Olivier DEBEIR (ULB) and Dr. Thomas LECOCQ (Royal Observatory of Belgium, Department of Seismology and Gravimetry, co-promoteur) Université Libre de Bruxelles, École polytechnique de Bruxelles (Brussels Faculty of Engineering), Laboratory of Image Synthesis and Analysis (LISA). 1 Vectorising analog seismograms by techniques of machine learning for automated discriminating of seismic signal traces SUPERVISORS
  • 2. Part 1. Introduction. Key Facts on Seismograms. Project Objectives and Goals. Data and Instruments. Multi-Disciplinary Approaches. 2
  • 3. • Study object => old historical scanned seismograms in TIFF format from the archives of Royal Observatory of Belgium (ROB), Department of Seismology & Gravimetry. • Study area: Uccle station (see map). • Study problem => to digitise large archive of the old paper-based seismograms from ROB quickly, accurately and automatically. 3 Research Object and Problem
  • 4. Data and Instrument There are various types of seismometers used in geophysics. In this study we used archived seismograms recorded in 1954 by the Galitzine seismometer in Uccle station. Currently dataset included a collection of 145 images from 1 January 1954 to 12 March 1954 The period will be gradually enlarged as soon as other seismograms are scanned to cover 70 last years. Now images are monochrome (B/W). Other images might be scanned in colour. Some of the images are well preserved, some have distortions and defects visible on the aged paper 4 Instrument used for data capture in 1954: Horizontal Galitzine seismometer located in UCC. Image source: courtesy of ROB. Photo: Raphaël S. M. De Plaen
  • 6. 6 PROBLEMS ARE CAUSED BY TECHNIQUES OF OLD SEISMOGRAM RECORDING + TIME (SPOTS, BLURS, BROKEN PAPER, ETC.) Examples of the raw data: paper-based seismograms Empty records between the lines of seismic traces with enlarged fragment of seismogram. Here: UCC19540106Gal_N_0811.TIFF Partially spotted image caused by storage, with enlarged fragment of seismogram. Here: UCC19540107Gal_N_0815.TIFF Continuous noise dark background with blurred traces => lack of contrast for image recognition. Here: UCC19540108Gal_N_0815.TIFF Overlapped traces => problems for recognition of trace direction during vectorising. Here: UCC19540112Gal_E_0750.TIFF
  • 7. •Manual digitising cannot provide accurate and rapid data processing for developing digitised big dataset of archived seismograms •Seismic data cannot be processed manually and require automatization and programming approaches •We need to process big archives of seismic data from ROB effectively and quickly but accurately and precisely •We need to analyse data with minimised human labour to derive information on earthquakes and ground motion 7 Actuality, Importance and Research Tasks Text Example of the digitised seismograms using DigitSeis So far there are no existing integrated studies of digitising seismograms in big data volumes by ML methods. Only selected software exist (e.g. DigitSeis, SKATE, Teseo)
  • 8. 8 Interdisciplinary Nature of Project • Complexity of geophysical data processing requires integrated multi-disciplinary approaches • Applying ML to digitising seismograms brings new possibilities and bene fi ts in seismology. • Opportunities of ML =>> accurate and rapid digitising of the scanned images, rapid processing of historical seismograms, improved techniques of automated recognition of signals and data interpreting. =>> • We need to handle seismic data by ML techniques and advanced software. • Therefore, our project presents a multi- disciplinary approach of ML applications to seismic data processing
  • 9. 9 Project Motivation, Strengths and Challenges Old scanned raster seismogram (TIFF fi le) Fragment of the vectorised output (DigitSeis)
  • 10. 10 Various Approaches in One Study: Overlapping Disciplines Our project presents an interdisciplinary research combining overlapping scienti fi c clusters and engineering disciplines (image processing, geophysics, ML and data science). A multi-disciplinary project integrates 3 major scienti fi c clusters and several disciplines as sub- sections for vectorising seismograms: 1. Image Processing, Pattern Recognition, Computer Science, Programming, ML 2. Earth Observation data (ROB, Uccle archive), Geophysics and Seismology, Geology, Earthquake Engineering 3. Data Science, Data Analysis, Signal Processing Algorithms of Digitising & Vectorising
  • 11. 11 Goals and Objectives of my PhD Project
  • 12. Activities Towards Achieving Project Goals 12
  • 13. Part 2. Application of DigitSeis Software for Vectorising Seismograms. 13
  • 14. 14 Work fl ow of DigitSeis Software for Vectorising
  • 15. 15 Examples of marking time gaps (minutes/hours) on seismograms in Cytomine Seismogram processed by DigitSeis Examples of the annotation classes on the raw data: scanned analog seismograms from the Uccle station. Fragment of the resulting digitised output (enlarged) showing seismic traces (horizontal curve lines) and 1-minute time gaps (small vertical dashed lines)
  • 16. 16 Examples of the identi fi ed time gaps on the raw TIFF images • Enlarged fragment of image • Time gaps indicating minutes breaking the trace line • Zoomed segment separating the trace line between each other (tiny white gaps breaking traces) Original scanned seismogram (UCC19540116Gal_E_0820.tif)
  • 17. 17 Identifying time gaps on seismogram using DigitSeis • Identifying time marks on seismograms by measuring time gap between records. Here: UCC19540119Gal_N_0825.tif Indicating time marks on seismograms as -22 and preparing image for classi fi cation
  • 18. 18 Identifying noise and annotations on seismogram using DigitSeis • Results of the classi fi ed seismogram with shown identi fi ed object categories. • Traces are vector white lines while noise is red-coloured objects, automatically recognised (here: handwritten annotations) Small region analysis used for de fi ning a smaller area of interest for closer examination of a border region of the seismogram
  • 19. 19 Digitised segments of the trace lines in DigitSeis • Results of the classi fi ed image with shown yellow segments of the identi fi ed trace (enlarged fragment). • Here: example of the fi le UCC19540109Gal_E_0812.tif • Classi fi ed seismogram with traces saved in binary format 0-1. • Here: example of fi le UCC19540109Gal_E_0812.tif (January 9, 1954.)
  • 20. Digitised traces after classification in DigitSeis • Some time gaps (upper left part of the image) were not identified or recognised automatically between the trace and the dark background. • In these cases, the gaps required manual correction to identify time intervals. • Enlarged view of the automatically recognised digitised traces, displayed as lines of various colours. • Zero-lines for each trace are visualised as cyan dashed lines, numbered from top to bottom. • Vertical yellow dashes are time gaps.
  • 21. Identified traces for selective correction and re-digitising using Correct Trace mode • Identified wrong vector direction of a line crossing individual traces • The detected misclassifications caused erroneous digitising. • The gaps on the zero-lines (small yellow boxes) correspond to gaps that already existed on the old paper in the original image.
  • 22. Identified traces for selective correction and re-digitising using Correct Trace mode • Overlap of trace lines unrecognised during digitising: one segment of a trace went steeply downwards and merged with another trace • Enlarged view of the manually corrected entangled traces. Correcting misclassified traces with wrong direction based on the colour and geometric characteristics of the pixels.
  • 23. Identified traces for selective correction and re-digitising using Correct Trace mode • Merging a trace initially broken into three separate parts (three small yellow boxes) • Reclassification of the selected segment and digitising the centroid of the trace line (purple). Correcting the trace for the selected segments
  • 24. Seismogram image with adjusted timing. Here: UCC19540311Gal_E_0727.mat • Timing setup using the time display increment • Small yellow vertical dashed lines: minute marks • Time markers at 1-minute intervals on each 30-minute trace.
  • 25. Example of the digitised image with minute time gaps. Here: fragment of UCC19540311Gal_E_0727.mat
  • 26. Validating Results of MATLAB File in Python: Post-Processing. Controlling digitising results using Python (Matplotlib library). Blue dots show the starting positions of the hour segments. Green dots show the minute marks. Red dots show the noise and edge dots. Correctly identified time gaps checked with Python’s Matplotlib. Quality control for time gaps: missed marks in unrecognised segments.
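The quality-control idea on this slide (finding minute marks that were missed) can be illustrated with a minimal sketch. It assumes the minute-mark positions have already been read from the .mat file into a plain list of x-coordinates; the function name `find_missed_marks` and the sample positions are hypothetical and are not taken from the project scripts.

```python
from statistics import median


def find_missed_marks(mark_positions):
    """Return (left, right, n_missing) for spans where minute marks seem to be missing.

    Assumption: marks are roughly evenly spaced, so a spacing close to two or
    more times the typical (median) spacing means marks were skipped.
    """
    positions = sorted(mark_positions)
    spacings = [b - a for a, b in zip(positions, positions[1:])]
    if not spacings:
        return []
    typical = median(spacings)
    missed = []
    for left, right in zip(positions, positions[1:]):
        n_intervals = round((right - left) / typical)
        if n_intervals >= 2:
            missed.append((left, right, n_intervals - 1))
    return missed


# Hypothetical x-positions (in pixels) of detected minute marks on one trace.
marks = [100, 160, 220, 340, 400, 460, 640]
print(find_missed_marks(marks))  # [(220, 340, 1), (460, 640, 2)]
```

The flagged spans could then be plotted over the trace, for example in a separate colour, to guide manual correction.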
  • 27. Statistical Plotting of Data Frequency in .mat File Processed by Python
  • 28. Research Approach of DigitSeis: Major Steps of Seismic Data Processing
  • 29. Methodology of Project • Machine Learning (ML) and Deep Learning (DL) in vectorising analog seismograms • ML & DL: automatic and intelligent data analysis, detecting trace lines using threshold parameters • Image processing: segmentation and classification of seismograms (separating lines from noise) • Data visualisation and plotting • Data analysis and interpretation • Advanced methods => solve the problem of efficient processing of large volumes of old scanned files (TIFFs) for geophysical modelling and data interpretation in seismology research • Developing new advanced ML algorithms to digitise seismograms and convert them to vector format automatically
  • 30. Part 3. Using Cytomine Workspace for Storing, Viewing and Analysing Data.
  • 31. Why Use Cytomine for Processing the Seismostorm Project?
  • 32. Cytomine for data storage, sharing and analysis. View of the Seismostorm project and file browsing system. Content of files in the Seismostorm project in Cytomine • The workspace containing the seismic dataset is shared by users (collaborators of Seismostorm) • Navigating in Cytomine => paths and hierarchical structure of the project. Cytomine is an image analysis workspace to contain, organise, visualise, annotate and analyse images. • Data were placed in the Cytomine environment, developed by the ULiège team. • We uploaded our TIFF images into our project. • Originally designed as a tool for biomedical image processing, Cytomine is adopted in this study for geophysical data processing using seismograms. • The dataset contains 145 files recorded in 1954 by the Galitzine seismometer.
  • 33. Creating ontologies in Cytomine for object recognition. View of the Seismostorm project and file browsing system. Hour ticks, minute ticks and various categories detected as object classes on the images. Examples of the detected and annotated object classes on the scanned seismograms • Ontologies generated in the Seismostorm project in Cytomine make it possible to classify shapes for automated recognition • Segments, start hour ticks and flares detected as object classes on the scanned images
  • 34. Examples of detecting cases in seismograms in Cytomine. Hour ticks on the seismograms recorded by the seismometer drum. Examples of the detected and annotated object classes on the scanned seismograms. Segments separated as fragments on the trace lines
  • 35. Examples of marking time gaps (minutes/hours) on seismograms in Cytomine. Manual ticks for the start hours on the partially spotted image. Examples of the annotation classes on the raw data: scanned analog seismograms from the Uccle station. Manual hour marks for handwritten annotations on the old scanned image
  • 36. Examples of marking time gaps (minutes/hours) on seismograms in Cytomine. Flares detected on the old scanned raster images of the analog seismograms. Examples of the annotation classes on the raw data: scanned analog seismograms from the Uccle station. Minute marks detected, recognised and classified using Cytomine ‘ontologies’ on the TIFF files
  • 37. Part 4. Using Python for Automatic Data Processing
  • 38. Why Python in Vectorising Seismograms?
  • 39. ML for Vectorising Seismograms: a Workflow in Python. The workflow for digitising seismograms in Python includes several steps: • Defining the Region of Interest (ROI) • Selecting threshold parameters (radius of pixels, percentage of contrast) • Testing several approaches with varied parameters • Processing the full ROI after testing and selecting the optimal parameters (e.g. pixel size 30, radius 85%) • Vectorising (executing the Python script) • Exporting the results to the HDFS format
  • 40. Python-based digitising of raster image (1). Automated vectorising of seismograms was performed in several steps. First, the low-resolution images were fetched from Cytomine by the Python script and used in the script. Workflow for vectorising in Python, Matplotlib library (slide 1/10). Enlarged fragment of the vectorised segments of the trace lines
  • 41. Python-based digitising of raster image (2). Second, the hour gaps were detected from their repeating pattern (double gaps located close to the first minute of the hour). Above: view of the seismogram with indicated hour gaps. Right: enlarged fragment. Workflow for vectorising in Python, Matplotlib library (slide 2/10)
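The double-gap rule for hour marks can be sketched as follows. This is only an illustration under the assumption that gap positions along one trace are already available as a list of x-coordinates; the function `find_hour_marks`, the `max_pair_distance` parameter and the sample values are hypothetical.

```python
def find_hour_marks(gap_positions, max_pair_distance):
    """Return the position of the first gap in each closely spaced pair.

    Assumption: an hour is marked by two gaps lying much closer together
    than the ordinary one-minute spacing.
    """
    positions = sorted(gap_positions)
    hour_marks = []
    i = 0
    while i < len(positions) - 1:
        if positions[i + 1] - positions[i] <= max_pair_distance:
            hour_marks.append(positions[i])
            i += 2  # skip the partner gap so it is not paired twice
        else:
            i += 1
    return hour_marks


# Hypothetical gap positions (pixels): pairs at 10/22 and 250/262 mark hours.
gaps = [10, 22, 70, 130, 190, 250, 262, 310]
print(find_hour_marks(gaps, max_pair_distance=15))  # [10, 250]
```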
  • 42. Python-based digitising of raster image (3). Third, the lines with double vectorisation (overlapping time periods) were processed. Workflow for vectorising in Python, Matplotlib library (slide 3/10)
  • 43. Python-based digitising of raster image (4). Left: example of the digitised traces in Python. Above: example of a misclassified line, which was vectorised several times as belonging to neighbouring hour segments (e.g. hour 1 and hour 2). Workflow for vectorising in Python, Matplotlib library (slide 4/10)
  • 44. Region of Interest: Automatic Detection (slide 1/2). ROI detection was performed by setting a threshold for contrasting pixels on the images. As a result, the mask included only the ROI between the red dashed lines (upper left image). The histograms show the values of the pixels excluded from the ROI (those above the red line on the graphs). Images can be processed in Python in both horizontal and vertical orientation (image on the right). Workflow for vectorising in Python, Matplotlib library (slide 5/10)
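A threshold-based ROI mask of this kind might look like the sketch below. It assumes a greyscale image given as a list of pixel rows, with bright margins above and below a darker recording area; the function `detect_roi_rows`, the threshold of 128 and the tiny sample image are illustrative assumptions, not the project's actual settings.

```python
def detect_roi_rows(image, threshold):
    """Return (first_row, last_row) of the band whose mean brightness is <= threshold.

    Rows with a mean value above the threshold (assumed here to be bright
    margins) are excluded from the region of interest.
    """
    row_means = [sum(row) / len(row) for row in image]
    kept = [i for i, mean in enumerate(row_means) if mean <= threshold]
    if not kept:
        return None
    return kept[0], kept[-1]


# Hypothetical 5x4 greyscale image: bright margins around darker trace rows.
image = [
    [250, 250, 250, 250],  # bright margin, excluded
    [20, 30, 200, 25],
    [15, 220, 30, 20],
    [25, 20, 35, 210],
    [245, 250, 255, 250],  # bright margin, excluded
]
print(detect_roi_rows(image, threshold=128))  # (1, 3)
```

For an image in vertical orientation, the same function can be applied to the transposed image, for example `list(zip(*image))`.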
  • 45. Region of Interest: Automatic Detection (slide 2/2). Defining the ROI (between the red dashed lines) and enlarged fragment. Below: two histograms showing the distribution of pixels and those deleted (above the red dashed line). Right: enlarged fragment of the digitised seismogram. Workflow for vectorising in Python, Matplotlib library (slide 6/10).
  • 46. Defining optimal parameters for the line thickness and radius of pixels (1). The thickness of the trace lines was defined by a series of trial tests with varied parameters. A radius of 30 pixels was found to be optimal for the given image (it may vary in other cases). Above: image with tested line thickness from 17 to 34 and radius of 50. Below: image with tested thickness of the trace line from 14 to 26 pixels (upper row) and 20 to 38 pixels (lower row) and radius of 40, 50 and 60 for each corresponding row (downwards). The changed thickness of the line is visible in all trial cases (yellow horizontal lines). Workflow for vectorising in Python, Matplotlib library (slide 7/10)
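A series of trial tests like this can be organised as a simple parameter sweep. The sketch below is generic: `sweep_parameters` and `toy_score` are hypothetical names, and the toy scoring function is only a stand-in for a real quality measure (for instance, agreement with a manually digitised fragment). Its peak is placed at radius 30 purely because the slide reports that value as optimal for its image.

```python
from itertools import product


def sweep_parameters(evaluate, thicknesses, radii):
    """Score every (thickness, radius) pair and return the best one plus all trials."""
    trials = [(evaluate(t, r), t, r) for t, r in product(thicknesses, radii)]
    best_score, best_thickness, best_radius = max(trials)
    best = {"thickness": best_thickness, "radius": best_radius, "score": best_score}
    return best, trials


def toy_score(thickness, radius):
    # Hypothetical stand-in for a real quality measure; higher is better.
    return -abs(thickness - 20) - abs(radius - 30)


best, trials = sweep_parameters(toy_score, thicknesses=[14, 20, 26], radii=[30, 40, 50])
print(best)         # {'thickness': 20, 'radius': 30, 'score': 0}
print(len(trials))  # 9
```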
  • 47. Defining optimal parameters for the line thickness and radius of pixels (2). Continued testing of the parameters for a line with spots and a seismogram with blurred contrast of lines against the background. Workflow for vectorising in Python, Matplotlib library (slide 8/10)
  • 48. Buffering parameters for the time gaps. Buffering minute intervals for the one-minute gaps completed for the whole seismogram. Buffering of missing data: minute and hour gaps. Workflow for vectorising in Python, Matplotlib library (slide 9/10)
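The slide does not spell out how the buffering is computed. One plausible reading, shown here purely as an assumption, is that each detected gap position is widened into a small interval and overlapping intervals (such as the double gaps at hour marks) are merged. The function `buffer_gaps` and the `half_width` value are hypothetical.

```python
def buffer_gaps(gap_positions, half_width):
    """Widen each gap position into an interval and merge overlapping intervals."""
    intervals = sorted((p - half_width, p + half_width) for p in gap_positions)
    merged = []
    for start, end in intervals:
        if merged and start <= merged[-1][1]:
            # Overlaps the previous buffer: extend it instead of adding a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Hypothetical gap positions (pixels); 160 and 166 lie close enough to merge.
print(buffer_gaps([100, 160, 166, 220], half_width=4))
# [(96, 104), (156, 170), (216, 224)]
```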
  • 49. Seismogram vectorised by Python, overlaid on the original image and uploaded to Cytomine. Example of the vectorised trace segments (red lines) overlaid on the spotted image. Enlarged fragment with visible distinct traces. Enlarged fragment with visible time gaps. Workflow for vectorising in Python, Matplotlib library (slide 10/10)
  • 50. Summary of Project Milestones and Approaches
  • 51. Conclusion: Research Connections and Structure. Data: Challenge of big data in seismic studies: massive volumes of historical seismograms from ROB exist and present a source of information. These old archive data must be processed, digitised and ‘revitalised’. Methods: Our project focuses on developing automated methods of vectorising seismograms with minimal human interaction and maximal use of ML in trace vectorisation. People: Human interaction is necessary for the whole workflow: archiving and processing data, organising the project, developing algorithms, executing scripts, visualising graphics, testing methods and trials, interpreting results
  • 52. Thank you for your attention! Looking forward to your questions!