Applied Data Analysis Lab – a profile 
Dr. Łukasz Bolikowski 
ICM, University of Warsaw 
December 2014
ADA Lab ⊂ ICM ⊂ UW
University of Warsaw (UW) is one of the top Polish higher education establishments. 
Interdisciplinary Centre for Mathematical and Computational Modelling (ICM) 
is a supercomputing and research data centre within the University of Warsaw. 
Applied Data Analysis Lab (ADA Lab) is a research group within the ICM.
ADA Lab’s Scope of Interest 
Scalable Text and Data Mining
Informatics for Open Science
Legal Text Mining 
Business Data Mining 
Training and Outreach
Scholarly PDF Mining 
Map of Science 
Persistent IDs 
Data Anonymization
Legal Text Mining 
Building a judgment analysis system for Poland. 
Integrating data from common courts, the 
Supreme Administrative Court, the Supreme 
Court, and the Constitutional Tribunal. 
Planning a larger European project with similar
goals (Horizon 2020; currently building a consortium
and defining scope).
Business Data Mining 
Leveraging high demand for data science skills. 
For-profit projects with business partners. 
Usually can’t discuss details due to NDAs. 
Our favourite toolset: 
R for data understanding and modelling 
Apache Spark for analysing larger data sets 
D3 for information visualization 
CRISP-DM for managing our projects 
(Cross-Industry Standard Process for Data Mining)
Training and Outreach 
“Web-Scale Data Mining and Processing” 
(Course at Polish Academy of Sciences) 
“Introduction to Text Mining” 
(Course at Warsaw School of Data Analysis organised by ICM) 
Internal training on Hadoop and Spark
Presentations at Big Data conferences 
(Target audience: business partners) 
Workshops and internships for talented youth 
(In collaboration with Polish Children’s Fund)
Scholarly PDF Mining 
Extracting metadata, bibliographic references, and full text
from scholarly PDFs. Research direction: semantic annotation
of paragraphs, sentences, phrases.
CERMINE is open-source software (AGPL license), with users
worldwide: OpenAIRE.eu, Paperity.org, Public Knowledge 
Project. 
Interfaces for humans and for machines (RESTful API). 
Try CERMINE at: http://cermine.ceon.pl/
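As a rough illustration of the machine interface, the sketch below builds an HTTP request that posts a PDF to the service. The endpoint path (`extract.do`) and the `application/binary` content type are assumptions based on CERMINE's public documentation and should be verified before use; the function names are hypothetical.

```python
import urllib.request

# Assumption: endpoint path and content type follow CERMINE's public README;
# the service may have moved or changed since this talk was given.
CERMINE_EXTRACT_URL = "http://cermine.ceon.pl/extract.do"


def build_extract_request(pdf_bytes: bytes,
                          url: str = CERMINE_EXTRACT_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the raw PDF bytes."""
    return urllib.request.Request(
        url,
        data=pdf_bytes,
        headers={"Content-Type": "application/binary"},
        method="POST",
    )


def extract_metadata(pdf_path: str) -> str:
    """Send a PDF to the service and return the response body as text."""
    with open(pdf_path, "rb") as f:
        request = build_extract_request(f.read())
    with urllib.request.urlopen(request, timeout=60) as response:
        return response.read().decode("utf-8")
```

Keeping request construction separate from sending makes it possible to inspect or test the request without network access.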
Map of Science 
A comprehensive map of academia. Mining available
documents and data sets in order to reconstruct the
graph of relations between: people, documents, institutions,
topics, funding sources.
Final result: a publicly available data set. 
Why? Better understanding of science. Cool features 
in digital libraries and research information systems. 
Elements of the map currently developed in OpenAIRE 
and OCEAN projects.
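The graph of relations described above can be pictured with a small in-memory sketch. This is only an illustration under assumed names: the node kinds mirror the slide, while the relation labels, the `ScienceGraph` class, and the sample entities are hypothetical and do not reflect the data model used in OpenAIRE or OCEAN.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    kind: str  # "person", "document", "institution", "topic" or "funder"
    key: str   # any stable name or identifier


class ScienceGraph:
    """Typed, undirected graph: each edge carries a relation label."""

    def __init__(self):
        self._edges = defaultdict(set)  # node -> {(relation, neighbour)}

    def relate(self, source, relation, target):
        # Store the edge in both directions so it can be traversed either way.
        self._edges[source].add((relation, target))
        self._edges[target].add((relation, source))

    def neighbours(self, node, kind=None):
        # Optionally keep only neighbours of one kind; sort for stable output.
        found = {n for _, n in self._edges.get(node, ()) if kind in (None, n.kind)}
        return sorted(found, key=lambda n: (n.kind, n.key))


def institutions_of(graph, document):
    """Two-hop query: institutions reachable from a document via its authors."""
    return sorted({
        inst.key
        for person in graph.neighbours(document, "person")
        for inst in graph.neighbours(person, "institution")
    })


# Hypothetical sample data.
graph = ScienceGraph()
author = Node("person", "A. Author")
paper = Node("document", "Paper 1")
university = Node("institution", "Example University")
topic = Node("topic", "text mining")
funder = Node("funder", "Example Funder")

graph.relate(author, "authored", paper)
graph.relate(author, "affiliated_with", university)
graph.relate(paper, "about", topic)
graph.relate(paper, "funded_by", funder)

print(institutions_of(graph, paper))
```

A query such as `institutions_of` is the kind of feature a digital library could offer once the relations have been reconstructed.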
Persistent IDs 
To achieve long-term preservation of research artifacts,
we need an identifier minting and management
scheme that can outlive the organization managing
the scheme.
We are developing a distributed scheme based on 
public-key cryptography and P2P networking (a lot 
in common with Bitcoin).
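One common way to make identifiers independent of any central registry is to derive them from a public key, so that anyone can check the binding between key and identifier without asking the issuer. The sketch below shows only that idea and is not the scheme under development at ADA Lab: the `pid:sha256:` prefix and the function names are hypothetical, random bytes stand in for a real public key, and signing is omitted because the Python standard library has no public-key signature implementation.

```python
import hashlib
import hmac
import secrets

PREFIX = "pid:sha256:"  # hypothetical identifier prefix


def mint_identifier(public_key: bytes) -> str:
    """Derive a self-certifying identifier from the bytes of a public key."""
    return PREFIX + hashlib.sha256(public_key).hexdigest()


def key_matches_identifier(identifier: str, public_key: bytes) -> bool:
    """Check, without any central registry, that a key owns an identifier."""
    expected = mint_identifier(public_key)
    # Constant-time comparison of the two identifier strings.
    return hmac.compare_digest(identifier.encode("utf-8"), expected.encode("utf-8"))


# Stand-in for a real public key (assumption: 32 random bytes).
owner_key = secrets.token_bytes(32)
identifier = mint_identifier(owner_key)

print(identifier)
print(key_matches_identifier(identifier, owner_key))
```

In a full scheme, updates to the record behind an identifier would be signed with the matching private key, which is what lets the identifier outlive any single managing organization.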
Data Anonymization 
Privacy-preserving research data publication is a
cross-cutting issue that applies to various types of
data analysed at ICM: legal judgments, medical
records, social network activity.
Thank you for your attention. Let’s stay in touch! 
adalab.icm.edu.pl/blog 
twitter.com/adalab_icm 
linkedin.com/in/bolikowski 
twitter.com/bolikowski 
lukasz.bolikowski@icm.edu.pl
License 

© 2014 ICM, University of Warsaw. Some rights reserved. This presentation is available under a CC BY 3.0 license. Materials from the following
sources were used:
https://www.flickr.com/photos/86530412@N02/8213432552 (p. 4, CC BY 2.0)
https://www.flickr.com/photos/124247024@N07/13903385550 (p. 5, CC BY-SA 2.0)
https://www.flickr.com/photos/genista/228006200 (p. 6, CC BY-SA 2.0)
https://www.flickr.com/photos/bohman/210977249 (p. 9, CC BY 2.0)
https://www.flickr.com/photos/hyku/368912557 (p. 10, CC BY 2.0)
