e-Infrastructural needs to support
informatics*
David Wallom
What is e-Infrastructure?
The integration of digitally-based technology, resources, facilities, and
services combined with people and organizational structures needed to
support modern, collaborative research (and teaching).
1.Data and Storage
2.Software (and Algorithms)
3.Hardware (Compute)
4.Networks
5.Security and authentication
6.People (Collaboration, Skills, Capacity)
7.The Digital Library
Bioinformatics software challenges
• This brings an onslaught of new
challenges for bioinformatics:
– projects that used to require teams
of 500 are now accessible to small
teams
– but biology curricula (i.e.
biologists) still lack computational
skills.
– thus biologists are overwhelmed
by large amounts of data
– furthermore data types are young -
so software is young, thus
• software may be badly built (by
biologists with no formal software
development training or experience).
• software needs to be frequently
updated (bugfixes, algorithmic
improvements (sensitivity/specificity),
new data type support).
changes everything for biology
ARCHER
• UK National Supercomputing
Service
• Replacement for HECToR
• LINPACK = 1.359 Pflop/s
• EPSRC is the managing partner
on behalf of RCUK
• NERC are the other partner
research council
• Cray XC30 Hardware
• Nodes based on 2× Intel Ivy Bridge
12-core processors
• 64GB (or 128GB) memory per
node
• 3008 nodes in total (72192 cores)
• Linked by Cray Aries interconnect
(dragonfly topology)
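The per-node figures above can be multiplied out as a quick sanity check. This is a minimal sketch, assuming every node carries exactly the two 12-core processors listed and that the LINPACK figure refers to the same 3008-node configuration; the function and constant names are illustrative only.

```python
# Back-of-envelope check of the ARCHER figures quoted on the slide.
NODES = 3008              # nodes in total
SOCKETS_PER_NODE = 2      # two Intel Ivy Bridge processors per node
CORES_PER_SOCKET = 12     # 12-core processors
LINPACK_PFLOPS = 1.359    # LINPACK result quoted on the slide


def total_cores(nodes: int, sockets: int, cores_per_socket: int) -> int:
    """Return the aggregate core count of a homogeneous cluster."""
    return nodes * sockets * cores_per_socket


def gflops_per_core(linpack_pflops: float, cores: int) -> float:
    """Average sustained LINPACK rate per core, in Gflop/s (1 Pflop/s = 1e6 Gflop/s)."""
    return linpack_pflops * 1e6 / cores


if __name__ == "__main__":
    cores = total_cores(NODES, SOCKETS_PER_NODE, CORES_PER_SOCKET)
    print(f"total cores: {cores}")
    print(f"LINPACK per core: {gflops_per_core(LINPACK_PFLOPS, cores):.1f} Gflop/s")
```

The per-core figure is only an average under the stated assumptions; it says nothing about the performance of any particular application.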
JASMIN Cloud Architecture
[Architecture diagram. Labels recovered from the figure:]
• Managed Cloud (PaaS, SaaS) on the JASMIN internal network
• Unmanaged Cloud (IaaS, PaaS, SaaS) on the external network inside JASMIN
• Panasas storage and Lotus batch compute
• Direct file system access; direct access to the batch processing cluster
• Standard remote access protocols (ftp, http, …)
• JASMIN Cloud Management Interfaces
• JASMIN Analysis Platform VM
• Appliance Catalogue, Firewall and Firewall + NAT shown for the tenancies
• Tenancies shown:
– Project1-org: Science Analysis VMs
– optirad-org: Science Analysis VMs, IPython slave VMs, File Server VM, IPython JupyterHub VM (IPython Notebook VM with access to the cluster through IPython.parallel)
– eos-cloud-org: Science Analysis VMs, EOSCloud VMs, File Server VM, EOSCloud fat node (EOSCloud Desktop as a Service with dynamic RAM boost)
Thanks to Phil Kershaw
OTHER PUBLIC CLOUD PROVIDERS ARE AVAILABLE
Bio-Linux: A scalable solution
• Comprehensive, free bioinformatics workstation based on Ubuntu
Linux and Debian Med
• 11 years & 8 major releases
• Around 8000 users from 1600 locations
• 200+ bioinformatics packages, including big integrative tools: QIIME, Galaxy
Server, PredictProtein, EMBOSS, ...
• Incorporates all software
• Deployment options: dual boot, Linux live, local servers, cloud
Docker, simplifying the portability
of applications and services
EOS Cloud
• A tenancy in the JASMIN Unmanaged Cloud (& QMUL RCC)
• Reusing JASMIN web interfaces and user management to
provide custom IaaS software platform
• Each user receives two VMs
– Bio-Linux
– Ubuntu Docker hosting environment
• Users have total responsibility for instantiated system
• Accessible through standard remote desktop tools
• Scalability limited by support available
Why Cloud?
• Data sets can be too big or restricted to easily move
– move the compute to the data
– Researcher work patterns are maintained
• Tools such as Bio-Linux and Docker are community enablers
• More efficient use of shared resources
• Central maintenance of infrastructure
• Central Management of data sharing agreements possible
• Lower barrier to entry (Compared to traditional HPC and
Grid)
• What type of cloud?
• What role for traditional HPC?
TRAINING IS KEY TO MAKING INFORMED CHOICES
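The "move the compute to the data" argument can be made concrete with a rough transfer-time estimate. This is a minimal sketch, not a measurement: the dataset size, link speed and the `efficiency` factor (the fraction of nominal bandwidth actually achieved) are hypothetical inputs chosen for illustration.

```python
def transfer_time_hours(size_terabytes: float,
                        link_gbit_per_s: float,
                        efficiency: float = 0.7) -> float:
    """Estimate the hours needed to move a dataset over a network link.

    size_terabytes  -- dataset size in decimal terabytes (1 TB = 1e12 bytes)
    link_gbit_per_s -- nominal link speed in gigabits per second
    efficiency      -- assumed fraction of nominal bandwidth achieved (0 < e <= 1)
    """
    if size_terabytes < 0 or link_gbit_per_s <= 0 or not 0 < efficiency <= 1:
        raise ValueError("size must be >= 0, link > 0 and 0 < efficiency <= 1")
    bits_to_move = size_terabytes * 1e12 * 8
    effective_bits_per_second = link_gbit_per_s * 1e9 * efficiency
    return bits_to_move / effective_bits_per_second / 3600


if __name__ == "__main__":
    # Hypothetical example: a 100 TB dataset over 1 and 10 Gbit/s links.
    for link in (1, 10):
        hours = transfer_time_hours(100, link)
        print(f"{link:>2} Gbit/s: {hours:7.1f} hours ({hours / 24:.1f} days)")
```

Even at full nominal speed, 100 TB over a 1 Gbit/s link takes roughly nine days, which is why hosting the analysis environment next to the data can be the more practical option.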
EOS Cloud next?
• Expand currently available resource beyond
current limitations?
• Create deployable machine image for other
cloud marketplaces
• EOS/institutional badging to give users
confidence in quality
Pilot Users
• CEH Bioinformaticians using the EOS Cloud to
study patterns in microbial
biodiversity
• Genomic and transcriptomic data from fish
toxicogenomics studies at Exeter
Pilot Users
• Creating compute pipelines and containers for each OSD in silico
analysis
– HPC, Cloud (IaaS & PaaS)
– Portable
• Run same analysis on different laptops/grids/clouds
– Repeatable/Reproducible
• Same input gives same output given that reference databases did not change
– Preservation
• All analysis tools and dependencies are in one image
• Images are simple tar.gz
• Preserving Docker and base images is preserving all analysis
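One way to support the preservation and repeatability points above is to record a checksum for each archived image and verify it before reuse. This is a minimal sketch using only the Python standard library; it assumes the image has already been exported to a single archive file, and the file name in the usage example is hypothetical.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path: str, expected_hex: str) -> bool:
    """Check that an archived image still matches the checksum recorded at preservation time."""
    return sha256_of(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    import sys

    # Usage: python verify_image.py analysis-image.tar.gz <recorded sha256>
    archive, recorded = sys.argv[1], sys.argv[2]
    print("unchanged" if verify_archive(archive, recorded) else "MISMATCH")
```

A matching checksum shows only that the archive is bit-for-bit identical to the preserved copy; as the slide notes, identical results also depend on the reference databases not having changed.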
Software Sustainability Institute
www.software.ac.uk
The Software Sustainability
Institute
A national facility for cultivating better, more
sustainable, research software to enable
world-class research
• Software reaches boundaries in its
development cycle that prevent
improvement, growth and adoption
• Providing the expertise and services
needed to negotiate to the next stage
• Developing the policy and tools to
support the community developing and
using research software
Supported by RCUK
[Infographic. Labels recovered from the figure:]
• Activity areas: Communication, Website & blog, Campaigns, Advice, Guides, Courses, Workshops, Fellowship, Research, Software, Policy, Training, Community, Consultancy
• Figures shown:
– 41 projects
– 92 evaluations
– 4 surgeries
– 33 UK SWC workshops
– 1000+ learners
– 50,000 readers
– 41 domain ambassadors
– 20+ workshops organised
– 740 researchers
– 50,000 grants analysed
– 150+ contributed articles
– 19,000 unique visitors per month
– 272 RSEs engaged
– 1700 signatures
– 13 issues highlighted
The end of the beginning, not the beginning of the end!
• A holistic approach is required, with all parts of e-infrastructure supported, from the Hard to the Soft to the Wet!
• Good start-up investments need continuity to ensure impact
– Certain tools are foundations upon which large swathes of the community depend
– Putting tools next to immovable data ensures value!
• Integrating with larger activities ensures the benefits of scaling
– you can’t steer something you’re not involved with…
• Abstract the underpinning e-infrastructure services away from the users, as they’re not interested!
– Something run on one resource should be movable to others through the use of standards etc.!
– I have ignored the institutional resources…
****WARNING****
Institute for Environmental Analytics Summer School on e-infrastructure for the environment
19th – 22nd Sept ’16, Oxford.

Editor's Notes

  • #6: Hosted through the Centre for Environmental Data Analysis (CEDA) at STFC, the JASMIN service[13] is intended to be a multi-user and multi-use-case facility that is able to support a wide range of user communities. The system is built in a modular fashion with a hardware layer that is deliberately designed to support multiple different distributed computing paradigms, from HPC-type workloads to cloud hosting. Utilising commercial off-the-shelf private cloud software, which supports API-level access, the cloud service operates in two modes: firstly a highly controlled Platform as a Service system in the managed cloud, and secondly a more flexible community- or user-controlled Infrastructure as a Service system, the unmanaged cloud. Users may opt to run in the managed mode, where VMs are standardised to JASMIN-approved templates and configured by Puppet[14] scripts that apply centralised access controls, or in the unmanaged mode, which we use for the EOS Cloud. Here each user can have full root access rights on their VM once it is deployed and handed over to them, and we as clients of JASMIN can make use of the full vCloud Director API[15] to assign resources to VMs according to our own policy. A further advantage of working on JASMIN is that, as a community cloud, the service already hosts and synchronises data from relevant providers. These include large reference datasets which our target user community finds useful. Users can rely on this resource and only need to concern themselves with the transfer of their own data into and out of the cloud.
  • #9: Bio-Linux[1] is a major success with widespread adoption across multiple communities. It provides a method by which personal bioinformatics workstations can be set up and maintained by novice users, removing the need for individuals to compile and install packages themselves. The project has an active user and developer community and engages with other open-source community efforts like Debian Med[2], Galaxy[3], Open Bioinformatics Foundation[4], etc. Workstations running Bio-Linux connect to a package repository where hundreds of tools have been built into Debian (.deb) packages and are tested and continually updated by the community of maintainers. Installation or removal of even complex tools is trivial for the end user, and updates are automatic. Bio-Linux has traditionally been installed onto bare metal, possibly on a dual-boot workstation, or run directly from a USB stick for training and demo purposes. In recent times, use on local Virtual Machines (VMs) and Cloud systems has increased, with a community CloudBioLinux[5] effort building and releasing machine images on the Amazon EC2[6] public cloud. Also, partly through this current project, support for local VM installations (on VirtualBox[7], Parallels[8], VMWare[9]) has been made a core part of each Bio-Linux release. Users can obtain an Open Virtualization Archive (OVA)[10] file which loads directly into these products to instantly provide a functional desktop system. It is this same pre-packaged virtualised workstation that users now access on the EOS Cloud.
  • #10: Docker[11] allows a uniform, platform-independent application to be built and encapsulated in a format known as a container. As the research community, and people in general, make use of a large number of different operating systems, the complexity of installing applications on new systems can be high. Sometimes there may be incompatibilities between the libraries and helper tools needed by an application and those on the target platform. The Docker approach aims to get round this by providing a single application hosting platform irrespective of the hosting system, operating system, etc. We have recently seen within the bioinformatics community an increase in sharing of applications and tools in this manner. One problem faced when using the Docker hosting environment is that the applications that are installed and running within it look and feel like remotely hosted applications; they do not behave as if they have been installed on the user's machine. Therefore a tool that bridges this usability gap would greatly increase the acceptability of tools and services that are installed and distributed using Docker.
  • #14: Hyun Soon Gweon and Katja Lehmann, two bioinformaticians at CEH, are familiar with using Bio-Linux on the departmental server and personal VirtualBox VMs. They are trialling the system as part of their regular work and to provide a shared computing environment with project collaborators in microbial community profiling. Ranny van Aerle at the University of Exeter is unable to run a Bio-Linux workstation at his home institute due to local IT policies and budget constraints. He needs a basic workstation plus occasionally some extra computing power. While he is not at present being charged for use of the pilot system, he represents exactly the type of user for whom a paid subscription to the service would make economic sense. Since being given access to a Bio-Linux instance on EOS Cloud in October 2014 he has used it to assemble bacterial genomes, annotate large metagenomics datasets using Blast and Diamond, visualise Blast/Diamond results in MEGAN and generally to try out new software and scripts.
  • #16: The Software Sustainability Institute can help with: software reviews and refactoring, collaborations to develop your project, guidance and best practice on software development, project management, community building, publicity and more… It does this by drawing on a pool of specialists to drive the continued improvement and impact of research software developed by and for researchers; providing services for research software users and developers; developing research community interactions and capacity; and promoting research software best practice and capability.
  • #17: KEY POINT: Get across what the Institute does. This is a conceptual breakdown to make the Institute’s activities easier to manage. In reality the Institute staff work across themes, contributing to many activities, as software sustainability cannot be addressed just through individual activities working alone. Specific activities: Software Carpentry courses teach researchers key computational skills; helping to spin up ELIXIR-UK training activities; DIRAC Drivers License to get best use out of HPC resources; advising Wellcome Trust on software policy; guides on a wide range of subjects. We understand the linkage between all of these requirements, issues and activities because we’ve investigated the research community’s reliance on software in depth.