Giovanni Dall'Olio,
          IBE (UPF-CEXS)




Introduction to version control
 and hg for our bioinformatics
             group
What is hg?
●   Programmers use software to keep track of all
    the versions of the code they write. Such tools are
    called Version Control Systems (VCS)
●   There are many VCSs; the best known are CVS,
    Subversion, Git, hg (Mercurial) and Bazaar
●   Git, hg and Bazaar are newer and based on an
    improved paradigm called Distributed Version
    Control System (DVCS)
How will hg be useful for us?
●   Keep versions of the scripts we create
    ●   and of the datasets, results, etc.
●   Have a common, official version of the
    pipeline and the scripts, on bitbucket.org
●   Everybody will work on their own computer, on
    their own copy of the scripts; every once in a
    while, they will merge it with the official version
Installing hg
●   Hg runs on all major operating systems
●   On Linux, install it through your software center
    or package manager, e.g. on Debian/Ubuntu:
    ●   sudo apt-get install mercurial
●   On other OSes, go to http://mercurial.selenic.com/
    and download the installer
Initial hg configuration
●   Hg stores your personal configuration in a file called:
     ●   ~/.hgrc on Unix
     ●   C:\Documents and Settings\your_name\.hgrc on Windows
●   Open it and write your username:

    [ui]
    username = Giovanni Dall'Olio <dalloliogm@gmail.com>
The basic operations of a VCS
●   Creating a repository
    ●   Roughly means 'start keeping track of the
        versions of the files in this project'
●   Adding files to the repository
    ●   Files are not tracked unless you say so
●   Committing changes
    ●   Saving a version of the current state of the files
●   Pushing the changes and merging them with
    the official version
Creating a repository
●   Create a new directory and create the repo
    with:
    ●   hg init
Effect of creating a new repo
●   A hidden directory (.hg) will be created
●   From now on, you can run the other hg
    commands in this directory
Adding files to the repo
●   By default, no files are added to the repository
●   This means that if you create a new file in the
    directory, hg will not track it until you add it
Creating a file
Files are not added automatically to the repo
●   The command:
    ●   hg log file.txt
●   should return the history of changes to the file
    file.txt. Since the file is not in the repo yet,
    nothing is shown
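A small sketch of this situation in a fresh, throwaway repository. The file content is made up; hg status is used here in addition to hg log, because it lists files that exist but are not tracked.

```shell
cd "$(mktemp -d)" && hg init   # throwaway repository
echo "first line" > file.txt
hg status                      # prints "? file.txt": the file exists but is not tracked
hg log file.txt                # prints nothing: the file has no history yet
```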
hg add
●   To add a file to the repository, use hg add
●   This tells hg to start recording the changes
    made to that file, from the next commit on
Committing changes
●   The most important operation in VCS is the
    commit
●   This operation saves the state of the tracked
    files and associates it with a version
●   One commit → one version
Committing a change
●   We have added the file file.txt to the repo
●   This is a change compared to the previous
    version (where this file was not present)
    ●   So we have to record it with a commit
Our first commit
Effects of adding a file and
                committing
●   From now on, all the changes made to the file
    will be tracked
What is being 'committed'?
●   Every time you commit a new version, hg
    stores the set of changes since the previous
    version
●   The naive alternative is to keep a full copy of all
    the files for each version
    ●   => this takes a lot of disk space
●   By storing only the changes, hg occupies less
    space and makes it easier to compare versions
Hg diff
●   The hg diff command will show the differences
    between your working files and the last committed
    version
Hg log
●   Hg log will show the history of the changes in
    the repository
Hg log
The story continues..
●   The basic operations in a VCS are adding files
    to be tracked and committing changes
●   Next week we will see how to keep a copy of
    our repository on a remote server, and how to
    collaborate with other people
●   Now I will show you some examples of how a
    version control system can be used
Example: backup
●   Imagine that you remove a file or a directory
    from your project by mistake
●   With a VCS, you can revert to the previous
    version and get the files back
Example: tracking code
●   VCSs were developed to track changes in
    code. They let you:
    ●   Return to the point where you made a mistake
        or a typo
    ●   Implement a parallel version of the code, e.g. to
        try a different library or approach (branching)
    ●   Remember what you did, when you
        have to change code written months ago
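As an illustration of going back in time, the sketch below makes two commits and then uses hg update to move the working files to the first one and back to the newest. File content and user name are assumptions for the demo.

```shell
cd "$(mktemp -d)" && hg init                  # throwaway repository
export HGUSER="Your Name <you@example.org>"   # hypothetical; normally set in ~/.hgrc
echo "first line" > file.txt
hg add file.txt
hg commit -m "Add: file.txt"                  # revision 0
echo "second line" >> file.txt
hg commit -m "Add: second line to file.txt"   # revision 1

hg update -r 0                 # working files go back to revision 0
old="$(cat file.txt)"          # only the first line is there
hg update tip                  # and forward again to the newest revision
```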
Example: releasing software
●   Mr. Werewolf publishes a program to predict
    when the moon will be full
●   The code gets adopted by the werewolf
    community. Papers are published using it
●   At a certain point, another werewolf discovers a
    bug in the code. With a VCS, it is possible to find
    the version where the error was introduced and
    identify all the versions affected
Example: tracking data
●   Version control can be applied to a dataset
●   Example: Mr Dracula wants to write a paper on
    the quality of the blood in his neighborhood.
    Every time he gets new data, he commits a
    change
Tracking everything else
●   VCS can be applied to many kinds of files
●   Binary files can be stored, but diffs and
    merges are usually not meaningful for them
●   OpenOffice documents can be tracked (they
    are XML, although normally zip-compressed)
Tracking huge files
●   Hg stores the differences between two versions
●   Storing all the 1000 Genomes (1000g) data would take:
    ●   Some gigabytes to store a compressed version of the
        files
    ●   Less space to store the following commits (but these
        commits will take time)
●   It may not be worth putting gigabytes of data
    under version control
    ●   No standard solution to date
    ●   Some hg extensions exist for big files
How frequently should I commit?
●   Everybody has their own philosophy
    ●   Some people prefer to commit every small
        change
    ●   Others prefer to make a single big commit every day
●   As a general rule:
    ●   The bigger the commit, the harder it is to
        integrate if there are conflicts
    ●   It's up to you to decide
How to write the perfect commit message
●   One or two sentences
●   Avoid generic messages
    ●   “new changes”, “fixed bugs”
●   Use tags like 'Fix', 'Add', 'Config', etc.:
    ●   “Fix: error when reading file”
    ●   “Add: new function for plotting results”
●   Mention the files changed if you think it may be
    useful:
    ●   “Implemented new sorting algorithm for sorting.py”
