Deep Seq Data Analysis
Part II
Christophe.antoniewski@upmc.fr
http://drosophile.org
Mouse Genetics
January 29, 2015, 13:30–15:00
http://fr.slideshare.net/christopheantoniewski/
The article
The methods section, available online
RNA isolation and library construction
Both human and mouse blastomeres were prepared using identical protocols. Single
blastomeres were isolated by removing the zona pellucida using acidic tyrode
solution (Sigma, catalogue no. T1788), then separated by gentle mouth pipetting in a
calcium-free medium. Single cells were washed twice with 1× PBS containing 0.1%
BSA before placing in lysis buffer. RNA was isolated from single cells or single morula
embryos and amplified as described previously [14]. Library construction was
performed following Illumina manufacturer suggestions. Libraries were sequenced
on the Illumina Hiseq2000 platform and sequencing reads that contained polyA, low
quality, and adapters were pre-filtered before mapping. Filtered reads were mapped
to the hg19 genome and mm9 genome using default parameters from BWA aligner [29],
and reads that failed to map to the genome were re-mapped to their respective
mRNA sequences to capture reads that span exons.
Transcriptional profiling
In both human and mouse cases, data normalization was performed by transforming
uniquely mapped transcript reads to RPKM [30]. Genes with low expression in all stages
(average RPKM < 0.5) were filtered out, followed by quantile normalization. For
differential expression, we compared every time point to its previous time point
using default parameters in DESeq using normalized read counts. Genes were called
differentially expressed if they exhibited a Benjamini and Hochberg–adjusted P value
(FDR) <5% and a mean fold change of >2.
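The RPKM transformation and the low-expression filter described above can be sketched as follows. This is a minimal illustration, not the authors' code: the gene names, lengths, counts and library sizes are hypothetical, and only the RPKM definition and the 0.5 threshold come from the text.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads Per Kilobase of transcript per Million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def filter_low_expression(rpkm_by_gene, threshold=0.5):
    """Keep genes whose average RPKM across stages reaches the threshold."""
    return {
        gene: values
        for gene, values in rpkm_by_gene.items()
        if sum(values) / len(values) >= threshold
    }


# Hypothetical toy data: gene -> (length in bp, read counts per stage).
library_sizes = [10_000_000, 10_000_000]  # mapped reads per stage (made up)
raw = {
    "geneA": (2000, [500, 700]),
    "geneB": (1500, [3, 5]),
}

rpkm_table = {
    gene: [rpkm(c, length, n) for c, n in zip(counts, library_sizes)]
    for gene, (length, counts) in raw.items()
}
kept = filter_low_expression(rpkm_table)
print(sorted(kept))
```

In this toy case geneB averages well under 0.5 RPKM and is dropped, while geneA is kept.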
Data 1
GEO dataset accession: GSE44183
http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE44183
• Take the SRP identifier at the bottom of the page: SRP018525
• Search for this identifier with the “EBI SRA ENA SRA” Galaxy tool
• Find your experiment accession by clicking on the SRX… links
• Click on the “fastq files (galaxy)” links
→ Files are uploaded as yellow datasets that show up in the current history
GSM1080195: mouse oocyte 1; Mus musculus; RNA-Seq
1 ILLUMINA (Illumina HiSeq 2000) run: 16.4M spots, 3G bases, 1.9Gb downloads
Accession: SRX229784
GSM1080196: mouse oocyte 2; Mus musculus; RNA-Seq
1 ILLUMINA (Illumina HiSeq 2000) run: 20.2M spots, 3.6G bases, 2.4Gb downloads
Accession: SRX229785
GSM1080197: mouse pronuclei 1; Mus musculus; RNA-Seq
1 ILLUMINA (Illumina HiSeq 2000) run: 17.2M spots, 3.1G bases, 2Gb downloads
Accession: SRX229786
GSM1080198: mouse pronuclei 2; Mus musculus; RNA-Seq
1 ILLUMINA (Illumina HiSeq 2000) run: 12.8M spots, 2.3G bases, 1.5Gb downloads
Accession: SRX229787
GSM1080199: mouse pronuclei 3; Mus musculus; RNA-Seq
1 ILLUMINA (Illumina HiSeq 2000) run: 12.4M spots, 2.2G bases, 1.5Gb downloads
Accession: SRX229788
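A quick sanity check on the run summaries above: dividing bases by spots gives the number of sequenced bases per spot. The sketch below parses the rounded figures exactly as listed, so the results are only approximate. A value around 180 would be compatible with paired-end reads of roughly 90 nt each, but that interpretation is an assumption, not something stated in the listing.

```python
import re

# Run summaries copied from the listing above (rounded values).
RUNS = {
    "SRX229784": "16.4M spots, 3G bases",
    "SRX229785": "20.2M spots, 3.6G bases",
    "SRX229786": "17.2M spots, 3.1G bases",
    "SRX229787": "12.8M spots, 2.3G bases",
    "SRX229788": "12.4M spots, 2.2G bases",
}
UNITS = {"M": 1e6, "G": 1e9}


def parse_quantity(text):
    """Convert a string such as '16.4M' or '3G' into a number."""
    match = re.fullmatch(r"([\d.]+)([MG])", text)
    return float(match.group(1)) * UNITS[match.group(2)]


def bases_per_spot(summary):
    """Approximate number of bases sequenced per spot."""
    spots, bases = re.match(r"(\S+) spots, (\S+) bases", summary).groups()
    return parse_quantity(bases) / parse_quantity(spots)


for accession, summary in RUNS.items():
    print(accession, round(bases_per_spot(summary)))
```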
• Register on mississippi.fr
• Take one of these identifiers:
  • oocyte1@pasteur.fr
  • oocyte2@pasteur.fr
  • pronuclei1@pasteur.fr
  • pronuclei2@pasteur.fr
  • pronuclei3@pasteur.fr
• All identifiers share the same password: gsgalaxy
• Click on “Analyze Data”
• By default you are in an unnamed history
• Name it “Datasets”
Data 2
• Click on “Share Data → Data Libraries”
• Click on “Public Datasets”
• Click on “Mouse Pasteur”
• Check the boxes corresponding to RefSeq_Genes_mm9.gtf and your datasets
• Click on the “Go” item
• Click on “Analyze Data”
• Look at the imported datasets (3 green boxes)
• Look at their content (eye icon)
• Look at their metadata (info icon)
The datasets are already available on the server
Read Mapping
1. Type “fastqc” in the search field at the left-hand column
2. Click on “FastQC:Read QC reports using FastQC”
3. Select your first fastq data set
4. Run the tool
5. Select the yellow box (running tool)
6. Click on the “redo” box
7. Select your second fastq data set
8. Run the tool → it will take 4-5 min at most
9. Search for “bwa” in the tool search field
10. Select “Map with BWA for Illumina”
11. Let’s have a look at the tool form
Filtered reads were mapped to the hg19 genome and mm9 genome using
default parameters from BWA aligner [29], and reads that failed to map to the
genome were re-mapped to their respective mRNA sequences to capture
reads that span exons.
1. The procedure is not reproducible because metadata and
parameters are lacking.
2. The procedure is out of date
• The article was published in 2013
• Tophat was published in 2009, 2012 – Tophat2 in April 2013
Look at the FastQC results
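One of the FastQC plots summarizes base quality along the read. The sketch below shows the underlying idea only (mean Phred score per read position); it is not FastQC's implementation. It assumes the Phred+33 quality encoding, which should be checked for the actual files, and the two reads are made up.

```python
import io


def read_fastq(handle):
    """Yield (name, sequence, quality) from a 4-lines-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip()
        if not header:
            return
        sequence = handle.readline().rstrip()
        handle.readline()  # '+' separator line, ignored
        quality = handle.readline().rstrip()
        yield header[1:], sequence, quality


def mean_quality_per_position(records, offset=33):
    """Average Phred score at each read position (Phred+33 assumed)."""
    totals, counts = [], []
    for _, _, quality in records:
        for i, char in enumerate(quality):
            if i == len(totals):
                totals.append(0)
                counts.append(0)
            totals[i] += ord(char) - offset
            counts[i] += 1
    return [t / c for t, c in zip(totals, counts)]


# Two hypothetical reads; quality drops at the 3' end of the second one.
EXAMPLE = "@read1\nACGT\n+\nIIII\n@read2\nACGA\n+\nII5#\n"
means = mean_quality_per_position(read_fastq(io.StringIO(EXAMPLE)))
print(means)
```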
Read Mapping using Tophat2
See https://wiki.galaxyproject.org/Events/GCC2014/TrainingDay?action=AttachFile&do=view&target=RNA-SeqAltSlides.pdf for a nice introduction to RNA-seq analysis
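Why a spliced aligner matters for RNA-seq: a read that spans an exon-exon junction exists in the mRNA but not as a contiguous stretch in the genome. The toy example below illustrates this with made-up sequences and a naive split search. It is not Tophat2's algorithm, only a sketch of the problem it solves.

```python
# Hypothetical sequences, for illustration only.
EXON1 = "ATGGCCATTGTAATGGGCCGC"
INTRON = "GTAAGTCTTTCCCTTTTTCAG"
EXON2 = "TGAAAGGGTGCCCGATAGTTC"

genome = EXON1 + INTRON + EXON2
mrna = EXON1 + EXON2

# A read covering the last 10 nt of exon 1 and the first 10 nt of exon 2.
read = EXON1[-10:] + EXON2[:10]


def find_spliced(read, genome, min_anchor=5):
    """Naive search: split the read in two and look for both halves
    in the genome, separated by a gap (the putative intron).
    Returns (left_start, right_start, split_position) or None."""
    for split in range(min_anchor, len(read) - min_anchor + 1):
        left, right = read[:split], read[split:]
        left_pos = genome.find(left)
        if left_pos == -1:
            continue
        right_pos = genome.find(right, left_pos + split)
        if right_pos > left_pos + split:
            return left_pos, right_pos, split
    return None


print("contiguous match in genome:", genome.find(read))  # -1 means not found
print("contiguous match in mRNA:", mrna.find(read))
print("spliced match in genome:", find_spliced(read, genome))
```

The contiguous search fails on the genome but succeeds on the mRNA, which is why the published procedure had to re-map unmapped reads to mRNA sequences. The split search recovers the junction directly from the genome.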
Read Mapping using Tophat2 in Galaxy
1. Create a new history and name it “tophat2 alignment”
2. Copy your 2 fastq files from the previous history, as well as the RefSeq.gtf reference file
3. Rename the files and add an annotation
4. Find and fill in the tophat2 tool form
5. Run the tool
6. Select your first fastq data set
7. Run the tool
8. While it is running look at the metadata
9. Rename the datasets using the pencil box
10. Import two other datasets
11. Re-run Tophat2 on these datasets
12. Look at the job in the admin panel (reproducible analyses)
13. Look at the tool on the galaxy tool repository
14. Stop all running tools
15. Import the history “GS SRP018525 tophat2”
16. Visualize your reads in Trackster (1 gtf track + 1 condition mapping)
17. Optional: visualize junctions, etc.
18. Compare with another public genome browser (UCSC or Ensembl)
Paired-end reads were mapped to the mm9 genome using Tophat2 with the
parameters ---, and the RefSeq gtf mm9 annotation as a guide.
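A methods sentence like the one above is only reproducible if the exact command line and tool version are recorded. Galaxy stores this for each job. Outside Galaxy, a minimal sketch could look like the following. The file names, index name, thread count and version string are placeholders, not the values used in this practical, and only the documented TopHat options --GTF, --output-dir and --num-threads are used.

```python
import json
import shlex


def tophat2_command(index_base, reads_1, reads_2, gtf, output_dir, threads=1):
    """Build (but do not run) a tophat2 command line for paired-end reads."""
    return [
        "tophat2",
        "--GTF", gtf,
        "--output-dir", output_dir,
        "--num-threads", str(threads),
        index_base, reads_1, reads_2,
    ]


def provenance_record(command, tool_version):
    """Serialize what a methods section would need to cite."""
    return json.dumps(
        {"tool_version": tool_version, "command_line": shlex.join(command)},
        indent=2,
    )


# Placeholder values, hypothetical.
cmd = tophat2_command(
    "mm9", "sample_1.fastq", "sample_2.fastq",
    "RefSeq_Genes_mm9.gtf", "tophat_out", threads=4,
)
print(provenance_record(cmd, tool_version="to be filled in"))
```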
Read Counting using featureCounts in Galaxy
1. Create a new history called “Read Counts”
2. Copy the accepted hits datasets from the “imported: GS SRP018525 tophat2” history as well as the RefSeq GTF guide
3. You now have 6 datasets in the “Read Counts” history
4. Run featureCounts once on the oocyte 1 data
5. Re-run the tool for oocyte 2 and pronuclei 1, 2, 3
6. Change the metadata of the featureCounts summaries
7. Iteratively paste the featureCounts outputs using the “Paste two files side by side” tool
8. → We now have a hit table
9. Rename it “FeatureCounts HIT TABLE”
10. We can visualize the data using a chart
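The hit table built in steps 7 to 9 is a gene-by-sample matrix of counts. Pasting files side by side relies on every file listing the genes in the same order. The sketch below joins on the gene identifier instead, which does not depend on row order. Gene names and counts are hypothetical.

```python
def parse_counts(text):
    """Parse 'gene<TAB>count' lines into a {gene: count} dict."""
    counts = {}
    for line in text.strip().splitlines():
        gene, value = line.split("\t")
        counts[gene] = int(value)
    return counts


def build_hit_table(samples):
    """samples: {sample_name: {gene: count}} -> (header, rows).
    Genes missing from a sample get a count of 0."""
    genes = sorted(set().union(*samples.values()))
    header = ["Geneid"] + list(samples)
    rows = [[g] + [samples[s].get(g, 0) for s in samples] for g in genes]
    return header, rows


# Hypothetical per-sample outputs, deliberately in different gene order.
samples = {
    "oocyte1": parse_counts("Gene2\t15\nGene1\t120\n"),
    "pronuclei1": parse_counts("Gene1\t80\nGene2\t40\nGene3\t7\n"),
}
header, rows = build_hit_table(samples)
print("\t".join(header))
for row in rows:
    print("\t".join(str(x) for x in row))
```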
Differential count analysis
1. Create a new history called “Differential count analysis”
2. Copy the “FeatureCounts HIT TABLE”
3. Run “Differential_Count models using BioConductor packages” on the FeatureCounts HIT TABLE
4. Review the results
5. However, this does not yet reproduce Sup. Fig. 1
DESeq Analysis
1. Let’s examine Fig. 1 together with the published methods
2. The published information is wrong, but we will try to approximate the figure by guessing what was really done
3. Copy the “FeatureCounts HIT TABLE” into a new history called “my DESeq approach”
4. To run the DESeq (version 1) package we need to reformat the HIT TABLE
5. Do this with a text editor OR within Galaxy:
   1. Cut columns
   2. Remove header
   3. Upload new header
   4. Manipulate header
   5. Concatenate files
6. Run the tool “DESeq Profiling (replicates) with sample replicates”
7. Get the R code available in the public library: Rscript_for_Sup_Fig1a
8. Run the Docker Tool Factory tool with this R code to generate the figure
9. Run the tool “DESeq2 Profiling”
10. Re-run the Docker Tool Factory tool with the same R code on the DESeq2 DE analysis
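DESeq works on raw counts and corrects for library size with per-sample size factors (median of ratios to a per-gene geometric mean). The sketch below illustrates that normalization on hypothetical counts. It is a simplified re-implementation for teaching purposes, not the code run by the Galaxy tools above.

```python
import math
from statistics import median


def size_factors(counts):
    """counts: {gene: [raw count per sample]} -> one size factor per sample.
    Genes with a zero count in any sample are skipped, because their
    geometric mean is zero."""
    n_samples = len(next(iter(counts.values())))
    log_ratios = [[] for _ in range(n_samples)]
    for values in counts.values():
        if min(values) <= 0:
            continue
        log_geo_mean = sum(math.log(v) for v in values) / n_samples
        for j, v in enumerate(values):
            log_ratios[j].append(math.log(v) - log_geo_mean)
    return [math.exp(median(r)) for r in log_ratios]


def normalize(counts, factors):
    """Divide each raw count by the size factor of its sample."""
    return {g: [v / f for v, f in zip(vals, factors)] for g, vals in counts.items()}


# Hypothetical data: sample 2 was sequenced twice as deeply as sample 1.
counts = {"a": [10, 20], "b": [100, 200], "c": [5, 10], "d": [0, 3]}
factors = size_factors(counts)
print(factors)
print(normalize(counts, factors)["b"])
```

After normalization, genes a, b and c have equal values in both samples, as expected when the only difference is sequencing depth.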
Transcriptional profiling
In both human and mouse cases, data normalization was performed by
transforming uniquely mapped transcript reads to RPKM [30]. Genes with low
expression in all stages (average RPKM < 0.5) were filtered out, followed by
quantile normalization. For differential expression, we compared every time
point to its previous time point using default parameters in DESeq using
normalized read counts. Genes were called differentially expressed if they
exhibited a Benjamini and Hochberg–adjusted P value (FDR) <5% and a mean
fold change of >2.
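The final selection step in the paragraph above (Benjamini-Hochberg adjusted P value below 5% and fold change above 2) can be sketched as follows. The gene names, fold changes and P values are hypothetical. Treating a fold change below 1/2 as a valid "down" call is an assumption, since the text only says ">2".

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values, in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def call_differential(genes, fdr=0.05, min_fold=2.0):
    """genes: list of (name, fold_change, p_value).
    Assumption: changes are accepted in both directions
    (fold > min_fold or fold < 1 / min_fold)."""
    adjusted = benjamini_hochberg([p for _, _, p in genes])
    return [
        name
        for (name, fold, _), q in zip(genes, adjusted)
        if q < fdr and (fold > min_fold or fold < 1 / min_fold)
    ]


# Hypothetical results for four genes.
genes = [("g1", 4.0, 0.001), ("g2", 1.5, 0.02), ("g3", 0.25, 0.03), ("g4", 8.0, 0.8)]
print(benjamini_hochberg([p for _, _, p in genes]))
print(call_differential(genes))
```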
Optional: comparison between the tophat2 approach and the BWA approach
1. Sharing the “SRP018525 BWA” history
2. Sharing the “Comparison BWA / Tophat” visualization
3. Analyze the differences
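One possible way to analyze the differences (an assumption; the slides do not prescribe a method) is to compare per-gene counts from the two mappings and rank genes by absolute log2 ratio. The counts below are hypothetical.

```python
import math


def log2_ratios(counts_a, counts_b, pseudocount=1.0):
    """Per-gene log2(a / b) for genes present in both tables.
    A pseudocount avoids division by zero."""
    shared = sorted(set(counts_a) & set(counts_b))
    return {
        g: math.log2((counts_a[g] + pseudocount) / (counts_b[g] + pseudocount))
        for g in shared
    }


def most_divergent(ratios, top=3):
    """Genes with the largest absolute log2 ratio first."""
    return sorted(ratios, key=lambda g: abs(ratios[g]), reverse=True)[:top]


# Hypothetical counts for the same sample mapped with each aligner.
tophat_counts = {"g1": 99, "g2": 49, "g3": 7}
bwa_counts = {"g1": 99, "g2": 24, "g3": 0}

ratios = log2_ratios(tophat_counts, bwa_counts)
print(ratios)
print(most_divergent(ratios, top=2))
```

Genes with many exon-spanning reads would be expected to show higher counts with a spliced aligner, so the top of this ranking is a reasonable place to start looking in Trackster.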

Pasteur deep seq analysis practical Part - 2015
