- Academia	Sinica LSL	NGS	Workshop	-
DNA	Methylation	Data	Analysis
Yi-Feng	Chang Ph.D.
Molecular	Medicine	Research	Center,	Chang	Gung	University
ianyfchang@mail.cgu.edu.tw
03-2118800	#3166	or	#3528
2015/11/18
1
Outlines
• DNA	Methylation:	Functions	and	Diseases
• Methods	of	Measuring	DNA	Methylation	Status
• DNA	Methylation	Data	Analysis
• A	Case	Study	of	DNA	Methylation	Data	Analysis
• DNA	Methylation	Data	Visualization
2
http://commonfund.nih.gov/epigenomics/figure.aspx
3
DNA	Methylation:	Functions	and	Diseases
4
Portela,	A.	&	Esteller,	M.	Epigenetic	modifications	and	human	disease.	Nat	Biotechnol 28,	1057-1068,	doi:10.1038/nbt.1685	(2010).
DNA	Epigenetic	Modifications	in	
Human	Diseases
5
Portela,	A.	&	Esteller,	M.	Epigenetic	modifications	and	human	disease.	Nat	Biotechnol 28,	1057-1068,	doi:10.1038/nbt.1685	(2010).
DNA	Methylation	Pathway
6
Moore,	L.D.,	Le,	T.	&	Fan,	G.	DNA	methylation	 and	its	basic	function.	Neuropsychopharmacology 38,	23-38	(2013).
DNA	Demethylation Pathway
7
Moore,	L.D.,	Le,	T.	&	Fan,	G.	DNA	methylation	 and	its	basic	function.	Neuropsychopharmacology 38,	23-38	(2013).
• 5mC:	5-Methylcytosine
• 5hmC:	5-hydroxymethylcytosine
• 5hmU:		5-hydroxymethyluracil
• 5fC:	5-formylcytosine
• 5caC:	5-carboxycytosine
• Tet:	Ten-eleven	translocation	enzymes
• AID/APOBEC: activation-induced cytidine deaminase/apolipoprotein B mRNA-editing enzyme complex
• TDG: Thymine DNA glycosylase
• SMUG1: Single-strand-selective monofunctional uracil-DNA glycosylase 1
Methods	of	Measuring	DNA	Methylation	Status
8
Timeline	of	Technologies	for	Studying	DNA	
Methylation
9
COBRA: Combined Bisulfite Restriction Analysis
AP-PCR: Methylation-Sensitive Arbitrarily Primed PCR
AIMS: Amplification of Inter-Methylated Sites
RRBS: Reduced Representation Bisulfite Sequencing
MS-HRM: Methylation-Sensitive High-Resolution Melting
MeDIP-Seq: Methylated DNA Immunoprecipitation Sequencing
MethylC-Seq/BS-Seq: Bisulfite Sequencing
TAB-Seq: Tet-Assisted BS-Seq
MAB-Seq: M.SssI Methylase-Assisted BS-Seq
Harrison, A. & Parle-McDermott, A. DNA methylation: a timeline of methods and applications. Front Genet 2, 74 (2011).
The	Steps	to	Determining	the	Methylation	Status	
of	Cytosine	in	a	Known	DNA	Sequence	by	The	
Bisulfite	Conversion	Method
10
Singal,	R.	&	Ginder,	 G.D.	DNA	Methylation.	Blood	Journal	 93,	4059-4070	(1999).
11
Lister,	 R.	&	Ecker,	J.R.	Finding	the	fifth	base:	
genome-wide	sequencing	 of	cytosine	methylation.	
Genome	Res	19,	959-66	(2009).
Genomic	DNA
Deep	Sequencing
Techniques for Genome-
Wide Sequencing of
Cytosine Methylation Sites
12
Genomic	DNA
Deep	Sequencing
Techniques for Enrichment of Methylated
or Target Regions Prior to BS-Seq
Lister,	R.	&	Ecker,	J.R.	Finding	the	
fifth	base:	genome-wide	sequencing	
of	cytosine	methylation.	Genome	
Res	19,	959-66	(2009).
Approaches	for	Detecting	Active	DNA	
Demethylation	at	Single	Base	Resolution
13
TAB-Seq: Tet-Assisted Bs-Seq
Yu,	M.	et	al.	Tet-assisted	bisulfite	 sequencing	of	5-
hydroxymethylcytosine.	 Nat	Protoc 7,	2159-70	(2012).
Yu,	M.	et	al.	Base-resolution	 analysis	of	5-
hydroxymethylcytosine	 in	the	mammalian	genome.	Cell	149,	
1368-80	(2012).
MAB-Seq: M.SssI methylase-assisted BS-Seq
Wu,	H.,	Wu,	X.,	Shen,	L.	&	Zhang,	Y.	Single-base	resolution	 analysis	of	active	DNA	
demethylation	 using	methylase-assisted	bisulfite	 sequencing.	Nat	Biotechnol 32,	
1231-40	(2014).
Key	Metrics	of	the	Technology	Comparison
14
Beck,	S.	Taking	the	measure	of	the	methylome.	Nat	 Biotechnol 28,	1026-8	(2010).
The Infinium HumanMethylation450 (450K) array contains approximately 480k CpG sites, covering 99% of RefSeq genes (hg19) and 96% of CpG islands (CGIs).
Genomic	Coverage	of	MeDIP-seq,	MethylCap-seq,	
RRBS	and	Infinium
15
Bock,	C.	et	al.	Quantitative	comparison	 of	genome-wide	DNA	methylation	 mapping	technologies.	 Nat	Biotechnol 28,	1106-14	(2010).
MeDIP-seq	and	MethylCap-seq	provide	broad	coverage	of	the	genome,	whereas	RRBS	
and	Infinium	are	more	restricted	to	CpG	islands	and	promoter	regions
Common	Base	Resolution	Methylation	Sequencing	Platforms
16
Sun, Z., Cunningham, J., Slager, S. & Kocher, J. P. Base resolution methylome profiling: considerations in platform selection, data preprocessing and analysis. Epigenomics 7, 813-828, doi:10.2217/epi.15.21 (2015).
WGBS	Coverage	Depth	vs	Replicates
• Using	several	high-coverage	reference	data	sets	to	experimentally	
determine	minimal	sequencing	requirements
17
Ziller, M. J., Hansen, K. D., Meissner, A. & Aryee, M. J. Coverage recommendations for methylation analysis by whole-genome bisulfite
sequencing. Nat Methods 12, 230-232, 231 p following 232, doi:10.1038/nmeth.3152 (2015).
WGBS	Coverage	Depth	vs	Replicates
• For	DMR	identification
• Per-sample	coverage	in	the	range	of	5–15×,	depending	on	the	magnitude	of	methylation	differences	
between	the	groups	and	whether	a	smoothing	or	single	CpG-based	DMR	identification	strategy	is	
used
• To	identify	long	DMRs	with	large	methylation	differences,	we	find	that	reducing	coverage	down	to	1×
or	2× per	sample	is	acceptable
• Biological	replicates	should	be	analyzed	separately	to	increase	power,	as	opposed	to	being	pooled	
together	for	analysis
• Strongly	argue	for	the	use	of	at	least	two	separate	biological	replicates for	DMR	analysis
• Choosing	an	appropriate	number	of	biological	replicates	is	a	complex	issue	influenced	by	the	degree	
of	within-group	heterogeneity,	the	magnitude	of	between-group	differences	and	the	presence	of	
confounding	factors	such	as	batch	effects.
18
Ziller, M. J., Hansen, K. D., Meissner, A. & Aryee, M. J. Coverage recommendations for methylation analysis by whole-genome bisulfite sequencing. Nat Methods 12, 230-232, 231 p following 232, doi:10.1038/nmeth.3152 (2015).
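The per-sample coverage figures above can be turned into approximate read counts with the usual relation mean coverage = (number of reads × read length) / genome size. The following is a minimal Python sketch under stated assumptions: the 3.1 Gb genome size and 100 bp read length are hypothetical values that do not come from the slides, and losses from unmapped or duplicate reads are ignored.

```python
import math


def reads_needed(target_coverage, read_length, genome_size):
    """Reads required to reach a mean coverage, ignoring mapping losses."""
    return math.ceil(target_coverage * genome_size / read_length)


GENOME_SIZE = 3_100_000_000  # assumed genome size in bp (hypothetical value)
READ_LENGTH = 100            # assumed read length in bp (hypothetical value)

# Lower and upper end of the 5-15x range recommended for DMR identification
for coverage in (5, 15):
    print(coverage, reads_needed(coverage, READ_LENGTH, GENOME_SIZE))
```

In practice more reads would be needed than this estimate, because some reads fail to map or are removed as duplicates.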
DNA	Methylation	Data	Analysis
19
Effect	and	Problems	of	Bisulfite	Treatment	of	DNA
20
Krueger,	 F.,	Kreck,	B.,	Franke,	A.	&	Andrews,	S.R.	DNA	methylome analysis	using	short	bisulfite	 sequencing	data.	Nat	Methods	9,	145-51	(2012).
Mapping bisulfite reads to the 4 possible bisulfite strands (OT/CTOT/OB/CTOB) is
equivalent to mapping each bisulfite read and its reverse complement
to both the top and bottom strands of the original reference sequence.
OT, original top strand; CTOT, strand complementary to the original top
strand; OB, original bottom strand; and CTOB, strand complementary to the
original bottom strand.
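The four strands can be illustrated with a small in-silico conversion. This is a minimal Python sketch, not part of any tool shown in the slides: it assumes every cytosine is unmethylated (so every C becomes T), and the example sequence and function names are hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq))


def bisulfite_convert(seq, methylated=frozenset()):
    """Convert unmethylated C to T; 0-based positions in `methylated` stay C."""
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq)
    )


def four_bisulfite_strands(top):
    """Derive OT, CTOT, OB and CTOB from the original top strand (5'->3')."""
    ot = bisulfite_convert(top)                      # converted top strand
    ob = bisulfite_convert(reverse_complement(top))  # converted bottom strand
    return {
        "OT": ot,
        "CTOT": reverse_complement(ot),  # PCR copy of the converted top strand
        "OB": ob,
        "CTOB": reverse_complement(ob),  # PCR copy of the converted bottom strand
    }


strands = four_bisulfite_strands("ACGTTCGA")  # hypothetical example sequence
for name, seq in strands.items():
    print(name, seq)
```

After conversion the two original strands are no longer complementary to each other, which is why a read has to be tested against four possible strands.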
How	to		Align	BS	Reads	Against	Reference	Genome?
21
Bock,	C.	Analysing and	interpreting	 DNA	methylation	 data.	Nat	Rev	Genet	13,	705-19	(2012)
TCGA TCGT ACGT ATGA
TTGT ATGTTCGA ATGA
BS-Seq reads
Procedure	to	Perform	Three-Letter	Alignment
22
Krueger,	 F.	&	Andrews,	S.R.	Bismark:	A	flexible	aligner	and	methylation	caller	for	Bisulfite-Seq applications.	 Bioinformatics	 (2011).
Three-Letter	Alignment
23
Multiple	hits
Bock,	C.	Analysing and	interpreting	 DNA	methylation	 data.	Nat	Rev	Genet	13,	705-19	(2012)
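The idea behind three-letter alignment can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Bismark's implementation: it uses exact matching on the forward strand only, ignores the G-to-A conversion needed for the other strands, and the reference and read are hypothetical.

```python
def c_to_t(seq):
    """Reduce the alphabet to three letters by converting every C to T."""
    return seq.replace("C", "T")


def three_letter_align(read, reference):
    """Return all start positions where the converted read matches the converted reference."""
    conv_read, conv_ref = c_to_t(read), c_to_t(reference)
    hits = []
    start = conv_ref.find(conv_read)
    while start != -1:
        hits.append(start)
        start = conv_ref.find(conv_read, start + 1)
    return hits


def call_methylation(read, reference, start):
    """Compare the unconverted read with the unconverted reference at a hit."""
    calls = []
    for offset, (ref_base, read_base) in enumerate(zip(reference[start:], read)):
        if ref_base == "C" and read_base == "C":
            calls.append((start + offset, "methylated"))    # C protected from conversion
        elif ref_base == "C" and read_base == "T":
            calls.append((start + offset, "unmethylated"))  # C converted to T
    return calls


reference = "GATTACGTTCGAAC"  # hypothetical reference
read = "ATGTTCGA"             # hypothetical bisulfite read
hits = three_letter_align(read, reference)
calls = call_methylation(read, reference, hits[0])
print(hits, calls)
```

Because both sequences lose one letter, a read can match more places than it would in four-letter space, which is the source of the "multiple hits" noted on the slide.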
Wild-Card	Alignment
24
Convert	C/T	to	Y
Multiple	hits
Bock,	C.	Analysing and	interpreting	 DNA	methylation	 data.	Nat	Rev	Genet	13,	705-19	(2012)
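Wild-card alignment can be sketched in the same style. In this minimal Python illustration, each C in the reference behaves like the wild card Y and accepts either C or T in the read, while the read itself is left unconverted. It is a brute-force sketch on the forward strand with hypothetical sequences, not the algorithm of any particular aligner.

```python
def wildcard_match(read, reference, start):
    """True if the read matches at `start`, letting a reference C match read C or T."""
    window = reference[start:start + len(read)]
    for read_base, ref_base in zip(read, window):
        if ref_base == "C":
            if read_base not in "CT":  # reference C acts as wild card Y
                return False
        elif read_base != ref_base:
            return False
    return True


def wildcard_align(read, reference):
    """Return every start position where the read matches under the wild-card rule."""
    last_start = len(reference) - len(read)
    return [s for s in range(last_start + 1) if wildcard_match(read, reference, s)]


reference = "ACGATATGA"  # hypothetical reference with one CpG at position 1
methylated_read = "ACGA"    # C retained, so the read came from a methylated copy
unmethylated_read = "ATGA"  # C converted to T

print(wildcard_align(methylated_read, reference))
print(wildcard_align(unmethylated_read, reference))
```

In this toy case the read that kept its C maps uniquely while the converted read maps to two places, which hints at why wild-card alignment retains more information but can favour methylated reads.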
Wild-Card	Alignments	have	Better	Accuracy	
but	Poor	Running	Time
25
http://smithlabresearch.org/manuals/rmap_manual.pdf
Workflow	for	Analyzing	BS-Seq data
26
Krueger, F., Kreck, B., Franke, A. & Andrews, S.R. DNA methylome analysis using short bisulfite sequencing data. Nat Methods 9, 145-51 (2012).
http://omictools.com/bisulfite-seq/
A	Case	Study	of	DNA	Methylation	Data	Analysis
27
Turn	off	PowerPoint	Smart	Quote
28
Required	Software	in	Your	Laptop
• Mac	OS	X	Terminal
• Applications → Utilities → Terminal
• Linux console
• PuTTY: http://the.earth.li/~sgtatham/putty/latest/x86/putty.exe
• SCP/SFTP/FTP client
• WinSCP: http://winscp.net/download/winscp556.zip
• PDF viewer
• http://get.adobe.com/tw/reader/
• R
• https://cran.r-project.org/
29
Required	R	Packages
• Bioconductor
• http://www.bioconductor.org/install/#install-bioconductor-packages
• methylKit:
• https://github.com/al2na/methylKit
30
> R
# dependencies
> install.packages(c("data.table","devtools"))
# biocLite was the installer at the time of this workshop;
# current Bioconductor releases use BiocManager::install() instead
> source("http://bioconductor.org/biocLite.R")
> biocLite(c("GenomicRanges","IRanges"))
# install the development version from github
> library(devtools)
> install_github("al2na/methylKit",build_vignettes=FALSE)
Analysis	Pipeline
31
• Quality Trimming: fastq_masker
• Mapping: walt
• Sorting mr files (before duplicate removal and again before methylation calling)
• Remove Duplicate Reads: duplicate-remover
• Bisulfite Conversion Rate: bsrate
• Methylation Calling: methcounts
• Hypo/Hyper-Methylation Regions: hmr, hypermr, pmr
• Large Hypo/Hyper-Methylation Domains: pmd
• Differential Methylation Region: dmr
• Allele-specific Methylated Regions: amrfinder, allelicmeth
• Generate Methylation BED file: bedtools, bedGraphToBigWig
• Calculating Methylation Ratio for Regions: bigWigAverageOverBed, roimethstat, bwtool
• Cross-species Comparison of Methylomes: liftOver
Tool sources:
• fastx toolkit: http://hannonlab.cshl.edu/fastx_toolkit/
• MethPipe: http://smithlabresearch.org/software/methpipe/
• MethPipe manual: http://smithlabresearch.org/downloads/methpipe-manual.pdf
• Bedtools: https://github.com/arq5x/bedtools2
• Programs from UCSC Genome Browser: http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64
• bwtool: https://github.com/CRG-Barcelona/bwtool/wiki
Public	BS-Seq Datasets
32
http://smithlabresearch.org/software/methbase/
Other	species	in	NCBI	GEO	Database
• Glycine	max	(Soy	beans)
• Schistocerca gregaria (Locust)
• Rattus norvegicus (Rat)
• Danio rerio (Zebra	fish)
• Drosophila melanogaster (Fruit	fly)
• Oryza sativa (Rice)
• Macaca mulatta (Rhesus	monkey)
• Mus musculus domesticus (Western European house mouse)
• Xenopus (Silurana)	tropicalis (Frog)
• Cynoglossus semilaevis (Tongue	sole,	bony	fish)
• Bombyx mori (Silkworm)
• Harpegnathos saltator (Jerdon's jumping	ant)
• Camponotus floridanus (Florida	carpenter	ant)
H1	(male):	human	embryonic	stem	cells	(107GB)
IMR90	(female):	fetal	lung	fibroblasts	(154GB)
http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE16256
33
Datasets	used	in	This	Case	Study
Convert	SRA	to	FASTQ	(Example	ONLY)
# sra-toolkit can be downloaded from https://github.com/ncbi/sratoolkit
> fastq-dump --split-3 SRR018975.sra
> ls
SRR018975.fastq
34
DEMO	Files
> cd /work3/LSLNGSDNAMETH
> ls -alh
total 12G
drwxr-xr-x 4 u00gel00 u00ycm02 4.0K Nov 16 00:29 .
drwxrwxrwt 109 root root 4.0K Nov 15 14:10 ..
-rwxr-xr-x 1 u00gel00 u00ycm02 65K Nov 15 17:22 h1.chrX.hmr
-rwxr-xr-x 1 u00gel00 u00ycm02 4.6G Nov 15 14:51 h1.chrX.mr.dremove
-rwxr-xr-x 1 u00gel00 u00ycm02 9.8K Nov 15 17:22 h1.chrX.pmd
-rwxr-xr-x 1 u00gel00 u00ycm02 34M Nov 15 17:39 h1.chrX_CpG.meth
-rwxr-xr-x 1 u00gel00 u00ycm02 39M Nov 15 23:52 h1.chrX_CpG.meth.for.methylKit
-rwxr-xr-x 1 u00gel00 u00ycm02 161K Nov 15 17:22 h1_gt_imr90.chrX.dmr
-rwxr-xr-x 1 u00gel00 u00ycm02 45M Nov 15 17:22 h1_imr90.chrX.methdiff
-rwxr-xr-x 1 u00gel00 u00ycm02 55K Nov 15 17:22 h1_lt_imr90.chrX.dmr
-rwxr-xr-x 1 u00gel00 u00ycm02 194K Nov 15 17:22 imr90.chrX.hmr
-rwxr-xr-x 1 u00gel00 u00ycm02 7.3G Nov 15 14:52 imr90.chrX.mr.dremove
-rwxr-xr-x 1 u00gel00 u00ycm02 5.6K Nov 15 17:22 imr90.chrX.pmd
-rwxr-xr-x 1 u00gel00 u00ycm02 35M Nov 15 17:39 imr90.chrX_CpG.meth
-rwxr-xr-x 1 u00gel00 u00ycm02 40M Nov 15 23:52 imr90.chrX_CpG.meth.for.methylKit
drwxr-xr-x 6 u00gel00 u00ycm02 4.0K Nov 15 14:28 methpipe-3.3.1
drwxr-xr-x 4 u00gel00 u00ycm02 4.0K Nov 15 14:46 methpipe-data
35
Quality Trimming and Split FASTQ Files into Smaller Files (Example ONLY)
# For each gzip file, one by one (e.g. SRR018975.fastq.gz):
#   1. strip the .gz suffix to get the output prefix (e.g. SRR018975.fastq)
#   2. uncompress the gzip file and write it to stdout
#   3. mask low-quality bases (quality below 30) as Ns
#   4. split the FASTQ file into smaller ones of 6,000,000 lines (1,500,000 reads) each
> for f in *.gz;
do
b=`basename $f .gz`;
echo $f
bsub -q 4G -o $f.stdout -e $f.stderr "
gzip -dc $f|
fastq_masker -q 30 -Q33|
split -dl 6000000 - $b- ";
done
> ls
SRR018975.fastq-00
SRR018975.fastq-01
SRR018975.fastq-02
…
36
Mapping BS-Seq FASTQ Files (Example ONLY)
> export AdapterTrich=AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT
> export AdapterArich=CAAGCAGAAGACGGCATACGAGCTCTTCCGATCT
> bsub -q 4G -o rmapbs.stdout -e rmapbs.stderr "
/work3/LSLNGSDNAMETH/methpipe-3.3.1/bin/rmapbs-pe \
-c /work3/LSLNGSDNAMETH/methpipe-data/data/genome \
-o /work3/USERNAME/Output/test.mr \
-m 3 -L 400 -C $AdapterTrich:$AdapterArich \
/work3/LSLNGSDNAMETH/methpipe-data/data/snippet_1.fq \
/work3/LSLNGSDNAMETH/methpipe-data/data/snippet_2.fq"
37
>	/work3/LSLNGSDNAMETH/methpipe-3.3.1/bin/rmapbs-pe
Usage: rmapbs-pe [OPTIONS] <fastq-reads-file>
Options:
-o, -output output file name
-c, -chrom chromosomes in FASTA file or dir
-T, -start index of first read to map
-N, -number number of reads to map
-s, -suffix suffix of chrom files (assumes dir provided)
-m, -mismatch maximum allowed mismatches
-M, -max-map maximum allowed mappings for a read
-C, -clip clip the specified adaptor
-L, -fraglen max fragment length
-suffix-len Suffix length of reads name
-v, -verbose print more run info
Help options:
-?, -help print this help message
-about print about message
Example	Output	of	imr90	chrX
38
> head -n 30 /work3/LSLNGSDNAMETH/imr90.chrX.mr.dremove	 |column
MR Format
•RNAME (chromosome name)
•SPOS (start position, 0-based)
•EPOS (end position, 0-based)
•QNAME (read name)
•MISMATCH (number of mismatches)
•STRAND (forward or reverse strand)
•SEQ (read sequence)
•QUAL (base quality scores)
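The eight fields listed above map naturally onto a small record type. The following is a minimal Python sketch, assuming tab-separated fields in the order shown on the slide; the example line is made up for illustration and is not taken from the demo files.

```python
from dataclasses import dataclass


@dataclass
class MappedRead:
    rname: str       # chromosome name
    spos: int        # start position
    epos: int        # end position
    qname: str       # read name
    mismatch: int    # number of mismatches
    strand: str      # "+" or "-"
    seq: str         # read sequence
    qual: str        # base quality string


def parse_mr_line(line):
    """Parse one tab-separated MR line into a MappedRead record."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 8:
        raise ValueError(f"expected 8 fields, found {len(fields)}")
    rname, spos, epos, qname, mismatch, strand, seq, qual = fields
    return MappedRead(rname, int(spos), int(epos), qname, int(mismatch), strand, seq, qual)


# Hypothetical MR line, for illustration only
example = "chrX\t100\t108\tread1\t0\t+\tATGTTCGA\tIIIIIIII"
record = parse_mr_line(example)
print(record)
```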
Remove	Duplicates	(Example	ONLY)
> export PATH=$PATH:/pkg/biology/methpipe/methpipe-3.3.1/bin/
> bsub -q 16G -o stdout -e stderr "
LC_ALL=C sort -S 14G -k 1,1 -k 2,2n -k 3,3n -k 6,6 \
-o /work3/USERNAME/h1.chrX.mr.sorted_start \
/work3/LSLNGSDNAMETH/h1.chrX.mr;
duplicate-remover -S /work3/USERNAME/h1.chrX_dremove_stat.txt \
-o /work3/USERNAME/h1.chrX.mr.dremove \
/work3/USERNAME/h1.chrX.mr.sorted_start "
> cat stdout
Successfully completed.
Resource usage summary:
CPU time : 343.80 sec.
Max Processes : 3
Max Threads : 4
39
> cat /work3/USERNAME/h1.chrX_dremove_stat.txt
TOTAL READS IN: 24350707
GOOD BASES IN: 1987943796
TOTAL READS OUT: 22884736
GOOD BASES OUT: 1867152730
DUPLICATES REMOVED: 1465971
READS WITH DUPLICATES: 1219174
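The statistics above are internally consistent and give the duplication rate directly. This short Python sketch checks the arithmetic, using only the read counts printed by duplicate-remover on this slide; the function name is hypothetical.

```python
def duplicate_summary(reads_in, reads_out):
    """Return the number of removed reads and the fraction of input reads removed."""
    removed = reads_in - reads_out
    return removed, removed / reads_in


# Values copied from h1.chrX_dremove_stat.txt above
removed, fraction = duplicate_summary(24350707, 22884736)
print(removed, f"{fraction:.2%}")
```

The difference between reads in and reads out equals the reported DUPLICATES REMOVED value, which corresponds to roughly 6% of the input reads.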
Computing	single-site	methylation	levels	(Example	Only)
# sorting again for methylated CpG analysis
bsub -q 16G -o stdout -e stderr "
LC_ALL=C sort -S 14G -k 1,1 -k 3,3n -k 2,2n -k 6,6 \
-o /work3/USERNAME/h1.chrX.mr.sorted_end_first \
/work3/LSLNGSDNAMETH/h1.chrX.mr.dremove"
# methylation calling
bsub -q 16G -o stdout -e stderr "
methcounts -c /work3/LSLNGSDNAMETH/hg18 \
-o /work3/USERNAME/h1.chrX.meth \
/work3/USERNAME/h1.chrX.mr.sorted_end_first"
# extract CpG sites
bsub -q 16G -o stdout -e stderr "
symmetric-cpgs \
-o /work3/USERNAME/h1.chrX_CpG.meth /work3/USERNAME/h1.chrX.meth"
40
# columns: chromosome, position, strand, context, meth ratio, read count
chrX 152 + CpG 0 0
chrX 232 + CpG 0 0
chrX 330 + CpG 0 0
chrX 334 + CpG 0 0
chrX 336 + CpG 0 0
chrX 364 + CpG 0 0
chrX 366 + CpG 0 0
chrX 374 + CpG 0 0
chrX 376 + CpG 0 0
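The meth ratio and read count columns are enough to summarise methylation over a set of sites. The following is a minimal Python sketch, assuming whitespace-separated lines in the six-column layout shown above; the example lines with non-zero counts are hypothetical, and the coverage-weighted mean is one common summary rather than the exact statistic reported by any MethPipe program.

```python
def parse_site(line):
    """Parse one methcounts-style line into typed fields."""
    chrom, pos, strand, context, ratio, count = line.split()
    return chrom, int(pos), strand, context, float(ratio), int(count)


def weighted_mean_methylation(lines):
    """Coverage-weighted mean methylation over sites with at least one read."""
    methylated_reads = 0.0
    total_reads = 0
    covered_sites = 0
    for line in lines:
        _, _, _, _, ratio, count = parse_site(line)
        if count == 0:
            continue  # uncovered sites carry no information
        methylated_reads += ratio * count
        total_reads += count
        covered_sites += 1
    mean = methylated_reads / total_reads if total_reads else None
    return mean, covered_sites


# Hypothetical example lines (the first mirrors an uncovered site from the slide)
sites = [
    "chrX 152 + CpG 0 0",
    "chrX 232 + CpG 0.75 8",
    "chrX 330 + CpG 0.5 4",
]
mean_level, n_covered = weighted_mean_methylation(sites)
print(mean_level, n_covered)
```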
Computation	of	methylation	level	statistics	
(Example	ONLY)
41
bsub -q 16G -o stdout -e stderr "
levels -o /work3/USERNAME/Output/h1.chrX.levels \
/work3/USERNAME/h1.chrX.meth"
Estimating bisulfite conversion rate
> bsub -q 16G -o stdout -e stderr "
bsrate -c /work3/LSLNGSDNAMETH/hg18 \
-o /work3/USERNAME/Output/h1.chrX.bsrate \
/work3/LSLNGSDNAMETH/h1.chrX.mr.dremove"
42
# head -n 16 /work3/USERNAME/Output/h1.chrX.bsrate
OVERALL CONVERSION RATE = 0.980192
POS CONVERSION RATE = 0.980204 96942555
NEG CONVERSION RATE = 0.980179 96821402
BASE PTOT PCONV PRATE NTOT NCONV NRATE BTHTOT BTHCONV BTHRATE ERR ALL ERRRATE
1 1798190 1762518 0.98016 1796291 1760655 0.98016 3594481 3523173 0.98016 36327 3630808 0.01001
2 1654252 1617801 0.97797 1649805 1613025 0.97771 3304057 3230826 0.97784 41299 3345356 0.01235
3 1646403 1615036 0.98095 1644710 1613525 0.98104 3291113 3228561 0.98099 48231 3339344 0.01444
4 1699787 1666286 0.98029 1695105 1662078 0.98052 3394892 3328364 0.98040 50697 3445589 0.01471
5 1663363 1631006 0.98055 1658397 1626045 0.98049 3321760 3257051 0.98052 52464 3374224 0.01555
6 1720978 1687130 0.98033 1716036 1682351 0.98037 3437014 3369481 0.98035 45366 3482380 0.01303
7 1677561 1644979 0.98058 1677119 1644343 0.98046 3354680 3289322 0.98052 53873 3408553 0.01581
8 1714426 1681206 0.98062 1714378 1681339 0.98073 3428804 3362545 0.98068 34491 3463295 0.00996
9 1702891 1668424 0.97976 1700092 1665742 0.97980 3402983 3334166 0.97978 34861 3437844 0.01014
10 1681522 1648092 0.98012 1680471 1647068 0.98012 3361993 3295160 0.98012 45776 3407769 0.01343
11 1664207 1631036 0.98007 1664386 1631083 0.97999 3328593 3262119 0.98003 46055 3374648 0.01365
12 1651326 1618334 0.98002 1649370 1616514 0.98008 3300696 3234848 0.98005 44139 3344835 0.01320
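The rate columns in this table can be reproduced from the count columns. This short Python sketch uses the BASE 1 row above and reads PTOT/NTOT as total counts and PCONV/NCONV as converted counts on the positive and negative strands; that reading of the column names is an assumption based on the numbers, not a statement from the slides.

```python
def conversion_rate(converted, total):
    """Fraction of observed cytosine positions that were read as converted."""
    return converted / total


# Values copied from the BASE 1 row of the bsrate output above
ptot, pconv = 1798190, 1762518
ntot, nconv = 1796291, 1760655

prate = conversion_rate(pconv, ptot)
nrate = conversion_rate(nconv, ntot)
# The "both strands" columns are the sums of the per-strand counts
bthtot, bthconv = ptot + ntot, pconv + nconv
bthrate = conversion_rate(bthconv, bthtot)

print(f"{prate:.5f} {nrate:.5f} {bthrate:.5f}")
```

The summed counts match the BTHTOT and BTHCONV columns of the same row, and all three rates agree with the reported values of about 0.98.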
Hypomethylated (hmr) and hypermethylated (hypermr) regions
> bsub -q 16G -o stdout -e stderr "
hmr -o /work3/USERNAME/h1.chrX.hmr /work3/USERNAME/h1.chrX_CpG.meth"
> bsub -q 16G -o stdout -e stderr "
pmd -o /work3/USERNAME/h1.chrX.pmd /work3/USERNAME/h1.chrX_CpG.meth"
43
# columns: chromosome, start, end, name, # of CpG, strand
chrX 2727656 2728600 HYPO0 18 +
chrX 2731108 2731952 HYPO1 14 +
chrX 2732390 2733303 HYPO2 23 +
chrX 2740632 2740962 HYPO3 9 +
chrX 2756524 2758153 HYPO4 139 +
chrX 2817685 2817980 HYPO5 8 +
chrX 2855757 2857708 HYPO6 127 +
chrX 2890571 2890884 HYPO7 9 +
chrX 3004371 3004626 HYPO8 9 +
chrX 3238227 3238677 HYPO9 9 +
Differential	Methylation	Analysis
> bsub -q 16G -o stdout -e stderr "
methdiff -o /work3/USERNAME/h1_imr90.chrX.methdiff \
/work3/LSLNGSDNAMETH/h1.chrX_CpG.meth /work3/LSLNGSDNAMETH/imr90.chrX_CpG.meth"
44
# columns: chromosome, position, strand, context, Probability,
#          Sample A Un-meth, Sample A Meth, Sample B Un-meth, Sample B Meth
chrX 2709681 + CpG 0.749276 7 2 12 7
chrX 2709727 + CpG 0.917633 4 1 9 12
chrX 2709774 + CpG 0.894737 3 1 6 10
chrX 2709871 + CpG 0.742424 0 16 0 48
chrX 2709890 + CpG 0.857575 3 20 3 47
chrX 2709982 + CpG 0.999354 10 2 7 19
chrX 2710014 + CpG 0.704043 3 6 3 10
chrX 2710023 + CpG 0.600782 4 3 4 4
chrX 2710146 + CpG 0.523077 1 2 8 14
chrX 2710155 + CpG 0.234026 3 3 17 9
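The four count columns allow the per-sample methylation levels at each site to be recomputed. The following is a minimal Python sketch that follows the column labels given on this slide (un-methylated count first, then methylated count, for each sample); that ordering is taken from the slide's annotation and should be checked against the MethPipe manual for the version in use.

```python
def methylation_level(unmeth, meth):
    """Fraction of reads that are methylated, or None when there are no reads."""
    total = unmeth + meth
    return meth / total if total else None


def parse_methdiff(line):
    """Parse one methdiff line using the column order labelled on the slide."""
    chrom, pos, strand, context, prob, a_un, a_me, b_un, b_me = line.split()
    return {
        "chrom": chrom,
        "pos": int(pos),
        "probability": float(prob),
        "level_a": methylation_level(int(a_un), int(a_me)),
        "level_b": methylation_level(int(b_un), int(b_me)),
    }


# One row copied from the methdiff output above
site = parse_methdiff("chrX 2709982 + CpG 0.999354 10 2 7 19")
print(site)
```

Under this reading, the row with the highest probability on the slide is also the one where sample A has a much lower methylation level than sample B.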
Differential	methylated	region	(DMR)
> bsub -q 16G -o stdout -e stderr "
dmr /work3/LSLNGSDNAMETH/h1_imr90.chrX.methdiff \
/work3/LSLNGSDNAMETH/h1.chrX.hmr /work3/LSLNGSDNAMETH/imr90.chrX.hmr \
h1_lt_imr90.chrX.dmr h1_gt_imr90.chrX.dmr"
45
==> h1_lt_imr90.chrX.dmr <==
chrX 2727656 2728600 X:18 10 +
chrX 2731108 2731952 X:15 4 +
chrX 2732390 2733303 X:37 8 +
chrX 2740632 2740962 X:9 0 +
chrX 2758131 2758153 X:3 0 +
chrX 2817685 2817980 X:9 0 +
chrX 2855757 2855890 X:1 1 +
chrX 2890571 2890884 X:9 4 +
chrX 3004371 3004626 X:9 0 +
chrX 3238227 3238677 X:24 0 +
==> h1_gt_imr90.chrX.dmr <==
chrX 2825454 2826947 X:37 17 +
chrX 2857708 2857760 X:2 0 +
chrX 3272822 3273033 X:13 3 +
chrX 3275527 3275594 X:1 0 +
chrX 3287038 3289160 X:36 9 +
chrX 3643168 3643374 X:7 0 +
chrX 4016033 4022054 X:47 29 +
chrX 4028369 4042000 X:79 54 +
chrX 4051286 4059878 X:52 39 +
chrX 4079778 4087714 X:45 26 +
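Column 4 (X:N): # of CpG in the region; column 5: number of significant differential methylated CpG
h1_lt_imr90.chrX.dmr: Meth. level lower in H1 than IMR90
h1_gt_imr90.chrX.dmr: Meth. level lower in IMR90 than H1
# keep DMRs with at least 10 CpGs, of which at least 5 are significantly differentially methylated
> awk -F "[:\t]" '$5 >= 10 && $6 >= 5 {print $0}' h1_lt_imr90.chrX.dmr > h1_lt_imr90.chrX.dmr.filtered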
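The same filter can be written out step by step to make the field numbering explicit. This is a minimal Python sketch, assuming whitespace-separated DMR lines in the layout shown above, where the fourth column has the form X:N with N the number of CpGs; the thresholds mirror the awk command, and the function names are hypothetical.

```python
def parse_dmr(line):
    """Split a DMR line and extract the CpG count from the X:N name field."""
    chrom, start, end, name, n_significant, strand = line.split()
    n_cpg = int(name.split(":")[1])
    return chrom, int(start), int(end), n_cpg, int(n_significant), strand


def filter_dmrs(lines, min_cpg=10, min_significant=5):
    """Keep DMRs with enough CpGs and enough significantly different CpGs."""
    kept = []
    for line in lines:
        _, _, _, n_cpg, n_significant, _ = parse_dmr(line)
        if n_cpg >= min_cpg and n_significant >= min_significant:
            kept.append(line)
    return kept


# First four rows of h1_lt_imr90.chrX.dmr as shown on the slide
dmrs = [
    "chrX 2727656 2728600 X:18 10 +",
    "chrX 2731108 2731952 X:15 4 +",
    "chrX 2732390 2733303 X:37 8 +",
    "chrX 2740632 2740962 X:9 0 +",
]
filtered = filter_dmrs(dmrs)
for line in filtered:
    print(line)
```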
Other	Utilities
• DM	analysis	of	two	groups	of	DNA	methylomes
• Robinson,	M.	D.	et	al.	Statistical	methods	for	detecting	differentially	
methylated	loci	and	regions.	Frontiers	in	genetics	5,	324,	
doi:10.3389/fgene.2014.00324	(2014).
• Allele-specific methylation
• allelicmeth
• amrfinder: http://smithlabresearch.org/software/amrfinder/
• Estimate hydroxymethylation (5hmC) and methylation (5mC) levels from BS-seq, oxBS-seq and TAB-seq
• mlml: http://smithlabresearch.org/software/mlml/
46
DNA	Methylation	Data	Visualization
47
R	Packages:	methylKit
The following examples were adapted from the methylKit tutorials
• Akalin, A. et al. methylKit: a comprehensive R package for the analysis of genome-wide DNA methylation profiles. Genome Biol 13, R87, doi:10.1186/gb-2012-13-10-r87 (2012).
• Tutorial: http://methylkit.googlecode.com/files/methylKitTutorial_feb2012.pdf
• Tutorial Slide: http://methylkit.googlecode.com/files/methylKitTutorialSlides_2013.pdf
48
Convert	MethPipe mr Format	to	methylKit
Format
Id chr base strand coverage freqC freqT
Chr21.9764539 chr21 9764539 R 12 25.00 75.00
Chr21.9764513 chr21 9764513 R 12 0.00 100.00
Chr21.9820622 chr21 9820622 F 13 0.00 100.00
Chr21.9837545 chr21 9837545 F 11 0.00 100.00
Chr21.9849022 chr21 9849022 F 124 72.58 27.42
Chr21.9853326 chr21 9853326 F 17 70.59 29.41
49
> awk -F $'\t' -v OFS=$'\t' '$6>0{$5=int($5*100); print $1"."$2, $1,
$2, "F", $6, $5, (100-$5)}' /work3/LSLNGSDNAMETH/h1.chrX_CpG.meth > \
/work3/USERNAME/Output/h1.chrX_CpG.meth.for.methylKit
> awk -F $'\t' -v OFS=$'\t' '$6>0{$5=int($5*100); print $1"."$2, $1,
$2, "F", $6, $5, (100-$5)}' /work3/LSLNGSDNAMETH/imr90.chrX_CpG.meth > \
/work3/USERNAME/Output/imr90.chrX_CpG.meth.for.methylKit
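The awk conversion can be spelled out field by field. The following is a minimal Python sketch that mirrors the command above, assuming tab-separated input with the six methcounts columns (chromosome, position, strand, context, meth ratio, read count); the function name is hypothetical.

```python
def to_methylkit(line):
    """Convert one methcounts line to a methylKit line, or None if uncovered."""
    chrom, pos, strand, context, ratio, count = line.rstrip("\n").split("\t")
    coverage = int(count)
    if coverage <= 0:
        return None  # same as the awk condition $6>0
    freq_c = int(float(ratio) * 100)  # percent methylated, truncated like awk int()
    freq_t = 100 - freq_c
    site_id = f"{chrom}.{pos}"
    # The strand is written as "F" for every site, as in the awk command
    return "\t".join([site_id, chrom, pos, "F", str(coverage), str(freq_c), str(freq_t)])


# Hypothetical input line for illustration
converted = to_methylkit("chrX\t2709681\t+\tCpG\t0.75\t8")
print(converted)
```

Because int() truncates, a ratio can occasionally land one percentage point low through floating-point error; rounding instead would avoid that but would depart from the command shown on the slide.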
Read	Methylation	Files	into	methylKit Objects
> library(methylKit)
# load methylation files (change to your datasets)
> file.list=list(
system.file("extdata", "test1.myCpG.txt", package = "methylKit"),
system.file("extdata", "test2.myCpG.txt", package = "methylKit"),
system.file("extdata", "control1.myCpG.txt", package = "methylKit"),
system.file("extdata", "control2.myCpG.txt", package = "methylKit") )
# read the files to a methylRawList object: myobj
> myobj=read( file.list, sample.id=list("test1", "test2","ctrl1","ctrl2"),
assembly="hg18",treatment=c(1,1,0,0))
> head(myobj)
50
Get	descriptive	stats	on	methylation
> png("test1.png",width=600,height=600)
> getMethylationStats(myobj[[1]],plot=T,both.strands=F)
> dev.off()
null device 1
> png("control1.png",width=600,height=600)
> getMethylationStats(myobj[[3]],plot=T,both.strands=F)
> dev.off()
null device 1
51
Sample	Correlation
> png("correlation.png",width=1000,height=1000)
# meth is the merged table of bases covered in all samples, created with meth=unite(myobj)
> getCorrelation(meth, plot = T)
test1 test2 ctrl1 ctrl2
test1 1.0000000 0.9252530 0.8767865 0.8737509
test2 0.9252530 1.0000000 0.8791864 0.8801669
ctrl1 0.8767865 0.8791864 1.0000000 0.9465369
ctrl2 0.8737509 0.8801669 0.9465369 1.0000000
> dev.off()
52
Get	bases	covered	by	all	samples	and	cluster	
samples
# merge all samples to one table by using base-pair locations that are covered in all samples
> meth=unite(myobj)
# cluster all samples using correlation distance and plot hierarchical clustering
> png("cluster.png", width=600, height=600)
> hc = clusterSamples(meth, dist="correlation", method="ward", plot=T)
> dev.off()
> png("pca.png", width=600,height=600)
> PCASamples(meth)
> dev.off()
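The correlation distance used for clustering can be illustrated outside R. This is a minimal Python sketch, assuming the common definition distance = 1 - Pearson correlation; whether methylKit uses exactly this definition internally is an assumption, and the per-sample methylation vectors are made-up values.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def correlation_distance_matrix(samples):
    """Pairwise 1 - r distances between samples, as a nested dict."""
    names = list(samples)
    return {
        a: {b: 1 - pearson(samples[a], samples[b]) for b in names}
        for a in names
    }


# Hypothetical percent-methylation values at four shared CpG sites
samples = {
    "test1": [10, 80, 55, 30],
    "test2": [12, 78, 60, 28],
    "ctrl1": [50, 40, 20, 90],
}
distances = correlation_distance_matrix(samples)
for name, row in distances.items():
    print(name, {other: round(d, 3) for other, d in row.items()})
```

Samples with similar methylation profiles end up with distances near zero, so a hierarchical clustering on this matrix groups them first.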
53
Calculate	differential	methylation
# calculate differential methylation p-values and q-values
> myDiff=calculateDiffMeth(meth)
# get differentially methylated regions with 25% difference and qvalue < 0.01
> myDiff25p=get.methylDiff(myDiff,difference=25,qvalue=0.01)
# get differentially hypo methylated regions with 25% difference and qvalue<0.01
> myDiff25pHypo =get.methylDiff(myDiff,difference=25,qvalue=0.01,type="hypo")
# get differentially hyper methylated regions with 25% difference and qvalue<0.01
> myDiff25pHyper=get.methylDiff(myDiff,difference=25,qvalue=0.01,type="hyper")
54
Differential	methylation	events	per	chromosome
> png("meth_event.png",width=600,height=600)
> diffMethPerChr(myDiff, plot = T, qvalue.cutoff = 0.01,meth.cutoff = 25)
> dev.off()
55
Annotate	Differentially	Methylated	Bases/Regions
#	read-in	transcript	locations	to	be	used	in	annotation
>	gene.obj=read.transcript.features(system.file("extdata",	"refseq.hg18.bed.txt",	package	=	
"methylKit"))	
#	annotate	differentially	methylated	Cs	with	promoter/exon/intron	using	annotation	data	
>annotate.WithGenicParts(myDiff25p,gene.obj)
56
Annotating	Differential	Methylation	Events	around	
CpG Islands
>	cpg.obj =	read.feature.flank(system.file("extdata",	"cpgi.hg18.bed.txt",	package	=	
"methylKit"),feature.flank.name =	c("CpGi",	"shores"))
>	diffCpGann =	annotate.WithFeature.Flank(myDiff25p,cpg.obj$CpGi,	cpg.obj$shores,	
feature.name =	"CpGi",flank.name =	"shores")
57
https://www.gitbook.com/book/ycl6/methylation-sequencing-analysis/details
58
Dr. I-Hsuan Lin, NYMU
Questions?
59
