Introduction to Julia 
for Bioinformatics
Kenta Sato (佐藤建太)
@ Bioinformatics Research Unit, RIKEN ACCC
November 19, 2015
1 / 72
Topics
About Me
Julia
BioJulia
Julia Updates '15
2 / 72
About Me
Graduate student at the University of Tokyo.
About two years of experience programming in Julia.
Contributing to Julia and its ecosystem:
https://github.com/docopt/DocOpt.jl
https://github.com/bicycle1885/IntArrays.jl
https://github.com/BioJulia/IndexableBitVectors.jl
https://github.com/BioJulia/WaveletMatrices.jl
https://github.com/BioJulia/FMIndexes.jl
https://github.com/isagalaev/highlight.js (Julia support)
etc.
Core developer of BioJulia - https://github.com/BioJulia/Bio.jl
Julia Summer of Code 2015 student -
http://julialang.org/blog/2015/10/biojulia-sequence-analysis/
3 / 72
JuliaCon 2015 at MIT, Boston
https://twitter.com/acidflask/status/633349038226690048
4 / 72
Julia is ...
"Julia is a high-level, high-performance
dynamic programming language for technical
computing, with syntax that is familiar to users of
other technical computing environments. It provides
a sophisticated compiler, distributed parallel
execution, numerical accuracy, and an extensive
mathematical function library. Julia’s Base library,
largely written in Julia itself, also integrates mature,
best-of-breed open source C and Fortran libraries for
linear algebra, random number generation, signal
processing, and string processing."
Source: http://julialang.org/
5 / 72
Two-Language Problem
In technical computing, users work in easier but slower scripting languages,
while developers work in harder but faster compiled languages.
6 / 72
Two-Language Problem
Both users and developers can use a handy language without
sacrificing performance.
7 / 72
Three Virtues of the Julia Language
Simple
Fast
Dynamic
8 / 72
Simple
Syntax with least astonishment
no semicolons
no variable declarations
no argument types
Unicode support
1-based index
blocks end with end
No implicit type conversion
Quicksort in 24 lines:
quicksort(xs) = quicksort!(copy(xs))
quicksort!(xs) = quicksort!(xs, 1, endof(xs))

function quicksort!(xs, lo, hi)
    if lo < hi
        p = partition(xs, lo, hi)
        quicksort!(xs, lo, p - 1)
        quicksort!(xs, p + 1, hi)
    end
    return xs
end

function partition(xs, lo, hi)
    pivot = div(lo + hi, 2)
    pvalue = xs[pivot]
    xs[pivot], xs[hi] = xs[hi], xs[pivot]
    j = lo
    @inbounds for i in lo:hi-1
        if xs[i] ≤ pvalue
            xs[i], xs[j] = xs[j], xs[i]
            j += 1
        end
    end
    xs[j], xs[hi] = xs[hi], xs[j]
    return j
end
9 / 72
Fast
Comparable performance to compiled languages.
Source: http://julialang.org/
10 / 72
Fast
The LLVM-backed JIT compiler emits machine code at runtime.
julia> 4 >> 1  # bitwise right-shift function
2

julia> @code_native 4 >> 1
        .section        __TEXT,__text,regular,pure_instructions
Filename: int.jl
Source line: 115
        pushq   %rbp
        movq    %rsp, %rbp
        movl    $63, %ecx
        cmpq    $63, %rsi
Source line: 115
        cmovbeq %rsi, %rcx
        sarq    %cl, %rdi
        movq    %rdi, %rax
        popq    %rbp
        ret
11 / 72
Dynamic
No need to precompile your program.
hello.jl:
println("hello, world")
Output:
$ julia hello.jl
hello, world
In the REPL:
julia> include("hello.jl")
hello, world
12 / 72
Dynamic
High-level code generation at runtime (macros).
julia> x = 5
5

julia> @assert x > 0 "x should be positive"

julia> x = -2
-2

julia> @assert x > 0 "x should be positive"
ERROR: AssertionError: x should be positive

julia> macroexpand(:(@assert x > 0 "x should be positive"))
:(if x > 0
        nothing
    else
        Base.throw(Base.Main.Base.AssertionError("x should be positive"))
    end)
13 / 72
Who Created?
Jeff Bezanson, Stefan Karpinski, Viral B. Shah, and Alan Edelman
"Soon the team was building their dream language.
MIT, where Bezanson is a graduate student, became
an anchor for the project, with much of the work
being done within computer scientist and
mathematician Alan Edelman's research group. But
development of the language remained completely
distributed. 'Jeff and I didn't actually meet until we'd
been working on it for over a year, and Viral was in
India the entire time,' Karpinski says. 'So the whole
language was designed over email.'"
Source: "Out in the Open: Man Creates One Programming Language to Rule Them All" -
http://www.wired.com/2014/02/julia/
14 / 72
Why Created?
"In short, because we are greedy."
Source: "Why We Created Julia" - http://julialang.org/blog/2012/02/why-we-created-julia/
15 / 72
Why Created?
The creators wanted a language that satisfies:
the speed of C
with the dynamism of Ruby
macros like Lisp
mathematical notations like Matlab
as usable for general programming as Python
as easy for statistics as R
as natural for string processing as Perl
as powerful for linear algebra as Matlab
as good at gluing programs together as the shell
16 / 72
Batteries Included
You can start technical computing without installing lots of libraries.
Numeric types
{8, 16, 32, 64, 128}-bit {signed, unsigned} integers,
16, 32, 64-bit floating point numbers,
and arbitrary-precision numbers.
Numerical linear algebra
matrix multiplication, matrix decomposition/factorization, solvers for
systems of linear equations, and more!
sparse matrices
Random number generator
Mersenne-Twister method accelerated by SIMD
17 / 72
Batteries Included
You can start technical computing without installing lots of libraries.
Unicode support
Perl-compatible regular expressions (PCRE)
Parallel computing
Dates and times
Unit tests
Profiler
Package manager
18 / 72
Language Design
19 / 72
Literals
# Int64
42
10_000_000
# UInt8
0x1f
# Float64
3.14
6.022e23
# Bool
true
false
# UnitRange{Int64}
1:100
# ASCIIString
"ascii string"
# UTF8String
"UTF8文字列"
# Regex
r"^>[^\n]+\n[ACGTN]+"
# Array{Float64,1}
# (Vector{Float64})
[1.0, 1.1, 1.2]
# Array{Float64,2}
# (Matrix{Float64})
[1.0 1.1;
 2.0 2.2]
# Tuple{Int,Float64,ASCIIString}
(42, 3.14, "ascii string")
# Dict{ASCIIString,Int64}
Dict("one" => 1, "two" => 2)
20 / 72
Functions
All function definitions below are equivalent:
function func(x, y)
    return x + y
end

function func(x, y)
    x + y
end

func(x, y) = return x + y
func(x, y) = x + y
Force inlining:
@inline func(x, y) = x + y
Note: this simple function will be automatically inlined by the compiler.
21 / 72
Functions - Arguments
Optional arguments:
function increment(x, by=1)
    return x + by
end
increment(3)     # 4
increment(3, 2)  # 5
Keyword arguments (notice the semicolon (;) in the argument list):
function increment(x; by=1)
    return x + by
end
increment(3)        # 4
increment(3, by=2)  # 5
Variable number of arguments:
function pushback!(list, vals...)
    for val in vals
        push!(list, val)
    end
    return list
end
pushback!([])        # []
pushback!([], 1)     # [1]
pushback!([], 1, 2)  # [1,2]
22 / 72
Functions - Return Values
You can return multiple values from a function as a tuple:
function divrem64(n)
    return n >> 6, n & 0b111111
end
And you can receive returned values with multiple assignment:
julia> divrem64(1025)
(16,1)

julia> d, r = divrem64(1025)
(16,1)

julia> d
16

julia> r
1
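As a quick sanity check of the shift-and-mask trick above, the following sketch compares divrem64 with Julia's built-in divrem for a few non-negative inputs. The test values are arbitrary choices made for this illustration.

```julia
# Same definition as on the slide: shift for the quotient, mask for the remainder.
divrem64(n) = (n >> 6, n & 0b111111)

# For non-negative integers the result should agree with the built-in divrem.
for n in (0, 63, 64, 1025)
    q, r = divrem64(n)
    println(n, " -> ", (q, r), "  matches divrem: ", (q, r) == divrem(n, 64))
end
```

The equivalence only holds for non-negative inputs, which is why the slide's docstring restricts the function to them.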
23 / 72
Functions - Documentation
A documentation string can be attached to a function definition:
"""
This function computes quotient and remainder
divided by 64 for a non-negative integer.
"""
function divrem64(n)
    return n >> 6, n & 0b111111
end
In the REPL, you can read the attached documentation with the ? command:
help?> divrem64
search: divrem64 divrem

  This function computes quotient and remainder
  divided by 64 for a non-negative integer.
24 / 72
Types
Two kinds of types:
concrete types: instantiatable
abstract types: not instantiatable
25 / 72
Defining Types
Abstract type:
abstract AbstractFloat <: Real
Composite type:
# mutable
type Point
    x::Float64
    y::Float64
end
# immutable
immutable Point
    x::Float64
    y::Float64
end
Bits type:
bitstype 64 Int64 <: Signed
Type alias:
typealias UInt UInt64
Enum:
@enum Vote positive negative
26 / 72
Parametric Types
Types can take type parameters:
type Point{T}
    x::T
    y::T
end
Point: abstract type
Point{Int64}: concrete type
    subtype of Point (Point{Int64} <: Point)
    all of the members (i.e. x and y) are Int64s
type NucleotideSequence{T<:Nucleotide} <: Sequence
    data::Vector{UInt64}
    ...
end
27 / 72
Constructors
Julia automatically generates default constructors.
Point(1, 2) creates an object of Point{Int} type.
Point(1.0, 2.0) creates an object of Point{Float64} type.
Point{Float64}(1, 2) creates an object of Point{Float64} type.
Users can create custom constructors.
type Point{T}
    x::T
    y::T
end
# outer constructor
function Point(x)
    return Point(x, x)
end
p = Point(1)  # => Point{Int64}(1,1)
28 / 72
Memory Layout
Compact memory layout like C's structs
C compatible memory layout
You can pass Julia objects to C functions without copying.
This is especially important in bioinformatics
when defining data structures for efficient algorithms
when handling lots of small objects
julia> @enum Strand forward reverse both unknown

julia> immutable Exon
           chrom::Int
           start::Int
           stop::Int
           strand::Strand
       end

julia> sizeof(Exon(1, 12345, 12446, forward))
32
29 / 72
Multiple Dispatch
The combination of all argument types determines which method is called.
Single dispatch (e.g. Python)
The first argument is special and determines the method.
class Serializer:
    def write(self, val):
        if isinstance(val, int):
            # ...
        elif isinstance(val, float):
            # ...
        # ...
Multiple dispatch (e.g. Julia)
All arguments are equally responsible for determining the method.
function write(dst::Serializer,
               val::Int64)
    # ...
end
function write(dst::Serializer,
               val::Float64)
    # ...
end
# ...
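To make the contrast concrete, here is a minimal sketch in which the selected method depends on the types of both arguments. The function name describe and its return strings are hypothetical and exist only for this illustration.

```julia
# Hypothetical function `describe`, defined only for this illustration.
describe(x::Integer, y::Integer) = "two integers"
describe(x::Integer, y::AbstractFloat) = "an integer and a float"
describe(x::AbstractFloat, y::Integer) = "a float and an integer"
describe(x, y) = "something else"   # fallback, same as (x::Any, y::Any)

println(describe(1, 2))      # two integers
println(describe(1, 2.0))    # an integer and a float
println(describe(2.0, 1))    # a float and an integer
println(describe("a", 1))    # something else
```

No isinstance-style branching is needed: adding support for a new type combination means adding one more method.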
30 / 72
Multiple Dispatch - Example (1)
base/char.jl:
-(x::Char, y::Char) = Int(x) - Int(y)
-(x::Char, y::Integer) = Char(Int32(x) - Int32(y))
+(x::Char, y::Integer) = Char(Int32(x) + Int32(y))
+(x::Integer, y::Char) = y + x
julia> 'c' - 'a'
2

julia> 'c' - 1
'b'

julia> 'a' + 0x01
'b'

julia> 0x01 + 'a'
'b'
31 / 72
Multiple Dispatch - Example (2)
function has{T<:Integer}(range::UnitRange{Int}, target::T)
    return first(range) ≤ target ≤ last(range)
end
function has(iter, target)  # same as has(iter::Any, target::Any)
    for elm in iter
        if elm == target
            return true
        end
    end
    return false
end
julia> has(1:10, 4)
true

julia> has(1:10, -2)
false

julia> has([1,2,3], 2)
true
32 / 72
Metaprogramming
Julia can represent its own program code as a data structure (Expr).
Three metaprogramming components in Julia:
Macros
    generate an expression from expressions.
    Expr ↦ Expr
Generated functions
    generate an expression from types.
    Types ↦ Expr
Non-standard string literals
    generate an expression from a string.
    String ↦ Expr
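Since all three components produce Expr values, it may help to see one directly. The sketch below quotes an expression, inspects its parts, and evaluates it; the expressions themselves are arbitrary examples.

```julia
ex = :(1 + 2 * 3)        # quote an expression instead of evaluating it
println(ex.head)         # call
println(ex.args[1])      # +
println(eval(ex))        # 7

# The same kind of object can be built by hand and then evaluated.
ex2 = Expr(:call, :*, 6, 7)
println(eval(ex2))       # 42
```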
33 / 72
Metaprogramming - Macros
Generate an expression from expressions.
Expr ↦ Expr
Denoted as @<macro name>.
Distinguishable from function calls
We've already seen some macros.
macro assert(ex)
    msg = string(ex)
    :($ex ? nothing : throw(AssertionError($msg)))
end
julia> x = -1
-1

julia> @assert x > 1
ERROR: AssertionError: x > 1
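As a further sketch of the Expr ↦ Expr idea, the hypothetical macro @twice below (not part of Julia or Bio.jl) receives its argument as an unevaluated expression and returns a block that contains it twice.

```julia
# Hypothetical macro for illustration: evaluates its argument expression twice.
macro twice(ex)
    quote
        $(esc(ex))   # esc keeps the expression in the caller's scope
        $(esc(ex))
    end
end

hits = Int[]
@twice push!(hits, 1)
println(length(hits))   # 2
```

A function could not do this, because a function receives the value of push!(hits, 1) after it has already run once, not the expression itself.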
34 / 72
Metaprogramming - Useful Macros (1)
@show: print variables, useful for debugging:
julia> x = -1
-1

julia> @show x
x = -1
@inbounds: skip bounds checking:
@inbounds h[i,j] = h[i-1,j-1] + submat[a[i],b[j]]
@which: show which method will be called:
julia> @which max(1, 2)
max{T<:Real}(x::T<:Real, y::T<:Real) at promotion.jl:239
35 / 72
Metaprogramming - Useful Macros (2)
@time: measure elapsed time to evaluate the expression:
julia> xs = rand(1_000_000);

julia> @time sum(xs)
  0.022633 seconds (27.24 k allocations: 1.155 MB)
499795.2805424741

julia> @time sum(xs)
  0.000574 seconds (5 allocations: 176 bytes)
499795.2805424741
@profile: profile the expression:
julia> sort(xs); @profile sort(xs);

julia> Profile.print()
69 REPL.jl; anonymous; line: 92
 68 REPL.jl; eval_user_input; line: 62
...
36 / 72
Generated Functions
Generate specialized program code for the argument types.
Type(s) ↦ Expr
Called like an ordinary function:
the syntax at the call site is indistinguishable.
@generated function _sub2ind{N,M}(dims::NTuple{N,Integer},
                                  subs::NTuple{M,Integer})
    meta = Expr(:meta, :inline)
    ex = :(subs[$M] - 1)
    for i = M-1:-1:1
        if i > N
            ex = :(subs[$i] - 1 + $ex)
        else
            ex = :(subs[$i] - 1 + dims[$i] * $ex)
        end
    end
    Expr(:block, meta, :($ex + 1))
end
37 / 72
Non-standard String Literals
Generate an expression from a string.
String ↦ Expr
Denoted as <literal name>"..."
The regular expression literal (e.g. r"^>[^\n]+\n[ACGTN]+") is an
example.
In Bio.jl, dna"ACGT" is converted to a DNASequence object.
macro r_str(s)
    Regex(s)
end
# Regex object
r"^>[^\n]+\n[ACGTN]+"
# DNASequence object
dna"ACGT"
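A literal of this kind is just a macro whose name ends in _str. The sketch below defines a hypothetical nt"..." literal (the name and behaviour are assumptions for illustration and are not Bio.jl's dna"...") that validates its content when the macro is expanded.

```julia
# Hypothetical literal name `nt`; not part of Bio.jl.
macro nt_str(s)
    for c in s
        c in "ACGTN" || error("invalid nucleotide '$c' in nt\"$s\"")
    end
    return s   # a plain String is spliced into the calling code
end

seq = nt"ACGTN"
println(seq, " has ", length(seq), " nucleotides")
```

Because the check runs at macro-expansion time, an invalid literal such as nt"ACGX" is rejected before the surrounding code runs.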
38 / 72
Modules
Modules are namespaces.
Names defined at the top level of a module are global names.
The import/export system lets modules share names.
module Foo
export foo, gvar
# function
foo() = println("hello, foo")
bar() = println("hello, bar")
# global variable
const gvar = 42
end
Foo.foo()
Foo.bar()
Foo.gvar
import Foo: foo
foo()
import Foo: bar
bar()
using Foo
foo()
gvar
39 / 72
Packages
A package manager is bundled with Julia.
No other package manager; this is the standard.
The package manager can build, install, and create packages.
Almost all packages are hosted on GitHub.
Registered packages
Registered packages are public packages that can be installed by
name.
List: http://pkg.julialang.org/
Repository: https://github.com/JuliaLang/METADATA.jl
40 / 72
Packages - Management
The package manager is accessible from the REPL.
Pkg.update(): update registered package data and upgrade
packages
The way to install a package depends on whether the package is
registered or not.
Pkg.add(<package>): install a registered package
Pkg.clone(<url>): install a package from the git URL
julia> Pkg.update()
julia> Pkg.add("DocOpt")
julia> Pkg.clone("git@github.com:docopt/DocOpt.jl.git")
41 / 72
Packages - Create a Package
A package template can be generated with Pkg.generate(<package>).
This generates a disciplined scaffold to develop a new package.
Generated packages will be located in ~/.julia/v0.4/.
Pkg.tag(<package>, <version>) tags the version to the current
commit of the package.
This tag is considered a release of the package.
Developers should follow Semantic Versioning.
major: incompatible API changes
minor: backwards-compatible functionality addition
patch: backwards-compatible bug fixes
julia> Pkg.generate("DocOpt")
julia> Pkg.tag("DocOpt", :patch)  # patch update
42 / 72
BioJulia
43 / 72
BioJulia
Collaborative project to build bioinformatics infrastructure for Julia.
Packages:
Bio.jl - https://github.com/BioJulia/Bio.jl
Other packages - https://github.com/BioJulia
44 / 72
BioJulia - Basic Principles
BioJulia will be fast.
All contributions undergo code review.
We'll design it to suit modern bioinformatics and Julia, not just copy
other Bio-projects.
https://github.com/BioJulia/Bio.jl/wiki/roadmap
45 / 72
Bio.jl
Major modules:
Bio.Seq: biological sequences
Bio.Intervals: genomic intervals
Bio.Align: sequence alignments (coming soon!)
Bio.Phylo: phylogenetics (coming soon!)
Under (active!) development.
46 / 72
Sequences
Sequence types are defined in the Bio.Seq module:
DNASequence, RNASequence, AminoAcidSequence, Kmer
julia> using Bio.Seq

julia> dna"ACGTN"  # non-standard string literal
5nt DNA Sequence
 ACGTN

julia> rna"ACGUN"
5nt RNA Sequence
 ACGUN

julia> aa"ARNDCWYV"
8aa Sequence:
 ARNDCWYV

julia> kmer(dna"ACGT")
DNA 4-mer:
 ACGT
48 / 72
Sequences - Packed Nucleotides
A/C/G/T are packed into an array with 2-bit encoding (+1 bit for N).
type NucleotideSequence{T<:Nucleotide} <: Sequence
    data::Vector{UInt64}  # 2-bit encoded sequence
    ns::BitVector         # 'N' mask
    ...
end
In Kmer, nucleotides are packed into a 64-bit type.
bitstype 64 Kmer{T<:Nucleotide,K}
typealias DNAKmer{K} Kmer{DNANucleotide,K}
typealias RNAKmer{K} Kmer{RNANucleotide,K}
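The idea of 2-bit packing can be sketched in a few lines of plain Julia. The code table (A=00, C=01, G=10, T=11) and the function names pack2bit and unpack2bit are assumptions made for this illustration; the actual Bio.jl encoding may differ.

```julia
# Assumed code table (A=00, C=01, G=10, T=11); the real Bio.jl encoding may differ.
function pack2bit(seq::AbstractString)
    length(seq) <= 32 || error("at most 32 nucleotides fit into 64 bits")
    x = UInt64(0)
    for c in seq
        code = c == 'A' ? 0x00 :
               c == 'C' ? 0x01 :
               c == 'G' ? 0x02 :
               c == 'T' ? 0x03 : error("unsupported nucleotide: $c")
        x = (x << 2) | code      # append two bits per nucleotide
    end
    return x
end

function unpack2bit(x::UInt64, k::Integer)
    letters = ('A', 'C', 'G', 'T')
    chars = Char[]
    for i in (k - 1):-1:0        # most significant pair first
        code = (x >> (2 * i)) & 0x03
        push!(chars, letters[Int(code) + 1])
    end
    return join(chars)
end

packed = pack2bit("ACGT")
println(packed)                  # 27, i.e. 0b00011011
println(unpack2bit(packed, 4))   # ACGT
```

Two bits per nucleotide is why a 64-bit Kmer can hold at most 32 nucleotides, and why N needs a separate mask in NucleotideSequence.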
49 / 72
Sequences - Immutable by Convention
Sequences are immutable by convention.
No copy when creating a subsequence from an existing sequence.
julia> seq = dna"ACGTATG"
7nt DNA Sequence
 ACGTATG

julia> seq[2:4]
3nt DNA Sequence
 CGT

# internal data is shared between
# the original and its subsequences
julia> seq.data === seq[2:4].data
true
50 / 72
Intervals
Genomic interval types are defined in the Bio.Intervals module:
Interval{T}: T is the type of metadata attached to the interval.
type Interval{T} <: AbstractInterval{Int64}
    seqname::StringField
    first::Int64
    last::Int64
    strand::Strand
    metadata::T
end
This is useful when annotating a genomic range:
julia> using Bio.Intervals

julia> Interval("chr2", 5692667, 5701385, '+', "SOX11")
chr2:5692667-5701385    +    SOX11
51 / 72
Intervals - Indexed Collections
A set of intervals can be indexed by IntervalCollection:
immutable CDS; gene::ASCIIString; index::Int; end
ivals = IntervalCollection{CDS}()
push!(ivals, Interval("chr6", 156777930, 156779471, '+',
                      CDS("ARID1B", 1)))
push!(ivals, Interval("chr6", 156829227, 156829421, '+',
                      CDS("ARID1B", 2)))
push!(ivals, Interval("chr6", 156901376, 156901525, '+',
                      CDS("ARID1B", 3)))
intersect iterates over intersecting intervals:
julia> query = Interval("chr6", 156829200, 156829300);

julia> for i in intersect(ivals, query)
           println(i)
       end
chr6:156829227-156829421    +    CDS("ARID1B",2)
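The overlap test behind this result can be sketched without Bio.jl. The code below assumes closed intervals on the same sequence, represented as plain (first, last) tuples; the function name overlaps is hypothetical, and the linear scan is only for illustration, since the point of an indexed collection is to avoid checking every interval.

```julia
# Closed intervals given as (first, last) tuples on the same sequence.
overlaps(a, b) = a[1] <= b[2] && b[1] <= a[2]

query = (156829200, 156829300)
exons = [(156777930, 156779471), (156829227, 156829421), (156901376, 156901525)]
for (i, ex) in enumerate(exons)
    println("CDS ", i, ": ", overlaps(ex, query))
end
```

Only the second CDS overlaps the query, matching the single interval printed above.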
52 / 72
Parsers
Parsers are generated by the Ragel state machine compiler.
Finite state machines are described with a regular language.
The Ragel compiler generates pure Julia programs.
Actions can be injected into the state transitions.
The next Ragel release (v7) will be shipped with the Julia generator.
http://www.colm.net/open-source/ragel/
53 / 72
Parsers - FASTA
<name> = <expression> ><entering action> %<leaving action>;
FASTA parser:
newline     = '\r'? '\n'                >count_line;
hspace      = [ \t\v];
whitespace  = space | newline;
identifier  = (any - space)+            >mark %identifier;
description = ((any - hspace) [^\r\n]*) >mark %description;
letters     = (any - space - '>')+      >mark %letters;
sequence    = whitespace* letters? (whitespace+ letters)*;
fasta_entry = '>' identifier (hspace+ description)? newline
              sequence whitespace*;
main := whitespace* (fasta_entry %finish_match)**;
https://github.com/BioJulia/Bio.jl/blob/master/src/seq/fasta.rl
https://github.com/BioJulia/Bio.jl/blob/master/src/seq/fasta.jl
54 / 72
Parsers - Fast
Ragel can generate fast parsers.
julia> @time for rec in open("hg38.fa", FASTA)
           println(rec)
       end
>chr1
248956422nt Mutable DNA Sequence
 NNNNNNNNNNNNNNNNNNNNNNN…NNNNNNNNNNNNNNNNNNNNNNNN
>chr10
133797422nt Mutable DNA Sequence
 NNNNNNNNNNNNNNNNNNNNNNN…NNNNNNNNNNNNNNNNNNNNNNNN
# ...
>chrY_KI270740v1_random
37240nt Mutable DNA Sequence
 TAATAAATTTTGAAGAAAATGAA…GAATGAAGCTGCAGACATTTACGG
 32.198314 seconds (174.92 k allocations: 1.464 GB, 1.14% gc time)
55 / 72
Alignments
The Bio.Align module supports various pairwise alignment types.
Score maximization:
GlobalAlignment
SemiGlobalAlignment
OverlapAlignment
LocalAlignment
Cost minimization:
EditDistance
LevenshteinDistance
HammingDistance
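To illustrate the cost-minimization family, here is a minimal dynamic-programming sketch of the classic unit-cost edit distance, which is presumably what LevenshteinDistance refers to. It is written from the textbook recurrence and is not Bio.jl's implementation; the function name levenshtein is hypothetical, and ASCII input is assumed so that strings can be indexed by position.

```julia
# Textbook Levenshtein distance with two rolling rows (ASCII strings assumed).
function levenshtein(a::AbstractString, b::AbstractString)
    m, n = length(a), length(b)
    prev = collect(0:n)                 # distances from the empty prefix of a
    for i in 1:m
        curr = [i; zeros(Int, n)]       # curr[1] = cost of deleting i characters
        for j in 1:n
            cost = a[i] == b[j] ? 0 : 1
            curr[j + 1] = min(prev[j + 1] + 1,   # deletion
                              curr[j] + 1,       # insertion
                              prev[j] + cost)    # match or substitution
        end
        prev = curr
    end
    return prev[n + 1]
end

println(levenshtein("kitten", "sitting"))   # 3
```

Score-maximizing alignments such as GlobalAlignment fill a table of the same shape, with max in place of min and a score model in place of unit costs.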
56 / 72
Alignments - Simple Interfaces (1)
julia> affinegap = AffineGapScoreModel(match=5,
                                       mismatch=-4,
                                       gap_open=-3,
                                       gap_extend=-2);

julia> pairalign(GlobalAlignment(),
                 dna"ATGGTGACT",
                 dna"ACGTGCCCT",
                 affinegap)
PairwiseAlignment{Int64,Bio.Seq.NucleotideSequence{Bio.Seq.DNANucleotide},B
  score: 12
  seq: ATGGTGAC-T
       | | || | |
  ref: ACG-TGCCCT
57 / 72
Alignments - Simple Interfaces (2)
pairalign(<type>, <seq1>, <seq2>, <score/cost model>)
pairalign(GlobalAlignment(), a, b, model)
pairalign(SemiGlobalAlignment(), a, b, model)
pairalign(OverlapAlignment(), a, b, model)
pairalign(LocalAlignment(), a, b, model)
pairalign(EditDistance(), a, b, model)
pairalign(LevenshteinDistance(), a, b)
pairalign(HammingDistance(), a, b)
Alignment options:
pairalign(GlobalAlignment(), a, b, model, banded=true)
pairalign(GlobalAlignment(), a, b, model, score_only=true)
58 / 72
Alignments - Speed (1)
Global alignment of titin sequences (human and mouse):
affinegap = AffineGapScoreModel(BLOSUM62, -10, -1)
a = first(open("Q8WZ42.fasta", FASTA)).seq
b = first(open("A2ASS6.fasta", FASTA)).seq
@time aln = pairalign(
    GlobalAlignment(),
    Vector{AminoAcid}(a),
    Vector{AminoAcid}(b),
    affinegap,
)
println(score(aln))
  8.012499 seconds (601.99 k allocations: 1.155 GB, 0.09% gc time)
165611
vs. R (Biostrings):
   user  system elapsed
 14.042   1.233  15.475
59 / 72
Alignments - Speed (2)
R (Biostrings) script used for the comparison:
library(Biostrings, quietly=T)
a = readAAStringSet("Q8WZ42.fasta")[[1]]
b = readAAStringSet("A2ASS6.fasta")[[1]]
t0 = proc.time()
aln = pairwiseAlignment(a, b, type="global",
                        substitutionMatrix="BLOSUM62",
                        gapOpening=10, gapExtension=1)
t1 = proc.time()
print(t1 - t0)
print(score(aln))
60 / 72
Indexable Bit Vectors
Bit vectors that support bit counting in constant time.
rank1(bv, i): Count the number of 1 bits within bv[1:i].
rank0(bv, i): Count the number of 0 bits within bv[1:i].
A fundamental data structure when defining other data structures.
WaveletMatrix, a generalization of the indexable bit vector,
depends on this data structure.
'N' nucleotides in a reference sequence can be compressed
using this data structure.
julia> bv = SucVector(bitrand(10_000_000));

julia> rank1(bv, 9_000_000);  # precompile

julia> @time rank1(bv, 9_000_000)
  0.000006 seconds (149 allocations: 10.167 KB)
4502258
61 / 72
Indexable Bit Vectors - Internals
A bit vector is divided into 256-bit large blocks and each large block is
divided into 64-bit small blocks:
immutable Block
    # large block
    large::UInt32
    # small blocks
    smalls::NTuple{4,UInt8}
    # bit chunks (64 bits × 4 = 256 bits)
    chunks::NTuple{4,UInt64}
end
Each block has a cache that counts the number of 1s.
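The caching idea can be sketched with a deliberately simplified toy: one level of blocks over a plain Vector{Bool} instead of the two-level layout above. All names (build_cache, toy_rank1, toy_rank0) and the block size are assumptions for illustration and do not come from IndexableBitVectors.jl.

```julia
# Hypothetical toy structure: cache[k + 1] holds the number of 1s in bits[1:k*blocksize].
function build_cache(bits::AbstractVector{Bool}, blocksize::Int)
    cache = Int[0]
    total = 0
    for (i, b) in enumerate(bits)
        total += b
        i % blocksize == 0 && push!(cache, total)
    end
    return cache
end

# rank1: look up the cached count for the complete blocks, then scan the remainder.
function toy_rank1(bits, cache, blocksize, i)
    k = div(i, blocksize)
    r = cache[k + 1]
    for j in (k * blocksize + 1):i
        r += bits[j]
    end
    return r
end

toy_rank0(bits, cache, blocksize, i) = i - toy_rank1(bits, cache, blocksize, i)

bits = [true, false, true, true, false, true, false, false, true, true]
cache = build_cache(bits, 4)
println(cache)                         # [0, 3, 4]
println(toy_rank1(bits, cache, 4, 6))  # 4
println(toy_rank0(bits, cache, 4, 6))  # 2
```

The remainder scan touches fewer than blocksize bits, so the query cost does not grow with the length of the vector; a real implementation would replace the scan with bit-count instructions on the 64-bit chunks.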
62 / 72
FM-Indexes
Index for full-text search.
Fast, compact, and often used in short-read sequence mappers
(Bowtie2, BWA, etc.).
Product of Julia Summer of Code 2015
https://github.com/BioJulia/FMIndexes.jl
This package is not specialized for biological sequences.
FMIndexes.jl does not depend on Bio.jl.
JIT compiler can optimize code for a specific type at runtime.
julia> fmindex = FMIndex(dna"ACGTATTGACTGTA");

julia> count(dna"TA", fmindex)
2

julia> count(dna"TATT", fmindex)
1
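For readers new to FM-indexes, the counting query can be sketched with a toy backward search over a Burrows-Wheeler transform. This is a textbook-style illustration with hypothetical names (toy_bwt, toy_count); it builds the suffix array naively and recounts characters on every step, whereas a real index such as FMIndexes.jl would rely on compact rank structures instead.

```julia
# Toy Burrows-Wheeler transform via a naively sorted suffix array (ASCII text only).
function toy_bwt(text::AbstractString)
    t = string(text, '$')                        # sentinel sorts before A, C, G, T
    n = length(t)
    sa = sort(collect(1:n), by = i -> t[i:end])  # suffix start positions in sorted order
    return [t[i == 1 ? n : i - 1] for i in sa]   # character preceding each suffix
end

# Backward search: process the pattern from its last character to its first,
# narrowing the range [lo, hi] of sorted suffixes that start with the matched part.
function toy_count(pattern::AbstractString, bwt::Vector{Char})
    lo, hi = 1, length(bwt)
    for c in reverse(pattern)
        smaller = count(x -> x < c, bwt)            # characters in the text smaller than c
        occ_lo = count(x -> x == c, bwt[1:lo-1])    # occurrences of c in bwt[1:lo-1]
        occ_hi = count(x -> x == c, bwt[1:hi])      # occurrences of c in bwt[1:hi]
        lo = smaller + occ_lo + 1
        hi = smaller + occ_hi
        lo > hi && return 0
    end
    return hi - lo + 1
end

bwt = toy_bwt("ACGTATTGACTGTA")
println(toy_count("TA", bwt))     # 2
println(toy_count("TATT", bwt))   # 1
```

The two occurrence counts per step are rank queries, which is where an indexable bit vector or wavelet matrix comes in.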
63 / 72
FM-Indexes - Queries
Create an FM-Index for chromosome 22:
julia> fmindex = FMIndex(first(open("chr22.fa", FASTA)).seq);
count(pattern, index): count the number of occurrences of pattern:
julia> count(dna"ACGT", fmindex)
37672

julia> count(dna"ACGTACGT", fmindex)
42
64 / 72
FM-Indexes - Queries
Create an FM-Index for chromosome 22:
julia> fmindex = FMIndex(first(open("chr22.fa", FASTA)).seq);
locate(pattern, index): locate positions of pattern:
# locate returns an iterator
julia> locate(dna"ACGTACGT", fmindex) |> collect
42-element Array{Any,1}:
 20774876
 ⋮
 22729149

# locateall returns an array
julia> locateall(dna"ACGTACGT", fmindex)
42-element Array{Int64,1}:
 20774876
 ⋮
 22729149
65 / 72
Other Julia Orgs You Should Know
Statistics - JuliaStats https://github.com/JuliaStats
https://github.com/JuliaStats/StatsBase.jl
https://github.com/JuliaStats/DataFrames.jl
https://github.com/JuliaStats/Clustering.jl
https://github.com/JuliaStats/Distributions.jl
https://github.com/JuliaStats/MultivariateStats.jl
https://github.com/JuliaStats/NullableArrays.jl
https://github.com/JuliaStats/GLM.jl
66 / 72
Other Julia Orgs You Should Know
Optimization - JuliaOpt https://github.com/JuliaOpt
https://github.com/JuliaOpt/JuMP.jl
https://github.com/JuliaOpt/Optim.jl
https://github.com/JuliaOpt/Convex.jl
Graphs - JuliaGraphs https://github.com/JuliaGraphs
https://github.com/JuliaGraphs/LightGraphs.jl
Database - JuliaDB https://github.com/JuliaDB
https://github.com/JuliaDB/SQLite.jl
https://github.com/JuliaDB/PostgreSQL.jl
67 / 72
Julia Updates '15
68 / 72
Julia Updates '15
Julia Computing Inc. was founded.
"Why the creators of the Julia programming language just launched a startup" -
http://venturebeat.com/2015/05/18/why-the-creators-of-the-julia-programming-language-just-launched-a-startup/
The Moore Foundation granted Julia Computing $600,000.
"Bringing Julia from beta to 1.0 to support data-intensive, scientific computing" -
https://www.moore.org/newsroom/in-the-news/2015/11/10/bringing-julia-from-beta-to-1.0-to-support-data-intensive-scientific-computing
Multi-threading support
https://github.com/JuliaLang/julia/pull/13410
Intel released ParallelAccelerator.jl
https://github.com/IntelLabs/ParallelAccelerator.jl
72 / 72