Sequence Analysis and Modern C++ Hauswedell


Computational Biology
Hannes Hauswedell
Sequence
Analysis and
Modern C++
The Creation of the SeqAn3
Bioinformatics Library

Advisory Editors:
Gordon Crippen, University of Michigan, Ann Arbor, MI, USA
Joseph Felsenstein, University of Washington, Seattle, WA, USA
Dan Gusfield, University of California, Davis, CA, USA
Sorin Istrail, Brown University, Providence, RI, USA
Thomas Lengauer, Max Planck Institute for Computer Science, Saarbrücken,
Germany
Marcella McClure, Montana State University, Bozeman, MT, USA
Martin Nowak, Harvard University, Cambridge, MA, USA
David Sankoff, University of Ottawa, Ottawa, ON, Canada
Ron Shamir, Tel Aviv University, Tel Aviv, Israel
Mike Steel, University of Canterbury, Christchurch, New Zealand
Gary Stormo, Washington University in St. Louis, St. Louis, MO, USA
Simon Tavaré, University of Cambridge, Cambridge, UK
Tandy Warnow, University of Illinois at Urbana-Champaign, Urbana, IL, USA
Lonnie Welch, Ohio University, Athens, OH, USA
Editors-in-Chief:
Andreas Dress, CAS-MPG Partner Institute for Computational Biology, Shanghai,
China
Michal Linial, Hebrew University of Jerusalem, Jerusalem, Israel
Olga Troyanskaya, Princeton University, Princeton, NJ, USA
Martin Vingron, Max Planck Institute for Molecular Genetics, Berlin, Germany
Editorial Board Members:
Robert Giegerich, University of Bielefeld, Bielefeld, Germany
Janet Kelso, Max Planck Institute for Evolutionary Anthropology, Leipzig,
Germany
Gene Myers, Max Planck Institute of Molecular Cell Biology and Genetics,
Dresden, Germany
Pavel Pevzner, University of California, San Diego, CA, USA

Endorsed by the International Society for Computational Biology, the Computa-
tional Biology series publishes the very latest, high-quality research devoted to
specific issues in computer-assisted analysis of biological data. The main emphasis
is on current scientific developments and innovative techniques in computational
biology (bioinformatics), bringing to light methods from mathematics, statistics
and computer science that directly address biological problems currently under
investigation.
The series offers publications that present the state-of-the-art regarding the problems
in question; show computational biology/bioinformatics methods at work; and
finally discuss anticipated demands regarding developments in future methodology.
Titles can range from focused monographs, to undergraduate and graduate text-
books, and professional text/reference works.
More information about this series at https://link.springer.com/bookseries/5769


Hannes Hauswedell
Reykjavik
Iceland
ISSN 1568-2684 ISSN 2662-2432 (electronic)
ISBN 978-3-030-90989-5 ISBN 978-3-030-90990-1 (eBook)
https://doi.org/10.1007/978-3-030-90990-1
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland
AG 2022
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse
of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and
transmission or information storage and retrieval, electronic adaptation, computer software, or by similar
or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Preface
This is a book about software engineering, bioinformatics, the C++ programming
language and the SeqAn library. In the broadest sense, it will help the reader
create better, faster and more reliable software by deepening their understanding
of available tools, language features, techniques and design patterns.
Every developer who has previously worked with C++ will enjoy the in-depth chapter
on important changes in the language from C++11 up to and including C++20.
In contrast to many resources on Modern C++ that present new features only in
small isolated examples, this book represents a more holistic approach: readers will
understand the relevance of new features and how they interact in the context of
a large software project and not just within a “toy example”. Previous experience
in creating software with C++ is highly recommended to fully appreciate these
aspects.
SeqAn3 is a new, re-designed software library. The conception and implementa-
tion process is detailed in this book, including a critical reflection on the previous
versions of the library. This is particularly helpful to readers who are about to create
a large software project themselves, or who are planning a major overhaul of an
existing library or framework. While the focus of the book is clearly on software
development and design, it also touches on various organisational and administrative
aspects like licensing, dependency management and quality control.
The field that SeqAn3 provides solutions for is sequence analysis or, in a broader
sense, bioinformatics. Readers working in this domain will recognise many of the
discussed problems. However, almost all content is useful to software engineers in
general and research software engineers in particular; no background in biology or
previous experience with the SeqAn library is required.
This book is based on a dissertation, so the general style is more reminiscent
of a “story” than might be typical for a computer science book. Some readers will
enjoy reading it cover to cover while others will want to jump to sections of interest
directly. The original preface of the dissertation is given on the following page as
the acknowledgements section. In addition to the persons mentioned there, I would
like to thank Martin Vingron who was part of my defence committee and suggested
this book project. I would also like to thank Susan Evans and the team at Springer
Nature for helping it become reality.
Reykjavik, Iceland Hannes Hauswedell

Acknowledgements
The SeqAn library is a very active project with a long history. Over more than
10 years, it has had different core developers and many people who contributed
features and fixes. Although SeqAn3 contains almost no code from SeqAn1/2,
the experience of working on and with previous versions was invaluable in the
development of SeqAn3. I feel that it is therefore only proper to mention Andreas
Gogol-Döring, David Weese, Enrico Siragusa and Manuel Holtgrewe at this point,
all of whom contributed significantly to SeqAn1/2. Of course, Knut Reinert has
always guided the project and leads it to this day. His experience is the main
pillar of its continued success.
This thesis introduces a new and radically different version of the SeqAn library.
The scope of this project is huge, and it certainly would not have been possible to
create the library single-handedly in this time. I do, however, credit myself with
its inception, the vision behind the project and the endurance to pursue a complete
rewrite of the library when most people called it infeasible. The design process, the
overarching goals and the technical decisions are overwhelmingly my work—that is
the foundation of this thesis. On the practical side, I have also written and changed
more code than the next most important contributors combined, but I want to state
clearly that relevant parts of SeqAn3 have also been implemented by people other
than myself.
René Rahn has shared the responsibility of leading the project with me on a social
and administrative level. Since the early beginnings of SeqAn3, I relied strongly on
his counsel. Later, we assembled the SeqAn core team to discuss design and strategy
matters on a regular basis. This included Svenja Mehringer, Marcel Ehrhardt and
Enrico Seiler. All members of the core team have left their mark in some way on the
library, and I am confident that SeqAn3 is in good hands after I leave the project.
I would like to thank everyone who contributed to SeqAn3, but more generally
I want to also thank everyone for the great time at Freie Universität and the
unforgettable SeqAn retreats! Special thanks go to Sara Hetzel and Felix Heeger
who provided very helpful comments on a draft of this dissertation. Sara will also
continue work on Lambda, an application presented later in this thesis.
On a professional and personal level, my sincere gratitude goes to Knut Reinert
who has been my mentor now for so many years. None of this would have been
possible without him. I would also like to express my sincere gratitude to Stefan
Kurtz who agreed to co-supervise this (quite comprehensive) thesis although we
had not worked together previously.
Attending the meetings of and contributing to the ISO C++ committee has had the
most profound influence on my understanding of C++ and has thus helped greatly
with creating SeqAn3. I would like to thank Fabio Fracassi and Nico Josuttis from
the DIN Arbeitskreis Programmiersprachen as well as Corentin Jabot and JeanHeyd
Meneide for helping me find my way around WG21.
Before working at Freie Universität, my studies were funded through a stipend
from the Max-Planck-Gesellschaft. I additionally received a fellowship from the
Hans-Böckler-Stiftung, which allowed me to attend various extracurricular activities, for
which I am very grateful.
Finally, I would like to thank my parents for supporting me during my youth and
my early university studies. I am privileged to have had access to computers as a
child and to grow up in an environment that fostered my curiosity in science and
technology. I am grateful for the support of my friends and especially Romy and
Betti. I look forward to spending more time with everyone again!

Contents
Part I Background
1 Sequence Analysis
2 The SeqAn Library (Versions 1 and 2)
2.1 History
2.2 Design Goals
2.3 Programming Techniques
2.3.1 Generic Programming
2.3.2 Template Subclassing
2.3.3 Global Function Interfaces
2.3.4 Metafunctions
2.4 Discussion
2.4.1 Performance
2.4.2 Simplicity
2.4.3 Generality, Refineability and Extensibility
2.4.4 Integration
2.4.5 Summary
3 Modern C++
3.1 Type Deduction
3.1.1 The auto Specifier
3.1.2 Class Template Argument Deduction (CTAD)
3.2 Move Semantics and Perfect Forwarding
3.2.1 Move Semantics
3.2.2 Reference Types and Perfect Forwarding
3.2.3 Out-Parameters and Returning by Value
3.3 Metaprogramming and Compile-Time Computations
3.3.1 Metafunctions and Type Traits
3.3.2 Traits Classes
3.3.3 Compile-Time Computations
3.3.4 Conditional Instantiation
3.3.5 Standard Library Traits
3.4 C++ Concepts
3.4.1 Introduction
3.4.2 Defining Concepts
3.4.3 Using Concepts
3.4.4 Concepts-Based Polymorphism
3.4.5 Standard Library Concepts
3.5 Code Reuse
3.5.1 The Curiously Recurring Template Pattern (CRTP)
3.5.2 Metaclasses
3.6 C++ Ranges
3.6.1 Introduction
3.6.2 Range Traits and Concepts
3.6.3 The View Concept
3.6.4 Range Adaptor Objects
3.6.5 Standard Library Views
3.7 Customisation Points
3.7.1 Excursus: Calling Conventions
3.7.2 Introduction
3.7.3 “Niebloids”
3.7.4 Future Standardisation
3.8 Concurrency & Parallelism
3.9 C++ Modules
3.10 Utility Types
3.11 Discussion
Part II SeqAn3
4 The Design of SeqAn3
4.1 Design Goals
4.1.1 Performance
4.1.2 Simplicity
4.1.3 Integration
4.1.4 Adaptability
4.1.5 Compactness
4.2 Programming Techniques
4.2.1 Modern C++
4.2.2 Programming Paradigms
4.2.3 Polymorphism and Customisation
4.2.4 Aspects of Object-Orientation
4.2.5 Ranges and Views
4.2.6 “Natural” Function Interfaces
4.2.7 constexpr if Possible
4.3 Administrative Aspects
4.3.1 Header-Only Library
4.3.2 Licence
4.3.3 Platform Support
4.3.4 Stability
4.3.5 Availability
4.3.6 Combining SeqAn2 and SeqAn3
4.4 Dependencies and Tooling
4.4.1 Library Dependencies
4.4.2 Documentation
4.4.3 Testing
4.5 Project Management and Social Aspects
5 Library Structure and Small Modules
5.1 Library Structure
5.1.1 Files and Directories
5.1.2 Modules and Submodules
5.1.3 Names and Namespaces
5.2 “Small” Modules
5.2.1 Argument Parser
5.2.2 The Core Module
5.2.3 The Utility Module
5.2.4 The STD Module
5.2.5 The Contrib Module
5.3 Discussion
5.3.1 Performance
5.3.2 Simplicity
5.3.3 Integration
5.3.4 Adaptability
5.3.5 Compactness
6 The Alphabet Module
6.1 General Design
6.1.1 Character and Rank Representation
6.1.2 Function Objects and Traits
6.1.3 Concepts
6.2 User-Defined Alphabets and Adaptations
6.2.1 User-Defined Alphabets
6.2.2 Adapting Existing Types as Alphabets
6.3 The Nucleotide Submodule
6.3.1 General Design
6.3.2 Canonical DNA Alphabets
6.3.3 Canonical RNA Alphabets
6.3.4 Other Nucleotide Alphabets
6.4 The Amino Acid Submodule
6.4.1 General Design
6.4.2 Amino Acid Alphabets
6.4.3 Translation
6.5 Composite Alphabets
6.5.1 Alphabet Variants
6.5.2 Alphabet Tuples
6.5.3 Alphabet “any” Types
6.6 The Quality Submodule
6.6.1 General Design
6.6.2 Quality Alphabets
6.6.3 Quality Tuples
6.7 Discussion
6.7.1 Performance
6.7.2 Simplicity
6.7.3 Integration
6.7.4 Adaptability
6.7.5 Compactness
7 The Range Module
7.1 General Design
7.2 Container
7.2.1 Concepts
7.2.2 Bit-Compressed Container
7.2.3 Containers of Containers
7.2.4 Fixed-Capacity Containers
7.3 Views
7.3.1 General Design
7.3.2 Alphabet-Specific Views
7.3.3 Some General-Purpose Views
7.3.4 Implementation Notes
7.4 Discussion
7.4.1 Performance
7.4.2 Simplicity
7.4.3 Integration
7.4.4 Adaptability
7.4.5 Compactness
8 The Input/Output Module
8.1 The Stream Submodule
8.2 Serialisation
8.3 Formatted Files
8.3.1 Files and Formats
8.3.2 Records and Fields
8.4 The Sequence File Submodule
8.4.1 Input
8.4.2 Output
8.4.3 Combined Input and Output
8.4.4 Asynchronous Input/Output
8.5 Discussion
8.5.1 Performance
8.5.2 Simplicity
8.5.3 Integration
8.5.4 Adaptability
8.5.5 Compactness
9 The Search Module
9.1 The FM-Index Submodule
9.1.1 Unidirectional FM-Index
9.1.2 Bidirectional FM-Index
9.2 The k-Mer-Index Submodule
9.2.1 Shapes in SeqAn3
9.3 General Algorithm Design
9.4 The (Search) Algorithm Submodule
9.4.1 Search Strategies
9.5 The Configuration Submodule
9.5.1 Excursus: Aggregate Initialisation and Designated Initialisers
9.5.2 Search Config Elements
9.6 Discussion
9.6.1 Performance
9.6.2 Simplicity
9.6.3 Integration and Adaptability
9.6.4 Compactness
10 The Alignment Module
10.1 The Aligned Range Submodule
10.1.1 Concepts and Function Objects
10.1.2 Gap Decorators
10.2 The Scoring Submodule
10.2.1 Alphabet Scoring Schemes
10.2.2 The Gap (Scoring) Scheme
10.3 The Pairwise (Alignment) Submodule
10.3.1 Algorithm Interface
10.3.2 Alignment Result Type
10.3.3 Theoretical Background and Implementation Details
10.4 The Configuration Submodule
10.5 Discussion
10.5.1 Performance
10.5.2 Simplicity
10.5.3 Integration
10.5.4 Adaptability
10.5.5 Compactness
Part III Lambda
11 Lambda: An Application Built with SeqAn
11.1 Introduction
11.1.1 Previous Work
11.1.2 History of LAMBDA
11.2 Implementation
11.2.1 Index Creation
11.2.2 Search
11.3 Results
11.3.1 Notable Features
11.3.2 Performance
11.4 Discussion
11.4.1 From SeqAn2 to SeqAn3
11.4.2 Algorithmic Choices
Part IV Conclusion and Appendix
12 Conclusion
Appendix A
A.1 Notes on Reading This Book
A.1.1 References and Hyperlinks
A.1.2 How to Read Code Snippets
A.2 Software and Hardware Details
A.2.1 Benchmarking Environment
A.2.2 Helpful Software
A.3 Copyright
A.3.1 SeqAn Copyright
A.4 Longer Code Snippets
A.5 Detailed Benchmark Results (Local Aligners)
References

Part I
Background
The first part of this book lays the foundation for the remaining parts. It briefly
introduces the reader to sequence analysis, a central field in current bioinformatics
research. It then covers the design and implementation of the SeqAn library prior
to the release of version 3 and discusses to what extent it succeeded in achieving
its stated goals. Finally, this part devotes a large chapter to explaining the recent and
not-so-recent developments in the C++ programming language and how they might
enable us to solve the current challenges in sequence analysis in more elegant and/or
more efficient ways.

Chapter 1
Sequence Analysis
Sequence analysis is a domain in bioinformatics which encompasses all computer-
aided studies of biological sequence data. This data is produced from molecules
such as DNA and RNA, which store a cell’s genetic information, and proteins, which
are the “machines” of a cell and provide a myriad of functions including signalling,
metabolism and immune response. While these biological molecules (especially
proteins) exhibit complex three-dimensional structures, they can also be represented
as linear polymer sequences of their molecular building blocks.1 These molecular
building blocks in turn are nucleotides (in the case of DNA/RNA) or amino acids (in
the case of proteins). They are the basic units of information in sequence analysis,
and the types of these units are referred to as “alphabets” in the context of computer
science.2
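To make the notion of an alphabet more concrete, the following is a minimal sketch of a four-letter DNA alphabet in which every letter has a character and a numeric rank. All names (`dna4`, `to_char`, `from_char`, `is_dna4_sequence`) are hypothetical and chosen for illustration only; this is not the interface of the SeqAn library, which Chapter 6 describes.

```cpp
#include <array>
#include <cstdint>
#include <optional>
#include <string_view>

// Hypothetical illustration only; this is NOT the SeqAn3 alphabet API.
// A four-letter DNA alphabet where every letter has a character and a rank.
enum class dna4 : std::uint8_t { A = 0, C = 1, G = 2, T = 3 };

inline constexpr std::array<char, 4> dna4_chars{'A', 'C', 'G', 'T'};

// Rank -> character.
constexpr char to_char(dna4 const letter) noexcept
{
    return dna4_chars[static_cast<std::uint8_t>(letter)];
}

// Character -> letter; returns std::nullopt for characters outside the alphabet.
constexpr std::optional<dna4> from_char(char const c) noexcept
{
    for (std::uint8_t rank = 0; rank < dna4_chars.size(); ++rank)
        if (dna4_chars[rank] == c)
            return static_cast<dna4>(rank);
    return std::nullopt;
}

// True if every character of the text is a letter of the alphabet.
constexpr bool is_dna4_sequence(std::string_view const text) noexcept
{
    for (char const c : text)
        if (!from_char(c).has_value())
            return false;
    return true;
}

// Compile-time usage checks.
static_assert(to_char(dna4::G) == 'G');
static_assert(from_char('T') == dna4::T);
static_assert(is_dna4_sequence("GATTACA"));
static_assert(!is_dna4_sequence("GATTXCA"));
```

Storing the rank rather than the character keeps each letter within two bits of information, which is one reason why a dedicated alphabet type can be preferable to plain `char`.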
The type of analysis performed on such data varies greatly: it ranges from
functional analysis (e.g. “what is the purpose of this gene in the cell?”) through
comparative analysis (e.g. “how is sequence X related to sequence Y in another
or the same organism?”) to quantitative analysis (e.g. “what does the frequency of
this RNA transcript indicate regarding the activity of the cell?”). The subject of research
may be a single short sequence like a gene, the entire genome or transcriptome, or
even all genetic material in some sample. The study of the latter is called metagenomics and is
becoming increasingly common.
Scientific domains that perform sequence analysis or that make use of sequence
analysis tools are even more diverse. They include most areas of modern biological
research, because the need to understand genetics and evolutionary processes has
become pervasive. But sequence analysis has also come to influence fields such as
ecology, where it is used to assess microbial diversity and its response to certain
perturbations (Mackelprang et al., 2011). This research in turn has far-reaching
implications for other fields such as climatology. In the realm of medical research,
sequence analysis is central to identifying genetic markers for hereditary diseases
(Liu et al., 2019) as well as cancer (Banerji et al., 2012). It is becoming more and
more important for analysing the human microbiome (Turnbaugh et al., 2007) and
its contribution to human health. It is also part of infectious disease research and
treatment, both for detecting contagion in a sample (Ho & Tzanetakis, 2014) and
for developing vaccines (Maiden, 2019). Through its role in developing genetically
modified organisms (GMOs), sequence analysis contributes to further fields such as
agriculture, industrial processing and energy production.
1 The three-dimensional structure as well as the connection between sequence representation and
three-dimensional structure is the subject of structural bioinformatics.
2 Chapter 6 discusses them in detail.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022
H. Hauswedell, Sequence Analysis and Modern C++, Computational Biology,
https://doi.org/10.1007/978-3-030-90990-1_1
The substance of all sequence analysis is the sequence data. This data is
generated by different (bio-)technological methods, and the properties of these
techniques have a profound effect on the types of analysis that are technically
possible and economically feasible. In particular, the technological leaps in
DNA/RNA sequencing have dwarfed progress in other scientific domains:
[T]he first whole human genome sequencing in 2000 [...] cost over $3.7 billion and took
13 years of computing power. Today, it costs roughly $1000 and takes fewer than three days.
With trillions of genomes waiting to be sequenced, both human and otherwise, the genomic
revolution is in its infancy. (Bannon, 2014)
The decline in cost over the years for sequencing one human genome is
displayed in Fig. 1.1. This is given as a general indicator for the trend of sequencing
costs although—as noted above—attaining a genome is not always the goal and
other forms of sequencing are even cheaper, e.g. species identification through
the so-called barcoding. While the price curve has flattened in recent years,
new sequencing technologies promise to produce longer sequencing reads which
improve the quality of some and enable new research areas (Pollard et al., 2018).
It is important to note the logarithmic scale of the Y-axis in Fig. 1.1 and the
expected progress suggested by Moore’s Law, which roughly indicates the development
of computing power over the same period.3
This connection between progress in sequencing technologies and computing
power is very important, because decreasing prices imply increasing availability of
sequencing data and corresponding growth of sequence databases. Many problems,
like searching for all homologues (“related sequences”) of a given sequence, grow
in computational complexity with the size of the database. Often this relationship
is even super-linear, i.e. searching a database twice the original size is more than
twice as difficult for the computer. And, as Fig. 1.1 indicates, sequence data grows
orders of magnitude faster than the capabilities of computer hardware, so solving
well-known problems becomes more and more costly over time. To counter this
trend, high-performance sequence analysis software needs to be developed that
reduces complexity on an algorithmic level.
3 More on Moore’s Law and why effective speed-ups may even be lower in Sect. 2.4.1.
Fig. 1.1 Decline in the sequencing cost of a human genome. Note the log-scale Y-axis and
the expected decline based on Moore’s Law. Public domain image. Courtesy: National Human
Genome Research Institute
The increasing diversification of research areas using sequence analysis and
the progress of sequencing technologies have led to many new research questions
for which equally many new applications have been published. Developing these
sequence analysis applications is a scientific area of its own4 and significant
resources go into developing novel applications—either to solve new problems
or to solve existing problems more efficiently. However, the main algorithmic
steps in most of these applications are very similar (Gogol-Döring, 2009), e.g. the
reading and writing of common file formats, the indexing of large databases and the
computation of sequence alignments.
Thus, software libraries can help reduce the cost of creating new applications.
Software libraries are pre-written program code, mostly algorithms and data
structures, that can be used by applications to perform such frequent tasks for them.
Since library code is shared between many applications and often reused, more time
is invested into quality control and performance optimisations; this leads to better
applications. And because the full implementation of complex algorithms can be
hidden behind a simple interface, using libraries enables less-versed programmers
to solve difficult problems. This is especially important in bioinformatics where
application developers are often domain specialists but not experts in software
engineering.
This book is about SeqAn, a software library written in C++, that covers the most
important areas of sequence analysis and enables bioinformaticians to create high-
performance solutions to existing and new challenges.
4 Many consider it part of research software engineering, see also Gesellschaft für
Forschungssoftware (2018).

Chapter 2
The SeqAn Library (Versions 1 and 2)
This chapter gives a brief overview of the SeqAn library, important design goals and
programming principles, as well as an analysis of how far these were reached.
I will discuss all aspects that I deem necessary to understanding the design and
development process of SeqAn3, but I strongly recommend reading the original
SeqAn publication (Döring et al., 2008), the publication documenting the second
major release of SeqAn (Reinert et al., 2017) and the doctoral thesis of Andreas
Gogol-Döring (Gogol-Döring, 2009) that explain the original motivation and design
choices in detail.
2.1 History
The SeqAn library is being developed primarily in Knut Reinert’s groups at Freie
Universität Berlin and Max Planck Institute for Molecular Genetics, but it has
contributors from many other research groups in Berlin and around the world.
Since moving to a public Git repository in 2015, the number of contributions from
individuals not affiliated with Knut Reinert’s lab or cooperation partners has grown
steadily. The most important events are shown in Table 2.1.
The author of this book has known the library since his undergraduate thesis in 2009
and has worked with it in different roles since. As of SeqAn-2.1, he is the shared
project lead and release manager (together with René Rahn). He is the main architect
of the SeqAn3 library.
Since SeqAn is a software library, SeqAn’s history is of course also the history
of the applications built with SeqAn. It is beyond the scope of this work to cover
all these applications, but several noteworthy examples are Bowtie (Langmead
et al., 2009), TopHat (Trapnell et al., 2009), DELLY (Rausch et al., 2012),
FLEXBAR (Dodt et al., 2012), RazerS (Weese et al., 2012), Mason (Holtgrewe,
2010), Stellar (Kehr et al., 2011) and SLIMM (Dadi et al., 2017). Part III discusses
Lambda (Hauswedell et al., 2014), an application developed by the author based on
SeqAn.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022
H. Hauswedell, Sequence Analysis and Modern C++, Computational Biology,
https://doi.org/10.1007/978-3-030-90990-1_2
Table 2.1 A brief history of important SeqAn events
Year  Event
2008  SeqAn-1.0; first publication (Döring et al., 2008)
2009  Doctoral thesis of Gogol-Döring (2009)
2013  SeqAn-1.3; with significant changes
2015  SeqAn-2.0; move to GitHub
2016  SeqAn-2.1; follows semantic versioning
2017  Second publication (Reinert et al., 2017)
2018  SeqAn-2.4; last feature release of 2.x series
2019  SeqAn-3.0
2020  planned: SeqAn-3.1 (stable) and third publication
2.2 Design Goals
In his dissertation, Gogol-Döring (2009) defined the over-arching goals of the
library as being an instrument of engineering (“Enabling the rapid development
of efficient tools [.]”) and as being academic/instructive (“Promoting the design,
comparison and testing of algorithms[.]”).
He formulated the concrete design goals as (all quotes by Gogol-Döring, 2009)
Performance “[. . . ] designed to produce code that runs as fast as possible”.
Simplicity “All parts [. . . ] are constructed and applicable as simple as possible.
[sic]”.
Generality “All parts [. . . ] are applicable in as many circumstances as possible”.
Refineability “Whenever a specialization is reasonable, it is possible to integrate
it easily[.]”.
Extensibility “[It] can always be extended without changing already existing
code”.
Integration “[It] is able to work together with other libraries and built-in types”.
2.3 Programming Techniques
To achieve the previously defined goals, Gogol-Döring describes the following
programming techniques:
C++ “We decided to implement SeqAn in C++, because performance is among
our main goals [. . . ] and the extended features of C++, namely templates[. . . ],
are well suited to an excellent library design.” (Gogol-Döring (2009))
Generic Programming “Generic programming designs algorithms and data struc-
tures in a way that they work on all types that meet a minimal set of requirements
[. . . ] [, it] promotes the generality of the library.” (Gogol-Döring (2009))
Template Subclassing This term is used by Gogol-Döring to describe a kind of
polymorphism based on partial template specialisation and function overloading
using partially specialised template parameters.
Global function interfaces Functions declared at namespace scope (instead of as
members of a class) are called global functions by Gogol-Döring, otherwise
often also known as free functions. The use of free functions for polymorphism
is required by generic programming, but SeqAn extends this approach even to
object interfaces.
Metafunctions A “metafunction” is described by Gogol-Döring as an entity that
“returns” for a given type or constant another type or constant (at compile-time).
SeqAn1/2 uses metafunctions not only as observers of the properties of a type
but also as modifiers of these properties.
All of these points are elaborated on in the doctoral thesis of Gogol-Döring. I will
cover the C++ programming language extensively in Chap. 3 but want to guide the
reader through the remaining techniques in the following sections as it is important
to understand the specifics of SeqAn1/2 to comprehend (and appreciate) the changes
in SeqAn3.
2.3.1 Generic Programming
Generic programming is a paradigm that became popular in the C++ community
later than object-oriented programming (OOP) and its goal is to overcome some
(performance) problems of OOP (Duret-Lutz et al., 2001). It is facilitated through
the use of function and class templates and it is strongly associated with static poly-
morphism (see below). Beside performance, the main goal of generic programming
is the reuse of code within a code base and interoperability with user-defined types:
Generic programming recognizes that dramatic productivity improvements must come from
reuse without modification, as with the successful libraries. Breadth of use, however, must
come from the separation of underlying data types, datastructures, and algorithms, allowing
users to combine components of each sort from either the library or their own code. (Dehnert
& Stepanov, 2000)
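To make the idea of "reuse without modification" concrete, the following is a minimal sketch of a generic algorithm. The function name index_of is hypothetical and belongs neither to SeqAn nor to the standard library; the only requirements it places on its arguments are that the range can be iterated and that its elements can be compared with the searched value.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <vector>

// Generic linear search (hypothetical helper for illustration only).
// Requirements: Range works in a range-based for loop and its elements
// can be compared with Value via operator==.
template <typename Range, typename Value>
std::size_t index_of(Range const & range, Value const & value)
{
    std::size_t idx = 0;
    for (auto const & elem : range)
    {
        if (elem == value)
            return idx;
        ++idx;
    }
    return idx; // equals the number of elements if value was not found
}

// The same algorithm is reused, unmodified, for unrelated container types.
inline void index_of_demo()
{
    assert(index_of(std::vector<int>{4, 8, 15}, 15) == 2u);
    assert(index_of(std::list<char>{'a', 'c'}, 'c') == 1u);
    assert(index_of(std::string{"GATTACA"}, 'T') == 2u);
    assert(index_of(std::vector<int>{}, 1) == 0u); // not found
}
```

The algorithm is written once and neither the containers nor the algorithm need to know about each other, which is the separation of data structures and algorithms that the quote describes.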
2.3.2 Template Subclassing
Polymorphism is a key feature in most programming languages and is part of
different programming paradigms. Bjarne Stroustrup defines it as “providing a
single interface to entities of different types”.1 In Snippet 2.1,2 I present an example
(adapted from the example in Gogol-Döring (2009)):
Code snippet 2.1: Polymorphism in object-oriented programming vs template
subclassing. Adapted from “Listing 2” in Gogol-Döring (2009). Neither is valid
SeqAn code
• Given a container of integers, there shall be a find() function that finds the
position of the first occurrence of a given integer in that container.
• The trivial solution is to do a linear-time scan over the container.
• But for containers that are ordered, such a search can be performed in logarithmic
time; for these containers, a more refined algorithm should be selected.
• Furthermore, a polymorphic interface should be able to handle objects of base
type and the derived type.
1 http://www.stroustrup.com/glossary.html.
2 Please see Sect. A.1.2 for notes on how to read code snippets in this thesis.
In object-oriented programming, polymorphism is implemented via inheritance
and virtual member functions; derived classes inherit from base classes. Pointers
and references to the base type can also bind objects of the derived type, so one
can pass an object of type IntMap to print_idx_of() in Snippet 2.1. When the
find() member function is invoked, a virtual function lookup selects the most
refined implementation at runtime. Because the selection happens at runtime, this
form of polymorphism is also called dynamic polymorphism.
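A minimal sketch of the object-oriented variant could look as follows. It is my own illustration and not the original listing: the names IntMap, find() and print_idx_of() follow the text, whereas the base class name IntVector, the sorted-vector implementation of IntMap and the returned index are assumptions made to keep the sketch short and checkable.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iostream>
#include <utility>
#include <vector>

// Base type: unordered container, find() is a linear-time scan.
struct IntVector
{
    std::vector<int> data;

    explicit IntVector(std::vector<int> d) : data(std::move(d)) {}
    virtual ~IntVector() = default;

    // Index of the first occurrence of value, or data.size() if absent.
    virtual std::size_t find(int value) const
    {
        auto it = std::find(data.begin(), data.end(), value);
        return static_cast<std::size_t>(it - data.begin());
    }
};

// Derived type: keeps its elements ordered, so find() can use binary search.
struct IntMap : IntVector
{
    explicit IntMap(std::vector<int> d) : IntVector(std::move(d))
    {
        std::sort(data.begin(), data.end());
    }

    std::size_t find(int value) const override
    {
        assert(std::is_sorted(data.begin(), data.end())); // invariant of this sketch
        auto it = std::lower_bound(data.begin(), data.end(), value);
        bool const found = (it != data.end()) && (*it == value);
        return found ? static_cast<std::size_t>(it - data.begin()) : data.size();
    }
};

// Binds base and derived objects; the find() implementation is chosen at runtime.
inline std::size_t print_idx_of(IntVector const & container, int value)
{
    std::size_t const idx = container.find(value); // virtual function call
    std::cout << idx << '\n';
    return idx; // returned in addition to printing, to make the sketch easy to check
}
```

Passing an IntMap to print_idx_of() invokes the logarithmic-time overload although the function only knows the base type.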
In generic programming on the other hand, polymorphism is implemented via
templates and (free) function overloading. The selection of the best/most refined
implementation happens at compile-time, so it is called static polymorphism. Since
it happens at compile-time, static polymorphism is notably faster than dynamic
polymorphism (Driesen & Hölzle, 1996), which is the reason SeqAn prefers it.
Template subclassing is one “style” of static polymorphism (there are others).
Instead of through inheritance, a base template is defined and derived classes
are modelled as template specialisations of that template. The so-called tag types
are often used to denote such specialisations.3 Generic functions are then also
implemented as free/global function templates with some template parameters
“fixed”. If an overloaded free function is invoked, the overload that is most refined
is picked by the compiler.
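The same interface can be sketched in the style of template subclassing. Again this is an illustration under assumptions, not valid SeqAn code: the names IntContainer, Unordered, Sorted and idx_of are hypothetical, and the refined overload fixes the tag completely instead of only partially.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Tag types; they carry no data and only select a specialisation.
struct Unordered {};
struct Sorted {};

// Base template; "derived" types are denoted by the tag parameter.
template <typename TSpec = Unordered>
struct IntContainer
{
    std::vector<int> data;
};

// Generic free function: valid for every IntContainer<TSpec> (linear scan).
template <typename TSpec>
std::size_t find(IntContainer<TSpec> const & c, int value)
{
    auto it = std::find(c.data.begin(), c.data.end(), value);
    return static_cast<std::size_t>(it - c.data.begin());
}

// More refined overload with the tag "fixed": binary search on ordered data.
inline std::size_t find(IntContainer<Sorted> const & c, int value)
{
    assert(std::is_sorted(c.data.begin(), c.data.end())); // assumed by this sketch
    auto it = std::lower_bound(c.data.begin(), c.data.end(), value);
    bool const found = (it != c.data.end()) && (*it == value);
    return found ? static_cast<std::size_t>(it - c.data.begin()) : c.data.size();
}

// Generic caller; the compiler picks the most refined find() at compile-time.
template <typename TSpec>
std::size_t idx_of(IntContainer<TSpec> const & c, int value)
{
    return find(c, value);
}
```

No virtual function lookup is involved here: for IntContainer<Sorted> the compiler resolves the call to the binary search overload during compilation, for all other tags to the linear scan.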
Both of the mentioned styles have in common that one can refine arbitrarily
often/“deep” (in the case of template subclassing by making the tags also be
templates that are further specialised). They also share that the polymorphism is
restricted to one’s own types, i.e. one needs to explicitly inherit from the respective
base class (dynamic polymorphism) or specialise the respective base template
(template subclassing); one cannot plug in foreign types, e.g. from a different library
(more on this in Sect. 3.4).
2.3.3 Global Function Interfaces
As previously explained, generic algorithms have to be implemented as free
functions in the generic programming paradigm. This is, however, not true for
all functions. For a long time, the C++ standard library has provided algorithms
as free functions, but it still implemented most other functions (that are related
to the properties of an object more closely) as member functions. For example,
std::find() is a generic free function that can be called with different containers
(or more precisely their iterators) as arguments, but .size() is a member function
of the respective container.
3 They have no other purpose and are usually optimised out of the final code entirely.
In later revisions of the C++ standard (C++11, C++17), the standard library picked
up free function wrappers for many of these member functions, e.g. std::begin(),
std::end(), std::size(), std::empty(). The reasoning is that although the
functions are seen as accessing properties of the object and not as free-standing
components, working with a free function is more flexible in a generic programming
context.
If for example a generic algorithm needs to know an object’s size, it would
previously always look for a .size() member function. This works if all input
types of the algorithm are designed together with the algorithm, but it will fail if a
user provides a type from a different library which happens to provide a .length()
member and not .size(). If one’s algorithm instead looks for a free function
size(obj), the user of the library can provide a custom wrapper around the other
library’s type so that it will satisfy the requirements of the algorithm without needing
to be changed (“reuse without modification”; Dehnert & Stepanov, 2000).
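The following sketch illustrates this situation, assuming C++17 for std::size(). The namespaces other_lib and my_lib, the type Buffer and the algorithm is_longer_than are hypothetical; only std::size() is an actual standard library function.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator> // std::size (C++17)
#include <vector>

namespace other_lib // hypothetical foreign library
{
struct Buffer
{
    std::vector<char> bytes;
    std::size_t length() const { return bytes.size(); } // .length(), but no .size()
};
} // namespace other_lib

namespace my_lib // hypothetical generic library
{
// The algorithm only relies on a free function size(obj).
template <typename T>
bool is_longer_than(T const & obj, std::size_t n)
{
    using std::size;      // fallback for types with a .size() member and for arrays
    return size(obj) > n; // unqualified call, argument-dependent lookup applies
}
} // namespace my_lib

namespace other_lib
{
// Wrapper provided by the user; neither Buffer nor the algorithm is modified.
inline std::size_t size(Buffer const & b) { return b.length(); }
} // namespace other_lib

inline void size_wrapper_demo()
{
    other_lib::Buffer buf{{'A', 'C', 'G', 'T'}};
    assert(my_lib::is_longer_than(buf, 3));                       // uses other_lib::size()
    assert(!my_lib::is_longer_than(std::vector<int>{1, 2}, 2));   // uses std::size()
}
```

Had the algorithm called obj.size() directly, Buffer could only have been used after changing either the foreign type or the algorithm.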
SeqAn has used this style since its inception, however in a more radical fashion
where practically all functions are free functions. They are not even wrappers around
member functions but directly access the state of an object (e.g. seqan::length()
directly accesses respective data members). This is a notable difference to the
standard library that provides encapsulation on an implementation level (the actual
functionality is implemented as members) and only exposes these member functions
via free function wrappers.
The implications of this for the general library design are important to note. On
the one hand, the users are able to overload implementation details that might other-
wise be considered private, granting a higher level of extensibility/refineability;
on the other hand, this can introduce subtle changes in other parts of the library
that rely on the previously defined behaviour. In effect, the definition of how a type
behaves becomes highly non-local, because essential functions can practically be
overridden from anywhere in the library or even in application code.
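A small sketch may help to see how such non-local changes arise. All names (lib, Text, length, is_empty) are hypothetical and the code is not taken from SeqAn; it only mimics the pattern of free functions that access an object's state directly and can be overloaded later.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// "Library" code
namespace lib
{
struct Text
{
    std::string data;
};

// Generic free function that reads the state of the object directly.
template <typename T>
std::size_t length(T const & obj)
{
    return obj.data.size();
}

// Another part of the library that relies on length().
template <typename T>
bool is_empty(T const & obj)
{
    return length(obj) == 0;
}
} // namespace lib

// "Application" code: reopens the namespace and adds a more refined overload,
// here assuming that the last character is a sentinel that shall not be counted.
namespace lib
{
inline std::size_t length(Text const & t)
{
    return t.data.empty() ? 0 : t.data.size() - 1;
}
} // namespace lib

inline void non_locality_demo()
{
    // is_empty() was never touched, yet its result for Text has changed:
    assert(lib::is_empty(lib::Text{"$"}));
    assert(!lib::is_empty(lib::Text{"AC$"}));
}
```

Nothing in the definition of Text or is_empty() hints at the changed behaviour; one has to find the overload in the application code to understand it.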
2.3.4 Metafunctions
What we need therefore is a mechanism that returns an output type (e.g. a value type) given
an input type (e.g. the string) [...]. Such a task can be performed by metafunctions, also
known as type traits [...]. A metafunction is a construct to map some types or constants to
other entities like types, constants, or objects at compile-time. (Gogol-Döring, 2009)
The last sentence of the quote in particular articulates the mechanism behind
metafunctions/type traits well. Note that I would not equate the terms metafunction and
type trait entirely, and I prefer using the latter (see also Sect. 3.3).
Following a similar argument as in the previous section, Gogol-Döring argues
that it is beneficial to have “global” type metafunctions (e.g.
seqan::Value<T>::Type) over relying on a type’s member types (e.g.
T::value_type). The C++ standard adopted this style much later and a
transformation type trait that does exactly what Snippet 2.2 does is included in
C++20 under the name std::ranges::range_value_t<T>. Note that this is a wrapper
and that, similarly to the free functions and in contrast to SeqAn1 and SeqAn2, the
actual implementation is provided by the type as a member, i.e. in most cases
::value_type.
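The two styles can be contrasted in a short sketch. MyString, Value and value_t are hypothetical names chosen for illustration; Value is not the actual SeqAn implementation, and value_t is only loosely analogous to std::ranges::range_value_t, which is defined via the range's iterator type.

```cpp
#include <string>
#include <type_traits>
#include <vector>

// A user-defined string type that offers no member types (hypothetical).
template <typename TChar>
struct MyString
{
    std::vector<TChar> data;
};

// Style 1, "global" metafunction: the mapping from type to value type is
// implemented outside of the type by specialising the metafunction.
template <typename T>
struct Value; // primary template intentionally left undefined

template <typename TChar>
struct Value<MyString<TChar>>
{
    using Type = TChar;
};

// Style 2, wrapper: the type itself provides the information as a member
// type and the trait merely exposes it.
template <typename T>
using value_t = typename T::value_type;

static_assert(std::is_same<Value<MyString<char>>::Type, char>::value,
              "implemented by the metafunction specialisation");
static_assert(std::is_same<value_t<std::string>, char>::value,
              "implemented by the member type of std::string");
```

In both cases generic code can ask for the value type without knowing the concrete container; the difference lies in where the answer is defined.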
A notable difference of the style used in SeqAn and the (modern) standard
library is that in SeqAn metafunctions are not only used as accessors but also as
modifiers.4 This means they do not simply expose certain (type) properties but can
be specialised/overloaded to change the properties that are exposed for existing
type(s):
Code snippet 2.2: “Listing 4: meta functions [sic] example” from Gogol-Döring
(2009)
SeqAn offers the metafunction Size [...]. This type is by default size_t, and it is
hardly ever changed by the user, so it is not worth to specify it in another template argument.
Nevertheless [...][,] it is possible to overwrite the default with a new type [...] by defining
a new specialization of the metafunction Size. (Gogol-Döring, 2009)
The quote suggests that this “feature” was initially reserved for manipulating only
obscure properties of types; however, the design was later adopted throughout
the library and is even taught in the beginner’s tutorial for working with suffix arrays:
All Indices in SeqAn are capable of indexing Strings [...] up to 2^64 characters. [...][If] the
text to be indexed is shorter, e.g. it does not exceed 4.29 billion (2^32) characters[...], one
can reduce the memory consumption of an Index by changing its internal data types, with
no drawback concerning running time. [...]
In order to change the size type of the suffix array entry we simply have to overload the
metafunction SAValue:
https://seqan.readthedocs.io/en/master/Tutorial/DataStructures/Indices/StringIndices.html
4 There are customisation points in the standard library that involve specialising a type trait, e.g.
std::tuple_size, but they are very few and clearly marked as such. It is also explicitly
stated that such specialisations may only affect newly defined types and not manipulate the traits
of existing types (ISO/IEC 14882:2017, 20.5.4.2.1).
The implications of this are similar to the implications of being able to overload
functions that manipulate the behaviour of existing types (see the previous subsec-
tion). Another non-obvious implication of the “global type trait modifiers” is that
they are indeed “global”: once one overrides the SAValue type, it affects all indexes
over the respective text type and one cannot create indexes over the same text type
with different traits—as would be possible if SAValue were a template parameter
of the index.5
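The difference between a global type trait modifier and a template parameter can be sketched as follows. The code is a simplified model under assumptions: Index, IndexWithParam and the member names are hypothetical, and SAValue here only imitates the role of the SeqAn metafunction of the same name.

```cpp
#include <cstdint>
#include <string>
#include <type_traits>
#include <vector>

// Library default: suffix array entries are 64 bit wide.
template <typename TText>
struct SAValue
{
    using Type = std::uint64_t;
};

// The index asks the metafunction for its internal type.
template <typename TText>
struct Index
{
    using sa_value_type = typename SAValue<TText>::Type;
    std::vector<sa_value_type> suffix_array;
};

// Application code "modifies" the trait for an existing text type ...
template <>
struct SAValue<std::string>
{
    using Type = std::uint32_t;
};

// ... which silently applies to every Index<std::string> in the program.
static_assert(std::is_same<Index<std::string>::sa_value_type, std::uint32_t>::value,
              "all indexes over std::string are affected");
static_assert(std::is_same<Index<std::vector<char>>::sa_value_type, std::uint64_t>::value,
              "other text types keep the default");

// Alternative: the entry type is a template parameter of the index, so
// differently configured indexes over the same text type can coexist.
template <typename TText, typename TSAValue = std::uint64_t>
struct IndexWithParam
{
    using sa_value_type = TSAValue;
    std::vector<sa_value_type> suffix_array;
};

static_assert(!std::is_same<IndexWithParam<std::string>,
                            IndexWithParam<std::string, std::uint32_t>>::value,
              "two distinct index types over the same text type");
```

With the template parameter, the choice is visible in the type of each index, whereas the specialisation of SAValue takes effect wherever it is visible, independent of where the index is declared.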
2.4 Discussion
Measuring the impact of the SeqAn library accurately is not easy. In general,
research software has a hard time being properly attributed in many domains of
science (Soito & Hwang, 2016). Even though citable publications have always
existed for SeqAn, many instances have become known where software that uses
SeqAn does not properly cite it, instead placing only a link to the project homepage
(Dröge et al., 2014) or not even that.
I also assume that the number of instances not known is far greater, since being
a software library (and not an actual application) makes the contribution to research
even less visible for many biologists and bioinformaticians. There are neither clear
guidelines for citing software libraries nor enforcement of such practices by major
journals (Soito & Hwang, 2016).
I would still maintain that the SeqAn library has been a big success. Some of
the most highly cited bioinformatics applications released in the last decade make
use of SeqAn, among them are Bowtie (Langmead et al., 2009), TopHat (Trapnell
et al., 2009) and DELLY (Rausch et al., 2012). Furthermore, the team around
SeqAn published applications based entirely on the SeqAn library that outperformed
state-of-the-art competitors, often by multiple factors, e.g. RazerS (Weese et al.,
2012), Masai/Yara (Siragusa et al., 2013; Siragusa, 2015) and Lambda (Hauswedell
et al., 2014). SeqAn has also been used outside the domain of bioinformatics and
computational biology, e.g. in image processing/text recognition (Yoon et al., 2016).
5 In practice, it is possible to work around this limitation by defining different text type
specialisations and then defining different SAValue specialisations for each. This implies
substantial changes to the application code.
Gogol-Döring (2009) analysed existing C++ sequence analysis libraries, including
BATS (Giancarlo et al., 2007), Bio++ (Guéguen et al., 2013), BTL (Pitt et al.,
2001), libsequence (Thornton, 2003), the NCBI C++ Toolkit (Vakatov et al., 2003)
and SCL (Vahrson et al., 1996). Out of these, only libsequence and Bio++ have
had bug-fix releases in the last two years and only libsequence received new
features. Development of the remaining libraries seems to have stalled. In the
meantime, some important new libraries have been published, most of which are
specialised and perform only a subset of SeqAn’s features. A popular example is
htslib, a library factored out from Samtools (Li et al., 2009), more on how SeqAn
compares to htslib below. One of the few libraries aiming at a broader feature set
is SeqLib (Wala & Beroukhim, 2017). It compared favourably against SeqAn in
some published benchmarks; however, it was later shown that the authors had
built SeqAn in Debug mode, skewing the results in their favour.6 There is notably
less development lately and usage by other projects is insignificant compared with
SeqAn. libgenometools is a C library developed together with the GenomeTools
application (Gremme et al., 2013). Its feature set overlaps with SeqAn to a certain
degree and it has some unique features (e.g. for data visualisation), but it has seen
no release and almost no commits in 2018 and 2019.
On the other hand, SeqAn has had a continuous stream of contributions and a
notable increase in contributors over the years. Contributions have come not only
from labs closely associated with SeqAn like the Reinert lab but also from external
researchers and developers all over the world. SeqAn picked up an (optional)
update notification system with version 2.3.0. By aggregating and evaluating the
requests received from applications, one can now get rough estimates of library
usage. Plotting the approximate locations of the requesting IP addresses (resolved
via geolocation) yields a map as in Fig. 2.1. It should be noted that this service is
entirely optional, that many SeqAn-based applications do not make use of the argument
parser (which is the component that triggers the request), and that major operating
system vendors like Debian GNU/Linux and derivatives like Ubuntu deactivate the
respective feature by default. So the data always only displays a subset of SeqAn
use-cases, but it is still impressive to see the number of unique new users climb over
time (Fig. 2.2).
Fig. 2.1 Locations of SeqAn-based applications that performed update requests. Automatically
generated image, which includes content licensed under CC BY-SA by OpenStreetMap contributors
Fig. 2.2 Usage and user numbers reported during one year. Automatically generated image
6 https://github.com/walaj/SeqLib/issues/12.
SeqAn as a project is also part of multiple networks and initiatives. Together
with OpenMS (Röst et al., 2016), KNIME (Berthold et al., 2007) and others, it
constitutes the Center for Integrative BioInformatics (CIBI), which is a node in the
German network for Bioinformatics (de.NBI, Tauch & Al-Dilaimi (2017)), which
in turn is the German part of the ELIXIR network (Crosswell & Thornton, 2012).
Besides these publicly funded initiatives, SeqAn has had research and development
cooperation with important (hardware) companies like NVIDIA and Intel, being
at times both an NVIDIA CUDA Research Center7 and an Intel Parallel Compute
Center.8 Kristina Kermanshahche, Chief Architect of Intel Health & Life Sciences,
announced the latter by saying “Intel regards SeqAn as a very promising software
package that has all the right ingredients to considerably speed up Next Generation
Sequencing analysis[.]”.9 According to Prof. Dr. Knut Reinert, he acquired total
funding of close to three million euros in SeqAn-related grants over the last 10
years.
7 https://developer.nvidia.com/academia.
8 https://software.intel.com/en-us/ipcc.
9 https://www.fu-berlin.de/en/presse/informationen/fup/2015/fup_15_285-professor-reinert-leitet-
intel-parallel-computer-center/index.html.

All in all, these facts add up to SeqAn being a success story and the involved
researchers have ample reason to be proud. I would still like to reflect self-critically
on the original design goals and decisions in the next sections. If some criticism
reads as overly harsh, this is not to diminish the achievements of SeqAn1/2 but to
raise the awareness of the reader for areas of potential improvement.
2.4.1 Performance
Performance, usually measured in execution speed but sometimes also in memory
usage, has always been the stated primary goal of SeqAn (Gogol-Döring, 2009).
Considering the challenges discussed in Chap. 1, this focus is and remains com-
pletely valid. And in fact the performance of SeqAn has been excellent in all
important areas including Input/Output, Indexed Search and Alignment.
SeqAn supports many typical bioinformatics file formats for Input/Output,
including FASTA, FASTQ, VCF, SAM and BAM. The performance of I/O is
frequently cited as a main bottleneck in many data evaluation pipelines (Buffalo,
2015; Kosar, 2012). Comparisons performed by third parties routinely confirm that
SeqAn performs very well, often better than the reference implementations, see
Table 2.2.
Another core part of SeqAn is full-text indexing, including q-gram/k-mer
indexing, suffix arrays and FM-indexes. It allows for efficient searching of large
databases and is a core part of read mappers and aligners alike. After the first
release, significant contributions to this part of the library were made by Weese
(2013), Siragusa (2015) and Pockrandt et al. (2017). As shown in Table 2.3,
SeqAn’s wavelet-tree-based FM-indexes are already very competitive. However,
EPR-dictionaries (an FM-index type first available in SeqAn) deliver even higher
speed-ups (more on this in Sect. 9.1).
The third pillar of SeqAn for which performance is crucial is sequence alignment.
Sequence alignment is a part of almost all traditional sequence analysis tools, and
SeqAn can perform all manner of different alignment algorithms (Needleman &
Wunsch, 1970; Smith & Waterman, 1981 and many variations thereof) via its
generic alignment module (Rahn et al., 2018). It also offers an implementation
of a more specialised algorithm for edit-distance alignments (Myers, 1999). After
the significant structural work by Rahn et al. (2018), this module displayed huge
performance gains (see below).
Table 2.2 Parsing 2.2 GiB of simulated reads in the BAM format
          Bamtools  htslib  SeqAn  PySAM
Time [s]  39        31      17     59
These results are taken from a third party benchmark performed on the current
program versions in 2016: https://github.com/wilzbach/bam-perf-test

Table 2.3 Performance of different FM-indexes. This is part of Table 1 from Pockrandt et al.
(2017) and only given here to illustrate the speed-up of SeqAn’s new implementation (2EPR)
over the original implementation (2WT) and competitors (2SDSL and 2SCH)
        DNA               Murphy10          IUPAC             Protein
Index   Time     Factor   Time     Factor   Time     Factor   Time     Factor
2WT     9.32 s   1.00     19.15 s  1.00     23.44 s  1.00     28.83 s  1.00
2EPR    4.69 s   1.99     5.78 s   3.31     5.67 s   4.13     6.21 s   4.64
2SDSL   12.21 s  0.76     20.58 s  0.93     24.43 s  0.96     29.76 s  0.97
2SCH    14.08 s  0.66     22.18 s  0.86     26.11 s  0.90     31.81 s  0.91
These results show that the general strategy and design decisions were the right
ones to achieve a high performance. But a notable dimension of performance was
not addressed by the original SeqAn release at all: parallelism/concurrency. In fact
the term “parallel” appears in none of the original publications (Gogol-Döring,
2009; Döring et al., 2008; Reinert et al., 2017). Parallelism is important because
the hardware that is being programmed for has changed dramatically in recent
years. The observation called “Moore’s law” describes the doubling of the number
of transistors in dense integrated circuits every one or two years (Moore, 1965). It is
often misunderstood to mean the doubling of “CPU speed” or even raw CPU clock
speed, because the two used to be strongly correlated. For some years now, this has
not been the case, as Sutter (2005) explains well:
Over the past 30 years, CPU designers have achieved performance gains in three main areas
[...]
• clock speed
• execution optimization
• cache
[...] Speedups in any of these areas will directly lead to speedups in sequential
(nonparallel, single-threaded, single-process) applications, as well as applications that do
make use of concurrency. [...] CPU performance growth as we have known it hit a
wall[.] [...] Applications will increasingly need to be concurrent if they want to fully
exploit continuing exponential CPU throughput gains[.] (Sutter, 2005; emphasis is mine)
SeqAn1 not offering parallelised algorithms does not mean that one could not
have parallelism in applications based on SeqAn; it was simply the philosophy of
the library that any parallelism should be implemented application-side. This shifted
slightly with the introduction of parallel BAM I/O during a later release of SeqAn1,
and much later with the release of SeqAn-2.3, where parallelised and vectorised
alignment code was added (Rahn et al., 2018). This yielded impressive speed-ups,
as can be seen in Fig. 2.3.
Fig. 2.3 Speed-up of alignment computation with threads and SIMD. Alignments per second (a/s)
given for 2M 150bp Illumina reads at different threads and with/without AVX2. Image kindly
provided by René Rahn
Pivotal to this change of philosophy was the realisation that certain forms of
desirable parallelism are impossible or too difficult for SeqAn’s users to achieve.
And with growing relevance, it could not be left up to the individual application
developer. Instead, important interfaces should directly offer access to high-level
parallelisation. These changes were important to preserve SeqAn’s status as a widely
recognised performance-oriented bioinformatics library, but they were applied ex
post, and there was no clear strategy of implementation (some parts of SeqAn relied
on OpenMP (Dagum & Menon, 1998), others on C++11 threads and others on Intel
TBB (Pheatt, 2008)). Furthermore, the user-visible interfaces to parallelised features
were not uniform: some aspects were controllable by runtime parameters, others by
tags and others only via C macros or even shell environment variables.
It is clear that a successor to SeqAn1/2 would need to address parallelisation
head-on and provide clear interfaces that enable users to easily choose between
different levels of parallelisation.
In this context it should be mentioned that in the quest for even better perfor-
mance, SeqAn developers put significant effort into targeting high-performance
processors other than the CPU. Enrico Siragusa developed support for CUDA,
which targets NVIDIA graphics processors (Nickolls et al., 2008), and Marcel
Ehrhardt developed support for the Intel Xeon Phi Co-processor. For different
reasons, none of these approaches ultimately led to usable applications. It remains
to be seen whether it is feasible for a generic library to support such specialised
devices.
2.4.2 Simplicity
The second goal formulated for SeqAn is Simplicity. This refers to both learning
how to use the library and the ability to contribute to and maintain it continuously.
While one might argue that SeqAn has been as simple as possible (under the primacy
of performance and the constraints of C++ 98), I would argue that it was anything
but simple.

A very steep learning curve is one of the criticisms heard most often about
SeqAn, and my personal experience in teaching students and new members of
the SeqAn team over the course of multiple years confirms this. Even experienced
C++ developers struggle in understanding and contributing to SeqAn1 and SeqAn2.
This is the direct result of the programming techniques described above
(Sect. 2.3). Some are difficult to apply in their own regard and some lead to
secondary problems.
Non-locality
As mentioned in Sect. 2.3, the core implementation of a type in SeqAn1/2 is often
not part of the type itself but implemented as free functions. These are not defined
in the same header file as the type if a less specialised template/overload provides
the functionality (which is the design for avoiding code duplication). The result is
strong fragmentation of the implementation that is very difficult to track. This is
reinforced through complex specialisation hierarchies that are not obvious from the
code or the documentation and many intermediate layers of function wrappers and
shims that obscure the call-graph. To add more complexity to the matter, header files
in SeqAn1/2 do not include those headers that they require—which would give a
hint on where to look for “inherited” functionality. Instead, there are singular “meta-
includes” for every module and the headers inside the modules have no includes.
When attempting to understand the mechanics of a specific type in SeqAn1/2,
one routinely has to open a debugger and step through the called functions,
often jumping between multiple files. Understanding the path of template type
instantiation (e.g. answering the question “which specialisation of metafunction X
is selected for my type Y?”) is even more difficult, because the “trick” using the
debugger is not available for metafunctions.
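The question “which specialisation of metafunction X is selected for my type Y?” can at least be probed at compile time. The following is a minimal sketch under stated assumptions: the namespace sketch_meta, the metafunction Value and the helper ShowType are hypothetical stand-ins written for this illustration and are not SeqAn code. It shows how static_assert and std::is_same (both C++ 11) can confirm a guess, and how an intentionally undefined template forces the compiler to print the selected type.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <vector>

namespace sketch_meta  // hypothetical namespace, not part of SeqAn
{
// Primary template of a metafunction: maps a container type to its element type.
template <typename T>
struct Value
{
    typedef typename T::value_type Type;
};

// A specialisation for pointers; in a large code base this could live in a
// different header than the primary template.
template <typename T>
struct Value<T *>
{
    typedef T Type;
};

// Declared but never defined: writing `ShowType<Value<Y>::Type> probe;`
// produces a compiler error whose message names the selected type.
template <typename T>
struct ShowType;

// Compile-time probes: each line fails to compile if the guess is wrong.
static_assert(std::is_same<Value<std::vector<int>>::Type, int>::value,
              "primary template selected for std::vector<int>");
static_assert(std::is_same<Value<char const *>::Type, char const>::value,
              "pointer specialisation selected for char const *");
} // namespace sketch_meta
```

Such probes only answer the question for one type at a time; they do not reveal where in the code base the selected specialisation is defined.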
Code Complexity and Feature Creep
Feature creep describes the continuous and excessive growth of features in a piece of
software or hardware resulting in it becoming more difficult to use and/or less stable
(Sullivan, 2005). Since SeqAn1/2 was developed in a single repository together with
custom tooling and many applications,10 the policy was that any code that might be
useful to more than one application should become part of the library. Furthermore,
the process of integrating a new application was very liberal, some applications
being the results of small student projects or proof-of-concepts. At its height, the
repository contained close to 40 applications (it has since been reduced to 28).
This combination led to a strong growth of the library code base and the
incorporation of many features with little relevance to the general user base. The
number of modules in SeqAn2 increased to currently 48, containing a total of 706
header files and 181,000 lines of code.11
10 This is a problem in its own right.
In the absence of clear policies and without project members dedicated to
maintenance, modernisation and code quality, the general complexity of the code
base increased significantly. This reflects the “second law of software evolution”
formulated by Lehman (1980): “As an evolving program is continuously changed,
its complexity, reflecting deteriorating structure, increases unless work is done to
maintain it or reduce it.”.
Unconstrained Templates
I elaborated on template subclassing in Sect. 2.3, and while in general this
has been the type of polymorphism in SeqAn1/2, there are in fact also higher
abstraction levels. One example is that specialisations of seqan::String<>,
seqan::StringSet<> and seqan::Segment<> are all considered “sequences”.
Since they do not share a common base template, one cannot easily create a generic
function that accepts exactly the specialisations of all three. The easiest way to
write a function that accepts at least the specialisations of all three is to write an
entirely unconstrained template (that formally accepts any type). This can be seen
for begin() and end() defined in sequence_interface.h, but there are many
more unconstrained templates in SeqAn1/2.
An effect of unconstrained templates is that misuse of the interface is not reported
immediately. Instead, a compiler-error happens much further down the call-graph
when an unsupported operation is called on the falsely given type (or possibly even
a dependent type of that type). These kinds of error messages tend to be very long
(often spanning multiple pages) and hard to understand (the error highlighted by the
compiler seems to be entirely unrelated to the problem).
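The difference between an unconstrained and a constrained interface can be sketched as follows. This is an illustration under stated assumptions, not code from SeqAn: the namespace sketch, both function names and the trait has_size are hypothetical. The constraint is expressed with SFINAE (std::enable_if, C++ 11) so that the example does not depend on newer language features.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

namespace sketch  // hypothetical namespace, not part of SeqAn
{
// Unconstrained: the signature formally accepts any T. A call with an int
// is only rejected inside the body, where .size() turns out to be missing.
template <typename T>
std::size_t length_unconstrained(T const & seq)
{
    return seq.size();
}

// Hypothetical trait: true if a const T offers a .size() member function.
template <typename T, typename = void>
struct has_size : std::false_type
{};

template <typename T>
struct has_size<T, decltype(void(std::declval<T const &>().size()))> : std::true_type
{};

// Constrained: for types without .size() this template is removed from the
// overload set, so misuse is reported at the call site.
template <typename T>
typename std::enable_if<has_size<T>::value, std::size_t>::type
length_constrained(T const & seq)
{
    return seq.size();
}
} // namespace sketch
```

Calling sketch::length_unconstrained(42) fails inside the function body, whereas sketch::length_constrained(42) fails at the call because no viable overload exists. In a deep call-graph the first kind of error surfaces many layers below the actual mistake.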
Another issue is that unconstrained templates increase the non-locality described
above and make it harder to search for the relevant overloads in the code base. They
also interact with implicit conversion in ways that are unexpected to many users,
which is why the C++ Core Guidelines have a rule against them (see the reference
for an example).12
11 SeqAn1 was split into core and extra with the intent being that core should be held to
higher quality standards and extra be more of a testbed for the actual library. But when I became
involved more strongly with the library, this separation had already weakened significantly and
core had strong dependencies on extra rendering the differentiation meaningless. They were
merged for SeqAn2.
12 “T.47: Avoid highly visible unconstrained templates with common names”.
http://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#Rt-visible.

Documentation
The documentation of a software library is an integral factor of its maintainability
and its ease of use (Geiger et al., 2018). Documentation includes API documentation
(documents describing the interfaces of classes, functions, etc.), Tutorials, ReadMes,
Wikis and possibly other resources that help in using the software. For libraries,
API documentation is the most important aspect of documentation as it is the
primary way users learn about features of the library and interact with the individual
components. It should not be necessary for users to look at the source code
of a library to use it, and the API documentation should provide all necessary
information.
API documentation is typically written inside the source code as comments (in a
certain style or markup language). These comments usually precede the entity that
they document or are found in its proximity. Third party software then generates
readable documentation (e.g. in HTML or PDF format) from the comments, often
also performing rudimentary parsing of the source code and enforcing that the
documentation matches the actual interfaces defined by the code.
The most common documentation generator for C++ software is Doxygen (van
Heesch, 2008) which uses a syntax similar to Javadoc (Kramer, 1999), one of the
earliest documentation generators. When SeqAn was first developed, the authors
came to the conclusion that Doxygen would perform poorly on SeqAn (due to the
unorthodox programming techniques) and decided to develop their own system:
dddoc. It was part of the first SeqAn release and is briefly described in Gogol-
Döring’s dissertation (Gogol-Döring, 2009). I cannot judge whether developing a
custom documentation generator was the most sensible option at the time, but it did
increase the burden to contribute to SeqAn, especially since the syntax was very
different from the well-known examples Doxygen and Javadoc. The documentation
generator also performed no parsing of the source code; documentation entries were
parsed completely independently of context. This increased the chance of documentation
errors and contributed to non-locality (documentation of an entity could be in an
entirely different place than the entity itself). Furthermore, there was no method of
enforcing that an entity be documented at all, and casual examinations of the SeqAn-1.0
source code show that many were not.13
During the development of SeqAn2 an entirely new, stand-alone documentation
generator, called dox, was created (Kahlert, 2015, see Fig. 2.4). This improved over
dddoc in that its syntax was modelled after Doxygen and the visual appearance of
the generated documentation was much more modern. However, the core problems
mentioned above, the independence of the documentation and code as well as the
lack of policy (enforcement) in regard to the completeness of the documentation,
were not solved. The documentation as a whole was not able to explain the
techniques of SeqAn well enough to make it appear like more traditional C++. For
example, template subclassing was explained similarly to inheritance, but typical
documentation of the latter, like inheritance graphs, was notably absent.
13 It is not clear whether this was a lack of “enforcement” or a general lack of policy in this regard.

Fig. 2.4 Screenshot of the API documentation of SeqAn-2.4 (built with dox). Screenshot taken by
me; content is part of SeqAn documentation, see Sect. A.3.1

Table 2.4 Source-lines-of-code and comment-lines-of-code in different SeqAn releases

SeqAn release   SLOC      CLOC     CLOC in %
SeqAn-1.0        88,332   36,578   29.28
SeqAn-2.0       168,488   94,635   35.97

To put the matter of completeness of documentation into perspective, I have
given the source-lines-of-code and the comment-lines-of-code for the respective
.0-releases in Table 2.4. These were measured with the cloc tool and only the
library folder was considered.14 Care should be taken when using these numbers
to compare different projects, but considering that certain style decisions (e.g. the
maximum line width and when/where to break lines) have remained constant from
SeqAn1 until SeqAn3, they do have some descriptive value. The numbers are
discussed and compared with SeqAn3’s in Sect. 4.4.2.
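The percentages in Table 2.4 can be reproduced with a few lines of arithmetic. The sketch below assumes that “CLOC in %” means comment lines divided by the sum of source and comment lines; the text does not state the formula, but this reading matches both rows of the table. The function name comment_share_percent is hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Share of comment lines among all counted lines, in percent.
// Assumption: "CLOC in %" = CLOC / (SLOC + CLOC) * 100.
double comment_share_percent(long sloc, long cloc)
{
    return 100.0 * static_cast<double>(cloc) / static_cast<double>(sloc + cloc);
}
```

For SeqAn-1.0 this gives 36,578 / (88,332 + 36,578), which rounds to 29.28%, and for SeqAn-2.0 it gives 94,635 / (168,488 + 94,635), which rounds to 35.97%.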
Later criticism notwithstanding, it should be noted that the relative amount of
comments in all SeqAn releases is well above average. The OpenHUB platform,
which performs statistics and analytics of open source software projects and covers
almost 500,000 projects, shows an average of 22% comment-lines-of-code for C++
projects.15 And there is reason to believe that academic software is usually below
average (Lemire, 2012).
14 https://github.com/AlDanial/cloc/.
2.4.3 Generality, Refineability and Extensibility
I am discussing these design goals together, because they all deal with the ability
to adapt SeqAn to one’s needs (with some aspects of generality being discussed as
part of integration below). In general, SeqAn1 and SeqAn2 offer a maximum degree
of freedom in regard to their adaptability. The global function and metafunction
interfaces described previously (especially when used as modifiers), in combination
with a lack of the classic C/C++ protection model, place only few restrictions on
how a user can apply, refine or extend the existing code.
While there are cases where this degree of freedom is useful, the added
complexity should not be underestimated. The core problem for a user wishing to
adapt the behaviour of the code is not knowing which entity to customise, because
any entity can be customised. An example should explain this: given is an int
property/member of a type that shall be refined to appear 1 larger than the actual
value. One would typically specialise the accessor function to just add 1 when
returning the value. But with multiple global shim functions, it may not be clear
which function is the “accessor” (see Snippet 2.4.2), and specialising any function
in the call-graph will likely yield the desired result. However, in a different context,
the call-graph may look slightly different and the specialisation might be skipped,
resulting in faulty behaviour.16
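The situation described above can be reproduced in a few lines. The following sketch is an illustration only: the namespaces sketch_lib and user_code and all function and type names are hypothetical and do not correspond to actual SeqAn interfaces. A user refines the outer shim value() for their own type, and a library routine that reaches the member through the inner accessor getValue() bypasses that refinement.

```cpp
#include <cassert>

namespace sketch_lib  // hypothetical library namespace
{
// Innermost accessor, closest to the type.
template <typename T>
int getValue(T const & obj)
{
    return obj.data_value;
}

// Shim that most users see; it merely forwards to the accessor.
template <typename T>
int value(T const & obj)
{
    return getValue(obj);
}

// Two library routines that reach the member via different call paths.
template <typename T>
int viaShim(T const & obj)
{
    return value(obj);
}

template <typename T>
int viaAccessor(T const & obj)
{
    return getValue(obj);
}
} // namespace sketch_lib

namespace user_code  // hypothetical user namespace
{
struct MyCounter
{
    int data_value;  // public member, accessed by convention only
};

// The user refines the outer shim so that the value appears 1 larger.
inline int value(MyCounter const & obj)
{
    return obj.data_value + 1;
}
} // namespace user_code
```

For a MyCounter holding 41, sketch_lib::viaShim() returns 42 because the user's overload is found by argument-dependent lookup, while sketch_lib::viaAccessor() returns 41 because its call path never passes through value(). Overloading getValue() instead would affect both paths.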
Users may also be tempted to not specialise the accessor function at all, and
instead manipulate the private state of the object after creation—since the member
is public, the user can change the value instead of overriding access functions.
Gogol-Döring anticipates criticism of giving up the classic C/C++ protection model
but explains:
Global functions lack a protection model: They cannot be private nor [sic] protected, and
they cannot access private and protected members of a class. [...] The main reason for a
protection model is to prevent the programmer from accessing functions or data members
that are intended for internal use only. A simple substitution for this feature is to establish
clean naming conventions: We state that a ’_’-character within an identifier indicates that it
is for internal use only. [...] [We] decided to declare data members to be public, but only
functions that belong to the core implementation of [...] [a class] are allowed to access
them by convention. (Gogol-Döring, 2009)
15 https://www.openhub.net/p/seqan/factoids (note that these statistics cover the entire repository,
not just the library).
16 The obvious solution to this problem is to specialise as close as possible to the type, but in
absence of a language mechanism enforcing this, errors are easy to make—especially since the
function most visible to the user is the one “furthest” from the type.

Having a convention is better than not regulating access at all; however,
a convention is a poor replacement for a language feature. Research has shown
repeatedly that programming conventions are violated if they are not enforced via
technical measures (Hedin, 1996; Prause & Jarke, 2015). A cursory examination
of the most popular SeqAn applications shows that all of them make use of at
least some “private” library functions or access “private” data members of library
types. This is made easier by the fact that they are distributed in the same repository
(see Sect. 2.4.4) and breaking changes to “private” library interfaces are visible in
continuous integration (placing the burden of keeping the application in a functional
state on the library maintainers).
I conclude that the chosen approach to customisation may be the most liberal,
but neither the most user-friendly nor the one that guarantees the highest quality of
code. Best practice guides for library design recommend limiting customisability to
clearly specified customisation points and taking extra care in designing those (see
Sect. 3.7).
2.4.4 Integration
Integration covers the ability to use the library with existing projects, both on source
code level, i.e. the interaction with existing C++ types and functions, and on a
project level, i.e. the interplay between repositories, build systems and packaging
frameworks.
Source-Code Level Integration
Gogol-Döring mostly defines integration in terms of applying the extensibility
discussed above to many or all types of the standard library or a third party library:
The idea of global Interfaces imply the possibility of using shims, which make the
library adaptable both for additional external data structures and for built-in types. We
demonstrated in Sect. 6.1, that algorithms in SeqAn may be generic to an extend [sic] that
we called ‘library spanning programming’, because they can be used for data structures
from arbitrary sources, as soon as the necessary shims are available. SeqAn comes with
an adaptor for basic_string of the Standard library (and its iterators), as well as for
C-style strings, i.e. for zero-terminated char arrays. However, it is also quite possible to
integrate other third party libraries easily into SeqAn. (Gogol-Döring, 2009)

Code snippet 2.3: Overload for length() and std::basic_string from Gogol-
Döring (2009)
This approach works well for integrating a single type, but it scales very poorly
to the size of a library. Consider the example of adapting a std::basic_string
to work like a Sequence in SeqAn1/2 (Snippet 2.3). The interface consists of
over 30 functions and over 15 metafunctions (the exact number depends on some
special cases). If one were to add overloads/specialisations for all containers
from the standard library (std::basic_string, std::array, std::vector,
std::deque, std::list, std::forward_list), that amounts to over 270
functions/metafunctions and thousands of lines of “copy’n’paste” code. This is the
opposite of generic programming and prone to errors.
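To make the scaling problem concrete, the following sketch shows what adapting standard library containers through overloads looks like for a single interface function. It is not the snippet from Gogol-Döring (2009); the namespace sketch_adapt, the simplified String type and the function isEmpty() are hypothetical and were written for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

namespace sketch_adapt  // hypothetical namespace, not part of SeqAn
{
// Simplified stand-in for a library-owned sequence type.
template <typename TAlph>
struct String
{
    std::vector<TAlph> data;
};

// The interface function for the library's own type.
template <typename TAlph>
std::size_t length(String<TAlph> const & s)
{
    return s.data.size();
}

// Adaptor overload for std::basic_string.
template <typename TChar, typename TTraits, typename TAlloc>
std::size_t length(std::basic_string<TChar, TTraits, TAlloc> const & s)
{
    return s.length();
}

// A near-identical overload is needed for every further container.
template <typename T, typename TAlloc>
std::size_t length(std::vector<T, TAlloc> const & v)
{
    return v.size();
}

// A generic algorithm written against the global interface.
template <typename TSeq>
bool isEmpty(TSeq const & seq)
{
    return length(seq) == 0;
}
} // namespace sketch_adapt
```

The sketch covers one function and two containers. Repeating the pattern for roughly 45 functions and metafunctions across six containers leads to the more than 270 hand-written adaptors mentioned above.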
To complicate matters further, adding a specialisation for a type is entirely
orthogonal to any existing forms of refinement based on template subclassing (Sect. 2.3.2).
For example, if an algorithm behaves in a generic way for String<TAlph, TSpec>
and in a refined way for String<TAlph, Alloc<TSpec>>, one cannot cleanly
express that the overload for std::basic_string<TChar, TTraits, TAlloc>
should behave like one or the other; one needs to “copy’n’paste” code or change
the library code by inserting another delegation layer that can be called from library
code and the new overload.
As a result, SeqAn relied even more heavily on its own types and did not use
standard library types. In fact, support for standard library types was very poor for
a long time, something users often criticised.
The clean solution to these problems is using C++ concepts, a language feature
which I will introduce in Sect. 3.4. Within the limits of C++ 98, SFINAE could
have been used more often to facilitate refined overload resolution of functions
(and specialisation of type templates). SFINAE stands for substitution-failure-is-
not-an-error and describes how a failed template substitution does not result in
a compiler-error, but only in not considering that function template in the set
of possible overloads.17 This effect can be used to craft overloads specifically
for certain groups of types or based on certain conditions. Care needs to be
taken, though, because no two such overloads should remain in the valid set to
prevent ambiguity (there is no intrinsic notion of refinement/specialisation after the
resolution of SFINAE). Järvi et al. (2003) performed early research on this and
provide guidance on using SFINAE for controlling overloads.
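A minimal sketch of SFINAE-controlled overloads is given below. It uses std::enable_if from C++ 11 for brevity; under C++ 98 an equivalent helper template would have to be written by hand or taken from a third party library. The function name describe() is hypothetical.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Participates in overload resolution only for integral types. For any other
// T the substitution into the return type fails, and the template is silently
// dropped from the overload set instead of causing a compiler error.
template <typename T>
typename std::enable_if<std::is_integral<T>::value, std::string>::type
describe(T)
{
    return "integral";
}

// Participates in overload resolution only for floating point types.
template <typename T>
typename std::enable_if<std::is_floating_point<T>::value, std::string>::type
describe(T)
{
    return "floating point";
}
```

The two conditions are mutually exclusive, so exactly one overload remains for any accepted type. If the conditions overlapped, both overloads would stay in the valid set and the call would be ambiguous, which is the caveat raised above.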
17 https://en.cppreference.com/w/cpp/language/sfinae.

For SeqAn’s 2.1 release, I added support functions to have SeqAn recognise all
standard library containers using a combination of SFINAE and C macros. Due to
the abundance of unconstrained primary function templates (that always collide with
overloads that do not use template subclassing), this was not possible without many
library code changes.
In effect, I would argue that SeqAn1/2 was able to facilitate ad hoc specialisations
of single third party types sufficiently well but was not able to properly handle third
party libraries as a whole. Its reliance on self-provided types over standard library
types and its poor handling of the latter underlines this weakness.
Project-Level Integration
A dimension of integration that played a much smaller role in Gogol-Döring
(2009) is the integration on project level, and this includes the practical and legal
implications of (re-)distributing the library and the administrative overhead of
including it as a dependency and maintaining updates.
Legal Terms
The licence of the SeqAn library was originally the GNU Lesser General Public
License (Free Software Foundation, 2002). It was changed to the 3-clause BSD
License in SeqAn-1.3.18 Neither of the two licences requires that other software
integrated with SeqAn have the same licensing terms (no strong copyleft), but the
LGPL imposes some obligations regarding changes to the library itself. The BSD
licence, on the other hand, is considered one of the most permissive Free and
Open Source Software licences and requires only attribution.19
Project Hosting
When SeqAn1 was released, public source code hosting was not yet popular for
academic software. However, with the release of SeqAn-2.0, the project moved to
GitHub (see Fig. 2.5).20 Beyond the technical benefits of git as a version control
system, having SeqAn hosted there has increased the visibility of the project, the
number of external contributors and the ease with which it can be integrated in
other repositories (e.g. via git submodules). While these aspects are not crucial to
integration of the project, they most certainly help our users. Research suggests
that most academic projects would benefit from being publicly hosted in a similar
manner (Blischak et al., 2016).
18 https://www.freebsd.org/internal/software-license.html.
19 https://opensource.org/faq#permissive.
20 https://github.com.

Fig. 2.5 Screenshot of the SeqAn project page on GitHub. Screenshot taken by me; the look and
feel of the website and service is copyright © GitHub, Inc. All rights reserved

Build Systems
Since SeqAn is a header-only library, it is distributed as source code and cannot be
prebuilt as a shared object. This model is a direct consequence of the strong reliance
on templates. While it implies longer compilation times, it has the added benefit of
easy distribution and integration. In principle, it is sufficient to add SeqAn’s include
folder to the compiler’s include path to start using SeqAn; special build steps are not
required for the library. However, in practice, a few compiler flags do need to be set
to enable threading support, raise the C++ standard level of the compiler and detect
(optional) dependencies like ZLIB or BZip2. It is therefore advisable to use a build
system that analyses the environment and sets respective flags automatically. As of
version 1.2, SeqAn supports CMake,21 the de facto standard for cross platform C
and C++ projects (Wojtczyk & Knoll, 2008).
Semantic Versioning
An application that decides to add a dependency on a library needs to consider the
stability of the library, i.e. the costs and risks associated with an upgrade of the
library. Upgrades may seem optional, but often they are not, because new updates
provide necessary fixes or security patches (Raemaekers et al., 2014). One way to
clearly define the costs and risks associated with an upgrade is to follow semantic
versioning and assign version numbers accordingly (Preston-Werner, 2013). Two
core aspects of semantic versioning are having a clearly defined public API and
promising to the user that no breaking changes to that API will be introduced within
one major release (versioning is major.minor.patch). SeqAn1 already failed in
regard to the first requirement.22 The second aspect is the central paradigm of
semantic versioning. This is a notable restriction on the changes developers can
make to the project, but it provides a very strong guarantee for safe upgrades to
the user. It was introduced for all SeqAn2 versions, beginning with SeqAn-2.1, by
declaring all documented interfaces as part of the API.
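The guarantee given by semantic versioning can be expressed as a simple check on major.minor.patch triples. The sketch below is an illustration under stated assumptions: the type Version and the function is_compatible_upgrade() are hypothetical, and the special rules that semantic versioning defines for major version 0 and for pre-release tags are deliberately ignored.

```cpp
#include <cassert>

// A major.minor.patch version number. The member names avoid `major` and
// `minor`, which some system headers define as macros.
struct Version
{
    int major_no;
    int minor_no;
    int patch_no;
};

// True if code written against `required` can be expected to build against
// `available` under semantic versioning: same major release, and `available`
// is not older than `required`.
bool is_compatible_upgrade(Version const & required, Version const & available)
{
    if (available.major_no != required.major_no)
        return false;  // a new major release may contain breaking changes
    if (available.minor_no != required.minor_no)
        return available.minor_no > required.minor_no;
    return available.patch_no >= required.patch_no;
}
```

Under these rules an application that requires 2.1.0 may be built against 2.4.0, but not against 3.0.0 and not against an older release such as 2.0.5. The check is only meaningful if the library has a clearly defined public API in the first place.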
An argument brought forth against the necessity of semantic versioning by
previous maintainers of SeqAn is that header-only libraries (see above) can be
shipped together with the application and that there is no “forced upgrade”. This
ignores the possibility of depending on a new feature or security update as well
as the interdependencies of components in complex modern software; e.g. an
application might depend on two third party components that each depend on
SeqAn—if one updates its SeqAn requirement but not the other, the application
can become unbuildable. To underline the disruptive nature of breaking updates,
one should consider that OpenMS (Röst et al., 2016), a project closely affiliated
with SeqAn, still uses SeqAn-1.4.1, because upgrading was seen as too expensive
by the maintainers. The importance of semantic versioning for the long-term health
of a software library cannot be overestimated (Raemaekers et al., 2014), and the lack
of semantic versioning in previous versions of SeqAn was a major problem.
21 https://www.cmake.org.
22 One could derive from the rules quoted in Sect. 2.4.3 that any name not containing _ be part of
the API, but, since many such names are also not documented, it is not clear what the user should
rely on.

Framework-Style Repository
An unfortunate development that began with SeqAn-1.1 was bundling applications
together with SeqAn and later also the custom testing and documentation
infrastructure. This was likely one of the results of the lack of semantic versioning, because to
prevent regularly breaking the applications through changes in the library interface,
they were simply tried and tested together. Many drawbacks of this approach have
been discussed already. The relevant impact on integration was that developers who
wanted to build an application with the SeqAn-library needed to check out an entire
ecosystem of library + applications + infrastructure. This conflation on repository
level was mirrored also in the documentation where instructions on building “SeqAn
applications” were mixed with the “first steps guide” to programming with SeqAn.23
This first steps guide involved preparing a directory inside the repository by running
a provided Python script and then editing a file in that directory. There were no
instructions on adding SeqAn to an application with existing infrastructure and
the provided CMake module failed when used individually. Other implications
of the repository style are a confusing licensing situation (the applications each
have individual licence files with different terms) and difficulties in packaging (see
below). Some problems were fixed by myself and René Rahn after becoming the
responsible developers, but the general structure of the repository remained largely
the same.
Package Managers
Many users do not install their applications by downloading a package from the
author/vendor but by utilising a package manager. Package managers automate
the install, upgrade and removal of software packages, keep track of dependencies
between packages and help maintain a consistent and up-to-date state of the
entire set of installed software (Spinellis, 2012). Some operating systems provide
package managers by default, typically GNU/Linux distributions or BSD-based
operating systems (e.g. APT on Debian GNU/Linux24), but there are also stand-
alone managers that can be installed individually (e.g. Homebrew (Jackman et al.,
2016) on macOS or Conda (Grüning et al., 2018) which is popular with data
scientists). Being present in such managers has the advantage that developers can
easily get access to new SeqAn releases, but, most importantly, it is usually required
so that application developers can add their SeqAn-based applications to said
package managers. It is a quality indicator, because application developers know if
they can easily integrate a library with their application when they ship it. Initially,
SeqAn was only available in few managers and often only as part of an “application
bundle”. The lack of semantic versioning (see above) made packagers reluctant to
add a dependency on SeqAn, because it meant unforeseen breakage could happen.
After this was fixed in SeqAn-2.1 and the CMake support was brought into shape,
SeqAn2 was packaged for all major GNU/Linux distributions, FreeBSD, the two
macOS package managers Homebrew (Jackman et al., 2016) and MacPorts25 as
well as the domain-specific package managers Conda (Grüning et al., 2018) and
Easybuild (Hoste et al., 2012).
23 This conflation also happened on a project level: events were hosted for “application users”,
e.g. someone wanting to perform read-mapping with RazerS, and “library users”, i.e. developers
interested in creating a new application, at the same time and place. In my opinion this needlessly
complicated the situation for both audiences and greatly increased the notion that “SeqAn is
difficult to use”.
24 https://wiki.debian.org/PackageManagement.
Workflows
Bioinformatics applications are increasingly deployed as part of pipelines or
workflows (Curcin & Ghanem, 2008). Workflows allow researchers that are not
programmers to combine different applications and perform integrated analyses.
They can help improve structure and reproducibility, and they replace many previous
uses of shell-scripts and Makefiles (Leipzig, 2017). It is arguably the obligation
of the application developer to ensure that their programs run in a workflow
system and not the responsibility of a software library. However, the SeqAn project
anticipated that many of the developers using SeqAn would welcome help in
targeting workflow systems. Since SeqAn already comes with an argument parser
that handles command line arguments to the application and can also generate
help and manual pages, this was expanded to also generate descriptor files for
the KNIME workflow system (Berthold et al., 2007). An example workflow is
shown in Fig. 2.6. SeqAn chose to support KNIME, because the projects have a
long-standing history of cooperation and KNIME is a promising workflow system
with large industry support. It should, however, be noted that KNIME’s largest user
groups are in chemistry/cheminformatics, “business intelligence” and “predictive
analytics” (Warr, 2012). It is not unpopular among bioinformaticians, but most
comparative studies of workflow systems in bioinformatics and sequence analysis
focus on other workflow systems (Curcin & Ghanem, 2008; Leipzig, 2017). Support
for the Galaxy system (Afgan et al., 2018) was a frequently requested feature. More
recently, the Nextflow platform (Di Tommaso et al., 2017) has gained popularity
and there have been attempts to standardise workflow languages and the description
of applications/nodes within them as CommonWL (Amstutz et al., 2016). Future
versions of SeqAn should evaluate if more workflow systems can be targeted via
specialised description generators or an open standard like CommonWL.
25 https://www.macports.org.

Fig. 2.6 A KNIME workflow that includes SeqAn applications. Image is part of the SeqAn1/2
documentation, see Sect. A.3.1

2.4.5 Summary
SeqAn1 and SeqAn2 were successful and influential C++ libraries in the domain
of sequence analysis. Not only were groundbreaking applications built with the
help of SeqAn, but also prototypes and small applets for use in workflows.
The performance of SeqAn was superb, although there was no coherent strategy for
attaining the best possible speed in the context of an increasingly parallel execution
environment.
The library strove to be as simple as possible, but the use of exotic programming
techniques led to a very steep learning curve. The academic nature of the project and
regular changes in its technical leadership led to a lack of consistent policy
(enforcement) and direction. This in turn led to an ever-increasing size and complexity of the
code base, further raising the bar for understanding and contributing to the library.
In addition to having to understand the code base itself, SeqAn forced contributors
to learn custom tooling, because it did/could not rely on industry standard tooling.
SeqAn anticipated many developments in the C++ language but had to rely on the
now-old C++ 98 standard. It later adopted certain convenience features from C++ 11
and C++ 14, but the general design still reflected C++ 98 strongly and did not take the
many structural advantages of Modern C++ into account.
Documentation of the library was always above average, especially for an
academic project. However, there was no policy (enforcement) that ensured that (at
least) all public entities were documented. Considering the extraordinary complexity
of the library, better documentation would have certainly been helpful.
SeqAn allowed for a high degree of customisation in regard to small changes, such as
adapting a single type or overriding the behaviour of almost any library routine
(“hacks”). However, the manner of customisation was obscure and the potential for
error high. It had poor support for adapting large numbers of types from third-party
libraries, and standard library types were always second-class citizens. Not relying
on standard library types and functions implies a lot of duplicated code and logic, and
the additional overhead for users of having to re-learn equivalent functionality.
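The point about standard library types can be made concrete with a minimal sketch. It is not SeqAn code and does not reproduce any SeqAn interface; the function name `count_gc` is hypothetical and chosen only for illustration. It shows a routine written against the standard iteration protocol (`std::begin`/`std::end`), which therefore accepts standard containers directly, without any per-type adaptation layer:

```cpp
#include <cstddef>
#include <iterator>
#include <list>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical helper for illustration; not part of any SeqAn release.
// Counts the characters 'G' and 'C' in any container that supports
// std::begin/std::end and whose elements are convertible to char.
template <typename sequence_t>
std::size_t count_gc(sequence_t const & sequence)
{
    // Element type of the container, with reference and const removed.
    using value_t = typename std::decay<decltype(*std::begin(sequence))>::type;
    static_assert(std::is_convertible<value_t, char>::value,
                  "count_gc requires elements convertible to char");

    std::size_t count = 0;
    for (char const c : sequence)
        if (c == 'G' || c == 'C')
            ++count;
    return count;
}
```

Under these assumptions, the same function can be called with a `std::string`, a `std::vector<char>` or a `std::list<char>`. No container-specific overloads or specialisations are needed, which is the kind of duplication the paragraph above describes.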
At the project level, application development and library development were heavily
conflated, needlessly complicating the maintenance, distribution
and packaging of the library. Many best practices in software development were
introduced by René Rahn and myself in the last versions of SeqAn2. These improved
the quality of changes and new additions to the library, but ultimately it was
decided that the technical debt was too large to continue improving the library
incrementally and that a more radical re-design was necessary.
