Twente Research and Education on Software Engineering,
Department of Computer Science,
Faculty of Electrical Engineering, Mathematics and Computer Science,
University of Twente
Automatic Derivation of
Semantic Properties in .NET
M.D.W. van Oudheusden
Enschede, August 28, 2006
graduation committee
prof. dr. ir. M. Aksit
dr. ir. L.M.J. Bergmans
ir. P.E.A. Dürr
Thesis
Abstract
The Compose* .NET project offers aspect-oriented programming for Microsoft .NET languages
through the composition filters model. Most of the information used by Compose* (and other
Aspect-Oriented Programming implementations) is based on structural and syntactical prop-
erties. For instance, the join point selection is largely based on naming conventions and hier-
archical structures. Furthermore, the analysis tasks of Compose* would benefit from the
availability of more semantic data, such as the behavior of the reified message used by the meta
filters or the side effects generated by the use of aspects. Fine-grained join points, that is, join
points at the instruction level, are currently not possible because the information about the
inner workings of a function is not present.
The automatic derivation of semantic properties is an attempt to deduce the behavior of a
.NET application with the use of existing tools. Not only Compose* can benefit from this, but
also other applications, such as finding design patterns in the source code, reverse engineering
design documentation, generating pre- and postconditions, verifying software contracts,
checking behavioral subtyping, or any other type of static analysis.
The Semantic Analyzer is the implementation of a static source code analyzer, which converts
instructions into semantic actions and stores these actions inside a metamodel, a high-level
representation of the original code. In order to do this, we first have to examine the different
kinds of semantic programming constructs and how these semantics are represented in the
target language, namely the Microsoft Intermediate Language.
Different tools are available to read and parse the Intermediate Language; Microsoft
Phoenix was selected as the main IL reader and converter, called a Semantic Extractor.
The extractors convert an object-oriented language to the language-independent metamodel.
In order to search in this model, a query mechanism based on native queries was devel-
oped. Native queries are a type-safe, compile-time checked, and object-oriented way to express
queries directly as Java or C# methods.
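The idea behind native queries can be sketched in plain Java: the selection criterion is an ordinary, compile-time-checked method instead of a query string. The `Person` class and the `select()` helper below are hypothetical illustrations of the concept, not the actual Semantic Analyzer API.

```java
import java.util.ArrayList;
import java.util.List;

public class NativeQuerySketch {
    // Hypothetical example class to query over.
    static class Person {
        final String name;
        final int age;
        Person(String name, int age) { this.name = name; this.age = age; }
    }

    // The query predicate is expressed as a regular, typed method.
    interface Predicate<T> { boolean match(T candidate); }

    // Applies the predicate to every candidate; a real native-query engine
    // could instead analyze the predicate's bytecode to optimize the search.
    static <T> List<T> select(List<T> candidates, Predicate<T> p) {
        List<T> result = new ArrayList<T>();
        for (T c : candidates) {
            if (p.match(c)) result.add(c);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Person> people = new ArrayList<Person>();
        people.add(new Person("Ada", 36));
        people.add(new Person("Ben", 17));

        // The query is plain Java: renaming the 'age' field would be caught
        // by the compiler, unlike a string-based query language.
        List<Person> adults = select(people, new Predicate<Person>() {
            public boolean match(Person p) { return p.age >= 18; }
        });
        System.out.println(adults.size());  // prints 1
    }
}
```

Because the query is ordinary code, a mistyped field or method name fails at compile time rather than at runtime, which is the type-safety benefit referred to above.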
A number of tools, such as plugins, were created that use native queries on the model to
satisfy some of these semantic information needs. The automatic extraction of the reified
message behavior and of the resource usage are some of the functionalities now added to
Compose* by using the Semantic Analyzer.
Automatically deriving semantic properties by analyzing the source code and the semantics
of the source language certainly cannot solve all problems. Some intent is present not in
the code but in the underlying system, so obtaining a complete behavioral description of a
function is not possible. However, the Semantic Analyzer offers developers an extensive
metamodel with basic semantic actions, control flow information, and operand data to reason
about the possible intended behavior of the code.
Acknowledgements
Research and development of the Semantic Analyzer and the actual writing of this thesis
were an interesting, but time-consuming, process. Of course, there are some people I would
like to thank for helping me during this time.
First, my graduation committee, Lodewijk Bergmans and Pascal Dürr, for their insights in AOP
in general and Compose* in particular. Their remarks, suggestions, and questions helped me
a great deal in creating this work.
Also the other Compose* developers for their work on the whole project and the tips and tricks
I received from them when hunting down bugs and trying to get LaTeX to do what I wanted.
Last, but certainly not least, I would like to thank my parents for making this all happen and
always supporting me in my decisions.
Reading Guide
A short guide is presented here to help you in reading this thesis.
The first three chapters are common chapters written by the Compose* developers. Chapter 1
presents a general introduction to Aspect-Oriented Software Development and the evolution
of programming languages. The next chapter, chapter 2, provides more information about
Compose*, which is an implementation of the composition filters approach. If you are unfa-
miliar with either AOSD or the AOP solution Compose*, then read the first two chapters.
Chapter 3 describes the .NET Framework, the language platform used in the implementation.
Chapter 6 will present more details about the language, so for background information of the
.NET Framework, read chapter 3 first.
The reasons why this assignment was carried out are discussed in the motivation chapter,
chapter 4. To learn more about semantics, take a look at chapter 5. How semantic programming
constructs are represented in the .NET Intermediate Language and how this language can
be accessed is described in chapter 6.
The actual design of the Semantic Analyzer is described in chapter 7, and chapter 8 explains
how to use the analyzer and provides some practical examples.
Finally, the evaluation and conclusions are presented in chapter 9, as are related and future
work.
Code examples in the C# language are written for version 2.0 of the Microsoft .NET Framework
unless stated otherwise. The algorithms use a pseudo-C# language for their representation.
Class diagrams were created with the Class Designer of Visual Studio 2005.
More information about Compose* and the Semantic Analyzer can be found at the Compose*
project website1. The source code for the Semantic Analyzer is available in the CVS archives of
SourceForge2.
1 http://composestar.sf.net/
2 http://composestar.cvs.sourceforge.net/composestar/home/mdwvanoudheusden/code/
Contents
Abstract i
Acknowledgements iii
Reading Guide v
List of Figures xi
List of Tables xiii
List of Listings xv
List of Algorithms xvii
Nomenclature xix
1 Introduction to AOSD 1
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Traditional Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 AOP Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.3.1 AOP Composition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.3.2 Aspect Weaving . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.3.2.1 Source Code Weaving . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.3.2.2 Intermediate Language Weaving . . . . . . . . . . . . . . . . . . 6
1.3.2.3 Adapting the Virtual Machine . . . . . . . . . . . . . . . . . . . . 7
1.4 AOP Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.4.1 AspectJ Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.4.2 Hyperspaces Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.4.3 Composition Filters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2 Compose* 12
2.1 Evolution of Composition Filters . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.2 Composition Filters in Compose* . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.3 Demonstrating Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.3.1 Initial Object-Oriented Design . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.3.2 Completing the Pacman Example . . . . . . . . . . . . . . . . . . . . . . . 17
2.3.2.1 Implementation of Scoring . . . . . . . . . . . . . . . . . . . . . . 17
2.3.2.2 Implementation of Dynamic Strategy . . . . . . . . . . . . . . . . 18
2.4 Compose* Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.4.1 Integrated Development Environment . . . . . . . . . . . . . . . . . . . . . 21
2.4.2 Compile Time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.4.3 Adaptation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.4.4 Runtime . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.5 Platforms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.5.1 Java . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.5.2 C . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.5.3 .NET . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.6 Features Specific to Compose* . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3 Introduction to the .NET Framework 24
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.2 Architecture of the .NET Framework . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.2.1 Version 2.0 of .NET . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.3 Common Language Runtime . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.3.1 Java VM vs .NET CLR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.4 Common Language Infrastructure . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.5 Framework Class Library . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.6 Common Intermediate Language . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
4 Motivation 34
4.1 Current State of Compose*/.NET . . . . . . . . . . . . . . . . . . . . . . . . . . 34
4.1.1 Selecting Match Points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
4.1.2 Program Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
4.1.3 Fine Grained Join Points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
4.2 Providing Semantical Information . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
4.3 General Design Goals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
5 Semantics 39
5.1 What is Semantics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
5.2 Semantics of Software . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
5.2.1 Static and Dynamic Program Analysis . . . . . . . . . . . . . . . . . . . . . 40
5.2.2 Software Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
5.3 Semantical Statements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
5.3.1 Value Assignment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
5.3.2 Comparison of Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
5.3.3 Branching Statements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
5.3.4 Method Calling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
5.3.5 Exception Handling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
5.3.6 Instantiation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
5.3.7 Type Checking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
5.3.8 Data Conversion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
5.4 Program Semantics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
5.4.1 Slicing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
5.4.2 Method Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
6 Analyzing the Intermediate Language 49
6.1 Inside the IL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
6.1.1 Modules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
6.1.2 Metadata . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
6.1.3 Assemblies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
6.1.4 Classes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
6.1.5 Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
6.1.6 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
6.1.7 Method Body . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
6.1.8 IL Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
6.1.8.1 Flow Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
6.1.8.2 Arithmetical Instructions . . . . . . . . . . . . . . . . . . . . . . . 57
6.1.8.3 Loading and Storing . . . . . . . . . . . . . . . . . . . . . . . . . . 58
6.1.8.4 Method Calling . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
6.1.8.5 Exception Handling . . . . . . . . . . . . . . . . . . . . . . . . . . 59
6.1.9 Normalization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
6.1.10 Custom Attributes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
6.2 Access the IL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
6.2.1 How to Read IL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
6.2.2 Reflection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
6.2.3 Mono Cecil . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
6.2.4 PostSharp . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
6.2.5 RAIL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
6.2.6 Microsoft Phoenix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
7 Design and Implementation 73
7.1 General Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
7.1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
7.1.2 Design Limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
7.1.3 Control Flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
7.1.4 Building Blocks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
7.1.4.1 Semantic Extractor . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
7.1.4.2 Semantic Metamodel . . . . . . . . . . . . . . . . . . . . . . . . . 77
7.1.4.3 Semantic Database . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
7.1.4.4 Plugins . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
7.1.5 Guidelines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
7.2 Semantic Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
7.2.1 Overall Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
7.2.2 From Instructions to Actions . . . . . . . . . . . . . . . . . . . . . . . . . . 80
7.2.3 Dealing with Operands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
7.2.4 Type Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
7.2.5 Model Layout . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
7.2.5.1 SemanticContainer . . . . . . . . . . . . . . . . . . . . . . . . . . 83
7.2.5.2 SemanticClass . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
7.2.5.3 SemanticOperation . . . . . . . . . . . . . . . . . . . . . . . . . . 84
7.2.5.4 SemanticOperand and Subclasses . . . . . . . . . . . . . . . . . . 89
7.2.5.5 SemanticBlock . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
7.2.5.6 SemanticAction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
7.2.5.7 SemanticType . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
7.2.5.8 SemanticAttributes . . . . . . . . . . . . . . . . . . . . . . . . . . 93
7.2.6 Flow graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
7.3 Extracting Semantics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
7.3.1 Semantic Extractor Class . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
7.3.2 Mono Cecil Provider . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
7.3.3 PostSharp Provider . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
7.3.4 RAIL Provider . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
7.3.5 Microsoft Phoenix Provider . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
7.4 Querying the Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
7.4.1 Semantic Database . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
7.4.2 What to Retrieve . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
7.4.3 Query Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
7.4.3.1 Predicate Language . . . . . . . . . . . . . . . . . . . . . . . . . . 107
7.4.3.2 Resource Description Framework . . . . . . . . . . . . . . . . . . 107
7.4.3.3 Traverse Over Methods . . . . . . . . . . . . . . . . . . . . . . . . 107
7.4.3.4 Object Query Language . . . . . . . . . . . . . . . . . . . . . . . . 108
7.4.3.5 Simple Object Database Access . . . . . . . . . . . . . . . . . . . 108
7.4.3.6 Native Queries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
7.4.3.7 Natural Language Queries . . . . . . . . . . . . . . . . . . . . . . 110
7.4.4 Native Queries in Detail . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
8 Using the Semantic Analyzer 115
8.1 Semantic Database . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
8.2 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
8.3 Plugins . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
8.3.1 ReifiedMessage Extraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
8.3.2 Resource Usage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
8.3.3 Export Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
8.3.4 Natural Language . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
8.4 Integration with Compose* . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
9 Conclusion, Related, and Future Work 126
9.1 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
9.1.1 Microsoft Spec# . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
9.1.2 SOUL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
9.1.3 SetPoints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
9.1.4 NDepend . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
9.1.5 Formal Semantics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
9.2 Evaluation and Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
9.3 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
9.3.1 Extractors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
9.3.2 Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
9.3.3 Graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
9.3.4 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
9.3.5 Integration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
Bibliography 135
A CIL Instruction Set 142
B Evaluation Stack Types 149
C Semantic Extractor Configuration File 150
D Semantic Actions 151
E Semantic Types 153
F SEMTEX Generated File 154
List of Figures
1.1 Dates and ancestry of several important languages . . . . . . . . . . . . . . . . . . 2
2.1 Components of the composition filters model . . . . . . . . . . . . . . . . . . . . . 14
2.2 UML class diagram of the object-oriented Pacman game . . . . . . . . . . . . . . 16
2.3 Overview of the Compose* architecture . . . . . . . . . . . . . . . . . . . . . . . 20
3.1 Context of the .NET framework . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.2 Relationships in the CTS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.3 Main components of the CLI and their relationships . . . . . . . . . . . . . . . . . 30
3.4 From source code to machine code . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
6.1 Structure of a managed executable module . . . . . . . . . . . . . . . . . . . . . . 50
6.2 Assembly containing multiple files . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
6.3 Different kinds of methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
6.4 Method body structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
6.5 Graphical User Interface of ILDASM . . . . . . . . . . . . . . . . . . . . . . . . . . 62
6.6 Lutz Roeder’s Reflector . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
6.7 Microsoft FxCop . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
6.8 Platform of Phoenix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
6.9 Control flow graph in Phoenix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
7.1 Process overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
7.2 Control flow of the console application with a plugin . . . . . . . . . . . . . . . . 76
7.3 Loop represented as blocks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
7.4 Structure of the metamodel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
7.5 Semantic Item and direct derived classes . . . . . . . . . . . . . . . . . . . . . . . . 85
7.6 SemanticUnit and child classes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
7.7 Visitor pattern in the model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
7.8 SemanticContainer and SemanticClass classes . . . . . . . . . . . . . . . . . . 87
7.9 SemanticOperation class . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
7.10 SemanticOperand class and derived classes . . . . . . . . . . . . . . . . . . . . . . 89
7.11 SemanticBlock class with collection of SemanticAction . . . . . . . . . . . . . . 90
7.12 The SemanticAction class with supporting types . . . . . . . . . . . . . . . . . . 91
7.13 SemanticType class . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
7.14 SemanticAttribute class . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
7.15 The flow classes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
7.16 Semantic Extractor Classes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
7.17 SemanticDatabaseContainer class . . . . . . . . . . . . . . . . . . . . . . . . . . 105
7.18 ExtendedList class . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
8.1 Windows Forms Semantic Analyzer application . . . . . . . . . . . . . . . . . . . 118
8.2 Plugin interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
List of Tables
5.1 Comparison operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
6.1 Arithmetical operations in IL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
6.2 Bitwise and shift operations in IL . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
6.3 Phoenix unit hierarchy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
6.4 Phoenix instruction forms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
7.1 Assembly naming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
A.1 CIL instruction set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148
B.1 Evaluation Stack types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
D.1 Available semantic actions kinds . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
E.1 Available semantic common types . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
Listings
1.1 Modeling addition, display, and logging without using aspects . . . . . . . . . . 3
(a) Addition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
(b) CalcDisplay . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 Modeling addition, display, and logging with aspects . . . . . . . . . . . . . . . . 5
(a) Addition concern . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
(b) Tracing concern . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.3 Example of dynamic crosscutting in AspectJ . . . . . . . . . . . . . . . . . . . . . 8
1.4 Example of static crosscutting in AspectJ . . . . . . . . . . . . . . . . . . . . . . . . 9
1.5 Creation of a hyperspace . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.6 Specification of concern mappings . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.7 Defining a hypermodule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.1 Abstract concern template . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.2 DynamicScoring concern in Compose* . . . . . . . . . . . . . . . . . . . . . . . 18
2.3 Implementation of class Score . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.4 DynamicStrategy concern in Compose* . . . . . . . . . . . . . . . . . . . . . . 20
3.1 Adding example in IL code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.2 Adding example in the C# language . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.3 Adding example in the VB.NET language . . . . . . . . . . . . . . . . . . . . . . . 33
4.1 Getter and Setter examples in C# .NET . . . . . . . . . . . . . . . . . . . . . . . . . 35
5.1 Assignment examples in C# . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
5.2 Comparison examples in C# . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
5.3 Exception handling example in C# . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
5.4 Method AssignWhenMoreThenOne in C# .NET . . . . . . . . . . . . . . . . . . . 46
5.5 Method AssignWhenMoreThenOne in VB .NET . . . . . . . . . . . . . . . . . . . 46
5.6 Method AssignWhenMoreThenOne in Borland Delphi . . . . . . . . . . . . . . . 47
5.7 Method AssignWhenMoreThenOne in Common Intermediate Language . . . . . 47
6.1 Syntax of a class definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
6.2 Syntax of a field definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
6.3 Example of a field definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
6.4 Syntax of a field definition with default value . . . . . . . . . . . . . . . . . . . . . 53
6.5 Example of a field definition with default value . . . . . . . . . . . . . . . . . . . . 53
6.6 Syntax of a method definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
6.7 Control flow examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
6.8 Constant loading instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
6.9 Condition check followed by a branching instruction . . . . . . . . . . . . . . . . 58
6.10 Method call example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
6.11 Exception handling in label form . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
6.12 Exception handling in label form example . . . . . . . . . . . . . . . . . . . . . . . 59
6.13 Exception handling in scope form . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
6.14 Stack based expression in the Common Intermediate Language . . . . . . . . . . 60
6.15 Custom attribute syntax in IL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
6.16 Custom attribute example in IL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
6.17 Example of a custom attribute in C# . . . . . . . . . . . . . . . . . . . . . . . . . . 61
6.18 Reflection example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
6.19 Cecil get types example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
6.20 PostSharp get body instruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
6.21 Phoenix phase execute example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
7.1 Calling the SemanticExtractor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
7.2 For loop in C#.NET . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
7.3 Expression in C#.NET . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
7.4 Expression in IL code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
7.5 Semantical representation of the expression . . . . . . . . . . . . . . . . . . . . . . 81
7.6 Part of the Cecil Instruction Visitor . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
7.7 Using the instruction stream in PostSharp . . . . . . . . . . . . . . . . . . . . . . . 101
7.8 Loading the assembly using Phoenix . . . . . . . . . . . . . . . . . . . . . . . . . . 102
7.9 Starting a phase for a function using the Phoenix Extractor . . . . . . . . . . . . . 102
7.10 SODA example in C#.NET . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
7.11 LINQ query example in C#.NET . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
7.12 LINQ query example in Java . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
7.13 LINQ query example in C#.NET 1.1 . . . . . . . . . . . . . . . . . . . . . . . . . . 109
7.14 LINQ query examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
(a) query expressions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
(b) lambda expressions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
7.15 Query function signature . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
7.16 Predicate matching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
7.17 Return all distinct operations containing actions assigning a value to a field
named "name" . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
8.1 Search for all call actions and sort by operation name . . . . . . . . . . . . . . . . 115
8.2 Find all operations using a field named value as their destination operand . . . . 116
8.3 Group jump labels by operation name . . . . . . . . . . . . . . . . . . . . . . . . . 117
8.4 Retrieve all the assignments where an integer is used . . . . . . . . . . . . . . . . 117
8.5 Find operations using a ReifiedMessage . . . . . . . . . . . . . . . . . . . . . . . . 120
8.6 Find the argument using a ReifiedMessage . . . . . . . . . . . . . . . . . . . . . . 120
8.7 Retrieve all the calls to methods of the ReifiedMessage argument . . . . . . . . . 120
8.8 Retrieve other methods which should be analyzed after a proceed call . . . . . . 121
9.1 Selecting getters in SOUL using Prolog . . . . . . . . . . . . . . . . . . . . . . . . . 127
9.2 Examples of CQL queries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
C.1 Contents of the app.config file . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
F.1 Part of the SEMTEX file for the pacman example . . . . . . . . . . . . . . . . . . . 154
List of Algorithms
1 GenerateSemanticControlFlow
2 Connect flow edges
3 Start DetermineControlDependency
4 DetermineControlDependency
5 Determine Flow Paths
6 Determine Access Levels
7 Optimization of Semantic Blocks
Nomenclature
AOP Aspect-Oriented Programming
API Application Programming Interface
AST Abstract Syntax Tree
BLOB Binary Large Object
CF Composition Filters
CIL Common Intermediate Language
CLI Common Language Infrastructure
CLR Common Language Runtime
CLS Common Language Specification
CQL Code Query Language
CTS Common Type System
FCL Framework Class Library
GUI Graphical User Interface
IL Intermediate Language
JIT Just-in-time
JVM Java Virtual Machine
MLI Meta Level Interface
OOP Object-Oriented Programming
OpCode Operation Code
OQL Object Query Language
PDA Personal Digital Assistant
RDF Resource Description Framework
SDK Software Development Kit
SOAP Simple Object Access Protocol
SODA Simple Object Database Access
SQL Structured Query Language
UML Unified Modeling Language
URI Uniform Resource Identifier
VM Virtual Machine
WSDL Web Services Description Language
XML eXtensible Markup Language
CHAPTER 1
Introduction to AOSD
The first two chapters were originally written by seven M.Sc. students [39, 24, 80, 11, 72,
37, 10] at the University of Twente. The chapters have been rewritten for use in the following
theses [79, 16, 75, 42, 23, 41, 71]. They serve as a general introduction to Aspect-Oriented
Software Development and Compose* in particular.
1.1 Introduction
The goal of software engineering is to solve a problem by implementing a software system. The
things of interest are called concerns. They exist at every level of the engineering process. A re-
current theme in engineering is that of modularization: separation and localization of concerns.
The goal of modularization is to create maintainable and reusable software. A programming
language is used to implement concerns.
Fifteen years ago the dominant programming language paradigm was procedural program-
ming. This paradigm is characterized by the use of statements that update state variables.
Examples are Algol-like languages such as Pascal, C, and Fortran.
Other programming paradigms are the functional, logic, object-oriented, and aspect-oriented
paradigms. Figure 1.1 summarizes the dates and ancestry of several important languages [83].
Every paradigm uses a different modularization mechanism for separating concerns into mod-
ules.
Functional languages try to solve problems without resorting to variables. These languages are
entirely based on functions over lists and trees. Lisp and Miranda are examples of functional
languages.
A logic language is based on a subset of mathematical logic. The computer is programmed to
infer relationships between values, rather than to compute output values from input values.
Prolog is currently the most used logic language [83].
[Figure 1.1: a timeline (1950–2005) with "influenced by" edges relating procedural and concurrent languages (Fortran, Cobol, Algol-60, Algol-68, BASIC, Pascal, C, Ada, VB), functional languages (Lisp, ML, Miranda), logic languages (Prolog), object-oriented languages (Simula, Smalltalk, C++, Sina, Sina/st, Java, C#), and aspect-oriented languages (AspectJ, Hyper/J, Compose*).]
Figure 1.1: Dates and ancestry of several important languages
A shortcoming of procedural programming is that global variables can potentially be accessed
and updated by any part of the program. This can result in unmanageable programs because no
module that accesses a global variable can be understood independently from other modules
that also access that global variable.
The Object-Oriented Programming (OOP) paradigm improves modularity by encapsulating
data with methods inside objects. The data may only be accessed indirectly, by calling the
associated methods. Although the concept appeared in the seventies, it took twenty years to
become popular [83]. The most well known object-oriented languages are C++, Java, C#, and
Smalltalk.
The hard part about object-oriented design is decomposing a system into objects. The task
is difficult because many factors come into play: encapsulation, granularity, dependency,
adaptability, reusability, and others. They all influence the decomposition, often in conflict-
ing ways [31].
Existing modularization mechanisms typically support only a small set of decompositions and
usually only a single dominant modularization at a time. This is known as the tyranny of the
dominant decomposition [74]. A specific decomposition limits the ability to implement other
concerns in a modular way. For example, OOP modularizes concerns in classes and only fixed
relations are possible. Implementing a concern in a class might prevent another concern from
being implemented as a class.
Aspect-Oriented Programming (AOP) is a paradigm that solves this problem.
AOP is commonly used in combination with OOP but can be applied to other paradigms as
well. The following sections introduce an example to demonstrate the problems that may arise
with OOP and show how AOP can solve this. Finally, we look at three particular AOP method-
ologies in more detail.
1 public class Add extends Calculation{
2
3 private int result;
4 private CalcDisplay calcDisplay;
5 private Tracer trace;
6
7 Add() {
8 result = 0;
9 calcDisplay = new CalcDisplay();
10 trace = new Tracer();
11 }
12
13 public void execute(int a, int b) {
14 trace.write("void Add.execute(int, int)");
15 result = a + b;
16 calcDisplay.update(result);
17 }
18
19 public int getLastResult() {
20 trace.write("int Add.getLastResult()");
21 return result;
22 }
23 }
(a) Addition
1 public class CalcDisplay {
2 private Tracer trace;
3
4 public CalcDisplay() {
5 trace = new Tracer();
6 }
7
8 public void update(int value){
9 trace.write("void CalcDisplay.update(int)");
10 System.out.println("Printing new value of calculation: "+value);
11 }
12 }
(b) CalcDisplay
Listing 1.1: Modeling addition, display, and logging without using aspects
1.2 Traditional Approach
Consider an application containing an object Add and an object CalcDisplay. Add inherits from
the abstract class Calculation and implements its method execute(a, b). It performs the
addition of two integers. CalcDisplay receives an update from Add if a calculation is finished
and prints the result to screen. Suppose all method calls need to be traced. The objects use a
Tracer object to write messages about the program execution to screen. This is implemented
by a method called write. Three concerns can be recognized: addition, display, and tracing.
The implementation might look something like Listing 1.1.
From our example, we recognize two forms of crosscutting: code tangling and code scattering.
The addition and display concerns are implemented in classes Add and CalcDisplay respec-
tively. Tracing is implemented in the class Tracer, but also contains code in the other two
classes (lines 5, 10, 14, and 20 in (a) and 2, 5, and 9 in (b)). If a concern is implemented across
several classes it is said to be scattered. In the example of Listing 1.1 the tracing concern is
scattered.
Usually a scattered concern involves code replication. That is, the same code is implemented
a number of times. In our example the classes Add and CalcDisplay contain similar tracing
code.
In class Add the code for the addition and tracing concerns is intermixed. In class
CalcDisplay the code for the display and tracing concerns is intermixed. If more than
one concern is implemented in a single class, they are said to be tangled. In our example the
addition and tracing concerns are tangled. The display and tracing concerns are also tangled.
Crosscutting code has the following consequences:
Code is difficult to change
Changing a scattered concern requires us to modify the code in several places. Making
modifications to a tangled concern class requires checking for side-effects with all existing
crosscutting concerns;
Code is harder to reuse
To reuse an object in another system, it is necessary to either remove the tracing code or
reuse the (same) tracer object in the new system;
Code is harder to understand
Tangled code makes it difficult to see which code belongs to which concern.
1.3 AOP Approach
To solve the problems with crosscutting, several techniques are being researched that attempt
to increase the expressiveness of the OO paradigm. Aspect-Oriented Programming (AOP) in-
troduces a modular structure, the aspect, to capture the location and behavior of crosscutting
concerns. Examples of Aspect-Oriented languages are Sina, AspectJ, Hyper/J, and Compose*.
A special syntax is used to specify aspects and the way in which they are combined with regular
objects. The fundamental goals of AOP are twofold [34]: first, to provide a mechanism to
express concerns that crosscut other components; second, to use this description to allow for
the separation of concerns.
Join points are well-defined places in the structure or execution flow of a program where ad-
ditional behavior can be attached. The most common join points are method calls. Pointcuts
describe a set of join points. This allows us to execute behavior at many places in a program by
one expression. Advice is the behavior executed at a join point.
In the example of Listing 1.2 the class Add does not contain any tracing code and only imple-
ments the addition concern. Class CalcDisplay also does not contain tracing code. In our
example the tracing aspect contains all the tracing code. The pointcut tracedCalls specifies
at which locations tracing code is executed.
The crosscutting concern is explicitly captured in aspects instead of being embedded within
the code of other objects. This has several advantages over the previous code.
Aspect code can be changed
Changing aspect code does not influence other concerns;
Aspect code can be reused
The coupling of aspects is done by defining pointcuts. In theory, this low coupling allows
for reuse. In practice reuse is still difficult;
Aspect code is easier to understand
A concern can be understood independent of other concerns;
Aspect pluggability
Enabling or disabling concerns becomes possible.
1 public class Add extends Calculation{
2 private int result;
3 private CalcDisplay calcDisplay;
4
5 Add() {
6 result = 0;
7 calcDisplay = new CalcDisplay();
8 }
9
10 public void execute(int a, int b) {
11 result = a + b;
12 calcDisplay.update(result);
13 }
14
15 public int getLastResult() {
16 return result;
17 }
18 }
(a) Addition concern
1 aspect Tracing {
2 Tracer trace = new Tracer();
3
4 pointcut tracedCalls():
5 call(* (Calculation+).*(..)) ||
6 call(* CalcDisplay.*(..));
7
8 before(): tracedCalls() {
9 trace.write(thisJoinPoint.getSignature().toString());
10 }
11 }
(b) Tracing concern
Listing 1.2: Modeling addition, display, and logging with aspects
1.3.1 AOP Composition
AOP composition can be either symmetric or asymmetric. In the symmetric approach every
component can be composed with any other component. This approach is followed by e.g.
Hyper/J.
In the asymmetric approach, the base program and aspects are distinguished. The base pro-
gram is composed with the aspects. This approach is followed by e.g. AspectJ (covered in more
detail in the next section).
1.3.2 Aspect Weaving
The integration of components and aspects is called aspect weaving. There are three approaches
to aspect weaving. The first and second approach rely on adding behavior in the program,
either by weaving the aspect in the source code, or by weaving directly in the target language.
The target language can be intermediate language (IL) or machine code. Examples of IL are Java
byte code and Common Intermediate Language (CIL). The remainder of this chapter considers
only intermediate language targets. The third approach relies on adapting the virtual machine.
Each method is explained briefly in the following sections.
1.3.2.1 Source Code Weaving
The source code weaver combines the original source with aspect code. It interprets the defined
aspects and combines them with the original source, generating input for the native compiler.
For the native compiler there is no difference between source code with and without aspects.
The compiler then generates intermediate or machine language output (depending on the
compiler type).
The advantages of using source code weaving are:
High-level source modification
Since all modifications are done at source code level, there is no need to know the target
(output) language of the native compiler;
Aspect and original source optimization
First the aspects are woven into the source code and hereafter compiled by the native
compiler. The produced target language has all the benefits of the native compiler opti-
mization passes. However, optimizations specific to exploiting aspect knowledge are not
possible;
Native compiler portability
The native compiler can be replaced by any other compiler as long as it has the same
input language. Replacing the compiler with a newer version or another target language
can be done with little or no modification to the aspect weaver.
However, the drawbacks of source code weaving are:
Language dependency
Source code weaving is written explicitly for the syntax of the input language;
Limited expressiveness
Aspects are limited to the expressive power of the source language. For example, when
using source code weaving, it is not possible to add multiple inheritance to a single in-
heritance language.
1.3.2.2 Intermediate Language Weaving
Weaving aspects through an intermediate language gives more control over the executable
program and solves some of the issues identified in subsubsection 1.3.2.1 on source code weaving.
Weaving at this level allows for creating combinations of intermediate language constructs
that can not be expressed at the source code level. Although IL can be hard to understand, IL
weaving has several advantages over source code weaving:
Programming language independence
All compilers generating the target IL output can be used;
More expressiveness
It is possible to create IL constructs that are not possible in the original programming
language;
Source code independence
Can add aspects to programs and libraries without using the source code (which may not
be available);
Adding aspects at load- or runtime
A special class loader or runtime environment can decide and do dynamic weaving. The
aspect weaver adds a runtime environment into the program. How and when aspects
can be added to the program depend on the implementation of the runtime environment.
However, IL weaving also has drawbacks that do not exist for source code weaving:
Hard to understand
Specific knowledge about the IL is needed;
More error-prone
Compiler optimizations may cause unexpected results; the compiler can remove or change
code in ways that break the attached aspect (e.g., inlining of methods).
1.3.2.3 Adapting the Virtual Machine
Adapting the virtual machine (VM) removes the need to weave aspects. This technique has the
same advantages as intermediate language weaving and can also overcome some of its disad-
vantages as mentioned in subsubsection 1.3.2.2. Aspects can be added without recompilation,
redeployment, and restart of the application [63, 64].
Modifying the virtual machine also has its disadvantages:
Dependency on adapted virtual machines
Using an adapted virtual machine requires every system to be upgraded to that version;
Virtual machine optimization
A lot of effort has been spent optimizing virtual machines. Modifying the virtual
machine means these optimizations have to be revisited. Reintegrating changes introduced by
newer versions of the original virtual machine might have substantial impact.
1.4 AOP Solutions
As the concept of AOP has been embraced as a useful extension to classic programming, dif-
ferent AOP solutions have been developed. Each solution has one or more implementations to
demonstrate how the solution is to be used. As described by [26] these differ primarily in:
How aspects are specified
Each technique uses its own aspect language to describe the concerns;
Composition mechanism
Each technique provides its own composition mechanisms;
Implementation mechanism
Whether components are determined statically at compile time or dynamically at run
time, the support for verification of compositions, and the type of weaving.
Use of decoupling
Should the writer of the main code be aware that aspects are applied to his code;
Supported software processes
The overall process, techniques for reusability, the ability to analyze the performance of
aspects, and support for monitoring and debugging the aspects.
This section gives a short introduction to AspectJ [46] and Hyperspaces [62], which together
with Composition Filters [8] are the three main AOP approaches.
1 aspect DynamicCrosscuttingExample {
2 Log log = new Log();
3
4 pointcut traceMethods():
5 execution(* edu.utwente.trese.*.*(..));
6
7 before() : traceMethods() {
8 log.write("Entering " + thisJoinPoint.getSignature());
9 }
10
11 after() : traceMethods() {
12 log.write("Exiting " + thisJoinPoint.getSignature());
13 }
14 }
Listing 1.3: Example of dynamic crosscutting in AspectJ
1.4.1 AspectJ Approach
AspectJ [46] is an aspect-oriented extension to the Java programming language. It is probably
the most popular approach to AOP at the moment, and it is finding its way into industrial
software development. AspectJ was developed by Gregor Kiczales at Xerox's PARC (Palo
Alto Research Center). To encourage the growth of the AspectJ technology and community,
PARC transferred AspectJ to an open Eclipse project. The popularity of AspectJ comes partly
from the various extensions based on it, built by several research groups. There are various
projects porting AspectJ to other languages, resulting in tools such as AspectR and
AspectC.
One of the main goals in the design of AspectJ is to make it a compatible extension to Java.
AspectJ tries to be compatible in four ways:
Upward compatibility
All legal Java programs must be legal AspectJ programs;
Platform compatibility
All legal AspectJ programs must run on standard Java virtual machines;
Tool compatibility
It must be possible to extend existing tools to support AspectJ in a natural way; this
includes IDEs, documentation tools and design tools;
Programmer compatibility
Programming with AspectJ must feel like a natural extension of programming with Java.
AspectJ extends Java with support for two kinds of crosscutting functionality. The first allows
defining additional behavior to run at certain well-defined points in the execution of the pro-
gram and is called the dynamic crosscutting mechanism. The other is called the static crosscutting
mechanism and allows modifying the static structure of classes (methods and relationships be-
tween classes). The units of crosscutting implementation are called aspects. An example of an
aspect specified in AspectJ is shown in Listing 1.3.
The points in the execution of a program where the crosscutting behavior is inserted are called
join points. A pointcut describes a set of join points. In Listing 1.3, traceMethods is an example
of a pointcut definition. The pointcut includes all executions of any method in a class
contained in the package edu.utwente.trese.
1 aspect StaticCrosscuttingExample {
2 private void Log.trace(String traceMsg) {
3 Log.write(" --- MARK --- " + traceMsg);
4 }
5 }
Listing 1.4: Example of static crosscutting in AspectJ
The code that should execute at a given join point is declared in an advice. Advice is a method-
like code body associated with a certain pointcut. AspectJ supports before, after and around
advice that specifies where the additional code is to be inserted. In the example both before
and after advice are declared to run at the join points specified by the traceMethods pointcut.
Aspects can contain anything permitted in class declarations including definitions of pointcuts,
advice and static crosscutting. For example, static crosscutting allows a programmer to add
fields and methods to certain classes as shown in Listing 1.4.
The shown construct is called inter-type member declaration and adds a method trace to class
Log. Other forms of inter-type declarations allow developers to declare the parents of classes
(superclasses and realized interfaces), declare where exceptions need to be thrown, and allow
a developer to define the precedence among aspects.
With its variety of possibilities AspectJ can be considered a useful approach for realizing soft-
ware requirements.
1.4.2 Hyperspaces Approach
The Hyperspaces approach was developed by H. Ossher and P. Tarr at the IBM T.J. Watson Research
Center. It adopts the principle of multi-dimensional separation of
concerns [62], which involves:
• Multiple, arbitrary dimensions of concerns;
• Simultaneous separation along these dimensions;
• Ability to dynamically handle new concerns and new dimensions of concern as they arise
throughout the software life cycle;
• Overlapping and interacting concerns. It is appealing to think of many concerns as inde-
pendent or orthogonal, but they rarely are in practice.
We explain the Hyperspaces approach by an example written in the Hyper/J language. Hyper/J
is an implementation of the Hyperspaces approach for Java. It provides the ability to identify
concerns, specify modules in terms of those concerns, and synthesize systems and components
by integrating those modules. Hyper/J uses bytecode weaving on binary Java class files and
generates new class files to be used for execution. Although the Hyper/J project seems
abandoned and there has not been any update to the code or documentation for a while, we still
mention it because the Hyperspaces approach offers a unique AOP solution.
As a first step, developers create hyperspaces by specifying a set of Java class files that contain
the code units that populate the hyperspace. To do this, you create a hyperspace specification,
as demonstrated in Listing 1.5.
1 Hyperspace Pacman
2 class edu.utwente.trese.pacman.*;
Listing 1.5: Creation of a hyperspace
Hyper/J will automatically create a hyperspace with one dimension—the class file dimension.
A dimension of concern is a set of concerns that are disjoint. The initial hyperspace will con-
tain all units within the specified package. To create a new dimension you can specify concern
mappings, which describe how existing units in the hyperspace relate to concerns in that di-
mension, as demonstrated in Listing 1.6.
The first line indicates that, by default, all of the units contained within the package edu.
utwente.trese.pacman address the kernel concern of the feature dimension. The other
mappings specify that any method named trace or debug addresses the logging or debugging
concern, respectively. These later mappings override the first one.
Hypermodules are based on concerns and consist of two parts. The first part specifies a set of
hyperslices in terms of the concerns identified in the concern matrix. The second part specifies
the integration relationships between the hyperslices. A hyperspace can contain several hyper-
modules realizing different modularizations of the same units. Systems can be composed in
many ways from these hypermodules.
Listing 1.7 shows a hypermodule with two concerns, kernel and logging. They are related
by a mergeByName integration relationship. This means that units in the different concerns
correspond if they have the same name (ByName) and that these corresponding units are to be
combined (merge). For example, all members of the corresponding classes are brought together
into the composed class. The hypermodule results in a hyperslice that contains all the classes
without the debugging feature; thus no debug methods will be present.
The most important feature of the hyperspaces approach is the support for on-demand
remodularization: the ability to extract hyperslices to encapsulate concerns that were not separated
in the original code. This makes hyperspaces especially useful for the evolution of existing
software.
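The mergeByName relationship described above can be illustrated with a small Java sketch. This is hypothetical code, not Hyper/J itself: concerns are modeled as maps from unit (class) names to their members, and units with the same name are combined into one composed unit.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the mergeByName integration relationship:
// units with the same name in different concerns correspond, and their
// members are brought together into one composed unit.
public class MergeByName {

    // Each concern maps unit (class) names to a map of member names
    // to member bodies.
    public static Map<String, Map<String, String>> merge(
            Map<String, Map<String, String>> kernel,
            Map<String, Map<String, String>> logging) {
        Map<String, Map<String, String>> composed = new LinkedHashMap<>();
        // Start from the kernel concern's units.
        kernel.forEach((unit, members) ->
                composed.put(unit, new LinkedHashMap<>(members)));
        // Units with the same name correspond (ByName); their members
        // are combined (merge) into the composed unit.
        logging.forEach((unit, members) ->
                composed.computeIfAbsent(unit, u -> new LinkedHashMap<>())
                        .putAll(members));
        return composed;
    }
}
```

For example, a class present in both the kernel and logging concerns ends up as one composed class containing the kernel members plus the trace method contributed by the logging concern, mirroring how the hypermodule of Listing 1.7 omits the debugging feature simply by not listing Feature.Debugging.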
1.4.3 Composition Filters
Composition Filters was developed by M. Akşit and L. Bergmans at the TRESE group, which is
part of the Department of Computer Science of the University of Twente, The Netherlands.
The composition filters (CF) model predates aspect-oriented programming. It started out as an
extension to the object-oriented model and evolved into an aspect-oriented model. The current
implementation of CF is Compose*, which covers .NET, Java, and C.
1 package edu.utwente.trese.pacman: Feature.Kernel
2 operation trace: Feature.Logging
3 operation debug: Feature.Debugging
Listing 1.6: Specification of concern mappings
1 hypermodule Pacman_Without_Debugging
2 hyperslices: Feature.Kernel, Feature.Logging;
3 relationships: mergeByName;
4 end hypermodule;
Listing 1.7: Defining a hypermodule
One of the key elements of CF is the message: a message is an interaction between objects, for
instance a method call. In object-oriented programming the message is considered an abstract
concept. In the implementations of CF it is therefore necessary to reify the message. This reified
message contains properties, like where it is sent to and where it came from.
The concept of CF is that messages that enter and exit an object can be intercepted and
manipulated, modifying the original flow of the message. To do so, a layer called the interface part is
introduced in the CF model; this layer can have several properties. The interface part can be
placed on an object whose behavior needs to be altered; this object is referred to as inner.
There are three key elements in CF: messages, filters, and superimposition. Messages are sent
from one object to another; if there is an interface part placed on the receiver, the message
goes through the input filters. In the filters the message can be manipulated before
it reaches the inner part; the message can even be sent to another object. How the message
is handled depends on the filter type. An output filter is similar to an input filter; the only
difference is that it manipulates messages that originate from the inner part. The latest addition
to CF is superimposition, which is used to specify which interfaces need to be superimposed
on which inner objects.
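The reified message at the heart of this model can be sketched in plain Java. This is a hypothetical illustration, not actual Compose* code: the abstract "message" (a method call) becomes a first-class object recording its sender, target, and selector, which a filter can inspect and rewrite before it reaches the inner object.

```java
// Hypothetical sketch of a reified message in the CF model: a method
// call turned into an object that filters can observe and manipulate.
public class ReifiedMessage {
    private final Object sender;  // where the message came from
    private Object target;        // where the message is sent to
    private String selector;      // the method name being called

    public ReifiedMessage(Object sender, Object target, String selector) {
        this.sender = sender;
        this.target = target;
        this.selector = selector;
    }

    public Object getSender() { return sender; }
    public Object getTarget() { return target; }
    public String getSelector() { return selector; }

    // A filter manipulating the message can redirect it to another
    // object, modifying the original flow of the message.
    public void retarget(Object newTarget, String newSelector) {
        this.target = newTarget;
        this.selector = newSelector;
    }
}
```

A filter intercepting such a message can then, for instance, change its target so the call is dispatched to a different object than the caller intended.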
CHAPTER 2
Compose*
Compose* is an implementation of the composition filters approach. There are three target
environments: .NET, Java, and C. This chapter is organized as follows: first the evolution
of Composition Filters and its implementations is described, followed by an explanation of
the Compose* language and a demonstrating example. In the third section, the Compose*
architecture is explained, followed by a description of the features specific to Compose*.
2.1 Evolution of Composition Filters
Compose* is the result of many years of research and experimentation. The following time
line gives an overview of what has been done in the years before and during the Compose*
project.
1985 The first version of Sina is developed by Mehmet Akşit. This version of Sina contains a
preliminary version of the composition filters concept called semantic networks. The
semantic network construction serves as an extension to objects, such as classes, mes-
sages, or instances. These objects can be configured to form other objects such as
classes from which instances can be created. The object manager takes care of syn-
chronization and message processing of an object. The semantic network construction
can express key concepts like delegation, reflection, and synchronization [47].
1987 Together with Anand Tripathi of the University of Minnesota the Sina language is
further developed. The semantic network approach is replaced by declarative specifi-
cations and the interface predicate construct is added.
1991 The interface predicates are replaced by the dispatch filter, and the wait filter manages
the synchronization functions of the object manager. Message reflection and real-time
specifications are handled by the meta filter and the real-time filter [7].
1995 The Sina language with Composition Filters is implemented using Smalltalk [47]. The
implementation supports most of the filter types. In the same year, a preprocessor
1 concern {
2 filtermodule{
3 internals
4 externals
5 conditions
6 inputfilters
7 outputfilters
8 }
9
10 superimposition{
11 selectors
12 filtermodules
13 annotations
14 constraints
15 }
16
17 implementation
18 }
Listing 2.1: Abstract concern template
providing C++ with support for Composition Filters is implemented [33].
1999 The composition filters language ComposeJ [85] is developed and implemented. The
implementation consists of a preprocessor capable of translating composition filter
specifications into the Java language.
2001 ConcernJ is implemented as part of a M. Sc. thesis [70]. ConcernJ adds the notion of
superimposition to Composition Filters. This allows for reuse of the filter modules
and to facilitate crosscutting concerns.
2003 The start of the Compose* project; the project is described in further detail in this
chapter.
2004 The first release of Compose*, based on .NET.
2005 The start of the Java port of Compose*.
2006 Porting Compose* to C is started.
2.2 Composition Filters in Compose*
A Compose* application consists of concerns that can be divided into three parts: filter module
specification, superimposition, and implementation. A filter module contains the filter logic
to filter the incoming and outgoing messages of the superimposed object. A message has
a target, which is an object reference, and a selector, which is a method name. The superimposition
part specifies which filter modules, annotations, conditions, and methods need to be
superimposed on which objects. The implementation part contains the class implementation
of the concern. How these parts are placed in a concern is shown in Listing 2.1.
The working of the filter module is shown in Figure 2.1. A filter module can contain input and
output filters. The difference between these two sets is that the first is used to filter incoming
messages and the second to filter outgoing messages. A return from a method is not considered
an outgoing message. A filter has three parts: the filter identifier, the filter type, and one or
more filter elements. A filter element consists of an optional condition part, a matching part,
and a substitution part. These parts are shown below:
Figure 2.1: Components of the composition filters model
stalker_filter : Dispatch = { !pacmanIsEvil => [*.getNextMove] stalk_strategy.getNextMove }
In this filter, stalker_filter is the identifier, Dispatch is the filter type, !pacmanIsEvil => is the
condition part, [*.getNextMove] is the matching part, and stalk_strategy.getNextMove is the
substitution part.
The filter identifier is the unique name for a filter in a filter module. A filter matches when
both the condition and the matching part evaluate to true. The demonstrated filter matches
every message where the selector is getNextMove; the '*' in the target means that
every target matches. When the condition part and the matching part are true, the message
is substituted with the values of the substitution part. How these values are substituted and
how the message continues depends on the filter type. At the moment there are four basic filter
types in Compose*; it is also possible to write custom filter types.
Dispatch
If the message is accepted, it is dispatched to the specified target of the message, other-
wise the message continues to the subsequent filter. This filter type can only be used for
input filters;
Send
If the message is accepted, it is sent to the specified target of the message, otherwise the
message continues to the subsequent filter. This filter type can only be used for output
filters;
Error
If the filter rejects the message, it raises an exception, otherwise the message continues to
the next filter in the set;
Meta
If the message is accepted, the message is sent as a parameter of another meta message to
an internal or external object, otherwise the message just continues to the next filter. The
object that receives the meta message can observe and manipulate the message and can
re-activate the execution of the message.
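The accept-and-substitute behavior shared by these filter types can be sketched in a few lines of Java. The model below is an illustrative simplification only (the class and method names are hypothetical, not Compose* runtime code): a filter element accepts a message when its optional condition holds and its matching part matches the selector, and on acceptance rewrites the message with the substitution part.

```java
import java.util.function.BooleanSupplier;

// Hypothetical model of a single filter element (illustration only, not
// Compose* runtime code). A filter element accepts a message when its
// optional condition holds and its matching part matches the selector;
// on acceptance, the substitution part rewrites the message.
public class FilterElement {
    private final BooleanSupplier condition;  // e.g. () -> !pacmanIsEvil
    private final String selectorPattern;     // "*" matches every selector
    private final String newTarget;           // substitution part: target
    private final String newSelector;         // substitution part: selector

    public FilterElement(BooleanSupplier condition, String selectorPattern,
                         String newTarget, String newSelector) {
        this.condition = condition;
        this.selectorPattern = selectorPattern;
        this.newTarget = newTarget;
        this.newSelector = newSelector;
    }

    /** Returns the substituted "target.selector" on acceptance, or null on rejection. */
    public String apply(String selector) {
        boolean matches = selectorPattern.equals("*")
                || selectorPattern.equals(selector);
        if (condition.getAsBoolean() && matches) {
            return newTarget + "." + newSelector;
        }
        return null; // rejected: the message continues to the next filter
    }
}
```

What happens after acceptance or rejection is exactly what distinguishes the four filter types: a dispatch filter forwards the rewritten message, whereas an error filter raises an exception on rejection.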
The pacmanIsEvil condition used in the condition part must be declared in the conditions section of
a filter module. The targets that are used in a filter must be declared as internals or externals.
Internals are objects which are unique for each instance of a filter module, and externals are
shared between filter modules.
Filter modules can be superimposed on classes with a filter module binding; this binding
has a selection of objects on one side and a filter module on the other side. The selection is de-
fined with a selector definition. The selector uses predicates, such as isClassWithNameInList,
isNamespaceWithName, and namespaceHasClass, to select objects. It is also possible to bind
conditions, methods, and annotations to classes with the use of superimposition.
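The selection performed by such a selector can be thought of as set filtering over program elements. The following Java sketch mimics the isClassWithNameInList predicate on plain class-name strings; it is a hypothetical illustration, not the actual selector language implementation.

```java
import java.util.List;
import java.util.stream.Collectors;

// Illustrative approximation of a superimposition selector: from the set of
// all known classes, keep those whose fully qualified name occurs in a given
// list, mirroring the isClassWithNameInList predicate. Hypothetical sketch,
// not the actual selector language implementation.
public class Selector {
    public static List<String> classesWithNameInList(List<String> allClasses,
                                                     List<String> wanted) {
        return allClasses.stream()
                .filter(wanted::contains)   // keep only the listed names
                .collect(Collectors.toList());
    }
}
```

Applied to the Pacman classes, selecting on pacman.World, pacman.Game, and pacman.Main yields exactly the set of classes a filter module is bound to.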
The last part of the concern is the implementation part. In the implementation part we can
define the object behavior of the concern; in a logging concern, for example, we can define
specific log functions.
2.3 Demonstrating Example
To illustrate the Compose* toolset, this section introduces a Pacman example. The Pacman
game is a classic arcade game in which the user, represented by pacman, moves in a maze to
eat vitamins. Meanwhile, a number of ghosts try to catch and eat pacman. There are, however,
four mega vitamins in the maze that make pacman evil. In its evil state, pacman can eat ghosts.
A simple list of requirements for the Pacman game is briefly discussed here:
• A life should be taken from pacman when he is eaten by a ghost;
• A game should end when pacman has no more lives;
• The score of a game should increase when pacman eats a vitamin or a ghost;
• A user should be able to use a keyboard to move pacman around the maze;
• Ghosts should know whether pacman is evil or not;
• Ghosts should know where pacman is located;
• Ghosts should, depending on the state of pacman, hunt or flee from pacman.
2.3.1 Initial Object-Oriented Design
Figure 2.2 shows an initial object-oriented design for the Pacman game. Note that this UML
class diagram does not show the trivial accessors. The classes in this diagram are:
Game
This class encapsulates the control flow and controls the state of a game;
Ghost
This class is a representation of a ghost chasing pacman. Its main attribute is a property
that indicates whether it is scared or not (depending on the evil state of pacman);
GhostView
This class is responsible for painting ghosts;
Glyph
This is the superclass of all mobile objects (pacman and ghosts). It contains common
information like direction and speed;
Keyboard
This class accepts all keyboard input and makes it available to pacman;
Figure 2.2: UML class diagram of the object-oriented Pacman game
Main
This is the entry point of a game;
Pacman
This is a representation of the user controlled element in the game. Its main attribute is a
property that indicates whether pacman is evil or not;
PacmanView
This class is responsible for painting pacman;
RandomStrategy
By using this strategy, ghosts move in random directions;
View
This class is responsible for painting a maze;
World
This class has all the information about a maze. It knows where the vitamins, mega
vitamins and most importantly the walls are. Every class derived from class Glyph checks
whether movement in the desired direction is possible.
2.3.2 Completing the Pacman Example
The initial object-oriented design, described in the previous section, does not implement all the
stated system requirements. The missing requirements are:
• The application does not maintain a score for the user;
• Ghosts move in random directions instead of chasing or fleeing from pacman.
In the next sections, we describe why and how to implement these requirements in the
Compose* language.
2.3.2.1 Implementation of Scoring
The first system requirement that we need to add to the existing Pacman game is scoring. This
concern involves a number of events. First, the score should be set to zero when a game starts.
Second, the score should be updated whenever pacman eats a vitamin, mega vitamin or ghost.
And finally, the score itself has to be painted on the maze canvas to relay it back to the user.
These events are scattered over multiple classes: Game (initializing the score), World (updating the score),
and Main (painting the score). Thus scoring is an example of a crosscutting concern.
To implement scoring in the Compose* language, we divide the implementation into two parts.
The first part is a Compose* concern definition stating which filter modules to superimpose.
Listing 2.2 shows an example Compose* concern definition of scoring.
This concern definition is called DynamicScoring (line 1) and contains two parts. The first part
is the declaration of a filter module called dynamicscoring (lines 2–11). This filter module
contains one meta filter called score_filter (line 6). This filter intercepts five relevant calls
and sends the message in a reified form to an instance of class Score. The final part of the
concern definition is the superimposition part (lines 12–18). This part defines that the filter
module dynamicscoring is to be superimposed on the classes World, Game and Main.
The final part of the scoring concern is the so-called implementation part. This part is defined by
a class Score. Listing 2.3 shows an example implementation of class Score. Instances of this
1 concern DynamicScoring in pacman {
2 filtermodule dynamicscoring {
3 externals
4 score : pacman.Score = pacman.Score.instance();
5 inputfilters
6 score_filter : Meta = {[*.eatFood] score.eatFood,
7 [*.eatGhost] score.eatGhost,
8 [*.eatVitamin] score.eatVitamin,
9 [*.gameInit] score.initScore,
10 [*.setForeground] score.setupLabel}
11 }
12 superimposition {
13 selectors
14 scoring = { C | isClassWithNameInList(C, [’pacman.World’,
15 ’pacman.Game’, ’pacman.Main’]) };
16 filtermodules
17 scoring <- dynamicscoring;
18 }
19 }
Listing 2.2: DynamicScoring concern in Compose*
class receive the messages sent by score_filter and subsequently perform the events related
to the scoring concern. In this way, all scoring events are encapsulated in one class and one
Compose* concern definition.
2.3.2.2 Implementation of Dynamic Strategy
The last system requirement that we need to implement is the dynamic strategy of ghosts. This
means that a ghost should, depending on the state of pacman, hunt or flee from pacman. We
can implement this concern by using the strategy design pattern. However, in this way, we
need to modify the existing code. This is not the case when we use Compose* dispatch filters.
Listing 2.4 demonstrates this.
This concern uses dispatch filters to intercept calls to method RandomStrategy.getNextMove
and redirect them to either StalkerStrategy.getNextMove or FleeStrategy.getNextMove.
If pacman is not evil, the intercepted call matches the first filter, which dispatches the inter-
cepted call to method StalkerStrategy.getNextMove (line 9). Otherwise, the intercepted
call matches the second filter, which dispatches the intercepted call to method FleeStrategy.
getNextMove (line 11).
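For contrast, the conventional strategy-pattern solution mentioned above would require the ghost code itself to select and switch strategies, which is precisely the modification the dispatch filters make unnecessary. A minimal Java sketch (hypothetical names and return values, not the actual Pacman sources):

```java
// Conventional strategy-pattern sketch (hypothetical names, not the actual
// Pacman sources): the ghost code itself must select the strategy based on
// pacman's state, so existing code has to be changed.
interface Strategy {
    String getNextMove();
}

class StalkerStrategy implements Strategy {
    public String getNextMove() { return "towards pacman"; }
}

class FleeStrategy implements Strategy {
    public String getNextMove() { return "away from pacman"; }
}

public class GhostBrain {
    // The explicit state check that would have to be added to existing code;
    // the dispatch filters above perform this choice outside the ghost class.
    public static String nextMove(boolean pacmanIsEvil) {
        Strategy strategy = pacmanIsEvil ? new FleeStrategy() : new StalkerStrategy();
        return strategy.getNextMove();
    }
}
```

With the dispatch filters, the equivalent condition check lives in the concern definition instead of in the ghost's own code.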
2.4 Compose* Architecture
An overview of the Compose* architecture is illustrated in Figure 2.3. The Compose* archi-
tecture can be divided into four layers [60]: IDE, compile time, adaptation, and runtime.
1 import Composestar.Runtime.FLIRT.message.*;
2 import java.awt.*;
3
4 public class Score
5 {
6 private int score = -100;
7 private static Score theScore = null;
8 private Label label = new java.awt.Label("Score: 0");
9
10 private Score() {}
11
12 public static Score instance() {
13 if(theScore == null) {
14 theScore = new Score();
15 }
16 return theScore;
17 }
18
19 public void initScore(ReifiedMessage rm) {
20 this.score = 0;
21 label.setText("Score: "+score);
22 }
23
24 public void eatGhost(ReifiedMessage rm) {
25 score += 25;
26 label.setText("Score: "+score);
27 }
28
29 public void eatVitamin(ReifiedMessage rm) {
30 score += 15;
31 label.setText("Score: "+score);
32 }
33
34 public void eatFood(ReifiedMessage rm) {
35 score += 5;
36 label.setText("Score: "+score);
37 }
38
39 public void setupLabel(ReifiedMessage rm) {
40 rm.proceed();
41 label = new Label("Score: 0");
42 label.setSize(15*View.BLOCKSIZE+20,15*View.BLOCKSIZE);
43 Main main = (Main)Composestar.Runtime.FLIRT.message.MessageInfo
44 .getMessageInfo().getTarget();
45 main.add(label,BorderLayout.SOUTH);
46 }
47 }
Listing 2.3: Implementation of class Score
1 concern DynamicStrategy in pacman {
2 filtermodule dynamicstrategy {
3 internals
4 stalk_strategy : pacman.Strategies.StalkerStrategy;
5 flee_strategy : pacman.Strategies.FleeStrategy;
6 conditions
7 pacmanIsEvil : pacman.Pacman.isEvil();
8 inputfilters
9 stalker_filter : Dispatch = {!pacmanIsEvil =>
10 [*.getNextMove] stalk_strategy.getNextMove};
11 flee_filter : Dispatch = {
12 [*.getNextMove] flee_strategy.getNextMove}
13 }
14 superimposition {
15 selectors
16 random = { C | isClassWithName(C,
17 ’pacman.Strategies.RandomStrategy’) };
18 filtermodules
19 random <- dynamicstrategy;
20 }
21 }
Listing 2.4: DynamicStrategy concern in Compose*
Figure 2.3: Overview of the Compose* architecture
2.4.1 Integrated Development Environment
Some of the purposes of the Integrated Development Environment (IDE) layer are to interface
with the native IDE and to create a build configuration. The build configuration specifies
which source files and settings are required to build a Compose* application. After creating
the build configuration, the compile time is started.
The creation of a build configuration can be done manually or by using a plug-in. Examples
of these plug-ins are the Visual Studio add-in for Compose*/.NET and the Eclipse plug-in for
Compose*/J and Compose*/C.
2.4.2 Compile Time
The compile time layer is platform independent and reasons about the correctness of the com-
position filter implementation with respect to the program, which allows the target program to
be built by the adaptation layer.
The compile time ‘pre-processes’ the composition filter specifications by parsing the specifica-
tion, resolving the references, and checking its consistency. To provide an extensible architec-
ture to facilitate this process, a blackboard architecture was chosen. This means that the compile
time uses a general knowledgebase that is called the ‘repository’. This knowledgebase contains
the structure and metadata of the program, on which different modules can execute their activities.
Examples of modules within analysis and validation are SANE, LOLA, and FILTH. These
three modules are responsible for (some of) the analysis and validation of
the superimposition and its selectors.
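The blackboard idea can be illustrated with a few lines of Java: one shared repository, and a sequence of modules that each read from and extend it. This is a deliberate simplification with made-up keys; the real repository stores the full program structure and metadata.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Simplified blackboard: one shared repository (the knowledgebase) and a
// sequence of modules that each read from and extend it. Deliberately
// reduced; the real repository holds the full program structure and metadata.
public class CompileTime {
    public static Map<String, Object> run(List<Consumer<Map<String, Object>>> modules) {
        Map<String, Object> repository = new HashMap<>();
        for (Consumer<Map<String, Object>> module : modules) {
            module.accept(repository); // every module works on the same knowledgebase
        }
        return repository;
    }

    // A two-step pipeline standing in for analysis modules such as SANE or FILTH.
    public static Map<String, Object> demo() {
        List<Consumer<Map<String, Object>>> modules = new ArrayList<>();
        modules.add(r -> r.put("parsed", Boolean.TRUE));          // parsing step
        modules.add(r -> r.put("consistent", r.get("parsed")));   // consistency check
        return run(modules);
    }
}
```

The point of the architecture is extensibility: a new analysis module only needs to know the repository, not the other modules.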
2.4.3 Adaptation
The adaptation layer consists of the program manipulation, the harvester, and the code generator.
These components connect the platform independent compile time to the target platform. The
harvester is responsible for gathering the structure and the annotations within the source pro-
gram and adding this information to the knowledgebase. The code generator generates a
reduced copy of the knowledgebase and the weaving specification. This weaving specification
is then used by the weaver, contained in the program manipulation, to weave the calls to the
runtime into the target program. The end result of the adaptation is the target program, which
interfaces with the runtime.
2.4.4 Runtime
The runtime layer is responsible for executing the concern code at the join points. It is acti-
vated at the join points by function calls that are woven in by the weaver. A reduced copy of
the knowledgebase, containing the necessary information for filter evaluation and execution, is
enclosed with the runtime. When a filtered function is called, the filters are evaluated. Depending on
whether the condition part evaluates to true and the matching part matches, the accept or reject
behavior of the filter is executed. The runtime also facilitates the debugging of the composition
filter implementations.
2.5 Platforms
Compose* can in theory be applied to any programming language, given that certain assumptions
are met. Currently Compose* has three platforms.
2.5.1 Java
Compose*/J, the Java platform of Compose*, uses different compiling and weaving tools than
the other platforms. For the use of Compose*/J an Eclipse plug-in is provided.
2.5.2 C
Compose*/C, the C platform of Compose*, is different from its Java and .NET counterparts
because it does not have a runtime interpreter. This implies that the filter implementation
of Compose*/C uses generated composition filter code that is weaved directly into the source
code. Because the programming language C does not have the concept of objects, the reasoning
within Compose* is based on sets of functions. Like the Java platform, Compose*/C provides
a plug-in for Eclipse.
2.5.3 .NET
The .NET platform of Compose*, called Compose*/.NET, is the oldest implementation of
Compose*. Because Compose*/.NET works with CIL code, it is programming language inde-
pendent as long as the programming language can be compiled to CIL code. The .NET platform
uses a Visual Studio add-in for ease of development.
2.6 Features Specific to Compose*
The Composition Filters approach uses a restricted (pattern matching) language to define fil-
ters. This language makes it possible to reason about the semantics of the concern. Compose*
offers three features that use this possibility, providing more control over the correctness
of an application under construction. These features are:
Ordering of filter modules
It is possible to specify how the superimposition of filter modules should be ordered.
Ordering constraints can be specified in a fixed, conditional, or partial manner. A fixed
ordering can be calculated exactly, whereas a conditional ordering is dependent on the re-
sult of filter execution and therefore evaluated at runtime. When there are multiple valid
orderings of filter modules on a join point, partial ordering constraints can be applied to
reduce this number. These constraints can be declared in the concern definition;
Filter consistency checking
When superimposition is applied, Compose* is able to detect whether the ordering and con-
junction of filters creates a conflict. For example, imagine a set of filters where the first
filter only evaluates method m and another filter only evaluates methods a and b. In this
case the latter filter is only reached with method m, which it consequently rejects; as a
result the superimposition may never be executed. There are different scenarios that lead
to these kinds of problems, e.g., conditions that exclude each other;
Reason about semantic problems
When multiple pieces of advice are added to the same join point, Compose* can reason
about problems that may occur. An example of such a conflict is the situation where a
real-time filter is followed by a wait filter. Because the wait filter can wait indefinitely, the
real-time property imposed by the real-time filter may be violated.
The above-mentioned conflict analyzers all work on the assumption that the behavior of every
filter is well-defined. This is not the case for the meta filter: its user-defined, and therefore
unpredictable, behavior poses a problem to the analysis tools.
Furthermore, Compose* is extended with features that enhance its usability. These features
are briefly described below:
Integrated Development Environment support
The Compose* implementations all have an IDE plug-in: Compose*/.NET for Visual Stu-
dio, Compose*/J and Compose*/C for Eclipse;
Debugging support
The debugger shows the flow of messages through the filters. It is possible to place break-
points to view the state of the filters;
Incremental building process
When a project is built and not all the modules are changed, incremental building saves
time.
Some language properties of Compose* can also be seen as features:
Language independent concerns
A Compose* concern can be used on all the Compose* platforms, because the composi-
tion filters approach is language independent;
Reusable concerns
The concerns are easy to reuse, through the dynamic filter modules and the selector lan-
guage;
Expressive selector language
Program elements of an implementation language can be used to select a set of objects to
superimpose on;
Support for annotations
Using the selector, annotations can be woven at program elements. At the moment anno-
tations can be used for superimposition.
CHAPTER 3
Introduction to the .NET Framework
This chapter gives an introduction to the .NET Framework of Microsoft. First, the architecture
of the .NET Framework is introduced. This section includes terms like the Common Language
Runtime, the .NET Class Library, the Common Language Infrastructure and the Intermediate
Language. These are discussed in more detail in the sections following the architecture.
3.1 Introduction
Microsoft defines [57] .NET as follows: “.NET is the Microsoft Web services strategy to con-
nect information, people, systems, and devices through software.” There are different .NET
technologies in various Microsoft products providing the capabilities to create solutions using
web services. Web services are small, reusable applications that help computers from many
different operating system platforms work together by exchanging messages. Based on indus-
try standards like XML (Extensible Markup Language), SOAP (Simple Object Access Protocol),
and WSDL (Web Services Description Language) they provide a platform and language inde-
pendent way to communicate.
Microsoft products, such as Windows Server System (providing web services) or Office Sys-
tem (using web services), are some of the .NET technologies. The technology described in this
chapter is the .NET Framework. Together with Visual Studio, an integrated development envi-
ronment, they provide the developer tools to create programs for .NET.
Many companies are largely dependent on the .NET Framework, but need or want to use AOP.
Currently there is no direct support for this in the Framework. The Compose*/.NET project
is addressing these needs with its implementation of the Composition Filters approach for the
.NET Framework.
This specific Compose* version for .NET has two main goals. First, it combines the .NET
Framework with AOP through Composition Filters. Second, Compose* offers superimposition
in a language independent manner. The .NET Framework supports multiple languages and is,
as such, suitable for this purpose. Composition Filters are an extension of the object-oriented
mechanism as offered by .NET, hence the implementation is not restricted to any specific object-
oriented language.
3.2 Architecture of the .NET Framework
The .NET Framework is Microsoft’s platform for building, deploying, and running Web Ser-
vices and applications. It is designed from scratch and has a consistent API providing support
for component-based programs and Internet programming. This new Application Program-
ming Interface (API) has become an integral component of Windows. The .NET Framework
was designed to fulfill the following objectives [54]:
Consistency
Allow object code to be stored and executed locally, executed locally but Internet-
distributed, or executed remotely and to make the developer experience consistent across
a wide variety of types of applications, such as Windows-based applications and Web-
based applications;
Operability
The ease of operation is enhanced by minimizing versioning conflicts and providing bet-
ter software deployment support;
Security
All the code is executed safely, including code created by an unknown or semi-trusted
third party;
Efficiency
The .NET Framework compiles applications to machine code before running thus elimi-
nating the performance problems of scripted or interpreted environments;
Interoperability
Code based on the .NET Framework can integrate with other code because all communi-
cation is built on industry standards.
The .NET Framework consists of two main components [54]: the Common Language Run-
time (CLR, simply called the .NET Runtime or Runtime for short) and the .NET Framework
Class Library (FCL). The CLR is the foundation of the .NET Framework, executing the code
and providing the core services such as memory management, thread management and ex-
ception handling. The CLR is described in more detail in Section 3.3. The class library, the
other main component of the .NET Framework, is a comprehensive, object-oriented collection
of reusable types that can be used to develop applications ranging from traditional command-
line or graphical user interface (GUI) applications to applications such as Web Forms and XML
Web services. Section 3.5 describes the class libraries in more detail.
The code run by the runtime is in a format called Common Intermediate Language (CIL), fur-
ther explained in Section 3.6. The Common Language Infrastructure (CLI) is an open specifi-
cation that describes the executable code and runtime environment that form the core of the
Microsoft .NET Framework. Section 3.4 tells more about this specification.
Figure 3.1 shows the relationship of the .NET Framework to other applications and to the com-
plete system. The two parts, the class library and the runtime, are managed, i.e., they are
managed during execution. The operating system is at the core; managed and unmanaged
applications operate on the hardware. The runtime can use other object libraries and the class
library, and those other libraries can themselves use the same class library.
Figure 3.1: Context of the .NET Framework (Modified) [54]
Besides the Framework, Microsoft also provides a developer tool called Visual Studio. This
is an IDE with functionality across a wide range of areas, allowing developers to build appli-
cations with decreased development time in comparison with developing applications using
command line compilers.
3.2.1 Version 2.0 of .NET
In November 2005, Microsoft released a successor of the .NET Framework. Major changes are
the support for generics, the addition of nullable types, 64-bit support, improvements in the
garbage collector, new security features, and more network functionality.
Generics make it possible to declare and define classes, structures, interfaces, methods, and del-
egates with unspecified or generic type parameters instead of specific types. When the generic
type is used, the actual type is specified. This allows for type-safety at compile time. Without gener-
ics, the use of casting or boxing and unboxing decreases performance. By using a generic type,
the risks and costs of these operations are reduced.
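The cost of casting that the text refers to can be illustrated in Java as well, which gained generics around the same time. The example below is an analogous Java sketch, not .NET code: without generics, elements come back as Object and must be cast; with generics, the element type is checked at compile time.

```java
import java.util.ArrayList;
import java.util.List;

// Analogous Java sketch of the generics benefit described for .NET 2.0
// (Java, not .NET code).
public class GenericsDemo {
    // Pre-generics style: an unchecked cast that can only fail at runtime.
    public static int firstViaCast(List raw) {
        return ((Integer) raw.get(0)).intValue();
    }

    // Generic style: no cast needed, mistakes are caught by the compiler.
    public static int firstTyped(List<Integer> typed) {
        return typed.get(0);
    }

    public static List<Integer> sample() {
        List<Integer> list = new ArrayList<>();
        list.add(42); // still boxed to Integer here, but type-safe
        return list;
    }
}
```

.NET generics go one step further than this sketch: value types in a generic collection are not boxed at all, which is where the performance gain mentioned above comes from.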
Nullable types allow a value type to have a normal value or a null value. This null value can
be useful for indicating that a variable has no defined value because the information is not
currently available.
Besides changes in the Framework, there are also improvements in the four main Microsoft
.NET programming languages (C#, VB.NET, J# and C++). The language elements are now
almost equal for all languages. For instance, additions to the Visual Basic language are the
support for unsigned values and new operators, and additions to the C# language include the
ability to define anonymous methods, thus eliminating the need to create a separate method.
A new Visual Studio 2005 edition was released to support the new Framework and the func-
tionality to create various types of applications.
3.3 Common Language Runtime
The Common Language Runtime executes code and provides core services. These core services
are memory management, thread execution, code safety verification and compilation. Apart
from providing services, the CLR also enforces code access security and code robustness. Code
access security is enforced by providing varying degrees of trust to components, based on a
number of factors, e.g., the origin of a component. This way, a managed component might
or might not be able to perform sensitive functions, like file-access or registry-access. By im-
plementing a strict type-and-code-verification infrastructure, called the Common Type System
(CTS), the CLR enforces code robustness. Basically there are two types of code:
Managed
Managed code is code that has its memory handled and its types validated at execu-
tion time by the CLR. It has to conform to the Common Type System (CTS, Section 3.4). If
interoperability with components written in other languages is required, managed code
has to conform to an even stricter set of specifications, the Common Language Spec-
ification (CLS). The code is run by the CLR and is typically stored in an intermediate
language format. This platform independent intermediate language is officially known
as the Common Intermediate Language (CIL, Section 3.6) [82].
Unmanaged
Unmanaged code is not managed by the CLR. It is stored in the native machine language
and is not run by the runtime but directly by the processor.
All language compilers (targeting the CLR) generate managed code (CIL) that conforms to the
CTS.
At runtime, the CLR is responsible for generating platform specific code, which can actually
be executed on the target platform. Compiling from CIL to the native machine language of
the platform is performed by the just-in-time (JIT) compiler. This language independent layer
allows the development of CLRs for any platform, creating a true interoperability
infrastructure [82]. The .NET Runtime from Microsoft is actually a specific CLR implementa-
tion for the Windows platform. Microsoft has released the .NET Compact Framework especially
for devices such as personal digital assistants (PDAs) and mobile phones. The .NET Com-
pact Framework contains a subset of the normal .NET Framework and allows .NET developers
to write mobile applications. Components can be exchanged and web services can be used,
enabling easier interoperability between mobile devices and workstations/servers [56].
At the time of writing, the .NET Framework is the only advanced Common Language Infras-
tructure (CLI) implementation available. A shared-source1 implementation of the CLI for re-
search and teaching purposes was made available by Microsoft in 2002 under the name Ro-
tor [73]. In 2006 Microsoft released an updated version of Rotor for the .NET platform version
two. Ximian is also working on an open source implementation of the CLI under the name
1 Only non-commercial purposes are allowed.
Mono1, targeting both Unix/Linux and Windows platforms. Another, somewhat different,
approach is called Plataforma.NET2 and aims to be a hardware implementation of the CLR, so
that CIL code can be run natively.
3.3.1 Java VM vs .NET CLR
There are many similarities between Java and .NET technology. This is not strange, because
both products serve the same market.
Both Java and .NET are based on a runtime environment and an extensive development frame-
work. These development frameworks provide largely the same functionality for both Java
and .NET. The most obvious difference between them is the lack of language independence in
Java. While Java’s strategy is ‘one language for all platforms’, the .NET philosophy is ‘all lan-
guages on one platform’. However, these philosophies are not as strict as they seem. As noted
in Section 3.5, there is no technical obstacle for other platforms to implement the .NET Frame-
work. There are compilers for non-Java languages like Jython (Python) [45] and WebADA [1]
available for the JVM. Still, the JVM in its current state has difficulties supporting as vast an
array of languages as the CLR does. However, the multiple language support in .NET is not optimal
either and has been the target of some criticism.
Although the JVM and the CLR provide the same basic features, they differ in some ways. While
both the CLR and the modern JVM use JIT (just-in-time) compilation, the CLR can directly access
native functions. With the JVM, an indirect mapping is needed to interface
directly with the operating system.
3.4 Common Language Infrastructure
The entire CLI has been documented, standardized and approved [43] by the European associ-
ation for standardizing information and communication systems, Ecma International3. Benefits
of this CLI for developers and end-users are:
• Most high level programming languages can easily be mapped onto the Common Type
System (CTS);
• The same application will run on different CLI implementations;
• Cross-programming language integration, if the code strictly conforms to the Common
Language Specification (CLS);
• Different CLI implementations can communicate with each other, providing applications
with easy cross-platform communication means.
This interoperability and portability is, for instance, achieved by using a standardized meta-
data and intermediate language (CIL) scheme as the storage and distribution format for appli-
cations. In other words, (almost) any programming language can be mapped to CIL, which in
turn can be mapped to any native machine language.
1 http://www.go-mono.com/
2 http://personals.ac.upc.edu/enric/PFC/Plataforma.NET/p.net.html
3 A European industry association founded in 1961 and dedicated to the standardization of Information and
Communication Technology (ICT) systems. Their website can be found at http://www.ecma-international.org/.
Figure 3.2: Relationships in the CTS
The Common Language Specification is a subset of the Common Type System, and defines the
basic set of language features that all .NET languages should adhere to. In this way, the CLS
helps to enhance and ensure language interoperability by defining a set of features that are
available in a wide variety of languages. The CLS was designed to include all the language constructs that are commonly needed by developers (e.g., naming conventions, common primitive types), but no more than most languages are able to support [55]. Figure 3.2 shows the relationships between the CTS, the CLS, and the types available in C++ and C#. In this way the standardized CLI provides, in theory¹, a true cross-language and cross-platform development and runtime environment.
To attract a large number of developers for the .NET Framework, Microsoft has released CIL
compilers for C++, C#, J#, and VB.NET. In addition, third-party vendors and open-source
projects also released compilers targeting the .NET Framework, such as Delphi.NET, Perl.NET,
IronPython, and Eiffel.NET. These programming languages cover a wide range of different programming paradigms, such as classic imperative, object-oriented, scripting, and declarative languages. This wide coverage demonstrates the power of the standardized CLI.
Figure 3.3 shows the relationships between all the main components of the CLI. The top of the
figure shows the different programming languages with compiler support for the CLI. Because
the compiled code is stored and distributed in the Common Intermediate Language format,
the code can run on any CLR. For cross-language usage this code has to comply with the CLS.
Any application can use the class library (the FCL) for common and specialized programming
tasks.
3.5 Framework Class Library
The .NET Framework class library is a comprehensive collection of object-oriented reusable
types for the CLR. This library is the foundation on which all the .NET applications are built.
It is object oriented and provides integration of third-party components with the classes in the
.NET Framework. Developers can use components provided by the .NET Framework, by other developers, or components of their own.

1 Unfortunately, Microsoft did not submit all the framework classes for approval, and at the time of writing only the .NET Framework implementation is stable.

Figure 3.3: Main components of the CLI and their relationships. The right-hand side of the figure shows the difference between managed code and unmanaged code.

Figure 3.4: From source code to machine code

A wide range of common programming tasks (e.g.,
string management, data collection, reflection, graphics, database connectivity, or file access) can be accomplished easily by using the class library. A great number of specialized development tasks are also extensively supported, such as:
• Console applications;
• Windows GUI applications (Windows Forms);
• Web applications (Web Forms);
• XML Web services;
• Windows services.
All the types in this framework are CLS compliant and can therefore be used from any pro-
gramming language whose compiler conforms to the Common Language Specification (CLS).
3.6 Common Intermediate Language
The Common Intermediate Language (CIL) has already been mentioned briefly in the sections
before, but this section will describe the IL in more detail. All the languages targeting the .NET
Framework compile to this CIL (see Figure 3.4).
A .NET compiler generates a managed module which is an executable designed to be run by the
CLR [65]. There are four main elements inside a managed module:
• A Windows Portable Executable (PE) file header;
• A CLR header containing important information about the module, such as the location
of its CIL and metadata;
• Metadata describing everything inside the module and its external dependencies;
• The CIL instructions generated from the source code.
The Portable Executable file header allows the user to start the executable. This small piece of
code will initiate the just-in-time compiler which compiles the CIL instructions to native code
when needed, while using the metadata for extra information about the program. This native
code is machine dependent while the original IL code is still machine independent. This way
the same IL code can be JIT-compiled and executed on any supported architecture. The CLR
cannot use the managed module directly but needs an assembly.
An assembly is the fundamental unit of security, versioning, and deployment in the .NET
Framework and is a collection of one or more files grouped together to form a logical unit [65].
Besides managed modules, an assembly can also include resources like images or text. The assembly contains a manifest describing not only the name, culture, and version of the assembly, but also the references to other files in the assembly and security requests.
The CIL is an object-oriented assembly language with around 100 different instructions, called OpCodes. It is stack-based, meaning operands are placed on an evaluation stack before the execution of an operation and, when applicable, the result can be found on the stack after the operation. For instance, when adding two numbers, first the two numbers have to be placed onto the stack, then the add operation is called, and finally the result can be retrieved from the stack.
.assembly AddExample {}

.method static public void main() cil managed
{
    .entrypoint   // entry point of the application
    .maxstack 2

    ldc.i4 3      // Place a 32-bit (i4) 3 onto the stack
    ldc.i4 7      // Place a 32-bit (i4) 7 onto the stack

    add           // Add the two and
                  // leave the sum on the stack

    // Call the static System.Console::WriteLine function
    // (the function pops the integer from the stack)
    call void [mscorlib]System.Console::WriteLine(int32)

    ret
}
Listing 3.1: Adding example in IL code
To illustrate how to create a .NET program in IL code we use the previous example of adding
two numbers and show the result. In Listing 3.1 a new assembly is created with the name
AddExample. In this assembly a function main is declared as the starting point (entrypoint)
of this assembly. The .maxstack directive indicates that there can be at most two objects on the stack, which is enough for this example method. Next, the values 3 and 7 are placed onto the stack. The add operation is called and the result stays on the stack. Then the WriteLine method from the .NET Framework Class Library is called. This method resides in the Console class, in the System namespace of the mscorlib assembly. It expects one parameter of type int32, which is retrieved from the stack. The call operation transfers the control flow to this method, passing along the parameters as objects on the stack. The WriteLine method does not return a value. The ret operation returns the control flow from the main method to the calling method, in this case the runtime. This exits the program.
To be able to run this example, we need to compile the IL code to bytecode, in which each OpCode is represented as one or two bytes. To compile the example, save it as a text file and run the ILASM compiler with the filename as its argument. This produces an executable runnable on all platforms where the .NET Framework is installed.
This example was written directly in IL code, but we could have used a higher level language
such as C# or VB.NET. For instance, the same example in C# code is shown in Listing 3.2 and
the VB.NET version is listed in Listing 3.3. When this code is compiled to IL, it will look like
the code in Listing 3.1.
public static void main()
{
    Console.WriteLine((int)(3 + 7));
}
Listing 3.2: Adding example in the C# language
Public Shared Sub main()
    Console.WriteLine(CType((3 + 7), Integer))
End Sub
Listing 3.3: Adding example in the VB.NET language
CHAPTER 4
Motivation
This chapter describes the motivation for designing and implementing a system for the automatic derivation of semantic properties in .NET languages. The current state of the Compose*/.NET project is explained in the first section. How this system can be extended is discussed in the second section. The last section mentions the general design goals.
4.1 Current State of Compose*/.NET
The Compose*/.NET project offers aspect-oriented programming for the Microsoft .NET Framework through the composition filters model. An introduction to Compose* is given in Chapter 2, and information about the .NET Framework can be found in Chapter 3. Most of the information discussed below also applies to other Aspect-Oriented Programming (AOP) implementations.
4.1.1 Selecting Match Points
With composition filters, the incoming and outgoing messages on an object can be intercepted through the use of input filters and output filters. A filter has three parts: the filter identifier, the filter type, and one or more filter elements. A filter element consists of an optional condition part, a matching part, and a substitution part. When a filter is evaluated, the matching part is checked against the current message. A filter matches when both the condition and the matching part evaluate to true; at that point the message is substituted with the values of the substitution part.
The filters are superimposed on classes with a filter module binding. This binding has a selection of objects on one side and a filter module on the other side. The selection is made with a selector definition written in a selector language based on the logic programming language Prolog. By using predicates like isClassWithNameInList, isNamespaceWithName, or isMethod, the developer can specify the points in the code the filter applies to.
This selection is based on syntactical properties, like naming and coding conventions or struc-
tural properties, such as coding patterns and hierarchical relations. This approach is used in
almost all the current AOSD techniques and has the following problems [78, 60, 3, 13]:
• Coding conventions are not always used, or are used incorrectly. There are multiple reasons for this: the complexity and evolution of the application, refactoring of code, or the lack of documentation. The pointcut definitions are fragile; changes in the code can easily break pointcut semantics, a problem which is hard to detect [48];
• Method names are sometimes extended to be used as identifiers for join points. To follow a correct naming convention, they can become too long, which leads to name explosion;
• Using specific naming conventions to identify join points violates the intention-revealing naming rule, which states that method names should represent the intention of the method code.
The result of these problems is that it is by no means certain that a selected join point is the one intended to be found. It is also possible that some join points that should be selected are not.
To illustrate the use of naming conventions, consider the following example. In most pro-
gramming languages there are Get and Set methods allowing another object to read from and
write to a private variable or field. Examples of such a getter and setter method are given in
Listing 4.1.
private string _stringValue;

public string GetStringValue()
{
    return _stringValue;
}

public void SetStringValue(string value)
{
    _stringValue = value;
}
Listing 4.1: Getter and setter examples in C# .NET
To select all the methods setting a value, as in assigning a value to a variable, we can use a Set*
pointcut. This will select all the methods beginning with the word Set. However, this will also
match any methods called, for instance, Setup, Settings or Settle. In addition, methods
actually assigning a value, but not having a name starting with the word Set are not selected.
On the other hand, we might find Set methods with an implementation part doing something
completely different than actually setting a value.
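The pitfall described above can be sketched with a naive name-based matcher. This is an illustrative sketch only; the class name, the method list, and the helper MatchesSetPointcut are invented for the example and are not part of Compose*:

```csharp
using System;
using System.Linq;

class WildcardMatchDemo
{
    // A naive "Set*" pointcut: select any method whose name starts with "Set".
    static bool MatchesSetPointcut(string methodName)
    {
        return methodName.StartsWith("Set");
    }

    static void Main()
    {
        string[] methods = { "SetStringValue", "Setup", "Settings", "Settle", "AssignValue" };

        foreach (string name in methods.Where(MatchesSetPointcut))
            Console.WriteLine(name); // Setup, Settings, and Settle are false positives

        // AssignValue actually assigns a value but is never selected: a false negative.
    }
}
```

The sketch makes the two failure modes concrete: syntactic matching selects methods that merely share a prefix, and misses methods whose behavior matches but whose name does not.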
In this case, the selection is performed on the syntactical level instead of the semantical level. There is no knowledge about the actual implementation; the assumed purpose of the method is derived from (parts of) the signature, a unique identifier of the method. Using coding and naming conventions to match points in the code does not give the best results. There are some possible solutions to this problem [35]:
• Refactor the original code so coding and naming conventions can be used to define as-
pects more correctly. However, refactoring for the sake of aspects is a bad idea and should
actually only be performed to increase the quality of the code. Furthermore, the original
source code should be, to a degree, a black box to the aspect designer and refactoring
violates the goal of separation of concerns and AOP;
• Use a list to enumerate all the join points by name. This requires knowledge about the
source program and can lead to long enumerations in large software systems. This tech-
nique is also not robust to changes in the original code;
• Pattern matching, as used in Compose*, provides more possibilities. Using wild cards and structural conditions (like is in class or has interface), the selector part is more robust. There is still a great dependency on naming conventions, however, as shown in the previous example;
• By annotating methods with special tags, a developer can provide more information
about the intended behavior of the method. Naming conventions do not have to be used,
but the major drawback is the necessity to place annotations in the source code.
The main problem is the use of structure-based properties and syntactic conventions [3]. The selection of join points should be based on the behavior of a method, not on the name of the method.
4.1.2 Program Analysis
Compose* has some basic information about the source of the program: information about the types, their relations to other types, and the properties of these types is collected. This is all syntactical information, i.e., it describes the structure of the program.
There is almost no information about the behavior of the program. Two methods can use the
same resources without any problems, but might give resource conflicts when an aspect is used.
If a condition is used by a concern, can this condition be changed by other pieces of code? Are
there any side effects while using an aspect on a method [52]?
There is a partial solution to these questions in the form of SECRET (see [24] and [72] for more information). This module in Compose* reasons about possible semantic conflicts due to the composition of filter modules, and it analyzes the resource usage of the filters. One type of filter,
the meta filter, passes a ReifiedMessage object to the selected method as a parameter. The object
that receives the meta message can observe and manipulate the message, then re-activate its
execution. This can lead to unexpected behavior of the ReifiedMessage and as a result to possible
conflicts between filters. A developer can annotate methods that use this ReifiedMessage with a textual description of the intended behavior of the message. The SECRET module uses this information to detect possible conflicts between aspects. A major drawback is that the developer must specify the behavior beforehand inside the code.
If there were semantical information about the code in the Compose* repository, besides the currently available syntactical data, then more extensive code analysis could be performed to detect conflicts, side effects, and so on.
4.1.3 Fine Grained Join Points
The current composition filters model in Compose* works at the level of message call events. It could be interesting to expand this model to a statement- and expression-level model [77]. This way it is possible to select points inside the code as matching points for the filters. Possible applications of this technique are code coverage analysis¹ [66] or code optimization [36].
Currently Compose* does not support this type of fine-grained join points, but operates on the level of object interfaces. There is, however, work in progress to support this at a certain level [16]. An issue here is the need for (semantical) information about the code itself, the instructions or statements. Compose* does not have the necessary information about the message body implementation.
4.2 Providing Semantical Information
The three main issues described in the previous section (match point selection, program analysis, and fine-grained join points) all suffer from the same problem: there is almost no semantical information available. The behavior of the source code is not known. With more information about the meaning of the code, it is possible to solve some of the shortcomings mentioned before.
One of the solutions used by Compose* is the use of annotations to describe the semantics of a function. There are three major problems with this approach:
• The developer must specify the semantics manually for each function, a time-consuming process that is easily skipped because it is not enforced;
• The current annotation representation is not powerful. For instance, it lacks the ability to provide control flow information about the instructions inside the function;
• It is possible that the annotations are not consistent with the actual implementation, due to an incorrect description of the semantics or to changes in the code.
This assignment is called the automatic derivation of semantic properties in .NET and is an attempt to extract semantical information from a .NET assembly using existing tools, thus providing a way to detect possible conflicts and to give more information to Compose*.
Semantical information can be used not only by Compose* but also by other applications that perform source code analysis: for example, finding design patterns in the source code, reverse engineering design documentation [69], generating pre- and postconditions [53], verifying software contracts [9], checking behavioral subtyping [30], or any other type of static analysis.
4.3 General Design Goals
As stated in the previous section, the goal is to design and implement a system to automatically
derive semantic properties from .NET assemblies. To get an idea of what types of behavior we are interested in, we first have to look at the different types of semantics that can occur in typical programming languages (see Chapter 5).

1 Code coverage analysis is a technique to determine whether a set of test cases satisfies an adequacy criterion.
The context of this assignment is the Compose*/.NET platform, so the sources to analyze are in the .NET format called the Intermediate Language. In Chapter 6, a detailed description is given of the IL and of how the semantical elements described in Chapter 5 are represented in this format. Although the assignment is for the .NET platform, it would be an advantage if the semantical representation were language independent, so it could also be used with other object-oriented languages like Java or C++. One requirement, however, is the need to preserve the type information. This will be discussed in the design chapter (Chapter 7).
After code analysis, the semantical information should be stored in some sort of model. This model must contain enough information to reason about the behavior of the original program: not only what it is supposed to do, but also when it performs certain actions. For this, flow information, such as control flow, is important to store; this will be treated in Chapter 5.
Storing the data in a metamodel is not sufficient. There must be a system to query this model
for the required information. The possible options and the implementation can be found in
Section 7.4. Examples of the use of this search mechanism are mentioned in Chapter 8.
CHAPTER 5
Semantics
Before a model can be created to store semantical information about a program, we must first understand what semantics is and what types of behavior can be found in the sources of a program. The first section provides a definition of semantics. The next sections describe how semantics is represented in source code.
5.1 What is Semantics
Semantics is the study of meaning [15]. It comes from the Greek σημαντικός, semantikos, which stands for significant meaning, where sema means sign. Semantics is a branch of semiotics, the study of signs and meaning [20]. The other two branches of semiotics are syntactics (the arrangement of signs) and pragmatics (the relationship between the speaker and the signs). In discussing natural and computer languages, the distinction is sometimes made between syntax (for example, the order of computer commands) and semantics (which functions are requested in the command).
Syntax, coming from the Greek words συν, together, and τάξις, sequence or order, is the study of the formation of sentences and the relationship of their component parts. Looking at computer languages, syntax is the grammar of the language statements, whereas semantics is the meaning of these statements. The statements are the so-called signs used in semiotics.
Keep in mind that there is a distinct difference between syntax and semantics. Syntax is the
relation between signs, and semantics is the meaning of these signs. For computer languages
the syntax is very clearly defined as a grammar the developer has to use. This grammar is, for
instance, used to create an abstract syntax tree (AST), in which each node of the tree represents
a syntactic construct. The compiler uses this AST to create the actual program. Each element
of the grammar has a certain semantic and the composition of those elements in a certain order
forms the semantics of a program.
5.2 Semantics of Software
Understanding a software product is difficult. Usually the behavior is described in the documentation of the product, but this documentation is often incomplete or not up to date [67]. There are tools which aid in program understanding, like debuggers, metric calculators, and program visualization and animation techniques. Basically, these tools reverse engineer the software and represent it at a higher level of abstraction than that of the information directly extracted from the code [68]. They differ in how the data is retrieved, how the higher-level model is created, and how the information is presented.
5.2.1 Static and Dynamic Program Analysis
The collection of relevant data for an analyzer is performed by either static or dynamic analysis.
It is possible to combine those two techniques to get more precise data [19].
In static program analysis, only the code of the software is considered without actually execut-
ing the program built from this software. Usually the analysis is performed on the source code
of the software, but other representations like object code or byte code can also be used. This
type of analysis is suited to retrieve structural information, like the elements in the source such
as the classes, methods, and so forth.
Analyzing classes is hard due to polymorphism and dynamic binding. Information about references to objects (pointers) is also difficult to capture with static analysis. Conditional branches and iterations are only resolved at runtime, so the exact control flow information, the sequence of executed commands, is difficult to extract when the software is not executed. Most of the time there are multiple possible execution paths. Static analysis is limited to the data available about the structure of the source and the parse tree of the statements inside the methods of a class.
Dynamic program analysis uses program execution traces to gather information about the software. This means the program is analyzed while it is running, usually by attaching a separate program to it, called a profiler. The retrieved runtime behavior provides information about the actual usage of the program: the methods of classes that are actually called, and the types of the objects involved. It also provides timing and object instance information, as well as the real values of operands, which cannot be retrieved using static program analysis.
However, the gathered information is the result of certain user input and can change during a different program execution. Not all paths in the control flow may be executed, so information can be missing.
Static and dynamic analysis can be combined so all the possible paths are known beforehand
using static analysis and the runtime behavior can be analyzed using dynamic analysis [27].
For instance, static analysis is used to retrieve all possible control flow paths in the software
that can be used to generate different input for the software at runtime. Dynamic analysis is
then used to retrieve the information during runtime using the path information found by the
static analysis.
5.2.2 Software Models
We can distinguish two models of software, namely structural and behavioral [32]. In the
structural model the organization of the software is described, like inheritance and the rela-
tions between components. The behavioral model describes how the elements in the software
operate to carry out functions. Usually static analysis is used to get the structural model while
dynamic analysis collects the behavioral model during an execution run.
The system designed for this assignment tries to create a software model consisting of both structural and behavioral information using static analysis. The reason for using static instead of dynamic analysis is explained in the design chapter (see Chapter 7). Extracting a structural model is easier than extracting a behavioral model: we have to give meaning, semantics, to parts of the code without using dynamic analysis. This means we have to look at the statement level, in other words, at the instructions inside the methods, to extract their meaning. Combining the behavior of the individual statements with the control flow could tell us more about the meaning of the complete method.
5.3 Semantical Statements
To extract the behavior of a program we can start bottom-up, which means we look at the finest and lowest-level parts of the program, the statements, before working our way up to the whole program [49]. Statements are the instructions executed inside a method. There are various types of statements. Some return a value; those statements are called expressions. Statements can operate on zero or more operands. An operand can, for instance, be a value, a memory address, a function name, or a constant literal. A statement operating on only one operand is called unary; if the statement works on two operands, it is called binary.
The next sections show the major generic kinds of statements.
5.3.1 Value Assignment
A commonly used statement is the assignment of values. This can be a unary or a binary statement. With an assignment statement, a value is assigned to a variable, a symbol denoting a quantity or symbolic representation. A value can be a single value, but also the result of an operation, such as the binary operations of adding, multiplying, subtracting, dividing, and so on. An example of a unary assignment is the negation operation. Listing 5.1 shows some examples of assignments in the C# language.
a = 4;             // Assign the value 4 to the variable a
b = a + 5;         // Add 5 to the value of a and assign to b
s = "Hello World"; // Place the text in string s
Listing 5.1: Assignment examples in C#
Semantically, the assignment statement changes the state of variables. If there is an expression,
like adding, the result of this expression is evaluated and assigned to the variable. A prior
value, stored in the variable, is replaced by the new value.
5.3.2 Comparison of Values
Comparing two values is an operation frequently used for controlling the flow of the program. Examples are conditional branching, like the if...then...else construction, loops (do...while, do...until, while...loop constructions), for...next loops, and switch operations. Basically, a condition is checked to determine whether the loop must be exited or continued. Examples of some comparison statements can be seen in Listing 5.2.
if (x > 4)
{
    // perform an action when x has a value greater than 4

    while (x < 10)
    {
        x = x + 1;
    }
}
else
{
    // else, perform another action

    for (int i = 0; i < 10; i++)
    {
        x = x + i;
    }
}
Listing 5.2: Comparison examples in C#
A comparison always works on two values, and the result is stored in a destination operand. This result is either true or false. There are different kinds of comparison, as shown in Table 5.1.

Description                       Sign      Abbr.
A is equal to B                   A = B     EQ
A is not equal to B               A ≠ B     NE
A is less than B                  A < B     LT
A is less than or equal to B      A ≤ B     LTE
A is greater than B               A > B     GT
A is greater than or equal to B   A ≥ B     GTE

Table 5.1: Comparison operators
Together with branching, discussed in the next section, the comparison statements are an im-
portant element for the control flow. Based on the relation of one value to another value certain
actions are performed.
5.3.3 Branching Statements
Branching statements change the flow of control to another point in the code. This point is identified with a label, like a line number or a legal identifier. We can distinguish two types of branching:
Conditional A conditional branch occurs only if a certain condition expression evaluates to
true.
Unconditional With unconditional branching, the control flow is directly transfered to a new
location without any conditions.
In Listing 5.2, two branches are visible for the first condition (x > 4). If the value of x is greater than 4, the control flow moves to the statements directly after the if statement. If x is equal to or less than 4, the control flow moves to the statements after the else statement.
Typical unconditional branching commands are the continue and break statements, which explicitly jump to a location in the code without checking a condition. These statements can, of course, occur inside another condition and are then effectively conditional.
Branching is an important semantic element because the flow of the program is controlled with these statements. Together with conditions, branching makes it possible to use iteration statements, such as while, for, or foreach loops.
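The interplay of conditions and unconditional jumps can be illustrated with a short C# listing in the style of the other examples in this chapter; the loop and its bounds are invented for illustration:

```csharp
using System;

class BranchingDemo
{
    static void Main()
    {
        for (int i = 0; i < 10; i++)
        {
            if (i == 3)
                continue; // unconditional jump to the next iteration

            if (i == 6)
                break;    // unconditional jump out of the loop

            Console.WriteLine(i); // prints 0, 1, 2, 4, 5
        }
    }
}
```

Although continue and break are themselves unconditional, placing them inside an if statement makes the overall control transfer conditional, as described above.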
5.3.4 Method Calling
Because branching only moves the control flow inside a method, we need a special statement
to indicate the move of the control flow to another method. When this method has finished
processing its statements, the flow will be returned to the calling method. In most program-
ming languages it is possible to specify input for the new method in the form of parameters
and the called method can also return a value.
Inside the method, a special statement is available to return to the calling method. Usually this
is the return statement and if the method should return a value, this value can be returned
directly.
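The other sections in this chapter illustrate each construct with a C# listing; a comparable sketch for method calling (the method name Add and its parameters are invented for illustration) could look as follows:

```csharp
using System;

class MethodCallDemo
{
    // The parameters a and b form the input of the called method.
    static int Add(int a, int b)
    {
        return a + b; // return transfers control back to the caller, passing the result
    }

    static void Main()
    {
        int sum = Add(3, 7);    // control flow moves to Add, then returns here
        Console.WriteLine(sum); // prints 10
    }
}
```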
5.3.5 Exception Handling
It can be necessary to apply exception handling to certain statements. When an exception is thrown by one of these guarded instructions, a special block of code can handle the exception.
try
{
    x = 4 / y;
}
catch (Exception ex)
{

}
Listing 5.3: Exception handling example in C#
In Listing 5.3 the combined assignment and division statement are placed inside a guarded
block. if, for instance, y is zero, a division by zero exception will occur. This exception will
be handled by the statements inside the catch block. If there is no exception handling, the
exceptions will be thrown upwards to the calling method until eventually the runtime itself.
M.D.W. van Oudheusden 43
5.3 Semantical Statements
We are interested in this information because it provides insight into the capability of the code to
handle exceptions. A division by zero and an uninitialized value of x or y are the only exceptions
which can occur in Listing 5.3. However, all exceptions are caught using the general
Exception class. Using the information about the possible exceptions, we can choose to catch
more specific types of exceptions instead of all of them.
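The suggestion to catch more specific exception types can be illustrated with a small sketch (Python is used here purely for illustration; the function names are invented for this example):

```python
def divide_catch_all(x, y):
    # Catch-all handler: mirrors 'catch (Exception ex)' in Listing 5.3.
    try:
        return x / y
    except Exception:
        return None

def divide_specific(x, y):
    # Specific handler: only the exception the analysis predicts can occur.
    try:
        return x / y
    except ZeroDivisionError:
        return None

# Both behave the same for the division-by-zero case ...
assert divide_catch_all(4, 0) is None
assert divide_specific(4, 0) is None
# ... but the specific variant does not silently swallow unrelated errors,
# e.g. a type error in the operands would still surface.
```

The catch-all variant hides every failure behind the same handler; the specific variant only handles what the static analysis says can actually be thrown.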
5.3.6 Instantiation
The instantiation of a new object or variable can be important to detect. If an object is not
instantiated, its internal functionality can not be accessed. Not only objects can be created,
but also arrays and value types. A type indicates a set of values that have the same sort of generic
meaning or intended purpose. In object-oriented languages, an object is an individual unit
which is used as the basic building block of programs. An object is created from a class and is
called an instance of that class.
Most object-oriented languages divide types into reference types and value types. Value types
are stored on the stack. The variable contains the data itself. Reference types consist of two
parts. A reference or handle is stored on the stack. The object itself is stored on the heap, also
called the managed heap in .NET languages [44].
Value types tend to be simple, like a character or a number, while reference types are more
complex. Every object is a reference type and must be explicitly created by the developer
using a special new statement. Value types can be accessed directly and do not need to be
created; however, they need to be initialized to a default value, e.g., 0. Usually this is handled
automatically by the runtime.
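The observable difference between the two kinds of types can be loosely mimicked in Python (a loose analogue only, since Python itself has no stack-allocated value types): assigning an immutable number behaves like copying a value, while assigning a list copies only the reference.

```python
# Value-type-like behaviour: assignment effectively copies the data itself.
a = 5
b = a
b += 1
assert a == 5 and b == 6   # 'a' is unaffected by the change to 'b'

# Reference-type-like behaviour: assignment copies the reference, not the object.
xs = [1, 2, 3]
ys = xs                    # both names now refer to the same heap object
ys.append(4)
assert xs == [1, 2, 3, 4]  # the change is visible through the other reference
```

This is exactly the distinction the analyzer has to track: two variables of a reference type may denote the same object, so a write through one affects the other.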
The creation of a new object or the (re)initialization of variables is also important to detect. If
an object is not instantiated, it can not be used and could generate errors at runtime. Knowing
when a variable holds a certain value, even if this is the default value, can be interpreted as an
assignment with a default value.
5.3.7 Type Checking
Although types are directly related to a certain language, they still constitute important semantic
information. Adding a string to an integer is probably syntactically correct, but semantically it is
incorrect. We need to know what type of data we are talking about. A value being a string has
as such a different meaning than when the value has a numeric type.
Type checking is the process of verifying and enforcing the constraints of types. This can occur
at compile time, called static checking, or at runtime, called dynamic checking. Usually both
techniques are used. When a compiler performs static checking of the types in the source code,
it is performing a semantical analysis. Semantical information is added to the parse tree and
used to check for inconsistencies [38].
For the purpose of this assignment, we can distinguish two different kinds of semantical type
information.
Compile time
At compile time, the types of all the variables must be known. When the analyzer de-
signed for this assignment is run on the code, it knows the types of all the elements.
Runtime
During runtime, the type information of a variable can change. This is called type casting
and we need to know the new type the variable will become.
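The runtime type check behind such a cast can be sketched as follows (an illustrative Python model with invented class names; in .NET the check is performed by the runtime when the cast instruction executes):

```python
class Animal:
    pass

class Dog(Animal):
    def bark(self):
        return "woof"

def as_dog(animal):
    # Runtime type check, analogous to a downcast such as '(Dog)animal' in C#:
    # it succeeds only when the runtime type of the value matches.
    if isinstance(animal, Dog):
        return animal
    raise TypeError("invalid cast")

assert as_dog(Dog()).bark() == "woof"   # the cast succeeds at runtime
```

A cast of a plain `Animal` would raise, which is precisely the runtime type information the analyzer needs to track: the declared type and the actual type can differ.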
5.3.8 Data Conversion
As explained in the previous section, we need to store type information. However at runtime
the type of a variable can change because of casting or (un)boxing. Boxing is a mechanism for
converting value types to reference types. The value type is placed inside a box so it can be
used when an object reference is needed [44].
Data conversions change the type and thus the meaning of a variable. As such, it is interesting
semantical information we need to be able to reason about the contents of variables.
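Boxing can be modeled as copying a value into a freshly allocated heap object, and unboxing as copying it back out. The following sketch (the `Box` class is invented for this illustration; Python itself boxes everything implicitly) captures that behavior:

```python
class Box:
    """Minimal model of boxing: a value type wrapped in a heap object."""
    def __init__(self, value):
        self.value = value

def box(value):
    # The value is copied into a new heap object (an object reference results).
    return Box(value)

def unbox(obj):
    # Unboxing copies the value back out of the heap object.
    return obj.value

n = 42
boxed = box(n)
assert isinstance(boxed, Box)   # usable wherever an object reference is needed
assert unbox(boxed) == 42       # unboxing recovers the original value
```

For the analyzer, the relevant point is that after boxing the variable's type — and therefore its meaning — has changed from a value type to a reference type.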
5.4 Program Semantics
In the previous section the lowest level of the code, the statements, was described. Multiple
statements are grouped together inside a method to perform a specific action. Sometimes a
method has a set of input1 parameters to parameterize those actions, and possibly an output
value (called return value) of some type. In object-oriented languages, a method resides in a
class and provides a service for this particular object. A method is used to perform some
task of the object. For example, a class called Car can provide the services Accelerate and Brake
to control the inner state of the Car object.
Not only do the classes and methods tell something about the semantics of a program; the relations between the different classes also provide added information about the behavior [69]. Components have to work together to execute the tasks of the complete program. Detecting and
recognizing these interactions in the source code using static analysis is not an easy task. Be-
cause of polymorphism, allowing a single definition to be used with different types of data, it is
difficult to determine which method is actually executed at runtime. Inheritance, the ability to
create new classes using existing classes, introduces the problem that the behavior of a subclass
is not solely defined in the class itself, but spread over multiple classes.
Semantically analyzing the source code of a program using static analysis techniques is thus a
difficult process [81]. Gamma [31] says the following about the relation between run-time and
compile-time:
“An object-oriented program’s run-time structure often bears little resemblance
to its code structure. The code structure is frozen at compile-time; it consists of
classes in fixed inheritance relationships. A program’s run-time structure consists
of rapidly changing networks of communicating objects. In fact, the two structures
are largely independent.”
1 Some languages also allow output parameters, where the storage location of the variable specified on the invocation is used.
5.4.1 Slicing
If we cannot give a definite description of the semantics of the whole program, we can at least
try to describe the behavior of the individual methods. A useful technique is called slicing, in-
troduced by Weiser [84]. Slicing is used to highlight statements that are relevant to a particular
computation and are as such semantically related [76]. Again, we can make a distinction be-
tween static and dynamic slicing. With static slicing no assumptions are made, while dynamic
slicing depends on specific test data.
Slicing depends on the control and data flow of the statements inside a method. Control flow is
the order in which the individual statements are executed. Data flow follows the trail of a data
item, such as a variable, as it is created or handled by a program [29]. With slicing we can find
out which statements contain variables that can be affected by another variable. This is called
a backward static slice, as it is computed by a backwards traversal of the statements, beginning
at the variable we are interested in.
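The backward traversal described above can be sketched as a small worklist algorithm. The sketch below is an illustrative simplification (Python is used purely for illustration): each statement is reduced to one defined variable and a set of used variables, and control dependences are ignored.

```python
def backward_slice(statements, target):
    """statements: list of (defined_var, {used_vars}) in program order.
    Returns the indices of the statements in the backward slice on 'target'."""
    relevant = {target}
    slice_indices = []
    # Traverse the statements backwards, tracking the set of relevant variables.
    for i in range(len(statements) - 1, -1, -1):
        defined, used = statements[i]
        if defined in relevant:
            slice_indices.append(i)
            relevant.discard(defined)  # this definition satisfies the demand
            relevant |= used           # everything it uses becomes relevant
    return sorted(slice_indices)

# x = 1; y = 2; z = x + y; w = 5  ->  slicing on 'z' excludes the w statement
program = [("x", set()), ("y", set()), ("z", {"x", "y"}), ("w", set())]
assert backward_slice(program, "z") == [0, 1, 2]
```

A statement like `x = x + 1` is handled correctly because `x` is removed from and then re-added to the relevant set in the same step.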
5.4.2 Method Example
If we want to create a method with a specific purpose, we usually specify this method first
in some sort of formal requirements specification, a high level view of the functionality. For
instance, we need a method called AssignWhenMoreThenOne with the following requirement:
“Method AssignWhenMoreThenOne must assign a value of 1 to the global variable moreThenOne
if its first parameter is greater than or equal to 2.”.
This single sentence describing the method AssignWhenMoreThenOne can be split into multiple
semantical elements:
• reading the value of the first parameter;
• reading of a constant value of 2;
• comparison of two values;
• the use of the greater than or equal to operator;
• branching based on the comparison;
• assignment of a value 1 to global variable moreThenOne if the condition holds.
We can implement this method in various ways and in different programming languages as
shown in Listing 5.4, Listing 5.5 and Listing 5.6.
1 public int moreThenOne;
2
3 public void AssignWhenMoreThenOne(int stockAmount)
4 {
5 if (stockAmount >= 2)
6 moreThenOne = 1;
7 }
Listing 5.4: Method AssignWhenMoreThenOne in C# .NET
1 Public moreThenOne As Integer
2
3 Public Sub AssignWhenMoreThenOne(ByVal stockAmount As Integer)
4 If stockAmount >= 2 Then
5 moreThenOne = 1
6 End If
7 End Sub
Listing 5.5: Method AssignWhenMoreThenOne in VB .NET
1 procedure Module1.AssignWhenMoreThenOne(stockAmount: Integer);
2 begin
3 if (stockAmount >= 2) then
4 Module1.moreThenOne := 1
5 end;
Listing 5.6: Method AssignWhenMoreThenOne in Borland Delphi
While the previous examples differ in syntax they have the same semantics. The C# and
VB .NET examples both compile to the Common Intermediate Language as shown in List-
ing 5.7.
1 .method public static void AssignWhenMoreThenOne(int32 stockAmount) cil managed
2 {
3 // Code Size: 21 byte(s)
4 .maxstack 2
5 .locals init (bool flag1)
6
7 L_0000: nop
8 L_0001: ldarg.0
9 L_0002: ldc.i4.2
10 L_0003: clt
11 L_0005: ldc.i4.0
12 L_0006: ceq
13 L_0008: stloc.0
14 L_0009: ldloc.0
15 L_000a: brfalse.s L_0012
16 L_000c: ldc.i4.1
17 L_000d: stsfld int32 ConsoleApplication2.Module1::moreThenOne
18 L_0012: nop
19 L_0013: nop
20 L_0014: ret
21 }
Listing 5.7: Method AssignWhenMoreThenOne in Common Intermediate Language
From this IL code it is still possible to determine the semantics we mentioned before. The
ldarg.0 loads the first parameter onto the stack; the ldc.i4.2 OpCode puts the value 2 on
the stack. Both values are used by the compare less than (clt) instruction, which puts the result
back on the stack. A zero value is loaded (ldc.i4.0) and a compare for equal is performed
(ceq), where the result is stored (stloc.0) in a variable. Based on this value, which is loaded
back onto the stack (ldloc.0), a branch action is performed (brfalse.s) to label L 0012 when
the value is false. If the value of the variable on the stack was true, the constant value 1 is
placed on the stack (ldc.i4.1) and stored in the variable moreThenOne (stsfld).
Although the C#, VB and Delphi code samples were practically the same, the IL code is somewhat different. The compiler introduced two comparisons and a new local variable to hold
the result of one of these comparisons. Also, the branching is reversed: the check is now for a
negative instead of a positive value. Furthermore, IL is a stack-based language. Still, this
piece of code behaves as indicated by the definition of method AssignWhenMoreThenOne.
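The stack-based evaluation of Listing 5.7 can be mimicked in a few lines of Python (a simplified model for illustration only: the static field becomes a dictionary entry, and only the opcodes occurring in the listing are handled):

```python
def assign_when_more_then_one(stock_amount, fields):
    stack = []
    stack.append(stock_amount)                 # ldarg.0
    stack.append(2)                            # ldc.i4.2
    b, a = stack.pop(), stack.pop()
    stack.append(1 if a < b else 0)            # clt
    stack.append(0)                            # ldc.i4.0
    b, a = stack.pop(), stack.pop()
    flag1 = 1 if a == b else 0                 # ceq, stloc.0
    stack.append(flag1)                        # ldloc.0
    if stack.pop():                            # brfalse.s L_0012 skips when 0
        stack.append(1)                        # ldc.i4.1
        fields["moreThenOne"] = stack.pop()    # stsfld
    # ret

fields = {"moreThenOne": 0}
assign_when_more_then_one(3, fields)
assert fields["moreThenOne"] == 1   # 3 >= 2, so the field is assigned
```

Working through the model makes the reversed branching visible: `clt` followed by `ceq` against zero computes “not less than 2”, and `brfalse.s` skips the assignment when that flag is zero.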
With the information described in this chapter, we know what types of semantical constructions
we are interested in. Not only the constructions themselves are important semantical information;
the control flow and the operands also play an important part in the behavior of a function. The
next chapter discusses how these constructions and the related data are represented in the
Intermediate Language and how this information can be extracted from the code.
CHAPTER 6
Analyzing the Intermediate Language
In Chapter 3 the .NET Framework is introduced and a brief introduction to the Common In-
termediate Language (IL) is presented. This assignment uses the Compose /.NET project and
hence the .NET languages, so we need access to the languages represented in the Intermediate
Language. This chapter provides more details of the IL and explains how to access this IL.
6.1 Inside the IL
An intermediate language is a CPU-independent instruction set, which resides between the
source code and the machine code. This has several advantages:
• The code can be optimized just before it is executed;
• Allows for a platform independent layer before generating a platform dependent version.
Optimization can occur per platform;
• Interoperability of other languages compiling to the same IL. Functionality in the IL can
be shared;
• Multiple different kinds of higher level languages can compile to this intermediate lan-
guage, so a large number of languages can be supported.
Two major intermediate languages are Java bytecode and the .NET IL. In this section we will
only discuss the .NET IL.
6.1.1 Modules
A .NET application consists of one or more managed executables, each of which carries metadata and (optionally) managed code [51]. Managed executables are called modules and they
basically contain two major components: metadata and IL code. Modules are used by
two components in the CLR (see Section 3.3): a loader and the just-in-time (JIT) compiler.
The loader is responsible for reading the metadata and creating an internal representation and
layout of the classes and their members. A class is loaded only when it is needed. When
loading a class, the loader runs a series of consistency checks on the related metadata.
The JIT compiler compiles the methods encoded in IL into the native code of the underlying
platform. The runtime does not execute the IL code directly; instead, the IL code is compiled in
memory into native code, and this native code is executed. A method is compiled only when it is called.
It is possible to precompile modules to native code for faster execution, but the original file must
still be present since it contains the metadata.
When the IL code is compiled, it is also optimized by the JIT compiler. This means that the
original IL code is hardly optimized, because the target architecture is only known at runtime. The JIT compiler performs optimization algorithms like method inlining, constant folding, dead code elimination, loop unrolling, constant and copy propagation, and so on.
The file format of a managed module is based on the standard Microsoft Windows Portable
Executable and Common Object File Format (PE/COFF). As such, a managed module is exe-
cutable by the operating system. Figure 6.1 shows the structure of a managed module. When
the module is invoked by the operating system, the .NET runtime can seize control over the
execution.
Figure 6.1: Structure of a managed executable module
6.1.2 Metadata
Metadata is data that describes data. In the context of the common language runtime, metadata
means a system of descriptors of all items that are declared or referenced in a module [51].
In a module there is a collection of tables containing different kinds of metadata. One table,
the TypeDef table, lists all the types defined in the module. Another table lists the methods
implemented by those types, another lists the fields, another lists the properties, and so on [65].
Additional tables hold collections of external type references (types and type members in other
modules that are used by this module), the assemblies containing the external types, and so on.
Metadata also describes the structural layout of all the types in the module. Besides the tables
to store the metadata, there are also heaps in which a sequence of items is stored. For instance,
a list of strings, binary objects, and so on. The runtime loader checks the consistency of and
protects the metadata headers and the metadata itself, making sure they can not be changed to
pose security risks.
6.1.3 Assemblies
The CLR cannot use managed modules directly, but requires an assembly. An assembly is a
deployment unit and contains metadata, (optionally) managed code, and sometimes resources.
The metadata is a system of descriptors of all the structural items of the application. It describes
the classes, their members and attributes, relations, etcetera. Part of the metadata is the man-
ifest. It provides information about the assembly itself like the name, version, culture, public
key and so on. Furthermore it describes the relations to other files, like resources and other
assemblies, and contains security demands. An example of an assembly is shown in Figure 6.2.
Figure 6.2: Assembly containing multiple files
An assembly contains one or more namespaces. A namespace is a collection of types that
are semantically related. Apart from some syntax restrictions, developers can define their own
namespace. The main purpose is to allow (meta) items to be unambiguously identifiable. How-
ever, namespaces are not metadata items and hence are not stored in a metadata table.
The .NET Framework uses an object-oriented programming model in which types are an im-
portant concept. The type of an item, such as a variable, constant, parameter, and so on, de-
fines both data representation and the behavioral features of the item. The Common Language
Infrastructure standard (see Section 3.4) defines two kinds of types, namely value types and
reference types, as further explained in Section 5.3.6. The Framework supports only single
type inheritance and thus creates a hierarchical object model. At the top is the System.Object
type, from which all the types are derived.
In .NET the types can be divided into five categories: class, interface, structure, delegate and enumeration. Types, fields and methods are the three important elements in managed programming [51]. The other elements use metadata to provide additional information about
these three.
6.1.4 Classes
A class defines the operations an object can perform (methods, events, properties) and defines
values that hold the state of the object (fields). A class also contains specific class metadata
which can be divided into two concepts: type reference (TypeRef) and type definition (TypeDef).
TypeDefs describe the types defined in the class, whereas TypeRefs describe references to types
that are declared somewhere else. In its simplest form we only get the TypeDef metadata table,
which contains flags about the visibility (like a public or private class) and a class reference
to another type. There is more information if, for instance, the class implements other classes,
uses custom attributes, is an enumeration or a value, et cetera.
A class contains items like methods, properties, events or fields and these items are character-
ized by signatures. A signature is a binary object containing one or more encoded types and
resides in the metadata [51]. The first byte of a signature defines the calling convention and, in
turn, identifies the type of the signature. Possible calling conventions are field, property, local
variable, instance method, or method call identifiers.
At line 3 in Listing 6.1 the syntax of a class definition is shown. The dotted_name defines
the namespace this class belongs to. simple_name is the name of the class (the TypeDef) and
class_ref is the name of the class it extends from. There are numerous flags to specify
specific options for the type definition of the class. For instance, if the type is visible outside of
the assembly or if it is an interface.
1 .namespace <dotted_name>
2 {
3 .class <flags> <simple_name> extends <class_ref>
4 {
5
6 }
7 }
Listing 6.1: Syntax of a class definition
6.1.5 Fields
Fields are, together with local variables inside a method, data locations. Information about
the fields is stored in a fields metadata table. Additional tables specify data like the layout,
mapping and constant values of the fields. The syntax for a field is listed in Listing 6.2.
1 .field <flags> <type> <name>
Listing 6.2: Syntax of a field definition
The owner of a field is the class or value type in the lexical scope of which the field is defined.
The flags are used to specify extra options, the type defines the type of the field and finally
the name indicates the name of the field. An example is given in Listing 6.3.
1 .field public string s
2 .field private int32 i
Listing 6.3: Example of a field definition
There are two types of fields:
Instance fields These are created every time a type instance is created and they belong to this
type instance.
Static fields Fields which are shared by all instances of the type and are created when the type
is loaded.
The field signature does not contain an option to specify whether the field is static or instance.
However, the compiler keeps separate tables for the two kinds and can classify the desired
field. To load or store a field, there are two sets of instructions in the IL; one for static and one
for instance fields.
Fields can have default values and these are stored in the constant metadata table. Besides the
default values for fields, this table can also contain default values for parameters of methods
and properties. The syntax is listed in Listing 6.4 and an example is given in Listing 6.5. If the
const_type is a null reference, a value is not mandatory.
1 .field <flags> <type> <name> = <const_type> [( <value> )]
Listing 6.4: Syntax of a field definition with default value
1 .field private int32 i = int32(1234)
Listing 6.5: Example of a field definition with default value
A field declared outside a class is called a global field and belongs to the module in which it is
declared. This module is represented by a special TypeDef record under the name <Module>.
A global field is by definition static since only one instance of the module exists and no other
instance can be created.
6.1.6 Methods
A method has a couple of related metadata tables like the definition and reference information,
implementation, security, semantics, and interoperability. The syntax for a method is listed
in Listing 6.6.
1 .method <flags> <call_conv> <ret_type> <name>(<arg_list>) <impl> {
2 <method_body>
3 }
Listing 6.6: Syntax of a method definition
The flags define the options for the method, such as the accessibility (private, public and so
forth). The call_conv, ret_type, and arg_list are the method calling convention, the return
type, and the argument list defining the method signature. The impl specifies additional implementation flags of the method, for example whether the method is managed, in CIL or
native format, or whether it must be executed in single-threaded mode only, and so on.
The name of the method is a name or one of the two keywords .ctor or .cctor. The instance
constructor method (.ctor) is executed when a new instance of the type is created, the class
constructor (.cctor) is executed after the type is loaded and before any one of the type members
is accessed. The global .cctor can be used to initialize global fields.
There are different kinds of methods as depicted in Figure 6.3. Static methods are shared by all
instances of a type and do not require an instance pointer (referred to as this). They also cannot
access instance members unless the instance pointer is provided explicitly. An instance method
is instance-specific and has as its first argument the this instance pointer.
Figure 6.3: Different kinds of methods
Virtual methods can be overridden in derived classes, whereas non-virtual methods can also be
redefined in a derived class, but such a redefinition has nothing to do with the method declared
in the base class. The base method is merely hidden, and can still be called when the class name
is specified explicitly.
6.1.7 Method Body
The method body itself holds three parts, namely a header, IL code and an optional structured
exception handling (SEH) table, see Figure 6.4. Currently, there are two types of headers: a fat
and a tiny version, indicated by the first two bits in the header. A tiny header is created by the
compiler when the method does not use SEH nor local variables, has a default stack space of
eight, and its size is less than 64 bytes.
Figure 6.4: Method body structure
Local variables are declared and have their scope inside the method. Local variables only have
names if the source code is compiled in debug mode. They are referenced in IL code by their
zero-based ordinals1. Unlike fields and method names, the names of local variables are not part
of the metadata. Their names are stored inside a separate debug file, the program database
(PDB file).
If the keyword init has been added to the local’s declaration, the JIT compiler must initialize
the local variables before the execution of the method. This means that for all the value types
the constructor is called and all variables of object reference types are set to null. If the init
flag has not been set, the code is regarded as unverifiable and can only run from a local drive
with verification disabled.
It should be noted that methods, just like global fields, can reside outside any class scope. These
so called global methods are static and the same accessibility flags as for a global field apply.
6.1.8 IL Instructions
Inside a method we find the method body, which contains a header, IL code, and an optional
structured exception handling (SEH) table. The IL code contains IL instructions, which are made
up of an operation code (OpCode) and are sometimes followed by an instruction parameter.
A list of all the available operational codes can be found in Appendix A. There are long and
short parameter instructions. The long form requires a four byte integer, the short form only
one byte. This can reduce the amount of space in the assembly, but the short form can only be
used when the value of the parameter is in the range of 0 to 255 (for unsigned parameters).
1 Numbers used to denote the position in an ordered sequence.
IL is a stack based language, meaning that operands must be pushed onto the stack before an
operation can use them. An operator grabs the values from the stack, performs the operation
and (optionally) places the result back onto the stack. More generally, instructions take all
required arguments from the stack and put the results onto the stack. If, for instance, a local
variable is required by an instruction, then another instruction has to load this variable onto
the stack. An exception to this rule are the instructions of the load and store group, which are
responsible for the pushing and popping of values onto and off the stack.
Elements on the evaluation stack are called slots and can be one of the types listed in Ap-
pendix B. Besides the evaluation stack, a method also contains an argument table and local
variable table, both having slots with static types.
We can distinguish different kinds of instructions which are listed in the sections below.
6.1.8.1 Flow Control
Labels can be placed between the IL instructions to mark the first instruction that follows them.
Labels are used by the control flow instructions to jump to a predefined part in the code. It
is much easier and safer1 to use labels than offsets. The instructions dealing with control flow
inside a method are the following:
• Branching;
• Switching;
• Exception handling;
• Returning.
The branching instructions can be divided into three types of branching. First, we have unconditional branching, where the control flow jumps to another part of the code. Second, conditional
branching, where the control flow is directed to another location based on a true or false value
on the stack. The third branching type is the comparative version, where two values from the
stack are compared according to the condition specified by the OpCode. This condition can be
greater than, not equal, less than or equal, and so forth.
The switch instruction uses a jump table to determine where to jump to. It takes an unsigned
integer from the stack and uses that number to jump to the target offset in the jump table
sequence. A value of zero on the stack instructs the switch to jump to the first target offset in
the list.
In Listing 6.7 an example of an unconditional branching instruction and a switch instruction
are listed.
1 Loop:
2
3 br Loop
4
5 switch(Label1, Label2, ..., LabelN)
6 // Default case
7 Label1:
8
9 Label2:
10
11 LabelN:
Listing 6.7: Control flow examples
1 Safer in the sense that when creating IL code by hand, it is safer to use labels instead of calculating, possibly incorrect, offsets.
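The jump-table behavior of the switch instruction can be modeled as a plain table lookup (an illustrative Python sketch; labels are represented as strings, and out-of-range values fall through to the instruction after the switch):

```python
def execute_switch(value, jump_table, fall_through):
    """Model of the IL switch: an unsigned integer taken from the stack
    selects a target from the jump table; other values fall through."""
    if 0 <= value < len(jump_table):
        return jump_table[value]
    return fall_through

targets = ["Label1", "Label2", "LabelN"]
assert execute_switch(0, targets, "Default") == "Label1"   # zero -> first target
assert execute_switch(9, targets, "Default") == "Default"  # falls through
```

This mirrors the description above: a value of zero selects the first target offset, and a value outside the table continues with the default case.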
The exception handling instructions are divided into an exiting and an ending instruction.
A block of code inside an exception handling clause cannot be entered or exited by simply
branching. There are special state requirements prohibiting this. So there is a leave instruction
to exit an exception handling block, which clears the stack space before branching. To indicate
the end of an exception handling block a special endfinally instruction is used. This also
clears the stack but does not jump.
A method always ends with one or more ret instructions, which return the control flow to the
call site. If there is a (single) value on the stack, the ret instruction will retrieve this value and
push it back onto the stack to be used by the calling method.
6.1.8.2 Arithmetical Instructions
Arithmetical instructions are used for numeric data processing, stack manipulation, loading
and storing of constants, arithmetical operations, bitwise operations, data conversion opera-
tions, and logical condition check operations.
Stack manipulation instructions perform an action on the evaluation stack and do not have a
parameter.
nop Performs no operation on the stack. Although not a stack manipulation instruction it is
included in this list for lack of a better category;
dup Duplicates the value on the top of the stack;
pop Removes the value from the top of the stack.
Constant loading instructions place the parameter with the constant value on the stack. There
are instructions which directly specify the value to load so a parameter is not needed, as shown
at line 2 in Listing 6.8.
1 ldc.i4 16 // Place the constant 16 onto the stack
2 ldc.i4.7 // Load the value 7 on the stack. Note; there is no parameter
Listing 6.8: Constant loading instructions
It is possible to load and store values using pointers. The value on top of the stack refers to a
specific address where the value can be loaded from or stored to. Table 6.1 lists all the possible
arithmetical operations. The overflow instructions raise an exception when the result does not
fit the target type.
Bitwise and shift operations have no parameters. They take one or two values from the stack,
perform their action and place the result back onto the stack. Table 6.2 provides a summary of
the bitwise and shift operations.
OpCode Description
add Addition
sub Subtraction
mul Multiplication
div Division
div.un Unsigned division
rem Remainder, modulo
rem.un The remainder of unsigned operands
neg Negate, thus invert the sign
add.ovf Addition with overflow
add.ovf.un Addition of unsigned operands with overflow
sub.ovf Subtraction with overflow
sub.ovf.un Subtraction of unsigned operands with overflow
mul.ovf Multiplication with overflow
mul.ovf.un Multiplication of unsigned operands with overflow
Table 6.1: Arithmetical operations in IL
The conversion instructions take a value from the top of the stack, convert it to the type specified by the instruction and put the result back onto the stack. There are, of course, some rules:
not every type can be converted to another type, and information can get lost when converting
to a narrowing type (for example converting a 32 bit integer to a 16 bit integer). Special overflow conversion opcodes are available when there is a need to throw an exception whenever
the value must be truncated to fit the target type.
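The difference between a plain narrowing conversion and its overflow-checked counterpart can be sketched for a 16 bit signed target (an illustrative Python model of the behavior of conv.i2 versus conv.ovf.i2):

```python
def conv_i2(value):
    # Plain narrowing: keep the low 16 bits and reinterpret them as signed.
    truncated = value & 0xFFFF
    return truncated - 0x10000 if truncated >= 0x8000 else truncated

def conv_ovf_i2(value):
    # Overflow-checked narrowing: raise when the value does not fit in int16.
    if not -0x8000 <= value <= 0x7FFF:
        raise OverflowError("value does not fit in int16")
    return value

assert conv_i2(70000) == 4464       # silently truncated, information is lost
assert conv_ovf_i2(1234) == 1234    # fits, so the value passes through
```

The unchecked form quietly loses information, which is exactly the semantic difference the analyzer must record when classifying a conversion instruction.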
Logical condition check instructions are used to compare two values based on a certain opera-
tor. The result is placed on the stack and is not directly used to branch to another location. A
separate conditional branching instruction (branch true at line 3) can follow a logical condition
check instruction (check equal at line 2) as shown in Listing 6.9.
1 Loop:
2 ceq
3 brtrue loop
Listing 6.9: Condition check followed by a branching instruction
6.1.8.3 Loading and Storing
Almost all the instructions in the CIL operate on values on the stack except for the loading
and storing instructions. This group of instructions is used to load values from local variables,
fields, and method arguments onto the stack and to store items from the stack into these local
variables, fields, and arguments.
The ldarg and starg instructions handle argument loading and storing, while ldloc and stloc
are used for local variables. The parameter indicates the ordinal of the argument or variable
to load or store. Remember: the first (zero-based) argument of an instance method is the
object pointer.
The ldfld and stfld instructions handle field loading and storing. A field signature does not
indicate whether the field is static or instance, so there are separate instructions to load and
store field values (or pointers to them) from or onto the stack: for each instance load/store
instruction there is also a static load/store counterpart.
58 Automatic Derivation of Semantic Properties in .NET
OpCode   Description
and      Bitwise AND (binary)
or       Bitwise OR (binary)
xor      Bitwise exclusive OR (binary)
not      Bitwise inversion (unary)
shl      Shift left
shr      Shift right
shr.un   Shift right, treating the shifted value as unsigned
Table 6.2: Bitwise and shift operations in IL
6.1.8.4 Method Calling
From within a method body it is possible to call other methods (see Listing 6.10). There are a
number of instructions for method calling. These call instructions take a token as a parameter,
which contains either a MethodDef or a MethodRef of the method being called.
The arguments of the called method must be pushed onto the stack in the order of their
appearance in the method signature before the actual call. If an instance method is being
called, the instance pointer must be placed on the stack first. If the called method does not
return void, it places its return value on the stack when returning.
Methods can be called directly or indirectly. With an indirect call, a pointer to the method is
used instead of the method name.
1 ldstr "Enter a number"
2 call void [mscorlib]System.Console::WriteLine(string)
3 call string [mscorlib]System.Console::ReadLine()
Listing 6.10: Method call example
6.1.8.5 Exception Handling
Exception handling is a feature of the managed runtime. The runtime is capable of detecting
exceptions and finding a corresponding exception handler. Structured Exception Handling (SEH)
information is stored in a table after the IL code of the method body. There are two forms of SEH
declaration: a label form, shown in Listing 6.11 with an example in Listing 6.12, and a scope form, shown in Listing 6.13.
1 .try <label> to <label> <EH_type_specific> handler <label> to <label>
Listing 6.11: Exception handling in label form
The EH_type_specific is either a catch, filter, fault or a finally type. It uses labels to define
a guarded block of code and a block of code which handles the exception. This is called the
labeled form of exception handling. With the scope form, a try instruction is placed before the
actual instruction block and is followed by the exception handling catch block, see Listing 6.13.
1 BeginTry:
2
3 leave KeepGoing
4 BeginHandler:
5
6 leave KeepGoing
7 KeepGoing:
8
9 ret
10 .try BeginTry to BeginHandler catch [mscorlib]System.Exception
11 handler BeginHandler to KeepGoing
Listing 6.12: Exception handling in label form example
1 .try {
2 // Guarded code
3 leave KeepGoing
4 }
5 catch [mscorlib]System.StackOverflowException {
6 // The exception handler 1 code
7 leave KeepGoing
8 }
9 catch [mscorlib]System.Exception {
10 // The exception handler 2 code
11 leave KeepGoing
12 }
Listing 6.13: Exception handling in scope form
Exception handling structures can be nested using either the labeled or the scope form. It is
illegal to branch into or out of a guarded block of code. Such a block can only be entered from
the top (where the guarded block is defined by a TryOffset), and code inside the handler blocks
can only be invoked by the exception handling system. Guarded and handler blocks can only
be exited by using the leave, throw, or rethrow instructions. The evaluation stack is cleared
before the branch.
6.1.9 Normalization
Because the IL is stack based, a simple expression can become quite complex. Take, for example,
the expression y := (x + 2) − 1: the constant 2 is added to the value of x, and the constant 1 is
subtracted from the intermediate result. The final result is stored in the variable y. Listing 6.14 shows this expression in IL code.
1 ldloc x
2 ldc 2
3 add
4 ldc 1
5 sub
6 stloc y
Listing 6.14: Stack based expression in the Common Intermediate Language
Recognizing the expression is difficult. Temporary results are placed on the stack and are
retrieved by the next operation; for instance, the result of the add operation is not stored in a
variable. To convert a stack-based program to a semantical representation, it is necessary to
perform a normalization step. The example in Listing 6.14 can be normalized by introducing a
temporary variable, so that there is a single assignment in each statement.
temp = x + 2
y = temp - 1
This is only a simple example, but expressions with multiple sequential operators are normalized
into more temporary assignments.
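This normalization can be sketched as a small stack-to-temporary translation (a Python sketch; the instruction encoding and temporary naming are simplified assumptions):

```python
def normalize(il):
    """Rewrite a stack-based instruction list into statements that each
    contain a single assignment, introducing fresh temporaries."""
    stack, stmts, counter = [], [], 0
    ops = {"add": "+", "sub": "-", "mul": "*", "div": "/"}
    for instr in il:
        op, *arg = instr.split()
        if op in ("ldloc", "ldc"):           # push a variable or constant
            stack.append(arg[0])
        elif op in ops:                      # pop two operands, push a temp
            right, left = stack.pop(), stack.pop()
            counter += 1
            stmts.append(f"t{counter} = {left} {ops[op]} {right}")
            stack.append(f"t{counter}")
        elif op == "stloc":                  # store the top of the stack
            stmts.append(f"{arg[0]} = {stack.pop()}")
    return stmts

# Listing 6.14, y := (x + 2) - 1, becomes three single-assignment statements:
for stmt in normalize(["ldloc x", "ldc 2", "add", "ldc 1", "sub", "stloc y"]):
    print(stmt)
```

Every operation result is named, so each statement assigns exactly once; a later pass could fold the final temporary into the stloc target.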
6.1.10 Custom Attributes
The .NET Framework can be extended using custom attributes: special metadata items. Custom
attributes cannot change the metadata tables, because those tables are hard-coded and part
of the runtime. The information in custom attributes can be used by the program itself,
but also by a compiler or debugger. A major disadvantage of custom attributes is the amount of
resources they occupy in the source code and the fact that the IL cannot access custom
attributes directly, so reflection must be used, which is a relatively slow mechanism.
Custom attributes can be attached to any item in the metadata tables except custom attributes
themselves. Attaching an attribute to an instance is not possible; only an assignment to the type
itself is allowed. The value of a custom attribute is a BLOB (binary large object) containing the
arguments of the attribute constructor and, optionally, a list of fields with values.
1 .custom instance void <class_ref>::.ctor(<arg_list>) [ = ( <hexbytes> ) ]
Listing 6.15: Custom attribute syntax in IL
Listing 6.15 shows the declaration of a custom attribute in IL code. The class_ref is the name
of the class implementing the attribute, arg_list are the arguments for the constructor of the
attribute and hexbytes contains the BLOB representation of the argument values. An example
is shown in Listing 6.16.
1 .custom instance void MyAttribute::.ctor(bool) = (01 00 01 00 00)
Listing 6.16: Custom attribute example in IL
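The BLOB in Listing 6.16 can be decoded by hand: a two-byte prolog 0x0001, the serialized fixed arguments, and a two-byte count of named arguments (ECMA-335, section II.23.3). The following Python sketch, limited to a constructor with a single bool argument, shows the decoding:

```python
import struct

def parse_bool_attribute_blob(blob):
    """Decode a custom-attribute BLOB whose constructor takes a single bool.

    Layout (ECMA-335 II.23.3): a 16-bit prolog 0x0001 (little-endian),
    the serialized fixed arguments, then a 16-bit named-argument count.
    """
    prolog, = struct.unpack_from("<H", blob, 0)
    if prolog != 0x0001:
        raise ValueError("not a custom attribute BLOB")
    value = blob[2] != 0                       # the bool constructor argument
    num_named, = struct.unpack_from("<H", blob, 3)
    return value, num_named

# The BLOB from Listing 6.16: MyAttribute::.ctor(bool) = (01 00 01 00 00)
print(parse_bool_attribute_blob(bytes([0x01, 0x00, 0x01, 0x00, 0x00])))
```

The bytes 01 00 are the prolog, the third byte 01 encodes true, and the trailing 00 00 says there are no named arguments.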
The position of the custom attribute in the code defines the owner of the attribute. All attributes
declared in the scope of an item belong to that item. If there is no scope, then the
attributes declared after an item belong to that item. This is the opposite of the use of custom
attributes in a higher level programming language like C# or VB.NET, where the custom
attribute precedes the item it belongs to (see the example in Listing 6.17). There is another form
of custom attribute declaration in which the owner can be explicitly specified for metadata items
that are declared implicitly, such as TypeRefs and MemberRefs. This form can appear anywhere
in the source, since the owner is specified.
1 public class ExampleClass
2 {
3 [MyAttribute(true)]
4 public void ExampleMethod( )
5 {
6 //
7 }
8 }
Listing 6.17: Example of a custom attribute in C#
6.2 Access the IL
The previous section described the Intermediate Language used by Microsoft. Used as an
extra layer before compiling the instructions to machine code, it provides an accessible program
representation of software created in a higher level programming language like C#
or VB.NET. However, the instructions, together with the metadata, are stored in a byte code
format: a format optimized for speed and efficiency, not for direct readability. This section lists a
number of ways to read the contents of the byte code so that we have access to the instructions
and data.
6.2.1 How to Read IL
To access the IL code, we basically have to parse the byte code and convert it to its IL
representation. For instance, the hexadecimal value 58 stands for the IL instruction add. Using
the ECMA 335 specification [25], the standard for the Common Language Infrastructure and
thus the Intermediate Language, we can read and parse the byte code. However, it is not as
simple as it looks. Information is contained in different metadata tables and must be associated
with the correct elements. Instead of writing our own CIL byte code parser, it is more efficient
to use an existing tool. These tools, called IL readers, parse the byte code and create a higher level
representation of the instructions. How this is represented differs per tool.
The process of analyzing a subject system to create representations of the system at a higher
level of abstraction is called reverse engineering [14]. A tool to reverse engineer a program
is called a decompiler, as it performs the reverse operation of a compiler. Microsoft even
provides a free decompiler called ILDASM (Intermediate Language Disassembler), which is
part of the Software Development Kit (SDK). It is the counterpart of ILASM, which is used to convert
plain text IL code to byte code. Using both tools, a round-trip compilation is possible: an
assembly can be decompiled with ILDASM and recompiled with ILASM to obtain a correct
assembly again. ILDASM provides a Graphical User Interface (GUI), displayed
in Figure 6.5, to show the contents of a .NET file in a tree structure. With special options it
allows access to the metadata and the CIL code of the selected methods.
Figure 6.5: Graphical User Interface of ILDASM
Besides the GUI, ILDASM allows output to a text file. By specifying the correct options, ILDASM
generates a .IL file containing the Intermediate Language code; an example of this format is
shown in Listing 3.1. Since this representation is much easier to read than the byte code
version, it can be used as input for an IL parser. We can even change this representation and
recompile it using the ILASM program, a technique currently used by the Compose/.NET
project [10].
Reflector1, created by Lutz Roeder, is a commonly used tool to inspect .NET assemblies.
Figure 6.6 shows a screenshot of this program. It can convert the analyzed IL code to
a higher level language like C#, VB.NET, or Delphi. With the use of plugins it is possible to
perform analysis on the code or even to completely retrieve the source of an assembly in the
language of your choice.
Figure 6.6: Lutz Roeder’s Reflector
With the help of these decompiler tools it is very easy for anybody to reverse engineer assemblies
back into readable source code. Malicious people can crack programs, exploit security
flaws, and steal ideas. If it is imperative to protect your source code, obfuscation should
be applied. Code obfuscation is the generation of code that is still understandable by the
compiler but very difficult for humans to comprehend. Some of the techniques used in
obfuscation are removing nonessential metadata, control flow obfuscation, string encryption,
reordering of elements, and size reduction. Obfuscation is not applied to the source code, but to the
assemblies. Although it is not foolproof security for your assemblies, it makes it very difficult
to reverse engineer an application.
1 http://www.aisto.com/roeder/dotnet/
Besides tools for visualization of the source code, there are tools that perform specific analysis
tasks on the code, for instance, the Fugue protocol checker1. Driven by custom attributes
specified inside the source code, Fugue checks whether the code conforms to its declarative
specification. Pre- and postconditions, resource usage, database queries, and so on can be
specified, as well as the different states a program must adhere to. Fugue performs a static
analysis to find possible problems that could occur at runtime [22].
FxCop2 is another static analysis tool that checks .NET managed code assemblies for confor-
mance to the Microsoft .NET Framework Design Guidelines. It generates reports on possible
design, localization, performance, and security improvements based on the design guidelines
written by Microsoft. Targets, the managed assemblies, are checked by rules. If a rule fails, a
descriptive message is displayed revealing relevant programming and design issues. Figure 6.7
displays a screenshot of FxCop.
Figure 6.7: Microsoft FxCop
There are many other tools that use a representation of the IL byte code to perform an analysis
task. However, the purpose of this assignment is to perform our own analysis. We need
not only an IL reader, but also a programming interface to access the IL representation so that
semantic analysis can be performed. The next sections list some possible tools that can be used
for this goal.
6.2.2 Reflection
Reflection is a method to inspect the metadata to get information about an assembly, module,
or type [65, 4]. The .NET Framework Class Library contains special functionality in the
System.Reflection namespace to access the metadata without converting the byte code to another
format. This is internally handled by the reflection classes. Reflection is used at runtime by
1 http://research.microsoft.com/~maf/projects/fugue/index.htm
2 http://www.gotdotnet.com/team/fxcop/
a program to analyze itself or other components. The structure of the analyzed component is
converted to an object representation. There are three main classes:
• System.Reflection.Assembly, which represents assemblies;
• System.Reflection.Module, which represents managed modules;
• System.Type, representing types.
These classes contain properties and methods to get further information about the assemblies.
For instance, the Assembly class contains functionality to get all the types in the assembly as
System.Type objects. This type information is the root of all reflection operations and allows
access to methods, fields, parameters, and so on. Besides the method in the Assembly class,
there are other ways to dynamically load types. This is called late binding: binding to a type at
run time rather than at compile time.
Given a Type object, we can also invoke methods on the type, for example by activating an
instance and executing the method. Listing 6.18 presents an example of using reflection to load
an assembly, obtain the class ImageProcessing, and retrieve the method GetImage. This method
is invoked and returns an Image object.
1 Assembly assm = Assembly.LoadFrom ("assembly.dll");
2 Type type = assm.GetType ("ImageProcessing");
3 MethodInfo method = type.GetMethod ("GetImage");
4 Object obj = Activator.CreateInstance (type);
5 Image image = (Image) method.Invoke (obj, null);
Listing 6.18: Reflection example
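For comparison, the same late-binding idea, loading a module at run time and invoking a member without any compile-time reference, looks like this in Python (the standard-library Counter class is used only so the sketch is runnable):

```python
import importlib

def invoke_late_bound(module_name, class_name, method_name):
    """Load a module at run time, instantiate a class by name, and call a
    method by name -- no compile-time reference to any of them exists."""
    module = importlib.import_module(module_name)
    cls = getattr(module, class_name)
    instance = cls()
    return getattr(instance, method_name)()

# Bind to collections.Counter.most_common purely by name at run time:
print(invoke_late_bound("collections", "Counter", "most_common"))
```

As in Listing 6.18, the names are plain strings resolved at run time, which is what makes reflection both flexible and comparatively slow.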
With reflection it is possible to access the custom attributes, the elements providing extra
information about other items. Because it is not possible to access custom attributes directly in
the IL code, reflection is the only way to retrieve these items.
The reflection library has another component, named Emit, which allows a compiler or tool to
emit metadata and Intermediate Language code and to execute this code or save it to a file.
Reflection is a powerful way to perform static analysis on assemblies. However, there are two
major problems. The first one is speed: reflection takes place at runtime and must parse and
process the assembly before it can build a representation, which is a time-consuming process.
Speed improvements are included in .NET version 2.0. A more serious problem is the lack of
method body information. Reflection can reveal almost all the information needed except the
contents, the body, of a method with the actual instructions inside. This is the information we
are mostly interested in. Again, in version 2.0, the reflection classes have been enhanced with
a function to get the method body contents. However, this only gives the byte code, so we
still have to parse and convert it to another representation.
Reflection thus gives us a partially implemented tool for code analysis. If the reflection
possibilities regarding speed and the method body are improved, it will be a good candidate to
use.
6.2.3 Mono Cecil
Cecil1 is a .NET library to inspect and generate .NET assemblies. It provides more or less the
same functionality as reflection: you can load an assembly and browse through all the types.
Beyond what reflection offers, it can read and parse IL instructions and has functionality to
change the code and save it back to disk.
Listing 6.19 gives an example of opening an assembly and reading the types inside this assem-
bly. If we have access to a type object, we can use that object to retrieve the fields, methods,
constructors, and so on.
1 // Creates an AssemblyDefinition from the "assembly.dll" assembly
2 AssemblyDefinition assembly = AssemblyFactory.GetAssembly ("assembly.dll");
3
4 // Gets all types which are declared in the Main Module
5 foreach (TypeDefinition type in assembly.MainModule.Types) {
6 // Writes the full name of a type
7 Console.WriteLine (type.FullName);
8 }
Listing 6.19: Cecil get types example
A method object has a property called Body which returns a MethodBody instance. We can use
this object to access all the local variables, the exception handling information, and the IL
instructions. Besides directly accessing the instructions, we can also use the analysis tools of
Cecil. The flow analysis class creates basic blocks in which instructions are grouped together.
A control flow instruction, such as branching, exception throwing, or returning, starts a new
block, and blocks are connected to each other: if the last instruction in a block is a branch to
another block, the current block has a link to that block. This gives us the opportunity to trace
the control flow through the instructions of the method.
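The block splitting described above can be sketched with the classic leader algorithm (a Python sketch, not Cecil's actual implementation; the instruction encoding is a simplified assumption):

```python
def basic_blocks(instrs):
    """Split a linear instruction list into basic blocks.

    An instruction is a leader (starts a block) if it is the first
    instruction, the target of a branch, or follows a control-flow
    instruction; each block then runs up to the next leader.
    """
    control_flow = {"br", "brtrue", "brfalse", "ret", "throw"}
    leaders = {0}
    for i, (op, *arg) in enumerate(instrs):
        if op in control_flow:
            if i + 1 < len(instrs):
                leaders.add(i + 1)       # the fall-through successor
            if arg:
                leaders.add(arg[0])      # the branch target
    starts = sorted(leaders)
    return [instrs[s:e] for s, e in zip(starts, starts[1:] + [len(instrs)])]

prog = [("ldloc", 0), ("brtrue", 3), ("ret",), ("ldloc", 1), ("ret",)]
for block in basic_blocks(prog):
    print(block)
```

Each control-flow instruction ends a block, and linking each block to its fall-through and branch-target blocks yields the control flow graph that Cecil exposes.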
Inside the blocks are instructions, and each instruction has an OpCode, indicating the type of
instruction, and an operand, the parameter of the instruction (when available). The offset of
the instruction, indicating its placement inside the method, is also stored in the instruction.
Next and previous properties allow navigation between the instructions, and a visitor pattern
can be used to visit all the different kinds of instructions.
Not only can Cecil read IL instructions, it can also add or change instructions and save the
changed assembly. Cecil is used in a number of analysis tools, for instance, tools that check
whether code is type safe or that optimize code.
Although Cecil can access the IL instructions, it is limited in its abilities. An instruction does
not contain specific information about the data it is working on as specified by its operand,
and there is no direct link to this operand; we have to determine the type of the operand
ourselves. Support for .NET Framework version 2.0 was not available in the version of Cecil
examined for this assignment. It is possible that newer versions can handle the next version of
the Framework without any problems.
1 http://www.mono-project.com/Cecil
6.2.4 PostSharp
A program similar to Cecil is PostSharp1. This tool reads .NET assemblies, represents them
as a Code Object Model, lets plug-ins analyze and transform this model, and writes it back
to binary form. The two main purposes of this application are program analysis and program
transformation.
PostSharp is designed for .NET version 2.0 and supports the new language constructs in the
CIL. Working in combination with the reflection capabilities of .NET, it creates its own
representation of the instructions inside a method body. Listing 6.20 gives an example of reading an
assembly and printing all the instructions to the console.
1 // Get the assembly
2 System.Reflection.Assembly assembly =
3 System.Reflection.Assembly.LoadFrom("assembly.dll");
4 System.Reflection.Module[] modules = assembly.GetModules();
5
6 // Get all the modules
7 foreach (Module mod in modules)
8 {
9 // Open a module with PostSharp
10 PostSharp.ModuleReader.ModuleReader mr =
11 new PostSharp.ModuleReader.ModuleReader(mod);
12 PostSharp.CodeModel.ModuleDeclaration md = mr.ReadModule();
13
14 // Get the types
15 foreach (TypeDeclaration t in md.Types)
16 {
17 // Get all the methods in the type
18 foreach (MethodDeclaration method in t.Methods)
19 {
20 // Get the body of the method
21 MethodBodyDeclaration b = method.Body;
22
23 // Print the method name
24 Console.WriteLine(method.Name);
25
26 // Enumerate through all the instructions
27 b.ForEachInstruction(delegate(InstructionReader instructionReader)
28 {
29 Console.WriteLine("Read instruction {0} as {1}",
30 instructionReader.OpCodeNumber, instructionReader.OperandType);
31 });
32
33 method.ReleaseBody();
34 }
35 }
36 }
Listing 6.20: PostSharp get body instruction
Just like Cecil, PostSharp can split the instructions into blocks for a representation
of the control flow. Instructions are represented in the code model with detailed information
about the instruction and its operands, although the type information of the operands must
still be resolved.
1 http://www.postsharp.org
PostSharp is more mature than Cecil, but is still under heavy development. At the time of
implementation, PostSharp was not production ready.
6.2.5 RAIL
The Runtime Assembly Instrumentation Library (RAIL1) is a project of the University of
Coimbra, Portugal. Like Cecil and PostSharp, it allows .NET assemblies to be manipulated and
instrumented before they are executed. It fills the gap between .NET reflection and the emit
functionality. Its primary use is the transformation of assemblies by changing types, methods,
fields, or IL instructions [12].
RAIL creates an object-oriented representation of the assembly, which can be manipulated to
make changes in the code. Besides the structured view, RAIL also creates tables to hold the
sequence of objects and object references that represent the application's IL code and all the
exception handling related information.
Applications of RAIL include runtime analysis tools, security verification, MSIL optimization,
Aspect Oriented Programming, and so on. However, at the time of writing, RAIL was immature
and could not be used for even simple analysis tasks.
6.2.6 Microsoft Phoenix
Phoenix2 is a framework for building compilers and a wide range of tools for program analysis,
optimization, and testing [50]. Phoenix is a joint project between Microsoft Research and the
Developer Division and is the basis for all future Microsoft compiler technology. It supports a
wide range of hardware architectures and languages.
Building blocks form the core of Phoenix, implemented around a common intermediate
representation (IR). These blocks are called phases and are executed one by one in a predefined
order. Phases are used to build, modify, or analyze the IR, and in most cases the final phase
writes the IR to a specific output format.
Figure 6.8 shows the components of the Phoenix platform, with the IR as the main structure
through which the phases interact with the data. Readers are used to process different types of
input, such as ASTs, the C intermediate language, the Common IL (MSIL), PE files, and other
binaries. The input is read into the IR, which represents the instruction stream of a function as
a series of dataflow operations, and the phases are executed in sequence. Each phase performs
some operations on the IR, so the IR can be at different levels of abstraction. For instance,
during a compilation with Phoenix, the IR is transformed from a high level, machine
independent IR to the final instructions and addresses, a low level, machine dependent IR. Finally, the
writers are used to build an executable or library.
The list of phases can be changed to include or replace phases at some place in the sequence.
If an analysis phase needs access to a high level representation of the code, it should be
included at the start of the sequence.
1 http://rail.dei.uc.pt/
2 http://research.microsoft.com/phoenix
Figure 6.8: Platform of Phoenix
Besides the representation of the functions in the IR, there is also an API to perform analysis
on the data, like data flow, control flow, graphs (inheritance, call and interference), exception
handling, Static Single Assignment1, and so on. The IR can also be modified by adding or
changing instructions of a function or by changing functions. This is ideal for instrumentation
and profiling, but also for AOP code weaving.
There are two ways to use Phoenix, either as a compiler back-end or as a standalone tool. As
a compiler back-end it uses the Phoenix framework and the input and output modules with
the custom phases to do a compilation. As a standalone tool it is possible to directly call the
Phoenix API and implement your own phases and place those at the right place in the phase
list.
Each phase implements an Execute function. A Unit is passed as a parameter to this function
and can contain any of the unit types listed in Table 6.3. It is up to the phase to determine
Unit          Description
FuncUnit      Encapsulates the information required during compilation of a single function or method.
DataUnit      Represents a collection of related data, such as a set of initialized variables or the result of encoding a FuncUnit.
ModuleUnit    Represents a collection of functions.
PEModuleUnit  Represents a portable executable (PE) image, such as an EXE or a DLL.
AssemblyUnit  Represents a .NET Framework assembly compilation unit.
ProgramUnit   Represents an executable image compilation unit, an EXE or a DLL.
GlobalUnit    Represents the outermost compilation unit.
Table 6.3: Phoenix unit hierarchy
if the unit is of the type it is interested in. The FuncUnit is the fundamental compilation unit
in Phoenix and contains all the information necessary to compile a single function. For code
1 An intermediate representation in which every variable is assigned exactly once.
Figure 6.9: Control flow graph in Phoenix [59]
analysis at the instruction level, this is the most interesting unit to analyze. It does not only
contain the instructions, but also graphs, exception handling information and type information.
Each phase has access to the intermediate representation (IR), which represents the instruction
stream of a function as a series of data-flow and/or control-flow operations. Instructions in
the IR have an operator, a list of source operands, and a list of destination operands. The IR is
also strongly typed, meaning that the types of the operands that reference data are stored. If, for
instance, an integer is placed onto the stack, then the source operand of the load instruction
is of type integer. The type is determined by the operator and by the types of the source
operands.
The control flow in the IR is explicit; instructions are in sequence. Using the flow graph
functionality it is possible to create basic blocks representing a sequence of instructions, with
flow edges to other blocks. Each block starts with a unique label and ends with a branching
instruction or, optionally, an exception raising instruction. Figure 6.9 shows an example of a
control flow graph in Phoenix.
The IR instructions can be divided into two kinds:
Real
Describes instructions that represent operations with dataflow or control flow, most of
which map to one or more machine instructions.
Pseudo
Instructions that represent things such as labels and statically allocated data.
Table 6.4 shows the different forms of instructions available in the Phoenix IR. The last three
items are pseudo instructions, the rest are real instructions.
Instruction   Description
ValueInstr    Any arithmetic or logical operation that produces a value
CallInstr     Function invocation, either direct or indirect
CmpInstr      A compare instruction that generates a condition code
BranchInstr   Control flow for branching, conditional/unconditional and returning
SwitchInstr   Control flow for switching, a multi-way computed branch
OutlineInstr  Defines an outline summary instruction in the IR for code moved out of the main instruction stream, e.g. asm blocks
LabelInstr    User-defined labels and control flow merge points in the code stream
PragmaInstr   Arbitrary user-supplied directives
DataInstr     Statically allocated data
Table 6.4: Phoenix instruction forms
Each instruction object contains properties such as the OpCode, indicating the kind of
operation, a source operand list, and a destination operand list. Depending on the type of the
instruction, more properties can be available. A BranchInstr contains properties with links to the
LabelInstr for the true and false outcomes of a condition. A CallInstr has a property indicating
the name of the function being called.
Listing 6.21 shows part of a phase. The Execute function checks whether the type of the Unit is
a FuncUnit. If so, it builds a flow graph and goes through all the instructions in each block,
printing the OpCode value to the console.
1 protected override void Execute(Phx.Unit unit)
2 {
3 // Try casting to a FuncUnit
4 Phx.FuncUnit func = unit as Phx.FuncUnit;
5 if (func == null)
6 {
7 // Only interested in FuncUnits
8 return;
9 }
10
11 bool noPrevFlowGraph = func.FlowGraph == null;
12 if (noPrevFlowGraph)
13 {
14 // Build a control flow graph
15 func.BuildFlowGraphWithStyle(Phx.Graphs.BlockStyle.SplitEachHandlerEdge);
16 }
17
18 Phx.IR.Instr instr;
19 Phx.Graphs.BasicBlock basicBlock;
20 basicBlock = func.FlowGraph.StartBlock;
21
22 // Loop through all the basic blocks
23 while (basicBlock != func.FlowGraph.LastBlock)
24 {
25
26 // Begin the per-block instruction traversal.
27 instr = basicBlock.FirstInstr;
28 for (uint i = 0; i < basicBlock.InstrCount; i++)
29 {
30 // Write the OpCode to the console
31 Console.WriteLine(instr.Opcode);
32
33 // Get the next instruction in this block
34 instr = instr.Next;
35 }
36
37 // Get the next block
38 basicBlock = basicBlock.Next;
39 }
40
41 // Clean up the flowgraph
42 if (func.FlowGraph != null) func.DeleteFlowGraph();
43 }
Listing 6.21: Phoenix phase execute example
Phoenix is the most extensive IL reader and writer discussed in this chapter. It supports the
.NET Framework version 2.0, gives detailed information about the operands of an instruction,
and has flow graph capabilities. It is more mature than Cecil and PostSharp, but is still under
development. However, it has the support of Microsoft, which uses it to compile applications
like Windows, PowerPoint, and certain games, and to build test tools. There is more
documentation than for the other IL readers discussed, but information and samples are still
scarce.
CHAPTER 7
Design and Implementation
Chapter 5 discussed the semantics of programming languages and Chapter 6 described the in-
ner workings of the language our target source code is in. This chapter brings the two together
by showing the design of the Semantic Analyzer.
7.1 General Design
A high level overview of the system is needed before we can elaborate on the more specific
parts. This section presents a general overview of the complete system, describes the limitations
we have to take into account, and explains the flow of the program and its various components. It
also specifies the coding guidelines used for the implementation.
7.1.1 Introduction
Basically, we want to read an assembly, perform an analysis task on it, and produce some
output. To support querying of the data, we first store the semantic representation of the input
assembly in a semantical metamodel (more details in Section 7.2). We can use this model
to reason about the behavior of the elements in the source. To achieve this, we perform a
number of steps:
1. Read a .NET assembly into memory;
2. Parse the intermediate language byte code to a readable representation;
3. Retrieve the structure of the assembly;
4. Build a semantic model based on the structure;
5. Convert IL instructions to a semantical representation and store in the model;
6. Provide a querying mechanism to search and reason about the program.
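The steps above can be sketched in code as a small pipeline. All names below (ModelSketch, AnalysisPipeline, the stub reader and parser) are hypothetical stand-ins for illustration, not the actual Semantic Analyzer API:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the metamodel built in steps 4 and 5.
class ModelSketch
{
    public List<string> Actions = new List<string>();
}

static class AnalysisPipeline
{
    public static ModelSketch Analyze(string assemblyPath)
    {
        byte[] raw = ReadAssembly(assemblyPath);   // 1. read the .NET assembly
        string[] il = ParseIl(raw);                // 2. parse the IL byte code
        var model = new ModelSketch();             // 3./4. retrieve structure, build model
        foreach (string instr in il)               // 5. convert IL to semantic actions
            model.Actions.Add("action:" + instr);
        return model;                              // 6. ready for querying
    }

    // Stubs standing in for the Phoenix-based reader and parser.
    static byte[] ReadAssembly(string path) { return new byte[0]; }
    static string[] ParseIl(byte[] raw) { return new[] { "add", "ret" }; }
}
```

In the real system, steps 1 and 2 are delegated to an existing IL reader rather than implemented by hand.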
The reading and parsing of an assembly is handled by an existing tool. There are multiple IL
readers available, as seen in Section 6.2, and since Phoenix is best suited for the job, we use this
tool to access the assemblies.
The source code is converted to a metamodel, a higher level representation of the code, which
provides the information needed to determine the behavior of the program. An interface allows
for searching and working with the data inside this model. Two types of applications are created
to test the semantical extractor. One is a command line utility where the input is one or more
assemblies and the output is determined by the supplied plugins. The other application
is a Windows Forms application, which provides a GUI to browse the metamodel and see all
its properties and graphs.
7.1.2 Design Limitations
Before the semantic analyzer is designed, a number of design requirements have to be taken
into consideration.
1. Because of the structure of the Compose compilation process, we do not have access
to the full .NET assemblies at the start of the compilation (thus not before the actual
weaving) [39]. The reason for this is the method introduction, which introduces new
signatures making it impossible for the compiler to compile the source. By using dummy
files and custom compilation the Compose compiler can create the assemblies.
2. The analyzer should be language independent. Compose is available for multiple plat-
forms and although the focus is on the .NET version of Compose, it would be wise to
consider a model capable of expressing multiple source languages.
To take these points into account, there are a number of possible solutions.
Compose Compilation Process Looking at item one, we have to redesign the way the compila-
tion process works. If we have a .NET assembly at the first stage of the building process, we
can run the semantic analyzer at that point. To be able to do this we must directly compile the
source and the dummy files so we have an assembly. However, the dummy files only contain
signatures and lack method body implementations. The analyzer needs the implementations,
so we cannot use these assemblies at this point in time.
Another solution is to use the assemblies modified by ILICIT [10]. However, at that point the
selection for the placement of the filters has already been completed. New selection points
based on the semantics of methods are too late to be introduced. We can still use the analyzed
assembly to perform other tasks, like resource usage checking, side effects determination, and
so on.
The third option is to only analyze the source files, as long as they are present as an IL repre-
sentation. They do not contain the added aspect code, but can be used as a source.
The most elegant solution would be to switch off type checking so that an assembly can be
created by the standard .NET compilers as the first step in the Compose compilation process.
This is not possible with the default implementation of the .NET compilers (writing our own
compiler is an option to overcome this, but requires too much effort for each .NET language).
Because of this limitation, the first version of the semantic analyzer will primarily be used for
resource extraction and for providing extra information to other Compose modules. This is an
action which can be performed after (or separately from) the main Compose compilation pro-
cess. The Semantic Analyzer will be placed after the ILICIT module.
Language Independence Regarding issue two, the language independence of the analyzer, we
have to make sure we can store the semantics of any object oriented language. This means we
have to distinguish the language specific parts from the common parts and only store the com-
mon behavioral information. Type information, for instance, is language dependent (the type
system differs between the various OO implementations) and must be stored in a special manner.
Behavior is still very generic and not directly connected to the source. There are naming
differences between Java and C#, such as toLowerCase in Java and ToLower in C#,
but the operations act the same. Section 7.2 gives more details about the implementation of a
language independent system.
Figure 7.1 provides a general overview of the system. Different Semantic Extractors can be used
for specific source languages. They each use their own reader and parser to access the source
code and convert the code to a semantical representation, which is stored in the Semantical
Model. The Semantic Database is used to access this model and contains the querying mecha-
nism. Plugins perform the specific tasks to get the required behavioral information.
Figure 7.1: Process overview
Since the Semantic Model is language independent it can theoretically handle all types of source
languages as long as they are object-oriented. Besides the Java elements, we could also add
a Borland Delphi extractor to process Object Pascal source files and store the semantics in the
model.
7.1.3 Control Flow
Because of the design limitations discussed earlier, there is a specific control flow the analyzer
will use when we integrate the system into Compose .NET. The analyzer will be implemented
as a stand-alone console tool. The main analyzer is in a separate assembly so that in the future
it can also be called from another front-end.
The Semantic Extractor will return a metamodel of the analyzed assembly. This model is passed
by the console application to the plugins, which can query this model using the Semantic
Database API. This way there is no need to save the metamodel to file and read it again.
Figure 7.2: Control flow of the console application with a plugin
Figure 7.2 shows this flow with the Resource Checker plugin. It is possible to add multiple
plugins to the system by using command switches.
This pipeline flow of the analyzer is not the only way the application can be used. Since all the
functionality is spread over different components, it is also possible to use only an extractor and
work directly on the metamodel, without using the database or plugins.
7.1.4 Building Blocks
The requirements ensure a separation of functionality in different components. For instance,
the extractor using the Phoenix library is in a different assembly so the plugins have no de-
pendency on the Phoenix subsystem (or any other language dependent system). The next
paragraphs describe the different components of the system. The whole system is called the
Semantic Analyzer.
7.1.4.1 Semantic Extractor
This component is responsible for the extraction of the semantics from a source file. This source
file can be a .NET assembly or, for the Java version, a Jar file. Currently there is only a .NET
Semantic Extractor. The Extractor uses the Microsoft Phoenix library for the actual reading and
parsing of the IL code. The resulting semantics will be stored in the metamodel and returned to
the caller. Another component, like a command line application, calls the extractor and passes
the returned metamodel to the plugins.
When it is possible to create IL code directly at the start of the Compose compilation process,
the extractor can, in theory, be placed before the other Compose components to determine
semantic weaving points.
Besides Phoenix, it is also possible to use other IL readers for the same task. Using the provider
design pattern [40], we can choose which specific extractor must handle the calls issued to the
abstract base class, the SemanticExtractor class. The available providers are listed in a con-
figuration file and only one provider provides the actual implementation of the abstract class.
Support for the provider design pattern is part of the .NET Framework version 2.0 Class Li-
brary.
// Change the default provider to a new provider
SemanticExtractor.SetProvider("phoenix");

// Call the Semantic Extractor
IList<SemanticItem> semanticItems = SemanticExtractor.Analyse(assemblyName);
Listing 7.1: Calling the SemanticExtractor
Listing 7.1 shows how to change the default provider to another provider and how to start
the actual analysis process. The value phoenix is one of the values defined in the application's
configuration file, which can be found in Appendix C.
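The provider mechanism can be sketched as follows. This is a simplified illustration of the pattern, not the actual SemanticExtractor code; the real implementation resolves provider names through the configuration file rather than hard-coding them:

```csharp
using System;
using System.Collections.Generic;

// Simplified sketch of the provider pattern: a static facade delegates to
// whichever concrete provider is currently selected.
abstract class SemanticExtractor
{
    static SemanticExtractor current = new PhoenixExtractor();

    // In the real system the name is looked up in the configuration file.
    public static void SetProvider(string name)
    {
        current = name == "cecil" ? (SemanticExtractor)new CecilExtractor()
                                  : new PhoenixExtractor();
    }

    public static IList<string> Analyse(string assemblyName)
    {
        return current.DoAnalyse(assemblyName);
    }

    // Each concrete provider implements the actual extraction.
    protected abstract IList<string> DoAnalyse(string assemblyName);
}

class PhoenixExtractor : SemanticExtractor
{
    protected override IList<string> DoAnalyse(string a)
    { return new List<string> { "phoenix:" + a }; }
}

class CecilExtractor : SemanticExtractor
{
    protected override IList<string> DoAnalyse(string a)
    { return new List<string> { "cecil:" + a }; }
}
```

The design keeps callers dependent only on the abstract class, so a different IL reader can be swapped in without touching the plugins.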
7.1.4.2 Semantic Metamodel
The Semantic Metamodel is an object model, created and filled by the Semantic Extractor and
passed to the plugins as data source. It is contained in memory, but a representation of the
model can be written to an xml format. Recreating the metamodel from the xml file is currently
not supported.
This model is language independent. It is a higher level view of the source code with the same
structure (classes, methods, and so forth) as the original code. The behavior of a method body
is represented in a special way, using actions and blocks. More information about this model is
given in Section 7.2.
7.1.4.3 Semantic Database
The database allows for storing the model and for searching for specific data in this model. It is
called by the plugins and works directly on the model (it does not need the Phoenix library or
the original source). To do this, it supplies a SemanticDatabaseContainer object. Not only
is the metamodel stored in this database, it also provides a number of ways to access the data
and to search in the database.
Although this is written in a .NET language (namely C# version 2.0), it uses functionality which
can be ported to Java (version 2 required). More information is given in Section 7.4.
7.1.4.4 Plugins
A plugin is a piece of code which uses the semantic model for a specific purpose. The Semantic
Analyzer is very general, whereas the plugins perform specific, detailed actions using these
general functions.
For instance, resource usage will be collected by the Resource Checker plugin. This plugin pro-
vides information for the SECRET module and needs to get all the methods with a parameter of
type ReifiedMessage. Each method in the resulting method list is examined by determining the
behavior of the parameter containing a ReifiedMessage. It is possible the parameter is assigned
a different value or that a method of the object is called. It is the task of the plugin to perform these
service-oriented queries, while the Semantic Database and Metamodel provide the general
queries and the data source.
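The kind of query the Resource Checker performs can be sketched as follows. The classes here are simplified stand-ins for the metamodel types; the real plugin works through the Semantic Database API:

```csharp
using System;
using System.Collections.Generic;

// Simplified stand-ins for the metamodel's argument and operation types.
class ArgumentSketch { public string TypeName; }
class OperationSketch
{
    public string Name;
    public List<ArgumentSketch> Arguments = new List<ArgumentSketch>();
}

static class ResourceCheckerSketch
{
    // Select every operation with a parameter of type ReifiedMessage.
    public static List<OperationSketch> MethodsWithReifiedMessage(
        IEnumerable<OperationSketch> operations)
    {
        var result = new List<OperationSketch>();
        foreach (var op in operations)
        {
            foreach (var arg in op.Arguments)
            {
                if (arg.TypeName.EndsWith("ReifiedMessage"))
                {
                    result.Add(op);
                    break;  // one matching parameter is enough
                }
            }
        }
        return result;
    }
}
```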
7.1.5 Guidelines
A general guideline is presented to which the application will adhere. By using a specified
programming model, the application will be consistent.
Coding guidelines For the .NET platform there already is an extensive design guideline created
by Microsoft [58]. This guideline describes the naming of variables, methods, parameters, etc.
The API designed for this assignment will follow that guideline.
Naming Conventions and Assemblies To provide a consistent programming model we have to
place the code inside namespaces. Microsoft advises to use the following standard:
CompanyName.TechnologyName[.Feature][.Design]
In this case we will use UTwente for the company name and SemanticAnalyzer as the tech-
nology.
The metamodel and its API will be used from within the tools. The developers of these tools
must program against the metamodel and not against the extractors, which depend heavily on the
underlying supporting tools like the Phoenix framework. For this reason we place the metamodel
in a separate assembly. Table 7.1 lists all the assemblies and their purposes. Each name begins
with the UTwente.SemanticAnalyzer namespace [1].
Assembly name                  Purpose
SemanticExtractorRail          Provider using RAIL
SemanticExtractorPostSharp     Provider using PostSharp
SemanticExtractorCecil         Provider using Cecil
SemanticExtractorPhoenix       Provider using Phoenix
SemanticModel                  Contains the metamodel and graph algorithms
SemanticDatabase               Database container and API to query the model
SemanticLibrary                Shared functionality, such as the plugin interface and the provider model
SemanticPlugins                Standard plugins which will process the metamodel
SemanticComposeStarPlugins     Specific plugins for Compose
SemanticExtractorConsole       Console application
SemanticWinForms               Windows Forms application
Table 7.1: Assembly naming
7.2 Semantic Model
Information extracted by the Semantic Extractors is stored in a metamodel. This is an object
oriented model containing classes representing the structure and the semantical information of
the source code. The model can be stored in the SemanticDatabaseContainer, which also
provides an interface for searching through this model.
Because multiple source languages can be converted to this model, we cannot store any lan-
guage specific information such as types. However, we do not want to lose that kind of infor-
mation, so we have to convert it to a more general representation. The metamodel consists
of elements used by most object oriented languages and it is up to the extractor to convert the
specific language elements to the correct corresponding semantical elements in the model.
7.2.1 Overall Structure
To make effective use of the model, we not only have to store the semantics, but also the location
of these items in the original hierarchy. That is, if we have a function with certain behavior,
we have to place this function in its context. A function will be in a class, and a class is in some
sort of container like an assembly (.NET) or JAR [2] file (Java).
We can distinguish three main elements in the metamodel:
[1] Note: the currently implemented code does not yet completely adhere to this standard. Also, some class and
function names are implemented using UK English instead of the US English used in this thesis.
[2] A JAR file (or Java ARchive) is a compressed file used to distribute a set of compiled Java classes and metadata.
Unit
This is the base for all the semantic items providing structure for the model, such as
classes, methods, and so on.
Operand
The element on which a mathematical or logical operation is performed, like a field, a
parameter, a local variable, or a constant value.
Type
Contains a language independent view to store type information.
Besides these three core elements, there are elements to track the source reference (the line numbers
an action came from), attribute information, and so on.
Section 7.2.5 gives a more detailed view of the model.
7.2.2 From Instructions to Actions
An extractor not only builds the structure of the metamodel, like the layout of all the classes
with their functions, it also converts the instructions, such as IL opcodes, to their semantical
representations. Code is converted to actions and each action performs some sort of task. An
action is represented in the model as a SemanticAction object and is placed inside a SemanticBlock.
Both are elements of a function.
Blocks The blocks are used for the control flow. Each block contains one or more actions and
the blocks are linked together: each block has a direct link to its previous and next block in the
control flow. If the extractor supports exception handling extraction, we can also request the
exception handling block for a specific block, that is, the block being called when there was an
exception.
A simple example of the use of blocks is found in Listing 7.2. This for loop checks a condition,
performs an action, and returns to the condition check.
int j = 0;

for (int i = 0; i < 10; i++)
{
    j = j + i;
}

return;
Listing 7.2: For loop in C#.NET
Figure 7.3 shows the corresponding blocks for the code in Listing 7.2. The extractor is respon-
sible for creating the blocks and connecting them to each other. If the information is available,
the extractor can also indicate the start and end line number of the source code corresponding
to a block. Blocks with no actions should be removed from the list of blocks and the links be-
tween the blocks must be updated. It is possible an extractor has introduced more blocks than
needed using its own control flow algorithm.
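Removing the empty blocks and updating the links between their neighbours can be sketched like this. The LinkedBlock type is an assumed, simplified version of the real block class:

```csharp
using System;
using System.Collections.Generic;

// Simplified doubly linked block: a list of actions plus previous/next links.
class LinkedBlock
{
    public List<string> Actions = new List<string>();
    public LinkedBlock Previous, Next;
}

static class BlockCleaner
{
    // Drop every block without actions and rewire the links around it.
    public static LinkedBlock RemoveEmpty(LinkedBlock first)
    {
        LinkedBlock current = first;
        while (current != null)
        {
            LinkedBlock next = current.Next;
            if (current.Actions.Count == 0)
            {
                // Rewire the neighbours around the empty block.
                if (current.Previous != null) current.Previous.Next = current.Next;
                if (current.Next != null) current.Next.Previous = current.Previous;
                if (current == first) first = current.Next;
            }
            current = next;
        }
        return first;
    }
}
```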
Figure 7.3: Loop represented as blocks
Actions If we look at the code in Listing 7.3, we see a number of different kinds of actions. First
the value of x is increased by two; second, the value of y is multiplied by three. A compari-
son is performed on those two results and, based on this comparison, a branching operation is
executed, either to label l1 or to label l2.
if ((x + 2) > (y * 3)) {
    // l1
}
else {
    // l2
}
Listing 7.3: Expression in C#.NET
It is human readable in the source language C#.NET, but converted to IL code, shown in List-
ing 7.4, it is more difficult to understand.
ldloc 0
ldc 2
add
ldarg 1
ldc 3
mul
cgt
brtrue l1
br l2
Listing 7.4: Expression in IL code
We do not know what type of local variable the statement ldloc is loading. The result of the
add operation is not stored in a variable, but only on the stack; the same holds for the result of the
multiplication. Both values from the stack are used by the compare operation (cgt). The branching
is more complex since it uses two different branch operations, one conditional branch and one
unconditional jump.
We would prefer a semantical representation of this expression in the way depicted by List-
ing 7.5.
t1$ (loc) = add x(loc), 2(con)
t2$ (loc) = mul y(arg), 3(con)
t3$ (loc) = cmp t1$, t2$, gt
    = branch (t3$, true), l1, l2
Listing 7.5: Semantical representation of the expression
We can directly see the basic actions performed: adding, multiplying, comparing, and branch-
ing. We also see the operands the actions are working on and the kind of operands, like a
variable, argument, or constant value. The actions store their results in temporary variables,
introduced by the Semantic Extractor. We can always trace the usage of an operand and where
it originated from. Some actions do not only contain the source and destination operands, but
also additional options. The compare action has information about the kind of comparison,
like greater than (gt). The branch action has direct information about where the true and false
values of the corresponding comparison lead to.
Since the Semantic Extractors are language dependent and thus know the language they are rea-
soning about, they are responsible for converting one or more instructions to a corresponding
action. Not all the instructions provide a meaningful action and as such they do not intro-
duce new actions. On the other hand, multiple instructions can form one action. Loading two
operands onto the stack, adding the values, and storing the results is represented as one add
action.
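This collapsing of a stack-based instruction sequence into single actions can be sketched as follows. The converter below is a toy that handles only a few opcodes, and its output format merely mimics the notation of Listing 7.5; it is not the actual extractor code:

```csharp
using System;
using System.Collections.Generic;

static class ActionBuilder
{
    // Simulate the evaluation stack: loads push operands, a binary opcode
    // pops two operands and emits one semantic action with a fresh temporary.
    public static List<string> Convert(IEnumerable<string> il)
    {
        var stack = new Stack<string>();
        var actions = new List<string>();
        int temp = 0;
        foreach (string instr in il)
        {
            string[] parts = instr.Split(' ');
            switch (parts[0])
            {
                case "ldloc": stack.Push(parts[1] + "(loc)"); break;
                case "ldarg": stack.Push(parts[1] + "(arg)"); break;
                case "ldc":   stack.Push(parts[1] + "(con)"); break;
                case "add":
                case "mul":
                    string right = stack.Pop(), left = stack.Pop();
                    string result = "t" + (++temp) + "$";
                    actions.Add(result + " = " + parts[0] + " " + left + ", " + right);
                    stack.Push(result);   // the result is available for later actions
                    break;
            }
        }
        return actions;
    }
}
```

Three IL instructions such as `ldloc x`, `ldc 2`, `add` thus become the single action `t1$ = add x(loc), 2(con)`.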
In Appendix D you can find all the available kinds of semantic actions the model can store. The
arguments are listed with each item. Besides these arguments, an action can also have a link
back to the original source line number.
7.2.3 Dealing with Operands
An operand is the data used by an IL operation. An operation can use one operand (a
unary operation) or two operands (binary). Some OpCodes do not use any operands, for
instance a call to another function with no return value and no parameters.
In our model we have four kinds of operands:
Argument
This is the argument, also called the parameter, of a function.
Variable
A local variable; it exists only inside a function.
Field
A variable defined outside the function or even outside a class.
Constant
A constant value, a value that does not change, such as a number or a text.
The semantic actions can use these operands as their source or destination operands. It is
obvious that a constant operand cannot be used as a destination as it is a read-only operand.
We can access all the operands so we can follow them through a function. We might see a
variable operand get a default value, such as a constant operand, be used by another action,
and finally get the value of an argument operand assigned to it. This is called data flow
analysis.
Each operand has a name and a type. The Semantic Extractor is responsible for assigning a
unique name to each operand and for determining a correct semantical type. A constant operand
also contains the value it holds, such as a number or a text.
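Following an operand through a function can be sketched as a simple scan over the actions. The ActionRecord type below is a simplified stand-in for SemanticAction, not the real class:

```csharp
using System;
using System.Collections.Generic;

// Simplified stand-in for a semantic action with its operands.
class ActionRecord
{
    public string Kind;                                  // e.g. "assign", "add"
    public string Destination;                           // operand written to
    public List<string> Sources = new List<string>();    // operands read from
}

static class OperandTracer
{
    // Collect, in order, every action that reads or writes the given operand.
    public static List<ActionRecord> Trace(IEnumerable<ActionRecord> actions, string operand)
    {
        var uses = new List<ActionRecord>();
        foreach (var action in actions)
            if (action.Destination == operand || action.Sources.Contains(operand))
                uses.Add(action);
        return uses;
    }
}
```

A real data flow analysis would additionally follow the operand through the temporaries it is copied into, but the ordered use list is the starting point.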
7.2.4 Type Information
Although types are directly related to a specific language, we still want to maintain type infor-
mation in our model. This means we store type information in two ways: as a textual represen-
tation and as one of a list of common types. It is up to the Semantic Extractor to map a language
type to a common type representation in the model. Appendix E lists all the possible common
types the extractor can use. The original type is still preserved as text so that applications capable of
reasoning about the source language can use this representation. For example, the Compose
plugin that wants to find the ReifiedMessage object can use the full name of the object type.
A Semantic Type also contains metadata regarding the type, such as whether it is a class, an interface, a
pointer, an array, and so on. Every operand has a Semantic Type, just as almost every unit in the
model.
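The two-way storage can be sketched as follows. The enumeration here names only a few illustrative common types (the actual list in Appendix E, Table E.1, is richer), and all class names are hypothetical:

```csharp
using System;
using System.Collections.Generic;

// A few illustrative common types; the real list in Appendix E is larger.
enum CommonType { Integer, FloatingPoint, Text, Boolean, Object }

class MappedType
{
    public string OriginalName;   // full language-specific name, kept as text
    public CommonType Common;     // language-independent classification
}

static class TypeMapper
{
    static readonly Dictionary<string, CommonType> map =
        new Dictionary<string, CommonType>
        {
            { "System.Int32", CommonType.Integer },
            { "System.Double", CommonType.FloatingPoint },
            { "System.String", CommonType.Text },
            { "java.lang.String", CommonType.Text },
            { "System.Boolean", CommonType.Boolean },
        };

    public static MappedType Map(string languageType)
    {
        CommonType common;
        if (!map.TryGetValue(languageType, out common))
            common = CommonType.Object;   // fall back to a generic object type
        return new MappedType { OriginalName = languageType, Common = common };
    }
}
```

Note how `System.String` and `java.lang.String` map to the same common type while each original name survives as text for language-aware plugins.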
7.2.5 Model Layout
The previous paragraphs gave some insight into the design of the model; more details are given
in this section. Figure 7.4 shows a simplified view of the structure of the metamodel. This is
a tree-like structure and at the top there is a Semantic Container object. In .NET this is called
an assembly, in Java a JAR file. A container holds zero or more classes. Each class can contain fields
and operations. Operations are the methods or functions of a programming language. In .NET
languages there are also properties, special functions which act as accessors for private fields.
These functions are actually normal functions with a prefix (get_ or set_) and a special
reference in a metadata table.
An operation has zero or more arguments, the parameters of a function. Inside the operation
there are zero or more local variables and constants. If there are instructions in the function,
then there will be one or more blocks with actions.
The main unit blocks of the model all inherit from the SemanticItem class. This class has a
collection of SemanticAttributes objects so custom attributes can be applied to any kind of
item in the model.
Figure 7.5 shows a class diagram of the SemanticItem class and its direct derived classes.
The child class SemanticUnit contains the structural elements of the model. In Figure 7.6 a
diagram is given of this class and its children. All the child classes have an interface and this
interface is used by all the other components.
Each unit also implements the interface IVisitable, which contains an accept function with
an IVisitor argument (see Figure 7.7). This visitor design pattern [31] can be used to visit all
the elements in the model and process each element individually. Currently this is used for the
search mechanism and the xml exporter.
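The IVisitable/IVisitor pair can be condensed to the sketch below. The real model has many more element types than the two shown, and the example visitor merely collects names the way the xml exporter walks the model:

```csharp
using System;
using System.Collections.Generic;

interface IVisitor
{
    void VisitClass(ClassElement element);
    void VisitOperation(OperationElement element);
}

interface IVisitable { void Accept(IVisitor visitor); }

class OperationElement : IVisitable
{
    public string Name;
    public void Accept(IVisitor visitor) { visitor.VisitOperation(this); }
}

class ClassElement : IVisitable
{
    public string Name;
    public List<OperationElement> Operations = new List<OperationElement>();
    public void Accept(IVisitor visitor)
    {
        visitor.VisitClass(this);
        foreach (var op in Operations) op.Accept(visitor);  // visit the children
    }
}

// Example visitor: builds a flat listing of the visited elements.
class NameCollector : IVisitor
{
    public readonly List<string> Names = new List<string>();
    public void VisitClass(ClassElement e) { Names.Add("class " + e.Name); }
    public void VisitOperation(OperationElement e) { Names.Add("operation " + e.Name); }
}
```

New operations over the model (a search, an exporter) then only require a new visitor, not changes to the element classes.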
7.2.5.1 SemanticContainer
The SemanticContainer is the root element for the actual model and is like an assembly for
.NET or a JAR file for Java. It has a name and filename of the original analyzed element. A
strong typed collection of SemanticClass objects holds all the class information of the con-
tainer.
Figure 7.4: Structure of the metamodel. Blue is a unit, red is an operand and green are the parts
actually providing semantical information.
7.2.5.2 SemanticClass
A class is a collection of encapsulated variables (fields) and operations. Classes exist in a
SemanticContainer and have a unique fully qualified name, type information, a scope, and a
collection of SemanticField and SemanticOperation objects. The scope indicates whether this class
is publicly or privately accessible or whether it is a shared class.
Figure 7.8 shows the SemanticContainer and SemanticClass classes.
7.2.5.3 SemanticOperation
A SemanticOperation represents a function containing a sequence of instructions. An oper-
ation is contained inside a class and has a unique name in that class. If the operation returns
data, then the type of this data is known. An operation has a number of collections:
SemanticArguments
The arguments or parameters of the function;
SemanticVariables
All the local variables used in the operation, including the variables introduced by the
Semantic Extractor;
SemanticConstants
The constant values used in the operation;
SemanticBlocks
The blocks with the actions.
Figure 7.5: Semantic Item and direct derived classes
Figure 7.6: SemanticUnit and child classes
Figure 7.7: Visitor pattern in the model
Figure 7.8: SemanticContainer and SemanticClass classes
Figure 7.9: SemanticOperation class
Figure 7.10: SemanticOperand class and derived classes
Figure 7.9 represents the SemanticOperation class with the associated collections.
7.2.5.4 SemanticOperand and Subclasses
As indicated in Section 7.2.3 there are four kinds of operands: arguments, fields, local variables,
and constants. They all inherit from the base class SemanticOperand, which contains a name
and type for the operand.
Figure 7.10 shows the base class for the operands, the SemanticOperand class, and the child
classes. The SemanticConstant class has the ability to store the constant value. The Se-
mantic Extractor is responsible for supplying the correct value. The SemanticArgument and
SemanticVariable classes allow for the specification of the sequence in the list, since the
name of the operand is not always available. In IL, these operands are addressed by their ordinal.
Figure 7.11: SemanticBlock class with collection of SemanticAction
7.2.5.5 SemanticBlock
The SemanticOperation class has a collection of SemanticBlock objects, which contain
the SemanticAction objects. Figure 7.11 shows the SemanticBlock class with its col-
lection of SemanticAction. Each block has a unique name inside the operation and a
SourceLineNumber class provides a link back to the original source code. This can be useful
when you want to add or replace code contained in the block.
The class also contains functions to navigate to other blocks, such as the next and previous
block in the sequence. This does not imply that the next block in the sequence is also the next
block in the control flow; only if there is no control flow action (branching, returning, switching)
as the last action in the block is the next block in the sequence also the next one in the control flow.
If the extractor has exception handling information, then it is possible to jump to the block which
handles the exception for the current block.
7.2.5.6 SemanticAction
This class contains the actual semantical action performed by one or more instructions in the
source. A graphical representation of this class is found in Figure 7.12.
Each SemanticAction object has an ActionType from the SemanticActionType enumeration.
Based on the ActionType, some additional properties have meaning. For instance, if the type is
Branch, then the true and false label information makes sense.
Also present are the source and destination operands. Only two source operands can exist, so
Figure 7.12: The SemanticAction class with supporting types
Figure 7.13: SemanticType class
expressions with multiple operands have to be normalized to a binary form.
7.2.5.7 SemanticType
The SemanticType class is used to store the type information. The extractor has to
select a general type from the list in Appendix E (Table E.1) to map the source type to the
corresponding common type. The extractor can also indicate whether the type has a base type and/or
implements interfaces. Figure 7.13 shows a graphical representation of this class.
Figure 7.14: SemanticAttribute class
7.2.5.8 SemanticAttributes
As described in Section 6.1.10, the IL supports the concept of custom attributes, special metadata
which can be applied to almost every item in the language. Other languages, such as Java,
where they are called annotations, also support a similar system. The Semantic Metamodel supports
this concept and allows the extractor to add multiple SemanticAttribute objects to all the
SemanticItem types.
As shown in Figure 7.14, the class has a SemanticType and a list of ISemanticOperand ob-
jects. The operands are normally constant values used for setting the properties of the custom
attribute.
7.2.6 Flow Graphs
The Semantic Metamodel not only contains the data, it also provides graph func-
tionality to work with the data. The SemanticOperation class has a function called
RetrieveSemanticControlFlow, which generates a control flow graph for the actions in-
side the operation. It uses a class called SemanticFlowGenerator which contains all the flow
operations. We can also call this class directly by supplying a SemanticOperation as a
parameter in the constructor.
The function GenerateSemanticControlFlow in this class generates the control flow and re-
turns this graph in the form of a SemanticControlFlow object. The SemanticControlFlow
has a collection of FlowBlock objects and a start and end FlowBlock. Each FlowBlock has
a unique name, a collection of SemanticBlock objects with the actions, and three lists. One
has all the successors and another has the predecessors. The third list contains all the
FlowBlock objects which are control dependent for the current flowblock, i.e., these flow-
blocks contain branching and can thus control whether the flowblock is reached.
The successors and predecessors are represented as FlowEdge objects. They indicate the target
flowblock and the reason for the link. This reason can be conditional, unconditional, exception,
or fall through. The first two are used when the successor block is the result of a branch (con-
ditional) or a jump (unconditional) action. The exception reason is used when an exception is raised
before getting to the flowblock. The fall through reason simply connects blocks which follow each
other in the normal sequence.
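The edge reasons and a typical traversal over them can be sketched as follows. The classes are simplified versions of the FlowBlock/FlowEdge types described above, and the walk shown (skipping exception edges) is just one illustrative use:

```csharp
using System;
using System.Collections.Generic;

// The four reasons a flow edge can carry, as described in the text.
enum EdgeReason { Conditional, Unconditional, Exception, FallThrough }

class FlowEdgeSketch
{
    public FlowBlockSketch Target;
    public EdgeReason Reason;
}

class FlowBlockSketch
{
    public string Name;
    public List<FlowEdgeSketch> Successors = new List<FlowEdgeSketch>();
    public List<FlowBlockSketch> Predecessors = new List<FlowBlockSketch>();
}

static class FlowWalk
{
    // Collect every block reachable over normal (non-exception) edges.
    public static List<string> ReachableNames(FlowBlockSketch start)
    {
        var seen = new List<string>();
        var work = new Stack<FlowBlockSketch>();
        work.Push(start);
        while (work.Count > 0)
        {
            var block = work.Pop();
            if (seen.Contains(block.Name)) continue;
            seen.Add(block.Name);
            foreach (var edge in block.Successors)
                if (edge.Reason != EdgeReason.Exception)
                    work.Push(edge.Target);
        }
        return seen;
    }
}
```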
Figure 7.15: The flow classes
All the classes for the generation of flow graphs are depicted in Figure 7.15. The
GenerateSemanticControlFlow function uses various algorithms to generate the Semantic
FlowGraph. It begins by splitting the SemanticBlock objects in the operation into flow blocks.
The splitting is based on the control flow actions, as described in Algorithm 1.
input : A SemanticBlocks collection
output: A collection of FlowBlocks
semanticBlock ← SemanticBlocks[0];
while semanticBlock ≠ null do
  foreach semanticAction ∈ semanticBlock do
    hasLink ← false;
    switch actionType do
      case Jump
        Add unconditional flowedge;
        hasLink ← true;
        break;
      case Branch
        Add conditional flowedge to true block;
        Add conditional flowedge to false block;
        hasLink ← true;
        break;
      case Switch
        Add conditional flowedge for all switch labels;
        hasLink ← true;
        break;
    end
  end
  if has exception handler link & different exception handler then
    hasLink ← true;
    Add exception flowedge;
    if next SemanticBlock exists then
      Add fall through flowedge;
    end
  end
  if hasLink ∨ last SemanticBlock then
    Add flowBlock to output collection;
  else
    semanticBlock ← next block;
  end
end
Algorithm 1: GenerateSemanticControlFlow
The next step is to connect the successors and predecessors. Algorithm 2 shows how this is
executed.
input : A flowBlock collection
output: A flowBlock collection with connected successors and predecessors
// Connect successors
foreach flowBlock ∈ flowBlocks do
  foreach flowEdge ∈ flowBlock.Successors do
    Find SemanticBlock with target name;
    Set successor flowblock for flowEdge to found block;
  end
end
// Set the predecessors
foreach flowBlock ∈ flowBlocks do
  foreach flowEdge ∈ flowBlock.Successors do
    Add flowBlock to list of predecessors of the successor flowblock;
  end
end
Algorithm 2: Connect flow edges
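Algorithm 2 amounts to two passes over the block collection: resolve each edge's target name to a block, then register the reverse links. A minimal Java sketch under that assumption (the names are illustrative, not the actual implementation):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ConnectEdges {
    static class FlowEdge {
        final String targetName;  // label of the successor block
        FlowBlock successor;      // resolved in the first pass
        FlowEdge(String targetName) { this.targetName = targetName; }
    }

    static class FlowBlock {
        final String name;
        final List<FlowEdge> successors = new ArrayList<>();
        final List<FlowBlock> predecessors = new ArrayList<>();
        FlowBlock(String name) { this.name = name; }
    }

    static void connect(List<FlowBlock> blocks) {
        // Pass 1: resolve each edge's target name to the actual block.
        Map<String, FlowBlock> byName = new HashMap<>();
        for (FlowBlock b : blocks) byName.put(b.name, b);
        for (FlowBlock b : blocks)
            for (FlowEdge e : b.successors)
                e.successor = byName.get(e.targetName);
        // Pass 2: register each block as a predecessor of its successors.
        for (FlowBlock b : blocks)
            for (FlowEdge e : b.successors)
                e.successor.predecessors.add(b);
    }

    public static void main(String[] args) {
        FlowBlock a = new FlowBlock("A"), b = new FlowBlock("B");
        a.successors.add(new FlowEdge("B"));
        connect(Arrays.asList(a, b));
        System.out.println(b.predecessors.get(0).name); // prints A
    }
}
```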
We now have the successors and predecessors of all the FlowBlocks and can create the control
dependencies. Algorithm 3 shows how the DetermineControlDependency function is started,
and Algorithm 4 describes the actual function.
input : A flowBlock collection with connected successors and predecessors
output: A flowBlock collection with control dependency information
foreach flowBlock ∈ flowBlocks do
  DetermineControlDependency (flowBlock);
end
Algorithm 3: Start DetermineControlDependency
input : A flowBlock collection with connected successors and predecessors
output: A flowBlock collection with control dependency information
// Search through all the predecessors
foreach flowBlock ∈ flowBlock.Predecessor do
  foreach flowEdge ∈ flowBlock.Successors do
    if flowEdge = conditional then
      Add to dependency list;
    end
  end
  if predecessor not visited before then
    DetermineControlDependency (flowBlock);
  end
end
Algorithm 4: DetermineControlDependency
We also have a function to determine the flow paths, all the possible paths through the control
flow. Algorithm 5 shows how these are calculated using the internal function BuildFlowPath.
input : A flowBlock and a previousStack with flowblocks
output: A stack with FlowBlocks that forms a control sequence
if flowBlock.Successors.Count = 0 then
  return previousStack
else
  foreach flowEdge ∈ flowBlock.Successors do
    Create temporaryStack;
    temporaryStack ← previousStack;
    if flowBlock ∈ temporaryStack then
      if flowEdge.Successor ∉ temporaryStack then
        Add flowBlock to temporaryStack;
        temporaryStack ← BuildFlowPath (flowEdge.Successor, temporaryStack);
      end
    else
      Add flowBlock to temporaryStack;
      temporaryStack ← BuildFlowPath (flowEdge.Successor, temporaryStack);
    end
    Add temporaryStack to global flow path collections;
  end
end
Algorithm 5: Determine Flow Paths
We now have the flow paths and the control dependencies, so we can add the access levels to
the flow. An access level on a FlowBlock indicates the number of times the block is accessed.
Some blocks are only accessed once, some are always accessed. A block in a loop with a con-
dition at the start can be accessed multiple times, but a block in a loop with a condition at the
end is accessed at least once and maybe more. Algorithm 6 describes how the access level is
determined.
input : A startFlowBlock, an endFlowBlock, flowPaths collection
output: SemanticControlFlow with access levels
foreach flowPath ∈ flowPaths do
  index ← flowPath.Count − 1;
  while index ≥ 0 do
    flowBlock ← flowPath[index];
    if flowBlock = startFlowBlock ∨ flowBlock = endFlowBlock then
      accessLevel ← AtLeastOnce;
    else
      Determine if flowBlock has itself as a parent;
      if HasParent then
        accessLevel ← MaybeMoreThenOnce;
      else
        // Determine if flowPath is in a loop
        loopBlock ← loop FlowBlock;
        if IsInLoop then
          if loopBlock has no control dependencies then
            accessLevel ← OnceOrMore;
          else
            accessLevel ← MaybeMoreThenOnce;
          end
        else
          if flowBlock has no control dependencies then
            accessLevel ← MaybeOnce;
          end
        end
      end
    end
    index ← index − 1;
  end
end
Algorithm 6: Determine Access Levels
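The case analysis of Algorithm 6 can be summarized in a small classification routine. The Java sketch below is a simplification of the algorithm (it collapses the parent check and the loop check into boolean inputs); the enum constants are only illustrative renderings of the access levels named in the text.

```java
public class AccessLevels {
    // Access levels as named in Algorithm 6 (illustrative spellings).
    enum AccessLevel {
        AT_LEAST_ONCE,        // start and end blocks: always reached
        ONCE_OR_MORE,         // loop body whose loop block has no control dependencies
        MAYBE_MORE_THAN_ONCE, // block that can repeat under a condition
        MAYBE_ONCE            // conditionally reached straight-line block
    }

    // Simplified classification following the branches of Algorithm 6:
    // isStartOrEnd  - block is the start or end FlowBlock
    // inLoop        - block lies on a loop in the flow path
    // guarded       - the loop block has control dependencies (a guarding condition)
    static AccessLevel classify(boolean isStartOrEnd, boolean inLoop, boolean guarded) {
        if (isStartOrEnd) return AccessLevel.AT_LEAST_ONCE;
        if (inLoop) return guarded ? AccessLevel.MAYBE_MORE_THAN_ONCE
                                   : AccessLevel.ONCE_OR_MORE;
        return AccessLevel.MAYBE_ONCE;
    }

    public static void main(String[] args) {
        // A do-while style loop body: condition at the end, so at least one pass.
        System.out.println(classify(false, true, false)); // prints ONCE_OR_MORE
    }
}
```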
Plugins can use the flow graph capabilities of the model directly in their own analysis. Each
FlowBlock contains the Semantic Blocks and Actions, so all the information is preserved. Although
SemanticBlocks are usually also split on control flow conditions, the FlowBlocks provide an
optimized and more detailed representation of the control flow.
The flow analysis classes can be extended with more flow generators, such as data dependency
graphs or call graphs. These are currently not implemented, and the algorithms are not yet
optimized.
7.3 Extracting Semantics
Extracting the semantics is the job of the Semantic Extractor. It reads and parses the source
language, builds the metamodel and converts the statements to corresponding actions.
7.3.1 Semantic Extractor Class
The SemanticExtractor class is responsible for the conversion. As described in Section 7.1.4.1,
the extraction system is designed as a provider pattern with SemanticExtractor as the base
class. Applications wanting to convert code to the semantic model call this class, which in
turn uses a specific provider to do the actual transformation.
This allows for the creation of multiple providers; switching between them is handled by
specifying the provider name. The providers use their own tools to read the source code.
Providers can be created not only for .NET, but also for other source languages such as Java,
Delphi, and so on.
Each provider must inherit from the SemanticExtractorProvider class and register itself in
the configuration file (see Appendix C). The SemanticExtractor will then load this configuration
file, initialize the providers, and select the default provider.
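The provider pattern described above can be sketched as a base abstraction plus a registry that selects providers by name. This is a Java sketch under the stated design, not the actual Compose* classes; the names and signatures are illustrative.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ExtractorSketch {
    // Base abstraction every extractor provider must implement (illustrative).
    interface SemanticExtractorProvider {
        String name();
        // Returns the extracted SemanticItem objects (modeled here as Object).
        List<Object> analyze(String assemblyName);
    }

    // The extractor dispatches to a named provider, as described in the text.
    static class SemanticExtractor {
        private final Map<String, SemanticExtractorProvider> providers = new HashMap<>();
        private String defaultProvider;

        void register(SemanticExtractorProvider p) {
            providers.put(p.name(), p);
            if (defaultProvider == null) defaultProvider = p.name(); // first = default
        }

        List<Object> analyze(String assembly) {
            return analyze(assembly, defaultProvider);
        }

        List<Object> analyze(String assembly, String providerName) {
            return providers.get(providerName).analyze(assembly);
        }
    }

    public static void main(String[] args) {
        SemanticExtractor extractor = new SemanticExtractor();
        extractor.register(new SemanticExtractorProvider() {
            public String name() { return "Phoenix"; }
            public List<Object> analyze(String a) { return Collections.emptyList(); }
        });
        System.out.println(extractor.analyze("MyAssembly.dll").size()); // prints 0
    }
}
```

In the real system the provider list comes from a configuration file rather than programmatic registration, but the dispatch-by-name mechanism is the same idea.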
Figure 7.16: Semantic Extractor Classes
Figure 7.16 shows the classes needed for the Semantic Extractor. An application can call the
Analyze function with the assembly name as parameter. This is passed to the correct provider,
which returns a list of SemanticItem objects. Usually this will be a SemanticContainer,
as it is the root of the metamodel tree structure, but a provider can also choose to return a
SemanticOperation or SemanticClass if this is more suitable.
The application requesting the metamodel can work directly on the SemanticItem list or store
the objects in the SemanticDatabaseContainer. By using the database container, the developer
gains extra functionality to search and retrieve elements from the model. More details about
this container are given in Section 7.4.
In Section 6.2, four different tools were introduced. These tools were used to create four
different providers, which are discussed in the next sections.
7.3.2 Mono Cecil Provider
The provider using Cecil, called SemanticProviderCecil, was relatively easy to create.
Opening and reading an assembly is a one-line statement. The Cecil library returns an
AssemblyDefinition object, which we can query for all the available types. By browsing
the types and their properties, we can generate the structure of the semantic metamodel.
The important part is the method body. It provides direct access to the local variables declared
inside the method. To help us iterate over the IL instructions, Cecil provides an analysis
framework. This framework returns a control flow in the form of blocks with instructions.
These blocks are mapped to SemanticBlocks, and the instructions of a block are given to an
AbstractInstructionVisitor.
This visitor has a method for each OpCode and holds an operand stack. Stack loading instructions,
such as loading constants and variables, place the operand onto the local stack
after converting it to a SemanticOperand object. When, for instance, an And instruction is visited,
two operands are popped from the stack and used as the source operands. The And
instruction places a result on the stack, so we store the SemanticAction object in a temporary
variable. If a store instruction is visited, it will find this previously stored action and assign
the operand to store as the destination operand for the action. See Listing 7.6 for an example.
Stack<ISemanticOperand> operandStack;
private ISemanticAction prevAction;

public override void OnLdarg(IInstruction instruction)
{
    // Load the argument indexed by the operand onto the stack
    LoadArgument((int)(instruction.Operand));
}

public override void OnAnd(IInstruction instruction)
{
    // Create an And action
    ISemanticAction action = GetAction(SemanticActionType.And);

    // Pop the two operands from the stack
    action.SourceOperand2 = operandStack.Pop();
    action.SourceOperand1 = operandStack.Pop();

    // Store this action as the previous action
    prevAction = action;
}

public override void OnStarg(IInstruction instruction)
{
    // See if a previous action exists and assign the argument
    // operand to the DestinationOperand property
    if (prevAction != null)
    {
        prevAction.DestinationOperand = GetArgument((int)(instruction.Operand),
            _operation.SemanticArguments);
        prevAction = null;
    }
}
Listing 7.6: Part of the Cecil Instruction Visitor
Although a provider using Cecil could be created with little effort, it has some problems. We
still have to retrieve the type of the operands. Sometimes this can be deduced from the opcode,
such as OnLdc_I4, which loads a 32-bit integer onto the stack. However, in most cases it is
difficult to get the correct type, since this is not specified.
The flow analysis library of Cecil had problems with more complex method bodies. For instance,
methods with exception handling could not be converted to a block representation.
Cecil also could not cope with the new language elements of .NET version 2.0, such as generics.
It is possible that this has been corrected in a newer version of Cecil.
7.3.3 PostSharp Provider
The PostSharp provider is not very different from the Cecil provider. It also has the ability
to create blocks separating the control flow. Instead of a Visitor pattern to visit each instruction,
it uses an instruction reader, a stream of instructions. As long as there are instructions in the
stream, we can read the current instruction, convert it, and read the next one.
InstructionReader ireader = method.Body.GetInstructionReader();
ForEachInstructionProcessBlock(b.RootInstructionBlock, ireader);

InstructionBlock iblock = b.RootInstructionBlock;

while (iblock != null)
{
    ireader.EnterInstructionBlock(iblock);
    InstructionSequence iseq = iblock.GetFirstInstructionSequence();
    ireader.EnterInstructionSequence(iseq);

    while (ireader.ReadInstruction())
    {
        Console.WriteLine("Read instruction {0}", ireader.OpCodeNumber);
        if (ireader.OpCodeNumber == OpCodeNumber.Ldloc_0)
        {
            // perform actions based on OpCode
        }
    }
    iblock = iblock.NextSiblingBlock;
}
method.ReleaseBody();
Listing 7.7: Using the instruction stream in PostSharp
Listing 7.7 lists the statements needed to read the instructions in a method body. The PostSharp
provider was not developed further. PostSharp was still in its early phases of development
and did not always read assemblies correctly. Spending more time on this provider was not
advisable at the time.
At the time of writing of this thesis, PostSharp appears to be more mature and does support
.NET version 2.0 assemblies.
7.3.4 RAIL Provider
RAIL, the Runtime Assembly Instrumentation Library, could not even load an assembly. The
documentation and samples were, at the time, very limited, and the program crashed frequently.
The provider was therefore not implemented further.
7.3.5 Microsoft Phoenix Provider
Because Microsoft Phoenix was the best analysis tool available at the time of implementation, a
lot of effort has been put into the development of the Phoenix provider. It consists of two main
parts: the provider itself and an analysis phase.
When the provider is called to analyze an assembly, it uses a PEModuleUnit to read the
assembly into memory (see Listing 7.8). It retrieves the Symbol for the unit and uses this symbol
to walk through the assembly. Symbols are placed in the symbol tables of a unit and provide
the metadata of the elements in a unit.
peModuleUnit = Phx.PEModuleUnit.Open(assemblyName);
peModuleUnit.LoadGlobalSyms();

Phx.Syms.Sym rootSym = peModuleUnit.UnitSym;

WalkAssembly(rootSym);
Listing 7.8: Loading the assembly using Phoenix
The WalkAssembly method builds a SemanticContainer based on the information in the root
symbol. It uses iterators to find all the sub elements, such as the classes, methods, fields, and
so on. When the assembly walker finds a method, it creates a SemanticOperation object,
retrieves and assigns the parameters, return type, and other properties, and adds this object to
the SemanticClass it belongs to.
To retrieve the method body, and thus the instructions, the analysis phase is used. The function
is raised to another representation level in the Intermediate Representation (IR). Phoenix uses a
number of phases to perform this action, and the provider adds our own phase to the list.
Listing 7.9 shows the code performing this action.
// Get the FuncUnit and raise it
Phx.FuncUnit funcUnit = funcSym.FuncUnit;
funcUnit = peModuleUnit.Raise(funcSym,
    Phx.FuncUnit.LowLevelIRBeforeLayoutFuncUnitState);

// Prepare a config list
Phx.Phases.PhaseConfig config =
    Phx.Phases.PhaseConfig.New(peModuleUnit.Lifetime, "temp");

// Create a new phase
SemanticExtractorPhase semtexPhase = SemanticExtractorPhase.New(config);
semtexPhase.SemanticOperation = semOp;
Phx.Phases.PhaseList phaseList = Phx.Phases.PhaseList.New(config, "SemTex Phases");

// Add our phase and provide IR dump
phaseList.AppendPhase(semtexPhase);
phaseList.DoPhaseList(funcUnit);

// Place the result in the semOp variable
semOp = semtexPhase.SemanticOperation;
Listing 7.9: Starting a phase for a function using the Phoenix Extractor
The SemanticExtractorPhase is the class responsible for converting the instructions to actions.
As its input, it uses a FuncUnit, the fundamental compilation unit in Phoenix, representing
a function. Using the graph functionality of Phoenix, a control flow graph is created. This
provides us with the Semantic Blocks.
Converting IL instructions to actions is similar to Cecil. An internal stack is used to keep track
of all the stack loading and storing actions. Instructions using operands can retrieve the correct
operand from this stack. Determining the type of the operand values is aided by Phoenix's own
type system. Phoenix is able to retrieve the type of the operands, so this information can be
stored in a SemanticType object, a huge advantage over Cecil and PostSharp.
Furthermore, Phoenix is able to include information stored in the associated debug files into
the analyzed assembly. This gives us information about the names of the local variables, which
is not stored in the IL code. Another use of the debug file is the ability to link actions and
blocks to points in the original code, since line numbers are available in the debug file.
Phoenix creates blocks where each block ends with either a branch instruction or an instruction
that can cause an exception, and each block starts with a unique label. This gives us more blocks
than intended, so a special block optimization is performed. The optimization algorithm is
listed in Algorithm 7; it removes the empty blocks that Phoenix generates and updates all
the references to the correct blocks. As input we have a list of label identifiers and associated
blocks.
Phoenix provides a lot of functionality to analyze .NET assemblies: for instance, the control
flow support, the ability to read the debug files, direct access to type information, and so on.
However, Phoenix is still under development and documentation is scarce. Because it is used
internally by Microsoft, support and development are ongoing and improving.
7.4 Querying the Model
The Semantic Extractor converts the source code to the Semantic Metamodel. An application can
now use this model for further analysis. However, this means the application has to traverse
the whole model each time it wants to search for an item. To facilitate the use of the metamodel,
a database container is created, which has a mechanism for searching in the model. This section
provides more details about the database and the search options. The next chapter provides
some practical examples of how this search system can be used.
input : A collection of labels with associated blocks
output: Optimized collection of Semantic Blocks
// Remove blocks with zero actions
foreach semanticBlock ∈ semanticBlocks do
  if semanticBlock has no actions then
    Get the next block;
    Update all references to current block to the next block;
    Remove the empty block;
  end
end
// Connect the blocks
foreach semanticBlock ∈ semanticBlocks do
  // Connect the next and previous blocks
  if not the first semanticBlock then
    Connect previous block to semanticBlock;
  end
  if not the last semanticBlock then
    Connect next block to semanticBlock;
  end
  // Connect the exception handling block
  if exception handling block exists then
    Connect exceptionhandlerblock to semanticBlock;
  end
  // Connect the actions to the correct blocks
  foreach semanticAction ∈ semanticBlock do
    switch semanticAction.ActionType do
      case Jump ∨ RaiseException
        Set the semanticAction.LabelName to the correct block;
      case Branch
        Set the semanticAction.TrueLabelName to the correct block;
        Set the semanticAction.FalseLabelName to the correct block;
      case Switch
        foreach SwitchLabel ∈ semanticAction.SwitchLabels do
          Set the semanticAction.SwitchLabel to the correct block;
        end
    end
  end
end
Algorithm 7: Optimization of Semantic Blocks
Figure 7.17: SemanticDatabaseContainer class
7.4.1 Semantic Database
The Semantic Database library contains the classes to store SemanticItem objects, which are
the root of the semantic objects, and provides a system to search in the metamodel.
The SemanticDatabaseContainer holds a collection of SemanticItem objects indexed by
their object hash code. An application wanting to store a metamodel in this database calls
the StoreItem function and passes the object to store as its parameter.
The functions GetContainer and GetContainers provide direct access to the SemanticContainer
objects in the database. The Query function allows the developer to search in the metamodel.
Finally, the ExportToXml function exports the metamodel to an eXtensible Markup Language
(XML) format. The purpose of this function is not to provide save functionality so the model
can be loaded back from file, but only to create a more user-friendly view of the model. Full
support for serialization could be added, but is currently not implemented. Figure 7.17 shows
the SemanticDatabaseContainer class and the predicate used by the query function.
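The essence of the container, storage keyed by object hash code plus a predicate-driven Query, can be sketched as follows. This is a Java approximation of the described C# class; the names mirror the text but the bodies are assumptions, not the actual implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

public class DatabaseSketch {
    // Minimal stand-in for SemanticItem, the root of the semantic objects.
    static class SemanticItem {
        final String kind; // e.g., "Container", "Operation"
        SemanticItem(String kind) { this.kind = kind; }
    }

    // Container indexed by object hash code, queried with a predicate.
    static class SemanticDatabaseContainer {
        private final Map<Integer, SemanticItem> items = new HashMap<>();

        void storeItem(SemanticItem item) {
            items.put(item.hashCode(), item); // indexed by the object's hash code
        }

        List<SemanticItem> query(Predicate<SemanticItem> condition) {
            List<SemanticItem> result = new ArrayList<>();
            for (SemanticItem i : items.values())
                if (condition.test(i)) result.add(i);
            return result;
        }
    }

    public static void main(String[] args) {
        SemanticDatabaseContainer db = new SemanticDatabaseContainer();
        db.storeItem(new SemanticItem("Container"));
        db.storeItem(new SemanticItem("Operation"));
        System.out.println(db.query(i -> i.kind.equals("Operation")).size()); // prints 1
    }
}
```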
7.4.2 What to Retrieve
First, we have to consider what type of information we want to retrieve from the Semantic
Metamodel, and in what way.
For instance, we might want to find all the assignments of an argument to a global field (a
setter method). This assignment resides inside a semantic operation. From this it is clear that
we must search for an action with an action type of assignment inside an operation, where the
source operand of this action is an argument and the destination operand is of type field. The
names of the source and destination are not important. We also do not know in which class this
operation may be, so we do not specify this.
This should return a list of actions of the type assignment. Each action has a link to the block
it resides in and the name of the operation, class, and container. Using this information, it is
possible to find other elements related to this action. Instead of searching for actions, it should
also be possible to search for all blocks with an assignment action inside. This will return a list
of all the blocks. At that point we can search for all the actions inside one of those blocks. The
same applies to operations. By retrieving the link to the operation from an action, we can get
all the actions inside the operation.
How do we find out if an action depends on a comparison? This is based on the block the
action resides in. The action has a link to its parent block. Using this block, we have
information about the control flow and can find which actions (the branching actions) lead to
this block. Instead of retrieving the block, we can also retrieve all the actions of the operation
which refer to this block. If there are none, then there was no branching to this block
and the action inside. If the search returns actions which link to this block, we can find out if
these branching actions were conditional or not. If so, we can retrieve the comparison type
and the values used by the compare function.
Of course, it is also possible to retrieve information such as which variables are declared inside
an operation, or what the arguments of an operation are. These are not directly semantically
related data, but they can be needed for reasoning about the semantics. We could create a query
to select all the operations where we look at the arguments, and one of those arguments should
be of type ReifiedMessage. Since we do not have types in the semantic model, we pass this type
as a string.
As you can see, there are three parts in the query needed to retrieve the correct data. The three
types are as follows:
What we want to return
This is one of the child objects of the SemanticItem object. We always have to indicate
what type of objects we want back from the database;
Where we want to search
Search in actions, blocks, operations, and so on. Even if we search in operations, we can
still return classes, which are the parents of these operations;
What we want to search for
This specifies the values of the properties of the element to search for. For instance, the
type of the destination property must be a field and the action type itself must be a comparison.
Every public property of the element we search in can be used in this condition.
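The three parts map naturally onto a query call with a result type, a search scope, and a condition. The Java sketch below is hypothetical; the `Scope` enum, the `query` helper, and the field names are inventions for illustration, not the thesis' API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

public class ThreePartQuery {
    // Minimal stand-in for an element of the metamodel (illustrative fields).
    static class SemanticAction {
        final String actionType, destinationKind;
        SemanticAction(String t, String d) { actionType = t; destinationKind = d; }
    }

    // "Where we want to search": shown for completeness; in this sketch the
    // caller already supplies the element list belonging to the scope.
    enum Scope { ACTIONS, BLOCKS, OPERATIONS }

    // The three parts: what to return (the type parameter T), where to search
    // (the scope), and what to search for (the condition).
    static <T> List<T> query(List<T> elements, Scope scope, Predicate<T> condition) {
        List<T> result = new ArrayList<>();
        for (T e : elements)
            if (condition.test(e)) result.add(e);
        return result;
    }

    public static void main(String[] args) {
        List<SemanticAction> actions = Arrays.asList(
                new SemanticAction("Comparison", "Field"),
                new SemanticAction("Assignment", "Field"));
        // Condition from the text: a comparison action whose destination is a field.
        List<SemanticAction> hits = query(actions, Scope.ACTIONS,
                a -> a.actionType.equals("Comparison") && a.destinationKind.equals("Field"));
        System.out.println(hits.size()); // prints 1
    }
}
```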
Another requirement of the search system is the ability to search in the search results, so we
can use the returned values for further queries. This allows us to work with already found
data, instead of searching for the same data with more detailed search parameters.
7.4.3 Query Options
To retrieve the correct data, we must ask the semantic database to retrieve it from the object
store using some sort of interface. There are various ways to implement a querying mechanism;
these will be discussed with their positive and negative points.
7.4.3.1 Predicate Language
A predicate is an expression that can be true of something. For example, the predicate
HasActionOfType(X) can be true or false depending on the value of X. By using this type of logic
and combining predicates, we can indicate what type of data we want returned.
However, due to the complexity of all the semantic information, it is labor-intensive to create
a predicate language covering all the elements. All the information has to be converted to
predicates, which will lead to many different predicates to capture all the semantics available.
Changes to the model also mean making changes to the predicate language.
Furthermore, support for predicates in the .NET languages is not directly available. We would
have to use some sort of Prolog interface to integrate predicate queries in the database search
system.
7.4.3.2 Resource Description Framework
The Resource Description Framework¹ (RDF) is a general-purpose language for representing
knowledge on the Web. An XML syntax is used to define triplets of information, thus creating
RDF graphs. This leads to relations between elements identifiable using Web identifiers (called
Uniform Resource Identifiers or URIs).
RDF does provide a way of defining relations, and thus extra data about objects, and has a
standardized framework for working with this data. However, setting up a model with all
the information beforehand is time-consuming and inefficient. It would mean we have to
convert the metamodel to another XML-based model. It does have an advantage for
interoperability, since XML is a language-independent format supported on a wide variety of
platforms.
7.4.3.3 Traverse Over Methods
This technique allows the developer to call a function of the SemanticDatabaseContainer,
passing the search values as parameters. This method will then return the found values. There
are various ways to pass the values: delegates to other functions, a text-based value (e.g.,
"type='parameter'"), or events raised by this function. Internally, it traverses all the
information in the metamodel, checks the information against the supplied arguments, and
returns the found items.
By using this method, the Semantic Database does not have to create a new representation
of its data. It will look through all the data inside the metamodel and pass this data to
the callback functions to do the actual comparison. It also provides the developer of the plug-in
a lot of flexibility, because they can create their own comparison functions. This could also be
a disadvantage, because of the complexity: retrieving information this way means multiple
functions or events have to be created. Also, when using text-based values, changes in the
model cannot be detected at compile time.
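The delegate variant of this technique can be sketched in a few lines of Java; the callback stands in for the comparison function the plug-in developer supplies (all names are illustrative):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

public class TraverseSketch {
    // Minimal stand-in for a metamodel element.
    static class SemanticAction {
        final String actionType;
        SemanticAction(String t) { actionType = t; }
    }

    // "Traverse over methods": the container walks all items itself and hands
    // each one to a caller-supplied callback for the actual comparison.
    static List<SemanticAction> findActions(List<SemanticAction> all,
                                            Predicate<SemanticAction> callback) {
        List<SemanticAction> found = new ArrayList<>();
        for (SemanticAction a : all)
            if (callback.test(a)) found.add(a);
        return found;
    }

    public static void main(String[] args) {
        List<SemanticAction> actions = Arrays.asList(
                new SemanticAction("Assignment"), new SemanticAction("Branch"));
        // The plug-in's own comparison function, passed as a delegate-like callback.
        List<SemanticAction> result =
                findActions(actions, a -> a.actionType.equals("Branch"));
        System.out.println(result.size()); // prints 1
    }
}
```

Note how the flexibility and the drawback both live in the callback: the caller can express any comparison, but the container cannot check or optimize it.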
¹ http://www.w3.org/RDF/
7.4.3.4 Object Query Language
An Object Query Language (OQL), based on SQL, the Structured Query Language for
databases, gives a mechanism to define a query using standard query elements like SELECT,
FROM, WHERE, etc. The queries are parsed and converted to an AST. The parsing has to be
performed by the Semantic Database.
The advantage of this system is the expressiveness of OQL. The three parts of our query can
be expressed using an OQL expression: the SELECT part indicates what we want to return, the
FROM part indicates where we want to search, and the WHERE part specifies the conditions.
An OQL expression can be passed to the Semantic Database query function. Since it is actually a
string, it is also possible to pass it over another transport medium, such as a web service, files,
and so on. This means more separation between different platforms, because of the
independence of specific languages.
A disadvantage of using OQL is the fact that the query is composed of strings, so it is not
strongly typed, and syntax checking must be performed by the parser. Any detected errors are
raised and returned to the caller at runtime.
Keep in mind that the implementation will not be standard ODMG¹ OQL, since we have a
predefined object database which will be searched through by the Semantic Database using
its own mechanism. The OQL language is only used to pass a search string to the underlying
query function.
7.4.3.5 Simple Object Database Access
With the Simple Object Database Access (SODA)² system, a Query object is used to find data in
the underlying object store. An example using SODA in C# can be seen in Listing 7.10.
Query query = database.Query();
query.Constrain(typeof(Student));
query.Descend("age").Constrain(20).Smaller();
IList students = query.Execute();
Listing 7.10: SODA example in C#.NET
By using the constrain and descend methods on the query object, it is possible to add
constraints to the query and look up descendants of objects.
SODA enforces a stricter integration with the query object than OQL, since OQL simply uses
a string to pass all the information instead of object operations. However, the values passed to
the constrain and descend methods are not type-safe and are only tested at runtime.
¹ Object Data Management Group, The Standard for Storing Objects. http://www.odmg.org/ or
http://java.sun.com/products/jdo/overview.html
² http://sourceforge.net/projects/sodaquery/
7.4.3.6 Native Queries
Native queries are a relatively new technique to define search queries [18, 17]. Microsoft is
experimenting with this in the form of LINQ¹, which is expected to be released in .NET version
3.0.
Most query systems are string based. These queries are not accessible to development
environment features like compile-time type checking, auto-completion, and refactoring. The
developers must work in two different languages: the implementation language and the query
language.
Native queries provide an object-oriented and type-safe way to express queries in the native
implementation language.
A simple example in C# 2.0 is shown in Listing 7.11, while the same example in Java is listed
in Listing 7.12.
List<Student> students = database.Query<Student>(
    delegate(Student student) {
        return student.Age < 20
            && student.Name.Contains("f");
    });
Listing 7.11: LINQ query example in C#.NET
List<Student> students = database.query<Student>(
    new Predicate<Student>() {
        public boolean match(Student student) {
            return student.getAge() < 20
                && student.getName().contains("f");
        }
    });
Listing 7.12: LINQ query example in Java
A native query uses anonymous classes in Java and delegates in .NET. The use of generics is
recommended for a strongly typed return value.
It is still possible to use native queries with .NET 1.1 by means of a special Predicate class. An
example in C# 1.1 (which also applies to Java) shows this construction (Listing 7.13).
IList students = db.Query(new StudentSearch());

public class StudentSearch : Predicate {
    public bool Match(Student student) {
        return student.Age < 20
            && student.Name.Contains("f");
    }
}
Listing 7.13: LINQ query example in C#.NET 1.1
For a programmer, native queries are a safe and quick way to create queries, as they are
written in the native language. There is, however, a dependency on newer techniques, like
delegates and generics. Older systems can still be supported by using the Predicate class.
¹ http://msdn.microsoft.com/data/ref/linq/
Note that the latter may result in a sort of Traverse over Methods functionality, where special
functions have to be made to support the querying.
7.4.3.7 Natural Language Queries
A natural language query is one that is expressed using normal conversational syntax; that is,
you phrase your query as if creating a spoken or written statement to another person. There
are no syntax rules or conventions to learn. For instance, a natural language query could be “what
are the write operations on field X in class Y?”.
Although this gives a lot of possibilities and flexibility in asking questions, it is very difficult
to extract the essential information from such a query. The query has to be analyzed and its
semantics must be extracted. The actual search can only be performed when there is enough
information about the meaning of the query. Because of their complexity, natural language
queries are outside the scope of this assignment.
7.4.4 Native Queries in Detail
Based on the advantages and disadvantages of the query options, a decision was made to use
native queries. Their major advantage is that we do not have to parse a text-based query and check
for all kinds of constraints. Another advantage is the native support of the development environ-
ment (IDE) for this type of query. We can have type-safe, object-oriented, and refactorable code,
and possible errors are visible at compile time instead of at runtime. We can use elements of the
IDE, like intellisense for automatic code completion and tooltips for information about the code.
Microsoft uses native queries in LINQ, its Language Integrated Query system, which will
be available in the next version of the .NET Framework, as it requires a new compiler. This
compiler is needed because LINQ not only depends on existing technologies in .NET version 2.0,
but also needs new language features.
The techniques from version 2.0 are generics, delegates, and anonymous methods; they allow us to
create type-safe, inline queries. For example, Listing 7.11 shows the use of a generic to create
a list consisting of Student objects, and the delegate makes it possible to define an anonymous
method inline, instead of creating a separate new method.
In the next version of the .NET Framework, we will also have anonymous types, lambda expressions,
and extension methods. Anonymous types have no explicit type during programming, but
receive a real type during compilation, deduced from their contents. A lambda expression is a
form of anonymous method with a more direct and compact syntax. Lambda expressions can be
converted to either IL code or to an expression tree, based on the context in which they are used.
Expression trees are, for instance, used
by DLINQ, a system to convert native queries to database SQL queries. With extension meth-
ods, existing types can be extended with new functions at compile time. This is used in the
next version of .NET to add query operations to the IEnumerable<T> interface, the interface
used by almost all the collections.
LINQ also introduces a new query syntax, which simplifies query expressions with a declarative
syntax for the most common query operators: Where, Select, SelectMany, GroupBy, OrderBy,
ThenBy, OrderByDescending, and ThenByDescending. We can use the query expressions,
the lambda expressions, or delegates to get the same results (see Listing 7.14).
1 IEnumerable<string> expr =
2 from s in names
3 where s.Length == 5
4 orderby s
5 select s.ToUpper();
(a) query expressions
1 IEnumerable<string> expr = names
2 .Where(s => s.Length == 5)
3 .OrderBy(s => s)
4 .Select(s => s.ToUpper());
(b) lambda expressions
Listing 7.14: LINQ query examples
LINQ can provide us with a technique to search in objects; however, it is not yet available. A
similar system, partly based on LINQ, was created for the Semantic Database. It uses the generics,
delegates, and anonymous methods available in .NET version 2.0. Expression trees are not used,
and the query itself, called the predicate, is executed for each element in the metamodel that
complies with the requested type.
To determine which type of the semantic model should be searched in and what type should
be returned, the query function requires two generic types as shown in Listing 7.15.
1 public ExtendedList<OutType> Query<InType, OutType>(Predicate<InType> match)
Listing 7.15: Query function signature
The InType determines the type we are looking for, such as a SemanticAction or a
SemanticClass. This same type is also used in the predicate, a delegate returning a boolean.
The predicate contains the query and must always return true or false based on some sort of
comparison with the InType. OutType indicates the type we want to have returned. We
might be searching through all the actions inside an operation, but want to return the classes
containing the found actions. This means we upcast the action to its class. Downcasting is not
allowed, so searching for classes and returning actions is not possible.
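As a rough illustration of this two-type contract, the Java sketch below searches a list of actions but returns their containing classes (the SemanticAction/SemanticClass stand-ins and the owner link are simplified assumptions for the example, not the actual metamodel code):

```java
import java.util.ArrayList;
import java.util.List;

// Minimal stand-ins for two metamodel types (illustrative only).
class SemanticClass {
    final String name;
    SemanticClass(String name) { this.name = name; }
}

class SemanticAction {
    final String kind;
    final SemanticClass owner;   // link used for the "upcast" to the class
    SemanticAction(String kind, SemanticClass owner) {
        this.kind = kind;
        this.owner = owner;
    }
}

interface Match<T> { boolean match(T item); }

public class TwoTypeQueryDemo {
    // InType = SemanticAction (what we search), OutType = SemanticClass
    // (what we return): each matching action is converted to its owner.
    static List<SemanticClass> queryActionsReturnClasses(
            List<SemanticAction> allActions, Match<SemanticAction> predicate) {
        List<SemanticClass> results = new ArrayList<SemanticClass>();
        for (SemanticAction a : allActions) {
            if (predicate.match(a) && !results.contains(a.owner)) {
                results.add(a.owner);   // upcast: action -> containing class
            }
        }
        return results;
    }

    public static void main(String[] args) {
        SemanticClass c = new SemanticClass("Player");
        List<SemanticAction> actions = new ArrayList<SemanticAction>();
        actions.add(new SemanticAction("assign", c));
        actions.add(new SemanticAction("call", c));

        List<SemanticClass> hits = queryActionsReturnClasses(actions,
                new Match<SemanticAction>() {
                    public boolean match(SemanticAction a) {
                        return a.kind.equals("assign");
                    }
                });
        System.out.println(hits.get(0).name);
    }
}
```

The reverse direction is impossible in this sketch for the same reason as in the real system: a class does not identify one particular action inside it.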
The query function uses the visitor pattern to visit all the objects in the metamodel. The
NativeQuerySearchVisitor is the visitor implementation responsible for executing the pred-
icate for each relevant type in the model. The InType is used to determine if the visited element
should be evaluated. If this is the case, the element is converted to the InType and the predicate
is used to execute the query. If there is a match, the element is converted to the OutType and
added to the result list. Listing 7.16 shows the code for this function. The conversion function,
SafeConvert, knows about the relations between the elements and can perform the upcasting
to other types.
1 private void CheckForMatch(SemanticItem si)
2 {
3 if (inType.Equals(si.GetType()))
4 {
5 InType x;
6 x = SafeConvert<InType>(si, false);
7 if (predicateMatch(x))
8 {
9 OutType y;
10 y = SafeConvert<OutType>(si, true);
11 if (y != null) results.Add(y);
12 }
13 }
14 }
Listing 7.16: Predicate matching
An overloaded query function is available with a parameter used to provide a starting point
for the search. This starting point is a SemanticItem and can be used to search in parts of the
metamodel.
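A hedged Java sketch of such a search visitor with a configurable starting point follows (the Node tree is a simplified stand-in for the metamodel; the real NativeQuerySearchVisitor works on typed semantic items rather than string-tagged nodes):

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model node: every element can contain child elements.
class Node {
    final String type;     // e.g. "operation", "action"
    final String name;
    final List<Node> children = new ArrayList<Node>();
    Node(String type, String name) { this.type = type; this.name = name; }
    Node add(Node child) { children.add(child); return this; }
}

interface NodeMatch { boolean match(Node n); }

public class SearchVisitorDemo {
    // Visit the subtree rooted at 'start'; collect nodes of the wanted
    // type for which the predicate holds.
    static List<Node> visit(Node start, String inType, NodeMatch predicate) {
        List<Node> results = new ArrayList<Node>();
        collect(start, inType, predicate, results);
        return results;
    }

    private static void collect(Node n, String inType, NodeMatch p, List<Node> out) {
        if (n.type.equals(inType) && p.match(n)) out.add(n);
        for (Node child : n.children) collect(child, inType, p, out);
    }

    public static void main(String[] args) {
        Node op = new Node("operation", "isPlaying")
                .add(new Node("action", "assign"))
                .add(new Node("action", "call"));
        Node root = new Node("container", "Jukebox").add(op);

        // Supplying 'op' instead of 'root' restricts the search to one
        // operation, mirroring the overloaded query with a starting point.
        List<Node> calls = visit(op, "action", new NodeMatch() {
            public boolean match(Node n) { return n.name.equals("call"); }
        });
        System.out.println(calls.size());
    }
}
```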
The query function returns the found elements as a strongly typed ExtendedList<T>. This is an
extended List<T> class with functionality to search further in the results. Whereas LINQ uses
extension methods to add the same functionality to the normal list classes, we have to create our
own list class. All the lists in the semantic model are instances of the ExtendedList and as such
are searchable.
The ExtendedList<T> class is shown in Figure 7.18 and provides the following operators:
Restriction operator
Return a new ExtendedList filtered by a predicate using the Where operator.
Projection operator
The Select operator performs a projection over the list.
Partitioning operators
The Take, TakeWhile, Skip and SkipWhile operators partition the list by either taking
elements from the list or skipping elements before returning the elements.
Concatenation operator
A Concat operator concatenates two lists together.
Ordering operators
Sorting of the lists is handled by the OrderBy and OrderByDescending operators.
Grouping operators
The GroupBy operator groups the elements of the list.
Set operators
To return unique elements, use the Distinct operator. The Union and Intersect operators
return the union or intersection of two lists, while the Except operator
produces the set difference between two lists.
Conversion operators
The ExtendedList can be converted to an array, dictionary, or a list.
Equality operator
To check whether two lists are equal, use the EqualAll operator.
Element operators
Used for returning the first, last or a specific element in the list.
Generation operators
Create a range of numbers or repeat the creation of an element a number of times.
Quantifiers
To check whether the list satisfies a condition for all or any element.
Aggregate operators
Used for counting the elements, getting the minimum or maximum, the sum of, or the
average of values. A Fold operator applies a function over the elements in the list.
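The chaining style these operators enable can be sketched with a toy Java list that implements just two of them, Where and Distinct; each operator returns a new list, so calls compose (this is an illustration of the design, not the actual ExtendedList implementation):

```java
import java.util.ArrayList;
import java.util.Collection;

interface Filter<T> { boolean keep(T item); }

// Toy version of ExtendedList: it inherits the normal list behaviour
// and each operator returns a new list, so calls can be chained.
class ChainList<T> extends ArrayList<T> {
    ChainList() { super(); }
    ChainList(Collection<T> items) { super(items); }

    ChainList<T> where(Filter<T> f) {
        ChainList<T> out = new ChainList<T>();
        for (T item : this) if (f.keep(item)) out.add(item);
        return out;
    }

    ChainList<T> distinct() {
        ChainList<T> out = new ChainList<T>();
        for (T item : this) if (!out.contains(item)) out.add(item);
        return out;
    }
}

public class ChainListDemo {
    public static void main(String[] args) {
        ChainList<String> names = new ChainList<String>();
        names.add("ops"); names.add("ops"); names.add("query"); names.add("io");

        ChainList<String> result = names
                .where(new Filter<String>() {
                    public boolean keep(String s) { return s.length() > 2; }
                })
                .distinct();
        System.out.println(result);  // [ops, query]
    }
}
```

Because every operator hands back a fresh list, filtering never mutates the source collection, which keeps chained queries free of side effects.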
Most of the operators return a new ExtendedList, so it is possible to combine operators as
shown in Listing 7.17. Because the ExtendedList<T> inherits from the List<T> class, it also
contains the functionality specified in the base class. The ExtendedList offers full intellisense1
support to help developers understand the functions, and the Semantic Metamodel has
added functionality to make it efficient to use in queries. For example, it contains functions
to check whether there is an operand, whether an operand is a field, whether an operation has a
return type, and so on (see Figure 7.12).
Figure 7.18: ExtendedList class
1 ExtendedList<SemanticOperation> operations =
2 db.Query<SemanticAction, SemanticOperation>(delegate(SemanticAction sa)
3 {
4 return (sa.HasDestinationOperand &&
5 sa.DestinationOperand.IsSemanticField &&
6 sa.DestinationOperand.Name.Equals("name"));
7 }).Distinct();
Listing 7.17: Return all distinct operations containing actions assigning a value to a field named
“name”
1
A form of automated autocompletion and documentation for variable names, functions, and methods using
metadata reflection inside the Visual Studio IDE.
CHAPTER 8
Using the Semantic Analyzer
Chapter 7 provides the design of the Semantic Analyzer and introduces the query system
based on native queries (Section 7.4.4). In this chapter, examples of the usage of the Semantic
Database with native queries are given, we examine the different applications using the analyzer,
and we discuss the integration of the analyzer in the Compose*/.NET project.
8.1 Semantic Database
If the metamodel is stored in the Semantic Database, there are three ways to retrieve its contents:
use the GetContainers function to get the root elements of the structure, implement a visitor to
visit all the elements, or use the Query function to execute a search.
The first method gives a developer access to the root elements of the hierarchical tree of
items in the model, over which he can iterate. The second method does basically
the same thing, but allows the developer to specify what type of action must be performed
based on the different kinds of elements inside the model. The third method provides an effi-
cient way to retrieve specific elements from the model and is described in more detail in this
section.
The Semantic Database uses native queries to specify the query expressions in the same native
language as the development language. To execute a query, the search terms must be enclosed
inside a delegate and supplied to the Query function of the SemanticDatabaseContainer.
Instead of creating a separate delegate, it is possible to use anonymous methods, so the delegate
can be passed to the Query function as a parameter. An example is displayed in Listing 8.1.
1 ExtendedList<SemanticAction> callActions =
2 db.Query<SemanticAction,SemanticAction> (
3 delegate(SemanticAction sa)
4 {
5 return sa.ActionType==SemanticActionType.Call;
6 })
7 .OrderBy<string>(delegate(SemanticAction sa)
8 {
9 return sa.ParentBlock.ParentOperation.Name;
10 });
Listing 8.1: Search for all call actions and sort by operation name
In Listing 8.1 a search for all the SemanticAction objects calling methods is performed. The
delegate is used as an anonymous method directly as the parameter of the Query function. We
indicate we want to search in all the SemanticAction objects in the model, by specifying this
as the first generic type of the Query function. We want a list of SemanticAction objects to
be returned, so we specify this type as the second generic type of the function (line 2). The
delegate is the first parameter of the Query function and contains the actual query in the form
of a boolean expression (lines 3–6). In this example, the ActionType of the SemanticAction
must be a Call type.
We also order the results by the name of the operation. As the Query function returns an
ExtendedList<SemanticAction> we can access the OrderBy function of this list (lines 7–10).
Again, we use a delegate to specify the string to order by. In this case, the semantic action has
a link to its parent block and this block links back to the operation it resides in. From there, we
can access the name of the operation and use it as the sort key.
We can use all the public properties of the semantic items in our query. Special properties have
been added to facilitate common tasks, like checking for null values, or determining the type of
an operand. Some examples of such properties are HasDestinationOperand, HasArguments,
IsSemanticField, and HasReturnType.
Since the query is actually a boolean function, other operations can be included as long as the
query returns a boolean. However, the query is evaluated for each selected type in the model
and thus a query containing complex code is not advisable, due to processing costs.
Listing 8.2 is another example and shows the ability to return a different type than searched
for.
1 ExtendedList<SemanticOperation> ops =
2 db.Query<SemanticAction,SemanticOperation>(
3 delegate(SemanticAction sa)
4 {
5 return (sa.HasDestinationOperand &&
6 sa.DestinationOperand.IsSemanticField &&
7 sa.DestinationOperand.Name.Equals("value"));
8 }
9 ).Distinct();
Listing 8.2: Find all operations using a field named value as their destination operand
In this example, we want to find all the operations with an action where the destination
operand is a field named value. Because this field is the destination operand, it is potentially1
written to.
We supply two different types to the Query function. The first one is the type we are looking
for (actions) and the second one is the type we want to have returned (operations). In the
1
The action with the destination operand might be in a conditional block and as such is not always executed.
search query, we indicate that the SemanticAction object must contain a destination operand,
that this operand must be a field, and that the name of this field must be equal to value. Keep in
mind that if we change the order of the elements, we can get null reference exceptions if we are
trying to access non-existent destination operands. The Distinct command at line 9 signals the
ExtendedList object to remove duplicate elements. Placing this command here has the same
effect as applying it directly to the ops variable.
To show the usage of the grouping operator, see Listing 8.3. This example finds all the actions
performing a jump to another block and groups these actions by the name of the operation they
are in. Lines 12 to 19 show how to use the Grouping class to display these items.
1 ExtendedList<Grouping<string, SemanticAction>> groupedBy =
2 db.Query<SemanticAction, SemanticAction>(
3 delegate(SemanticAction sa)
4 {
5 return (sa.ActionType == SemanticActionType.Jump);
6 })
7 .GroupBy<string>(delegate(SemanticAction a)
8 {
9 return a.ParentBlock.ParentOperation.Name;
10 });
11
12 foreach (Grouping<string, SemanticAction> element in groupedBy)
13 {
14 Console.WriteLine("Jumps in operation {0}", element.Key);
15 foreach (SemanticAction sa in element.Group)
16 {
17 Console.WriteLine("--Jump to {0}", sa.LabelName);
18 }
19 }
Listing 8.3: Group jump labels by operation name
The last example, Listing 8.4, shows how to find all actions assigning a value to an operand
of type integer. To determine the type, we use the SemanticType object of the operand and
indicate we want the type to be an integer (line 8). The check for the existence of a destination
operand (line 6) is not really needed, since an assignment always uses a destination operand.
However, if the Semantic Extractor did not assign a destination operand, the query can raise an
exception at runtime.
1 ExtendedList<SemanticAction> actions =
2 db.Query<SemanticAction, SemanticAction>(
3 delegate(SemanticAction sa)
4 {
5 return (sa.ActionType == SemanticActionType.Assign &&
6 sa.HasDestinationOperand &&
7 sa.DestinationOperand.SemanticType.CommonType ==
8 SemanticCommonType.Integer);
9 }
10 );
Listing 8.4: Retrieve all the assignments where an integer is used
8.2 Applications
Besides the metamodel, the extractors, and the database, two other programs were created. One
is a command line utility primarily used for testing; the other is a Windows Forms application
showing the contents of the metamodel and the control flow in a graphical way.
The console application, called SemanticExtractorConsole, accepts as its command line argu-
ments a number of assemblies, a list of plugins, and optional settings. It uses the default
Semantic Extractor Provider to analyze the .NET assemblies, stores the results in the Semantic
Database, and executes the plugins. Each plugin is called with a pointer to the database and can
perform its own analysis. This tool is primarily used for testing the complete system because
tasks can be automated using the command line switches and the plugins. Plugins are .NET
assemblies implementing a certain interface. It is also possible to use a source file as a plugin
and the console application will compile this source in-memory first. The source file must be a
valid C# or VB.NET class implementing the plugin interface.
The Windows Forms application provides a graphical user interface (GUI) to open an assembly.
Internally, it calls the default Semantic Extractor Provider and creates a graphical representation
of the model. Figure 8.1 shows a screenshot of this application. In the tree view in the up-
per left part of the window, the metamodel is displayed. By selecting an element in this tree,
detailed information is shown in the property grid in the lower left part.
Additional support for displaying the correct data and providing metadata like descriptions of
each property has been added to the Semantic Metamodel using custom attributes. If the selected
element is an operation, a control flow graph is created and displayed in the upper right part. If
the user clicks on a flow block, its contents (the blocks with actions) are listed in the actions
pane.
Figure 8.1: Windows Forms Semantic Analyzer application
8.3 Plugins
Various plugins were created to test the system and to provide specific analysis tasks. Each
plugin must implement the SemanticLibrary.Plugin.IPlugin interface as shown in Figure 8.2.
Figure 8.2: Plugin interface
The SemanticExtractorConsole calls the Execute function for each plugin with as its arguments
the SemanticDatabaseContainer, which holds all the metamodels, and a dictionary of com-
mand line options not interpreted by the console application itself. This allows the user to
supply settings for the plugin. A plugin can write to the console window, and when it encounters
an exception, it can simply raise this exception so the SemanticExtractorConsole can handle the
error and display a message. The read-only IsEnabled property allows a plugin to be disabled
when needed; as a result, it will not be executed by the SemanticExtractorConsole application.
Plugins provide a way to create separate analysis tasks and allow for the execution of multiple
plugins with one command. They are not part of the metamodel nor the Semantic Database
and are only used by the console application. The next sections give more details about the
implemented plugins.
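The plugin mechanism can be approximated in Java as follows (hypothetical names; the sketch only mirrors the contract described above: an execute call receiving the database handle and leftover options, an enabled flag, and error handling by the runner):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of the plugin contract: a database handle plus leftover options.
interface Plugin {
    boolean isEnabled();
    String execute(Object database, Map<String, String> options) throws Exception;
}

public class PluginRunnerDemo {
    // The runner skips disabled plugins and turns raised exceptions
    // into error messages, as the console application does.
    static List<String> runAll(List<Plugin> plugins, Object db,
                               Map<String, String> options) {
        List<String> log = new ArrayList<String>();
        for (Plugin p : plugins) {
            if (!p.isEnabled()) continue;
            try {
                log.add(p.execute(db, options));
            } catch (Exception e) {
                log.add("error: " + e.getMessage());
            }
        }
        return log;
    }

    public static void main(String[] args) {
        List<Plugin> plugins = new ArrayList<Plugin>();
        plugins.add(new Plugin() {
            public boolean isEnabled() { return true; }
            public String execute(Object db, Map<String, String> o) {
                return "resource usage done";
            }
        });
        plugins.add(new Plugin() {
            public boolean isEnabled() { return false; }  // never executed
            public String execute(Object db, Map<String, String> o) {
                return "skipped";
            }
        });
        System.out.println(runAll(plugins, null, new LinkedHashMap<String, String>()));
    }
}
```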
8.3.1 ReifiedMessage Extraction
This plugin provides a partial solution to the problem discussed in Section 4.1.2, the
ReifiedMessage. The use of aspects on a certain piece of code changes the behavior of this
code. Not all the behavioral changes are expected and some may even lead to conflicts [24, 72]. In
Compose* we use the FILTH module, with order specification, to compute all the possible
orders of the filter modules and select one of those orders. The SECRET module, the Semantic
Reasoning Tool, reasons about the possible semantic conflicts in the ordering of filter modules.
It does this by analyzing all the possible actions of the filters, which either accept or reject a
message. However, the meta filter introduces behavior that the SECRET module cannot handle,
because the function called by the meta filter defines the semantics of the filter. The function
executed by the meta filter has an argument containing a ReifiedMessage object, representing
the message. It can retrieve the target or the arguments, but also change the execution of the
message, for example by resuming the regular execution of the filter set or by replying, which
returns the message to the sender.
Currently, developers writing a function using a ReifiedMessage should define the behavior of
this message as a custom attribute (see Section 6.1.10). This is a time-consuming and error-prone
process, often not executed or kept up to date. The Semantic Analyzer might be helpful here.
The task of this plugin is to determine the behavior of the usage of the ReifiedMessage and
report this to the developer. He or she can then add the custom attributes to the code. Because it
is not possible to capture all the intended behavior, the custom attributes are not automatically
inserted, so they can be reviewed by the developer.
We first need to find all the operations using a ReifiedMessage object in their arguments. List-
ing 8.5 shows the query used to find these operations. The reifiedMessageType constant
(line 7) contains the full name of the type used by Compose* for the ReifiedMessage.
1 ExtendedList<SemanticOperation> ops =
2 db.Query<SemanticOperation, SemanticOperation>(
3 delegate(SemanticOperation obj)
4 {
5 foreach (SemanticArgument arg in obj.SemanticArguments)
6 {
7 if (arg.SemanticType.FullName.Contains(reifiedMessageType))
8 return true;
9 }
10 return false;
11 });
Listing 8.5: Find operations using a ReifiedMessage
The next step is to iterate through all the found operations and get the specific argument
containing the ReifiedMessage, as shown in Listing 8.6.
1 // Find the argument
2 // p is one of the SemanticOperations in ops
3
4 SemanticArgument reifiedMessageArg =
5 p.SemanticArguments.First(delegate(SemanticArgument arg)
6 {
7 return (arg.SemanticType.FullName.Contains(reifiedMessageType));
8 });
Listing 8.6: Find the argument using a ReifiedMessage
Once we have the specific argument containing the ReifiedMessage object, we have to determine
which actions in the operation use this argument. We use the query listed in Listing 8.7 to find
those actions. This time we search for all the call actions, because properties and functions of the
ReifiedMessage object are used. Of course, such a call action should have as its first argument the
ReifiedMessage operand we found earlier. Since we do not want to specify that the action should
be in the operation we are currently analyzing, we supply the operation itself (represented as
the p variable at line 20) as the second parameter to the Query function. It is then used as the
starting point for the native search visitor, creating a more time-efficient search.
1 // Find the actions which are performing an operation on the argument
2 // passing the operation p as the start point
3 IList<SemanticAction> actions =
4 db.Query<SemanticAction, SemanticAction>(
5 delegate(SemanticAction sa)
6 {
7 if (sa.ActionType == SemanticActionType.Call)
8 {
9 if (sa.HasArguments)
10 {
11 // The first argument is the object on which we are
12 // calling the function
13 // This should be the reifiedmessage
14 return (sa.Arguments[0].IsSemanticArgument &&
15 sa.Arguments[0].Equals(reifiedMessageArg));
16 }
17 }
18 return false;
19 }
20 , p); // <-- we start at the operation
21 // so we do not specify the operation name, class etc
Listing 8.7: Retrieve all the calls to methods of the ReifiedMessage argument
The collection of actions is now used to determine the behavior of the reified message us-
ing the rules specified in Staijen's work [72]. For example, when the getTarget function is
called, we add the target.read semantic to the list of semantics for this operation. A call to the
reply function introduces the returnvalue.write, target.dispose, selector.dispose, args.dispose, and
message.return semantics.
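These rules can be encoded as a simple lookup table from the called function name to the semantics it introduces; the Java sketch below covers only the two rules quoted above (the rest of Staijen's rule set would extend the table in the same way):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ReifiedMessageSemantics {
    // Rule table: method called on the ReifiedMessage -> semantics introduced.
    static final Map<String, List<String>> RULES =
            new LinkedHashMap<String, List<String>>();
    static {
        RULES.put("getTarget", Arrays.asList("target.read"));
        RULES.put("reply", Arrays.asList("returnvalue.write", "target.dispose",
                "selector.dispose", "args.dispose", "message.return"));
    }

    // Collect the semantics for all calls found in one operation.
    static List<String> semanticsFor(List<String> calledMethods) {
        List<String> semantics = new ArrayList<String>();
        for (String method : calledMethods) {
            List<String> rule = RULES.get(method);
            if (rule != null) semantics.addAll(rule);
        }
        return semantics;
    }

    public static void main(String[] args) {
        // An operation that calls getTarget and then reply.
        System.out.println(semanticsFor(Arrays.asList("getTarget", "reply")));
    }
}
```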
This temporary list is then converted to a custom attribute and shown to the developer. One of
Staijen's ideas for future work is to annotate methods invoked after a proceed call with a semantic
specification. This is implemented in this plugin as shown in Listing 8.8, which also displays
some of the possibilities of the ExtendedList object.
1 // Get all the semantic actions
2 ExtendedList<ISemanticAction> allActions = p.RetrieveAllSemanticActions();
3
4 // Skip the first items until we find the proceed call
5 allActions = allActions.SkipWhile(delegate (ISemanticAction sa)
6 {
7 return !sa.ActionId.Equals(proceedActionId);
8 });
9
10 // Filter for only call actions
11 allActions = allActions.Where(delegate (ISemanticAction sa)
12 {
13 return sa.ActionType == SemanticActionType.Call;
14 });
Listing 8.8: Retrieve other methods which should be analyzed after a proceed call
If we find a proceed call, we store the unique action id so we can retrieve all the call actions
occurring after the proceed action. The methods called by the found actions are displayed to the
developer so he or she can have a look at those methods.
Tests using the ReifiedMessage Extraction plugin on the examples accompanying Compose*
showed that the plugin retrieves the same semantics as already specified in the examples. It
did not have any problems automatically deducing the behavior of the reified message based
on the calls to the functions of this object.
However, there is still no control flow information used in this analysis. Certain call actions
may never happen or may be executed multiple times. Currently this cannot be expressed in
the custom attribute and is not further implemented. To do this in the future, we can use the
flow capabilities of the metamodel to generate a control flow with access level information.
See Section 7.2.6 for more information about control flows.
8.3.2 Resource Usage
The Resource Usage plugin creates a list containing all the operands used in an operation. Not
only the name of each operand is shown, but also whether the operand is created, read from, or
written to. The plugin uses the flow blocks generated by the control flow generator and adds the
access level to the found operands.
The result of this plugin will be in the form of the following output:
Read/write set of checkEmpty [pacman.World]:
([SemanticVariable]j.create)1
([SemanticVariable]i.create)1
([SemanticVariable]V_2.create)1
([SemanticConstant]C_0.read)1
([SemanticVariable]i.write)1
([SemanticConstant]C_1.read)*
([SemanticVariable]j.write)*
([SemanticField]screenData.read)*
([SemanticVariable]ArrayResB_8_Action0.write)*
([SemanticVariable]ArrayResB_8_Action0.read)*
([SemanticVariable]ArrayResB_10_Action0.write)*
([SemanticVariable]ArrayResB_10_Action0.read)*
([SemanticVariable]ResultOfB_11_Action0.write)*
([SemanticVariable]ResultOfB_11_Action0.read)*
([SemanticConstant]C_2.read)*
([SemanticVariable]V_2.write)*
([SemanticVariable]ResultOfB_11_Action1.read)*
The variables are created first because the IL standard enforces this rule. Constant values are
read and there are some read and write actions on the variables. The last character of each line
indicates the access level, as explained in the following list:
1 The operand is accessed at least once;
? The operand might be accessed; it could be conditional;
* A conditional operand inside a loop; it may be accessed more than once, or not at all;
+ The operand is accessed at least once and maybe more often; a loop with a conditional at the end;
0 Unreachable code; the operand is never accessed.
The plugin iterates through all the flow blocks of all the operations in the database. Each flow
block contains semantic blocks with actions. The operands of each action are collected: if an
operand is a destination operand, it is a write action; if it is a source operand, it is a read action.
The access level of the flow block is added to each retrieved operand and the list is presented to
the user. A collection of local variables is added at the top of the list, as they are created when
the operation is entered.
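The collection step can be sketched as follows (toy action and flow block records instead of the real metamodel objects; the access-level marker of the enclosing flow block is simply appended to each operand entry, as in the output above):

```java
import java.util.ArrayList;
import java.util.List;

// One action with an optional destination (write) and source operands (reads).
class Action {
    final String destination;        // may be null if nothing is written
    final List<String> sources;
    Action(String destination, List<String> sources) {
        this.destination = destination;
        this.sources = sources;
    }
}

// A flow block groups actions and carries an access-level marker ("1", "*", ...).
class FlowBlock {
    final String accessLevel;
    final List<Action> actions = new ArrayList<Action>();
    FlowBlock(String accessLevel) { this.accessLevel = accessLevel; }
}

public class ResourceUsageDemo {
    // Build the read/write set: destination operands become write entries,
    // source operands become read entries, each tagged with the block's level.
    static List<String> readWriteSet(List<FlowBlock> blocks) {
        List<String> set = new ArrayList<String>();
        for (FlowBlock b : blocks) {
            for (Action a : b.actions) {
                if (a.destination != null)
                    set.add("(" + a.destination + ".write)" + b.accessLevel);
                for (String src : a.sources)
                    set.add("(" + src + ".read)" + b.accessLevel);
            }
        }
        return set;
    }

    public static void main(String[] args) {
        FlowBlock once = new FlowBlock("1");          // executed at least once
        once.actions.add(new Action("i", new ArrayList<String>()));
        FlowBlock loop = new FlowBlock("*");          // conditional, in a loop
        List<String> srcs = new ArrayList<String>();
        srcs.add("C_1");
        loop.actions.add(new Action("j", srcs));

        List<FlowBlock> blocks = new ArrayList<FlowBlock>();
        blocks.add(once);
        blocks.add(loop);
        for (String entry : readWriteSet(blocks)) System.out.println(entry);
    }
}
```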
The read/write sets returned from the examples are similar to the operands used in the actual
source code. However, the Semantic Extractor and the compiler have introduced extra variables
for normalization or optimization purposes.
8.3.3 Export Types
Compose* uses a separate .NET application, called TypeHarvester, to retrieve the types from
the source assemblies. The harvester uses .NET reflection to load and parse the assemblies,
iterates through all the type information, and stores this information in a types.xml file. All the
information found with the reflection API is added to this file, which can become quite large.
The xml file is imported during the Compose* compilation process and stored in the repository.
A language model is created with this data to be used in the selector language.
The export types plugin performs the same action, but uses the metamodel as its source. It does
not have all the properties reflection returns, but still has extensive type information available.
The GetContainers function of the SemanticDatabaseContainer is called and returns a col-
lection of SemanticContainer objects. We can then use the collections inside the containers to
find the classes, fields, operations, and so on. This plugin does not use the search functionality
of the database; it only retrieves the top elements of the metamodel contents and uses their
properties to further export the data. The XmlTextWriter of the .NET Framework performs the
actual writing to an xml file with the same structure as used by the TypeHarvester application.
Although it does not export all the information, this plugin provides an example of how the
metamodel can be used to replace the TypeHarvester application and how the metamodel can be
browsed instead of searched.
8.3.4 Natural Language
The metamodel provides information about the behavior of the functions in the form of actions.
The natural language plugin converts the actions to a natural language representation for each
operation. An example of the output of this plugin is listed below.
Description of operation ’isPlaying’ in Jukebox.Player [Jukebox.Player]
=======================================================================
Assigning SemanticField named playing to SemanticVariable Local0
Jumping to block B_6 where we perform the following:
Exiting operation ’isPlaying’ and returning the contents of Local0
Each action and its accompanying operands are converted to a textual representation. Jumps and
branching operations are handled in such a way that the flow of the function is still visible, and
loops are only shown once.
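The conversion itself amounts to a template per action type; a minimal sketch follows (the action shapes and wording are illustrative assumptions, covering only assign, jump, and return):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NaturalLanguageDemo {
    // Render one semantic action as an English sentence using a template
    // per action type (illustrative subset of the plugin's behaviour).
    static String describe(String type, String... operands) {
        if (type.equals("assign"))
            return "Assigning " + operands[0] + " to " + operands[1];
        if (type.equals("jump"))
            return "Jumping to block " + operands[0]
                    + " where we perform the following:";
        if (type.equals("return"))
            return "Exiting operation '" + operands[0]
                    + "' and returning the contents of " + operands[1];
        return "Unknown action " + type;
    }

    public static void main(String[] args) {
        // Reproduces the shape of the isPlaying description shown above.
        List<String> lines = new ArrayList<String>(Arrays.asList(
                describe("assign", "SemanticField playing", "SemanticVariable Local0"),
                describe("jump", "B_6"),
                describe("return", "isPlaying", "Local0")));
        for (String line : lines) System.out.println(line);
    }
}
```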
Although the practical use of this plugin is limited, it can serve as a basis for a conversion of
actions to a more formal and standard representation, for instance an xml representation or a
software specification standard. Automatically checking requirements against the actual
implemented code is a more practical application.
8.4 Integration with Compose*
The Semantic Analyzer designed for this assignment is a general purpose analyzer and can be
used for different kinds of tasks. However, one of the reasons to create this analyzer was to
provide more information to be used in Compose*. As explained in the design limitations
(Section 7.1.2), we only have access to a .NET assembly (with method implementations) at the
end of the Compose* compilation process. At that point, all the analysis tasks have already been
executed and semantical join points can no longer be selected. This makes it difficult to integrate
the analyzer in the compilation process.
Another problem is the mismatch between the .NET Framework versions of Compose*/.NET
and the Semantic Analyzer. The first runs under version 1.1 and the second uses version 2.0,
because it depends on Microsoft Phoenix, which needs the latest version of the runtime. Because
of the dependency on Phoenix, which has rather large libraries (10 MB) and a restrictive license,
we cannot make the analyzer part of the standard Compose*/.NET install.
Based on these limitations, we can formulate the following requirements for the integration:
1. The Semantic Analyzer should be optional. If it is not installed, then Compose* must still
work correctly;
2. The retrieved information must be placed in the internal database, called the repository,
so the components of Compose* can access the data from one central store;
3. The analyzer must extract the semantics of the usage of the ReifiedMessage object and
supply this information in such a way that the SECRET module can work with it;
4. Information about the resource usage of fields and arguments should be extracted from
the sources and placed in the repository;
5. Per function, retrieve the calls made to other functions and store them in the repository.
To be able to satisfy these requirements, we had to make some changes to the compilation order
of the modules in Compose*/.NET. We first generate a weave file specification and call
ILICIT [10] to weave the assemblies. At that point we can start the two SEMTEX modules.
The DotNETSemTexRunner runs the SemanticExtractorConsole application (see Section 8.2) with
a special Composestar plugin. This plugin creates an XML file with the analyzed data. The second
module, DotNETSemTexCollector, loads the XML file and places the data in the Compose*
datastore. The next module to be executed is the SECRET module, which can now access the
semantical data. Finally, we create a runtime repository and copy the assemblies to the output
folder.
If the DotNETSemTexRunner cannot find the required files, it simply skips the creation of the
XML file, and an informational message is shown to the developer indicating why this module
was not executed and how to remedy this. When the next module, DotNETSemTexCollector, cannot
find the XML file, it continues to the following module. This takes care of point one of the
requirements.
The DotNETSemTexRunner uses the SemanticExtractorConsole application to analyze the assemblies
and execute a special plugin. This plugin contains elements of the various separate plugins
described in Section 8.3. It determines the usage of the ReifiedMessage, the read, write, and
create actions of fields and arguments in the functions of the assemblies, and the calls to other
functions. The techniques to retrieve this type of data are described in the Plugins section. An
example of the resulting XML file is listed in Appendix F. The plugin satisfies requirements
three, four, and five.
To add the information to the datastore of Compose*, we extended the existing MethodInfo
class to hold additional collections of CallToOtherMethod objects, ResourceUsage objects
and a list of strings containing the ReifiedMessage usage. The MethodInfo class is automatically
stored in the datastore and can be used by the other components.
The DotNETSemTexCollector class parses the generated semtex.xml file, creates the necessary
objects such as CallToOtherMethod or ResourceUsage, assigns the values from the XML file to
the properties of these objects, and adds them to the corresponding MethodInfo object already
in the datastore.
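The collector step can be sketched as follows. This is a hedged illustration: the element and attribute names of the XML file are assumptions made for this sketch, since the real schema is the one listed in Appendix F, and the classes are simplified stand-ins for their Compose* counterparts.

```python
# Illustrative sketch of the collector step: parse a semtex-style XML file and
# attach the results to a MethodInfo-like object. Element and attribute names
# are assumptions; the real schema is listed in Appendix F.
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

@dataclass
class CallToOtherMethod:
    target: str

@dataclass
class ResourceUsage:
    name: str
    access: str  # "read", "write", or "create"

@dataclass
class MethodInfo:
    name: str
    calls: list = field(default_factory=list)
    resources: list = field(default_factory=list)
    reified_message_usage: list = field(default_factory=list)

def collect(xml_text):
    """Build MethodInfo objects from the analyzed data, keyed by method name."""
    methods = {}
    for m in ET.fromstring(xml_text).findall("method"):
        info = MethodInfo(m.get("name"))
        for c in m.findall("call"):
            info.calls.append(CallToOtherMethod(c.get("target")))
        for r in m.findall("resource"):
            info.resources.append(ResourceUsage(r.get("name"), r.get("access")))
        for u in m.findall("reified"):
            info.reified_message_usage.append(u.get("semantic"))
        methods[info.name] = info
    return methods

doc = """<semtex>
  <method name="isPlaying">
    <resource name="playing" access="read"/>
    <call target="Player.stop"/>
    <reified semantic="message.read"/>
  </method>
</semtex>"""
methods = collect(doc)
```

In the real module the resulting objects are merged into MethodInfo instances already present in the datastore rather than created fresh.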
To be able to use the automatically extracted semantics of the ReifiedMessage in the SECRET
module, we changed the MetaFilterAction class to use the found semantics stored in the
MethodInfo object. It still uses the semantics defined by the developer first, before importing
the automatically derived semantics.
CHAPTER 9
Conclusion, Related, and Future Work
In this final chapter, the results of the research and implementation of the Semantic Analyzer are
evaluated and conclusions are drawn. First, the related work is discussed and we end with a
look at future work.
9.1 Related Work
Automatically analyzing source code is certainly not a new field in computer science; there
are various applications which use the resulting data, for example finding design patterns in
the source code, reverse engineering design documentation [69], generating pre- and postconditions
[53], verifying software contracts [9], or checking behavioral subtyping [30]. Some analyzers
are relatively simple: they might only count programming elements like the number of
lines, methods, or classes, or determine the complexity of functions. Other analyzers, like the
Semantic Analyzer, convert the source code to a higher level view, a metamodel, to reason about
behavior.
This section discusses a number of analyzers, which are basically all static code analyzers work-
ing with the semantics of the code.
9.1.1 Microsoft Spec#
Spec# (Spec sharp; http://research.microsoft.com/specsharp/) is a Microsoft Research project
attempting to provide a more cost-efficient way to develop and maintain high-quality
software [6]. Spec# extends the C# programming language with specification constructs like
pre- and postconditions, non-null types, checked exceptions, and higher-level data abstraction.
The Spec# compiler statically enforces non-null
types, emits run-time checks for method contracts and invariants, and records the contracts as
metadata for consumption by other tools in the process. Another application, the Spec# static
program verifier, generates logical verification conditions from a Spec# program and analyzes
the verification conditions to prove the correctness of the program or find errors in the code.
The language enhancements of Spec# are in the form of annotations the developer can add
to the existing code. Of course, this is not possible for code in third party libraries, such as
the .NET Framework Class Library. Microsoft is working on a project for semi-automatically
generating contracts for existing code.
Spec# is a tool for correctness and verification checks. Although a part of Spec# uses runtime
checking (using inlined code for pre- and postconditions), the static checking is not much dif-
ferent from the Semantic Analyzer. The static program verifier constructs a metadata view of
the code in its own intermediate language, called BoogiePL. It consists of basic blocks with
four kinds of statements: assignments, asserts, assumes, and function calls. An extensive
analysis system processes the IL code and extracts additional properties, which are added to the
program in the form of assert and assume statements. An automatic theorem prover is then
used to verify the conditions in the program.
The verification systems of Spec# are very advanced, although the metamodel is not very differ-
ent from the model used in the Semantic Analyzer. Our metamodel also contains basic building
blocks with assignments and calls, but we lack the assume and assert statements.
9.1.2 SOUL
A project similar to the Semantic Analyzer is the SOUL logic metaprogramming system [28].
This system is designed to reason on a metalevel about the structure of object-oriented source
code in a language-independent way using logic metaprogramming [21]. By using logic rules
it is possible to reason about the structure of object-oriented programs. For the SOUL system,
the languages Smalltalk and Java are used.
To reason about Smalltalk code, the SOUL system was built. It consists of a Prolog-like language
and an associated logic inference engine [86]. Logic facts and rules can be used to
query Smalltalk programs. The mapping between the logic meta language and the object-
oriented source language is handled by a metalevel interface (MLI) implemented as a hierarchy
of classes. The reflection capabilities of Smalltalk are used to build the MLI and method body
statements are converted to functors1 and can be queried for. A logic repository contains logic
predicates to use with the reasoning engine.
An extension to SOUL for the Java platform is called SOULJava2, which contains its own parser
to convert Java code to the repository and adds new methods to the MLI, for instance to query
for interfaces.
Some applications of SOUL are detecting idioms like double dispatch or getters, or finding
design patterns such as the visitor pattern. Using the logic rules, the MLI can be searched for certain
constructions. For example, the rule for detecting getting methods is listed in Listing 9.1.
1 gettingMethod(?class,?method,?varname) if
2 method(?class,?method),
3 methodSelector(?method,?gettingname),
4 instVar(?class,?varname),
5 gettingMethodName(?varname,?gettingname),
6 varName(?var,?varname),
7 methodStatements(?method,<return(?var)>)
Listing 9.1: Selecting getters in SOUL using Prolog
1 Functors are objects that model operations that can be performed. In their simplest form they are somewhat like function pointers.
2 Evolved to Irish; http://prog.vub.ac.be/˜jfabry/irish/index.html
Idiom rules, like the methodSelector and gettingMethodName, are used to provide a mapping
between the source language and the items in the model to cope with the language specific
differences between Smalltalk and Java.
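For comparison, the getter-detection idea of Listing 9.1 can be sketched imperatively over a toy class model; the model layout and helper names below are illustrative stand-ins for SOUL's actual MLI, not its API.

```python
# A rough imperative analogue of Listing 9.1: find "getting methods" whose
# body is a single return of an instance variable. The toy model stands in
# for SOUL's metalevel interface; all names are illustrative.

classes = {
    "Player": {
        "inst_vars": ["playing"],
        "methods": {
            "isPlaying": [("return", "playing")],
            "stop": [("assign", "playing", "false"), ("return", None)],
        },
    },
}

def getting_method_name(varname, selector):
    # Idiom rule: a Smalltalk-style selector equal to the variable name, or a
    # Java-style "getX"/"isX" accessor, both count as getter names.
    return (selector == varname
            or selector == "get" + varname.capitalize()
            or selector == "is" + varname.capitalize())

def getting_methods(model):
    for cls, desc in model.items():
        for selector, body in desc["methods"].items():
            for var in desc["inst_vars"]:
                if getting_method_name(var, selector) and body == [("return", var)]:
                    yield (cls, selector, var)

print(list(getting_methods(classes)))
```

The `getting_method_name` helper plays the role of the idiom rules: it absorbs the language-specific naming differences so the core rule stays the same.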
SOUL and the Semantic Analyzer share some properties, like creating a higher level metamodel,
being language independent, and providing a search mechanism. The implementation is how-
ever different. The Semantic Analyzer has a language independent metamodel and the Semantic
Extractors are responsible for the correct conversion, whereas SOUL uses idioms to deal with
different language constructs. Furthermore, SOUL relies on the Prolog predicates for searching
in the MLI. Our analyzer has a similar search system, but based on native queries. The ma-
jor difference is the conversion of statements to actions, the behavioral representation, in the
Semantic Analyzer. SOUL is more a syntactical than a semantical analyzer.
9.1.3 SetPoints
SetPoints is a system designed to evolve structure-based pointcuts into semantical
pointcuts [3]. The SetPoints project identifies the same problems as described in the
motivation chapter (Section 4.1.1): join points should be applied based on the behavior of the
code and not on naming conventions.
The SetPoint Framework for .NET tries to solve this problem by linking program semantics,
or views, to source code through metadata elements, such as custom attributes. The pointcut
definitions are based on these annotations. SetPoint uses OWL1 and RDF2 to represent the
ontologies.
Developers have to specify the relations between base code and the views of the system using
custom attributes. The SetPoints developers are working on a version where Microsoft Phoenix
is used to find program annotations.
SetPoints differs from the Semantic Analyzer in a number of ways. SetPoints is primarily de-
signed to define semantical join points and performs the actual weaving of aspects, whereas
the Semantic Analyzer only provides a system to retrieve semantics. The latter is a more general-purpose
semantics extractor and could eventually serve as an automated tool to retrieve the
views for SetPoints. Selecting the correct join points is handled in SetPoint by RDF bindings.
The Semantic Analyzer uses native queries to find information in the model and could be used
to add semantical information to the selector language of the Compose* project.
1 Web Ontology Language; http://www.w3.org/TR/owl-features/
2 Resource Description Framework; http://www.w3.org/RDF/
9.1.4 NDepend
NDepend (http://www.ndepend.com/) is a static analysis tool to generate reports, diagrams,
and warnings about .NET assemblies. Internally it employs Cecil to parse IL byte code and create a representation of the
code. Users can query the model with the Code Query Language (CQL), based on the SQL
syntax. A separate tool, called VisualNDepend, offers a graphical user interface to edit CQL
and shows the results using diagrams.
Information about the inner workings of NDepend could not be found, as it is not an open
source project, but the Cecil library is used to analyze the assemblies and the CQL (defined in
the standard at http://www.ndepend.com/CQL.htm) operates on the results to generate metrics.
Some examples of CQL queries are listed in Listing 9.2.
1 WARN IF Count > 0 IN SELECT METHODS WHERE NbILInstructions > 200
2 ORDER BY NbILInstructions DESC
3 -- Warn if a method has more than 200 IL instructions
4 SELECT METHODS WHERE ILCyclomaticComplexity > 40
5 ORDER BY ILCyclomaticComplexity DESC
6 -- Return methods where the CC is greater than 40
7 SELECT TYPES WHERE DepthOfInheritance > 6
8 ORDER BY DepthOfInheritance DESC
9 -- Select the types with an inheritance level > 6
10 SELECT TYPES WHERE Implement "System.Web.UI.IDataSource"
11 -- Select the types implementing the IDataSource interface
12 SELECT TOP 10 METHODS WHERE IsPropertyGetter OR IsPropertySetter
13 ORDER BY NbILInstructions DESC
14 -- Return a top 10 of property getters or setters ordered by the number of instructions
Listing 9.2: Examples of CQL queries
NDepend offers a good framework for analyzing and querying assemblies. The CQL provides
almost the same functionality as the native queries used in the Semantic Analyzer and the ex-
amples in Listing 9.2 could also be rewritten to native queries to be used in our analyzer. The
instruction level capabilities in NDepend are limited. There are some parameters available to
use in search queries, such as finding property setters or IL complexity, but most of the search
operations focus on metrics and are strongly structure-oriented. The Semantic Analyzer goes
further and allows searching for actions, the behavior of the code.
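To illustrate the correspondence, the first CQL example of Listing 9.2 might be rewritten as a native-query-style predicate; the Method record and its field names below are hypothetical stand-ins for the analyzer's metamodel elements.

```python
# Sketch: the first CQL example of Listing 9.2 rewritten in the style of a
# native query, i.e. a plain predicate evaluated against model objects.
# The Method record is an illustrative stand-in for the metamodel elements.
from dataclasses import dataclass

@dataclass
class Method:
    name: str
    il_instruction_count: int

model = [
    Method("Parse", 412),
    Method("isPlaying", 7),
    Method("Render", 228),
]

# SELECT METHODS WHERE NbILInstructions > 200 ORDER BY NbILInstructions DESC
hits = sorted((m for m in model if m.il_instruction_count > 200),
              key=lambda m: m.il_instruction_count, reverse=True)
print([m.name for m in hits])  # ['Parse', 'Render']
```

The predicate is ordinary code rather than an embedded query string, which is exactly the appeal of native queries: the compiler checks the query for you.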
9.1.5 Formal Semantics
Besides the static analyzers described above, there is also related work to be found if we con-
sider the semantics itself.
There are three major approaches to semantics [61, 49]:
Operational semantics
The computation of a construction specifies the meaning. How the effect of an operation
is produced is important.
Denotational semantics
Mathematical objects are used to represent the effect of executing the constructs. The
effect is important, not how it is obtained.
Axiomatic semantics
Assertions are used to express certain properties of a construct.
One way to formalize the semantics of constructs using an operational explanation is Structural
Operational Semantics, also known as small-step semantics. Small-step semantics describes
the individual steps of a computation. There is also big-step semantics, also known as
natural semantics, which describes the overall result of an execution in a single step.
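To make the contrast concrete, a small-step treatment of assignment and sequencing might look as follows. This is an illustrative sketch in the usual style of structural operational semantics, not a rule set taken from [61]; $\sigma$ denotes the state and $\sigma[x \mapsto v]$ the updated state.

```latex
% Small-step rules for assignment and sequencing (illustrative).
\begin{align*}
&\frac{\langle a, \sigma\rangle \to \langle a', \sigma\rangle}
      {\langle x := a, \sigma\rangle \to \langle x := a', \sigma\rangle}
\qquad
\frac{}{\langle x := v, \sigma\rangle \to \sigma[x \mapsto v]}\\[1ex]
&\frac{\langle S_1, \sigma\rangle \to \langle S_1', \sigma'\rangle}
      {\langle S_1; S_2, \sigma\rangle \to \langle S_1'; S_2, \sigma'\rangle}
\end{align*}
```

A big-step judgment would instead relate $\langle x := a, \sigma\rangle$ directly to the final state in one derivation.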
With denotational semantics, mathematical functions are used and we can calculate the
effects of execution from a certain state using these functions. We do not look at the execution
itself, but only at a mathematical abstraction of the program. Although it is relatively simple to
reason with mathematical objects, converting a program to mathematics is not.
The axiomatic semantics approach is often used for proving the correctness of a program. For
instance, checking pre- and postconditions.
The Semantic Analyzer uses a form of operational semantics. Like the Structural Operational
Semantics, the emphasis is on the individual steps of the execution. However, our semantic
model is more a means to allow other tools to reason about software than a formal specification
of all the semantics in a source file. The control flow graph capabilities of the model and the
availability of operand data are an important part in specifying the semantics.
A related system, using both small-step and big-step semantics, is presented in the paper “A
formal executable semantics for Java” [5]. A system of 400 inference rules is used to describe the
operational semantics of Java using the Typol logical framework. A syntactically correct Java
program, an abstract syntax tree, is converted to a semantical representation and a graphical
environment shows this representation.
9.2 Evaluation and Conclusion
There were multiple reasons to create the Semantic Analyzer. Besides serving as a general-purpose
static analyzer, three main issues were identified for the Compose* project: the need for semantic
join points, program analysis, and fine-grained join points.
After determining the meaning of semantics and how different kinds of semantic constructs
are represented in the target language IL, it was possible to design the Semantic Analyzer. Basically,
this system can be divided into three parts: extractors, the metamodel, and the search
mechanism.
The semantics used in the metamodel are based on the execution of their corresponding source
code constructions. We distinguish the basic elements used by programming languages, like
mathematical functions, control flow constructs, conversions, assignments, testing, and so on.
However, there is no formal specification of this metamodel and there is no evidence that the
model is correct and complete. The model is one of the elements in the whole Semantic Analyzer
system. We use it to store the retrieved semantics and search for specific behavior. The
abilities added to the metamodel, such as the control flow graphs and operand information, help us
in determining the behavior of code. The emphasis of the model was more on usability for
developers than on providing a complete formal model for semantics.
The Semantic Analyzer uses a static analyzer instead of a dynamic one. As discussed before,
this has its advantages. For one, the source code does not have to be executed and we can
use the Common Intermediate Language to cover a wide range of higher level .NET program-
ming languages. Another advantage is the ability to compute all the possible paths in the code.
However, we try to use static analysis to obtain runtime behavior. Certain information, such
as the actually executed functions, the control flow paths taken, and the real values of the variables,
is only known at runtime and can only be determined using dynamic analysis. At runtime
it is possible to tell what the actual behavior of a function is. The behavior can only be guar-
anteed for that specific execution run, because the user input can be different the next time.
Static analysis was selected, because this could give all the possible occurring actions. If we
combine the static analysis with dynamic analysis we might get additional information about
polymorphism, inheritance, and dynamic binding, but at a higher cost. Dynamic analysis is,
compared with static analysis, difficult to do because the application must be executed and all
the paths must be visited to get a complete overview of the code. This is impractical and
very time consuming.
To perform the actual code analysis, we use Semantic Extractors. The Provider design pattern
allows us to select a specific provider, and four different kinds of providers were created. Each
provider reads the source code, parses it, builds up a metamodel, and converts the statements
to actions. How this is implemented differs per provider. Four different types of IL readers
were discussed in Section 6.2 and for each type a provider has been created. Parsing IL byte
code is a complex operation because of the amount of metadata available. The four different
IL readers can deal with this complexity and offer a form of code object model containing all
the elements of the source code. At the time of implementation, the available IL readers were
limited and error prone. The only one really capable of retrieving all the information was Microsoft
Phoenix. Although very advanced in its capabilities, it was difficult to use: the documentation
was scarce, the samples were limited, and the system is still undergoing changes with every
new release. The PostSharp system has evolved into a much more usable IL reader, and it would
be interesting to see if this reader can be used to obtain the semantics. A problem with
the Phoenix implementation was the extraction of custom attributes. This was not yet implemented
in Phoenix, and a workaround involving the default reflection capabilities of .NET had to be
used. This is a relatively slow operation, and better built-in support for attributes in Phoenix is
advisable.
Converting statements into actions is, even with Phoenix, still a difficult process. We raise the
actual code back to a higher level representation, the metamodel. This means we are losing
some information as we are combining statements into a more general action. It is up to the
applications of the metamodel to use the data to deduce semantics from it. The model lacks
actions such as assume and assert, which are present in Spec#. One can argue that the metamodel
is not really a semantical model, but merely a simplified code representation. However, the
available actions in the model represent the behavior of a function in the source code. As
such, we can reason about the intended semantics of the function using the model. So the
metamodel alone is not enough; we need applications operating on the model to do the real
semantic analysis. The metamodel is only a general-purpose collection of semantically related
actions combined with control flow and operand information.
The metamodel contains enough information to reason about code and the possible execution
of code. We have information about the control flow and we can retrieve the data flow. Because
of the normalization of the statements we can use the extensive operand information to track
the usage of certain data. For example, we can follow an argument in a function and see the
possible changes made to that operand. The model also contains flow graph capabilities. A de-
cision was made to include this type of functionality directly inside the model instead of using
plugins. The flow graphing capabilities are frequently used operations. They use the blocks
and actions to generate the flow blocks and need direct access to the model. The flow graph
functions operate on parts of the model and as such should also be present in the model. They
extend the model's capabilities and are placed inside their own namespace in the metamodel
library.
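The operand-tracking idea mentioned above can be sketched as follows; the action tuples are hypothetical stand-ins for the metamodel's Action and Operand classes.

```python
# Hedged sketch of following an operand through a function's actions:
# collect every action that may modify a given operand. The action encoding
# (kind, destination, source) is illustrative, not the real metamodel.

def changes_to(actions, operand):
    """Return (index, kind, source) for each action writing to `operand`."""
    writes = []
    for i, (kind, dst, src) in enumerate(actions):
        if kind in ("assign", "create") and dst == operand:
            writes.append((i, kind, src))
    return writes

actions = [
    ("assign", "Local0", "Argument0"),
    ("assign", "Argument0", "Local0 + 1"),
    ("call", None, "Player.stop"),
    ("assign", "Argument0", "0"),
]
print(changes_to(actions, "Argument0"))
# [(1, 'assign', 'Local0 + 1'), (3, 'assign', '0')]
```

A full data-flow analysis would additionally follow the operand across basic blocks using the control flow graph, which is exactly what the normalized operand information makes feasible.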
Besides the actions, the model also contains a complete hierarchical structure representing the
original code structure. The main purpose of this tree-like structure is to place the actions in
context. Actions belong to a function, a function resides inside a class, and the class is inside
a container. We need this type of information to map elements to source code, but it does not
provide a direct semantical purpose. The question arises whether it is possible to place the function
elements in another form of behavior describing the class. Strictly speaking, a class contains
related functions with certain behavior. The combined behavior can be used as the parent of
a single function. For example, a Car class can have the combined behavior of the functions
Brake, Accelerate, Go left, and Go right. Finding and representing this behavior, however, is a
difficult problem. Not all the functions are implemented in this class, or they could be inherited,
overridden, or in a base class. Creating a notation for the semantics of a class requires extended
knowledge about the intended behaviors of all the functions, how they operate together, and
what type of added functionality they provide to the class.
To search the model, a number of different search techniques were discussed. Because of the
capabilities of native queries, this search system was selected. Although searching in this manner
is not new (Smalltalk has a similar system), it is currently gaining a lot of popularity in
the .NET programming world, mostly because Microsoft is developing LINQ, which is built
on the underlying technique of native queries. The capabilities of the search function of the
Semantic Database are very similar to those of LINQ; the actual implementation is not.
delegates to check each element in the database for a match. This is not an efficient method
and it would be better to create some sort of an expression tree based on the query function
and use indexes in finding the information. This entails the need to parse the native query and
convert the statements to an expression tree. This solution was not chosen, because of the extra
work involved. To optimize the database in the future, we could switch to LINQ when it is
released or make use of an object-oriented database like DB4O (http://www.db4o.com/), which
contains an extensive native query framework.
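The trade-off described above can be sketched as follows: a native query as a delegate that is invoked for every element (our approach), versus an index lookup that avoids the full scan. The toy database and field names are illustrative.

```python
# Sketch: delegate-per-element native query versus an index-assisted lookup.
# The SemanticDatabase below is a toy, not the analyzer's real database.

class SemanticDatabase:
    def __init__(self, elements):
        self.elements = elements
        # A precomputed index, here on an action's kind.
        self.by_kind = {}
        for e in elements:
            self.by_kind.setdefault(e["kind"], []).append(e)

    def search(self, predicate):
        # Delegate-style native query: O(n), predicate called per element.
        return [e for e in self.elements if predicate(e)]

    def search_kind(self, kind, predicate):
        # Index-assisted variant: only candidates of one kind are tested.
        return [e for e in self.by_kind.get(kind, []) if predicate(e)]

db = SemanticDatabase([
    {"kind": "assign", "target": "playing"},
    {"kind": "call", "target": "Player.stop"},
    {"kind": "assign", "target": "volume"},
])
full = db.search(lambda e: e["kind"] == "assign")
fast = db.search_kind("assign", lambda e: True)
assert full == fast  # same result, fewer predicate invocations
```

Parsing the query into an expression tree, as LINQ does, is what would let the database decide automatically when an index such as `by_kind` can be used.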
A number of plugins were created to test the capabilities of the analyzer and to perform some
of the tasks defined in the motivation chapter. One of the reasons to create the analyzer was to
determine the behavior of the ReifiedMessage object. The plugin is now capable of reasoning
about the usage and generates the correct semantics to be used by the SECRET module. Currently
it does not take into account whether the behavior of the ReifiedMessage is conditional or
how the control flow is organized. It is not that difficult to gather this type of information, but
it is at the moment not possible to represent control flow data in the format used by SECRET.
Another plugin provides information about the resource usage and depends heavily on the
flow graph capabilities of the metamodel. Test runs with source code containing a large num-
ber of control flow statements showed that the control flow path algorithm had some serious
problems during the execution. The algorithms took a long time when the source code
contained multiple control flow paths (more than 50, so these were certainly not optimized methods
to start with). It would be wise to invest in stronger and more efficient algorithms for control
flow analysis. In addition, the flow graph capabilities can be extended by adding algorithms
for data flow and method call flow. These frequently used types of flow analysis can then be
accessed directly from the applications without implementing their own algorithm.
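The path explosion behind these running times is easy to reproduce: a chain of n independent if/else constructs already yields 2^n acyclic paths, so "more than 50 paths" is reached at n = 6. The sketch below uses an illustrative graph encoding, not the metamodel's flow blocks.

```python
# Minimal sketch of control-flow path explosion: enumerate all acyclic paths
# through a chain of n if/else "diamonds". The graph encoding is illustrative.

def paths(graph, node, exit_node):
    """All paths from node to exit_node in an acyclic successor map."""
    if node == exit_node:
        return [[node]]
    return [[node] + rest
            for succ in graph[node]
            for rest in paths(graph, succ, exit_node)]

def diamond_chain(n):
    # entry -> (then_i | else_i) -> join_i -> ... -> exit
    g, prev = {}, "entry"
    for i in range(n):
        t, e, j = f"t{i}", f"e{i}", f"j{i}"
        g[prev] = [t, e]
        g[t] = g[e] = [j]
        prev = j
    g[prev] = ["exit"]
    return g

for n in (1, 2, 6):
    print(n, len(paths(diamond_chain(n), "entry", "exit")))
```

Stronger algorithms avoid enumerating paths explicitly, for instance by computing per-block summaries over the flow graph instead.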
By combining the plugins, it was possible to integrate the analyzer with Compose*/.NET. Because
of the dependencies on the Phoenix libraries and the .NET Framework 2.0, it was not
feasible to package the analyzer directly with the default Compose* installation, and it is now
an optional part. The information found by the plugin is added to the Compose* repository,
but the actual usage of this data is still limited. The SECRET module can now take the extra
found semantics of the ReifiedMessage into account, but this is not extensively tested. In theory,
the added semantical information should partly take care of one of the problems discussed in
the motivation, namely the need for program analysis data. There is now more information
available about the usage of variables, the methods being called, and the ReifiedMessage behavior,
which can be used to reason about potential conflicts and problems. However, it must be
noted that there is currently no module in Compose*/.NET which uses this data for further
analysis. The information is available for future use.
Another problem identified in the motivation chapter was the problem of semantic join points;
the need to apply aspects based on semantical information instead of naming conventions or
structural properties. The Semantic Analyzer alone does not solve this, but can certainly be used
to help with this problem. The primary reason this is not yet implemented is the compilation
order used by Compose*/.NET. We only have access to a compiled .NET assembly with method
bodies at the end of the process. At that time it is too late to perform the analysis and calculate
the join points based on semantical properties. If we really want to use the Semantic Analyzer
we will have to change the compilation process and perform multiple phases so we have direct
access to the assemblies, extract the semantics, add those semantics to the selector language,
and perform the actual weaving.
Although not yet implemented at this time, the Semantic Analyzer can certainly be used for
semantical join point determination. Semantical join points provide a better alternative to using
syntactical selection criteria like naming conventions, an opinion shared by others. Gybels and
Brichau [35] argue that we should write a crosscut as a specification in the form of patterns as
close to the intent of the crosscut as possible to make the crosscut robust to program evolutions.
Tourwe [78] proposes the need for more sophisticated and expressive crosscut languages. This
can partially solve the fragile pointcut problem. Some possible solutions have been presented,
for example SetPoint [3].
The problem with the semantic join points also applies to the fine grained join points, applying
aspects at statement level. The compilation structure does not allow us to perform fine grained
join point weaving, but this can be added in the future. While the statements are converted to
actions, we still save information to map the action back to the source code in the form of line
numbers and file names. The weaver can use this information to add additional code around
the original statements based on certain actions.
Automatically deriving semantical properties by analyzing the source code and the semantics
of the source language can certainly not solve all the problems. Some intent is not present in the
code but in the underlying system and getting a clear behavioral description of a function is not
possible. A call to a method in another library does not give us any semantical information if we
do not have the code of this function for further analysis. However, the Semantic Analyzer offers
developers an extensive metamodel with basic behavioral actions, control flow information,
and operand data to reason about the possible intended behavior of the code.
9.3 Future Work
As with almost any research and software product, there is always room for more work. This
section offers some possible suggestions for future work.
9.3.1 Extractors
Currently there are four different Semantic Extractors, each with their own IL reader. Only the
Phoenix extractor is working properly, although more testing is certainly advisable. There are
also some improvements to be gained by using more Phoenix functionality like the control flow
capabilities instead of creating our own algorithms.
Using other IL readers, such as the updated PostSharp library or the .NET reflection capabilities
in .NET version 2.0, it should be possible to create well-working extractors for .NET
assemblies. More interesting is developing an extractor capable of converting other object-oriented
languages to the semantic metamodel. For instance, analyzing Java source or byte code, or Bor-
land Delphi (Object Pascal). The metamodel should be language independent and the extractor
can map the language specific parts to the corresponding elements in the model.
9.3.2 Model
The Semantic Metamodel is based on programming language constructs found in most lan-
guages. However, no formal specification for the metamodel is available, and investing
more time in this area could make the model more complete and concise. The model was largely
designed with usability and flexibility in mind; its cooperation with the native query search
mechanism is extensive, and the added functionality, metadata, and comments make the model
intuitive to use.
Creating a more formal semantic model, using techniques described in [61] or [49], can lead to
a more concise metamodel that is better suited for reasoning than the one currently available.
9.3.3 Graphs
Currently, only control flow graphs are available in the model. The algorithms used for this can
certainly be optimized, but other graph capabilities can also be added: for instance, data flow
graphs, to make it possible to track the flow of an operand through the model, or call graphs, to
represent the calling relationships among operations.
The metamodel should contain enough information to create these kinds of graphs.
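As a minimal sketch of the call graph suggestion, the fragment below derives a caller-to-callees graph from a list of call actions; the operation names and the input representation are invented for illustration.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch: building a call graph (caller -> set of callees) from the
// call actions that a semantic model could already contain.
public class CallGraphSketch {
    static Map<String, Set<String>> build(List<String[]> callActions) {
        Map<String, Set<String>> graph = new LinkedHashMap<>();
        for (String[] call : callActions) {
            // call[0] is the calling operation, call[1] the called one.
            graph.computeIfAbsent(call[0], k -> new LinkedHashSet<>()).add(call[1]);
        }
        return graph;
    }

    public static void main(String[] args) {
        List<String[]> calls = List.of(
            new String[] {"Main.Run", "Parser.Parse"},
            new String[] {"Parser.Parse", "Lexer.Next"},
            new String[] {"Main.Run", "Printer.Print"});
        System.out.println(build(calls));
    }
}
```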
9.3.4 Applications
Only a small number of plugins currently make use of the Semantic Analyzer. It would be very
interesting to develop more applications for the metamodel, from simple structural applications,
such as calculating metrics, to more advanced behavioral tools: for instance, the detection of design
patterns, the automatic generation of pre- and postconditions based on the contents of functions,
or checking for security, performance, and other problems.
Another interesting application is to see if the model is really language independent by con-
verting a program written in one language to a program written in another language using the
metamodel. For example, using a .NET program as the input source and applying a plugin to
generate a Java application with the same behavior.
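A metrics plugin of the kind suggested above could be as simple as the following sketch, which approximates McCabe's cyclomatic complexity as the number of branching actions plus one. The action names are invented; a real plugin would traverse the Semantic Analyzer's metamodel instead of a list of strings.

```java
import java.util.List;

// Hypothetical sketch of a metrics plugin: cyclomatic complexity
// approximated as (number of branching actions) + 1.
public class ComplexityMetric {
    static int cyclomaticComplexity(List<String> actions) {
        int branches = 0;
        for (String action : actions) {
            // Each conditional branch or switch adds one decision point.
            if (action.equals("BranchAction") || action.equals("SwitchAction")) {
                branches++;
            }
        }
        return branches + 1;
    }

    public static void main(String[] args) {
        // A method body with two conditional branches: complexity 3.
        List<String> actions = List.of(
            "AssignmentAction", "BranchAction", "CallAction",
            "BranchAction", "ReturnAction");
        System.out.println(cyclomaticComplexity(actions)); // prints 3
    }
}
```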
9.3.5 Integration
The integration with Compose* .NET is currently limited. The analyzer is used to extract some
basic elements from the source assemblies, but the resulting information is not applied
in any Compose* analysis task other than SECRET. Part of the reason is the compilation
process of Compose*, which makes it difficult to obtain a complete assembly at the start of the analysis
modules; the use of two different programming languages is another. It is not possible to use the
rich metamodel and search functions to get the exact data needed; instead, different
subsystems have to be used to retrieve the data.
The analyzer is written in the .NET language C#, while the Compose* modules are written in Java.
Communication between the programs is handled through XML files. It is not
possible to directly call the Semantic Analyzer and work with the metamodel and native query
search functions from within the Compose* process.
One solution to this problem is to port the analyzer to Java. The difficult part is the creation
of good .NET parsers. The native query functionality can be ported to Java, but needs a recent
Java version supporting generics and anonymous classes to work correctly. Another solution
is to create more extensive plugins to gather information from the source code. For performance
reasons, it might be wise to store the resulting data in a format other than XML, for instance in an
object-oriented database such as db4o or ObjectStore1.
1
http://www.objectstore.com/datasheet/index.ssp
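A native query, as used by db4o, expresses a selection as ordinary typed code in an anonymous class with a match method, which is why generics and anonymous classes matter for a Java port. The sketch below mimics that shape over an in-memory list, so it runs without the database library; the Method class and query helper are invented stand-ins.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for db4o's Predicate<T>: a typed match() method instead of
// a query string.
abstract class Predicate<T> {
    public abstract boolean match(T candidate);
}

// Invented model element used as query target.
class Method {
    final String name;
    final int parameterCount;
    Method(String name, int parameterCount) {
        this.name = name;
        this.parameterCount = parameterCount;
    }
}

public class NativeQuerySketch {
    // In-memory stand-in for the database's query() call.
    static <T> List<T> query(List<T> store, Predicate<T> p) {
        List<T> result = new ArrayList<>();
        for (T candidate : store) {
            if (p.match(candidate)) result.add(candidate);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Method> store = List.of(new Method("get_Name", 0), new Method("Deposit", 1));
        // Anonymous class + generics: the Java features a port would rely on.
        List<Method> getters = query(store, new Predicate<Method>() {
            public boolean match(Method m) {
                return m.name.startsWith("get_") && m.parameterCount == 0;
            }
        });
        System.out.println(getters.size()); // prints 1
    }
}
```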
Bibliography
[1] Ada. Ada for the web, 1996. URL http://www.acm.org/sigada/wg/web_ada/.
[2] M. Akşit, editor. Proc. 2nd Int'l Conf. on Aspect-Oriented Software Development (AOSD-2003),
Mar. 2003. ACM Press.
[3] R. Altman, A. Cyment, and N. Kicillof. On the need for setpoints. In K. Gybels,
M. D'Hondt, I. Nagy, and R. Douence, editors, 2nd European Interactive Workshop on Aspects
in Software (EIWAS'05), Sept. 2005. URL http://prog.vub.ac.be/events/eiwas2005/
Papers/EIWAS2005-AlanCyment.pdf.
[4] T. Archer and A. Whitechapel. Inside C#, Second Edition. Microsoft Press, Redmond, WA,
USA, 2002. ISBN 0735616485.
[5] I. Attali, D. Caromel, and M. Russo. A formal executable semantics for Java. In Proceedings
of the OOPSLA'98 Workshop on Formal Underpinnings of Java, 1998.
[6] M. Barnett, K. Leino, and W. Schulte. The Spec# programming system: An overview. Con-
struction and Analysis of Safe, Secure, and Interoperable Smart Devices: International Workshop,
CASSIS, pages 49–69, 2004.
[7] L. Bergmans. Composing Concurrent Objects. PhD thesis, University of Twente,
1994. URL http://trese.cs.utwente.nl/publications/paperinfo/bergmans.phd.
pi.top.htm.
[8] L. Bergmans and M. Akşit. Composing crosscutting concerns using composition filters.
Comm. ACM, 44(10):51–57, Oct. 2001.
[9] A. Beugnard, J. Jezequel, N. Plouzeau, and D. Watkins. Making components contract
aware. Computer, 32(7):38–45, 1999. ISSN 0018-9162. doi: 10.1109/2.774917.
[10] S. R. Boschman. Performing transformations on .NET intermediate language code. Mas-
ter’s thesis, University of Twente, The Netherlands, Aug. 2006.
[11] R. Bosman. Automated reasoning about Composition Filters. Master’s thesis, University
of Twente, The Netherlands, Nov. 2004.
[12] B. Cabral, P. Marques, and L. Silva. RAIL: code instrumentation for .NET. Proceedings of
the 2005 ACM symposium on Applied computing, pages 1282–1287, 2005.
[13] W. Cazzola, J. Jezequel, and A. Rashid. Semantic Join Point Models: Motivations, Notions
and Requirements. SPLAT 2006 (Software Engineering Properties of Languages and Aspect
Technologies), March 2006.
[14] E. J. Chikofsky and J. H. Cross II. Reverse engineering and design recovery: A taxonomy.
IEEE Software, 7(1):13–17, 1990.
[15] Columbia University. The Columbia Encyclopedia, Sixth Edition. Columbia University Press,
2001.
[16] O. Conradi. Fine-grained join point model in Compose*. Master’s thesis, University of
Twente, The Netherlands, 2006. To be released.
[17] W. Cook and S. Rai. Safe query objects: statically typed objects as remotely executable
queries. Proceedings of the 27th international conference on Software engineering, pages 97–
106, 2005.
[18] W. Cook and C. Rosenberger. Native Queries for Persistent Objects: A Design White Pa-
per. Dr. Dobb's Journal (DDJ), February 2006. URL http://www.cs.utexas.edu/~wcook/
papers/NativeQueries/NativeQueries8-23-05.pdf.
[19] C. De Roover. Incorporating Dynamic Analysis and Approximate Reasoning in Declarative
Meta-Programming to Support Software Re-engineering. PhD thesis, Vrije Universiteit Brussel
Faculteit Wetenschappen Departement Informatica en Toegepaste Informatica, 2004.
[20] F. de Saussure. Course in General Linguistics (trans. Wade Baskin). Fontana/Collins, 1916.
[21] K. De Volder. Type-Oriented Logic Meta Programming. PhD thesis, Vrije Universiteit Brussel,
1998.
[22] R. DeLine and M. Fähndrich. The Fugue protocol checker: Is your software Baroque?
Unpublished manuscript, 2003.
[23] D. Doornenbal. Analysis and redesign of the Compose* language. Master’s thesis, Uni-
versity of Twente, The Netherlands, 2006. To be released.
[24] P. E. A. Dürr. Detecting semantic conflicts between aspects (in Compose*). Master's thesis,
University of Twente, The Netherlands, Apr. 2004.
[25] ECMA-335. Standard ECMA-335, 2006. URL http://www.ecma-international.org/
publications/standards/Ecma-335.htm.
[26] T. Elrad, R. E. Filman, and A. Bader. Aspect-oriented programming. Comm. ACM, 44(10):
29–32, Oct. 2001.
[27] M. D. Ernst. Static and dynamic analysis: Synergy and duality. In WODA 2003: ICSE
Workshop on Dynamic Analysis, pages 24–27, Portland, OR, May 9, 2003.
[28] J. Fabry and T. Mens. Language-independent detection of object-oriented design patterns.
Computer Languages, Systems, and Structures, 30:21–33, 2004.
[29] N. Fenton and S. Pfleeger. Software metrics: a rigorous and practical approach. PWS Publishing
Co. Boston, MA, USA, 1997.
[30] R. B. Findler, M. Latendresse, and M. Felleisen. Behavioral contracts and behavioral sub-
typing. In Proceedings of ACM Conference Foundations of Software Engineering, 2001. URL
citeseer.ist.psu.edu/article/findler01behavioral.html.
[31] E. Gamma, R. Helm, R. Johnson, and J. Vlissides. Design Patterns: elements of reusable
object-oriented software. Addison Wesley, 1995.
[32] H. Giese, J. Graf, and G. Wirtz. Closing the Gap Between Object-Oriented Modeling of
Structure and Behavior. UML, pages 534–549, 1999.
[33] M. Glandrup. Extending C++ using the concepts of composition filters. Master's the-
sis, University of Twente, 1995. URL http://trese.cs.utwente.nl/publications/
paperinfo/glandrup.thesis.pi.top.htm.
[34] J. D. Gradecki and N. Lesiecki. Mastering AspectJ: Aspect-Oriented Programming in Java.
John Wiley and Sons, 2003. ISBN 0471431044.
[35] K. Gybels and J. Brichau. Arranging language features for pattern-based crosscuts. In
Akşit [2], pages 60–69.
[36] B. Harbulot and J. R. Gurd. Using AspectJ to separate concerns in parallel scientific Java
code. In K. Lieberherr, editor, Proc. 3rd Int'l Conf. on Aspect-Oriented Software Development
(AOSD-2004), pages 122–131. ACM Press, Mar. 2004. doi: 10.1145/976270.976286.
[37] W. Havinga. Designating join points in Compose* - a predicate-based superimposition
language for Compose*. Master’s thesis, University of Twente, The Netherlands, May
2005.
[38] A. Heberle, W. Zimmermann, and G. Goos. Specification and Verification of Compiler
Frontend Tasks: Semantic Analysis. 04/96 Verifix Report UKA, 7, 1996.
[39] F. J. B. Holljen. Compilation and type-safety in the Compose* .NET environment. Master’s
thesis, University of Twente, The Netherlands, May 2004.
[40] R. Howard. Provider Model Design Pattern and Specification, Part 1. Technical re-
port, Microsoft Corporation, 2004. URL http://msdn.microsoft.com/library/en-us/
dnaspnet/html/asp02182004.asp.
[41] R. L. R. Huisman. Debugging Composition Filters. Master’s thesis, University of Twente,
The Netherlands, 2006. To be released.
[42] S. H. G. Huttenhuis. Patterns within aspect orientation. Master’s thesis, University of
Twente, The Netherlands, 2006. To be released.
[43] ECMA International. Common language infrastructure (CLI). Standard ECMA-335, ECMA
International, 2002. URL http://www.ecma-international.org/publications/files/
ecma-st/Ecma-335.pdf.
[44] J. Richter. Type Fundamentals. Technical report, Microsoft Corporation, 2000. URL
http://msdn.microsoft.com/msdnmag/issues/1200/dotnet/toc.asp.
[45] Jython. Jython homepage. URL http://www.jython.org/.
[46] G. Kiczales, E. Hilsdale, J. Hugunin, M. Kersten, J. Palm, and W. G. Griswold. An overview
of AspectJ. In J. L. Knudsen, editor, Proc. ECOOP 2001, LNCS 2072, pages 327–353, Berlin,
June 2001. Springer-Verlag.
[47] P. Koopmans. Sina user's guide and reference manual. Technical report, Dept. of
Computer Science, University of Twente, 1995. URL http://trese.cs.utwente.nl/
publications/paperinfo/sinaUserguide.pi.top.htm.
[48] C. Koppen and M. Störzer. PCDiff: Attacking the fragile pointcut problem. In K. Gy-
bels, S. Hanenberg, S. Herrmann, and J. Wloka, editors, European Interactive Workshop
on Aspects in Software (EIWAS), Sept. 2004. URL http://www.topprax.de/EIWAS04/
EIWAS-Papers.zip.
[49] S. Krishnamurthi. Programming Languages: Application and Interpretation. January 2006.
URL http://www.cs.brown.edu/~sk/Publications/Books/ProgLangs/.
[50] J. Lefor. Phoenix as a Tool in Research and Instrumentation. 2004. URL
http://research.microsoft.com/phoenix/Johns%20Phoenix%20Backgrounder%20-%20External.doc.
[51] S. Lidin. Inside Microsoft .NET IL Assembler. Microsoft Press, Redmond, WA, USA, 2002.
ISBN 0-7356-1547-0.
[52] C. Lopes, L. Bergmans, M. D'Hondt, and P. Tarr, editors. Workshop on Aspects and Di-
mensions of Concerns (ECOOP 2000), June 2000. URL http://trese.cs.utwente.nl/
Workshops/adc2000/papers/adc2000_all.zip.
[53] C. Marti. Automatic contract extraction: Developing a CIL parser. Master's thesis, ETH
Zürich, September 2003.
[54] Microsoft Corporation. Overview of the .NET framework. Technical report, Microsoft
Corporation, 2003. URL http://msdn.microsoft.com/library/default.asp?url=
/library/en-us/cpguide/html/cpovrintroductiontonetframeworksdk.asp.
[55] Microsoft Corporation. What is the common language specification. Technical report, Mi-
crosoft Corporation, 2003. URL http://msdn.microsoft.com/library/default.asp?
url=/library/en-us/cpguide/html/cpconwhatiscommonlanguagespecification.
asp.
[56] Microsoft Corporation. .NET compact framework - technology overview. Technical
report, Microsoft Corporation, 2003. URL http://msdn.microsoft.com/mobility/
prodtechinfo/devtools/netcf/overview/default.aspx.
[57] Microsoft Corporation. What is .NET? Technical report, Microsoft Corporation, 2005.
URL http://www.microsoft.com/net/basics.mspx.
[58] Microsoft Corporation. Design Guidelines for Class Library Developers. Technical re-
port, Microsoft Corporation, 2006. URL http://msdn.microsoft.com/library/en-us/
cpgenref/html/cpconnetframeworkdesignguidelines.asp.
[59] Microsoft Corporation. Phoenix Documentation, 2006.
[60] I. Nagy. On the Design of Aspect-Oriented Composition Models for Software Evolution. PhD
thesis, University of Twente, The Netherlands, June 2006.
[61] H. R. Nielson and F. Nielson. Semantics with applications: a formal introduction. John Wiley
& Sons, Inc., New York, NY, USA, 1992. ISBN 0-471-92980-8.
[62] H. Ossher and P. Tarr. Multi-dimensional separation of concerns and the Hyperspace
approach. In M. Akşit, editor, Software Architectures and Component Technology. Kluwer
Academic Publishers, 2001. ISBN 0-7923-7576-9.
[63] A. Popovici, T. Gross, and G. Alonso. Dynamic weaving for aspect-oriented programming.
In G. Kiczales, editor, Proc. 1st Int'l Conf. on Aspect-Oriented Software Development (AOSD-
2002), pages 141–147. ACM Press, Apr. 2002.
[64] A. Popovici, G. Alonso, and T. Gross. Just in time aspects. In Akşit [2], pages 100–109.
[65] J. Prosise. Programming Microsoft .NET. Microsoft Press, Redmond, WA, USA, 2002. ISBN
0-7356-1376-1.
[66] H. Rajan and K. J. Sullivan. Generalizing AOP for aspect-oriented testing. In Proceed-
ings of the Fourth International Conference on Aspect-Oriented Software Development (AOSD
2005), 2005.
[67] T. Richner. Recovering Behavioral Design Views: a Query-Based Approach. PhD thesis, Uni-
versity of Berne, May 2002.
[68] T. Richner and S. Ducasse. Recovering high-level views of object-oriented applications
from static and dynamic information. Proceedings ICSM, 99:13–22, 1999.
[69] C. D. Roover, K. Gybels, and T. D’Hondt. Towards abstract interpretation for recovering
design information. Electr. Notes Theor. Comput. Sci., 131:15–25, 2005.
[70] P. Salinas. Adding systemic crosscutting and super-imposition to Composition Filters.
Master’s thesis, Vrije Universiteit Brussel, Aug. 2001.
[71] D. R. Spenkelink. Compose* incremental. Master’s thesis, University of Twente, The
Netherlands, 2006. To be released.
[72] T. Staijen. Towards safe advice: Semantic analysis of advice types in Compose*. Master’s
thesis, University of Twente, Apr. 2005.
[73] D. Stutz. The Microsoft shared source CLI implementation. 2002.
[74] P. Tarr, H. Ossher, S. M. Sutton, Jr., and W. Harrison. N degrees of separation: Multi-
dimensional separation of concerns. In R. E. Filman, T. Elrad, S. Clarke, and M. Akşit,
editors, Aspect-Oriented Software Development, pages 37–61. Addison-Wesley, Boston, 2005.
ISBN 0-321-21976-7.
[75] J. W. te Winkel. Bringing Composition Filters to C. Master’s thesis, University of Twente,
The Netherlands, 2006. To be released.
[76] F. Tip. A survey of program slicing techniques. Journal of programming languages, 3:121–189,
1995. URL citeseer.ist.psu.edu/tip95survey.html.
[77] T. Rho, G. Kniesel, and M. Appeltauer. Fine-grained generic aspects. In Foundations of
Aspect-Oriented Languages (FOAL 2006), 2006. URL http://www.cs.iastate.edu/~leavens/FOAL/papers-2006/RhoKnieselAppeltauer.pdf.
[78] T. Tourwé, J. Brichau, and K. Gybels. On the existence of the AOSD-evolution paradox. In
L. Bergmans, J. Brichau, P. Tarr, and E. Ernst, editors, SPLAT: Software engineering Properties
of Languages for Aspect Technologies, Mar. 2003. URL http://www.daimi.au.dk/~eernst/
splat03/papers/Tom_Tourwe.pdf.
[79] M. D. W. van Oudheusden. Automatic Derivation of Semantic Properties in .NET. Mas-
ter’s thesis, University of Twente, The Netherlands, Aug. 2006.
[80] C. Vinkes. Superimposition in the Composition Filters model. Master’s thesis, University
of Twente, The Netherlands, Oct. 2004.
[81] N. Walkinshaw, M. Roper, and M. Wood. Understanding Object-Oriented Source Code
from the Behavioural Perspective. Program Comprehension, 2005. IWPC 2005. Proceedings.
13th International Workshop on, pages 215–224, 2005.
[82] D. Watkins. Handling language interoperability with the Microsoft .NET framework.
Technical report, Monash University, Oct. 2000. URL http://msdn.microsoft.com/
library/default.asp?url=/library/en-us/dndotnet/html/interopdotnet.asp.
[83] D. A. Watt. Programming language concepts and paradigms. Prentice Hall, 1990.
[84] M. D. Weiser. Program slices: formal, psychological, and practical investigations of an automatic
program abstraction method. PhD thesis, 1979.
[85] J. C. Wichman. The development of a preprocessor to facilitate composition filters in the
Java language. Master’s thesis, University of Twente, 1999. URL http://trese.cs.
utwente.nl/publications/paperinfo/wichman.thesis.pi.top.htm.
[86] R. Wuyts. A Logic Meta-Programming Approach to Support the Co-Evolution of Object-
Oriented Design and Implementation. PhD thesis, 2001. URL citeseer.ist.psu.edu/
wuyts01logic.html.
APPENDIX A
CIL Instruction Set
In the following table, all available operation codes (opcodes) of the CIL instruction set are listed
with their descriptions.
OpCode Description
nop Do nothing
break Inform a debugger that a breakpoint has been reached.
ldarg.0 Load argument 0 onto stack
ldarg.1 Load argument 1 onto stack
ldarg.2 Load argument 2 onto stack
ldarg.3 Load argument 3 onto stack
ldloc.0 Load local variable 0 onto stack.
ldloc.1 Load local variable 1 onto stack.
ldloc.2 Load local variable 2 onto stack.
ldloc.3 Load local variable 3 onto stack.
stloc.0 Pop value from stack into local variable 0.
stloc.1 Pop value from stack into local variable 1.
stloc.2 Pop value from stack into local variable 2.
stloc.3 Pop value from stack into local variable 3.
ldarg.s Load argument numbered num onto stack, short form.
ldarga.s Fetch the address of argument argNum, short form
starg.s Store a value to the argument numbered num, short form
ldloc.s Load local variable of index indx onto stack, short form.
ldloca.s Load address of local variable with index indx, short form
stloc.s Pop value from stack into local variable indx, short form.
ldnull Push null reference on the stack
ldc.i4.m1 Push -1 onto the stack as int32.
ldc.i4.0 Push 0 onto the stack as int32.
ldc.i4.1 Push 1 onto the stack as int32.
ldc.i4.2 Push 2 onto the stack as int32.
ldc.i4.3 Push 3 onto the stack as int32.
ldc.i4.4 Push 4 onto the stack as int32.
ldc.i4.5 Push 5 onto the stack as int32.
ldc.i4.6 Push 6 onto the stack as int32.
ldc.i4.7 Push 7 onto the stack as int32.
ldc.i4.8 Push 8 onto the stack as int32.
ldc.i4.s Push num onto the stack as int32, short form.
ldc.i4 Push num of type int32 onto the stack as int32.
ldc.i8 Push num of type int64 onto the stack as int64.
ldc.r4 Push num of type float32 onto the stack as F.
ldc.r8 Push num of type float64 onto the stack as F.
dup Duplicate value on the top of the stack
pop Pop a value from the stack
jmp Exit current method and jump to specified method
call Call method
calli Call method indicated on the stack with arguments de-
scribed by callsitedescr.
ret Return from method, possibly returning a value
br.s Branch to target, short form
brfalse.s Branch to target if value is zero (false), short form
brtrue.s Branch to target if value is non-zero (true), short form
beq.s Branch to target if equal, short form
bge.s Branch to target if greater than or equal to, short form
bgt.s Branch to target if greater than, short form
ble.s Branch to target if less than or equal to, short form
blt.s Branch to target if less than, short form
bne.un.s Branch to target if unequal or unordered, short form
bge.un.s Branch to target if greater than or equal to (unsigned or
unordered), short form
bgt.un.s Branch to target if greater than (unsigned or unordered),
short form
ble.un.s Branch to target if less than or equal to (unsigned or un-
ordered), short form
blt.un.s Branch to target if less than (unsigned or unordered), short
form
br Branch to target
brfalse Branch to target if value is zero (false)
brtrue Branch to target if value is non-zero (true)
beq Branch to target if equal
bge Branch to target if greater than or equal to
bgt Branch to target if greater than
ble Branch to target if less than or equal to
blt Branch to target if less than
bne.un Branch to target if unequal or unordered
bge.un Branch to target if greater than or equal to (unsigned or
unordered)
bgt.un Branch to target if greater than (unsigned or unordered)
ble.un Branch to target if less than or equal to (unsigned or un-
ordered)
blt.un Branch to target if less than (unsigned or unordered)
switch Jump to one of n values
ldind.i1 Indirect load value of type int8 as int32 on the stack.
ldind.u1 Indirect load value of type unsigned int8 as int32 on the
stack.
ldind.i2 Indirect load value of type int16 as int32 on the stack.
ldind.u2 Indirect load value of type unsigned int16 as int32 on the
stack.
ldind.i4 Indirect load value of type int32 as int32 on the stack.
ldind.u4 Indirect load value of type unsigned int32 as int32 on the
stack.
ldind.i8 Indirect load value of type int64 as int64 on the stack.
ldind.i Indirect load value of type native int as native int on the
stack
ldind.r4 Indirect load value of type float32 as F on the stack.
ldind.r8 Indirect load value of type float64 as F on the stack.
ldind.ref Indirect load value of type object ref as O on the stack.
stind.ref Store value of type object ref (type O) into memory at ad-
dress
stind.i1 Store value of type int8 into memory at address
stind.i2 Store value of type int16 into memory at address
stind.i4 Store value of type int32 into memory at address
stind.i8 Store value of type int64 into memory at address
stind.r4 Store value of type float32 into memory at address
stind.r8 Store value of type float64 into memory at address
add Add two values, returning a new value
sub Subtract value2 from value1, returning a new value
mul Multiply values
div Divide two values to return a quotient or floating-point
result
div.un Divide two values, unsigned, returning a quotient
rem Remainder of dividing value1 by value2
rem.un Remainder of unsigned dividing value1 by value2
and Bitwise AND of two integral values, returns an integral
value
or Bitwise OR of two integer values, returns an integer.
xor Bitwise XOR of integer values, returns an integer
shl Shift an integer left (shifting in zeros), return an integer
shr Shift an integer right (shift in sign), return an integer
shr.un Shift an integer right (shift in zero), return an integer
neg Negate value
not Bitwise complement
conv.i1 Convert to int8, pushing int32 on stack
conv.i2 Convert to int16, pushing int32 on stack
conv.i4 Convert to int32, pushing int32 on stack
conv.i8 Convert to int64, pushing int64 on stack
conv.r4 Convert to float32, pushing F on stack
conv.r8 Convert to float64, pushing F on stack
conv.u4 Convert to unsigned int32, pushing int32 on stack
conv.u8 Convert to unsigned int64, pushing int64 on stack
callvirt Call a method associated with obj
cpobj Copy a value type from srcValObj to destValObj
ldobj Copy instance of value type classTok to the stack.
ldstr Push a string object for the literal string
newobj Allocate an uninitialized object or value type and call ctor
castclass Cast obj to class
isinst Test if obj is an instance of class, returning null or an in-
stance of that class
conv.r.un Convert unsigned integer to floating-point, pushing F on
stack
unbox Extract the value type data from obj, its boxed representa-
tion
throw Throw an exception
ldfld Push the value of field of object, or value type, obj, onto
the stack
ldflda Push the address of field of object obj on the stack
stfld Replace the value of field of the object obj with val
ldsfld Push the value of field on the stack
ldsflda Push the address of the static field, field, on the stack
stsfld Replace the value of field with val
stobj Store a value of type classTok from the stack into memory
conv.ovf.i1.un Convert unsigned to an int8 (on the stack as int32) and
throw an exception on overflow
conv.ovf.i2.un Convert unsigned to an int16 (on the stack as int32) and
throw an exception on overflow
conv.ovf.i4.un Convert unsigned to an int32 (on the stack as int32) and
throw an exception on overflow
conv.ovf.i8.un Convert unsigned to an int64 (on the stack as int64) and
throw an exception on overflow
conv.ovf.u1.un Convert unsigned to an unsigned int8 (on the stack as
int32) and throw an exception on overflow
conv.ovf.u2.un Convert unsigned to an unsigned int16 (on the stack as
int32) and throw an exception on overflow
conv.ovf.u4.un Convert unsigned to an unsigned int32 (on the stack as
int32) and throw an exception on overflow
conv.ovf.u8.un Convert unsigned to an unsigned int64 (on the stack as
int64) and throw an exception on overflow
conv.ovf.i.un Convert unsigned to a native int (on the stack as native int)
and throw an exception on overflow
conv.ovf.u.un Convert unsigned to a native unsigned int (on the stack as
native int) and throw an exception on overflow
box Convert valueType to a true object reference
newarr Create a new array with elements of type etype
ldlen Push the length (of type native unsigned int) of array on
the stack
ldelema Load the address of element at index onto the top of the
stack
ldelem.i1 Load the element with type int8 at index onto the top of
the stack as an int32
ldelem.u1 Load the element with type unsigned int8 at index onto
the top of the stack as an int32
ldelem.i2 Load the element with type int16 at index onto the top of
the stack as an int32
ldelem.u2 Load the element with type unsigned int16 at index onto
the top of the stack as an int32
ldelem.i4 Load the element with type int32 at index onto the top of
the stack as an int32
ldelem.u4 Load the element with type unsigned int32 at index onto
the top of the stack as an int32
ldelem.i8 Load the element with type int64 at index onto the top of
the stack as an int64
ldelem.i Load the element with type native int at index onto the top
of the stack as an native int
ldelem.r4 Load the element with type float32 at index onto the top
of the stack as an F
ldelem.r8 Load the element with type float64 at index onto the top
of the stack as an F
ldelem.ref Load the element of type object, at index onto the top of
the stack as an O
stelem.i Replace array element at index with the i value on the
stack
stelem.i1 Replace array element at index with the int8 value on the
stack
stelem.i2 Replace array element at index with the int16 value on the
stack
stelem.i4 Replace array element at index with the int32 value on the
stack
stelem.i8 Replace array element at index with the int64 value on the
stack
stelem.r4 Replace array element at index with the float32 value on
the stack
stelem.r8 Replace array element at index with the float64 value on
the stack
stelem.ref Replace array element at index with the ref value on the
stack
conv.ovf.i1 Convert to an int8 (on the stack as int32) and throw an
exception on overflow
conv.ovf.u1 Convert to a unsigned int8 (on the stack as int32) and
throw an exception on overflow
conv.ovf.i2 Convert to an int16 (on the stack as int32) and throw an
exception on overflow
conv.ovf.u2 Convert to a unsigned int16 (on the stack as int32) and
throw an exception on overflow
conv.ovf.i4 Convert to an int32 (on the stack as int32) and throw an
exception on overflow
conv.ovf.u4 Convert to a unsigned int32 (on the stack as int32) and
throw an exception on overflow
conv.ovf.i8 Convert to an int64 (on the stack as int64) and throw an
exception on overflow
conv.ovf.u8 Convert to a unsigned int64 (on the stack as int64) and
throw an exception on overflow
refanyval Push the address stored in a typed reference
ckfinite Throw ArithmeticException if value is not a finite number
mkrefany Push a typed reference to ptr of type class onto the stack
ldtoken Convert metadata token to its runtime representation
conv.u2 Convert to unsigned int16, pushing int32 on stack
conv.u1 Convert to unsigned int8, pushing int32 on stack
conv.i Convert to native int, pushing native int on stack
conv.ovf.i Convert to an native int (on the stack as native int) and
throw an exception on overflow
conv.ovf.u Convert to a native unsigned int (on the stack as native int)
and throw an exception on overflow
add.ovf Add signed integer values with overflow check.
add.ovf.un Add unsigned integer values with overflow check.
mul.ovf Multiply signed integer values. Signed result must fit in
same size
mul.ovf.un Multiply unsigned integer values. Unsigned result must
fit in same size
sub.ovf Subtract native int from a native int. Signed result must fit
in same size
sub.ovf.un Subtract native unsigned int from a native unsigned int.
Unsigned result must fit in same size
endfinally End finally clause of an exception block
leave Exit a protected region of code.
leave.s Exit a protected region of code, short form
stind.i Store value of type native int into memory at address
conv.u Convert to native unsigned int, pushing native int on stack
arglist Return argument list handle for the current method
ceq Push 1 (of type int32) if value1 equals value2, else 0
cgt Push 1 (of type int32) if value1 > value2, else 0
cgt.un Push 1 (of type int32) if value1 > value2, unsigned or un-
ordered, else 0
clt Push 1 (of type int32) if value1 < value2, else 0
clt.un Push 1 (of type int32) if value1 < value2, unsigned or un-
ordered, else 0
ldftn Push a pointer to a method
ldvirtftn Push address of virtual method mthd on the stack
ldarg Load argument numbered num onto stack.
ldarga Fetch the address of argument argNum.
starg Store a value to the argument numbered num
ldloc Load local variable of index indx onto stack.
ldloca Load address of local variable with index indx
stloc Pop value from stack into local variable indx.
localloc Allocate space from the local memory pool.
endfilter End filter clause of SEH exception handling
unaligned. Subsequent pointer instruction may be unaligned
volatile. Subsequent pointer reference is volatile
tail. Subsequent call terminates current method
initobj Initialize a value type
cpblk Copy data from memory to memory
initblk Set a block of memory to a given byte
rethrow Rethrow the current exception
sizeof Push the size, in bytes, of a value type as a unsigned int32
refanytype Push the type token stored in a typed reference
Table A.1: CIL instruction set
APPENDIX B
Evaluation Stack Types
The execution engine of the common language runtime implements a coarse type system for
the evaluation stack. Only the types listed in Table B.1 can be present on the stack.
Type Description
int32 Signed 4-byte integer
native int Native integer, size dependent on the underlying platform
int64 Signed 8-byte integer
Float 80-bit floating-point number (covering both 32-bit and 64-bit values)
& Managed or unmanaged pointer
o Object reference
Table B.1: Evaluation Stack types
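The coarse typing of Table B.1 means that all small integer values are widened before they appear on the evaluation stack. As an illustrative sketch (the class and method names are ours, not part of any .NET or Compose API), the grouping can be expressed as a simple mapping from CLI type names to stack types:

```java
// Sketch of the coarse evaluation-stack typing of Table B.1: every small
// integer type is widened to int32, and both floating-point widths share one
// internal representation. Class and method names are illustrative only.
public class StackType {
    public static String of(String cliType) {
        switch (cliType) {
            case "bool": case "char":
            case "int8": case "unsigned int8":
            case "int16": case "unsigned int16":
            case "int32": case "unsigned int32":
                return "int32";       // small integers are widened on the stack
            case "int64": case "unsigned int64":
                return "int64";
            case "native int": case "native unsigned int":
                return "native int";
            case "float32": case "float64":
                return "Float";       // one internal floating-point type
            default:
                return "o";           // object references; pointers would be "&"
        }
    }

    public static void main(String[] args) {
        System.out.println(of("int16"));   // int32
        System.out.println(of("float64")); // Float
    }
}
```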
APPENDIX C
Semantic Extractor Configuration File
The Semantic Extractor uses the provider design pattern: the provider that handles the calls to
the base class is selected at runtime. The different types of providers are defined and stored in
the app.config file. The contents of this file are shown in Listing C.1.
<?xml version="1.0" encoding="utf-8" ?>
<configuration>
  <configSections>
    <section
      name="SemanticExtractors"
      type="SemanticLibrary.SemanticExtractorSection, SemanticLibrary"
      allowLocation="true" allowDefinition="Everywhere" />
  </configSections>
  <!-- Semantic Extractor Provider Settings -->
  <SemanticExtractors defaultProvider="phoenix">
    <providers>
      <clear/>
      <add name="cecil" description="Mono Cecil 0.2"
        type="SemanticExtractorCecil.SemanticExtractorCecil, SemanticExtractorCecil" />
      <add name="phoenix" description="Microsoft Phoenix"
        type="SemanticExtractorPhoenix.SemTexPhoenix, SemanticExtractorPhoenix" />
      <add name="rail" description="Runtime Assembly Instrumentation Library"
        type="SemanticExtractorRail.SemTexRail, SemanticExtractorRail" />
      <add name="postsharp" description="PostSharp reads .NET binary modules, represents them as a Code Object Model, lets plug-ins analyze and transforms this model and writes it back to the binary form."
        type="SemanticExtractorPostSharp.SemTexPostSharp, SemanticExtractorPostSharp" />
    </providers>
  </SemanticExtractors>
</configuration>
Listing C.1: Contents of the app.config file
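The configuration above is, in essence, a named registry of extractor implementations with one default. The selection logic behind it can be sketched as follows (all names here are illustrative; this is not the actual Compose code):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal sketch of the provider selection behind Listing C.1: providers are
// registered under a name, and requests without an explicit (or known) name
// fall back to the configured default provider.
public class ProviderRegistry {
    private final Map<String, String> providers = new LinkedHashMap<>();
    private String defaultProvider;

    public void add(String name, String typeName) { providers.put(name, typeName); }
    public void setDefault(String name) { defaultProvider = name; }

    // Resolve the implementing type for a provider name, with default fallback.
    public String resolve(String name) {
        String key = (name != null && providers.containsKey(name)) ? name : defaultProvider;
        return providers.get(key);
    }

    public static void main(String[] args) {
        ProviderRegistry reg = new ProviderRegistry();
        reg.add("cecil", "SemanticExtractorCecil.SemanticExtractorCecil");
        reg.add("phoenix", "SemanticExtractorPhoenix.SemTexPhoenix");
        reg.setDefault("phoenix");
        System.out.println(reg.resolve(null));    // the default (Phoenix) extractor
        System.out.println(reg.resolve("cecil")); // explicit selection
    }
}
```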
APPENDIX D
Semantic Actions
Table D.1 lists all the available semantic action kinds and their properties.
Action Properties Description
Assign Source, destination Assignment of a value. The value of source is assigned to destination.
Negate Source, destination Negates a value.
Add Source1, source2, destination The value of source1 is added to source2 and the result is stored in the destination.
Not Source1, destination Bitwise complement of the source.
Multiply Source1, source2, destination The values of source1 and source2 are multiplied and the result is placed in the destination.
Divide Source1, source2, destination The value of source1 is divided by the value of source2 and the result is placed in the destination.
Remainder Source1, source2, destination Divides source1 by source2 and places the remainder in the destination.
Subtract Source1, source2, destination The value of source1 is subtracted from the value of source2 and the result is placed in the destination.
And Source1, source2, destination An and operation is performed on the values of source1 and source2 and the result is placed in the destination.
Or Source1, source2, destination An or operation is performed on the values of source1 and source2 and the result is placed in the destination.
Xor Source1, source2, destination A bitwise xor operation is performed on the values of source1 and source2 and the result is placed in the destination.
Jump LabelName A jump to the LabelName is performed.
Branch ConditionAction, truelabel, falselabel A branch is performed; based on the condition it jumps to either the TrueLabelName or the FalseLabelName. The condition can be found in the ConditionAction property.
Compare ComparisonType, source1, source2, destination A comparison is performed on the source1 and source2 values using the ComparisonType. The resulting boolean value is placed in the destination.
Create Destination, operationname A new instance or a new array is created. The new object or array is placed in the destination, in which the type and name can be found. If the create operation calls a constructor, this constructor can be found in the operationname property.
Convert Source, destination A conversion is performed from the source to the destination. The new type can be found in the destination properties, while the old type information is still present in the source.
Call OperationName, destination A call to another operation is made. The name of this operation can be found in the OperationName property. Its return value, when available, is placed in the destination.
Return Source The control flow is returned to the calling operation; this ends the current operation. The return value can be placed in the source.
RaiseException Source Raises an exception. The exception type is placed in source.
Test Source1, source2, destination Tests if source1 is equal to source2 and stores the result in the destination.
Switch Source1, SwitchLabels A switch construction where source1 defines a value indicating the label to jump to. The labels can be found in SwitchLabels.
Table D.1: Available semantic action kinds
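The table above can be read as one uniform shape: every action is a kind plus named source operands and a destination. A minimal sketch of such a representation (the names are illustrative, not the actual Compose metamodel classes):

```java
import java.util.Arrays;
import java.util.List;

// Sketch: representing the action kinds of Table D.1 as data, each action
// being a kind with source operands and a destination. Illustrative only.
public class SemanticActionDemo {
    enum Kind { ASSIGN, NEGATE, ADD, MULTIPLY, COMPARE, CALL, RETURN }

    static class Action {
        final Kind kind;
        final List<String> sources;
        final String destination;

        Action(Kind kind, List<String> sources, String destination) {
            this.kind = kind;
            this.sources = sources;
            this.destination = destination;
        }

        @Override public String toString() {
            return kind + "(" + String.join(", ", sources) + ") -> " + destination;
        }
    }

    public static void main(String[] args) {
        // "The value of source1 is added to source2 and the result is stored
        // in the destination."
        Action add = new Action(Kind.ADD, Arrays.asList("x", "y"), "sum");
        System.out.println(add); // ADD(x, y) -> sum
    }
}
```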
APPENDIX E
Semantic Types
Table E.1 lists all the available semantic common type kinds.
Type name Description
Unknown An unknown type. If a type cannot be determined or is subject to change, the semantic type is set to Unknown.
Char A single character.
String Strings are used to hold text.
Byte Represents an 8-bit signed integer.
Short Represents a signed 16-bit integer.
Integer Represents a signed 32-bit integer.
Long Represents a signed 64-bit integer.
Float The float keyword denotes a simple type that stores 32-bit floating-point values.
Double The double keyword denotes a simple type that stores 64-bit floating-point values.
Boolean Represents a boolean value.
Object Represents a general object.
Unsigned Short Represents a 16-bit unsigned integer.
Unsigned Integer Represents a 32-bit unsigned integer.
Unsigned Long Represents a 64-bit unsigned integer.
Unsigned Byte Represents an 8-bit unsigned integer.
DateTime A date and/or time field.
Table E.1: Available semantic common types
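A Semantic Extractor has to map concrete .NET type names onto these common types. One possible sketch of that mapping (the target names come from Table E.1; the function itself and the choice of source names are hypothetical):

```java
// Sketch: mapping fully qualified .NET type names onto the language-independent
// common types of Table E.1. Undecidable types fall back to Unknown, as the
// table prescribes. The mapping function itself is illustrative.
public class CommonType {
    public static String of(String dotNetType) {
        switch (dotNetType) {
            case "System.Char":     return "Char";
            case "System.String":   return "String";
            case "System.SByte":    return "Byte";             // 8-bit signed
            case "System.Int16":    return "Short";
            case "System.Int32":    return "Integer";
            case "System.Int64":    return "Long";
            case "System.Single":   return "Float";
            case "System.Double":   return "Double";
            case "System.Boolean":  return "Boolean";
            case "System.UInt16":   return "Unsigned Short";
            case "System.UInt32":   return "Unsigned Integer";
            case "System.UInt64":   return "Unsigned Long";
            case "System.Byte":     return "Unsigned Byte";    // 8-bit unsigned
            case "System.DateTime": return "DateTime";
            default:                return "Unknown";          // cannot be determined
        }
    }

    public static void main(String[] args) {
        System.out.println(of("System.Int32")); // Integer
        System.out.println(of("Foo.Bar"));      // Unknown
    }
}
```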
APPENDIX F
SEMTEX Generated File
The ComposeStarAnalyserPlugin is called during the compilation process and analyzes the as-
semblies created by the weaver. It performs three main tasks: extracting the semantics of the
ReifiedMessage, determining the resource usage of fields and arguments, and recording the
calls made to other functions. See Section 8.4 for more information. The information is stored
in an XML file so it can be imported and used by the other Compose modules. An example of
this XML file is shown in Listing F.1.
<?xml version="1.0" encoding="utf-8"?>
<SemanticContainers>
  <SemanticContainer name="pacman.ConcernImplementations.ScoreIncreaser" sourcefile="Pacman/obj/Weaver/pacman.ConcernImplementations.ScoreIncreaser.dll">
    <SemanticClass name="pacman.ConcernImplementations.ScoreIncreaser">
      <FullName>[pacman.ConcernImplementations.ScoreIncreaser.dll]pacman.ConcernImplementations.ScoreIncreaser</FullName>
      <BaseType>[mscorlib]System.Object - Object</BaseType>
      <SemanticMethod name="increase">
        <ReturnType>System.Void</ReturnType>
        <CallsToOtherMethods>
          <Call operationName="getArgs" className="Composestar.RuntimeCore.FLIRT.Message.ReifiedMessage" />
          <Call operationName="toString" className="com.ms.vjsharp.lang.ObjectImpl" />
          <Call operationName="parseInt" className="java.lang.Integer" />
          <Call operationName="handleReturnMethodCall" className="Composestar.RuntimeCore.FLIRT.MessageHandlingFacility" />
          <Call operationName="setArgs" className="Composestar.RuntimeCore.FLIRT.Message.ReifiedMessage" />
          <Call operationName="append" className="java.lang.StringBuffer" />
          <Call operationName="getSelector" className="Composestar.RuntimeCore.FLIRT.Message.ReifiedMessage" />
          <Call operationName="append" className="java.lang.StringBuffer" />
          <Call operationName="handleReturnMethodCall" className="Composestar.RuntimeCore.FLIRT.MessageHandlingFacility" />
          <Call operationName="append" className="java.lang.StringBuffer" />
          <Call operationName="append" className="java.lang.StringBuffer" />
          <Call operationName="handleReturnMethodCall" className="Composestar.RuntimeCore.FLIRT.MessageHandlingFacility" />
          <Call operationName="println" className="java.io.PrintStream" />
          <Call operationName="handleVoidMethodCall" className="Composestar.RuntimeCore.FLIRT.MessageHandlingFacility" />
        </CallsToOtherMethods>
        <ReifiedMessageBehaviour>
          <Semantic value="selector.read" />
        </ReifiedMessageBehaviour>
        <ResourceUsages>
          <ResourceUsage name="message" operandType="SemanticArgument" accessType="read" accessOccurence="AtLeastOnce" />
          <ResourceUsage name="this" operandType="SemanticArgument" accessType="read" accessOccurence="AtLeastOnce" />
          <ResourceUsage name="this" operandType="SemanticArgument" accessType="write" accessOccurence="AtLeastOnce" />
          <ResourceUsage name="this" operandType="SemanticArgument" accessType="read" accessOccurence="AtLeastOnce" />
          <ResourceUsage name="this" operandType="SemanticArgument" accessType="read" accessOccurence="MaybeMoreThenOnce" />
          <ResourceUsage name="message" operandType="SemanticArgument" accessType="read" accessOccurence="MaybeMoreThenOnce" />
          <ResourceUsage name="message" operandType="SemanticArgument" accessType="read" accessOccurence="MaybeMoreThenOnce" />
          <ResourceUsage name="this" operandType="SemanticArgument" accessType="read" accessOccurence="MaybeMoreThenOnce" />
          <ResourceUsage name="this" operandType="SemanticArgument" accessType="read" accessOccurence="MaybeMoreThenOnce" />
          <ResourceUsage name="this" operandType="SemanticArgument" accessType="read" accessOccurence="MaybeMoreThenOnce" />
        </ResourceUsages>
      </SemanticMethod>
    </SemanticClass>
  </SemanticContainer>
</SemanticContainers>
Listing F.1: Part of the SEMTEX file for the pacman example
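A consumer of such a SEMTEX file can read it with any standard XML API. As a sketch, the following uses the Java DOM API on a shortened, hypothetical sample in the shape of Listing F.1 (the sample data and class names are ours, not actual Compose output):

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Sketch: counting ResourceUsage entries of a given access type in a
// SEMTEX-shaped XML document, using the standard Java DOM API.
public class SemTexReader {
    // Shortened, hypothetical sample in the shape of Listing F.1.
    static final String SAMPLE =
        "<SemanticContainers><SemanticContainer name='demo'>"
      + "<SemanticClass name='demo'><SemanticMethod name='increase'>"
      + "<ResourceUsages>"
      + "<ResourceUsage name='message' accessType='read' accessOccurence='AtLeastOnce'/>"
      + "<ResourceUsage name='this' accessType='write' accessOccurence='AtLeastOnce'/>"
      + "</ResourceUsages></SemanticMethod></SemanticClass>"
      + "</SemanticContainer></SemanticContainers>";

    public static int countUsages(String xml, String accessType) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList usages = doc.getElementsByTagName("ResourceUsage");
            int count = 0;
            for (int i = 0; i < usages.getLength(); i++) {
                Element e = (Element) usages.item(i);
                if (accessType.equals(e.getAttribute("accessType"))) count++;
            }
            return count;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countUsages(SAMPLE, "read"));  // 1
        System.out.println(countUsages(SAMPLE, "write")); // 1
    }
}
```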

More Related Content

PDF
C# c# for beginners crash course master c# programming fast and easy today
PPT
Kent's h&f presentation
DOC
CV MASSIOT_Philippe NOV_2015 English Final
PDF
Cisco Letter
PPT
Mt. Pleasant Animal Shelter PSA
DOCX
URUS NPIK-MAINAN ANAK CEPAT
PDF
10 hours ss4.5
C# c# for beginners crash course master c# programming fast and easy today
Kent's h&f presentation
CV MASSIOT_Philippe NOV_2015 English Final
Cisco Letter
Mt. Pleasant Animal Shelter PSA
URUS NPIK-MAINAN ANAK CEPAT
10 hours ss4.5

Similar to Thesis (20)

PDF
Systems se
PDF
Work Measurement Application - Ghent Internship Report - Adel Belasker
PDF
robert-kovacsics-part-ii-dissertation
PDF
PDF
Machine_translation_for_low_resource_Indian_Languages_thesis_report
PDF
PDF
Thesis - Nora Szepes - Design and Implementation of an Educational Support Sy...
PDF
Matloff programming on-parallel_machines-2013
PDF
PythonIntro
PDF
eclipse.pdf
PDF
MicroFSharp
PDF
Intro to embedded systems programming
PDF
Knapp_Masterarbeit
PDF
Python Programming Hans-petter Halvorsen.pdf
PDF
PDF
PDF
PDF
Digital Content Retrieval Final Report
PDF
Shariar Rostami - Master Thesis
Systems se
Work Measurement Application - Ghent Internship Report - Adel Belasker
robert-kovacsics-part-ii-dissertation
Machine_translation_for_low_resource_Indian_Languages_thesis_report
Thesis - Nora Szepes - Design and Implementation of an Educational Support Sy...
Matloff programming on-parallel_machines-2013
PythonIntro
eclipse.pdf
MicroFSharp
Intro to embedded systems programming
Knapp_Masterarbeit
Python Programming Hans-petter Halvorsen.pdf
Digital Content Retrieval Final Report
Shariar Rostami - Master Thesis
Ad

Thesis

  • 1. Twente Research and Education on Software Engineering, Department of Computer Science, Faculty of Electrical Engineering, Mathematics and Computer Science, University of Twente Automatic Derivation of Semantic Properties in .NET M.D.W. van Oudheusden Enschede, August 28, 2006 graduation committee prof. dr. ir. M. Aksit dr. ir. L.M.J. Bergmans ir. P.E.A. D¨urr
  • 3. Abstract The Compose .NET project offers aspect-oriented programming for Microsoft .NET languages through the composition filters model. Most of the information used by Compose (and other Aspect Oriented Programming implementations) is based on structural and syntactical prop- erties. For instance, the join point selection is largely based on naming conventions and hier- archical structures. Furthermore, the analysis tasks of Compose would benefit when there is more semantical data available. Like the behavior of the reified message used by the meta filters or the side effects generated by the use of aspects. Fine grained join points, join points at instruction level, are currently not possible because the information about the inner workings of a function is not present. The automatic derivation of semantic properties is an attempt to deduce the behavior of a .NET application with the use of existing tools. Not only Compose can benefit from this, but also other applications like finding design patterns in the source code, reverse engineer design documentation, generating pre- and postconditions, verifying software contracts, checking be- havioral subtyping, or any other type of statical analysis. The Semantic Analyzer is the implementation of a static source code analyzer, which converts instructions into semantic actions and store these actions inside a metamodel, a high level rep- resentation of the original code. In order to do this, we first have to examine the different kinds of semantic programming constructs and how these semantics are represented in the target language, namely the Microsoft Intermediate Language. Different tools are available to read and parse the Intermediate Language and Microsoft Phoenix is selected to be used as the main IL reader and converter, called a Semantic Extractor. The extractors convert an object-oriented language to the language independent metamodel. 
In order to search in this model, a query mechanism based on native queries was devel- oped. Native queries are a type-safe, compile-time checked, and object-oriented way to express queries directly as Java or C# methods. A number of tools, such as plugins, were created to make use of the native queries to query the model to solve some of the semantical information needs. The automatic extraction of the reified message behavior and the resource usage are some of the functionalities now added to i
  • 4. Compose by using the Semantic Analyzer. Automatically deriving semantical properties by analyzing the source code and the semantics of the source language can certainly not solve all the problems. Some intent is not present in the code but in the underlying system and getting a clear behavioral description of a function is not possible. However, the Semantic Analyzer offers developers an extensive metamodel with basic semantic related actions, control flow information, and operand data to reason about the possible intended behavior of the code.
  • 5. Acknowledgements Research and development of the Semantic Analyzer and the actual writing of this thesis was an interesting, but time consuming process. Of course there are some people I would like to thank for helping me during this time. First, my graduation committee, Lodewijk Bergmans and Pascal D¨urr, for their insights in AOP in general and Compose in particular. Their remarks, suggestions, and questions helped me a great deal in creating this work. Also the other Compose developers for their work on the whole project and the tips and tricks I received from them when hunting down bugs and trying to get LATEXto do what I wanted. Last, but certainly not least, I would like to thank my parents for making this all happen and always supporting me in my decisions. iii
  • 7. Reading Guide A short guide is presented here to help you in reading this thesis. The first three chapters are common chapters written by the Compose developers. Chapter 1 presents a general introduction to Aspect Oriented Software Development and the evolution of programming languages. The next chapter, chapter 2, provides more information about Compose , which is an implementation of the composition filters approach. If you are unfa- miliar with either AOSD or the AOP solution Compose , then read the first two chapters. Chapter 3 describes the .NET Framework, the language platform used in the implementation. Chapter 6 will present more details about the language, so for background information of the .NET Framework, read chapter 3 first. The reasons why this assignment was carried out are discussed in the motivation chapter, chap- ter 4. To learn more about semantics, take a look at chapter 5. How semantic programming constructions are represented in the .NET Intermediate Language and how this language can be accessed is described in chapter 6. The actual design of the Semantic Analyzer is described in chapter 7 and chapter 8 will explain how to use the analyzer and provides some practical examples. Finally, the evaluation and conclusions are presented in chapter 9, as are related and future work. Code examples in the C# language are created in version 2.0 of the Microsoft .NET Framework unless stated otherwise. The algorithms use a pseudo C# language for their representation. Class diagrams were created with the Class Designer of Visual Studio 2005. More information about Compose and the Semantic Analyzer can be found at the Compose project website1. The source code for the Semantic Analyzer is available in the CVS archives of SourceForge2. 1 http://composestar.sf.net/ 2 http://composestar.cvs.sourceforge.net/composestar/home/mdwvanoudheusden/code/ v
  • 8. Contents Abstract i Acknowledgements iii Reading Guide v List of Figures xi List of Tables xiii List of Listings xv List of Algorithms xvii Nomenclature xix 1 Introduction to AOSD 1 1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.2 Traditional Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 1.3 AOP Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 1.3.1 AOP Composition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 1.3.2 Aspect Weaving . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 1.3.2.1 Source Code Weaving . . . . . . . . . . . . . . . . . . . . . . . . . 5 1.3.2.2 Intermediate Language Weaving . . . . . . . . . . . . . . . . . . 6 1.3.2.3 Adapting the Virtual Machine . . . . . . . . . . . . . . . . . . . . 7 1.4 AOP Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 1.4.1 AspectJ Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 1.4.2 Hyperspaces Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 1.4.3 Composition Filters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 2 Compose 12 2.1 Evolution of Composition Filters . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 vi
  • 9. 2.2 Composition Filters in Compose . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 2.3 Demonstrating Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 2.3.1 Initial Object-Oriented Design . . . . . . . . . . . . . . . . . . . . . . . . . 15 2.3.2 Completing the Pacman Example . . . . . . . . . . . . . . . . . . . . . . . 17 2.3.2.1 Implementation of Scoring . . . . . . . . . . . . . . . . . . . . . . 17 2.3.2.2 Implementation of Dynamic Strategy . . . . . . . . . . . . . . . . 18 2.4 Compose Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 2.4.1 Integrated Development Environment . . . . . . . . . . . . . . . . . . . . . 21 2.4.2 Compile Time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 2.4.3 Adaptation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 2.4.4 Runtime . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 2.5 Platforms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22 2.5.1 Java . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22 2.5.2 C . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22 2.5.3 .NET . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22 2.6 Features Specific to Compose . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22 3 Introduction to the .NET Framework 24 3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24 3.2 Architecture of the .NET Framework . . . . . . . . . . . . . . . . . . . . . . . . . . 25 3.2.1 Version 2.0 of .NET . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26 3.3 Common Language Runtime . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27 3.3.1 Java VM vs .NET CLR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
28 3.4 Common Language Infrastructure . . . . . . . . . . . . . . . . . . . . . . . . . . . 28 3.5 Framework Class Library . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29 3.6 Common Intermediate Language . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31 4 Motivation 34 4.1 Current State of Compose /.NET . . . . . . . . . . . . . . . . . . . . . . . . . . . 34 4.1.1 Selecting Match Points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34 4.1.2 Program Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36 4.1.3 Fine Grained Join Points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37 4.2 Providing Semantical Information . . . . . . . . . . . . . . . . . . . . . . . . . . . 37 4.3 General Design Goals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37 5 Semantics 39 5.1 What is Semantics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39 5.2 Semantics of Software . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40 5.2.1 Static and Dynamic Program Analysis . . . . . . . . . . . . . . . . . . . . . 40 5.2.2 Software Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41 5.3 Semantical Statements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41 5.3.1 Value Assignment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41 5.3.2 Comparison of Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42 5.3.3 Branching Statements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42 5.3.4 Method Calling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43 5.3.5 Exception Handling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43 5.3.6 Instantiation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44 5.3.7 Type Checking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
  • 10. 5.3.8 Data Conversion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45 5.4 Program Semantics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45 5.4.1 Slicing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46 5.4.2 Method Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46 6 Analyzing the Intermediate Language 49 6.1 Inside the IL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49 6.1.1 Modules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49 6.1.2 Metadata . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51 6.1.3 Assemblies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51 6.1.4 Classes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52 6.1.5 Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53 6.1.6 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53 6.1.7 Method Body . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54 6.1.8 IL Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55 6.1.8.1 Flow Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56 6.1.8.2 Arithmetical Instructions . . . . . . . . . . . . . . . . . . . . . . . 57 6.1.8.3 Loading and Storing . . . . . . . . . . . . . . . . . . . . . . . . . . 58 6.1.8.4 Method Calling . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59 6.1.8.5 Exception Handling . . . . . . . . . . . . . . . . . . . . . . . . . . 59 6.1.9 Normalization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60 6.1.10 Custom Attributes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61 6.2 Access the IL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62 6.2.1 How to Read IL . . . 
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62 6.2.2 Reflection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64 6.2.3 Mono Cecil . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66 6.2.4 PostSharp . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67 6.2.5 RAIL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68 6.2.6 Microsoft Phoenix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68 7 Design and Implementation 73 7.1 General Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73 7.1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73 7.1.2 Design Limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74 7.1.3 Control Flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76 7.1.4 Building Blocks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77 7.1.4.1 Semantic Extractor . . . . . . . . . . . . . . . . . . . . . . . . . . . 77 7.1.4.2 Semantic Metamodel . . . . . . . . . . . . . . . . . . . . . . . . . 77 7.1.4.3 Semantic Database . . . . . . . . . . . . . . . . . . . . . . . . . . . 78 7.1.4.4 Plugins . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78 7.1.5 Guidelines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78 7.2 Semantic Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79 7.2.1 Overall Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79 7.2.2 From Instructions to Actions . . . . . . . . . . . . . . . . . . . . . . . . . . 80 7.2.3 Dealing with Operands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82 7.2.4 Type Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83 7.2.5 Model Layout . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
. . . 83 7.2.5.1 SemanticContainer . . . . . . . . . . . . . . . . . . . . . . . . . . 83
  • 11. 7.2.5.2 SemanticClass . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84 7.2.5.3 SemanticOperation . . . . . . . . . . . . . . . . . . . . . . . . . . 84 7.2.5.4 SemanticOperand and Subclasses . . . . . . . . . . . . . . . . . . 89 7.2.5.5 SemanticBlock . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90 7.2.5.6 SemanticAction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90 7.2.5.7 SemanticType . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92 7.2.5.8 SemanticAttributes . . . . . . . . . . . . . . . . . . . . . . . . . . 93 7.2.6 Flow graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93 7.3 Extracting Semantics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99 7.3.1 Semantic Extractor Class . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99 7.3.2 Mono Cecil Provider . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100 7.3.3 PostSharp Provider . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101 7.3.4 RAIL Provider . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102 7.3.5 Microsoft Phoenix Provider . . . . . . . . . . . . . . . . . . . . . . . . . . . 102 7.4 Querying the Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103 7.4.1 Semantic Database . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105 7.4.2 What to Retrieve . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105 7.4.3 Query Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106 7.4.3.1 Predicate Language . . . . . . . . . . . . . . . . . . . . . . . . . . 107 7.4.3.2 Resource Description Framework . . . . . . . . . . . . . . . . . . 107 7.4.3.3 Traverse Over Methods . . . . . . . . . . . . . . . . . . . . . . . . 107 7.4.3.4 Object Query Language . . . . . . . . . . . . . . . . . . . . . . . . 108 7.4.3.5 Simple Object Database Access . . . . . . . . . . . . . . . . 
. . . 108 7.4.3.6 Native Queries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109 7.4.3.7 Natural Language Queries . . . . . . . . . . . . . . . . . . . . . . 110 7.4.4 Native Queries in Detail . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110 8 Using the Semantic Analyzer 115 8.1 Semantic Database . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115 8.2 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118 8.3 Plugins . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119 8.3.1 ReifiedMessage Extraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119 8.3.2 Resource Usage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122 8.3.3 Export Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123 8.3.4 Natural Language . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123 8.4 Integration with Compose . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123 9 Conclusion, Related, and Future Work 126 9.1 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126 9.1.1 Microsoft Spec# . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126 9.1.2 SOUL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127 9.1.3 SetPoints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128 9.1.4 NDepend . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129 9.1.5 Formal Semantics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129 9.2 Evaluation and Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130 9.3 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134 9.3.1 Extractors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134 9.3.2 Model . . . . . . . . 
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
  • 12. 9.3.3 Graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134 9.3.4 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135 9.3.5 Integration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135 Bibliography 135 A CIL Instruction Set 142 B Evaluation Stack Types 149 C Semantic Extractor Configuration File 150 D Semantic Actions 151 E Semantic Types 153 F SEMTEX Generated File 154
List of Figures

1.1 Dates and ancestry of several important languages
2.1 Components of the composition filters model
2.2 UML class diagram of the object-oriented Pacman game
2.3 Overview of the Compose* architecture
3.1 Context of the .NET framework
3.2 Relationships in the CTS
3.3 Main components of the CLI and their relationships
3.4 From source code to machine code
6.1 Structure of a managed executable module
6.2 Assembly containing multiple files
6.3 Different kinds of methods
6.4 Method body structure
6.5 Graphical User Interface of ILDASM
6.6 Lutz Roeder's Reflector
6.7 Microsoft FxCop
6.8 Platform of Phoenix
6.9 Control flow graph in Phoenix
7.1 Process overview
7.2 Control flow of the console application with a plugin
7.3 Loop represented as blocks
7.4 Structure of the metamodel
7.5 SemanticItem and direct derived classes
7.6 SemanticUnit and child classes
7.7 Visitor pattern in the model
7.8 SemanticContainer and SemanticClass classes
7.9 SemanticOperation class
7.10 SemanticOperand class and derived classes
7.11 SemanticBlock class with collection of SemanticAction
7.12 The SemanticAction class with supporting types
7.13 SemanticType class
7.14 SemanticAttribute class
7.15 The flow classes
7.16 Semantic Extractor Classes
7.17 SemanticDatabaseContainer class
7.18 ExtendedList class
8.1 Windows Forms Semantic Analyzer application
8.2 Plugin interface
List of Tables

5.1 Comparison operators
6.1 Arithmetical operations in IL
6.2 Bitwise and shift operations in IL
6.3 Phoenix unit hierarchy
6.4 Phoenix instruction forms
7.1 Assembly naming
A.1 CIL instruction set
B.1 Evaluation Stack types
D.1 Available semantic action kinds
E.1 Available semantic common types
Listings

1.1 Modeling addition, display, and logging without using aspects
    (a) Addition
    (b) CalcDisplay
1.2 Modeling addition, display, and logging with aspects
    (a) Addition concern
    (b) Tracing concern
1.3 Example of dynamic crosscutting in AspectJ
1.4 Example of static crosscutting in AspectJ
1.5 Creation of a hyperspace
1.6 Specification of concern mappings
1.7 Defining a hypermodule
2.1 Abstract concern template
2.2 DynamicScoring concern in Compose*
2.3 Implementation of class Score
2.4 DynamicStrategy concern in Compose*
3.1 Adding example in IL code
3.2 Adding example in the C# language
3.3 Adding example in the VB.NET language
4.1 Getter and Setter examples in C# .NET
5.1 Assignment examples in C#
5.2 Comparison examples in C#
5.3 Exception handling example in C#
5.4 Method AssignWhenMoreThenOne in C# .NET
5.5 Method AssignWhenMoreThenOne in VB .NET
5.6 Method AssignWhenMoreThenOne in Borland Delphi
5.7 Method AssignWhenMoreThenOne in Common Intermediate Language
6.1 Syntax of a class definition
6.2 Syntax of a field definition
6.3 Example of a field definition
6.4 Syntax of a field definition with default value
6.5 Example of a field definition with default value
6.6 Syntax of a method definition
6.7 Control flow examples
6.8 Constant loading instructions
6.9 Condition check followed by a branching instruction
6.10 Method call example
6.11 Exception handling in label form
6.12 Exception handling in label form example
6.13 Exception handling in scope form
6.14 Stack based expression in the Common Intermediate Language
6.15 Custom attribute syntax in IL
6.16 Custom attribute example in IL
6.17 Example of a custom attribute in C#
6.18 Reflection example
6.19 Cecil get types example
6.20 PostSharp get body instruction
6.21 Phoenix phase execute example
7.1 Calling the SemanticExtractor
7.2 For loop in C#.NET
7.3 Expression in C#.NET
7.4 Expression in IL code
7.5 Semantical representation of the expression
7.6 Part of the Cecil Instruction Visitor
7.7 Using the instruction stream in PostSharp
7.8 Loading the assembly using Phoenix
7.9 Starting a phase for a function using the Phoenix Extractor
7.10 SODA example in C#.NET
7.11 LINQ query example in C#.NET
7.12 LINQ query example in Java
7.13 LINQ query example in C#.NET 1.1
7.14 LINQ query examples
    (a) query expressions
    (b) lambda expressions
7.15 Query function signature
7.16 Predicate matching
7.17 Return all distinct operations containing actions assigning a value to a field named "name"
8.1 Search for all call actions and sort by operation name
8.2 Find all operations using a field named value as their destination operand
8.3 Group jump labels by operation name
8.4 Retrieve all the assignments where an integer is used
8.5 Find operations using a ReifiedMessage
8.6 Find the argument using a ReifiedMessage
8.7 Retrieve all the calls to methods of the ReifiedMessage argument
8.8 Retrieve other methods which should be analyzed after a proceed call
9.1 Selecting getters in SOUL using Prolog
9.2 Examples of CQL queries
C.1 Contents of the app.config file
F.1 Part of the SEMTEX file for the pacman example
List of Algorithms

1 GenerateSemanticControlFlow
2 Connect flow edges
3 Start DetermineControlDependency
4 DetermineControlDependency
5 Determine Flow Paths
6 Determine Access Levels
7 Optimization of Semantic Blocks
Nomenclature

AOP Aspect-Oriented Programming
API Application Programming Interface
AST Abstract Syntax Tree
BLOB Binary Large Object
CF Composition Filters
CIL Common Intermediate Language
CLI Common Language Infrastructure
CLR Common Language Runtime
CLS Common Language Specification
CQL Code Query Language
CTS Common Type System
FCL Framework Class Library
GUI Graphical User Interface
IL Intermediate Language
JIT Just-in-time
JVM Java Virtual Machine
MLI Meta Level Interface
OOP Object-Oriented Programming
OpCode Operation Code
OQL Object Query Language
PDA Personal Digital Assistant
RDF Resource Description Framework
SDK Software Development Kit
SOAP Simple Object Access Protocol
SODA Simple Object Database Access
SQL Structured Query Language
UML Unified Modeling Language
URI Uniform Resource Identifier
VM Virtual Machine
WSDL Web Services Description Language
XML eXtensible Markup Language
CHAPTER 1

Introduction to AOSD

The first two chapters were originally written by seven M.Sc. students [39, 24, 80, 11, 72, 37, 10] at the University of Twente. The chapters have been rewritten for use in the following theses [79, 16, 75, 42, 23, 41, 71]. They serve as a general introduction to Aspect-Oriented Software Development and Compose* in particular.

1.1 Introduction

The goal of software engineering is to solve a problem by implementing a software system. The things of interest are called concerns. They exist at every level of the engineering process. A recurrent theme in engineering is that of modularization: the separation and localization of concerns. The goal of modularization is to create maintainable and reusable software. A programming language is used to implement concerns.

Fifteen years ago the dominant programming language paradigm was procedural programming. This paradigm is characterized by the use of statements that update state variables. Examples are Algol-like languages such as Pascal, C, and Fortran.

Other programming paradigms are the functional, logic, object-oriented, and aspect-oriented paradigms. Figure 1.1 summarizes the dates and ancestry of several important languages [83]. Every paradigm uses a different modularization mechanism for separating concerns into modules.

Functional languages try to solve problems without resorting to variables. These languages are entirely based on functions over lists and trees. Lisp and Miranda are examples of functional languages.

A logic language is based on a subset of mathematical logic. The computer is programmed to infer relationships between values, rather than to compute output values from input values. Prolog is currently the most used logic language [83].
Figure 1.1: Dates and ancestry of several important languages

A shortcoming of procedural programming is that global variables can potentially be accessed and updated by any part of the program. This can result in unmanageable programs because no module that accesses a global variable can be understood independently from other modules that also access that global variable.

The Object-Oriented Programming (OOP) paradigm improves modularity by encapsulating data with methods inside objects. The data may only be accessed indirectly, by calling the associated methods. Although the concept appeared in the seventies, it took twenty years to become popular [83]. The most well-known object-oriented languages are C++, Java, C#, and Smalltalk.

The hard part about object-oriented design is decomposing a system into objects. The task is difficult because many factors come into play: encapsulation, granularity, dependency, adaptability, reusability, and others. They all influence the decomposition, often in conflicting ways [31].

Existing modularization mechanisms typically support only a small set of decompositions and usually only a single dominant modularization at a time. This is known as the tyranny of the dominant decomposition [74]. A specific decomposition limits the ability to implement other concerns in a modular way. For example, OOP modularizes concerns in classes and only fixed relations are possible. Implementing a concern in a class might prevent another concern from being implemented as a class. Aspect-Oriented Programming (AOP) is a paradigm that solves this problem.
AOP is commonly used in combination with OOP but can be applied to other paradigms as well. The following sections introduce an example to demonstrate the problems that may arise with OOP and show how AOP can solve this. Finally, we look at three particular AOP methodologies in more detail.

2 Automatic Derivation of Semantic Properties in .NET
1 public class Add extends Calculation{
2
3   private int result;
4   private CalcDisplay calcDisplay;
5   private Tracer trace;
6
7   Add() {
8     result = 0;
9     calcDisplay = new CalcDisplay();
10    trace = new Tracer();
11  }
12
13  public void execute(int a, int b) {
14    trace.write("void Add.execute(int, int)");
15    result = a + b;
16    calcDisplay.update(result);
17  }
18
19  public int getLastResult() {
20    trace.write("int Add.getLastResult()");
21    return result;
22  }
23 }
(a) Addition

1 public class CalcDisplay {
2   private Tracer trace;
3
4   public CalcDisplay() {
5     trace = new Tracer();
6   }
7
8   public void update(int value){
9     trace.write("void CalcDisplay.update(int)");
10    System.out.println("Printing new value of calculation: "+value);
11  }
12 }
(b) CalcDisplay

Listing 1.1: Modeling addition, display, and logging without using aspects

1.2 Traditional Approach

Consider an application containing an object Add and an object CalcDisplay. Add inherits from the abstract class Calculation and implements its method execute(a, b). It performs the addition of two integers. CalcDisplay receives an update from Add if a calculation is finished and prints the result to screen. Suppose all method calls need to be traced. The objects use a Tracer object to write messages about the program execution to screen. This is implemented by a method called write. Three concerns can be recognized: addition, display, and tracing. The implementation might look something like Listing 1.1.

From our example, we recognize two forms of crosscutting: code tangling and code scattering. The addition and display concerns are implemented in the classes Add and CalcDisplay respectively. Tracing is implemented in the class Tracer, but also contains code in the other two classes (lines 5, 10, 14, and 20 in (a) and 2, 5, and 9 in (b)). If a concern is implemented across several classes it is said to be scattered. In the example of Listing 1.1 the tracing concern is scattered.
Usually a scattered concern involves code replication. That is, the same code is implemented a number of times. In our example the classes Add and CalcDisplay contain similar tracing code. In class Add the code for the addition and tracing concerns is intermixed. In class CalcDisplay the code for the display and tracing concerns is intermixed. If more than one concern is implemented in a single class they are said to be tangled. In our example the addition and tracing concerns are tangled. The display and tracing concerns are also tangled.

M.D.W. van Oudheusden 3

Crosscutting code has the following consequences:

Code is difficult to change
Changing a scattered concern requires us to modify the code in several places. Making modifications to a tangled concern class requires checking for side-effects with all existing crosscutting concerns;

Code is harder to reuse
To reuse an object in another system, it is necessary to either remove the tracing code or reuse the (same) tracer object in the new system;

Code is harder to understand
Tangled code makes it difficult to see which code belongs to which concern.

1.3 AOP Approach

To solve the problems with crosscutting, several techniques are being researched that attempt to increase the expressiveness of the OO paradigm. Aspect-Oriented Programming (AOP) introduces a modular structure, the aspect, to capture the location and behavior of crosscutting concerns. Examples of aspect-oriented languages are Sina, AspectJ, Hyper/J, and Compose*. A special syntax is used to specify aspects and the way in which they are combined with regular objects. The fundamental goals of AOP are twofold [34]: first, to provide a mechanism to express concerns that crosscut other components; second, to use this description to allow for the separation of concerns.

Join points are well-defined places in the structure or execution flow of a program where additional behavior can be attached. The most common join points are method calls. Pointcuts describe a set of join points. This allows us to execute behavior at many places in a program with one expression. Advice is the behavior executed at a join point.
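The relation between these three concepts can be made concrete: a join point is an event such as a method call, a pointcut is a predicate selecting a set of join points, and advice is behavior attached to the selected set. The sketch below is purely illustrative (the class and method names are invented, and real AOP systems do not represent join points as strings); it only shows the selection mechanism in plain Java:

```java
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Hypothetical model: a join point reduced to a method signature.
class JoinPoint {
    final String signature;
    JoinPoint(String signature) { this.signature = signature; }
}

class PointcutDemo {
    public static void main(String[] args) {
        // Join points: well-defined places in the program, here method calls.
        List<JoinPoint> joinPoints = List.of(
                new JoinPoint("void Add.execute(int, int)"),
                new JoinPoint("void CalcDisplay.update(int)"),
                new JoinPoint("String Object.toString()"));

        // A pointcut describes a set of join points with one expression.
        Predicate<JoinPoint> tracedCalls =
                jp -> jp.signature.contains("Add.") || jp.signature.contains("CalcDisplay.");

        // Advice: the behavior executed at each selected join point.
        Consumer<JoinPoint> beforeAdvice = jp -> System.out.println("trace: " + jp.signature);

        joinPoints.stream().filter(tracedCalls).forEach(beforeAdvice);
    }
}
```

Only the first two join points match the pointcut, so the advice runs twice and Object.toString() is left untouched.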
In the example of Listing 1.2 the class Add does not contain any tracing code and only implements the addition concern. Class CalcDisplay also does not contain tracing code. In our example the tracing aspect contains all the tracing code. The pointcut tracedCalls specifies at which locations tracing code is executed.

The crosscutting concern is explicitly captured in aspects instead of being embedded within the code of other objects. This has several advantages over the previous code.

Aspect code can be changed
Changing aspect code does not influence other concerns;

Aspect code can be reused
The coupling of aspects is done by defining pointcuts. In theory, this low coupling allows for reuse. In practice reuse is still difficult;

Aspect code is easier to understand
A concern can be understood independently of other concerns;

Aspect pluggability
Enabling or disabling concerns becomes possible.
1 public class Add extends Calculation{
2   private int result;
3   private CalcDisplay calcDisplay;
4
5   Add() {
6     result = 0;
7     calcDisplay = new CalcDisplay();
8   }
9
10  public void execute(int a, int b) {
11    result = a + b;
12    calcDisplay.update(result);
13  }
14
15  public int getLastResult() {
16    return result;
17  }
18 }
(a) Addition concern

1 aspect Tracing {
2   Tracer trace = new Tracer();
3
4   pointcut tracedCalls():
5     call(* (Calculation+).*(..)) ||
6     call(* CalcDisplay.*(..));
7
8   before(): tracedCalls() {
9     trace.write(thisJoinPoint.getSignature().toString());
10  }
11 }
(b) Tracing concern

Listing 1.2: Modeling addition, display, and logging with aspects

1.3.1 AOP Composition

AOP composition can be either symmetric or asymmetric. In the symmetric approach every component can be composed with any other component. This approach is followed by, e.g., Hyper/J. In the asymmetric approach, the base program and aspects are distinguished. The base program is composed with the aspects. This approach is followed by, e.g., AspectJ (covered in more detail in the next section).

1.3.2 Aspect Weaving

The integration of components and aspects is called aspect weaving. There are three approaches to aspect weaving. The first and second approach rely on adding behavior to the program, either by weaving the aspect in the source code, or by weaving directly in the target language. The target language can be intermediate language (IL) or machine code. Examples of IL are Java byte code and Common Intermediate Language (CIL). The remainder of this chapter considers only intermediate language targets. The third approach relies on adapting the virtual machine. Each method is explained briefly in the following sections.

1.3.2.1 Source Code Weaving

The source code weaver combines the original source with aspect code. It interprets the defined aspects and combines them with the original source, generating input for the native compiler.
For the native compiler there is no difference between source code with and without aspects.
Hereafter the compiler generates an intermediate or machine language output (depending on the compiler type). The advantages of using source code weaving are:

High-level source modification
Since all modifications are done at source code level, there is no need to know the target (output) language of the native compiler;

Aspect and original source optimization
First the aspects are woven into the source code, which is then compiled by the native compiler. The produced target language has all the benefits of the native compiler optimization passes. However, optimizations specific to exploiting aspect knowledge are not possible;

Native compiler portability
The native compiler can be replaced by any other compiler as long as it has the same input language. Replacing the compiler with a newer version or another target language can be done with little or no modification to the aspect weaver.

However, the drawbacks of source code weaving are:

Language dependency
Source code weaving is written explicitly for the syntax of the input language;

Limited expressiveness
Aspects are limited to the expressive power of the source language. For example, when using source code weaving, it is not possible to add multiple inheritance to a single inheritance language.

1.3.2.2 Intermediate Language Weaving

Weaving aspects through an intermediate language gives more control over the executable program and solves some issues identified in subsubsection 1.3.2.1 on source code weaving. Weaving at this level allows for creating combinations of intermediate language constructs that cannot be expressed at the source code level.
Although IL can be hard to understand, IL weaving has several advantages over source code weaving:

Programming language independence
All compilers generating the target IL output can be used;

More expressiveness
It is possible to create IL constructs that are not possible in the original programming language;

Source code independence
Aspects can be added to programs and libraries without using the source code (which may not be available);

Adding aspects at load- or runtime
A special class loader or runtime environment can decide and do dynamic weaving. The aspect weaver adds a runtime environment into the program. How and when aspects can be added to the program depends on the implementation of the runtime environment.

However, IL weaving also has drawbacks that do not exist for source code weaving:
Hard to understand
Specific knowledge about the IL is needed;

More error-prone
Compiler optimization may cause unexpected results. The compiler can remove code in a way that breaks the attached aspect (e.g., inlining of methods).

1.3.2.3 Adapting the Virtual Machine

Adapting the virtual machine (VM) removes the need to weave aspects. This technique has the same advantages as intermediate language weaving and can also overcome some of its disadvantages as mentioned in subsubsection 1.3.2.2. Aspects can be added without recompilation, redeployment, and restart of the application [63, 64]. Modifying the virtual machine also has its disadvantages:

Dependency on adapted virtual machines
Using an adapted virtual machine requires that every system be upgraded to that version;

Virtual machine optimization
People have spent a lot of time optimizing virtual machines. By modifying the virtual machine, these optimizations have to be revisited. Reintegrating changes introduced by newer versions of the original virtual machine might have substantial impact.

1.4 AOP Solutions

As the concept of AOP has been embraced as a useful extension to classic programming, different AOP solutions have been developed. Each solution has one or more implementations to demonstrate how the solution is to be used. As described by [26], these differ primarily in:

How aspects are specified
Each technique uses its own aspect language to describe the concerns;

Composition mechanism
Each technique provides its own composition mechanisms;

Implementation mechanism
Whether components are determined statically at compile time or dynamically at run time, the support for verification of compositions, and the type of weaving.
Use of decoupling
Should the writer of the main code be aware that aspects are applied to his code;

Supported software processes
The overall process, techniques for reusability, analyzing the performance of aspects, whether it is possible to monitor performance, and whether it is possible to debug the aspects.

This section will give a short introduction to AspectJ [46] and Hyperspaces [62], which together with Composition Filters [8] are three main AOP approaches.
1 aspect DynamicCrosscuttingExample {
2   Log log = new Log();
3
4   pointcut traceMethods():
5     execution(* edu.utwente.trese.*.*(..));
6
7   before() : traceMethods() {
8     log.write("Entering " + thisJoinPoint.getSignature());
9   }
10
11  after() : traceMethods() {
12    log.write("Exiting " + thisJoinPoint.getSignature());
13  }
14 }

Listing 1.3: Example of dynamic crosscutting in AspectJ

1.4.1 AspectJ Approach

AspectJ [46] is an aspect-oriented extension to the Java programming language. It is probably the most popular approach to AOP at the moment, and it is finding its way into industrial software development. AspectJ was developed by Gregor Kiczales at Xerox's PARC (Palo Alto Research Center). To encourage the growth of the AspectJ technology and community, PARC transferred AspectJ to an open Eclipse project. The popularity of AspectJ comes partly from the various extensions based on it, built by several research groups. There are various projects that are porting AspectJ to other languages, resulting in tools such as AspectR and AspectC.

One of the main goals in the design of AspectJ is to make it a compatible extension to Java. AspectJ tries to be compatible in four ways:

Upward compatibility
All legal Java programs must be legal AspectJ programs;

Platform compatibility
All legal AspectJ programs must run on standard Java virtual machines;

Tool compatibility
It must be possible to extend existing tools to support AspectJ in a natural way; this includes IDEs, documentation tools, and design tools;

Programmer compatibility
Programming with AspectJ must feel like a natural extension of programming with Java.

AspectJ extends Java with support for two kinds of crosscutting functionality. The first allows defining additional behavior to run at certain well-defined points in the execution of the program and is called the dynamic crosscutting mechanism.
The other is called the static crosscutting mechanism and allows modifying the static structure of classes (methods and relationships between classes). The units of crosscutting implementation are called aspects. An example of an aspect specified in AspectJ is shown in Listing 1.3.

The points in the execution of a program where the crosscutting behavior is inserted are called join points. A pointcut has a set of join points. In Listing 1.3, traceMethods is an example of a pointcut definition. The pointcut includes all executions of any method that is in a class contained by the package edu.utwente.trese.
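To see what the before and after advice of Listing 1.3 amount to at a single join point, consider what a weaver conceptually produces: the advice code is wrapped around the original method body. The sketch below is a hand-written plain-Java approximation, not AspectJ output; the Game class, its move method, and the simplified Log are invented for illustration:

```java
// Simplified stand-in for the thesis' Log: collects trace messages.
class Log {
    private final StringBuilder out = new StringBuilder();
    void write(String msg) { out.append(msg).append('\n'); }
    String contents() { return out.toString(); }
}

class Game {
    static final Log log = new Log();

    // The original, unwoven method body (a join point selected by traceMethods).
    private void moveBody() { /* game logic would go here */ }

    // What the weaver conceptually produces: advice surrounding the body.
    void move() {
        log.write("Entering void Game.move()"); // before advice
        moveBody();                             // original join point
        log.write("Exiting void Game.move()");  // after advice
    }

    public static void main(String[] args) {
        new Game().move();
        System.out.print(log.contents());
    }
}
```

The point of AOP is that this wrapping is not written by hand for every method, as above, but generated once from the pointcut and advice declarations.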
1 aspect StaticCrosscuttingExample {
2   private void Log.trace(String traceMsg) {
3     Log.write(" --- MARK --- " + traceMsg);
4   }
5 }

Listing 1.4: Example of static crosscutting in AspectJ

The code that should execute at a given join point is declared in an advice. Advice is a method-like code body associated with a certain pointcut. AspectJ supports before, after, and around advice that specifies where the additional code is to be inserted. In the example both before and after advice are declared to run at the join points specified by the traceMethods pointcut.

Aspects can contain anything permitted in class declarations, including definitions of pointcuts, advice, and static crosscutting. For example, static crosscutting allows a programmer to add fields and methods to certain classes, as shown in Listing 1.4. The shown construct is called an inter-type member declaration and adds a method trace to the class Log. Other forms of inter-type declarations allow developers to declare the parents of classes (superclasses and realized interfaces), declare where exceptions need to be thrown, and define the precedence among aspects.

With its variety of possibilities AspectJ can be considered a useful approach for realizing software requirements.

1.4.2 Hyperspaces Approach

The Hyperspaces approach was developed by H. Ossher and P. Tarr at the IBM T.J. Watson Research Center. The Hyperspaces approach adopts the principle of multi-dimensional separation of concerns [62], which involves:

• Multiple, arbitrary dimensions of concerns;
• Simultaneous separation along these dimensions;
• The ability to dynamically handle new concerns and new dimensions of concern as they arise throughout the software life cycle;
• Overlapping and interacting concerns. It is appealing to think of many concerns as independent or orthogonal, but they rarely are in practice.

We explain the Hyperspaces approach by an example written in the Hyper/J language.
Hyper/J is an implementation of the Hyperspaces approach for Java. It provides the ability to identify concerns, specify modules in terms of those concerns, and synthesize systems and components by integrating those modules. Hyper/J uses bytecode weaving on binary Java class files and generates new class files to be used for execution. Although the Hyper/J project seems abandoned and there has not been any update in the code or documentation for a while, we still mention it because the Hyperspaces approach offers a unique AOP solution.

As a first step, developers create hyperspaces by specifying a set of Java class files that contain the code units that populate the hyperspace. To do this, you create a hyperspace specification, as demonstrated in Listing 1.5.
1 Hyperspace Pacman
2   class edu.utwente.trese.pacman.*;
Listing 1.5: Creation of a hyperspace

Hyper/J will automatically create a hyperspace with one dimension—the class file dimension. A dimension of concern is a set of concerns that are disjoint. The initial hyperspace will contain all units within the specified package. To create a new dimension, you can specify concern mappings, which describe how existing units in the hyperspace relate to concerns in that dimension, as demonstrated in Listing 1.6.

The first line indicates that, by default, all of the units contained within the package edu.utwente.trese.pacman address the kernel concern of the feature dimension. The other mappings specify that any method named trace or debug addresses the logging or debugging concern, respectively. These latter mappings override the first one.

Hypermodules are based on concerns and consist of two parts. The first part specifies a set of hyperslices in terms of the concerns identified in the concern matrix. The second part specifies the integration relationships between the hyperslices. A hyperspace can contain several hypermodules realizing different modularizations of the same units. Systems can be composed in many ways from these hypermodules.

Listing 1.7 shows a hypermodule with two concerns, kernel and logging. They are related by a mergeByName integration relationship. This means that units in the different concerns correspond if they have the same name (ByName) and that these corresponding units are to be combined (merge). For example, all members of the corresponding classes are brought together into the composed class. The hypermodule results in a hyperslice that contains all the classes without the debugging feature; thus no debug methods will be present.
The most important feature of the Hyperspaces approach is the support for on-demand remodularisation: the ability to extract hyperslices to encapsulate concerns that were not separated in the original code. This makes hyperspaces especially useful for the evolution of existing software.

1.4.3 Composition Filters

Composition Filters was developed by M. Akşit and L. Bergmans at the TRESE group, which is part of the Department of Computer Science of the University of Twente, The Netherlands. The composition filters (CF) model predates aspect-oriented programming. It started out as an extension to the object-oriented model and evolved into an aspect-oriented model. The current implementation of CF is Compose*, which covers .NET, Java, and C.

1 package edu.utwente.trese.pacman: Feature.Kernel
2 operation trace: Feature.Logging
3 operation debug: Feature.Debugging
Listing 1.6: Specification of concern mappings
1 hypermodule Pacman_Without_Debugging
2   hyperslices: Feature.Kernel, Feature.Logging;
3   relationships: mergeByName;
4 end hypermodule;
Listing 1.7: Defining a hypermodule

One of the key elements of CF is the message: the interaction between objects, for instance a method call. In object-oriented programming the message is considered an abstract concept. In the implementations of CF it is therefore necessary to reify the message. This reified message contains properties, such as where it is sent to and where it came from.

The concept of CF is that messages that enter and exit an object can be intercepted and manipulated, modifying the original flow of the message. To do so, a layer called the interface part is introduced in the CF model; this layer can have several properties. The interface part can be placed on an object whose behavior needs to be altered; this object is referred to as the inner object.

There are three key elements in CF: messages, filters, and superimposition. Messages are sent from one object to another; if there is an interface part placed on the receiver, then the message that is sent goes through the input filters. In the filters the message can be manipulated before it reaches the inner part; the message can even be sent to another object. How the message will be handled depends on the filter type. An output filter is similar to an input filter; the only difference is that it manipulates messages that originate from the inner part. The latest addition to CF is superimposition, which is used to specify which interfaces need to be superimposed on which inner objects.
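To make the reified-message idea concrete, the following sketch models a message with a sender, a target, and a selector, and shows how a dispatch-style input filter could substitute the target before the message reaches the inner object. All class and method names here are hypothetical illustrations; this is not the actual Compose* runtime API.

```java
// Hypothetical sketch of message reification; not the actual Compose* API.
public class ReifiedMessageSketch {

    // A reified message records where it came from, where it is sent to,
    // and which method (the selector) is being invoked.
    public static class Message {
        public Object sender;
        public Object target;
        public String selector;

        public Message(Object sender, Object target, String selector) {
            this.sender = sender;
            this.target = target;
            this.selector = selector;
        }
    }

    // A dispatch-style filter: when the selector matches, substitute the
    // target so the message is dispatched elsewhere; otherwise leave the
    // message untouched so it continues to the next filter.
    public static Message dispatchFilter(Message m, String selector, Object newTarget) {
        if (m.selector.equals(selector)) {
            m.target = newTarget;
        }
        return m;
    }

    public static void main(String[] args) {
        Message m = new Message("ghost", "RandomStrategy", "getNextMove");
        dispatchFilter(m, "getNextMove", "StalkerStrategy");
        System.out.println(m.target); // prints StalkerStrategy
    }
}
```

The point of the sketch is only that a message, once reified as an object, can be inspected and redirected by a filter, which is exactly what the input and output filters of the CF model do.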
CHAPTER 2

Compose*

Compose* is an implementation of the composition filters approach. There are three target environments: .NET, Java, and C. This chapter is organized as follows: first, the evolution of Composition Filters and its implementations is described, followed by an explanation of the Compose* language and a demonstrating example. In the third section, the Compose* architecture is explained, followed by a description of the features specific to Compose*.

2.1 Evolution of Composition Filters

Compose* is the result of many years of research and experimentation. The following time line gives an overview of what has been done in the years before and during the Compose* project.

1985 The first version of Sina is developed by Mehmet Akşit. This version of Sina contains a preliminary version of the composition filters concept called semantic networks. The semantic network construction serves as an extension to objects, such as classes, messages, or instances. These objects can be configured to form other objects such as classes from which instances can be created. The object manager takes care of synchronization and message processing of an object. The semantic network construction can express key concepts like delegation, reflection, and synchronization [47].

1987 Together with Anand Tripathi of the University of Minnesota, the Sina language is further developed. The semantic network approach is replaced by declarative specifications and the interface predicate construct is added.

1991 The interface predicates are replaced by the dispatch filter, and the wait filter manages the synchronization functions of the object manager. Message reflection and real-time specifications are handled by the meta filter and the real-time filter [7].

1995 The Sina language with Composition Filters is implemented using Smalltalk [47]. The implementation supports most of the filter types. In the same year, a preprocessor
providing C++ with support for Composition Filters is implemented [33].

1999 The composition filters language ComposeJ [85] is developed and implemented. The implementation consists of a preprocessor capable of translating composition filter specifications into the Java language.

2001 ConcernJ is implemented as part of an M.Sc. thesis [70]. ConcernJ adds the notion of superimposition to Composition Filters. This allows for reuse of the filter modules and facilitates crosscutting concerns.

2003 The start of the Compose* project, which is described in further detail in this chapter.

2004 The first release of Compose*, based on .NET.

2005 The start of the Java port of Compose*.

2006 Porting Compose* to C is started.

2.2 Composition Filters in Compose*

A Compose* application consists of concerns that can be divided into three parts: filter module specifications, superimposition, and implementation. A filter module contains the filter logic to filter on messages incoming to or outgoing from the superimposed object. A message has a target, which is an object reference, and a selector, which is a method name. The superimposition part specifies which filter modules, annotations, conditions, and methods need to be superimposed on which objects. The implementation part contains the class implementation of the concern. How these parts are placed in a concern is shown in Listing 2.1.

1 filtermodule {
2   internals
3   externals
4   conditions
5   inputfilters
6   outputfilters
7 }
8
9 superimposition {
10   selectors
11   filtermodules
12   annotations
13   constraints
14 }
15
16 implementation
17 }
Listing 2.1: Abstract concern template

The working of the filter module is shown in Figure 2.1. A filter module can contain input and output filters. The difference between these two sets of filters is that the first is used to filter on incoming messages, while the second is used to filter on outgoing messages.
A return of a method is not considered an outgoing message. A filter has three parts: the filter identifier, the filter type, and one or more filter elements. A filter element consists of an optional condition part, a matching part, and a substitution part. These parts are shown below:
Figure 2.1: Components of the composition filters model

stalker_filter : Dispatch = { !pacmanIsEvil => [*.getNextMove] stalk_strategy.getNextMove }

Here, stalker_filter is the identifier, Dispatch is the filter type, !pacmanIsEvil is the condition part, [*.getNextMove] is the matching part, and stalk_strategy.getNextMove is the substitution part.

The filter identifier is the unique name for a filter in a filter module. A filter matches when both the condition part and the matching part evaluate to true. The demonstrated filter matches on every message where the selector is getNextMove; the '*' in the target means that every target matches. When the condition part and the matching part are true, the message is substituted with the values of the substitution part. How these values are substituted and how the message continues depend on the filter type. At the moment there are four basic filter types in Compose*; it is also possible to write custom filter types.

Dispatch If the message is accepted, it is dispatched to the specified target of the message; otherwise the message continues to the subsequent filter. This filter type can only be used for input filters;

Send If the message is accepted, it is sent to the specified target of the message; otherwise the message continues to the subsequent filter. This filter type can only be used for output filters;

Error If the filter rejects the message, it raises an exception; otherwise the message continues to the next filter in the set;

Meta If the message is accepted, the message is sent as a parameter of another meta message to an internal or external object; otherwise the message just continues to the next filter. The object that receives the meta message can observe and manipulate the message and can re-activate its execution.
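The accept/reject semantics of a single filter element can be sketched in plain Java: a message is accepted only when both the condition part and the matching part hold, in which case the substitution part determines where the message is dispatched. The method and string names below are illustrative only, not Compose* internals.

```java
// Illustrative evaluation of one filter element; not Compose* internals.
public class FilterElementSketch {

    // Models: stalker_filter : Dispatch = { !pacmanIsEvil =>
    //           [*.getNextMove] stalk_strategy.getNextMove }
    // Returns the substituted dispatch target on accept, or null on reject
    // (in which case the message would continue to the next filter).
    public static String evaluate(boolean pacmanIsEvil, String selector) {
        boolean condition = !pacmanIsEvil;                 // condition part
        boolean matches = selector.equals("getNextMove");  // matching part; the '*' target matches any target
        if (condition && matches) {
            return "stalk_strategy.getNextMove";           // substitution part
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(evaluate(false, "getNextMove")); // accepted: substituted
        System.out.println(evaluate(true, "getNextMove"));  // rejected: condition fails
    }
}
```

A Dispatch filter would then send the message to the substituted target on accept, whereas an Error filter would raise an exception on reject; the evaluation of condition and matching parts is the same for every filter type.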
The pacmanIsEvil condition used in the condition part must be declared in the conditions section of a filter module. The targets used in a filter must be declared as internals or externals. Internals are objects that are unique for each instance of a filter module, while externals are shared between filter modules.

Filter modules can be superimposed on classes with a filter module binding; this binding has a selection of objects on one side and a filter module on the other side. The selection is defined with a selector definition. The selector uses predicates, such as isClassWithNameInList, isNamespaceWithName, and namespaceHasClass, to select objects. It is also possible to bind conditions, methods, and annotations to classes with the use of superimposition.

The last part of the concern is the implementation part. In the implementation part we can define the object behavior of the concern; for example, in a logging concern, we can define specific log functions.

2.3 Demonstrating Example

To illustrate the Compose* toolset, this section introduces a Pacman example. The Pacman game is a classic arcade game in which the user, represented by pacman, moves in a maze to eat vitamins. Meanwhile, a number of ghosts try to catch and eat pacman. There are, however, four mega vitamins in the maze that make pacman evil. In its evil state, pacman can eat ghosts. A simple list of requirements for the Pacman game is briefly discussed here:

• A number of lives should be taken from pacman when he is eaten by a ghost;
• A game should end when pacman has no more lives;
• The score of a game should increase when pacman eats a vitamin or a ghost;
• A user should be able to use a keyboard to move pacman around the maze;
• Ghosts should know whether pacman is evil or not;
• Ghosts should know where pacman is located;
• Ghosts should, depending on the state of pacman, hunt or flee from pacman.
2.3.1 Initial Object-Oriented Design

Figure 2.2 shows an initial object-oriented design for the Pacman game. Note that this UML class diagram does not show the trivial accessors. The classes in this diagram are:

Game This class encapsulates the control flow and controls the state of a game;

Ghost This class is a representation of a ghost chasing pacman. Its main attribute is a property that indicates whether it is scared or not (depending on the evil state of pacman);

GhostView This class is responsible for painting ghosts;

Glyph This is the superclass of all mobile objects (pacman and ghosts). It contains common information like direction and speed;

Keyboard This class accepts all keyboard input and makes it available to pacman;
Figure 2.2: UML class diagram of the object-oriented Pacman game
Main This is the entry point of a game;

Pacman This is a representation of the user-controlled element in the game. Its main attribute is a property that indicates whether pacman is evil or not;

PacmanView This class is responsible for painting pacman;

RandomStrategy By using this strategy, ghosts move in random directions;

View This class is responsible for painting a maze;

World This class has all the information about a maze. It knows where the vitamins, mega vitamins and, most importantly, the walls are. Every class derived from class Glyph checks whether movement in the desired direction is possible.

2.3.2 Completing the Pacman Example

The initial object-oriented design, described in the previous section, does not implement all the stated system requirements. The missing requirements are:

• The application does not maintain a score for the user;
• Ghosts move in random directions instead of chasing or fleeing from pacman.

In the next sections, we describe why and how to implement these requirements in the Compose* language.

2.3.2.1 Implementation of Scoring

The first system requirement that we need to add to the existing Pacman game is scoring. This concern involves a number of events. First, the score should be set to zero when a game starts. Second, the score should be updated whenever pacman eats a vitamin, mega vitamin, or ghost. And finally, the score itself has to be painted on the maze canvas to relay it back to the user. These events are scattered over multiple classes: Game (initializing the score), World (updating the score), and Main (painting the score). Thus scoring is an example of a crosscutting concern.

To implement scoring in the Compose* language, we divide the implementation into two parts. The first part is a Compose* concern definition stating which filter modules to superimpose. Listing 2.2 shows an example Compose* concern definition for scoring. This concern definition is called DynamicScoring (line 1) and contains two parts.
The first part is the declaration of a filter module called dynamicscoring (lines 2–11). This filter module contains one meta filter called score_filter (line 6). This filter intercepts five relevant calls and sends the message in a reified form to an instance of class Score. The second part of the concern definition is the superimposition part (lines 12–18). This part defines that the filter module dynamicscoring is to be superimposed on the classes World, Game, and Main.

The final part of the scoring concern is the so-called implementation part, defined by a class Score. Listing 2.3 shows an example implementation of class Score. Instances of this
class receive the messages sent by score_filter and subsequently perform the events related to the scoring concern. In this way, all scoring events are encapsulated in one class and one Compose* concern definition.

1 concern DynamicScoring in pacman {
2   filtermodule dynamicscoring {
3     externals
4       score : pacman.Score = pacman.Score.instance();
5     inputfilters
6       score_filter : Meta = {[*.eatFood] score.eatFood,
7         [*.eatGhost] score.eatGhost,
8         [*.eatVitamin] score.eatVitamin,
9         [*.gameInit] score.initScore,
10        [*.setForeground] score.setupLabel}
11  }
12  superimposition {
13    selectors
14      scoring = { C | isClassWithNameInList(C, ['pacman.World',
15        'pacman.Game', 'pacman.Main']) };
16    filtermodules
17      scoring <- dynamicscoring;
18  }
19 }
Listing 2.2: DynamicScoring concern in Compose*

2.3.2.2 Implementation of Dynamic Strategy

The last system requirement that we need to implement is the dynamic strategy of ghosts. This means that a ghost should, depending on the state of pacman, hunt or flee from pacman. We can implement this concern by using the strategy design pattern. However, in this way, we need to modify the existing code. This is not the case when we use Compose* dispatch filters; Listing 2.4 demonstrates this.

This concern uses dispatch filters to intercept calls to method RandomStrategy.getNextMove and redirect them to either StalkerStrategy.getNextMove or FleeStrategy.getNextMove. If pacman is not evil, the intercepted call matches the first filter, which dispatches the intercepted call to method StalkerStrategy.getNextMove (line 9). Otherwise, the intercepted call matches the second filter, which dispatches the intercepted call to method FleeStrategy.getNextMove (line 11).

2.4 Compose* Architecture

An overview of the Compose* architecture is illustrated in Figure 2.3. The Compose* architecture can be divided into four layers [60]: IDE, compile time, adaptation, and runtime.
1 import Composestar.Runtime.FLIRT.message.*;
2 import java.awt.*;
3
4 public class Score
5 {
6   private int score = -100;
7   private static Score theScore = null;
8   private Label label = new java.awt.Label("Score: 0");
9
10  private Score() {}
11
12  public static Score instance() {
13    if(theScore == null) {
14      theScore = new Score();
15    }
16    return theScore;
17  }
18
19  public void initScore(ReifiedMessage rm) {
20    this.score = 0;
21    label.setText("Score: "+score);
22  }
23
24  public void eatGhost(ReifiedMessage rm) {
25    score += 25;
26    label.setText("Score: "+score);
27  }
28
29  public void eatVitamin(ReifiedMessage rm) {
30    score += 15;
31    label.setText("Score: "+score);
32  }
33
34  public void eatFood(ReifiedMessage rm) {
35    score += 5;
36    label.setText("Score: "+score);
37  }
38
39  public void setupLabel(ReifiedMessage rm) {
40    rm.proceed();
41    label = new Label("Score: 0");
42    label.setSize(15*View.BLOCKSIZE+20,15*View.BLOCKSIZE);
43    Main main = (Main)Composestar.Runtime.FLIRT.message.MessageInfo
44      .getMessageInfo().getTarget();
45    main.add(label,BorderLayout.SOUTH);
46  }
47 }
Listing 2.3: Implementation of class Score
1 concern DynamicStrategy in pacman {
2   filtermodule dynamicstrategy {
3     internals
4       stalk_strategy : pacman.Strategies.StalkerStrategy;
5       flee_strategy : pacman.Strategies.FleeStrategy;
6     conditions
7       pacmanIsEvil : pacman.Pacman.isEvil();
8     inputfilters
9       stalker_filter : Dispatch = {!pacmanIsEvil =>
10        [*.getNextMove] stalk_strategy.getNextMove};
11      flee_filter : Dispatch = {
12        [*.getNextMove] flee_strategy.getNextMove}
13  }
14  superimposition {
15    selectors
16      random = { C | isClassWithName(C,
17        'pacman.Strategies.RandomStrategy') };
18    filtermodules
19      random <- dynamicstrategy;
20  }
21 }
Listing 2.4: DynamicStrategy concern in Compose*

Figure 2.3: Overview of the Compose* architecture
2.4.1 Integrated Development Environment

Some of the purposes of the Integrated Development Environment (IDE) layer are to interface with the native IDE and to create a build configuration. The build configuration specifies which source files and settings are required to build a Compose* application. After the build configuration has been created, the compile time is started. The creation of a build configuration can be done manually or by using a plug-in. Examples of these plug-ins are the Visual Studio add-in for Compose*/.NET and the Eclipse plug-ins for Compose*/J and Compose*/C.

2.4.2 Compile Time

The compile time layer is platform independent and reasons about the correctness of the composition filter implementation with respect to the program, which allows the target program to be built by the adaptation layer.

The compile time 'pre-processes' the composition filter specifications by parsing the specification, resolving the references, and checking its consistency. To provide an extensible architecture to facilitate this process, a blackboard architecture was chosen. This means that the compile time uses a general knowledge base called the 'repository'. This knowledge base contains the structure and metadata of the program, on which different modules can execute their activities. Examples of modules within analysis and validation are SANE, LOLA, and FILTH. These three modules are responsible for some of the analysis and validation of the superimposition and its selectors.

2.4.3 Adaptation

The adaptation layer consists of the program manipulation, harvester, and code generator. These components connect the platform independent compile time to the target platform. The harvester is responsible for gathering the structure and the annotations within the source program and adding this information to the knowledge base.
The code generator generates a reduced copy of the knowledge base and the weaving specification. This weaving specification is then used by the weaver, contained in the program manipulation component, to weave the calls to the runtime into the target program. The end result of the adaptation layer is the target program, which interfaces with the runtime.

2.4.4 Runtime

The runtime layer is responsible for executing the concern code at the join points. It is activated at the join points by function calls that are woven in by the weaver. A reduced copy of the knowledge base containing the necessary information for filter evaluation and execution is enclosed with the runtime. When a filtered function is called, the filters are evaluated. Depending on whether the condition part evaluates to true and the matching part matches, the accept or reject behavior of the filter is executed. The runtime also facilitates the debugging of the composition filter implementations.
2.5 Platforms

Compose* can in theory be applied to any programming language, provided certain assumptions are met. Currently, Compose* has three platforms.

2.5.1 Java

Compose*/J, the Java platform of Compose*, uses different compiling and weaving tools than the other platforms. For the use of Compose*/J, an Eclipse plug-in is provided.

2.5.2 C

Compose*/C, the C platform of Compose*, is different from its Java and .NET counterparts because it does not have a runtime interpreter. This implies that the filter implementation of Compose*/C uses generated composition filter code that is woven directly into the source code. Because the programming language C does not have the concept of objects, the reasoning within Compose* is based on sets of functions. Like the Java platform, Compose*/C provides a plug-in for Eclipse.

2.5.3 .NET

The .NET platform of Compose*, called Compose*/.NET, is the oldest implementation of Compose*. Because Compose*/.NET works with CIL code, it is programming language independent as long as the programming language can be compiled to CIL code. The .NET platform uses a Visual Studio add-in for ease of development.

2.6 Features Specific to Compose*

The Composition Filters approach uses a restricted (pattern matching) language to define filters. This language makes it possible to reason about the semantics of the concern. Compose* offers three features that use this possibility, which result in more control over, and correctness of, an application under construction. These features are:

Ordering of filter modules It is possible to specify how the superimposition of filter modules should be ordered. Ordering constraints can be specified in a fixed, conditional, or partial manner. A fixed ordering can be calculated exactly, whereas a conditional ordering depends on the result of filter execution and is therefore evaluated at runtime.
When there are multiple valid orderings of filter modules on a join point, partial ordering constraints can be applied to reduce this number. These constraints can be declared in the concern definition;

Filter consistency checking When superimposition is applied, Compose* is able to detect whether the ordering and conjunction of filters creates a conflict. For example, imagine a set of filters where the first filter only evaluates method m and another filter only evaluates methods a and b. In this
case, the latter filter is only reached with method m; this is consequently rejected, and as a result the superimposition may never be executed. There are different scenarios that lead to these kinds of problems, e.g., conditions that exclude each other;

Reason about semantic problems When multiple pieces of advice are added to the same join point, Compose* can reason about problems that may occur. An example of such a conflict is the situation where a real-time filter is followed by a wait filter. Because the wait filter can wait indefinitely, the real-time property imposed by the real-time filter may be violated.

The above mentioned conflict analyzers all work on the assumption that the behavior of every filter is well-defined. This is not the case for the meta filter: its user-defined, and therefore unpredictable, behavior poses a problem to the analysis tools.

Furthermore, Compose* is extended with features that enhance its usability. These features are briefly described below:

Integrated Development Environment support The Compose* implementations all have an IDE plug-in: Compose*/.NET for Visual Studio, Compose*/J and Compose*/C for Eclipse;

Debugging support The debugger shows the flow of messages through the filters. It is possible to place breakpoints to view the state of the filters;

Incremental building process When a project is built and not all the modules have changed, incremental building saves time.
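The filter consistency conflict described above reduces to a reachability question: do any of the messages that can still reach a later filter intersect the selectors that filter matches? The following sketch illustrates that idea with hypothetical selector sets; it is not the real Compose* analysis.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical reachability check behind filter consistency checking;
// not one of the actual Compose* analysis modules (SANE, LOLA, FILTH).
public class ConsistencySketch {

    // A later filter is useful only if at least one selector that reaches it
    // is also matched by it; an empty intersection means its accept behavior
    // is never executed.
    public static boolean reachable(Set<String> reachingSelectors, Set<String> matchedSelectors) {
        Set<String> overlap = new HashSet<>(reachingSelectors);
        overlap.retainAll(matchedSelectors);
        return !overlap.isEmpty();
    }

    public static void main(String[] args) {
        Set<String> reaches = new HashSet<>();
        reaches.add("m");          // only method m reaches the second filter
        Set<String> matches = new HashSet<>();
        matches.add("a");
        matches.add("b");          // the second filter only evaluates a and b
        System.out.println(reachable(reaches, matches)); // prints false: conflict
    }
}
```

In the example from the text, only m reaches the second filter while that filter matches only a and b, so the intersection is empty and the consistency check can flag the superimposition as dead.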
Some language properties of Compose* can also be seen as features:

Language independent concerns A Compose* concern can be used on all Compose* platforms, because the composition filters approach is language independent;

Reusable concerns The concerns are easy to reuse, through the dynamic filter modules and the selector language;

Expressive selector language Program elements of an implementation language can be used to select a set of objects to superimpose on;

Support for annotations Using the selector, annotations can be woven at program elements. At the moment, annotations can be used for superimposition.
CHAPTER 3

Introduction to the .NET Framework

This chapter gives an introduction to the .NET Framework of Microsoft. First, the architecture of the .NET Framework is introduced. This section includes terms like the Common Language Runtime, the .NET Class Library, the Common Language Infrastructure, and the Intermediate Language. These are discussed in more detail in the sections following the architecture.

3.1 Introduction

Microsoft defines [57] .NET as follows: ".NET is the Microsoft Web services strategy to connect information, people, systems, and devices through software." There are different .NET technologies in various Microsoft products providing the capabilities to create solutions using web services. Web services are small, reusable applications that help computers from many different operating system platforms work together by exchanging messages. Based on industry standards like XML (Extensible Markup Language), SOAP (Simple Object Access Protocol), and WSDL (Web Services Description Language), they provide a platform and language independent way to communicate.

Microsoft products, such as Windows Server System (providing web services) or Office System (using web services), are some of the .NET technologies. The technology described in this chapter is the .NET Framework. Together with Visual Studio, an integrated development environment, they provide developers the tools to create programs for .NET.

Many companies are largely dependent on the .NET Framework, but need or want to use AOP. Currently there is no direct support for this in the Framework. The Compose*/.NET project addresses these needs with its implementation of the Composition Filters approach for the .NET Framework. This specific Compose* version for .NET has two main goals. First, it combines the .NET Framework with AOP through Composition Filters. Second, Compose* offers superimposition
in a language independent manner. The .NET Framework supports multiple languages and is, as such, suitable for this purpose. Composition Filters are an extension of the object-oriented mechanism as offered by .NET; hence the implementation is not restricted to any specific object-oriented language.

3.2 Architecture of the .NET Framework

The .NET Framework is Microsoft's platform for building, deploying, and running Web Services and applications. It is designed from scratch and has a consistent API providing support for component-based programs and Internet programming. This new Application Programming Interface (API) has become an integral component of Windows. The .NET Framework was designed to fulfill the following objectives [54]:

Consistency Allow object code to be stored and executed locally, executed locally but Internet-distributed, or executed remotely, and make the developer experience consistent across a wide variety of types of applications, such as Windows-based applications and Web-based applications;

Operability The ease of operation is enhanced by minimizing versioning conflicts and providing better software deployment support;

Security All code is executed safely, including code created by an unknown or semi-trusted third party;

Efficiency The .NET Framework compiles applications to machine code before running, thus eliminating the performance problems of scripted or interpreted environments;

Interoperability Code based on the .NET Framework can integrate with other code because all communication is built on industry standards.

The .NET Framework consists of two main components [54]: the Common Language Runtime (CLR, simply called the .NET Runtime or Runtime for short) and the .NET Framework Class Library (FCL). The CLR is the foundation of the .NET Framework, executing the code and providing the core services such as memory management, thread management, and exception handling.
The CLR is described in more detail in Section 3.3. The class library, the other main component of the .NET Framework, is a comprehensive, object-oriented collection of reusable types that can be used to develop applications ranging from traditional command-line or graphical user interface (GUI) applications to applications such as Web Forms and XML Web services. Section 3.5 describes the class libraries in more detail.

The code run by the runtime is in a format called Common Intermediate Language (CIL), further explained in Section 3.6. The Common Language Infrastructure (CLI) is an open specification that describes the executable code and runtime environment that form the core of the Microsoft .NET Framework. Section 3.4 tells more about this specification.

Figure 3.1 shows the relationship of the .NET Framework to other applications and to the complete system. The two parts, the class library and the runtime, are managed, i.e., applications
• 48. Figure 3.1: Context of the .NET Framework (Modified) [54]

managed during execution. The operating system is in the core; managed and unmanaged applications operate on the hardware. The runtime can use other object libraries and the class library, but the other libraries can use the same class library themselves.

Besides the Framework, Microsoft also provides a developer tool called Visual Studio. This is an IDE with functionality across a wide range of areas, allowing developers to build applications with decreased development time in comparison with developing applications using command line compilers.

3.2.1 Version 2.0 of .NET

In November 2005, Microsoft released a successor of the .NET Framework. Major changes are the support for generics, the addition of nullable types, 64-bit support, improvements in the garbage collector, new security features and more network functionality.

Generics make it possible to declare and define classes, structures, interfaces, methods and delegates with unspecified or generic type parameters instead of specific types. When the generic is used, the actual type is specified. This allows for type safety at compile time. Without generics, the use of casting or boxing and unboxing decreases performance. By using a generic type, the risks and costs of these operations are reduced.

Nullable types allow a value type to have a normal value or a null value. This null value can be useful for indicating that a variable has no defined value because the information is not currently available.

Besides changes in the Framework, there are also improvements in the four main Microsoft .NET programming languages (C#, VB.NET, J# and C++). The language elements are now almost equal for all languages.
For instance, additions to the Visual Basic language are the support for unsigned values and new operators, and additions to the C# language include the ability to define anonymous methods, thus eliminating the need to create a separate method.
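To make the .NET 2.0 additions discussed above concrete, the following sketch contrasts the old boxed ArrayList style with a generic List<int>, and shows a nullable value type. The class Net2Features and its method names are invented for this illustration; only the Framework types (ArrayList, List<T>, int?) are real.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class Net2Features
{
    // Without generics (.NET 1.x style): value types are boxed into
    // object when stored, and must be unboxed with a cast when read.
    public static int SumBoxed()
    {
        ArrayList list = new ArrayList();
        list.Add(3);               // 3 is boxed to object
        list.Add(7);
        int sum = 0;
        foreach (object o in list)
            sum += (int)o;         // unboxing cast, only checked at runtime
        return sum;
    }

    // With generics (.NET 2.0): the element type is checked at compile
    // time and no boxing takes place for value types.
    public static int SumGeneric()
    {
        List<int> list = new List<int>();
        list.Add(3);
        list.Add(7);
        int sum = 0;
        foreach (int i in list)
            sum += i;
        return sum;
    }

    // A nullable type gives a value type an extra "no value" state.
    public static string Describe(int? value)
    {
        return value.HasValue ? value.Value.ToString() : "no value";
    }
}
```

Both sum methods compute the same result; the generic version simply avoids the runtime cast and the boxing overhead mentioned above.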
• 49. A new Visual Studio 2005 edition was released to support the new Framework and functionalities to create various types of applications.

3.3 Common Language Runtime

The Common Language Runtime executes code and provides core services. These core services are memory management, thread execution, code safety verification and compilation. Apart from providing services, the CLR also enforces code access security and code robustness. Code access security is enforced by providing varying degrees of trust to components, based on a number of factors, e.g., the origin of a component. This way, a managed component might or might not be able to perform sensitive functions, like file access or registry access. By implementing a strict type-and-code-verification infrastructure, called the Common Type System (CTS), the CLR enforces code robustness. Basically, there are two types of code:

• Managed: managed code is code that has its memory handled and its types validated at execution by the CLR. It has to conform to the Common Type System (CTS, Section 3.4). If interoperability with components written in other languages is required, managed code has to conform to an even more strict set of specifications, the Common Language Specification (CLS). The code is run by the CLR and is typically stored in an intermediate language format. This platform-independent intermediate language is officially known as Common Intermediate Language (CIL, Section 3.6) [82].
• Unmanaged: unmanaged code is not managed by the CLR. It is stored in the native machine language and is not run by the runtime but directly by the processor.

All language compilers (targeting the CLR) generate managed code (CIL) that conforms to the CTS. At runtime, the CLR is responsible for generating platform-specific code, which can actually be executed on the target platform.
Compiling from CIL to the native machine language of the platform is performed by the just-in-time (JIT) compiler. This language-independent layer allows the development of CLRs for any platform, creating a true interoperability infrastructure [82]. The .NET Runtime from Microsoft is actually a specific CLR implementation for the Windows platform.

Microsoft has released the .NET Compact Framework especially for devices such as personal digital assistants (PDAs) and mobile phones. The .NET Compact Framework contains a subset of the normal .NET Framework and allows .NET developers to write mobile applications. Components can be exchanged and web services can be used, so an easier interoperability between mobile devices and workstations/servers can be implemented [56].

At the time of writing, the .NET Framework is the only advanced Common Language Infrastructure (CLI) implementation available. A shared-source1 implementation of the CLI for research and teaching purposes was made available by Microsoft in 2002 under the name

1 Only non-commercial purposes are allowed.
• 50. Rotor [73]. In 2006, Microsoft released an updated version of Rotor for version two of the .NET platform. Ximian is also working on an open source implementation of the CLI under the name Mono1, targeting both Unix/Linux and Windows platforms. Another, somewhat different approach is called Plataforma.NET2 and aims to be a hardware implementation of the CLR, so that CIL code can be run natively.

3.3.1 Java VM vs .NET CLR

There are many similarities between Java and .NET technology. This is not strange, because both products serve the same market. Both Java and .NET are based on a runtime environment and an extensive development framework. These development frameworks provide largely the same functionality for both Java and .NET.

The most obvious difference between them is the lack of language independence in Java. While Java's strategy is 'one language for all platforms', the .NET philosophy is 'all languages on one platform'. However, these philosophies are not as strict as they seem. As noted in Section 3.5, there is no technical obstacle for other platforms to implement the .NET Framework. There are compilers for non-Java languages like Jython (Python) [45] and WebADA [1] available for the JVM. Still, the JVM in its current state has difficulties supporting such a vast array of languages as the CLR. On the other hand, the multiple language support in .NET is not optimal either and has been the target of some criticism.

Although the JVM and the CLR provide the same basic features, they differ in some ways. While both the CLR and the modern JVM use JIT (Just In Time) compilation, the CLR can directly access native functions, whereas with the JVM an indirect mapping is needed to interface directly with the operating system.

3.4 Common Language Infrastructure

The entire CLI has been documented, standardized and approved [43] by the European association for standardizing information and communication systems, Ecma International3.
Benefits of this CLI for developers and end-users are:

• Most high-level programming languages can easily be mapped onto the Common Type System (CTS);
• The same application will run on different CLI implementations;
• Cross-programming-language integration, if the code strictly conforms to the Common Language Specification (CLS);
• Different CLI implementations can communicate with each other, providing applications with easy cross-platform communication means.

This interoperability and portability is, for instance, achieved by using a standardized metadata and intermediate language (CIL) scheme as the storage and distribution format for applications. In other words, (almost) any programming language can be mapped to CIL, which in turn can be mapped to any native machine language.

1 http://www.go-mono.com/
2 http://personals.ac.upc.edu/enric/PFC/Plataforma.NET/p.net.html
3 A European industry association founded in 1961 and dedicated to the standardization of Information and Communication Technology (ICT) Systems. Their website can be found at http://www.ecma-international.org/.
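The CLS-conformance requirement in the list above can be sketched in C# with the real System.CLSCompliantAttribute; the Measurements class itself is an invented example. When an assembly is marked CLS-compliant, the compiler warns about public members whose types not every .NET language can consume.

```csharp
using System;

// Marking the assembly CLS-compliant makes the compiler warn (CS3002
// and related) about non-compliant types in the public surface.
[assembly: CLSCompliant(true)]

public class Measurements
{
    // Compliant: int (System.Int32) is part of the CLS and exists
    // in every conforming language.
    public int Count()
    {
        return 42;
    }

    // uint (System.UInt32) is not CLS-compliant; exposing it publicly
    // would trigger a compiler warning, so it is kept internal here.
    internal uint RawFlags()
    {
        return 0xFF;
    }
}
```

Keeping non-CLS types out of the public interface is exactly what makes a library usable from, for instance, VB.NET, which historically had no unsigned integer types.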
• 51. Figure 3.2: Relationships in the CTS

The Common Language Specification is a subset of the Common Type System, and defines the basic set of language features that all .NET languages should adhere to. In this way, the CLS helps to enhance and ensure language interoperability by defining a set of features that are available in a wide variety of languages. The CLS was designed to include all the language constructs that are commonly needed by developers (e.g., naming conventions, common primitive types), but no more than most languages are able to support [55]. Figure 3.2 shows the relationships between the CTS, the CLS, and the types available in C++ and C#.

In this way the standardized CLI provides, in theory1, a true cross-language and cross-platform development and runtime environment. To attract a large number of developers for the .NET Framework, Microsoft has released CIL compilers for C++, C#, J#, and VB.NET. In addition, third-party vendors and open-source projects have also released compilers targeting the .NET Framework, such as Delphi.NET, Perl.NET, IronPython, and Eiffel.NET. These programming languages cover a wide range of different programming paradigms, such as classic imperative, object-oriented, scripting, and declarative languages. This wide coverage demonstrates the power of the standardized CLI.

Figure 3.3 shows the relationships between all the main components of the CLI. The top of the figure shows the different programming languages with compiler support for the CLI. Because the compiled code is stored and distributed in the Common Intermediate Language format, the code can run on any CLR. For cross-language usage this code has to comply with the CLS. Any application can use the class library (the FCL) for common and specialized programming tasks.

3.5 Framework Class Library

The .NET Framework class library is a comprehensive collection of object-oriented reusable types for the CLR.
This library is the foundation on which all .NET applications are built. It is object-oriented and provides integration of third-party components with the classes in the .NET Framework. Developers can use components provided by the .NET Framework, other

1 Unfortunately, Microsoft did not submit all the framework classes for approval and at the time of writing only the .NET Framework implementation is stable.
• 52. Figure 3.3: Main components of the CLI and their relationships. The right-hand side of the figure shows the difference between managed code and unmanaged code.
• 53. Figure 3.4: From source code to machine code

developers and their own components. A wide range of common programming tasks (e.g., string management, data collection, reflection, graphics, database connectivity or file access) can be accomplished easily by using the class library. A great number of specialized development tasks are also extensively supported, like:

• Console applications;
• Windows GUI applications (Windows Forms);
• Web applications (Web Forms);
• XML Web services;
• Windows services.

All the types in this framework are CLS-compliant and can therefore be used from any programming language whose compiler conforms to the Common Language Specification (CLS).

3.6 Common Intermediate Language

The Common Intermediate Language (CIL) has already been mentioned briefly in the sections before, but this section will describe the IL in more detail. All the languages targeting the .NET Framework compile to this CIL (see Figure 3.4).
• 54. A .NET compiler generates a managed module, which is an executable designed to be run by the CLR [65]. There are four main elements inside a managed module:

• A Windows Portable Executable (PE) file header;
• A CLR header containing important information about the module, such as the location of its CIL and metadata;
• Metadata describing everything inside the module and its external dependencies;
• The CIL instructions generated from the source code.

The Portable Executable file header allows the user to start the executable. This small piece of code will initiate the just-in-time compiler, which compiles the CIL instructions to native code when needed, while using the metadata for extra information about the program. This native code is machine dependent, while the original IL code is still machine independent. This way, the same IL code can be JIT-compiled and executed on any supported architecture.

The CLR cannot use the managed module directly, but needs an assembly. An assembly is the fundamental unit of security, versioning, and deployment in the .NET Framework and is a collection of one or more files grouped together to form a logical unit [65]. Besides managed modules inside an assembly, it is also possible to include resources like images or text. A manifest file is contained in the assembly, describing not only the name, culture and version of the assembly but also the references to other files in the assembly and security requests.

The CIL is an object-oriented assembly language with around 100 different instructions called OpCodes. It is stack-based, meaning objects are placed on an evaluation stack before the execution of an operation, and, when applicable, the result can be found on the stack after the operation. For instance, when adding two numbers, first those numbers have to be placed onto the stack, second the add operation is called and finally the result can be retrieved from the stack.
1 .assembly AddExample {} 2 3 .method static public void main() il managed 4 { 5 .entrypoint // entry point of the application 6 .maxstack 2 7 8 ldc.i4 3 // Place a 32-bit (i4) 3 onto the stack 9 ldc.i4 7 // Place a 32-bit (i4) 7 onto the stack 10 11 add // Add the two and 12 // leave the sum on the stack 13 14 // Call static System.Console.Writeline function 15 // (function pops integer from the stack) 16 call void [mscorlib]System.Console::WriteLine(int32) 17 18 ret 19 } Listing 3.1: Adding example in IL code To illustrate how to create a .NET program in IL code we use the previous example of adding two numbers and show the result. In Listing 3.1 a new assembly is created with the name AddExample. In this assembly a function main is declared as the starting point (entrypoint) 32 Automatic Derivation of Semantic Properties in .NET
• 55. of this assembly. The maxstack directive indicates there can be a maximum of two objects on the stack, which is enough for the example method. Next, the values 3 and 7 are placed onto the stack. The add operation is called and the result stays on the stack.

The method WriteLine from the .NET Framework Class Library is called. This method resides inside the Console class placed in the System assembly. It expects one parameter of type int32 that will be retrieved from the stack. The call operation will transfer the control flow to this method, passing along the parameters as objects on the stack. The WriteLine method does not return a value. The ret operation returns the control flow from the main method to the calling method, in this case the runtime. This will exit the program.

To be able to run this example, we need to compile the IL code to bytecode, where each OpCode is represented as one byte. To compile this example, save it as a text file and run the ILASM compiler with the filename as its parameter. This will produce an executable runnable on all the platforms where the .NET Framework is installed.

This example was written directly in IL code, but we could have used a higher level language such as C# or VB.NET. For instance, the same example in C# code is shown in Listing 3.2 and the VB.NET version is listed in Listing 3.3. When this code is compiled to IL, it will look like the code in Listing 3.1.

1 public static void main()
2 {
3     Console.WriteLine((int) (3 + 7));
4 }
Listing 3.2: Adding example in the C# language

1 Public Shared Sub main()
2     Console.WriteLine(CType((3 + 7), Integer))
3 End Sub
Listing 3.3: Adding example in the VB.NET language
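The stack-based evaluation of Listing 3.1 can also be mimicked in C# with an explicit Stack<int>. This is only an illustration of the evaluation-stack idea, not how the JIT compiler actually executes IL; the class EvalStackDemo is invented for the example.

```csharp
using System;
using System.Collections.Generic;

public static class EvalStackDemo
{
    // Re-enacts Listing 3.1 with an explicit stack:
    // ldc.i4 a / ldc.i4 b / add, then pop the result.
    public static int AddLikeCil(int a, int b)
    {
        Stack<int> stack = new Stack<int>();
        stack.Push(a);            // ldc.i4 a: push the first operand
        stack.Push(b);            // ldc.i4 b: push the second operand
        int right = stack.Pop();  // add: pop both operands...
        int left = stack.Pop();
        stack.Push(left + right); // ...and push their sum
        return stack.Pop();       // WriteLine(int32) would pop this value
    }
}
```

Calling AddLikeCil(3, 7) walks through exactly the pushes and pops described in the text and yields 10.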
• 56. CHAPTER 4 Motivation

This chapter describes the motivation for designing and implementing a system for the automatic derivation of semantic properties in .NET languages. The current state of the Compose*/.NET project is explained in the first section. How this system can be extended is discussed in the second section. The last section mentions the general design goals.

4.1 Current State of Compose*/.NET

The Compose*/.NET project offers aspect-oriented programming for the Microsoft .NET Framework through the composition filters model. An introduction to Compose* is given in Chapter 2 and information about the .NET Framework can be found in Chapter 3. Most of the information discussed below can also be applied to other Aspect-Oriented Programming (AOP) implementations.

4.1.1 Selecting Match Points

With composition filters, the incoming and outgoing messages on an object can be intercepted through the use of input filters and output filters. A filter has three parts: the filter identifier, the filter type, and one or more filter elements. A filter element consists of an optional condition part, a matching part, and a substitution part. When a filter is evaluated, the matching part is checked against the current message. A filter matches when both the condition and the matching part evaluate to the boolean value true; at that point the message gets substituted with the values of the substitution part.

The filters are superimposed on classes with a filter module binding. This binding has a selection of objects on one side and a filter module on the other side. The selection is made with a selector definition in the form of a selector language, based on the logical programming
• 57. language Prolog. By using elements like isClassWithNameInList, isNamespaceWithName or isMethod, the developer can specify the points in the code the filter applies to.

This selection is based on syntactical properties, like naming and coding conventions, or structural properties, such as coding patterns and hierarchical relations. This approach is used in almost all the current AOSD techniques and has the following problems [78, 60, 3, 13]:

• Coding conventions are not always used, or are used incorrectly. There are multiple reasons for this: the complexity and evolution of the application, refactoring of code, or the lack of documentation. The pointcut definitions are fragile; changes in the code can easily break pointcut semantics, a problem which is hard to detect [48];
• Method names are sometimes extended to be used as an identifier for join points. To provide a correct naming convention, they can become too long, and this leads to name explosion;
• Using specific naming conventions for identifying join points violates the intention-revealing naming rule, in which method names should represent the intention of the method code.

The result of these problems is that it is by no means certain that the selected join point is the one intended to be found. It is also possible that some join points are not selected, while they should be selected.

To illustrate the use of naming conventions, consider the following example. In most programming languages there are Get and Set methods allowing another object to read from and write to a private variable or field. Examples of such a getter and setter method are given in Listing 4.1.
1 private string _stringValue;
2
3 public string GetStringValue()
4 {
5     return _stringValue;
6 }
7
8 public void SetStringValue(string value)
9 {
10     _stringValue = value;
11 }
Listing 4.1: Getter and setter examples in C# .NET

To select all the methods setting a value, as in assigning a value to a variable, we can use a Set* pointcut. This will select all the methods beginning with the word Set. However, this will also match any methods called, for instance, Setup, Settings or Settle. In addition, methods actually assigning a value, but not having a name starting with the word Set, are not selected. On the other hand, we might find Set methods with an implementation part doing something completely different from actually setting a value.

In this case, the selection is performed on the syntactical level instead of the semantical level. There is no knowledge about the actual implementation, and the assumed purpose of the method is derived from (parts of) the signature, a unique identifier of the method. Using coding and naming conventions to match points in the code does not give the best results. There are some possible solutions to this problem [35]:
• 58. • Refactor the original code so coding and naming conventions can be used to define aspects more correctly. However, refactoring for the sake of aspects is a bad idea and should actually only be performed to increase the quality of the code. Furthermore, the original source code should be, to a degree, a black box to the aspect designer, and refactoring violates the goal of separation of concerns and AOP;
• Use a list to enumerate all the join points by name. This requires knowledge about the source program and can lead to long enumerations in large software systems. This technique is also not robust against changes in the original code;
• Pattern matching, as used in Compose*, provides more possibilities. Using wild cards and structural conditions (like is in class or has interface), the selector part is more robust. There is still a great dependency on naming conventions, as shown in the previous example;
• By annotating methods with special tags, a developer can provide more information about the intended behavior of the method. Naming conventions do not have to be used, but the major drawback is the necessity to place annotations in the source code.

The main problem is the use of structure-based properties and syntactic conventions [3]. The selection of join points should be applied based on the behavior of a method and not on the name of the method.

4.1.2 Program Analysis

Compose* has some basic information about the source of the program. Information about the types, their relations to other types and properties of these types is collected. This is all syntactical information, i.e., it describes the structure of the program. There is almost no information about the behavior of the program. Two methods can use the same resources without any problems, but might give resource conflicts when an aspect is used. If a condition is used by a concern, can this condition be changed by other pieces of code?
Are there any side effects when using an aspect on a method [52]? There is a partial solution to these questions in the form of SECRET (see [24] and [72] for more information). This module in Compose* reasons about possible semantic conflicts due to the composition of filter modules, and it analyzes the resource usage of the filters.

One type of filter, the meta filter, passes a ReifiedMessage object to the selected method as a parameter. The object that receives the meta message can observe and manipulate the message, and then re-activate its execution. This can lead to unexpected behavior of the ReifiedMessage and, as a result, to possible conflicts between filters. A developer can annotate methods using this ReifiedMessage with a textual description of the intended behavior of the message. The SECRET module uses this information to detect possible conflicts between aspects. A major requirement is the need for the developer to specify the behavior beforehand inside the code.

If there is semantical information about the code in the Compose* repository, besides the currently available syntactical data, then more extensive code analysis can be performed to detect conflicts, side effects, and so on.
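The naming-convention problem from Section 4.1.1 can be made concrete with a small reflection sketch: a purely name-based Set* selection picks up Setup as a false positive and misses Deposit, even though Deposit does assign a value. The Account and NameMatcher types below are hypothetical examples, not part of Compose* or its selector language.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

public class Account
{
    private decimal _balance;

    public void SetBalance(decimal b) { _balance = b; }  // a real setter
    public void Setup() { _balance = 0; }                // matched by Set*, but not a setter
    public void Deposit(decimal a) { _balance += a; }    // assigns a value, but not matched
    public decimal GetBalance() { return _balance; }
}

public static class NameMatcher
{
    // A purely name-based "pointcut": select every public instance
    // method declared on the type whose name starts with "Set".
    public static List<string> SelectSetMethods(Type type)
    {
        List<string> names = new List<string>();
        MethodInfo[] methods = type.GetMethods(
            BindingFlags.Public | BindingFlags.Instance | BindingFlags.DeclaredOnly);
        foreach (MethodInfo m in methods)
            if (m.Name.StartsWith("Set"))
                names.Add(m.Name);
        names.Sort();
        return names;
    }
}
```

Running the matcher on Account selects SetBalance and Setup but not Deposit: the selection reflects the names, not the assignment behavior the pointcut was meant to capture.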
• 59. 4.1.3 Fine Grained Join Points

The current composition filter model in Compose* works at the level of message call events. It could be interesting to expand this model to a statement- and expression-level model [77]. This way it is possible to select points inside the code as matching points for the filters. Possible applications of this technique are code coverage analysis1 [66] or code optimization [36].

Currently Compose* does not support this type of fine grained join points, but operates on the level of object interfaces. There is, however, work in progress to support this at a certain level [16]. An issue here is the need for (semantical) information about the code itself, the instructions or statements. Compose* does not have the necessary information about the message body implementation.

4.2 Providing Semantical Information

The three main issues described in the previous section (match point selection, program analysis and fine grained join points) all suffer from the same problem: there is almost no semantical information available. The behavior of the source code is not known. With more information about the meaning of the code it is possible to solve some of the shortcomings mentioned before.

One of the solutions used by Compose* is the use of annotations to describe the semantics of a function. There are three major problems with this approach:

• The developer must specify the semantics manually for each function. This is a time-consuming process and easily skipped because it is not enforced.
• The current annotation representation is not powerful enough. For instance, it lacks the ability to provide control flow information about the instructions inside the function.
• It is possible that the annotations are not consistent with the actual implementation, due to an incorrect description of the semantics or to changes in the code.
This assignment is called the automatic derivation of semantic properties in .NET and is an attempt to extract semantical information from a .NET assembly using existing tools, thus providing a way to detect possible conflicts and give more information to Compose*. Semantical information can be used not only by Compose*, but also by other applications wanting to do source code analysis: for example, finding design patterns in the source code, reverse engineering design documentation [69], generating pre- and postconditions [53], verifying software contracts [9], checking behavioral subtyping [30], or any other type of static analysis.

4.3 General Design Goals

As stated in the previous section, the goal is to design and implement a system to automatically derive semantic properties from .NET assemblies. To get an idea what type of behavior we are

1 Code coverage analysis is a technique to determine whether a set of test cases satisfies an adequacy criterion.
• 60. interested in, we first have to look at the different types of semantics that can occur in typical programming languages (see Chapter 5). The context of this assignment is the Compose*/.NET platform, so the sources to analyze are in the .NET format called Intermediate Language. In Chapter 6 a detailed description is given of the IL and how the semantical elements, described in Chapter 5, are represented in this format.

Although the assignment is for the .NET platform, it would be an advantage if the semantical representation is language independent, so it can also be used with other object-oriented languages like Java or C++. One requirement, however, is the need to preserve the type information. This will be discussed in the design chapter (Chapter 7).

After a code analysis, the semantical information should be stored in some sort of model. This model must contain enough information to reason about the behavior of the original program: not only what it is supposed to do, but also when it performs certain actions. For this, flow information, like control flow, is important to save; this will be treated in Chapter 5.

Storing the data in a metamodel is not sufficient. There must be a system to query this model for the required information. The possible options and the implementation can be found in Section 7.4. Examples of the use of this search mechanism are mentioned in Chapter 8.
• 61. CHAPTER 5 Semantics

Before a model can be created to store semantical information about a program, we must first understand what semantics is and what types of behavior can be found in the sources of a program. The first section provides a definition of semantics. The next sections describe how semantics is represented in source code.

5.1 What is Semantics

Semantics is the study of meaning [15]. It comes from the Greek σηµαντικoς, semantikos, which stands for significant meaning, where sema is sign. Semantics is a branch of semiotics, the study of signs and meaning [20]. The other two branches of semiotics are syntactics (the arrangement of signs) and pragmatics (the relationship between the speaker and the signs). In discussing natural and computer languages, the distinction is sometimes made between syntax (for example, the order of computer commands) and semantics (which functions are requested in the command). Syntax, coming from the Greek words συν, together, and ταξις, sequence or order, is the study of the formation of sentences and the relationship of their component parts.

Looking at computer languages, syntax is the grammar of the language statements, where semantics is the meaning of these statements. The statements are the so-called signs used in semiotics. Keep in mind that there is a distinct difference between syntax and semantics: syntax is the relation between signs, and semantics is the meaning of these signs. For computer languages the syntax is very clearly defined as a grammar the developer has to use. This grammar is, for instance, used to create an abstract syntax tree (AST), in which each node of the tree represents a syntactic construct. The compiler uses this AST to create the actual program. Each element of the grammar has a certain semantics, and the composition of those elements in a certain order forms the semantics of a program.
• 62. 5.2 Semantics of Software

Understanding a software product is difficult. Usually the behavior is described in the documentation of the product, but this documentation is often not complete or up-to-date [67]. There are tools which aid in program understanding, like debuggers, metric calculators, program visualization and animation techniques. Basically, these tools reverse engineer the software and represent it at a higher level of abstraction than that of the information which is directly extracted from the code [68]. They differ in how the data is retrieved, how a higher-level model is created, and how the information is presented.

5.2.1 Static and Dynamic Program Analysis

The collection of relevant data for an analyzer is performed by either static or dynamic analysis. It is possible to combine those two techniques to get more precise data [19].

In static program analysis, only the code of the software is considered, without actually executing the program built from this software. Usually the analysis is performed on the source code of the software, but other representations like object code or byte code can also be used. This type of analysis is suited to retrieve structural information, like the elements in the source such as the classes, methods, and so forth. Analyzing classes is hard, due to polymorphism and dynamic binding. Information about the references to objects (pointers) is also difficult to catch with static analysis. Conditional branches and iterations are only known at runtime, so the exact control flow information, the sequence of executed commands, is difficult to extract when the software is not executed. Most of the time there are multiple possible execution paths. Static analysis is limited by the amount of data available about the structure of the source and the parse tree of the statements inside the methods of a class.

Dynamic program analysis uses program execution traces to gather information about the software.
This means the program is analyzed while it is running, usually by attaching a separate program to it, called a profiler. The runtime behavior retrieved provides information about the actual usage of the program. The methods of classes that are actually called and the types of the objects are known. It also provides timing and object instance information, and the real values of operands, which cannot be retrieved using static program analysis. However, the gathered information is the result of specific user input and can differ between program executions. Not all paths in the control flow may be executed, so information can be missing.

Static and dynamic analysis can be combined so that all possible paths are known beforehand through static analysis, while the runtime behavior is analyzed through dynamic analysis [27]. For instance, static analysis is used to retrieve all possible control flow paths in the software, which are then used to generate different inputs for the software at runtime. Dynamic analysis is then used to retrieve the information during runtime along the path information found by the static analysis.
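The complementary nature of the two techniques can be illustrated with a toy sketch (Python is used here purely as an illustration language; the analyzed "method" and both analysis functions are invented for this example):

```python
# A toy "method": if (x > 4) take path A, else path B.
# Static analysis enumerates every possible path without running the code;
# dynamic analysis observes only the single path taken for one concrete input.

def static_paths():
    # Both branch outcomes are considered, but concrete values stay unknown.
    return [["x > 4", "path A"], ["x <= 4", "path B"]]

def dynamic_trace(x):
    # One execution: only the path selected by this input is observed.
    trace = []
    if x > 4:
        trace += ["x > 4", "path A"]
    else:
        trace += ["x <= 4", "path B"]
    return trace

print(len(static_paths()))   # static analysis sees 2 possible paths
print(dynamic_trace(7))      # a single run reveals just one of them
```

In the combined approach described above, the output of `static_paths` would drive the choice of inputs fed to `dynamic_trace`, so that every statically discovered path is eventually observed at runtime.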
5.2.2 Software Models

We can distinguish two models of software, namely structural and behavioral [32]. The structural model describes the organization of the software, like inheritance and the relations between components. The behavioral model describes how the elements in the software operate to carry out functions. Usually static analysis is used to obtain the structural model, while dynamic analysis collects the behavioral model during an execution run.

The system designed for this assignment tries to create a software model consisting of both structural and behavioral information using static analysis. The reason for using static instead of dynamic analysis is explained in the design chapter (see Chapter 7). Extracting a structural model is easier than extracting a behavioral model. We have to give meaning, semantics, to parts of the code without using dynamic analysis. This means we have to look at the statement level, in other words the instructions inside the methods, to extract their meaning. Combining the behavior of the individual statements and the control flow could tell us more about the meaning of the complete method.

5.3 Semantical Statements

To extract the behavior of a program we can start bottom-up, which means we look at the finest-grained, lowest-level parts of the program, the statements, before working our way up to the whole program [49]. Statements are the instructions executed inside a method. There are various types of statements. Some return a value; such statements are called expressions. Statements can operate on zero or more operands. An operand can, for instance, be a value, a memory address, a function name or a constant literal. A statement operating on only one operand is called unary. If the statement works on two operands, it is called binary. The next sections show the major generic kinds of statements.

5.3.1 Value Assignment

A commonly used statement is the assignment of values.
This can be a unary or a binary statement. With an assignment statement a value is assigned to a variable, a symbol denoting a quantity or symbolic representation. A value can be a single literal, but also the result of an operation such as the binary operations addition, multiplication, subtraction, division, and so on. An example of a unary assignment is the negation operation. In Listing 5.1 some examples of assignments in the C# language are shown.

1 a = 4; // Assign the value 4 to the variable a
2 b = a + 5; // Add 5 to the value of a and assign to b
3 s = "Hello World"; // Place the text in string s

Listing 5.1: Assignment examples in C#

Semantically, the assignment statement changes the state of variables. If there is an expression, such as an addition, the expression is evaluated and its result is assigned to the variable. Any prior value stored in the variable is replaced by the new value.
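The state-changing reading of assignment can be made concrete with a minimal interpreter sketch (Python; the statement encoding as `(target, expression)` pairs is invented for illustration):

```python
# Each assignment maps a variable name to the value of an expression,
# replacing any prior value: the statement's semantics is a state update.

def run(statements):
    state = {}
    for target, expr in statements:
        state[target] = expr(state)  # evaluate, then overwrite the prior value
    return state

program = [
    ("a", lambda s: 4),              # a = 4
    ("b", lambda s: s["a"] + 5),     # b = a + 5   (binary assignment)
    ("s", lambda s: "Hello World"),  # s = "Hello World"
    ("a", lambda s: -s["a"]),        # a = -a      (unary assignment)
]
print(run(program))  # {'a': -4, 'b': 9, 's': 'Hello World'}
```

The last statement shows the replacement behavior described above: the prior value 4 of `a` is lost once the unary assignment stores -4.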
5.3.2 Comparison of Values

Comparing two values is an operation which is frequently used for controlling the flow of the program. A simple example is conditional branching like the If...Then...Else construction, but also loops (do...while, do...until, while...loop constructions), for...next loops and switch operations. Basically, a condition is checked to determine whether the loop must be exited or continued. An example of some comparison statements can be seen in Listing 5.2.

1 if (x > 4)
2 {
3 // perform action when x has a value greater than 4
4
5 while (x < 10)
6 {
7 x = x + 1;
8 }
9 }
10 else
11 {
12 // else, perform another action
13
14 for (int i = 0; i < 10; i++)
15 {
16 x = x + i;
17 }
18 }

Listing 5.2: Comparison examples in C#

A comparison always works on two values and the result is stored in a destination operand. This result is either true or false. There are different kinds of comparison, as shown in Table 5.1.

Description Sign Abbr.
A is equal to B A = B EQ
A is not equal to B A ≠ B NE
A is less than B A < B LT
A is less than or equal to B A ≤ B LTE
A is greater than B A > B GT
A is greater than or equal to B A ≥ B GTE

Table 5.1: Comparison operators

Together with branching, discussed in the next section, the comparison statements are an important element of the control flow. Based on the relation of one value to another, certain actions are performed.

5.3.3 Branching Statements

Branching statements change the flow of control to another point in the code. This point is identified with a label, like a line number or a legal identifier. We can distinguish two types of branching:
Conditional
A conditional branch occurs only if a certain condition expression evaluates to true.

Unconditional
With unconditional branching, the control flow is directly transferred to a new location, without any conditions.

In Listing 5.2, two branches are visible for the first condition (x > 4). If the value of x is greater than 4, the control flow moves to the statements directly after the if statement. If x is equal to or less than 4, the control flow moves to the statements after the else statement. Typical unconditional branching commands are the continue and break statements, which explicitly jump to a location in the code without checking a condition. These statements may, of course, appear inside another condition and are then effectively conditional. Branching is an important semantic construct because the flow of the program is controlled with these statements. Together with conditions, branching makes it possible to use iteration statements, such as while, for, or foreach loops.

5.3.4 Method Calling

Because branching only moves the control flow inside a method, we need a special statement to transfer the control flow to another method. When this method has finished processing its statements, the flow returns to the calling method. In most programming languages it is possible to specify input for the called method in the form of parameters, and the called method can also return a value. Inside the method, a special statement is available to return to the calling method. Usually this is the return statement, and if the method should return a value, this value can be returned directly.

5.3.5 Exception Handling

It can be necessary to apply exception handling to certain statements. When an exception is thrown by one of those guarded instructions, a special block of code can handle this exception.
1 try
2 {
3 x = 4 / y;
4 }
5 catch (Exception ex)
6 {
7
8 }

Listing 5.3: Exception handling example in C#

In Listing 5.3 the combined assignment and division statement is placed inside a guarded block. If, for instance, y is zero, a division by zero exception will occur. This exception will be handled by the statements inside the catch block. If there is no exception handling, exceptions are thrown upwards to the calling method, and eventually to the runtime itself. We are interested in this information because it provides insight into the capability of the code to handle exceptions. A division by zero and an uninitialized value of x or y are the only exceptions
which can occur in Listing 5.3. However, all exceptions are caught using the general Exception class. By using information about the possible exceptions, we can choose to catch more specific types of exceptions instead of all exceptions.

5.3.6 Instantiation

The instantiation of a new object or variable can be important to detect. If an object is not instantiated, its internal functionality cannot be accessed. Not only objects can be created, but also arrays and value types. A type denotes a set of values that have the same sort of generic meaning or intended purpose. In object-oriented languages, an object is an individual unit which is used as the basic building block of programs. An object is created from a class and is called an instance of that class.

Most object-oriented languages divide types into reference types and value types. Value types are stored on the stack; the variable contains the data itself. Reference types consist of two parts: a reference or handle is stored on the stack, while the object itself is stored on the heap, also called the managed heap in .NET languages [44]. Value types tend to be simple, like a character or a number, while reference types are more complex. Every object is a reference type and must be explicitly created by the developer using a special new statement. Value types can be accessed directly and do not need to be created; however, they need to be initialized to a default value, e.g., 0. Usually this is handled automatically by the runtime.

The creation of a new object and the (re)initialization of variables are also important to detect. If an object is not instantiated, it cannot be used and could generate errors at runtime. Knowing when a variable holds a certain value, even if this is only the default value, is useful; such default initialization can be interpreted as an assignment of the default value.

5.3.7 Type Checking

Although types are directly tied to a particular language, they still constitute important semantic information.
Adding a string to an integer may be syntactically correct, but semantically it is incorrect. We need to know what type of data we are dealing with. A string value has a different meaning than a value of a numeric type. Type checking is the process of verifying and enforcing the constraints of types. This can occur at compile time, called static checking, or at runtime, called dynamic checking. Usually both techniques are used. When a compiler performs static checking of the types in the source code, it is performing a semantical analysis: semantical information is added to the parse tree and used to check for inconsistencies [38]. For the purpose of this assignment, we can distinguish two different kinds of semantical type information.

Compile time
At compile time, the types of all the variables must be known. When the analyzer designed for this assignment is run on the code, it knows the types of all the elements.
Runtime
During runtime, the type information of a variable can change. This is called type casting, and we need to know the new type the variable will become.

5.3.8 Data Conversion

As explained in the previous section, we need to store type information. However, at runtime the type of a variable can change because of casting or (un)boxing. Boxing is a mechanism for converting value types to reference types: the value type is placed inside a box so it can be used where an object reference is needed [44]. Data conversions change the type, and thus the meaning, of a variable. As such, they are interesting semantical information that we need in order to reason about the contents of variables.

5.4 Program Semantics

In the previous sections the lowest level of the code, the statements, was described. Multiple statements are grouped together inside a method to perform a specific action. Sometimes a method has a set of input1 parameters to parameterize those actions, and possibly an output value (called the return value) of some type. In object-oriented languages, a method resides in a class and provides a service for this particular object. A method is used to perform some task of the object. For example, a class called Car can provide the services Accelerate and Brake to control the inner state of the Car object.

Not only the classes and methods tell something about the semantics of a program; the relations between the different classes also provide added information about the behavior [69]. Components have to work together to execute the tasks of the complete program. Detecting and recognizing these interactions in the source code using static analysis is not an easy task. Because of polymorphism, allowing a single definition to be used with different types of data, it is difficult to determine which method is actually executed at runtime.
Inheritance, the ability to create new classes using existing classes, introduces the problem that the behavior of a subclass is not solely defined in the class itself, but spread over multiple classes. Semantically analyzing the source code of a program using static analysis techniques is thus a difficult process [81]. Gamma [31] says the following about the relation between run-time and compile-time:

“An object-oriented program’s run-time structure often bears little resemblance to its code structure. The code structure is frozen at compile-time; it consists of classes in fixed inheritance relationships. A program’s run-time structure consists of rapidly changing networks of communicating objects. In fact, the two structures are largely independent.”

1 Some languages also allow output parameters, where the storage location of the variable specified on the invocation is used.
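The obstacle that polymorphism poses for static analysis can be illustrated with a short sketch (Python; the Car example follows this section, while the SportsCar subclass and the method bodies are invented for illustration):

```python
# The call site `vehicle.accelerate()` is fixed in the code, but which method
# body actually runs depends on the object's runtime type. A static analyzer
# that only sees the call site cannot tell the two behaviors apart.

class Car:
    def accelerate(self):
        return "base acceleration"

class SportsCar(Car):
    def accelerate(self):  # overrides, so behavior differs per runtime type
        return "fast acceleration"

def drive(vehicle):
    return vehicle.accelerate()  # one call site, several possible targets

print(drive(Car()))        # base acceleration
print(drive(SportsCar()))  # fast acceleration
```

The single definition of `drive` is used with different types of data, and only the running program decides which `accelerate` is dispatched, exactly the situation that makes statically derived behavioral models incomplete.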
5.4.1 Slicing

If we cannot give a definite description of the semantics of the whole program, we can at least try to describe the behavior of the individual methods. A useful technique is called slicing, introduced by Weiser [84]. Slicing is used to highlight statements that are relevant to a particular computation and are as such semantically related [76]. Again, we can make a distinction between static and dynamic slicing. With static slicing no assumptions are made, while dynamic slicing depends on specific test data.

Slicing depends on the control and data flow of the statements inside a method. Control flow is the order in which the individual statements are executed. Data flow follows the trail of a data item, such as a variable, as it is created or handled by a program [29]. With slicing we can find out which statements contain variables that can be affected by another variable. This is called a backward static slice, as it is computed by a backwards traversal of the statements beginning at the variable we are interested in.

5.4.2 Method Example

If we want to create a method with a specific purpose, we usually specify this method first in some sort of formal requirements specification, a high-level view of the functionality. For instance, we need a method called AssignWhenMoreThenOne with the following requirement: “Method AssignWhenMoreThenOne must assign a value of 1 to the global variable moreThenOne if its first parameter is greater than or equal to 2.”. This single sentence describing the method AssignWhenMoreThenOne can be split into multiple semantical elements:
• reading the value of the first parameter;
• reading of a constant value of 2;
• comparison of two values;
• the use of the greater than or equal to operator;
• branching based on the comparison;
• assignment of a value 1 to global variable moreThenOne if the condition holds.
We can implement this method in various ways and in different programming languages, as shown in Listing 5.4, Listing 5.5 and Listing 5.6.

1 public int moreThenOne;
2
3 public void AssignWhenMoreThenOne(int stockAmount)
4 {
5 if (stockAmount >= 2)
6 moreThenOne = 1;
7 }

Listing 5.4: Method AssignWhenMoreThenOne in C# .NET

1 Public moreThenOne As Integer
2
3 Public Sub AssignWhenMoreThenOne(ByVal stockAmount As Integer)
4 If stockAmount >= 2 Then
5 moreThenOne = 1
6 End If
7 End Sub

Listing 5.5: Method AssignWhenMoreThenOne in VB .NET

1 procedure Module1.AssignWhenMoreThenOne(stockAmount: Integer);
2 begin
3 if (stockAmount >= 2) then
4 Module1.moreThenOne := 1
5 end;

Listing 5.6: Method AssignWhenMoreThenOne in Borland Delphi

While the previous examples differ in syntax, they have the same semantics. The C# and VB .NET examples both compile to the Common Intermediate Language, as shown in Listing 5.7.

1 .method public static void AssignWhenMoreThenOne(int32 stockAmount) cil managed
2 {
3 // Code Size: 21 byte(s)
4 .maxstack 2
5 .locals init (bool flag1)
6
7 L_0000: nop
8 L_0001: ldarg.0
9 L_0002: ldc.i4.2
10 L_0003: clt
11 L_0005: ldc.i4.0
12 L_0006: ceq
13 L_0008: stloc.0
14 L_0009: ldloc.0
15 L_000a: brfalse.s L_0012
16 L_000c: ldc.i4.1
17 L_000d: stsfld int32 ConsoleApplication2.Module1::moreThenOne
18 L_0012: nop
19 L_0013: nop
20 L_0014: ret
21 }

Listing 5.7: Method AssignWhenMoreThenOne in Common Intermediate Language

From this IL code it is still possible to determine the semantics we mentioned before. The ldarg.0 instruction loads the first parameter onto the stack; the ldc.i4.2 OpCode puts the value 2 on the stack. Both values are used by the compare less than (clt) instruction, whose result is also put on the stack. A zero value is loaded (ldc.i4.0) and a compare for equality is performed (ceq), whose result is stored (stloc.0) in a variable. Based on this value, which is loaded back onto the stack (ldloc.0), a branch is performed (brfalse.s) to label L_0012 when the value is false. If the value of the variable on the stack was true, the constant value 1 is placed on the stack (ldc.i4.1) and stored in the variable moreThenOne (stsfld).

Although the C#, VB and Delphi code samples were practically the same, the IL code is somewhat different. The compiler introduced two comparisons and a new local variable to hold the result of one of these comparisons.
Also, the branching is reversed: the branch now tests for a false value instead of a true one. Furthermore, IL is a stack-based language. Still, this piece of code behaves as indicated by the definition of method AssignWhenMoreThenOne.

With the information described in this chapter, we know what types of semantical constructions we are interested in. Not only the constructions themselves are important semantical information; the control flow and the operands also play an important part in the behavior of a function. The next chapter discusses how these constructions and the related data are represented in the Intermediate Language and how this information can be extracted from the code.
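As a closing check, the instruction walkthrough of Listing 5.7 can be simulated with a tiny stack-machine sketch (Python; the interpreter is hand-written for illustration and covers only the handful of OpCodes used in that listing):

```python
# A minimal evaluator for the instruction sequence of Listing 5.7.
# Despite the extra comparison and the reversed branch, moreThenOne
# becomes 1 exactly when the argument is >= 2, as the requirement states.

def assign_when_more_then_one(stock_amount):
    stack, locals_, fields = [], [0], {"moreThenOne": 0}
    stack.append(stock_amount)               # ldarg.0
    stack.append(2)                          # ldc.i4.2
    b, a = stack.pop(), stack.pop()
    stack.append(1 if a < b else 0)          # clt
    stack.append(0)                          # ldc.i4.0
    b, a = stack.pop(), stack.pop()
    stack.append(1 if a == b else 0)         # ceq
    locals_[0] = stack.pop()                 # stloc.0
    stack.append(locals_[0])                 # ldloc.0
    if stack.pop():                          # brfalse.s L_0012 (skip if false)
        stack.append(1)                      # ldc.i4.1
        fields["moreThenOne"] = stack.pop()  # stsfld
    return fields["moreThenOne"]             # ret

print(assign_when_more_then_one(1))  # 0
print(assign_when_more_then_one(2))  # 1
```

This is essentially the task of the Semantic Analyzer in miniature: mapping a sequence of stack operations back onto the higher-level semantic actions, comparison, branching and assignment, that the source code expressed directly.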
CHAPTER 6

Analyzing the Intermediate Language

In Chapter 3 the .NET Framework was introduced and a brief introduction to the Common Intermediate Language (IL) was presented. This assignment uses the Compose/.NET project and hence the .NET languages, so we need access to the languages represented in the Intermediate Language. This chapter provides more details of the IL and explains how to access it.

6.1 Inside the IL

An intermediate language is a CPU-independent instruction set which resides between the source code and the machine code. This has several advantages:
• The code can be optimized just before it is executed;
• It allows for a platform-independent layer before generating a platform-dependent version. Optimization can occur per platform;
• Interoperability with other languages compiling to the same IL. Functionality in the IL can be shared;
• Multiple different kinds of higher-level languages can compile to this intermediate language, so a large number of languages can be supported.

Two major intermediate languages are Java byte code and the .NET IL. In this section we will only discuss the .NET IL.

6.1.1 Modules

A .NET application consists of one or more managed executables, each of which carries metadata and (optionally) managed code [51]. Managed executables are called modules and they basically contain two major components: metadata and IL code. Modules are used by two components in the CLR (see Section 3.3): a loader and the just-in-time (JIT) compiler.
The loader is responsible for reading the metadata and creating an internal representation and layout of the classes and their members. A class is loaded only when it is needed. When loading a class, the loader runs a series of consistency checks on the related metadata.

The JIT compiler compiles the methods encoded in IL into the native code of the underlying platform. The runtime does not execute the IL code itself; instead, the IL code is compiled in memory into native code, and this native code is executed. A method is compiled only when it is called. It is possible to precompile modules to native code for faster execution; the original file must still be present, since it contains the metadata. When the IL code is compiled, it is also optimized by the JIT compiler. This means that the original IL code is hardly optimized, because the target architecture is only known at runtime. The JIT compiler performs optimization algorithms like method inlining, constant folding, dead code elimination, loop unrolling, constant and copy propagation, and so on.

The file format of a managed module is based on the standard Microsoft Windows Portable Executable and Common Object File Format (PE/COFF). As such, a managed module is executable by the operating system. Figure 6.1 shows the structure of a managed module. When the module is invoked by the operating system, the .NET runtime can seize control over the execution.

Figure 6.1: Structure of a managed executable module
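The compile-on-first-call strategy described above can be sketched as a caching scheme (Python, as an analogy only; the `LazyJit` class and its method names are invented and greatly simplify what the real JIT compiler does):

```python
# A method body is translated to "native" form only the first time it is
# invoked; subsequent calls reuse the cached result, so compilation cost
# is paid once per method and never for methods that are never called.

class LazyJit:
    def __init__(self):
        self.native_cache = {}   # method name -> compiled function
        self.compile_count = 0

    def _compile(self, il_body):
        # Stand-in for the IL -> native translation (plus optimization).
        self.compile_count += 1
        return lambda arg: il_body(arg)

    def invoke(self, name, il_body, arg):
        if name not in self.native_cache:          # compile on first call only
            self.native_cache[name] = self._compile(il_body)
        return self.native_cache[name](arg)

jit = LazyJit()
double = lambda x: 2 * x
print(jit.invoke("Double", double, 21))  # 42, compiled on this first call
print(jit.invoke("Double", double, 5))   # 10, cached code reused
print(jit.compile_count)                 # 1
```

The same lazy pattern applies one level up: the loader materializes a class from its metadata only when the class is first needed.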
6.1.2 Metadata

Metadata is data that describes data. In the context of the common language runtime, metadata means a system of descriptors of all items that are declared or referenced in a module [51]. In a module there is a collection of tables containing different kinds of metadata. One table, the TypeDef table, lists all the types defined in the module. Another table lists the methods implemented by those types, another lists the fields, another lists the properties, and so on [65]. Additional tables hold collections of external type references (types and type members in other modules that are used by this module), the assemblies containing the external types, and so on. Metadata also describes the structural layout of all the types in the module. Besides the tables that store the metadata, there are also heaps in which sequences of items are stored, for instance strings and binary objects. The runtime loader analyzes the consistency of, and protects, the metadata headers and the metadata itself, making sure it cannot be changed to pose security risks.

6.1.3 Assemblies

The CLR cannot use managed modules directly, but requires an assembly. An assembly is a deployment unit and contains metadata, (optionally) managed code and sometimes resources. The metadata is a system of descriptors of all the structural items of the application. It describes the classes, their members and attributes, relations, etcetera. Part of the metadata is the manifest. It provides information about the assembly itself, like the name, version, culture, public key and so on. Furthermore, it describes the relations to other files, like resources and other assemblies, and contains security demands. An example of an assembly is shown in Figure 6.2.

Figure 6.2: Assembly containing multiple files

An assembly contains one or more namespaces. A namespace is a collection of types that are semantically related.
Apart from some syntax restrictions, developers can define their own
namespace. The main purpose is to allow (meta) items to be unambiguously identifiable. However, namespaces are not metadata items and hence are not stored in a metadata table.

The .NET Framework uses an object-oriented programming model in which types are an important concept. The type of an item, such as a variable, constant, parameter, and so on, defines both the data representation and the behavioral features of the item. The Common Language Infrastructure standard (see Section 3.4) defines two kinds of types, namely value types and reference types, as further explained in Section 5.3.6. The Framework supports only single type inheritance and thus creates a hierarchical object model. At the top is the System.Object type, from which all types are derived. In .NET the types can be divided into five categories: classes, interfaces, structures, delegates and enumerations. Types, fields and methods are the three important elements in managed programming [51]; the other elements use metadata to provide additional information about these three.

6.1.4 Classes

A class defines the operations an object can perform (methods, events, properties) and defines values that hold the state of the object (fields). A class also contains specific class metadata, which can be divided into two concepts: type references (TypeRef) and type definitions (TypeDef). TypeDefs describe the types defined in the class, whereas TypeRefs describe references to types that are declared somewhere else. In its simplest form we only get the TypeDef metadata table, which contains flags about the visibility (like a public or private class) and a class reference to another type. There is more information if, for instance, the class implements interfaces, uses custom attributes, is an enumeration or a value type, et cetera.

A class contains items like methods, properties, events or fields, and these items are characterized by signatures.
A signature is a binary object containing one or more encoded types and resides in the metadata [51]. The first byte of a signature defines the calling convention and, in turn, identifies the type of the signature. Possible calling conventions are field, property, local variable, instance method or method call identifiers. At line 3 in Listing 6.1 the syntax of a class definition is shown. The dotted_name defines the namespace this class belongs to, simple_name is the name of the class (the TypeDef) and class_ref is the name of the class it extends. There are numerous flags to specify options for the type definition of the class, for instance whether the type is visible outside of the assembly or whether it is an interface.

1 .namespace <dotted_name>
2 {
3 .class <flags> <simple_name> extends <class_ref>
4 {
5
6 }
7 }

Listing 6.1: Syntax of a class definition
6.1.5 Fields

Fields are, together with local variables inside a method, data locations. Information about the fields is stored in a fields metadata table. Additional tables specify data like the layout, mapping and constant values of the fields. The syntax for a field is listed in Listing 6.2.

1 .field <flags> <type> <name>

Listing 6.2: Syntax of a field definition

The owner of a field is the class or value type in the lexical scope of which the field is defined. The flags are used to specify extra options, the type defines the type of the field, and finally the name indicates the name of the field. An example is given in Listing 6.3.

1 .field public string s
2 .field private int32 i

Listing 6.3: Example of a field definition

There are two types of fields:

Instance fields
These are created every time a type instance is created, and they belong to this type instance.

Static fields
Fields which are shared by all instances of the type and are created when the type is loaded.

The field signature does not contain an option to specify whether the field is static or instance. However, the compiler keeps separate tables for the two kinds and can classify the desired field. To load or store a field, there are two sets of instructions in the IL: one for static and one for instance fields.

Fields can have default values, which are stored in the constant metadata table. Besides the default values for fields, this table can also contain default values for parameters of methods and properties. The syntax is listed in Listing 6.4 and an example is given in Listing 6.5. If the const_type is a null reference, a value is not mandatory.
1 .field <flags> <type> <name> = <const_type> [( <value> )]

Listing 6.4: Syntax of a field definition with default value

1 .field private int32 i = int32(1234)

Listing 6.5: Example of a field definition with default value

A field declared outside a class is called a global field and belongs to the module in which it is declared. This module is represented by a special TypeDef record under the name <Module>. A global field is by definition static, since only one instance of the module exists and no other instance can be created.

6.1.6 Methods

A method has several related metadata tables, covering definition and reference information, implementation, security, semantics, and interoperability. The syntax for a method is listed in Listing 6.6.
1 .method <flags> <call_conv> <ret_type> <name>(<arg_list>) <impl> {
2 <method_body>
3 }

Listing 6.6: Syntax of a method definition

The flags define the options for the method, such as the accessibility (private, public and so forth). The call_conv, ret_type, and arg_list are the method calling convention, the return type, and the argument list defining the method signature. The impl specifies additional implementation flags of the method, for example whether the method is managed, in CIL or native format, whether it must be executed in single-threaded mode only, and so on.

The name of the method is either a regular name or one of the two keywords .ctor or .cctor. The instance constructor method (.ctor) is executed when a new instance of the type is created; the class constructor (.cctor) is executed after the type is loaded and before any of the type members is accessed. The global .cctor can be used to initialize global fields.

There are different kinds of methods, as depicted in Figure 6.3. Static methods are shared by all instances of a type and do not require an instance pointer (referred to as this). They also cannot access instance members unless the instance pointer is provided explicitly. An instance method is instance-specific and has as its first argument the this instance pointer.

Figure 6.3: Different kinds of methods

Virtual methods can be overridden in derived classes. A non-virtual method can also be redefined in a derived class, but the new method has nothing to do with the method declared in the base class: the base method is hidden, yet can still be called when the class name is specified.

6.1.7 Method Body

The method body itself holds three parts, namely a header, IL code and an optional structured exception handling (SEH) table; see Figure 6.4. Currently, there are two types of headers: a fat and a tiny version, indicated by the first two bits in the header. A tiny header is created by the
  • 77. 6.1 Inside the IL compiler when the method does not use SEH nor local variables, has a default stack space of eight and its size is less then 64 bytes. Figure 6.4: Method body structure Local variables are declared and have their scope inside the method. Local variables only have names if the source code is compiled in debug mode. They are referenced in IL code by their zero based ordinals1. Unlike fields and method names, the names of local variables are not part of the metadata. Their names are stored inside a (separate) debug file, the program database (PDB file). If the keyword init has been added to the local’s declaration, the JIT compiler must initialize the local variables before the execution of the method. This means that for all the value types the constructor is called and all variables of object reference types are set to null. If the init flag has not been set, the code is regarded as unverifiable and can only run from a local drive with verification disabled. It should be noted that methods, just like global fields, can reside outside any class scope. These so called global methods are static and the same accessibility flags as for a global field apply. 6.1.8 IL Instructions Inside a method we find the method body which contains a header, IL code, and an optionally structured exception handling (SEH) table. IL code contains IL instructions, which are made up of an operation code (OpCode) and are sometimes followed by an instruction parameter. A list of all the available operational codes can be found in Appendix A. There are long and short parameter instructions. The long form requires a four byte integer, the short form only one byte. This can reduce the amount of space in the assembly, but the short form can only be used when the value of the parameter is in the range of 0 to 255 (for unsigned parameters). 1 numbers used to denote the position in an ordered sequence. M.D.W. van Oudheusden 55
IL is a stack-based language: operands must be pushed onto the stack before an operation can use them. An operator takes the values it needs from the stack, performs the operation, and (optionally) places the result back onto the stack. More generally, instructions take all required arguments from the stack and put their results onto the stack. If, for instance, a local variable is required by an instruction, another instruction first has to load this variable onto the stack. An exception to this rule are the instructions of the load and store group, which are responsible for pushing values onto and popping them off the stack.

Elements on the evaluation stack are called slots and can be one of the types listed in Appendix B. Besides the evaluation stack, a method also has an argument table and a local variable table, both with statically typed slots.

We can distinguish different kinds of instructions, which are listed in the sections below.

6.1.8.1 Flow Control

Labels can be placed between the IL instructions; a label marks the first instruction that follows it. Labels are used by the control flow instructions to jump to a predefined part of the code. When creating IL code by hand, it is much easier and safer to use labels than to calculate (possibly incorrect) offsets. The instructions dealing with control flow inside a method are the following:

• Branching;
• Switching;
• Exception handling;
• Returning.

The branching instructions can be divided into three types. First, there is unconditional branching, where the control flow simply jumps to another part of the code. Second, there is conditional branching, where the control flow is directed to another location based on a true or false value on the stack. The third type is the comparative version, where two values from the stack are compared according to the condition specified by the OpCode; this condition can be greater than, not equal, less than or equal, and so forth.

The switch instruction uses a jump table to determine where to jump to. It takes an unsigned integer from the stack and uses that number as an index into the sequence of target offsets. A value of zero on the stack instructs the switch to jump to the first target offset in the list. Listing 6.7 shows an example of an unconditional branching instruction and a switch instruction.

1 Loop:
2
3 br Loop
4
5 switch(Label1, Label2, ..., LabelN)
6 // Default case
7 Label1:
8
9 Label2:
10
11 LabelN:

Listing 6.7: Control flow examples

The exception handling instructions are divided into an exiting and an ending instruction. A block of code inside an exception handling clause cannot be entered or exited by simply branching; special state requirements prohibit this. Therefore there is a leave instruction to exit an exception handling block, which clears the stack before branching. To indicate the end of a finally (or fault) handler the special endfinally instruction is used; it also clears the stack but does not jump.

A method ends with one or more ret instructions, which return the control flow to the call site. If there is a (single) value on the stack, the ret instruction takes this value and pushes it onto the stack of the calling method.

6.1.8.2 Arithmetical Instructions

Arithmetical instructions are used for numeric data processing: stack manipulation, loading and storing of constants, arithmetical operations, bitwise operations, data conversion operations, and logical condition checks.

Stack manipulation instructions perform an action on the evaluation stack and do not have a parameter:

nop  Performs no operation. Although not strictly a stack manipulation instruction, it is included in this list for lack of a better category;
dup  Duplicates the value on top of the stack;
pop  Removes the value from the top of the stack.

Constant loading instructions place their constant parameter on the stack. There are also instructions that encode the value to load directly in the OpCode, so that no parameter is needed, as shown at line 2 in Listing 6.8.

1 ldc.i4 16  // Place the constant 16 onto the stack
2 ldc.i4.7   // Load the value 7 onto the stack. Note: there is no parameter

Listing 6.8: Constant loading instructions

It is also possible to load and store values through pointers. In that case the value on top of the stack is an address from which a value is loaded or to which a value is stored.
Table 6.1 lists all the arithmetical operations. The overflow instructions raise an exception when the result does not fit the target type.

OpCode      Description
add         Addition
sub         Subtraction
mul         Multiplication
div         Division
div.un      Unsigned division
rem         Remainder, modulo
rem.un      The remainder of unsigned operands
neg         Negate, thus invert the sign
add.ovf     Addition with overflow check
add.ovf.un  Addition of unsigned operands with overflow check
sub.ovf     Subtraction with overflow check
sub.ovf.un  Subtraction of unsigned operands with overflow check
mul.ovf     Multiplication with overflow check
mul.ovf.un  Multiplication of unsigned operands with overflow check

Table 6.1: Arithmetical operations in IL

Bitwise and shift operations have no parameters. They take one or two values from the stack, perform their action, and place the result back onto the stack. Table 6.2 provides a summary of the bitwise and shift operations.

The conversion instructions take the value on top of the stack, convert it to the type specified by the instruction, and put the result back onto the stack. There are, of course, some rules: not every type can be converted to every other type, and information can get lost when converting to a narrower type (for example, converting a 32-bit integer to a 16-bit integer). Special overflow conversion opcodes are available for when an exception must be thrown whenever the value has to be truncated to fit the target type.

Logical condition check instructions compare two values according to a certain operator. The result is placed on the stack and is not directly used to branch to another location. A separate conditional branching instruction (the branch-if-true at line 3) can follow a logical condition check instruction (the check-equal at line 2), as shown in Listing 6.9.

1 Loop:
2 ceq
3 brtrue Loop

Listing 6.9: Condition check followed by a branching instruction

6.1.8.3 Loading and Storing

Almost all instructions in the CIL operate on values on the stack; the loading and storing instructions are the ones that move values between the stack and storage. This group of instructions loads values from local variables, fields, and method arguments onto the stack, and stores items from the stack into these local variables, fields, and arguments.

The ldarg and starg instructions handle argument loading and storing, while ldloc and stloc are used for local variables. The parameter indicates the ordinal of the argument or variable to load or store. Remember: the first (zero-based) argument of an instance method is the object pointer.

The ldfld and stfld instructions load and store fields. A field signature does not indicate whether the field is static or an instance field, so there are special instructions to load and store field values (or pointers to them) from and onto the stack: for each instance load/store instruction (ldfld, stfld) there is a static counterpart (ldsfld, stsfld).
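The stack discipline described above can be illustrated with a small interpreter. The following Python fragment is a deliberately simplified sketch: it is untyped, covers only a handful of opcodes, and the tuple encoding of instructions is our own invention, not actual IL.

```python
def run_il(instructions, locals_):
    """Evaluate a small, untyped subset of IL on an explicit operand stack."""
    stack = []
    for op, *arg in instructions:
        if op == "ldc":                        # push a constant
            stack.append(arg[0])
        elif op == "ldloc":                    # push a local variable slot
            stack.append(locals_[arg[0]])
        elif op == "stloc":                    # pop into a local variable slot
            locals_[arg[0]] = stack.pop()
        elif op == "add":                      # binary ops pop two, push one
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "mul":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
    assert not stack, "a well-formed method leaves the evaluation stack empty"
    return locals_

# y := x * 2 + 1, with x in local slot 0 (here 5) and y in slot 1
result = run_il([("ldloc", 0), ("ldc", 2), ("mul",),
                 ("ldc", 1), ("add",), ("stloc", 1)], {0: 5, 1: 0})
print(result[1])  # 11
```

Note how every intermediate result lives only on the stack until the final stloc: this is exactly the property that makes expression recovery hard and motivates the normalization step of Section 6.1.9.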
OpCode  Description
and     Bitwise AND (binary)
or      Bitwise OR (binary)
xor     Bitwise exclusive OR (binary)
not     Bitwise inversion (unary)
shl     Shift left
shr     Shift right
shr.un  Shift right, treating the shifted value as unsigned

Table 6.2: Bitwise and shift operations in IL

6.1.8.4 Method Calling

From within a method body it is possible to call other methods (see Listing 6.10). There are a number of call instructions; they take a token as parameter, containing either a MethodDef or a MethodRef of the method being called. The arguments of the called method must be pushed onto the stack in the order of their appearance in the method signature before the actual call. When an instance method is called, the instance pointer must be placed on the stack first. If the called method does not return void, it places its return value on the stack when returning. Methods can be called directly or indirectly; with an indirect call, not the method name but a pointer to the method is used.

1 ldstr "Enter a number"
2 call void [mscorlib]System.Console::WriteLine(string)
3 call string [mscorlib]System.Console::ReadLine()

Listing 6.10: Method call example

6.1.8.5 Exception Handling

Exception handling is a feature of the managed runtime: the runtime is capable of detecting exceptions and finding a corresponding exception handler. SEH information is stored in a table after the IL code of the method body. There are two forms of SEH declaration: a label form, shown in Listing 6.11 with an example in Listing 6.12, and a scope form, shown in Listing 6.13.

1 .try <label> to <label> <EH_type_specific> handler <label> to <label>

Listing 6.11: Exception handling in label form

The EH_type_specific is either a catch, filter, fault, or finally type. The label form uses labels to delimit a guarded block of code and a block of code that handles the exception. With the scope form, the .try keyword is placed before the guarded instruction block, which is directly followed by the exception handling blocks, see Listing 6.13.

1 BeginTry:
2
3 leave KeepGoing
4 BeginHandler:
5
6 leave KeepGoing
7 KeepGoing:
8
9 ret
10 .try BeginTry to BeginHandler catch [mscorlib]System.Exception
11 handler BeginHandler to KeepGoing

Listing 6.12: Exception handling in label form example

1 .try {
2   // Guarded code
3   leave KeepGoing
4 }
5 catch [mscorlib]System.StackOverflowException {
6   // The exception handler 1 code
7   leave KeepGoing
8 }
9 catch [mscorlib]System.Exception {
10   // The exception handler 2 code
11   leave KeepGoing
12 }

Listing 6.13: Exception handling in scope form

Exception handling structures can be nested, in either the label or the scope form. It is illegal to branch into or out of a guarded block of code: such a block can only be entered from the top (where the guarded block is defined by a TryOffset), and the code inside the handler blocks can only be invoked by the exception handling system. Guarded and handler blocks can only be exited using the leave, throw, or rethrow instruction; the evaluation stack is cleared before the branch.

6.1.9 Normalization

Because IL is stack based, a simple expression can become quite complex. Take, for example, the expression y := (x + 2) − 1: the constant 2 is added to the value of x, the constant 1 is subtracted, and the result is stored in the variable y. Listing 6.14 shows this expression in IL code.

1 ldloc x
2 ldc 2
3 add
4 ldc 1
5 sub
6 stloc y

Listing 6.14: Stack based expression in the Common Intermediate Language

Recognizing the expression in this form is difficult: temporary results are placed on the stack and retrieved by the next operation. For instance, the result of the add operation is never stored in a variable. To convert a stack-based program to a semantical representation, it is necessary to perform a normalization step. The example in Listing 6.14 can be normalized by introducing a temporary variable, so that there is a single assignment for each statement:

temp = x + 2
y = temp - 1
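The normalization idea can be sketched as a symbolic pass over the instruction stream: instead of evaluating values, we push variable names and expressions, and emit one assignment per operator. The Python fragment below is an illustration only, not the Semantic Analyzer implementation; it introduces a fresh temporary for every operator, so the final store goes through one temporary more than the hand-normalized version above.

```python
def normalize(instructions):
    """Rewrite a stack-based instruction list into single-assignment statements."""
    stack, stmts, fresh = [], [], 0
    for op, *arg in instructions:
        if op == "ldloc":                # push the variable name symbolically
            stack.append(arg[0])
        elif op == "ldc":                # push the constant as text
            stack.append(str(arg[0]))
        elif op in ("add", "sub"):       # emit one assignment per operator
            b, a = stack.pop(), stack.pop()
            name = f"temp{fresh}"
            fresh += 1
            stmts.append(f"{name} = {a} {'+' if op == 'add' else '-'} {b}")
            stack.append(name)           # the temporary replaces the stack slots
        elif op == "stloc":
            stmts.append(f"{arg[0]} = {stack.pop()}")
    return stmts

# Listing 6.14 again: y := (x + 2) - 1
for stmt in normalize([("ldloc", "x"), ("ldc", 2), ("add",),
                       ("ldc", 1), ("sub",), ("stloc", "y")]):
    print(stmt)
```

Running this on the instructions of Listing 6.14 yields temp0 = x + 2, temp1 = temp0 - 1, y = temp1: every intermediate stack value now has an explicit, singly-assigned name.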
This is only a simple example; expressions with multiple sequential operators are normalized into correspondingly more temporary assignments.

6.1.10 Custom Attributes

The .NET Framework can be extended using custom attributes, special metadata items. Custom attributes cannot change the metadata tables, because those tables are hard-coded and part of the runtime. The information in custom attributes can be used by the program itself, but also by a compiler or debugger. A major disadvantage of custom attributes is the amount of space they occupy in the assembly and the fact that IL code cannot access custom attributes directly, so reflection must be used, which is a relatively slow mechanism.

Custom attributes can be attached to any item in the metadata tables except to custom attributes themselves. Attaching an attribute to an instance is not possible; only an assignment to the type itself is allowed. The value of a custom attribute is a BLOB (binary large object) containing the arguments of the attribute constructor and, optionally, a list of fields with values.

1 .custom instance void <class_ref>::.ctor(<arg_list>) [ = ( <hexbytes> ) ]

Listing 6.15: Custom attribute syntax in IL

Listing 6.15 shows the declaration of a custom attribute in IL code. The class_ref is the name of the class implementing the attribute, arg_list contains the arguments for the constructor of the attribute, and hexbytes contains the BLOB representation of the argument values. An example is shown in Listing 6.16.

1 .custom instance void MyAttribute::.ctor(bool) = (01 00 01 00 00)

Listing 6.16: Custom attribute example in IL

The position of the custom attribute in the code defines its owner. All attributes declared in the scope of an item belong to that item; if there is no scope, the attributes declared after an item belong to that item. This is the opposite of the use of custom attributes in a higher-level programming language like C# or VB.NET, where the custom attribute precedes the item it belongs to (see the example in Listing 6.17). There is another form of custom attribute declaration in which the owner is explicitly specified; it is used for metadata items that are declared implicitly, such as TypeRefs and MemberRefs, and it can appear anywhere in the source since the owner is named.

1 public class ExampleClass
2 {
3     [MyAttribute(true)]
4     public void ExampleMethod( )
5     {
6         //
7     }
8 }

Listing 6.17: Example of a custom attribute in C#
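The BLOB of Listing 6.16 can be decoded by hand to see its structure. The following Python sketch assumes a constructor with exactly one bool fixed argument, following the blob layout of ECMA-335 (a two-byte prolog 0x0001, the fixed arguments, then a two-byte count of named arguments); the helper name decode_bool_attribute_blob is hypothetical.

```python
def decode_bool_attribute_blob(blob: bytes) -> bool:
    """Decode a custom-attribute blob whose constructor takes a single bool."""
    prolog = int.from_bytes(blob[0:2], "little")
    assert prolog == 0x0001, "custom-attribute blobs start with the prolog 01 00"
    value = blob[2] != 0                           # the single bool fixed argument
    named = int.from_bytes(blob[3:5], "little")    # count of named arguments
    assert named == 0, "named arguments are not handled in this sketch"
    return value

# The blob from Listing 6.16: MyAttribute::.ctor(true)
print(decode_bool_attribute_blob(bytes([0x01, 0x00, 0x01, 0x00, 0x00])))  # True
```

So (01 00 01 00 00) reads as: prolog, the value true, and zero named fields, which matches the .ctor(bool) signature in the listing.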
6.2 Access the IL

The previous section described the Intermediate Language used by Microsoft. Used as an extra layer before the instructions are compiled to machine code, it provides an accessible program representation of software created in a higher-level programming language like C# or VB.NET. However, the instructions, together with the metadata, are stored in a byte code format: a format optimized for speed and efficiency, not for direct readability. This section lists a number of ways to read the contents of the byte code, so that we have access to the instructions and data.

6.2.1 How to Read IL

To access IL code we basically have to parse the byte code and convert it to its IL representation; for instance, the hexadecimal value 58 stands for the IL instruction add. Using the ECMA-335 specification [25], the standard for the Common Language Infrastructure and thus for the Intermediate Language, we can read and parse the byte code. However, it is not as simple as it looks: information is spread over different metadata tables and must be associated with the correct elements. Instead of writing our own CIL byte code parser, it is more efficient to use an existing tool. Such IL readers parse the byte code and create a higher-level representation of the instructions; how this is represented differs per tool.

The process of analyzing a subject system to create representations of the system at a higher level of abstraction is called reverse engineering [14]. A tool to reverse engineer a program is called a decompiler, as it performs the reverse operation of a compiler. Microsoft provides a free disassembler called ILDASM (Intermediate Language Disassembler), which is part of the Software Development Kit (SDK). It is the counterpart of ILASM, which converts plain text IL code to byte code. Using both tools it is even possible to perform a round-trip compilation: decompile an assembly with ILDASM, recompile it with ILASM, and obtain a correct assembly again.

ILDASM provides a Graphical User Interface (GUI), displayed in Figure 6.5, that shows the contents of a .NET file in a tree structure. With special options it allows access to the metadata and to the CIL code of the selected methods.

Figure 6.5: Graphical User Interface of ILDASM

Besides the GUI, ILDASM can write its output to a text file. By specifying the correct options, ILDASM generates a .IL file containing the Intermediate Language code; an example of this format is shown in Listing 3.1. Since this representation is much easier to read than the byte code version, it can be used as input for an IL parser. We can even change this representation and recompile it using the ILASM program, a technique currently used by the Compose*/.NET project [10].

Reflector (http://www.aisto.com/roeder/dotnet/), a tool created by Lutz Roeder, is commonly used to inspect .NET assemblies; Figure 6.6 shows a screenshot of this program. The tool can convert the analyzed IL code to a higher-level language like C#, VB.NET, or Delphi. With the use of plugins it is possible to perform analyses on the code or even to completely retrieve the source of an assembly in the language of your choice.

Figure 6.6: Lutz Roeder's Reflector

With the help of these decompiler tools it is very easy for anybody to reverse engineer assemblies back into readable source code. Malicious people can use this to crack programs, exploit security flaws, and steal ideas. If it is imperative to protect your source code, obfuscation should be applied. Code obfuscation is the generation of code that is still understandable by the runtime, but very difficult for humans to comprehend. Techniques used in obfuscation include removing nonessential metadata, control flow obfuscation, string encryption, reordering of elements, and size reduction. Obfuscation is not applied to the source code but to the assemblies. Although it is not a foolproof protection, it makes it very difficult to reverse engineer an application.
Besides tools for visualizing the source code, there are tools that perform specific analysis tasks on the code, for instance the Fugue protocol checker (http://research.microsoft.com/~maf/projects/fugue/index.htm). By specifying custom attributes inside the source code, Fugue checks whether the code conforms to a declarative specification. Pre- and postconditions, resource usage, database queries, and so on can be specified, as well as the different states a program must adhere to. Fugue performs a static analysis to find possible problems that could occur at runtime [22].

FxCop (http://www.gotdotnet.com/team/fxcop/) is another static analysis tool; it checks .NET managed code assemblies for conformance to the Microsoft .NET Framework Design Guidelines. It generates reports on possible design, localization, performance, and security improvements based on the design guidelines written by Microsoft. Targets, the managed assemblies, are checked by rules; if a rule fails, a descriptive message is displayed revealing relevant programming and design issues. Figure 6.7 displays a screenshot of FxCop.

Figure 6.7: Microsoft FxCop

There are many other tools that use a representation of the IL byte code to perform an analysis task. However, the purpose of this assignment is to perform our own analysis: we not only need an IL reader, but also a programming interface to access the IL representation so that semantic analysis can be performed. The next sections list some tools which could be used for this goal.

6.2.2 Reflection

Reflection is a method to inspect metadata to obtain information about an assembly, module, or type [65, 4]. The .NET Framework Class Library contains special functionality in the System.Reflection namespace to access the metadata without converting the byte code to another format; this is handled internally by the reflection classes. Reflection is used at runtime by a program to analyze itself or other components. The structure of the analyzed component is converted to an object representation. There are three main classes:

• System.Reflection.Assembly, which represents assemblies;
• System.Reflection.Module, which represents managed modules;
• System.Type, which represents types.

These classes contain properties and methods to obtain further information about the assemblies. For instance, the Assembly class contains functionality to get all the types in the assembly as System.Type objects. This type information is the root of all reflection operations and gives access to methods, fields, parameters, and so on. Besides the methods in the Assembly class, there are other ways to dynamically load types; this is called late binding: binding to a type at run time rather than at compile time. Given a Type object, we can also instantiate the type and invoke methods on it. Listing 6.18 presents an example that uses reflection to load an assembly and get the class ImageProcessing and its method GetImage; the method is then invoked and returns an Image object.

1 Assembly assm = Assembly.Load ("assembly.dll");
2 Type type = assm.GetType ("ImageProcessing");
3 MethodInfo method = type.GetMethod ("GetImage");
4 Object obj = Activator.CreateInstance (type);
5 Image image = (Image) method.Invoke (obj, null);

Listing 6.18: Reflection example

With reflection it is possible to access custom attributes, the elements providing extra information about other items. Because it is not possible to access custom attributes directly from IL code, reflection is the only way to retrieve them. The reflection library has another component, named Emit, which allows a compiler or tool to emit metadata and Intermediate Language code and execute this code or save it to a file.

Reflection is a powerful way to perform static analysis on assemblies. However, there are two major problems. The first one is speed: reflection takes place at runtime and must parse and process the assembly before it can build a representation, which is a time-consuming process. Speed improvements are included in .NET version 2.0. A more serious problem is the lack of method body information. Reflection can reveal almost all the information needed, except the contents, the body, of a method with the actual instructions inside, which is exactly the information we are most interested in. Again, in version 2.0, the reflection classes have been enhanced with a function to get the method body contents; however, this only yields the byte code, so we still have to parse and convert it to another representation.

Reflection thus gives us a partially implemented tool for code analysis. If the reflection possibilities regarding speed and the method body are improved, it will be a good candidate to use.
6.2.3 Mono Cecil

Cecil (http://www.mono-project.com/Cecil) is a .NET library to inspect and generate .NET assemblies. It provides more or less the same functionality as reflection: you can load an assembly and browse through all its types. In addition to what reflection offers, it can read and parse IL instructions and has functionality to change the code and save it back to disk.

Listing 6.19 gives an example of opening an assembly and reading the types inside it. With access to a type object, we can use that object to retrieve the fields, methods, constructors, and so on.

1 // Creates an AssemblyDefinition from the "assembly.dll" assembly
2 AssemblyDefinition assembly = AssemblyFactory.GetAssembly ("assembly.dll");
3
4 // Gets all types which are declared in the Main Module
5 foreach (TypeDefinition type in assembly.MainModule.Types) {
6     // Writes the full name of a type
7     Console.WriteLine (type.FullName);
8 }

Listing 6.19: Cecil get types example

A method object has a property called Body which returns a MethodBody instance. This object gives access to all the local variables, the exception handling information, and the IL instructions. Besides accessing the instructions directly, we can also use the analysis tools of Cecil. The flow analysis class creates basic blocks in which instructions are grouped together. A control flow instruction, such as branching, exception throwing, or returning, starts a new block, and blocks are connected to each other: if the last instruction in a block is a branch to another block, the current block has a link to that block. This gives us the opportunity to trace the control flow of the instructions in the method.

Inside the blocks are instructions, and each instruction has an OpCode, indicating the type of instruction, and an operand, the parameter of the instruction (when available). The offset of the instruction, indicating its position inside the method, is also stored in the instruction. Next and previous references allow navigation between the instructions, and a visitor pattern can be used to visit all the different kinds of instructions. Cecil can not only read IL instructions, it can also add or change instructions and save the changed assembly. Cecil is used in a number of analysis tools, for instance tools that check whether code is type safe, or tools for code optimization.

Although Cecil has the ability to access the IL instructions, it is limited in its abilities: an instruction does not contain specific information about the data it is working on as specified by its operand, and there is no direct link to this operand, so we have to determine the type of the operand ourselves. Support for .NET Framework version 2.0 was not available in the version of Cecil examined for this assignment; it is possible that newer versions can handle the next version of the Framework without problems.
6.2.4 PostSharp

A program similar to Cecil is PostSharp (http://www.postsharp.org). This tool reads .NET assemblies, represents them as a Code Object Model, lets plug-ins analyze and transform this model, and writes it back to binary form. The two main purposes of the application are program analysis and program transformation.

PostSharp is designed for .NET version 2.0 and supports the new language constructs in the CIL. Working in combination with the reflection capabilities of .NET, it creates its own representation of the instructions inside a method body. Listing 6.20 gives an example of reading an assembly and printing all the instructions to the console.

1 // Get the assembly
2 System.Reflection.Assembly assembly =
3     System.Reflection.Assembly.LoadFrom("assembly.dll");
4 System.Reflection.Module[] modules = assembly.GetModules();
5
6 // Get all the modules
7 foreach (Module mod in modules)
8 {
9     // Open a module with PostSharp
10     PostSharp.ModuleReader.ModuleReader mr =
11         new PostSharp.ModuleReader.ModuleReader(mod);
12     PostSharp.CodeModel.ModuleDeclaration md = mr.ReadModule();
13
14     // Get the types
15     foreach (TypeDeclaration t in md.Types)
16     {
17         // Get all the methods in the type
18         foreach (MethodDeclaration method in t.Methods)
19         {
20             // Get the body of the method
21             MethodBodyDeclaration b = method.Body;
22
23             // Print the method name
24             Console.WriteLine(method.Name);
25
26             // Enumerate through all the instructions
27             b.ForEachInstruction(delegate(InstructionReader instructionReader)
28             {
29                 Console.WriteLine("Read instruction {0} as {1}",
30                     instructionReader.OpCodeNumber, instructionReader.OperandType);
31             });
32
33             method.ReleaseBody();
34         }
35     }
36 }

Listing 6.20: PostSharp get body instructions

Just like Cecil, PostSharp can split the instructions into blocks to represent the control flow. Instructions are represented in the code model with detailed information about the instruction and its operands, although the type information of the operands must still be resolved. PostSharp is more mature than Cecil, but is still under heavy development; at the time of implementation, PostSharp was not production ready.

6.2.5 RAIL

The Runtime Assembly Instrumentation Library, called RAIL (http://rail.dei.uc.pt/), is a project of the University of Coimbra, Portugal. Like Cecil and PostSharp, it allows .NET assemblies to be manipulated and instrumented before they are executed, filling the gap between .NET reflection and the emit functionality. Its primary use is the transformation of assemblies by changing types, methods, fields, or IL instructions [12].

RAIL creates an object-oriented representation of the assembly which can be manipulated to make changes in the code. Besides this structured view, RAIL also creates tables that hold the sequence of objects and object references representing the application's IL code and all exception handling related information. Applications of RAIL include runtime analysis tools, security verification, MSIL optimization, Aspect-Oriented Programming, and so on. However, at the time of writing, RAIL was immature and could not be used for even simple analysis tasks.

6.2.6 Microsoft Phoenix

Phoenix (http://research.microsoft.com/phoenix) is a framework for building compilers and a wide range of tools for program analysis, optimization, and testing [50]. Phoenix is a joint project of Microsoft Research and the Developer Division and is the basis for all future Microsoft compiler technology. It supports a wide range of hardware architectures and languages.

Building blocks form the core of Phoenix, implemented around a common intermediate representation (IR). These blocks are called phases and are executed one by one in a predefined order. Phases are used to build, modify, or analyze the IR, and in most cases the final phase writes the IR to a specific output format. Figure 6.8 shows the components of the Phoenix platform, with the IR as the main structure through which the phases interact with the data.
Readers are used to process different types of input, like AST, C intermediate language, Common IL (MSIL), PE files and other binaries. The input is read into the IR that represent an instruction stream of a function as a series of dataflow operations and the phases are executed in sequence. Each phase performs some operations on the IR and hence the IR can be in different levels of abstraction. For instance, during a compi- lation with Phoenix, the IR is transformed from a high level IR, which is machine independent, to the final instructions and addresses, a low level IR, which is machine dependent. Finally, the writers are used to build an executable or library. The list of phases can be changed to include or replace other phases at someplace in the se- quence. If an analysis phase needs access to a high level representation of the code then include the phase at the start of the sequence. 1 http://rail.dei.uc.pt/ 2 http://research.microsoft.com/phoenix 68 Automatic Derivation of Semantic Properties in .NET
  • 91. 6.2 Access the IL Figure 6.8: Platform of Phoenix Besides the representation of the functions in the IR, there is also an API to perform analysis on the data, like data flow, control flow, graphs (inheritance, call and interference), exception handling, Static Single Assignment1, and so on. The IR can also be modified by adding or changing instructions of a function or by changing functions. This is ideal for instrumentation and profiling, but also for AOP code weaving. There are two ways to use Phoenix, either as a compiler back-end or as a standalone tool. As a compiler back-end it uses the Phoenix framework and the input and output modules with the custom phases to do a compilation. As a standalone tool it is possible to directly call the Phoenix API and implement your own phases and place those at the right place in the phase list. Each phase implements an Execute function. A Unit is passed as a parameter to this function and can contain any of the unit types listed in Table 6.3. It is up to the phase to determine Unit Description FuncUnit Encapsulates the information required during compilation of a single function or method. DataUnit Represents a collection of related data, such as a set of initialized vari- ables or the result of encoding a FuncUnit. ModuleUnit Represents a collection of functions. PEModuleUnit Represents a portable executable (PE) image, such as an EXE or a DLL. AssemblyUnit Represents a .NET Framework assembly compilation unit. ProgramUnit Represents an executable image compilation unit, an EXE or a DLL. GlobalUnit Represents the outermost compilation unit. Table 6.3: Phoenix unit hierarchy if the unit is of the type it is interested in. The FuncUnit is the fundamental compilation unit in Phoenix and contains all the information necessary to compile a single function. For code 1 An intermediate representation in which every variable is assigned exactly once. M.D.W. van Oudheusden 69
analysis at the instruction level, this is the most interesting unit to analyze. It does not only contain the instructions, but also graphs, exception handling information, and type information.

Figure 6.9: Control flow graph in Phoenix [59]

Each phase has access to the intermediate representation (IR), which represents the instruction stream of a function as a series of data-flow and/or control-flow operations. Instructions in the IR have an operator, a list of source operands, and a list of destination operands. The IR is also strongly typed, meaning the types of the operands that reference data are stored. If, for instance, an integer is placed onto the stack, then the source operand of the load instruction is of the type integer. The type is determined by the operator and by the types of the source operands.

The control flow in the IR is explicit: instructions are in sequence. Using the flow graph functionality it is possible to create basic blocks representing a sequence of instructions with flow edges to other blocks. Each block starts with a unique label and ends with a branching instruction or, optionally, an exception-raising instruction. Figure 6.9 shows an example of the control flow graph in Phoenix. The IR instructions can be divided into two different kinds:
Real: Instructions that represent operations with dataflow or control flow, most of which map to one or more machine instructions.
Pseudo: Instructions that represent things such as labels and statically allocated data.

Table 6.4 shows the different forms of instructions available in the Phoenix IR. The last three items are pseudo instructions, the rest are real instructions.

ValueInstr: Any arithmetic or logical operation that produces a value.
CallInstr: Function invocation, either direct or indirect.
CmpInstr: A compare instruction that generates a condition code.
BranchInstr: Control flow for branching, conditional/unconditional and returning.
SwitchInstr: Control flow for switching, a multi-way computed branch.
OutlineInstr: Defines an outline summary instruction in the IR for code moved out of the main instruction stream, e.g. asm blocks.
LabelInstr: User-defined labels and control flow merge points in the code stream.
PragmaInstr: Arbitrary user-supplied directives.
DataInstr: Statically allocated data.
Table 6.4: Phoenix instruction forms

Each instruction object contains properties such as the OpCode indicating the kind of operation, a source operand list, and a destination operand list. Based on the type of the instruction, more properties can be available. The BranchInstr contains properties with links to the LabelInstr for a true and a false value of a condition. The CallInstr has a property indicating the name of the function being called.

Listing 6.21 shows part of a phase. The Execute function checks if the type of the Unit is a FuncUnit. If this is the case, it builds a flow graph and goes through all the instructions in each block, printing the OpCode value to the console.
protected override void Execute(Phx.Unit unit)
{
    // Try casting to a FuncUnit
    Phx.FuncUnit func = unit as Phx.FuncUnit;
    if (func == null)
    {
        // Only interested in FuncUnits
        return;
    }

    bool noPrevFlowGraph = func.FlowGraph == null;
    if (noPrevFlowGraph)
    {
        // Build a control flow graph
        func.BuildFlowGraphWithStyle(Phx.Graphs.BlockStyle.SplitEachHandlerEdge);
    }

    Phx.IR.Instr instr;
    Phx.Graphs.BasicBlock basicBlock;
    basicBlock = func.FlowGraph.StartBlock;

    // Loop through all the basic blocks
    while (basicBlock != func.FlowGraph.LastBlock)
    {
        // Begin the per-block instruction traversal.
        instr = basicBlock.FirstInstr;
        for (uint i = 0; i < basicBlock.InstrCount; i++)
        {
            // Write the OpCode to the console
            Console.WriteLine(instr.Opcode);

            // Get the next instruction in this block
            instr = instr.Next;
        }

        // Get the next block
        basicBlock = basicBlock.Next;
    }

    // Clean up the flow graph
    if (func.FlowGraph != null) func.DeleteFlowGraph();
}
Listing 6.21: Phoenix phase execute example

Phoenix is the most extensive IL reader and writer discussed in this chapter. It supports the .NET Framework version 2.0, gives detailed information about the operands of an instruction, and has flow graph capabilities. It is more mature than Cecil and PostSharp, but is still under development. However, it has the support of Microsoft and is used by Microsoft to compile applications such as Windows, PowerPoint, and certain games, and for building test tools. There is more documentation than for the other IL readers discussed, but information and samples are still scarce.
CHAPTER 7
Design and Implementation

Chapter 5 discussed the semantics of programming languages and Chapter 6 described the inner workings of the language our target source code is in. This chapter brings the two together by showing the design of the Semantic Analyzer.

7.1 General Design

A high-level overview of the system is needed before we can elaborate on the more specific parts. This section presents a general overview of the complete system, describes the limitations we have to take into account, and the flow of the program and its various components. It also specifies the coding guidelines used for the implementation.

7.1.1 Introduction

Basically, we want to read an assembly, perform an analysis task on it, and produce some output. To perform the querying of data, we first store the semantic representation of the input assembly in a semantical metamodel (more details in Section 7.2). We can use this model to reason about the behavior of the elements in the source. To achieve this, we perform a number of steps:

1. Read a .NET assembly into memory;
2. Parse the intermediate language byte code to a readable representation;
3. Retrieve the structure of the assembly;
4. Build a semantic model based on the structure;
5. Convert IL instructions to a semantical representation and store them in the model;
6. Provide a querying mechanism to search and reason about the program.
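The steps above can be sketched as a small pipeline. All names in this sketch are hypothetical stand-ins, not the actual Semantic Analyzer API; steps 1, 2, and 5 are stubbed out because in the real system they are delegated to Phoenix and the Semantic Extractor.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified model element: a class with its method names.
class SemanticClassSketch
{
    public string Name;
    public List<string> Methods = new List<string>();
}

static class PipelineSketch
{
    // Steps 1-5: read, parse, and convert an assembly into a model (stubbed here).
    public static List<SemanticClassSketch> BuildModel(string assemblyPath)
    {
        return new List<SemanticClassSketch> {
            new SemanticClassSketch { Name = "Customer", Methods = { "GetName", "SetName" } }
        };
    }

    // Step 6: a querying mechanism over the model.
    public static IEnumerable<string> MethodsOf(List<SemanticClassSketch> model, string className)
    {
        return model.Where(c => c.Name == className).SelectMany(c => c.Methods);
    }

    static void Main()
    {
        var model = BuildModel("Example.dll");
        foreach (var m in MethodsOf(model, "Customer"))
            Console.WriteLine(m);   // GetName, SetName
    }
}
```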
The reading and parsing of an assembly is handled by an existing tool. Multiple IL readers are available, as seen in Section 6.2, and since Phoenix is best suited for the job, we use this tool to access the assemblies. The source code is converted to a metamodel, a higher-level representation of the code, which provides information to determine the behavior of the program. An interface allows for searching and working with the data inside this model.

Two types of applications are created to test the semantical extractor. One is a command line utility where the input is one or more assemblies and the output is determined by the supplied plugins. The other application is a Windows Forms application, which provides a GUI to browse the metamodel and see all its properties and graphs.

7.1.2 Design Limitations

Before the semantic analyzer is designed, a number of design requirements have to be taken into consideration.

1. Because of the structure of the Compose compilation process, we do not have access to the full .NET assemblies at the start of the compilation (thus not before the actual weaving) [39]. The reason for this is method introduction, which introduces new signatures, making it impossible for the compiler to compile the source. By using dummy files and custom compilation the Compose compiler can create the assemblies.
2. The analyzer should be language independent. Compose is available for multiple platforms and although the focus is on the .NET version of Compose, it would be wise to consider a model capable of expressing multiple source languages.

To take these points into account there are a number of possible solutions.

Compose Compilation Process Looking at item one, we have to redesign the way the compilation process works. If we have a .NET assembly at the first stage of the building process, we can run the semantic analyzer at that point.
To be able to do this we must directly compile the source and the dummy files so we have an assembly. However, the dummy files only contain signatures and lack method body implementations. The analyzer needs the implementations, so we cannot use these assemblies at this point in time.

Another solution is to use the assemblies modified by ILICIT [10]. However, at that point the selection of the placement of the filters has already been completed. New selection points based on the semantics of methods are then too late to be introduced. We can still use the analyzed assembly to perform other tasks, such as resource usage checking, side effect determination, and so on.

The third option is to analyze only the source files, as long as they are present in an IL representation. They do not contain the added aspect code, but can be used as a source.

The most elegant solution would be the possibility to switch off type checking so an assembly can be created by the standard .NET compilers as the first step in the Compose compilation process. This is not possible with the default implementation of the .NET compilers (writing our own compiler is an option to overcome this, but requires too much effort for each .NET language).
Because of this limitation, the first version of the semantic analyzer will primarily be used for resource extraction and providing extra information to other Compose modules. This is an action which can be performed after (or separately from) the main Compose compilation process. The Semantic Analyzer will be placed after the ILICIT module.

Language Independency Regarding issue two, the language independency of the analyzer, we have to make sure we can store the semantics of any object-oriented language. This means we have to distinguish the language-specific parts from the common parts and only store the common behavioral information. Type information, for instance, is language dependent (the type system differs between the various OO implementations) and must be stored in a special manner. Behavior is still very generic and not directly connected to the source. There are naming differences between Java and C#, such as toLowerCase in Java and ToLower in C#, but the operations act the same. Section 7.2 gives more details about the implementation of a language independent system.

Figure 7.1 provides a general overview of the system. Different Semantic Extractors can be used for specific source languages. They each use their own reader and parser to access the source code and convert the code to a semantical representation, which is stored in the Semantical Model. The Semantic Database is used to access this model and contains the querying mechanism. Plugins perform the specific tasks to get the required behavioral information.

Figure 7.1: Process overview

Since the Semantic Model is language independent it can theoretically handle all types of source
languages as long as they are object-oriented. Besides the Java elements, we could also add a Borland Delphi extractor to process Object Pascal source files and store the semantics in the model.

7.1.3 Control Flow

Because of the design limitations discussed earlier, there is a specific control flow the analyzer will use when we integrate the system into Compose .NET. The analyzer will be implemented as a stand-alone console tool. The main analyzer is in a separate assembly so that in the future it can also be called from another front-end. The Semantic Extractor will return a metamodel of the analyzed assembly. This model is passed by the console application to the plugins, which can query it using the Semantic Database API. This way there is no need to save the metamodel to a file and read it again.

Figure 7.2: Control flow of the console application with a plugin

Figure 7.2 shows this flow with the Resource Checker plugin. It is possible to add multiple plugins to the system by using command line switches. This pipeline flow of the analyzer is not the only way the application can be used. Since all the functionality is spread across different components, it is also possible to use only an extractor and work directly on the metamodel, without using the database or plugins.
7.1.4 Building Blocks

The requirements ensure a separation of functionality into different components. For instance, the extractor using the Phoenix library is in a different assembly, so the plugins have no dependency on the Phoenix subsystem (or any other language dependent system). The next paragraphs describe the different components of the system. The whole system is called the Semantic Analyzer.

7.1.4.1 Semantic Extractor

This component is responsible for the extraction of the semantics from a source file. This source file can be a .NET assembly or a JAR file for the Java version. Currently there is only a .NET Semantic Extractor. The Extractor uses the Microsoft Phoenix library for the actual reading and parsing of the IL code. The resulting semantics are stored in the metamodel and returned to the caller. Another component, such as a command line application, calls the extractor and passes the returned metamodel to the plugins. When it becomes possible to create IL code directly at the start of the Compose compilation process, the extractor can, in theory, be placed before the other Compose components to determine semantic weaving points.

Besides Phoenix, it is also possible to use other IL readers for the same task. Using the provider design pattern [40], we can choose which specific extractor must handle the calls issued to the abstract base class, the SemanticExtractor class. The available providers are listed in a configuration file and only one provider provides the actual implementation of the abstract class. Support for the provider design pattern is part of the .NET Framework version 2.0 Class Library.
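Behind this API, the provider pattern can be sketched as follows. Only the names SemanticExtractor, SetProvider, and Analyse come from the text; the provider registry, the PhoenixProvider internals, and the string-based result are hypothetical simplifications (the real implementation reads its providers from a configuration file and returns SemanticItem objects).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical provider base class; the concrete provider does the real work.
abstract class ExtractorProvider
{
    public abstract IList<string> Analyse(string assemblyName);
}

class PhoenixProvider : ExtractorProvider
{
    public override IList<string> Analyse(string assemblyName)
    {
        // The real provider would drive the Phoenix phases here.
        return new List<string> { "semantic item from " + assemblyName };
    }
}

static class SemanticExtractor
{
    // In the real implementation this registry is built from a configuration file.
    static readonly Dictionary<string, ExtractorProvider> providers =
        new Dictionary<string, ExtractorProvider> { { "phoenix", new PhoenixProvider() } };

    static ExtractorProvider current = providers["phoenix"];

    public static void SetProvider(string name) { current = providers[name]; }

    // Calls on the static facade are forwarded to the selected provider.
    public static IList<string> Analyse(string assemblyName) { return current.Analyse(assemblyName); }
}
```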
// Change the default provider to a new provider
SemanticExtractor.SetProvider("phoenix");

// Call the Semantic Extractor
IList<SemanticItem> semanticItems = SemanticExtractor.Analyse(assemblyName);
Listing 7.1: Calling the SemanticExtractor

Listing 7.1 shows how to change the default provider to another provider and how to start the actual analysis process. The value phoenix is one of the values defined in the application's configuration file, which can be found in Appendix C.

7.1.4.2 Semantic Metamodel

The Semantic Metamodel is an object model, created and filled by the Semantic Extractor and passed to the plugins as a data source. It is contained in memory, but a representation of the model can be written to an XML format. Recreating the metamodel from the XML file is currently not supported.

This model is language independent. It is a higher-level view of the source code with the same structure (classes, methods, and so forth) as the original code. The behavior of a method body
is represented in a special way, using actions and blocks. More information about this model is given in Section 7.2.

7.1.4.3 Semantic Database

The database allows for storing the model and searching for specific data in this model. It is called by the plugins and works directly on the model (it does not need the Phoenix library or the original source). To do this, it supplies a SemanticDatabaseContainer object. Not only is the metamodel stored in this database, it also provides a number of ways to access the data and to search in the database. Although this is written in a .NET language (namely C# version 2.0), it uses functionality which can be ported to Java (version 2 required). More information is given in Section 7.4.

7.1.4.4 Plugins

A plugin is a piece of code which uses the semantic model for a specific purpose. The Semantic Analyzer is very general, whereas the plugins perform specific detailed actions using these general functions.

For instance, resource usage is collected by the Resource Checker plugin. This plugin provides information for the SECRET module and needs to get all the methods with a parameter of type ReifiedMessage. Each method in the resulting method list is examined by determining the behavior of the parameter containing a ReifiedMessage. It is possible that the parameter is assigned a different value or that a method of the object is called. It is the task of the plugin to perform these service-oriented queries, while the Semantic Database and Metamodel perform and contain the general queries and data source.

7.1.5 Guidelines

A general guideline is presented to which the application will adhere. By using a specified programming model the application will be consistent.

Coding guidelines For the .NET platform there already is an extensive design guideline created by Microsoft [58]. This guideline describes the naming of variables, methods, parameters, etc. The API designed for this assignment will follow that guideline.
Naming Conventions and Assemblies To provide a consistent programming model we have to place the code inside namespaces. Microsoft advises the following standard:

CompanyName.TechnologyName[.Feature][.Design]

In this case we will use UTwente for the company name and SemanticAnalyzer as the technology.
The metamodel and its API will be used from within the tools. The developers of these tools must program against the metamodel and not the extractors, which depend heavily on the underlying supporting tools like the Phoenix framework. For this reason we place the metamodel in a separate assembly. Table 7.1 lists all the assemblies and their purposes. Each name begins with the UTwente.SemanticAnalyzer namespace1.

SemanticExtractorRail: Provider using RAIL
SemanticExtractorPostSharp: Provider using PostSharp
SemanticExtractorCecil: Provider using Cecil
SemanticExtractorPhoenix: Provider using Phoenix
SemanticModel: Contains the metamodel and graph algorithms
SemanticDatabase: Database container and API to query the model
SemanticLibrary: Shared functionality, such as the plugin interface and provider model
SemanticPlugins: Standard plugins which will process the metamodel
SemanticComposeStarPlugins: Specific plugins for Compose
SemanticExtractorConsole: Console application
SemanticWinForms: Windows Forms application
Table 7.1: Assembly naming

7.2 Semantic Model

Information extracted by the Semantical Extractors is stored in a metamodel. This is an object-oriented model containing classes representing the structure and the semantical information of the source code. The model can be stored in the SemanticDatabaseContainer, which also provides an interface for searching through this model.

Because multiple source languages can be converted to this model, we cannot store any language-specific information such as types. However, we do not want to lose that kind of information, so we have to convert it to a more general representation. The metamodel consists of elements used by most object-oriented languages and it is up to the extractor to convert the specific language elements to the correct corresponding semantical elements in the model.
7.2.1 Overall Structure

To make effective use of the model, we not only have to store the semantics, but also the location of these items in the original hierarchy. That is, if we have a function with certain behavior, we have to place this function in its context. A function will be in a class, and a class is in some sort of container, like an assembly (.NET) or a JAR2 file (Java). We can distinguish three main elements in the metamodel:
1 Note: the currently implemented code does not yet completely adhere to this standard. Also, some class and function names are implemented using UK English instead of the US English used in this thesis.
2 A JAR file (or Java ARchive) is a compressed file used to distribute a set of compiled Java classes and metadata.
Unit: The base for all the semantic items providing structure for the model, such as classes, methods, and so on.
Operand: The element on which a mathematical or logical operation is performed, like a field, a parameter, a local variable, or a constant value.
Type: Contains a language-independent view to store type information.

Besides these three core elements, there are elements to track source references (the line numbers an action came from), attribute information, and so on. Section 7.2.5 gives a more detailed view of the model.

7.2.2 From Instructions to Actions

An extractor not only builds the structure of the metamodel, like the layout of all the classes with their functions, it also converts the instructions, such as IL opcodes, to their semantical representations. Code is converted to actions and each action performs some sort of task. An action is represented in the model as a SemanticAction object and is placed inside a SemanticBlock. Both are elements of a function.

Blocks The blocks are used for the control flow. Each block contains one or more actions and the blocks are linked together. Each block has a direct link to its previous and next block in the control flow. If the extractor supports exception handling extraction, we can also request the exception handling block for a specific block: the block being called when there was an exception. A simple example of the use of blocks is found in Listing 7.2. This for loop checks a condition, performs an action, and returns to the condition check part.

int j = 0;

for (int i = 0; i < 10; i++)
{
    j = j + i;
}

return;
Listing 7.2: For loop in C#.NET

Figure 7.3 shows the corresponding blocks for the code in Listing 7.2. The extractor is responsible for creating the blocks and connecting them to each other. If the information is available, the extractor can also indicate the start and end line number of the source code corresponding to a block.
Blocks with no actions should be removed from the list of blocks and the links between the blocks must be updated. It is possible that an extractor has introduced more blocks than needed by using its own control flow algorithm.
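The removal of empty blocks and the relinking of their neighbours can be sketched as follows. The block structure here is a hypothetical stand-in for SemanticBlock: only the previous/next links and the list of actions mirror the description above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical block: holds actions and links to its neighbours in the flow.
class BlockSketch
{
    public List<string> Actions = new List<string>();
    public BlockSketch Previous, Next;
}

static class BlockCleanup
{
    // Drop blocks without actions and update the links between the survivors,
    // as the text requires after extraction.
    public static List<BlockSketch> RemoveEmpty(List<BlockSketch> blocks)
    {
        foreach (var b in blocks.Where(b => b.Actions.Count == 0))
        {
            if (b.Previous != null) b.Previous.Next = b.Next;
            if (b.Next != null) b.Next.Previous = b.Previous;
        }
        return blocks.Where(b => b.Actions.Count > 0).ToList();
    }
}
```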
Figure 7.3: Loop represented as blocks

Actions If we look at the code in Listing 7.3, we see a number of different kinds of actions. First the value of x is increased by two; second, the value of y is multiplied by three. A comparison is performed on those two results and based on this comparison a branching operation is executed, either to label l1 or label l2.

if ((x + 2) > (y * 3)) {
    // l1
}
else {
    // l2
}
Listing 7.3: Expression in C#.NET

It is human readable in the source language C#.NET, but converted to IL code, shown in Listing 7.4, it is more difficult to understand.

ldloc 0
ldc 2
add
ldarg 1
ldc 3
mul
cgt
brtrue l1
br l2
Listing 7.4: Expression in IL code

We do not know what type of local variable the statement ldloc is loading. The result of the add operation is not stored in a variable, but only on the stack, and the same holds for the result of the multiplication. Both values on the stack are used by the compare operation (cgt). The branching is more complex since it uses two different branch operations, one conditional branch and one unconditional jump.

We would prefer a semantical representation of this expression in the way depicted by Listing 7.5.

t1$ (loc) = add x(loc), 2(con)
t2$ (loc) = mul y(arg), 3(con)
t3$ (loc) = cmp t1$, t2$, gt
= branch (t3$, true), l1, l2
Listing 7.5: Semantical representation of the expression
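The four lines of Listing 7.5 could be held in memory with a small action type. This is a sketch: the field names are illustrative and do not reflect the actual SemanticAction API described in Section 7.2.5.6.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory form of the actions in Listing 7.5.
class ActionSketch
{
    public string Kind;          // add, mul, cmp, branch
    public string Destination;   // temporary introduced by the extractor, or empty
    public string[] Sources;     // source operands with their kind annotations
    public string Option;        // e.g. "gt" for a compare, target labels for a branch

    public override string ToString() =>
        $"{Destination} = {Kind} {string.Join(", ", Sources)} {Option}".Trim();
}

static class ExpressionDemo
{
    static void Main()
    {
        var actions = new List<ActionSketch> {
            new ActionSketch { Kind = "add",    Destination = "t1$", Sources = new[] { "x(loc)", "2(con)" }, Option = "" },
            new ActionSketch { Kind = "mul",    Destination = "t2$", Sources = new[] { "y(arg)", "3(con)" }, Option = "" },
            new ActionSketch { Kind = "cmp",    Destination = "t3$", Sources = new[] { "t1$", "t2$" },       Option = "gt" },
            new ActionSketch { Kind = "branch", Destination = "",    Sources = new[] { "(t3$, true)" },      Option = "l1, l2" },
        };
        actions.ForEach(Console.WriteLine);
    }
}
```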
We can directly see the basic actions performed: adding, multiplying, comparing, and branching. We also see the operands the actions are working on and the kinds of operands, like a variable, argument, or constant value. The actions store their results in temporary variables, introduced by the Semantic Extractor. We can always trace the usage of an operand and where it originated from.

Some actions do not only contain the source and destination operands, but also additional options. The compare action has information about the kind of comparison, like greater than (gt). The branch action has direct information on where a true and a false value of the corresponding comparison leads to.

Since the Semantic Extractors are language dependent and thus know the language they are reasoning about, they are responsible for converting one or more instructions to a corresponding action. Not all instructions provide a meaningful action and as such they do not introduce new actions. On the other hand, multiple instructions can form one action. Loading two operands onto the stack, adding the values, and storing the result is represented as one add action.

In Appendix D you can find all the available kinds of semantic actions the model can store. The arguments are listed with each item. Besides these arguments, an action can also have a link back to the original source line number.

7.2.3 Dealing with Operands

An operand is the data used by the IL operation. An operation can use one operand, called a unary operation, or two operands, called binary. Some OpCodes do not use any operands, for instance a call to another function with no return value and no parameters. In our model we have four kinds of operands:

Argument: The argument, also called the parameter, of a function.
Variable: A local variable; it exists only inside a function.
Field: A variable defined outside the function or even outside a class.
Constant: A constant value, a value that does not change, such as a number or a text.

The semantic actions can use these operands as their source or destination operands. Obviously, a constant operand cannot be used as a destination, as it is read-only. We can access all the operands so we can follow them through a function. We might see a variable operand get a default value, such as a constant operand, then be used by another action, and finally get the value of an argument operand assigned to it. This is called data flow analysis.

Each operand has a name and a type. The Semantic Extractor is responsible for assigning a unique name to each operand and determining a correct semantical type. A constant operand also contains the value it is assigned, such as a number or a text.
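Following an operand through the actions of a function, as described above, can be sketched like this. The types are hypothetical stand-ins; the real model exposes source and destination operands on each SemanticAction.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical action with one destination and a list of source operands.
class Act
{
    public string Kind;
    public string Dest;
    public string[] Srcs;
}

static class DataFlowSketch
{
    // Collect every action that reads or writes the given operand,
    // in the order the actions appear: a simple data flow trace.
    public static IEnumerable<Act> UsesOf(IEnumerable<Act> actions, string operand) =>
        actions.Where(a => a.Dest == operand || a.Srcs.Contains(operand));

    static void Main()
    {
        var actions = new[] {
            new Act { Kind = "assign", Dest = "j", Srcs = new[] { "0(con)" } },
            new Act { Kind = "add",    Dest = "j", Srcs = new[] { "j", "i" } },
        };
        foreach (var a in UsesOf(actions, "j"))
            Console.WriteLine(a.Kind);   // assign, add
    }
}
```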
7.2.4 Type Information

Although types are directly related to a specific language, we still want to maintain type information in our model. This means we store type information in two ways: as a textual representation and as one of a list of common types. It is up to the Semantic Extractor to map a language type to a common type representation in the model. Appendix E lists all the possible common types the extractor can use. The original type is still conserved as text so applications capable of reasoning about the source language can use this representation. For example, the Compose plugin wanting to find the ReifiedMessage object can use the full name of the object type. A Semantic Type also contains metadata regarding the type, such as whether it is a class, an interface, a pointer, an array, and so on. Every operand has a Semantic Type, as does almost every unit in the model.

7.2.5 Model Layout

The previous paragraphs gave some insight into the design of the model; more details are given in this section. Figure 7.4 shows a simplified view of the structure of the metamodel. This is a tree-like structure and at the top there is a Semantic Container object. In .NET this is called an assembly, in Java a JAR file. A container holds zero or more classes. Each class can contain fields and operations. Operations are the methods or functions of a programming language. In .NET languages there are also properties, special functions which act as accessors for private fields. These are actually normal functions with a prefix (get_ or set_) and a special reference in a metadata table. An operation has zero or more arguments, the parameters of a function. Inside the operation there are zero or more local variables and constants. If there are instructions in the function, then there will be one or more blocks with actions.

The main unit blocks of the model all inherit from the SemanticItem class.
This class has a collection of SemanticAttribute objects so custom attributes can be applied to any kind of item in the model. Figure 7.5 shows a class diagram of the SemanticItem class and its directly derived classes. The child class SemanticUnit contains the structural elements of the model. In Figure 7.6 a diagram is given of this class and its children. All the child classes have an interface and this interface is used by all the other components.

Each unit also implements the interface IVisitable, which contains an Accept function with an IVisitor argument (see Figure 7.7). This visitor design pattern [31] can be used to visit all the elements in the model and process each element individually. Currently this is used for the search mechanism and the XML exporter.

7.2.5.1 SemanticContainer

The SemanticContainer is the root element of the actual model and is like an assembly for .NET or a JAR file for Java. It has the name and filename of the original analyzed element. A strongly typed collection of SemanticClass objects holds all the class information of the container.
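The visitor pattern used to walk the model can be sketched as follows. The interface names IVisitable and IVisitor follow the text; the element classes and the printing visitor are simplified stand-ins for the real model units.

```csharp
using System;
using System.Collections.Generic;

// Sketch of the visitor pattern from Section 7.2.5; real units such as
// SemanticContainer and SemanticClass implement IVisitable in the model.
interface IVisitor { void Visit(ContainerSketch c); void Visit(ClassSketch c); }
interface IVisitable { void Accept(IVisitor v); }

class ClassSketch : IVisitable
{
    public string Name;
    public void Accept(IVisitor v) { v.Visit(this); }
}

class ContainerSketch : IVisitable
{
    public ClassSketch[] Classes = new ClassSketch[0];
    public void Accept(IVisitor v)
    {
        v.Visit(this);
        // Walk the children too, as the search mechanism and XML exporter do.
        foreach (var c in Classes) c.Accept(v);
    }
}

class NamePrinter : IVisitor
{
    public List<string> Seen = new List<string>();
    public void Visit(ContainerSketch c) { Seen.Add("container"); Console.WriteLine("container"); }
    public void Visit(ClassSketch c) { Seen.Add(c.Name); Console.WriteLine(c.Name); }
}
```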
Figure 7.4: Structure of the metamodel. Blue is a unit, red is an operand, and green are the parts actually providing semantical information.

7.2.5.2 SemanticClass

A class is a collection of encapsulated variables (fields) and operations. Classes exist in a SemanticContainer and have a unique fully qualified name, type information, a scope, and a collection of SemanticField and SemanticOperation objects. The scope indicates whether this class is publicly or privately accessible or whether it is a shared class. Figure 7.8 shows the SemanticContainer and SemanticClass classes.

7.2.5.3 SemanticOperation

A SemanticOperation represents a function containing a sequence of instructions. An operation is contained inside a class and has a unique name in that class. If the operation returns data, then the type of this data is known. An operation has a number of collections:

SemanticArguments: The arguments or parameters of the function;
SemanticVariables: All the local variables used in the operation, including the variables introduced by the Semantic Extractor;
SemanticConstants: The constant values used in the operation;
SemanticBlocks: The blocks with the actions.
Figure 7.5: SemanticItem and directly derived classes
Figure 7.6: SemanticUnit and child classes
Figure 7.7: Visitor pattern in the model

Figure 7.8: SemanticContainer and SemanticClass classes
Figure 7.9: SemanticOperation class
Figure 7.10: SemanticOperand class and derived classes

Figure 7.9 represents the SemanticOperation class with the associated collections.

7.2.5.4 SemanticOperand and Subclasses

As indicated in Section 7.2.3 there are four kinds of operands: arguments, fields, local variables, and constants. They all inherit from the base class SemanticOperand, which contains a name and type for the operand. Figure 7.10 shows the base class for the operands, the SemanticOperand class, and the child classes. The SemanticConstant class has the ability to store the constant value; the Semantic Extractor is responsible for supplying the correct value. The SemanticArgument and SemanticVariable allow for the specification of their position in the sequence, since the name of the operand is not always available. In IL, these operands are addressed by their ordinal.
Figure 7.11: SemanticBlock class with collection of SemanticAction

7.2.5.5 SemanticBlock

The SemanticOperation class has a collection of SemanticBlock objects, which contain the SemanticAction objects. Figure 7.11 shows the SemanticBlock class with its collection of SemanticAction objects. Each block has a unique name inside the operation, and a SourceLineNumber class provides a link back to the original source code. This can be useful when you want to add or replace code contained in the block.

The class also contains functions to navigate to other blocks, such as the next and previous block in the sequence. This does not imply that the next block in the sequence is also the next block in the control flow. Only if there is no control flow action (branching, returning, switching) as the last action in the block is the next block in the sequence also the next one in the control flow. If the extractor has exception handling information, then it is possible to jump to the block which handles the exception for the current block.

7.2.5.6 SemanticAction

This class contains the actual semantical action performed by one or more instructions in the source. A graphical representation of this class is found in Figure 7.12. Each SemanticAction object has an ActionType from the SemanticActionType enumeration. Based on the ActionType, some additional properties have meaning. For instance, if the type is Branch, then the true and false label information makes sense.

Also present are the source and destination operands. Only two source operands can exist, so
Figure 7.12: The SemanticAction class with supporting types
Figure 7.13: SemanticType class

expressions with multiple operands have to be normalized to a binary form.

7.2.5.7 SemanticType

The SemanticType class stores the type information. The extractor has to select a general type from the list in Appendix E (Table E.1) to map the source type to the corresponding common type. The extractor can also indicate whether the type has a base type and/or implements interfaces. Figure 7.13 shows a graphical representation of this class.
Figure 7.14: SemanticAttribute class

7.2.5.8 SemanticAttributes

As described in Section 6.1.10, the IL supports the concept of custom attributes: special metadata which can be applied to almost every item in the language. Other languages support a similar system; in Java, for example, it is called annotations. The Semantic Metamodel supports this concept and allows the extractor to add multiple SemanticAttribute objects to all the SemanticItem types.

As shown in Figure 7.14, the class has a SemanticType and a list of ISemanticOperand objects. The operands are normally constant values used for setting the properties of the custom attribute.

7.2.6 Flow graphs

The Semantic Metamodel not only contains the data, it also provides graph functionality to work with the data. The SemanticOperation class has a function called RetrieveSemanticControlFlow, which generates a control flow graph for the actions inside the operation. It uses a class called SemanticFlowGenerator, which contains all the flow operations. We can also call this class directly by supplying a SemanticOperation as parameter in the constructor.

The function GenerateSemanticControlFlow in this class generates the control flow and returns this graph in the form of a SemanticControlFlow object. The SemanticControlFlow has a collection of FlowBlock objects and a start and end FlowBlock. Each FlowBlock has a unique name, a collection of SemanticBlock objects with the actions, and three lists. One has all the successors, another has the predecessors, and the third contains all the FlowBlock objects which are control dependent for the current flowblock, i.e., these flowblocks contain branching and can thus control whether the flowblock is reached. The successors and predecessors are represented as FlowEdge objects. They indicate the target flowblock and the reason for the link. This reason can be conditional, unconditional, exception or fall through.
The first two are used when the successor block is the result of a branch (conditional) or a jump (unconditional) action. The exception reason is used when an exception is raised before getting to the flowblock. The fall through reason simply connects blocks which follow each other in the normal sequence.
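A traversal of the resulting graph can be sketched as follows. The exact signatures and member names (Target, Reason, and so on) are assumptions based on the class names in the text, not the actual implementation.

```
// Hedged usage sketch of the flow API described above.
SemanticControlFlow flow = operation.RetrieveSemanticControlFlow();

foreach (FlowBlock block in flow.FlowBlocks)
{
    foreach (FlowEdge edge in block.Successors)
    {
        // edge.Reason is conditional, unconditional, exception or fall through
        Console.WriteLine("{0} -> {1} ({2})",
            block.Name, edge.Target.Name, edge.Reason);
    }
}
```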
Figure 7.15: The flow classes
All the classes for the generation of flow graphs are depicted in Figure 7.15.

The GenerateSemanticControlFlow function uses various algorithms to generate the Semantic FlowGraph. It begins by splitting the SemanticBlock objects in the operation into flow blocks. The splitting is based on the control flow actions as described in Algorithm 1.

input : A SemanticBlocks collection
output: A collection of FlowBlocks

semanticBlock ← SemanticBlocks[0]
while semanticBlock ≠ null do
    foreach semanticAction ∈ semanticBlock do
        hasLink ← false
        switch actionType do
            case Jump
                Add unconditional flowedge
                hasLink ← true
                break
            case Branch
                Add conditional flowedge to true block
                Add conditional flowedge to false block
                hasLink ← true
                break
            case Switch
                Add conditional flowedge for all switch labels
                hasLink ← true
                break
        end
    end
    if has exception handler link ∧ different exception handler then
        hasLink ← true
        Add exception flowedge
        if next SemanticBlock exists then
            Add fall through flowedge
        end
    end
    if hasLink ∨ last SemanticBlock then
        Add flowBlock to output collection
    else
        semanticBlock ← next block
    end
end
Algorithm 1: GenerateSemanticControlFlow

The next step is to connect the successors and predecessors. Algorithm 2 shows how this is
executed.

input : A flowBlock collection
output: A flowBlock collection with connected successors and predecessors.

// Connect successors
foreach flowBlock ∈ flowBlocks do
    foreach flowEdge ∈ flowBlock.Successors do
        Find SemanticBlock with target name
        Set successor flowblock for flowEdge to found block
    end
end
// Set the predecessors
foreach flowBlock ∈ flowBlocks do
    foreach flowEdge ∈ flowBlock.Successors do
        Add flowBlock to list of predecessors of the successor flowblock
    end
end
Algorithm 2: Connect flow edges

We now have the successors and predecessors of all the FlowBlocks and can create the control dependencies. Algorithm 3 shows how the DetermineControlDependency function is started and Algorithm 4 describes the actual function.

input : A flowBlock collection with connected successors and predecessors.
output: A flowBlock collection with control dependency information

foreach flowBlock ∈ flowBlocks do
    DetermineControlDependency (flowBlock)
end
Algorithm 3: Start DetermineControlDependency

input : A flowBlock collection with connected successors and predecessors.
output: A flowBlock collection with control dependency information

// Search through all the predecessors
foreach predecessor ∈ flowBlock.Predecessors do
    foreach flowEdge ∈ predecessor.Successors do
        if flowEdge = conditional then
            Add to dependency list
        end
    end
    if predecessor not visited before then
        DetermineControlDependency (predecessor)
    end
end
Algorithm 4: DetermineControlDependency

We also have a function to determine the flow paths, all the possible paths through the control
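Algorithm 2 can be sketched in C# as follows; the FlowEdge members (Target, TargetName) and the lookup helper FindBlockByName are assumed names for illustration, not the actual implementation.

```
// Sketch of Algorithm 2; member and helper names are assumptions.
// Pass 1: connect successors by resolving each edge's target name.
foreach (FlowBlock flowBlock in flowBlocks)
{
    foreach (FlowEdge flowEdge in flowBlock.Successors)
    {
        // Find the block with the target name and set it as the successor.
        flowEdge.Target = FindBlockByName(flowBlocks, flowEdge.TargetName);
    }
}

// Pass 2: every successor edge implies a predecessor link in reverse.
foreach (FlowBlock flowBlock in flowBlocks)
{
    foreach (FlowEdge flowEdge in flowBlock.Successors)
    {
        flowEdge.Target.Predecessors.Add(flowBlock);
    }
}
```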
flow. Algorithm 5 shows how these are calculated using the internal function BuildFlowPath.

input : A flowBlock and a previousStack with flowblocks
output: A stack with FlowBlocks that forms a control sequence

if flowBlock.Successors.Count = 0 then
    return previousStack
else
    foreach flowEdge ∈ flowBlock.Successors do
        Create temporaryStack
        temporaryStack ← previousStack
        if flowBlock ∈ temporaryStack then
            if flowEdge.Successor ∉ temporaryStack then
                Add flowBlock to temporaryStack
                temporaryStack ← BuildFlowPath (flowEdge.Successor, temporaryStack)
            end
        else
            Add flowBlock to temporaryStack
            temporaryStack ← BuildFlowPath (flowEdge.Successor, temporaryStack)
        end
        Add temporaryStack to global flow path collections
    end
end
Algorithm 5: Determine Flow Paths

We now have the flow paths and the control dependencies, so we can add the access levels to the flow. An access level on a FlowBlock indicates the number of times the block is accessed. Some blocks are only accessed once, some are always accessed. A block in a loop with a condition at the start can be accessed multiple times, but a block in a loop with a condition at the end is accessed at least once and maybe more. Algorithm 6 describes how the access level is
determined.

input : A startFlowBlock, an endFlowBlock, flowPaths collection
output: SemanticControlFlow with access levels

foreach flowPath ∈ flowPaths do
    index ← flowPath.Count − 1
    while index ≥ 0 do
        flowBlock ← flowPath [index]
        if flowBlock = startFlowBlock ∨ flowBlock = endFlowBlock then
            accessLevel ← AtLeastOnce
        else
            Determine if flowBlock has itself as a parent
            if HasParent then
                accessLevel ← MaybeMoreThenOnce
            else
                // Determine if flowPath is in a loop
                loopBlock ← loop FlowBlock
                if IsInLoop then
                    if loopBlock has no control dependencies then
                        accessLevel ← OnceOrMore
                    else
                        accessLevel ← MaybeMoreThenOnce
                    end
                else
                    if flowBlock has no control dependencies then
                        accessLevel ← MaybeOnce
                    end
                end
            end
        end
        index ← index − 1
    end
end
Algorithm 6: Determine Access Levels

Plugins can use the flow graph capabilities of the model directly in their own analysis. Each FlowBlock contains the Semantic Blocks and Actions, so all the information is preserved. Although SemanticBlocks are usually also split on control flow conditions, the FlowBlocks provide an optimized and more detailed representation of the control flow. The flow analysis classes can be extended with more flow generators, such as data dependency graphs or call graphs. These are currently not implemented and the algorithms are also not optimized.
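The access levels assigned by Algorithm 6 can be summarized as an enumeration. This is a sketch: the member spelling follows the thesis text, but the enum name and the exact interpretation in the comments are assumptions derived from the algorithm above.

```
// Sketch only; member spelling follows Algorithm 6, comments paraphrase it.
public enum AccessLevel
{
    AtLeastOnce,        // the start and end blocks of the flow
    OnceOrMore,         // in a loop whose loop block has no control dependencies
    MaybeMoreThenOnce,  // has itself as a parent, or in a conditionally entered loop
    MaybeOnce           // reached conditionally, outside a loop
}
```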
7.3 Extracting Semantics

Extracting the semantics is the job of the Semantic Extractor. It reads and parses the source language, builds the metamodel and converts the statements to corresponding actions.

7.3.1 Semantic Extractor Class

The SemanticExtractor class is responsible for the conversion. As described in Section 7.1.4.1, the extraction system is designed as a provider pattern with SemanticExtractor as the base class. Applications wanting to convert code to the semantic model call this class, which in turn uses a specific provider to do the actual transformation.

This allows for the creation of multiple providers, and switching between the providers is handled by specifying the provider name. The providers use their own tools to read the source code. Providers are not limited to .NET; providers for other source languages, such as Java or Delphi, can be created as well.

Each provider must implement the SemanticExtractorProvider class and register itself in the configuration file (see Appendix C). The SemanticExtractor will then load this configuration file, initialize the providers and select the default provider.

Figure 7.16: Semantic Extractor Classes

Figure 7.16 shows the classes needed for the Semantic Extractor. An application can call the Analyze function with the assembly name as parameter. This is passed to the correct provider, which returns a list of SemanticItem objects. Usually this will be a SemanticContainer, as it is the root of the metamodel tree structure, but a provider can also choose to return a SemanticOperation or SemanticClass if this is more suitable.
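The provider pattern described above can be sketched as follows. The class names follow the text; the signatures are assumptions.

```
// Sketch of the provider pattern; signatures are assumptions.
public abstract class SemanticExtractorProvider
{
    // Each concrete provider (Cecil, PostSharp, Phoenix, ...) implements this.
    public abstract IList<SemanticItem> Analyze(string assemblyName);
}

// The facade reads the configuration file, initializes the registered
// providers and delegates Analyze to the selected (or default) provider.
SemanticExtractor extractor = new SemanticExtractor();
IList<SemanticItem> items = extractor.Analyze("MyAssembly.dll");
```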
The application requesting the metamodel can work directly on the SemanticItem list or store the objects in the SemanticDatabaseContainer. By using the database container, the developer has extra functionality to search and retrieve elements from the model. Section 7.4 provides more details about this container.

In Section 6.2, four different tools were introduced. These tools were used to create four different providers, which are discussed in the next sections.

7.3.2 Mono Cecil Provider

The provider using Cecil, called SemanticProviderCecil, was relatively easy to create. Opening and reading an assembly is a one-line statement. The Cecil library returns an AssemblyDefinition object, which we can query for all the available types. By browsing the types and their properties we can generate the structure of the semantic metamodel.

The important part is the method body. It provides direct access to the local variables declared inside the method. To help us iterate over the IL instructions, Cecil provides an analysis framework. This framework returns a control flow in the form of blocks with instructions. These blocks are mapped to SemanticBlocks and the instructions of a block are given to an AbstractInstructionVisitor.

This visitor has a method for each OpCode and holds an operand stack. Stack loading instructions, like loading constants, variables and so on, will place the operand onto the local stack after converting it to a SemanticOperand object. When, for instance, an And instruction is visited, two operands will be popped from the stack and used as the source operands. The And instruction places a result on the stack, so we store the SemanticAction object in a temporary variable. If a store instruction is visited, then it will find this previously stored action and assign the operand to store as the destination operand for the action. See Listing 7.6 for an example.
Stack<ISemanticOperand> operandStack;
private ISemanticAction prevAction;

public override void OnLdarg(IInstruction instruction)
{
    // Load the argument indexed by the operand onto the stack
    LoadArgument((int)(instruction.Operand));
}

public override void OnAnd(IInstruction instruction)
{
    // Create an And action
    ISemanticAction action = GetAction(SemanticActionType.And);

    // Pop the two operands from the stack
    action.SourceOperand2 = operandStack.Pop();
    action.SourceOperand1 = operandStack.Pop();

    // Store this action as the previous action
    prevAction = action;
}

public override void OnStarg(IInstruction instruction)
{
    // See if a previous action exists and assign the argument
    // operand to the DestinationOperand property
    if (prevAction != null)
    {
        prevAction.DestinationOperand = GetArgument((int)(instruction.Operand), _operation.SemanticArguments);
        prevAction = null;
    }
}
Listing 7.6: Part of the Cecil Instruction Visitor

Although a provider using Cecil could be created with little effort, it has some problems. We still have to retrieve the type of the operands. Sometimes this can be deduced from the operator, such as OnLdc_I4, which loads a 32 bit integer onto the stack. However, in most cases it is difficult to get the correct type since this is not specified.

The flow analysis library of Cecil had problems with more complex method bodies. For instance, methods with exception handling could not be converted to a block representation. Cecil could also not cope with the new language elements of .NET version 2.0, such as generics. It is possible that this is corrected in a newer version of Cecil.

7.3.3 PostSharp Provider

The PostSharp provider is not very different from the Cecil provider. It also has the possibility to create blocks separating the control flow. Instead of a Visitor pattern to visit each instruction, it uses an instruction reader, a stream of instructions. As long as there are instructions in the stream, we can read the current instruction, convert it and read the next one.
InstructionReader ireader = method.Body.GetInstructionReader();
ForEachInstructionProcessBlock(b.RootInstructionBlock, ireader);

InstructionBlock iblock = b.RootInstructionBlock;

while (iblock != null)
{
    ireader.EnterInstructionBlock(iblock);
    InstructionSequence iseq = iblock.GetFirstInstructionSequence();
    ireader.EnterInstructionSequence(iseq);

    while (ireader.ReadInstruction())
    {
        Console.WriteLine("Read instruction {0}", ireader.OpCodeNumber);
        if (ireader.OpCodeNumber == OpCodeNumber.Ldloc_0)
        {
            // perform actions based on OpCode
        }
    }
    iblock = iblock.NextSiblingBlock;
}
method.ReleaseBody();
Listing 7.7: Using the instruction stream in PostSharp

Listing 7.7 lists the statements needed to read the instructions in a method body. The PostSharp provider is not developed further. PostSharp was still in its early phases of development
and did not always read assemblies correctly. Spending more time on this provider was not advisable at the time. At the time of writing of this thesis, PostSharp appears to be more mature and does support .NET version 2.0 assemblies.

7.3.4 RAIL Provider

RAIL, the Runtime Assembly Instrumentation Library, could not even load an assembly. The documentation and samples were, at the time, very limited and the program crashed frequently. The provider is thus not implemented further.

7.3.5 Microsoft Phoenix Provider

Because Microsoft Phoenix was the best analysis tool available at the time of implementation, a lot of effort has been put into the development of the Phoenix provider. It consists of two main parts: the provider itself and an analysis phase.

When the provider is called to analyze an assembly, it will use a PEModuleUnit to read the assembly into memory (see Listing 7.8). It retrieves the Symbol for the unit and uses this symbol to walk through the assembly. Symbols are placed in the symbol tables of a unit and provide the metadata of the elements in a unit.

peModuleUnit = Phx.PEModuleUnit.Open(assemblyName);
peModuleUnit.LoadGlobalSyms();

Phx.Syms.Sym rootSym = peModuleUnit.UnitSym;

WalkAssembly(rootSym);
Listing 7.8: Loading the assembly using Phoenix

The WalkAssembly method builds a SemanticContainer based on the information in the root symbol. It uses iterators to find all the sub elements like the classes, methods, fields, and so on. When the assembly walker finds a method, it creates a SemanticOperation object, retrieves and assigns the parameters, return type, and other properties, and adds this object to the SemanticClass it belongs to.

To retrieve the method body, and thus the instructions, the analysis phase is used. The function is raised to another representation level in the Intermediate Representation (IR).
It will use a number of phases to perform this action, and the provider adds our own phase to the list. Listing 7.9 shows the code performing this action.

// Get the FuncUnit and raise it
Phx.FuncUnit funcUnit = funcSym.FuncUnit;
funcUnit = peModuleUnit.Raise(funcSym, Phx.FuncUnit.LowLevelIRBeforeLayoutFuncUnitState);

// Prepare a config list
Phx.Phases.PhaseConfig config =
    Phx.Phases.PhaseConfig.New(peModuleUnit.Lifetime, "temp");

// Create a new phase
SemanticExtractorPhase semtexPhase = SemanticExtractorPhase.New(config);
semtexPhase.SemanticOperation = semOp;
Phx.Phases.PhaseList phaseList = Phx.Phases.PhaseList.New(config, "SemTex Phases");

// Add our phase and provide IR dump
phaseList.AppendPhase(semtexPhase);
phaseList.DoPhaseList(funcUnit);

// Place the result in the semOp variable
semOp = semtexPhase.SemanticOperation;
Listing 7.9: Starting a phase for a function using the Phoenix Extractor

The SemanticExtractorPhase is the class responsible for converting the instructions to actions. As its input, it uses a FuncUnit, the fundamental compilation unit in Phoenix, representing a function. Using the graph functionality of Phoenix, a control flow graph is created. This provides us with the Semantic Blocks.

Converting IL instructions to actions is similar to Cecil. An internal stack is used to keep track of all the stack loading and storing actions. Instructions using operands can retrieve the correct operand from this stack. Determining the type of the operand values is aided by Phoenix's own type system. Phoenix is able to retrieve the type of the operands, so this information can be stored in a SemanticType object: a huge advantage over Cecil and PostSharp.

Furthermore, Phoenix is able to include information stored in the associated debug files into the analyzed assembly. This gives us information about the names of the local variables, which is not stored in the IL code. Another use of this debug file is the ability to link actions and blocks to points in the original code, since line numbers are available in the debug file.

Phoenix creates blocks where each block ends with either a branch instruction or an instruction that can cause an exception, and each block starts with a unique label. This gives us more blocks than intended, so a special block optimization is performed.
The optimization algorithm is listed in Algorithm 7. It removes the empty blocks which are a result of Phoenix and updates all the references to the correct blocks. As input we have a list of label identifiers and associated blocks.

Phoenix provides a lot of functionality to analyze .NET assemblies: for instance, the control flow support, the ability to read the debug files, direct access to type information, and so on. However, Phoenix is still under development and documentation is scarce. Because it is used internally by Microsoft, support and development are ongoing and getting better.

7.4 Querying the Model

The Semantic Extractor converts the source code to the Semantic Metamodel. An application can now use this model for further analysis. However, this means the application has to traverse the whole model each time it wants to search for an item.

To facilitate the use of the metamodel, a database container is created, which has a mechanism for searching in the model. This section provides more details about the database and the search options. The next chapter provides some practical examples of how this search system can be used.
input : A collection of labels with associated blocks
output: Optimized collection of Semantic Blocks

// Remove blocks with zero actions
foreach semanticBlock ∈ semanticBlocks do
    if semanticBlock has no actions then
        Get the next block
        Update all references to current block to the next block
        Remove the empty block
    end
end
// Connect the blocks
foreach semanticBlock ∈ semanticBlocks do
    // Connect the next and previous blocks
    if not the first semanticBlock then
        Connect previous block to semanticBlock
    end
    if not the last semanticBlock then
        Connect next block to semanticBlock
    end
    // Connect the exception handling block
    if exception handling block exists then
        Connect exceptionhandlerblock to semanticBlock
    end
    // Connect the actions to the correct blocks
    foreach semanticAction ∈ semanticBlock do
        switch semanticAction.ActionType do
            case Jump ∨ RaiseException
                Set the semanticAction.LabelName to the correct block
            case Branch
                Set the semanticAction.TrueLabelName to the correct block
                Set the semanticAction.FalseLabelName to the correct block
            case Switch
                foreach SwitchLabel ∈ semanticAction.SwitchLabels do
                    Set the semanticAction.SwitchLabel to the correct block
                end
        end
    end
end
Algorithm 7: Optimization of Semantic Blocks
Figure 7.17: SemanticDatabaseContainer class

7.4.1 Semantic Database

The Semantic Database library contains the classes to store SemanticItem objects, the root type of the semantic objects, and provides a system to search in the metamodel. The SemanticDatabaseContainer holds a collection of SemanticItem objects indexed by their object hash code. An application wanting to store a metamodel in this database calls the StoreItem function and passes the object to store as its parameter.

The functions GetContainer and GetContainers provide direct access to the SemanticContainer objects in the database. The Query function allows the developer to search in the metamodel. Finally, the ExportToXml function exports the metamodel to an eXtensible Markup Language (XML) format. The purpose of this function is not to provide save functionality so the model can be loaded back from file, but only to create a more user-friendly view of the model. Full support for serialization can be added, but is currently not implemented.

Figure 7.17 shows the SemanticDatabaseContainer class and the predicate used by the query function.

7.4.2 What to Retrieve

First we have to consider what type of information we want to retrieve from the Semantic Metamodel, and in what way. For instance, we might want to find all the assignments of an argument to a global field (a setter method). This assignment resides inside a semantic operation.

From this it is clear that we must search for an action with an action type of assignment inside an operation, where the source operand of this action is an argument and the destination operand is of type field. The
names of the source and destination are not important. We also do not know in which class this operation may be, so we do not specify this.

This should return a list of actions of the type assignment. Each action has a link to the block it resides in and the name of the operation, class and container. Using this information it is possible to find other elements related to this action. Instead of searching for actions, it should also be possible to search for all blocks with an assignment action inside. This will return a list of all the blocks. At that point we can search for all the actions inside one of those blocks. The same applies for operations. By retrieving the link to the operation from an action, we can get all the actions inside the operation.

How can we find out if an action depends on a comparison? This is based on the block the action resides in. The action has a link to its parent block. Using this block, we have information about the control flow and can find which actions (the branching actions) lead to this block. Instead of retrieving the block, we can also retrieve all the actions of the operation which refer to this block. If there are none, then there was no branching to this block and the action inside it. If the search returns actions which link to this block, we can find out whether these branching actions were conditional or not. If so, we can retrieve the comparison type and the values used by the compare function.

Of course it is also possible to retrieve information such as which variables are declared inside an operation or what the arguments of an operation are. These are not directly semantically related data, but they can be needed for reasoning about the semantics. We could, for example, create a query to select all the operations where one of the arguments is of type ReifiedMessage. Since we do not have types in the semantic model, we pass this type as a string.
As you can see, there are three parts needed in the query to retrieve the correct data. The three types are as follows:

What we want to return This is one of the child objects of the SemanticItem object. We always have to indicate what type of objects we want back from the database;

Where we want to search in Search in actions, blocks, operations, and so on. Even if we search in operations, we can still return the classes which are the parents of these operations;

What we want to search for This specifies the values of the properties of the element to search for. For instance, the type of the destination property must be a field and the action type itself must be a comparison. Every public property of the element we search in can be used in this condition.

Another requirement of the search system is the ability to search in the search results, so we can use the returned values for further queries. This allows us to work with already found data, instead of searching for the same data with more detailed search parameters.

7.4.3 Query Options

To retrieve the correct data, we must ask the semantic database to retrieve it from the object store using some sort of interface. There are various ways to implement a querying mechanism; these are discussed below with their positive and negative points.
7.4.3.1 Predicate Language

A predicate is an expression that can be true of something. For example, the predicate HasActionOfType(X) can be true or false depending on the value of X. By using this type of logic and combining predicates, we can indicate what type of data we want returned.

However, due to the complexity of all the semantic information, it is labor intensive to create a predicate language covering all the elements. All the information has to be converted to predicates, which will lead to many different predicates to capture all the semantics available. Changes to the model also mean making changes to the predicate language. Furthermore, support for predicates in the .NET language is not directly available. We would have to use some sort of Prolog interface to integrate predicate queries in the database search system.

7.4.3.2 Resource Description Framework

The Resource Description Framework1 (RDF) is a general-purpose language for representing knowledge on the Web. An XML syntax is used to define triplets of information, thus creating RDF graphs. This leads to relations between elements identifiable using Web identifiers (called Uniform Resource Identifiers or URIs).

RDF does provide a way of defining relations, and thus extra data about objects, and has a standardized framework for working with this data. However, setting up a model with all the information beforehand is time-consuming and inefficient. It would mean we have to convert the metamodel to another XML-based model. It does have an advantage for interoperability, since XML is a language-independent format supported on a wide variety of platforms.

7.4.3.3 Traverse Over Methods

This technique allows the developer to call a function of the SemanticDatabaseContainer, passing the search values as parameters. This method will then return the found values. There are various ways to pass the values. These could be delegates to other functions, a text based value (e.g.
“type=’parameter’”) or events which will be raised by this function. Internally, it will traverse all the information in the metamodel, check the information against the supplied arguments and return the found items.

By using this method, the Semantic Database does not have to create a new representation of its data. It will look through all the data inside the metamodel and pass this data to the callback functions to do the actual comparison. It also provides the developer of the plug-in a lot of flexibility, because he can create his own comparison functions. This could also be a disadvantage because of the complexity. Retrieving information this way means multiple functions or events have to be created. Also, when using text-based values, changes in the model cannot be detected at compile time.

1 http://www.w3.org/RDF/
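A delegate-based variant of this traversal could look as follows. This is a sketch under assumed signatures: the predicate-style Query call is suggested by Figure 7.17, but its exact shape and the member names are assumptions, not the actual implementation. The example expresses the setter-method query from Section 7.4.2.

```
// Sketch: querying via a callback delegate (C# 2.0 anonymous delegate).
// The Query signature and member names are assumptions.
SemanticDatabaseContainer db = new SemanticDatabaseContainer();
db.StoreItem(container); // container returned by the Semantic Extractor

IList<SemanticItem> setters = db.Query(delegate(SemanticItem item)
{
    SemanticAction action = item as SemanticAction;
    return action != null
        && action.ActionType == SemanticActionType.Assignment
        && action.SourceOperand1 is SemanticArgument
        && action.DestinationOperand is SemanticField;
});
```

The callback receives every item in the model; returning true keeps the item in the result list.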
7.4.3.4 Object Query Language

An Object Query Language (OQL), based on SQL (the Structured Query Language for databases), gives a mechanism to define a query using standard query elements like SELECT, FROM, WHERE, etc. The queries are parsed and converted to an AST; this parsing has to be performed by the Semantic Database. The advantage of this system is the expressiveness of OQL. The three parts of our query can be expressed in an OQL expression: the SELECT part indicates what we want to return, the FROM part indicates where we want to search, and the WHERE part specifies the conditions. An OQL expression can be passed to the Semantic Database query function. Since it is actually a string, it is also possible to pass it over another transport medium like a web service, files, and so on. This means more separation between different platforms, because the query is independent of a particular language. A disadvantage of using OQL is the fact that the query is composed of strings, so it is not strongly typed and syntax checking must be performed by the parser; any detected errors are raised and returned to the caller at runtime. Keep in mind that the implementation would not be standard ODMG1 OQL, since we have a predefined object database which is searched through by the Semantic Database using its own mechanism. The OQL language is only used to pass a search string to the underlying query function.

7.4.3.5 Simple Object Database Access

With the Simple Object Database Access (SODA)2 system, a Query object is used to find data in the underlying object store. An example using SODA in C# can be seen in Listing 7.10.
1 Query query = database.Query();
2 query.Constrain(typeof(Student));
3 query.Descend("age").Constrain(20).Smaller();
4 IList students = query.Execute();
Listing 7.10: SODA example in C#.NET

By using the constrain and descend methods on the query object, it is possible to add constraints to the query and look up descendants of objects. SODA enforces a stricter integration with the query object than OQL does, since OQL simply uses a string to pass all the information instead of object operations. However, the values passed to the constrain and descend methods are not type safe and are only tested at runtime.

1 Object Data Management Group, The Standard for Storing Objects. http://www.odmg.org/ or http://java.sun.com/products/jdo/overview.html
2 http://sourceforge.net/projects/sodaquery/

108 Automatic Derivation of Semantic Properties in .NET
7.4.3.6 Native Queries

Native queries are a relatively new technique to define search queries [18, 17]. Microsoft is experimenting with this in the form of LINQ1, which is expected to be released in .NET version 3.0. Most query systems are string based. Such queries are not accessible to development environment features like compile-time type checking, auto-completion, and refactoring, and the developers must work in two different languages: the implementation language and the query language. Native queries provide an object-oriented and type-safe way to express queries in the implementation language itself. A simple example in C# 2.0 is shown in Listing 7.11, while the same example in Java is listed in Listing 7.12.

1 List<Student> students = database.Query<Student>(
2   delegate(Student student) {
3     return student.Age < 20
4       && student.Name.Contains("f");
5   });
Listing 7.11: Native query example in C#.NET

1 List<Student> students = database.query<Student>(
2   new Predicate<Student>() {
3     public boolean match(Student student) {
4       return student.getAge() < 20
5         && student.getName().contains("f");
6     }
7   });
Listing 7.12: Native query example in Java

A native query uses anonymous classes in Java and delegates in .NET. The use of generics is recommended for a strongly typed return value. It is still possible to use native queries with .NET 1.1 by means of a special Predicate class. An example in C# 1.1 (the same construction also applies to Java) shows this (Listing 7.13).

1 IList students = db.Query(new StudentSearch());
2
3 public class StudentSearch : Predicate {
4   public bool Match(Student student) {
5     return student.Age < 20
6       && student.Name.Contains("f");
7   }
8 }
Listing 7.13: Native query example in C#.NET 1.1

For a programmer, native queries are a safe and quick way to create queries, as they are written in the same native language. There is, however, a dependency on newer techniques, like delegates and generics.
Older systems can still be supported by using the Predicate class.

1 http://msdn.microsoft.com/data/ref/linq/
Note that the latter may result in a sort of Traverse over Methods functionality, where special functions have to be made to support the querying.

7.4.3.7 Natural Language Queries

A natural language query is one that is expressed using normal conversational syntax; that is, you phrase your query as if creating a spoken or written statement to another person. There are no syntax rules or conventions to learn. For instance, a natural language query could be “what are the write operations on field X in class Y?”. Although this gives a lot of possibilities and flexibility in asking questions, it is very difficult to extract the essential information from such a query. The query has to be analyzed and its semantics must be extracted; the actual search can only be performed when there is enough information about the meaning of the query. Because of the complexity of natural language queries, they are out of the scope of this assignment.

7.4.4 Native Queries in Detail

Based on the advantages and disadvantages of the query options, a decision was made to use native queries. Their major advantage is that we do not have to parse a text-based query and check for all kinds of constraints. Another advantage is the native support of the development environment (IDE) for this type of query. We have type-safe, object-oriented, and refactorable code, and possible errors are visible at compile time instead of at runtime. We can use elements of the IDE like IntelliSense for automatic code completion and tooltips for information about the code. Microsoft is using native queries in LINQ, their Language Integrated Query system; it will be available in the next version of the .NET Framework, as it requires a new compiler. This is needed because LINQ not only depends on existing technologies in .NET version 2.0, but also needs new technologies.
The techniques from version 2.0 are generics, delegates, and anonymous methods; they allow us to create type-safe, inline queries. For example, Listing 7.11 shows the use of a generic to create a list consisting of Student objects, and the delegate makes it possible to define an anonymous method inside the code, instead of creating a separate new method. The next version of the .NET Framework adds anonymous types, lambda expressions, and extension methods. Anonymous types have no meaning during programming, but get a real type during compilation; the type is deduced from the contents of the anonymous type. A lambda expression is a form of anonymous method with a more direct and compact syntax. Lambda expressions can be converted to either IL code or to an expression tree, based on the context in which they are used. Expression trees are, for instance, used by DLINQ, a system to convert native queries to database SQL queries. With extension methods, existing types can be extended with new functions at compile time. This is used in the next version of .NET to add query operations to the IEnumerable<T> interface, the interface implemented by almost all the collections.

LINQ also introduces a new query syntax, which simplifies query expressions with a declarative syntax for the most common query operators: Where, Select, SelectMany, GroupBy, OrderBy, ThenBy, OrderByDescending, and ThenByDescending. We can use the query expressions, the lambda expressions, or delegates to get the same results (see Listing 7.14).

1 IEnumerable<string> expr =
2   from s in names
3   where s.Length == 5
4   orderby s
5   select s.ToUpper();
(a) query expressions

1 IEnumerable<string> expr = names
2   .Where(s => s.Length == 5)
3   .OrderBy(s => s)
4   .Select(s => s.ToUpper());
(b) lambda expressions
Listing 7.14: LINQ query examples

LINQ can provide us with a technique to search in objects; however, it is not yet available. A similar system, partly based on LINQ, was created for the Semantic Database. It uses the generics, delegates, and anonymous methods available in .NET version 2.0. Expression trees are not used, and the query itself, called the predicate, is executed for each element in the metamodel that conforms to the requested type. To determine which type of the semantic model should be searched in and what type should be returned, the query function requires two generic types, as shown in Listing 7.15.

1 public ExtendedList<OutType> Query<InType, OutType>(Predicate<InType> match)
Listing 7.15: Query function signature

The InType determines the type we are looking for, such as a SemanticAction or a SemanticClass. This same type is also used in the predicate, a delegate returning a boolean. The predicate contains the query and must always return true or false, based on some sort of comparison with the InType. OutType is used to indicate the type we want to return. We might be searching through all the actions inside an operation, but want to return the classes containing the found actions; this means we upcast the action to a class. Downcasting is not allowed, so searching for classes and returning actions is not possible. The query function uses the visitor pattern to visit all the objects in the metamodel.
The NativeQuerySearchVisitor is the visitor implementation responsible for executing the predicate for each relevant type in the model. The InType is used to determine if the visited element should be evaluated. If this is the case, the element is converted to the InType and the predicate is used to execute the query. If there is a match, the element is converted to the OutType and added to the result list. Listing 7.16 shows the code for this function. The conversion function, SafeConvert, knows about the relations between the elements and can perform the upcasting to other types.

1 private void CheckForMatch(SemanticItem si)
2 {
3   if (inType.Equals(si.GetType()))
4   {
5     InType x;
6     x = SafeConvert<InType>(si, false);
7     if (predicateMatch(x))
8     {
9       OutType y;
10      y = SafeConvert<OutType>(si, true);
11      if (y != null) results.Add(y);
12    }
13  }
14 }
Listing 7.16: Predicate matching

An overloaded query function is available with a parameter providing a starting point for the search. This starting point is a SemanticItem and can be used to search in parts of the metamodel.

The query function returns the found elements as a strongly typed ExtendedList<T>. This is an extended List<T> class with functionality to search further in the results. LINQ uses extension methods to add the same functionality to the normal list classes; we have to create our own list. All the lists in the semantic model are instances of the ExtendedList and as such are searchable. The ExtendedList<T> class is shown in Figure 7.18 and provides the following operators:

Restriction operator Return a new ExtendedList filtered by a predicate using the Where operator.
Projection operator The Select operator performs a projection over the list.
Partitioning operators The Take, TakeWhile, Skip, and SkipWhile operators partition the list by either taking elements from the list or skipping elements before returning the elements.
Concatenation operator A Concat operator concatenates two lists.
Ordering operators Sorting of the list is handled by the OrderBy and OrderByDescending operators.
Grouping operators The GroupBy operator groups the elements of the list.
Set operators To return unique elements, use the Distinct operator. A Union and an Intersect operator return the union or intersection of two lists, while the Except operator produces the set difference between two lists.
Conversion operators The ExtendedList can be converted to an array, dictionary, or a list.
Equality operator To check whether two lists are equal, use the EqualAll operator.
Element operators Used for returning the first, last, or a specific element in the list.
Generation operators Create a range of numbers or repeat the creation of an element a number of times.
Quantifiers To check whether the list satisfies a condition for all or any elements.
Aggregate operators Used for counting the elements, getting the minimum or maximum, the sum, or the average of values. A Fold operator applies a function over the elements in the list.

Most of the operators return a new ExtendedList, so it is possible to combine operators, as shown in Listing 7.17. Because the ExtendedList<T> inherits from the List<T> class, it will
Figure 7.18: ExtendedList class
also contain the functionality specified in the base class. The ExtendedList has full IntelliSense1 support to help developers understand the functions, and the Semantic Metamodel has added functionality to make it efficient to use in queries. For example, it contains functions to check whether there is an operand, whether an operand is a field, whether an operation has a return type, and so on (see Figure 7.12).

1 ExtendedList<SemanticOperation> operations =
2   db.Query<SemanticAction, SemanticOperation>(delegate(SemanticAction sa)
3   {
4     return (sa.HasDestinationOperand &&
5       sa.DestinationOperand.IsSemanticField &&
6       sa.DestinationOperand.Name.Equals("name"));
7   }).Distinct();
Listing 7.17: Return all distinct operations containing actions assigning a value to a field named ”name”

1 A form of automated autocompletion and documentation for variable names, functions, and methods using metadata reflection inside the Visual Studio IDE.
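To illustrate how the operators compose, the following hedged fragment chains a set, an ordering, and a partitioning operator on a query result. The member signatures are assumed from Listing 7.17 and the operator descriptions above; this is a sketch, not verified against the implementation.

```csharp
// Hedged sketch: chaining ExtendedList operators on a query result.
// Types and signatures are assumed from the operator descriptions above.
ExtendedList<SemanticOperation> firstFive =
    db.Query<SemanticAction, SemanticOperation>(delegate(SemanticAction sa)
    {
        return sa.HasDestinationOperand && sa.DestinationOperand.IsSemanticField;
    })
    .Distinct()                                     // set operator: unique elements
    .OrderBy<string>(delegate(SemanticOperation op) // ordering operator
    {
        return op.Name;
    })
    .Take(5);                                       // partitioning operator
```

Because every step returns a new ExtendedList, each operator in the chain works on the output of the previous one.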
CHAPTER 8

Using the Semantic Analyzer

Chapter 7 provided the design of the Semantic Analyzer and introduced the query system based on native queries (Section 7.4.4). In this chapter, examples of the usage of the Semantic Database with native queries are given, we examine the different applications using the analyzer, and we discuss the integration of the analyzer in the Compose/.NET project.

8.1 Semantic Database

Once the metamodel is stored in the Semantic Database, there are three ways to retrieve information from it: use the GetContainers function to get the root elements of the structure, implement a visitor to visit all the elements, or use the Query function to execute a search. The first method gives a developer access to the root element of the hierarchical tree of items in the model, over which he can iterate. The second method does basically the same thing, but allows the developer to specify what type of action must be performed for the different kinds of elements inside the model. The third method provides an efficient way to retrieve specific elements from the model and is described in more detail in this section.

The Semantic Database uses native queries to specify the query expressions in the same language as the development language. To execute a query, the search terms must be enclosed inside a delegate and supplied to the Query function of the SemanticDatabaseContainer. Instead of creating a separate delegate, it is possible to use anonymous methods, so the delegate can be passed to the Query function as a parameter. An example is displayed in Listing 8.1.

1 ExtendedList<SemanticAction> callActions =
2   db.Query<SemanticAction,SemanticAction> (
3     delegate(SemanticAction sa)
4     {
5       return sa.ActionType==SemanticActionType.Call;
6     })
7   .OrderBy<string>(delegate(SemanticAction sa)
8   {
9     return sa.ParentBlock.ParentOperation.Name;
10  });
Listing 8.1: Search for all call actions and sort by operation name

In Listing 8.1, a search for all the SemanticAction objects calling methods is performed. The delegate is used as an anonymous method, directly as the parameter of the Query function. We indicate that we want to search in all the SemanticAction objects in the model by specifying this as the first generic type of the Query function. We want a list of SemanticAction objects to be returned, so we specify this type as the second generic type of the function (line 2). The delegate is the first parameter of the Query function and contains the actual query in the form of a boolean expression (lines 3–6). In this example, the ActionType of the SemanticAction must be a Call type. We also order the results by the name of the operation. As the Query function returns an ExtendedList<SemanticAction>, we can access the OrderBy function of this list (lines 7–10). Again, we use a delegate to specify the string to order by. In this case, the semantic action has a link to its parent block, and this block links back to the operation it resides in. From there, we can access the name of the operation and use it as the sort key.

We can use all the public properties of the semantic items in our query. Special properties have been added to facilitate common tasks, like checking for null values or determining the type of an operand. Some examples of this type of functions are HasDestinationOperand, HasArguments, IsSemanticField, and HasReturnType. Since the query is actually a boolean function, other operations can be included, as long as the query returns a boolean. However, the query is evaluated for each selected type in the model, so a query containing complex code is not advisable, due to processing costs.
Listing 8.2 is another example and shows the ability to return a different type than the one searched for.

1 ExtendedList<SemanticOperation> ops =
2   db.Query<SemanticAction,SemanticOperation>(
3     delegate(SemanticAction sa)
4     {
5       return (sa.HasDestinationOperand &&
6         sa.DestinationOperand.IsSemanticField &&
7         sa.DestinationOperand.Name.Equals("value"));
8     }
9   ).Distinct();
Listing 8.2: Find all operations using a field named value as their destination operand

In this example, we want to find all the operations with an action where the destination operand is a field named value. Because this field is the destination operand, it is potentially1 written to. We supply two different types to the Query function: the first one is the type we are looking for (actions) and the second one is the type we want to have returned (operations). In the

1 The action with the destination operand might be in a conditional block and as such is not always executed.
search query, we indicate that the SemanticAction object must contain a destination operand, that this operand must be a field, and that the name of this field must be equal to value. Keep in mind that if we change the order of these conditions, we can get null reference exceptions when trying to access non-existent destination operands. The Distinct command at line 9 signals the ExtendedList object to remove duplicate elements. Placing this command here has the same effect as applying it directly to the ops variable.

To show the usage of the grouping operator, see Listing 8.3. This example finds all the actions performing a jump to another block and groups these actions by the name of the operation they are in. Lines 12 to 19 show how to use the Grouping class to display these items.

1 ExtendedList<Grouping<string, SemanticAction>> groupedBy =
2   db.Query<SemanticAction, SemanticAction>(
3     delegate(SemanticAction sa)
4     {
5       return (sa.ActionType == SemanticActionType.Jump);
6     })
7   .GroupBy<string>(delegate(SemanticAction a)
8   {
9     return a.ParentBlock.ParentOperation.Name;
10  });
11
12 foreach (Grouping<string, SemanticAction> element in groupedBy)
13 {
14   Console.WriteLine("Jumps in operation {0}", element.Key);
15   foreach (SemanticAction sa in element.Group)
16   {
17     Console.WriteLine("--Jump to {0}", sa.LabelName);
18   }
19 }
Listing 8.3: Group jump labels by operation name

The last example, Listing 8.4, shows how to find all actions assigning a value to an operand of type integer. To determine the type, we use the SemanticType object of the operand and indicate that we want the type to be an integer (line 8). The check for the existence of a destination operand (line 6) is not really needed, since an assignment always uses a destination operand. However, if the Semantic Extractor did not assign a destination operand, the query could raise an exception at runtime.
1 ExtendedList<SemanticAction> actions =
2   db.Query<SemanticAction, SemanticAction>(
3     delegate(SemanticAction sa)
4     {
5       return (sa.ActionType == SemanticActionType.Assign &&
6         sa.HasDestinationOperand &&
7         sa.DestinationOperand.SemanticType.CommonType ==
8           SemanticCommonType.Integer);
9     }
10 );
Listing 8.4: Retrieve all the assignments where an integer is used
8.2 Applications

Besides the metamodel, the extractors, and the database, two other programs were created. One is a command line utility primarily used for testing; the other is a Windows Forms application showing the contents of the metamodel and the control flow in a graphical way.

The console application, called SemanticExtractorConsole, accepts as its command line arguments a number of assemblies, a list of plugins, and optional settings. It uses the default Semantic Extractor Provider to analyze the .NET assemblies, stores the results in the Semantic Database, and executes the plugins. Each plugin is called with a pointer to the database and can perform its own analysis. This tool is primarily used for testing the complete system, because tasks can be automated using the command line switches and the plugins. Plugins are .NET assemblies implementing a certain interface. It is also possible to use a source file as a plugin; the console application will then compile this source in-memory first. The source file must be a valid C# or VB.NET class implementing the plugin interface.

The Windows Forms application provides a graphical user interface (GUI) to open an assembly. Internally, it calls the default Semantic Extractor Provider and creates a graphical representation of the model. Figure 8.1 shows a screenshot of this application. In the tree view at the upper left part of the window, the metamodel is displayed. By selecting an element in this tree, detailed information is shown in the property grid in the lower left part. Additional support for displaying the correct data and providing metadata, like descriptions of each property, has been added to the Semantic Metamodel using custom attributes. If the selected element is an operation, a control flow graph is created and displayed in the upper right part. If the user clicks on a flow block, its contents (the blocks with actions) are listed in the actions pane.
Figure 8.1: Windows Forms Semantic Analyzer application
8.3 Plugins

Various plugins were created to test the system and to provide specific analysis tasks. Each plugin must implement the SemanticLibrary.Plugin.IPlugin interface, as shown in Figure 8.2.

Figure 8.2: Plugin interface

The SemanticExtractorConsole calls the Execute function for each plugin with as its arguments the SemanticDatabaseContainer, which holds all the metamodels, and a dictionary of command line options not interpreted by the console application itself. This allows the user to supply settings for the plugin. A plugin can write to the console window, and when it encounters an exception, it can simply raise this exception so the SemanticExtractorConsole can handle the error and display a message. The read-only IsEnabled property allows a plugin to be disabled when needed; as a result, it will not be executed by the SemanticExtractorConsole application. Plugins provide a way to create separate analysis tasks and allow for the execution of multiple plugins with one command. They are not part of the metamodel nor the Semantic Database and are only used by the console application. The next sections give more details about the implemented plugins.

8.3.1 ReifiedMessage Extraction

This plugin provides a partial solution to the problem discussed in Section 4.1.2, the ReifiedMessage. The use of aspects on a certain piece of code changes the behavior of this code. Not all the behavioral changes are expected, and they may even lead to conflicts [24, 72]. In Compose we use the FILTH module, with order specification, to compute all the possible orders of the filter modules and select one of those orders. The SECRET module, the Semantic Reasoning Tool, reasons about the possible semantic conflicts in the ordering of filter modules. It does this by analyzing all the possible actions of the filters, which either accept or reject a message.
However, the meta filter introduces behavior that the SECRET module cannot handle, because the function called by the meta filter defines the semantics of the filter. The function executed by the meta filter has an argument containing a ReifiedMessage object, representing the message. It can retrieve the target or the arguments, but also change the execution of the message, for example by resuming the regular execution of the filter set or by replying, which returns the message to the sender. Currently, developers writing a function using a ReifiedMessage should define the behavior of this message as a custom attribute (see Section 6.1.10). This is a time consuming process, often
not executed or updated, and it is error prone. The Semantic Analyzer might be helpful here. The task of this plugin is to determine the behavior of the usage of the ReifiedMessage and report this to the developer. He/she can then add the custom attributes to the code. Because it is not possible to capture all the intended behavior, the custom attributes are not automatically inserted, so they can be reviewed by the developer.

We first need to find all the operations using a ReifiedMessage object in their arguments. Listing 8.5 shows the query used to find these operations. The reifiedMessageType constant (line 7) contains the full name of the type used by Compose for the ReifiedMessage.

1 ExtendedList<SemanticOperation> ops =
2   db.Query<SemanticOperation, SemanticOperation>(
3     delegate(SemanticOperation obj)
4     {
5       foreach (SemanticArgument arg in obj.SemanticArguments)
6       {
7         if (arg.SemanticType.FullName.Contains(reifiedMessageType))
8           return true;
9       }
10      return false;
11    });
Listing 8.5: Find operations using a ReifiedMessage

The next step is to iterate through all the found operations and get the specific argument containing the ReifiedMessage, as shown in Listing 8.6.

1 // Find the argument
2 // p is one of the SemanticOperations in ops
3
4 SemanticArgument reifiedMessageArg =
5   p.SemanticArguments.First(delegate(SemanticArgument arg)
6   {
7     return (arg.SemanticType.FullName.Contains(reifiedMessageType));
8   });
Listing 8.6: Find the argument using a ReifiedMessage

Once we have the specific argument containing the ReifiedMessage object, we have to look at which actions in the operation are using this argument. We use a query, listed in Listing 8.7, to find those actions. This time we search for all the call actions, because properties and functions of the ReifiedMessage object are used. Of course, this call action should have as its first argument the ReifiedMessage operand we found earlier.
Since we do not want to specify that the action should be in the operation we are currently analyzing, we supply the operation itself (represented by the p variable at line 20) as the second parameter to the Query function. It is then used as the starting point for the native search visitor, creating a more time efficient search.

1 // Find the actions which are performing an operation on the argument
2 // passing the operation p as the start point
3 IList<SemanticAction> actions =
4   db.Query<SemanticAction, SemanticAction>(
5     delegate(SemanticAction sa)
6     {
7       if (sa.ActionType == SemanticActionType.Call)
8       {
9         if (sa.HasArguments)
10        {
11          // The first argument is the object on which we are
12          // calling the function
13          // This should be the reified message
14          return (sa.Arguments[0].IsSemanticArgument &&
15            sa.Arguments[0].Equals(reifiedMessageArg));
16        }
17      }
18      return false;
19    }
20  , p); // <-- we start at the operation
21        // so we do not specify the operation name, class, etc.
Listing 8.7: Retrieve all the calls to methods of the ReifiedMessage argument

The collection of actions is now used to determine the behavior of the reified message, using the rules specified in Staijen's work [72]. For example, when the getTarget function is called, we add the target.read semantic to the list of semantics for this operation. A call to the reply function introduces the returnvalue.write, target.dispose, selector.dispose, args.dispose, and message.return semantics. This temporary list is then converted to a custom attribute and shown to the developer.

One of the future ideas of Staijen is to annotate methods invoked after a proceed call with a semantic specification. This is implemented in this plugin as shown in Listing 8.8, which also displays some of the possibilities of the ExtendedList object.

1 // Get all the semantic actions
2 ExtendedList<ISemanticAction> allActions = p.RetrieveAllSemanticActions();
3
4 // Skip the first items until we find the proceed call
5 allActions = allActions.SkipWhile(delegate (ISemanticAction sa)
6 {
7   return !sa.ActionId.Equals(proceedActionId);
8 });
9
10 // Filter for only call actions
11 allActions = allActions.Where(delegate (ISemanticAction sa)
12 {
13   return sa.ActionType == SemanticActionType.Call;
14 });
Listing 8.8: Retrieve other methods which should be analyzed after a proceed call

If we find a proceed call, we store the unique action id, so we can retrieve all the call actions occurring after the proceed action. The methods called by the found actions are displayed to the developer, so he/she can have a look at those methods.
Tests using the ReifiedMessage Extraction plugin on the examples accompanying Compose showed that the plugin retrieves the same semantics as already specified in the examples. It did not have any problems automatically inducing the behavior of the reified message based on the calls to the functions of this object. However, no control flow information is used in this analysis yet. Certain call actions may never happen or may be executed multiple times. Currently, this cannot be expressed in the custom attribute and is not further implemented. To do this in the future, we can use the flow capabilities of the metamodel to generate a control flow with access level information. See Section 7.2.6 for more information about control flows.
8.3.2 Resource Usage

The Resource Usage plugin creates a list containing all the operands used in an operation. Not only the name of the operand is shown, but also whether the operand is created, read from, or written to. The plugin uses the flow blocks generated by the control flow generator and adds the access level to the found operands. The result of this plugin will be in the form of the following output:

Read/write set of checkEmpty [pacman.World]:
([SemanticVariable]j.create)1
([SemanticVariable]i.create)1
([SemanticVariable]V_2.create)1
([SemanticConstant]C_0.read)1
([SemanticVariable]i.write)1
([SemanticConstant]C_1.read)*
([SemanticVariable]j.write)*
([SemanticField]screenData.read)*
([SemanticVariable]ArrayResB_8_Action0.write)*
([SemanticVariable]ArrayResB_8_Action0.read)*
([SemanticVariable]ArrayResB_10_Action0.write)*
([SemanticVariable]ArrayResB_10_Action0.read)*
([SemanticVariable]ResultOfB_11_Action0.write)*
([SemanticVariable]ResultOfB_11_Action0.read)*
([SemanticConstant]C_2.read)*
([SemanticVariable]V_2.write)*
([SemanticVariable]ResultOfB_11_Action1.read)*

The variables are created first, because the IL standard enforces this rule. Constant values are read, and there are some read and write actions on the variables. The last character of each line indicates the access level, as explained in the following list:

1 The operand is accessed at least once;
? The operand might be accessed; it could be conditional;
* A conditional operand inside a loop; it may be accessed more than once;
+ It is executed at least once and maybe more; a loop with a conditional at the end;
0 Unreachable code; it is never accessed.

The plugin iterates through all the flow blocks of all the operations in the database. Each flow block contains semantic blocks with actions. The operands of each action are collected; if an operand is a destination operand, it is a write action, and if the operand is a source operand, it is a read action.
The access level of the flow block is added to the retrieved operand and the list is presented to the user. A collection of local variables is added at the top of the list, as they are created when the operation is entered. The read/write sets returned for the examples are similar to the operands used in the actual source code. However, the Semantic Extractor and the compiler have introduced extra variables for normalization or optimization purposes.
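The mapping from flow-block properties to the access-level markers described above can be sketched as follows. This is a minimal Python illustration, not the plugin's actual .NET implementation; the dictionary-based flow-block shape and the function names are assumptions made for the example.

```python
def access_level(reachable, conditional, in_loop):
    """Map flow-block properties to the access-level marker:
    '1' at least once, '?' conditional, '*' conditional in a loop,
    '+' at least once and maybe more, '0' unreachable."""
    if not reachable:
        return "0"
    if in_loop:
        # a loop body guarded by a condition is '*'; a loop that always
        # runs its body at least once (conditional at the end) is '+'
        return "*" if conditional else "+"
    return "?" if conditional else "1"

def read_write_set(flow_blocks):
    """Collect operand accesses tagged with their block's access level.

    Each flow block is a dict with 'reachable', 'conditional' and
    'in_loop' flags plus a list of (operand, 'read'|'write') actions.
    """
    result = []
    for block in flow_blocks:
        level = access_level(block["reachable"], block["conditional"],
                             block["in_loop"])
        for operand, access in block["actions"]:
            result.append(f"({operand}.{access}){level}")
    return result
```

For example, a write to `[SemanticVariable]i` in an unconditional block would be reported as `([SemanticVariable]i.write)1`, while a read of `[SemanticField]screenData` inside a conditional loop body would be reported with the `*` marker.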
8.3.3 Export Types

Compose uses a separate .NET application, called TypeHarvester, to retrieve the types from the source assemblies. The harvester uses .NET reflection to load and parse the assemblies, iterate through all the type information, and store this information in a types.xml file. All the information found with the reflection API is added to this file, which can become quite large. The xml file is imported during the Compose compilation process and stored in the repository. A language model is created with this data to be used in the selector language.

The export types plugin performs the same action but uses its metamodel as the source. It does not have all the properties reflection returns, but still has extensive type information available. The GetContainers function of the SemanticDatabaseContainer is called and returns a collection of SemanticContainer objects. Now we can use the collections inside the containers to find the classes, fields, operations, and so on. This plugin does not use the search functionality of the database, but only retrieves the top elements of the metamodel contents and uses their properties to further export the data. The XmlTextWriter of the .NET Framework performs the actual writing to an xml file with the same structure as used by the TypeHarvester application.

Although it does not export all the information, it does provide an example of how the metamodel can be used to replace the TypeHarvester application and how the metamodel can be browsed instead of being searched.

8.3.4 Natural Language

The metamodel provides information about the behavior of the functions in the form of actions. The natural language plugin converts the actions to a natural language representation for each operation. An example of the output of this plugin is listed below.
Description of operation ’isPlaying’ in Jukebox.Player [Jukebox.Player]
=======================================================================
Assigning SemanticField named playing to SemanticVariable Local0
Jumping to block B_6 where we perform the following:
Exiting operation ’isPlaying’ and returning the contents of Local0

Each action and its accompanying operands are converted to a textual representation. Jumps and branching operations are handled in such a way that the flow of the function is still visible, and loops are only shown once. Although the practical use of this plugin is limited, it can serve as a basis for a conversion of actions to a more formal and standard representation, for instance an xml representation or a standard for specifying software. Automatically checking requirements against the actual implemented code is a more practical example.

8.4 Integration with Compose

The Semantic Analyzer designed for this assignment is a general purpose analyzer and can be used for different kinds of tasks. However, one of the reasons to create this analyzer was to
provide more information to be used in Compose. As explained in the design limitations (Section 7.1.2), we only have access to a .NET assembly (with method implementations) at the end of the Compose compilation process. At that point, all the analysis tasks have already been executed and semantical join points can no longer be selected. This makes it difficult to integrate the analyzer in the compilation process.

Another problem is the mismatch between the .NET Framework versions of Compose/.NET and the Semantic Analyzer. The first runs under version 1.1 and the second uses version 2.0, because it depends on Microsoft Phoenix, which needs the latest version of the runtime. Because of the dependency on Phoenix, which has rather large libraries (10 MB) and a restrictive license, we cannot make the analyzer part of the standard Compose/.NET install.

Based on these limitations we can state the following requirements for the integration:

1. The Semantic Analyzer should be optional. If it is not installed, then Compose must still work correctly;
2. The retrieved information must be placed in the internal database, called the repository, so the components of Compose can access the data from one central store;
3. The analyzer must extract the semantics of the usage of the ReifiedMessage object and supply this information in such a way that the SECRET module can work with it;
4. Information about the resource usage of fields and arguments should be extracted from the sources and placed in the repository;
5. Per function, retrieve the calls made to other functions and store them in the repository.

To be able to satisfy these requirements we had to make some changes to the compilation order of the modules in Compose/.NET. We first generate a weave file specification and call ILICIT [10] to weave the assemblies. At that point we can start the two SEMTEX modules.
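The skip-if-absent behavior that makes the SEMTEX modules optional can be sketched as a pipeline in which each module inspects its prerequisites and passes control on when they are missing. This is a hypothetical Python sketch of the idea only; the module names mirror the text, but the context keys and the `phoenix_path` check are invented for illustration.

```python
import os

def run_semtex_runner(ctx):
    # Requirement 1: if the Semantic Analyzer (Phoenix) is not installed,
    # skip producing the xml file but let the pipeline continue.
    if not os.path.exists(ctx.get("phoenix_path", "")):
        print("DotNETSemTexRunner skipped: Semantic Analyzer not installed")
        return
    ctx["semtex_xml"] = "semtex.xml"  # would invoke SemanticExtractorConsole

def run_semtex_collector(ctx):
    if "semtex_xml" not in ctx:
        return  # no xml file produced; continue with the next module
    ctx["repository"] = {"semantics": ctx["semtex_xml"]}

def run_secret(ctx):
    # SECRET can use the extracted semantics when present
    ctx["secret_used_semantics"] = "repository" in ctx

def compile_pipeline(ctx):
    for module in (run_semtex_runner, run_semtex_collector, run_secret):
        module(ctx)
    return ctx
```

Running `compile_pipeline({})` completes without the semantic data, while supplying an existing `phoenix_path` lets the collector fill the repository before SECRET runs.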
The DotNETSemTexRunner runs the SemanticExtractorConsole application (see Section 8.2) with a special Composestar plugin. This plugin creates an xml file with the analyzed data. The second module, DotNETSemTexCollector, loads the xml file and places the data in the Compose datastore. The next module to be executed is the SECRET module, which can now access the semantical data. Finally, we create a runtime repository and copy the assemblies to the output folder.

If the DotNETSemTexRunner cannot find the required files, it will simply skip the creation of the xml file and an information message is shown to the developer indicating why this module is not executed and how to remedy this. When the next module, DotNETSemTexCollector, cannot find the xml file, it will continue to the following module. This takes care of point one of the requirements.

The DotNETSemTexRunner uses the SemanticExtractorConsole application to analyze the assemblies and execute a special plugin. This plugin contains elements of the various separate plugins described in Section 8.3. It determines the usage of the ReifiedMessage, the read, write and create actions of fields and arguments in the functions of the assemblies, and the calls to other functions. The techniques to retrieve this type of data are described in the Plugins section. An example of the resulting xml file is listed in Appendix F. The plugin satisfies requirements three, four and five.

To add the information to the datastore of Compose, we extended the existing MethodInfo
class to hold additional collections of CallToOtherMethod objects, ResourceUsage objects, and a list of strings containing the ReifiedMessage usage. The MethodInfo class is automatically stored in the datastore and can be used by the other components. The DotNETSemTexCollector class parses the generated semtex.xml file, creates the necessary objects such as CallToOtherMethod or ResourceUsage, assigns the values from the xml file to the properties of these objects, and adds them to the corresponding MethodInfo object already in the datastore.

To be able to use the automatically extracted semantics of the ReifiedMessage in the SECRET module, we changed the MetaFilterAction class to use the found semantics stored in the MethodInfo object. It will still use the semantics defined by the developer first, before importing the automatically created semantics.
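The precedence rule just described, developer-specified semantics first and automatically derived semantics only for what the developer left unspecified, amounts to a simple merge. The sketch below is illustrative Python, not the actual MetaFilterAction code, and the semantics strings are invented placeholders.

```python
def merge_semantics(developer, derived):
    """Keep the developer's entries; import derived entries only for
    calls the developer did not specify."""
    merged = dict(developer)
    for call, behavior in derived.items():
        merged.setdefault(call, behavior)  # does not overwrite developer data
    return merged
```

For instance, if the developer annotated only `proceed`, the automatically extracted behavior of `resume` would be imported while the developer's annotation of `proceed` is kept unchanged.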
CHAPTER 9

Conclusion, Related, and Future Work

In this final chapter, the results of the research and implementation of the Semantic Analyzer are evaluated and conclusions are drawn. First, the related work is discussed and we end with a look at future work.

9.1 Related Work

Automatically analyzing source code is certainly not a new field in computer science, and there are various applications which use the resulting data: for example, finding design patterns in the source code, reverse engineering design documentation [69], generating pre- and postconditions [53], verifying software contracts [9], or checking behavioral subtyping [30]. Some analyzers are relatively simple; they might only count programming elements like the number of lines, methods, or classes, or determine the complexity of functions. Other analyzers, like the Semantic Analyzer, convert the source code to a higher level view, a metamodel, to reason about behavior.

This section discusses a number of analyzers, which are basically all static code analyzers working with the semantics of the code.

9.1.1 Microsoft Spec#

Spec# (Spec sharp)1 is a Microsoft Research project attempting to provide a more cost efficient way to develop and maintain high-quality software [6]. Spec# extends the C# programming language with specification constructions like pre- and postconditions, non-null types, checked exceptions, and higher-level data abstraction. The Spec# compiler statically enforces non-null types, emits run-time checks for method contracts and invariants, and records the contracts as

1 http://research.microsoft.com/specsharp/
metadata for consumption by other tools in the process. Another application, the Spec# static program verifier, generates logical verification conditions from a Spec# program and analyzes the verification conditions to prove the correctness of the program or find errors in the code.

The language enhancements of Spec# are in the form of annotations the developer can add to the existing code. Of course, this is not possible for code in third party libraries, such as the .NET Framework Class Library. Microsoft is working on a project for semi-automatically generating contracts for existing code.

Spec# is a tool for correctness and verification checks. Although a part of Spec# uses runtime checking (using inlined code for pre- and postconditions), the static checking is not much different from the Semantic Analyzer. The static program verifier constructs a metadata view of the code in its own intermediate language, called BoogiePL. It consists of basic blocks with four kinds of statements: assignments, asserts, assumes, and function calls. Extensive analysis systems process the IL code and extract additional properties, which are added to the program in the form of assert and assume statements. An automatic theorem prover is then used to verify the conditions in the program.

The verification systems of Spec# are very advanced, although the metamodel is not very different from the model used in the Semantic Analyzer. Our metamodel also contains basic building blocks with assignments and calls, but we lack the assume and assert statements.

9.1.2 SOUL

A project similar to the Semantic Analyzer is the SOUL logic metaprogramming system [28]. This system is designed to reason on a metalevel, about the structure of object-oriented source code, in a language independent way using logic meta-programming [21]. By using logic rules it is possible to reason about the structure of object-oriented programs.
SOUL was built to reason about Smalltalk code; it consists of a Prolog-like language and an associated logic inference engine [86]. Logic facts and rules can be used to query Smalltalk programs. The mapping between the logic meta language and the object-oriented source language is handled by a metalevel interface (MLI) implemented as a hierarchy of classes. The reflection capabilities of Smalltalk are used to build the MLI; method body statements are converted to functors1 and can be queried for. A logic repository contains logic predicates to use with the reasoning engine. An extension of SOUL to the Java platform is called SOULJava2, which contains its own parser to convert Java code to the repository and adds new methods to the MLI, for instance to query for interfaces.

Some applications of SOUL are detecting patterns like double dispatch or getters, or finding design patterns such as the visitor pattern. Using the logic rules, the MLI can be searched for certain constructions. For example, the rule for detecting getting methods is listed in Listing 9.1.

gettingMethod(?class,?method,?varname) if
    method(?class,?method),
    methodSelector(?method,?gettingname),
    instVar(?class,?varname),
    gettingMethodName(?varname,?gettingname),
    varName(?var,?varname),
    methodStatements(?method,<return(?var)>)

Listing 9.1: Selecting getters in SOUL using Prolog

Idiom rules, like methodSelector and gettingMethodName, are used to provide a mapping between the source language and the items in the model, to cope with the language specific differences between Smalltalk and Java.

SOUL and the Semantic Analyzer share some properties, like creating a higher level metamodel, being language independent, and providing a search mechanism. The implementation is however different. The Semantic Analyzer has a language independent metamodel and the Semantic Extractors are responsible for the correct conversion, whereas SOUL uses idioms to deal with different language constructs. Furthermore, SOUL relies on Prolog predicates for searching in the MLI. Our analyzer has a similar search system, but based on native queries. The major difference is the conversion of statements to actions, the behavioral representation, in the Semantic Analyzer. SOUL is more a syntactical than a semantical analyzer.

1 Functors are objects that model operations that can be performed. In their simplest form they are somewhat like function pointers.
2 Evolved to Irish; http://prog.vub.ac.be/˜jfabry/irish/index.html

9.1.3 SetPoints

SetPoints is a system designed to evolve structure-based pointcuts into semantical pointcuts [3]. The SetPoints project identifies the same problems as described in the motivation chapter (Section 4.1.1): join points should be applied based on the behavior of the code and not on naming conventions. The SetPoint Framework for .NET tries to solve this problem by linking program semantics, or views, to source code through metadata elements, such as custom attributes. The pointcut definitions are based on these annotations. SetPoint uses OWL1 and RDF2 to represent the ontologies. Developers have to specify the relations between base code and the views of the system using custom attributes.
The SetPoints developers are working on a version where Microsoft Phoenix is used to find program annotations.

SetPoints differs from the Semantic Analyzer in a number of ways. SetPoints is primarily designed to define semantical join points and performs the actual weaving of aspects, whereas the Semantic Analyzer only provides a system to retrieve semantics. The latter is a more general purpose semantics extractor and could eventually serve as an automated tool to retrieve the views for SetPoints. Selecting the correct join points is handled in SetPoint by RDF bindings. The Semantic Analyzer uses native queries to find information in the model and could be used to add semantical information to the selector language of the Compose project.

1 Web Ontology Language; http://www.w3.org/TR/owl-features/
2 Resource Description Framework; http://www.w3.org/RDF/
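The native queries mentioned above differ from SOUL's Prolog rules and from string-based query languages in that the query is an ordinary function of the host language (a delegate, in .NET terms) applied to the in-memory model. The following Python sketch illustrates the idea under assumed names; `SemanticOperation` and `query` are stand-ins for the real Semantic Database API.

```python
from dataclasses import dataclass

@dataclass
class SemanticOperation:
    name: str
    instruction_count: int

def query(elements, predicate):
    """Evaluate the predicate against every element of the model;
    no string parsing, no indexes, just host-language code."""
    return [e for e in elements if predicate(e)]

ops = [SemanticOperation("isPlaying", 12),
       SemanticOperation("checkEmpty", 230)]

# Roughly the native-query equivalent of a CQL-style
# "SELECT METHODS WHERE NbILInstructions > 200":
large = query(ops, lambda op: op.instruction_count > 200)
```

The predicate is type-checked by the host compiler and can use any expression the language offers, which is the main appeal of native queries over embedded query strings.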
9.1.4 NDepend

NDepend1 is a static analysis tool to generate reports, diagrams, and warnings about .NET assemblies. Internally it employs Cecil to parse IL byte code and create a representation of the code. Users can query the model with the Code Query Language (CQL), based on the SQL syntax. A separate tool, called VisualNDepend, offers a graphical user interface to edit CQL and shows the results using diagrams. Information about the inner workings of NDepend could not be found, as it is not an open source project, but the Cecil library is used to analyze the assemblies and CQL2 operates on the results to generate metrics. Some examples of CQL queries are listed in Listing 9.2.

WARN IF Count > 0 IN SELECT METHODS WHERE NbILInstructions > 200
ORDER BY NbILInstructions DESC
-- Warn if a method has more than 200 IL instructions

SELECT METHODS WHERE ILCyclomaticComplexity > 40
ORDER BY ILCyclomaticComplexity DESC
-- Return methods where the CC is greater than 40

SELECT TYPES WHERE DepthOfInheritance > 6
ORDER BY DepthOfInheritance DESC
-- Select the types with an inheritance level > 6

SELECT TYPES WHERE Implement "System.Web.UI.IDataSource"
-- Select the types implementing the IDataSource interface

SELECT TOP 10 METHODS WHERE IsPropertyGetter OR IsPropertySetter
ORDER BY NbILInstructions DESC
-- Return a top 10 of property setters or getters ordered by the number of instructions

Listing 9.2: Examples of CQL queries

NDepend offers a good framework for analyzing and querying assemblies. CQL provides almost the same functionality as the native queries used in the Semantic Analyzer, and the examples in Listing 9.2 could also be rewritten to native queries to be used in our analyzer. The instruction level capabilities in NDepend are limited.
There are some parameters available to use in search queries, such as finding property setters or IL complexity, but most of the search operations focus on metrics and are very structurally oriented. The Semantic Analyzer goes further and allows searching for actions, the behavior of the code.

9.1.5 Formal Semantics

Besides the static analyzers described above, there is also related work to be found if we consider the semantics itself. There are three major approaches to semantics [61, 49]:

Operational semantics The computation of a construction specifies the meaning. How the effect of an operation is produced is important.

Denotational semantics Mathematical objects are used to represent the effect of executing the constructs. The effect is important, not how it is obtained.

1 http://www.ndepend.com/
2 CQL is defined in the standard located at http://www.ndepend.com/CQL.htm
Axiomatic semantics Assertions are used to express certain properties of a construct.

One way to formalize the semantics of constructions using an operational explanation is Structural Operational Semantics, also known as small-step semantics. Small-step semantics describes the individual steps of the computations. There is also big-step semantics, in which natural language is used to represent the overall results of the executions.

With the denotational semantics, mathematical functions are used and we can calculate the effects of execution with a certain state using these functions. We do not look at the execution itself, but only at a mathematical abstraction of the program. Although it is relatively simple to reason with mathematical objects, converting a program to mathematics is not.

The axiomatic semantics approach is often used for proving the correctness of a program, for instance, checking pre- and postconditions.

The Semantic Analyzer uses a form of operational semantics. Like Structural Operational Semantics, the emphasis is on the individual steps of the execution. However, our semantic model is more a means to allow other tools to reason about software than a formal specification of all the semantics in a source file. The control flow graph capabilities of the model and the availability of operand data are an important part in specifying the semantics.

A related system, using both small-step and big-step semantics, is presented in the paper “A formal executable semantics for Java” [5]. A system of 400 inference rules is used to describe the operational semantics of Java using the Typol logical framework. A syntactically correct Java program, an abstract syntax tree, is converted to a semantical representation and a graphical environment shows this representation.

9.2 Evaluation and Conclusion

There were multiple reasons to create the Semantic Analyzer.
Besides the general purpose static analyzer, three main issues were identified for the Compose project: the need for semantic join points, program analysis, and fine grained join points.

After determining the meaning of semantics and how different kinds of semantic constructs are represented in the target language IL, it was possible to design the Semantic Analyzer. Basically this system can be divided into three parts: extractors, the metamodel, and the search mechanism.

The semantics used in the metamodel are based on the execution of their corresponding source code constructions. We distinguish the basic elements used by programming languages, like mathematical functions, control flow constructs, conversions, assignments, testing, and so on. However, there is no formal specification of this metamodel and there is no evidence that the model is correct and complete. The model is one of the elements in the whole Semantic Analyzer system. We use it to store the retrieved semantics and search for specific behavior. The added capabilities of the metamodel, such as the control flow graphs and operand information, help us in determining the behavior of code. The emphasis of the model was more on usability by the developers than on providing a complete formal model for semantics.

The Semantic Analyzer uses a static analyzer instead of a dynamic one. As discussed before,
this has its advantages. For one, the source code does not have to be executed and we can use the Common Intermediate Language to cover a wide range of higher level .NET programming languages. Another advantage is the ability to compute all the possible paths in the code. However, we try to use static analysis to obtain runtime behavior. Certain information, such as the actual executed functions, the control flow paths taken, and the real values of the variables, is only known at runtime and can only be determined using dynamic analysis. At runtime it is possible to tell what the actual behavior of a function is. The behavior can only be guaranteed for that specific execution run, because the user input can be different the next time. Static analysis was selected because it could give all the possible occurring actions. If we combine static analysis with dynamic analysis we might get additional information about polymorphism, inheritance, and dynamic binding, but at a higher effort. Dynamic analysis is, compared with static analysis, difficult to do because the application must be executed and all the paths must be visited to get a complete overview of the code. This is not practical and is very time consuming.

To perform the actual code analysis, we use Semantic Extractors. The Provider design pattern allows us to select a specific provider, and four different kinds of providers were created. Each provider reads the source code, parses it, builds up a metamodel, and converts the statements to actions. How this is implemented differs per provider. Four different types of IL readers were discussed in Section 6.2 and for each type a provider has been created. Parsing IL byte code is a complex operation because of the amount of metadata available. The four different IL readers can deal with this complexity and offer a form of code object model containing all the elements of the source code.
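The Provider design pattern mentioned above can be sketched as a common extractor interface with interchangeable back ends resolved by name. This Python sketch only illustrates the pattern; the class names, the registry, and the returned strings are assumptions, not the thesis's actual C# provider implementation.

```python
class SemanticExtractorProvider:
    """Common interface every extractor provider implements."""
    def extract(self, assembly):
        raise NotImplementedError

class PhoenixProvider(SemanticExtractorProvider):
    def extract(self, assembly):
        # would drive the Microsoft Phoenix IL reader here
        return f"metamodel of {assembly} via Phoenix"

class ReflectionProvider(SemanticExtractorProvider):
    def extract(self, assembly):
        # would drive a reflection-based IL reader here
        return f"metamodel of {assembly} via reflection"

# registry mapping configured provider names to implementations
PROVIDERS = {"phoenix": PhoenixProvider, "reflection": ReflectionProvider}

def get_provider(name):
    """Resolve a provider by its configured name, as a provider model does."""
    return PROVIDERS[name]()
```

The caller only depends on the `SemanticExtractorProvider` interface, so a fourth or fifth IL reader can be added by registering one more class, without touching the analysis code.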
At the time of implementation, the available IL readers were limited and error prone. The only one really capable of getting all the information was Microsoft Phoenix. Although very advanced in its capabilities, it was difficult to use. The documentation was scarce, the samples were limited, and the system is still undergoing changes with every new release. The PostSharp system has evolved into a much more usable IL reader and it can be interesting to see if it is possible to use this reader to get the semantics. A problem with the Phoenix implementation was the extraction of custom attributes. This was not yet implemented in Phoenix and a trick involving the default reflection capabilities of .NET had to be used. This is a relatively slow operation and better built-in support in Phoenix for attributes is advisable.

Converting statements into actions is, even with Phoenix, still a difficult process. We raise the actual code back to a higher level representation, the metamodel. This means we are losing some information as we are combining statements into a more general action. It is up to the applications of the metamodel to use the data to deduce semantics from it. The model lacks actions such as assume and assert, which are present in Spec#. One can argue that the metamodel is not really a semantical model, but merely a simplified code representation. However, the available actions in the model represent the behavior of a function in the source code. As such, we can reason about the intended semantics of the function using the model. So the metamodel alone is not enough; we need applications operating on the model to do the real semantic analysis. The metamodel is only a general purpose collection of semantically related actions combined with control flow and operand information.

The metamodel contains enough information to reason about code and the possible execution of code. We have information about the control flow and we can retrieve the data flow.
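The kind of data-flow question the metamodel supports, following one operand through the actions of an operation, can be sketched as below. The `(destination, action kind, sources)` tuples are an invented stand-in for the real Semantic* action classes, used only to illustrate the idea.

```python
def trace_operand(actions, operand):
    """Return the (action kind, access) pairs in which the operand occurs.
    An operand appearing as destination is a write; as a source, a read."""
    trace = []
    for dest, kind, sources in actions:
        if dest == operand:
            trace.append((kind, "write"))
        if operand in sources:
            trace.append((kind, "read"))
    return trace

# A tiny action list loosely modeled on the isPlaying example:
# assign field 'playing' to Local0, then return the contents of Local0.
actions = [("Local0", "Assign", ["playing"]),
           ("", "Return", ["Local0"])]
```

Tracing `Local0` through this list shows one write followed by one read, which is exactly the kind of usage information an analysis plugin needs to follow an argument or variable through a function.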
Because of the normalization of the statements, we can use the extensive operand information to track the usage of certain data. For example, we can follow an argument in a function and see the
possible changes made to that operand. The model also contains flow graph capabilities. A decision was made to include this type of functionality directly inside the model instead of using plugins. The flow graphing capabilities are frequently used operations; they use the blocks and actions to generate the flow blocks and need direct access to the model. The flow graph functions operate on parts of the model and as such should also be present in the model. They extend the model's capabilities and are placed inside their own namespace in the metamodel library.

Besides the actions, the model also contains a complete hierarchical structure representing the original code structure. The main purpose of this tree-like structure is to place the actions in context. Actions belong to a function, a function resides inside a class, and the class is inside a container. We need this type of information to map elements to source code, but it does not serve a direct semantical purpose. The question arises whether it is possible to place the function elements in another form of behavior describing the class. Strictly speaking, a class contains related functions with certain behavior. The combined behavior can be used as the parent of a single function. For example, a Car class can have the combined behavior of the functions Brake, Accelerate, Go left, and Go right. Finding and representing this behavior, however, is a difficult problem. Not all the functions are implemented in this class; they could be inherited, overridden, or in a base class. Creating a notation for the semantics of a class requires extended knowledge about the intended behaviors of all the functions, how they operate together, and what type of added functionality they provide to the class.

To search the model, a number of different search techniques were discussed. Because of the capabilities of native queries, this search system was selected.
Although searching in this manner is not new (Smalltalk has a similar system), it is currently gaining a lot of popularity in the .NET programming world, mostly because Microsoft is developing LINQ, which is built on the underlying technique of native queries. The capabilities of the search function of the Semantic Database are very similar to LINQ; the actual implementation is not. Our database uses delegates to check each element in the database for a match. This is not an efficient method and it would be better to create some sort of expression tree based on the query function and use indexes to find the information. This entails the need to parse the native query and convert the statements to an expression tree. This solution was not chosen because of the extra work involved. To optimize the database in the future, we could switch to LINQ when it is released or make use of an object-oriented database like DB4O1, which contains an extensive native query framework.

A number of plugins were created to test the capabilities of the analyzer and to perform some of the tasks defined in the motivation chapter. One of the reasons to create the analyzer was to determine the behavior of the ReifiedMessage object. The plugin is now capable of reasoning about the usage and generates the correct semantics to be used by the SECRET module. Currently it does not take into account whether the behavior of the ReifiedMessage is conditional and how the control flow is organized. It is not that difficult to gather this type of information, but it is at the moment not possible to represent control flow data in the format used by SECRET.

Another plugin provides information about the resource usage and depends heavily on the flow graph capabilities of the metamodel. Test runs with source code containing a large number of control flow statements showed that the control flow path algorithm had some serious problems during the execution.
The algorithms took a long time when the source code contained multiple control flow paths (more than 50, so these were certainly not optimized methods to start with). It would be wise to invest in stronger and more efficient algorithms for control flow analysis. In addition, the flow graph capabilities can be extended by adding algorithms for data flow and method call flow. These frequently used types of flow analysis can then be accessed directly from the applications without implementing their own algorithm.

By combining the plugins, it was possible to integrate the analyzer with Compose .NET. Because of the dependencies on the Phoenix libraries and the .NET Framework 2.0, it was not feasible to package the analyzer directly with the default Compose installation and it is now an optional part. The information found by the plugin is added to the Compose repository, but the actual usage of this data is still limited. The SECRET module can now take the extra found semantics of the ReifiedMessage into account, but this has not been extensively tested. In theory, the added semantical information should partly take care of one of the problems discussed in the motivation, namely the need for program analysis data. There is now more information available about the usage of variables, the methods being called, and the ReifiedMessage behavior, and this can be used to reason about potential conflicts and problems. However, it must be noted that there is currently no module in Compose .NET which uses this data for further analysis. The information is available for future use.

Another problem identified in the motivation chapter was the problem of semantic join points: the need to apply aspects based on semantical information instead of naming conventions or structural properties. The Semantic Analyzer alone does not solve this, but can certainly be used to help with this problem. The primary reason this is not yet implemented is the compilation order used by Compose .NET.

1 http://www.db4o.com/
We only have access to a compiled .NET assembly with method bodies at the end of the process. At that time it is too late to perform the analysis and calculate the join points based on semantical properties. If we really want to use the Semantic Analyzer, we will have to change the compilation process into multiple phases, so that we have direct access to the assemblies, extract the semantics, add those semantics to the selector language, and then perform the actual weaving.

Although not yet implemented at this time, the Semantic Analyzer can certainly be used for the determination of semantical join points. Semantical join points provide a better alternative to syntactical selection criteria like naming conventions, an opinion shared by others. Gybels and Brichau [35] argue that we should write a crosscut as a specification in the form of patterns as close to the intent of the crosscut as possible, to make the crosscut robust against program evolution. Tourwé [78] proposes the need for more sophisticated and expressive crosscut languages. This can partially solve the fragile pointcut problem. Some possible solutions have been presented, for example SetPoint [3].

The problem with the semantic join points also applies to fine grained join points: applying aspects at statement level. The compilation structure does not allow us to perform fine grained join point weaving, but this can be added in the future. While the statements are converted to actions, we still save information to map each action back to the source code in the form of line numbers and file names. The weaver can use this information to add additional code around the original statements based on certain actions.

Automatically deriving semantical properties by analyzing the source code and the semantics of the source language certainly cannot solve all the problems. Some intent is not present in the code but in the underlying system, and getting a complete behavioral description of a function is not possible.
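To make the contrast with name-based crosscuts concrete, the following is a hypothetical sketch of a selector that picks join points by derived semantic actions rather than by naming conventions. The `MethodModel` class and the action names (`FieldAssignment`, etc.) are illustrative stand-ins, not the actual Compose* repository types.

```java
import java.util.*;
import java.util.stream.*;

/** Hypothetical semantic selector: instead of matching method names
 *  (e.g. a "set*" pattern), select the methods whose derived semantic
 *  actions contain a given action. */
public class SemanticSelector {
    public static class MethodModel {
        public final String name;
        public final List<String> actions;
        public MethodModel(String name, List<String> actions) {
            this.name = name;
            this.actions = actions;
        }
    }

    /** Returns the names of all methods whose action list contains the action. */
    public static List<String> selectByAction(List<MethodModel> methods, String action) {
        return methods.stream()
                .filter(m -> m.actions.contains(action))
                .map(m -> m.name)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<MethodModel> repo = List.of(
                new MethodModel("updateBalance", List.of("FieldAssignment", "MethodCall")),
                new MethodModel("getBalance", List.of("FieldRead", "Return")),
                new MethodModel("log", List.of("MethodCall")));
        // a "set*" naming convention would miss updateBalance;
        // selecting on the FieldAssignment action finds every state-changing method
        System.out.println(selectByAction(repo, "FieldAssignment"));
    }
}
```

Such a selection stays valid when a method is renamed, which is exactly the robustness against program evolution argued for above.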
A call to a method in another library does not give us any semantical information if we do not have the code of this function for further analysis. However, the Semantic Analyzer offers developers an extensive metamodel with basic behavioral actions, control flow information, and operand data to reason about the possible intended behavior of the code.

9.3 Future Work

As with almost any research and software product, there is always room for more work. This section offers some possible suggestions for future work.

9.3.1 Extractors

Currently there are four different Semantic Extractors, each with its own IL reader. Only the Phoenix extractor is working properly, although more testing is certainly advisable. There are also some improvements to be gained by using more Phoenix functionality, such as its control flow capabilities, instead of creating our own algorithms.

Using other IL readers, such as the updated PostSharp library or the reflection capabilities of .NET 2.0, it should be possible to create well-working extractors for .NET assemblies. More interesting is developing an extractor capable of converting other object-oriented languages to the semantic metamodel; for instance, analyzing Java source or byte code, or Borland Delphi (Object Pascal). The metamodel should be language independent, and the extractor can map the language specific parts to the corresponding elements in the model.

9.3.2 Model

The Semantic Metamodel is based on programming language constructions found in most languages. However, there is no formal specification for the metamodel available, and investing more time in this area can make the model more complete and concise. The model was largely designed with usability and flexibility in mind. The cooperation with the native query search mechanism is very extensive, and using the model is intuitive. This is achieved by the added functionality, metadata, and comments.
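The language independence aimed for above can be illustrated by how two different extractors would map their own instruction sets onto one shared action vocabulary. The opcode names below are real (CIL and JVM bytecode); the action names are assumptions for illustration, not the analyzer's actual vocabulary.

```java
import java.util.*;

/** Illustrative sketch of language-independent extraction: a CIL front end
 *  and a JVM bytecode front end both map onto the same semantic actions,
 *  so the metamodel never sees language-specific opcodes. */
public class ExtractorMapping {
    // CIL (MSIL) opcodes -> assumed semantic actions
    public static final Map<String, String> CIL = Map.of(
            "stfld", "FieldAssignment",
            "ldfld", "FieldRead",
            "call", "MethodCall",
            "ret", "Return");
    // JVM bytecode opcodes -> the same assumed semantic actions
    public static final Map<String, String> JVM = Map.of(
            "putfield", "FieldAssignment",
            "getfield", "FieldRead",
            "invokevirtual", "MethodCall",
            "ireturn", "Return");

    public static void main(String[] args) {
        // both front ends produce the identical semantic action
        System.out.println(CIL.get("stfld"));
        System.out.println(JVM.get("putfield"));
    }
}
```

A Java or Delphi extractor would then only need such a mapping table plus an IL reader; everything above the action level stays unchanged.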
Creating a more formal semantic model, using techniques described in [61] or [49], can lead to a more concise metamodel with better abilities to reason about it than the one currently available.

9.3.3 Graphs

Only control flow graphs are currently available in the model. The algorithms used for this can certainly be optimized, but other graph capabilities can also be added: for instance data flow, to make it possible to track the flow of an operand through the model, or call graphs, to represent calling relationships among operations in the model. The metamodel should contain enough information to create these kinds of graphs.
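The call graph suggestion above can be sketched directly on top of the action model: every MethodCall action becomes an edge, and a worklist walk yields the operations transitively reachable from a root. The method names and the shape of the call map are illustrative assumptions.

```java
import java.util.*;

/** Sketch of deriving a call graph from the metamodel: edges come from the
 *  (assumed) MethodCall actions of each operation, and reachability is a
 *  plain breadth-first worklist traversal. */
public class CallGraph {
    /** Returns all operations transitively reachable from root, root included. */
    public static Set<String> reachable(Map<String, List<String>> calls, String root) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> work = new ArrayDeque<>(List.of(root));
        while (!work.isEmpty()) {
            String m = work.pop();
            if (!seen.add(m)) continue;               // already visited
            work.addAll(calls.getOrDefault(m, List.of()));
        }
        return seen;
    }

    public static void main(String[] args) {
        Map<String, List<String>> calls = Map.of(
                "Main.Run", List.of("Account.Deposit", "Logger.Log"),
                "Account.Deposit", List.of("Logger.Log"));
        System.out.println(reachable(calls, "Main.Run"));
        // [Main.Run, Account.Deposit, Logger.Log]
    }
}
```

A data flow graph could be built the same way, with edges derived from operand reads and writes instead of calls.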
9.3.4 Applications

Only a small number of plugins make use of the Semantic Analyzer. It would be very interesting to develop more applications for the metamodel: from simple structural applications, like calculating metrics, to more advanced behavioral tools; for instance, the detection of design patterns, automatically generating pre- and postconditions based on the contents of functions, checking for security, performance, or other problems, and so on.

Another interesting application would be to test whether the model is really language independent by converting a program written in one language to a program written in another language using the metamodel; for example, using a .NET program as the input source and applying a plugin to generate a Java application with the same behavior.

9.3.5 Integration

The integration with Compose* .NET is currently limited. The analyzer is used to extract some basic elements from the source assemblies, but the resulting information is not further applied in any Compose* analysis task other than SECRET. Part of the reason is the compilation process of Compose*, which makes it difficult to get a complete assembly at the start of the analysis modules, but the two different programming languages used also play a role. It is not possible to use the rich metamodel and search functions to get the exact data needed; instead, we have to use different subsystems to get the data.

The analyzer is written in the .NET language C#; the Compose* modules are written in Java. Communication between the programs is handled by the use of XML files. It is not possible to directly call the Semantic Analyzer and work with the metamodel and native query search functions from within the Compose* process. One solution for this problem is to port the analyzer to Java. The difficult part is the creation of good .NET parsers.
The native query functionality can be ported to Java, but needs a recent Java version supporting generics and anonymous methods to work correctly. Another solution is to create more extensive plugins to gather information from the source code. It might be wise to store the resulting data in another format than XML, for instance in an object-oriented database like DB4O (http://www.db4o.com/) or ObjectStore (http://www.objectstore.com/datasheet/index.ssp), for performance reasons.
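The native query style mentioned above, relying on generics and anonymous classes, could in Java look roughly like the following sketch. The `Predicate`/`Operation` classes are illustrative stand-ins evaluated against a plain list, not a real object database binding.

```java
import java.util.*;
import java.util.stream.*;

/** Sketch of a ported native query: a typed predicate written as an
 *  anonymous class (the style needing Java generics), evaluated here
 *  by simple in-memory filtering. */
public class NativeQuerySketch {
    public abstract static class Predicate<T> {
        public abstract boolean match(T candidate);
    }

    public static class Operation {
        public final String name;
        public final int statementCount;
        public Operation(String name, int statementCount) {
            this.name = name;
            this.statementCount = statementCount;
        }
    }

    /** Runs the predicate over the extent and returns the matches. */
    public static <T> List<T> query(List<T> extent, Predicate<T> p) {
        return extent.stream().filter(p::match).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Operation> extent = List.of(new Operation("Deposit", 12),
                                         new Operation("GetBalance", 2));
        // select every operation with more than ten statements,
        // expressed as ordinary, statically typed Java code
        List<Operation> big = query(extent, new Predicate<Operation>() {
            public boolean match(Operation op) { return op.statementCount > 10; }
        });
        System.out.println(big.get(0).name);
    }
}
```

The appeal of this style is that the query is checked by the host language's compiler, instead of being a string in a separate query language.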
Bibliography

[1] Ada. Ada for the web, 1996. URL http://www.acm.org/sigada/wg/web_ada/.
[2] M. Akşit, editor. Proc. 2nd Int'l Conf. on Aspect-Oriented Software Development (AOSD-2003), Mar. 2003. ACM Press.
[3] R. Altman, A. Cyment, and N. Kicillof. On the need for setpoints. In K. Gybels, M. D'Hondt, I. Nagy, and R. Douence, editors, 2nd European Interactive Workshop on Aspects in Software (EIWAS'05), Sept. 2005. URL http://prog.vub.ac.be/events/eiwas2005/Papers/EIWAS2005-AlanCyment.pdf.
[4] T. Archer and A. Whitechapel. Inside C#, Second Edition. Microsoft Press, Redmond, WA, USA, 2002. ISBN 0735616485.
[5] I. Attali, D. Caromel, and M. Russo. A formal executable semantics for Java. Proceedings of Formal Underpinnings of Java, an OOPSLA'98 Workshop, 1998.
[6] M. Barnett, K. Leino, and W. Schulte. The Spec# programming system: An overview. Construction and Analysis of Safe, Secure, and Interoperable Smart Devices: International Workshop, CASSIS, pages 49–69, 2004.
[7] L. Bergmans. Composing Concurrent Objects. PhD thesis, University of Twente, 1994. URL http://trese.cs.utwente.nl/publications/paperinfo/bergmans.phd.pi.top.htm.
[8] L. Bergmans and M. Akşit. Composing crosscutting concerns using composition filters. Comm. ACM, 44(10):51–57, Oct. 2001.
[9] A. Beugnard, J. Jezequel, N. Plouzeau, and D. Watkins. Making components contract aware. Computer, 32(7):38–45, 1999. ISSN 0018-9162. doi: 10.1109/2.774917.
[10] S. R. Boschman. Performing transformations on .NET intermediate language code. Master's thesis, University of Twente, The Netherlands, Aug. 2006.
[11] R. Bosman. Automated reasoning about Composition Filters. Master's thesis, University of Twente, The Netherlands, Nov. 2004.
[12] B. Cabral, P. Marques, and L. Silva. RAIL: code instrumentation for .NET. Proceedings of the 2005 ACM Symposium on Applied Computing, pages 1282–1287, 2005.
[13] W. Cazzola, J. Jezequel, and A. Rashid. Semantic join point models: Motivations, notions and requirements. SPLAT 2006 (Software Engineering Properties of Languages and Aspect Technologies), Mar. 2006.
[14] E. J. Chikofsky and J. H. Cross II. Reverse engineering and design recovery: a taxonomy. IEEE Software, 7(1):13–17, 1990.
[15] Columbia University. The Columbia Encyclopedia, Sixth Edition. Columbia University Press, 2001.
[16] O. Conradi. Fine-grained join point model in Compose*. Master's thesis, University of Twente, The Netherlands, 2006. To be released.
[17] W. Cook and S. Rai. Safe query objects: statically typed objects as remotely executable queries. Proceedings of the 27th International Conference on Software Engineering, pages 97–106, 2005.
[18] W. Cook and C. Rosenberger. Native queries for persistent objects: A design white paper. Dr. Dobb's Journal (DDJ), Feb. 2006. URL http://www.cs.utexas.edu/~wcook/papers/NativeQueries/NativeQueries8-23-05.pdf.
[19] C. De Roover. Incorporating Dynamic Analysis and Approximate Reasoning in Declarative Meta-Programming to Support Software Re-engineering. PhD thesis, Vrije Universiteit Brussel, Faculteit Wetenschappen, Departement Informatica en Toegepaste Informatica, 2004.
[20] F. de Saussure. Course in General Linguistics (trans. Wade Baskin). Fontana/Collins, 1916.
[21] K. De Volder. Type-Oriented Logic Meta Programming. PhD thesis, Vrije Universiteit Brussel, 1998.
[22] R. DeLine and M. Fähndrich. The Fugue protocol checker: Is your software baroque? Unpublished manuscript, 2003.
[23] D. Doornenbal. Analysis and redesign of the Compose* language. Master's thesis, University of Twente, The Netherlands, 2006. To be released.
[24] P. E. A. Dürr. Detecting semantic conflicts between aspects (in Compose*). Master's thesis, University of Twente, The Netherlands, Apr. 2004.
[25] ECMA-335. Standard ECMA-335, 2006. URL http://www.ecma-international.org/publications/standards/Ecma-335.htm.
[26] T. Elrad, R. E. Filman, and A. Bader. Aspect-oriented programming. Comm. ACM, 44(10):29–32, Oct. 2001.
[27] M. D. Ernst. Static and dynamic analysis: Synergy and duality. In WODA 2003: ICSE Workshop on Dynamic Analysis, pages 24–27, Portland, OR, May 9, 2003.
[28] J. Fabry and T. Mens. Language-independent detection of object-oriented design patterns. Computer Languages, Systems, and Structures, 30:21–33, 2004.
[29] N. Fenton and S. Pfleeger. Software Metrics: A Rigorous and Practical Approach. PWS Publishing Co., Boston, MA, USA, 1997.
[30] R. B. Findler, M. Latendresse, and M. Felleisen. Behavioral contracts and behavioral subtyping. In Proceedings of the ACM Conference on Foundations of Software Engineering, 2001. URL citeseer.ist.psu.edu/article/findler01behavioral.html.
[31] E. Gamma, R. Helm, R. Johnson, and J. Vlissides. Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley, 1995.
[32] H. Giese, J. Graf, and G. Wirtz. Closing the gap between object-oriented modeling of structure and behavior. UML, pages 534–549, 1999.
[33] M. Glandrup. Extending C++ using the concepts of composition filters. Master's thesis, University of Twente, 1995. URL http://trese.cs.utwente.nl/publications/paperinfo/glandrup.thesis.pi.top.htm.
[34] J. D. Gradecki and N. Lesiecki. Mastering AspectJ: Aspect-Oriented Programming in Java. John Wiley and Sons, 2003. ISBN 0471431044.
[35] K. Gybels and J. Brichau. Arranging language features for pattern-based crosscuts. In Akşit [2], pages 60–69.
[36] B. Harbulot and J. R. Gurd. Using AspectJ to separate concerns in parallel scientific Java code. In K. Lieberherr, editor, Proc. 3rd Int'l Conf. on Aspect-Oriented Software Development (AOSD-2004), pages 122–131. ACM Press, Mar. 2004. doi: 10.1145/976270.976286.
[37] W. Havinga. Designating join points in Compose* - a predicate-based superimposition language for Compose*. Master's thesis, University of Twente, The Netherlands, May 2005.
[38] A. Heberle, W. Zimmermann, and G. Goos. Specification and verification of compiler frontend tasks: Semantic analysis. 04/96 Verifix Report UKA, 7, 1996.
[39] F. J. B. Holljen. Compilation and type-safety in the Compose* .NET environment. Master's thesis, University of Twente, The Netherlands, May 2004.
[40] R. Howard. Provider model design pattern and specification, part 1. Technical report, Microsoft Corporation, 2004. URL http://msdn.microsoft.com/library/en-us/dnaspnet/html/asp02182004.asp.
[41] R. L. R. Huisman. Debugging Composition Filters. Master's thesis, University of Twente, The Netherlands, 2006. To be released.
[42] S. H. G. Huttenhuis. Patterns within aspect orientation. Master's thesis, University of Twente, The Netherlands, 2006. To be released.
[43] ECMA International. Common Language Infrastructure (CLI). Standard ECMA-335, ECMA International, 2002. URL http://www.ecma-international.org/publications/files/ecma-st/Ecma-335.pdf.
[44] J. Richter. Type fundamentals. Technical report, Microsoft Corporation, 2000. URL http://msdn.microsoft.com/msdnmag/issues/1200/dotnet/toc.asp.
[45] Jython. Jython homepage. URL http://www.jython.org/.
[46] G. Kiczales, E. Hilsdale, J. Hugunin, M. Kersten, J. Palm, and W. G. Griswold. An overview of AspectJ. In J. L. Knudsen, editor, Proc. ECOOP 2001, LNCS 2072, pages 327–353, Berlin, June 2001. Springer-Verlag.
[47] P. Koopmans. Sina user's guide and reference manual. Technical report, Dept. of Computer Science, University of Twente, 1995. URL http://trese.cs.utwente.nl/publications/paperinfo/sinaUserguide.pi.top.htm.
[48] C. Koppen and M. Störzer. PCDiff: Attacking the fragile pointcut problem. In K. Gybels, S. Hanenberg, S. Herrmann, and J. Wloka, editors, European Interactive Workshop on Aspects in Software (EIWAS), Sept. 2004. URL http://www.topprax.de/EIWAS04/EIWAS-Papers.zip.
[49] S. Krishnamurthi. Programming Languages: Application and Interpretation. January 2006. URL http://www.cs.brown.edu/~sk/Publications/Books/ProgLangs/.
[50] J. Lefor. Phoenix as a tool in research and instrumentation. 2004. URL http://research.microsoft.com/phoenix/Johns%20Phoenix%20Backgrounder%20-%20External.doc.
[51] S. Lidin. Inside Microsoft .NET IL Assembler. Microsoft Press, Redmond, WA, USA, 2002. ISBN 0-7356-1547-0.
[52] C. Lopes, L. Bergmans, M. D'Hondt, and P. Tarr, editors. Workshop on Aspects and Dimensions of Concerns (ECOOP 2000), June 2000. URL http://trese.cs.utwente.nl/Workshops/adc2000/papers/adc2000_all.zip.
[53] C. Marti. Automatic contract extraction: Developing a CIL parser. Master's thesis, ETH Zürich, Sept. 2003.
[54] Microsoft Corporation. Overview of the .NET Framework. Technical report, Microsoft Corporation, 2003. URL http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpguide/html/cpovrintroductiontonetframeworksdk.asp.
[55] Microsoft Corporation. What is the Common Language Specification. Technical report, Microsoft Corporation, 2003. URL http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpguide/html/cpconwhatiscommonlanguagespecification.asp.
[56] Microsoft Corporation. .NET Compact Framework - technology overview. Technical report, Microsoft Corporation, 2003. URL http://msdn.microsoft.com/mobility/prodtechinfo/devtools/netcf/overview/default.aspx.
[57] Microsoft Corporation. What is .NET? Technical report, Microsoft Corporation, 2005. URL http://www.microsoft.com/net/basics.mspx.
[58] Microsoft Corporation. Design guidelines for class library developers. Technical report, Microsoft Corporation, 2006. URL http://msdn.microsoft.com/library/en-us/cpgenref/html/cpconnetframeworkdesignguidelines.asp.
[59] Microsoft Corporation. Phoenix Documentation, 2006.
[60] I. Nagy. On the Design of Aspect-Oriented Composition Models for Software Evolution. PhD thesis, University of Twente, The Netherlands, June 2006.
[61] H. R. Nielson and F. Nielson. Semantics with Applications: A Formal Introduction. John Wiley & Sons, Inc., New York, NY, USA, 1992. ISBN 0-471-92980-8.
[62] H. Ossher and P. Tarr. Multi-dimensional separation of concerns and the Hyperspace approach. In M. Akşit, editor, Software Architectures and Component Technology. Kluwer Academic Publishers, 2001. ISBN 0-7923-7576-9.
[63] A. Popovici, T. Gross, and G. Alonso. Dynamic weaving for aspect-oriented programming. In G. Kiczales, editor, Proc. 1st Int'l Conf. on Aspect-Oriented Software Development (AOSD-2002), pages 141–147. ACM Press, Apr. 2002.
[64] A. Popovici, G. Alonso, and T. Gross. Just in time aspects. In Akşit [2], pages 100–109.
[65] J. Prosise. Programming Microsoft .NET. Microsoft Press, Redmond, WA, USA, 2002. ISBN 0-7356-1376-1.
[66] H. Rajan and K. J. Sullivan. Generalizing AOP for aspect-oriented testing. In Proceedings of the Fourth International Conference on Aspect-Oriented Software Development (AOSD 2005), 2005.
[67] T. Richner. Recovering Behavioral Design Views: a Query-Based Approach. PhD thesis, University of Berne, May 2002.
[68] T. Richner and S. Ducasse. Recovering high-level views of object-oriented applications from static and dynamic information. Proceedings of ICSM '99, pages 13–22, 1999.
[69] C. D. Roover, K. Gybels, and T. D'Hondt. Towards abstract interpretation for recovering design information. Electr. Notes Theor. Comput. Sci., 131:15–25, 2005.
[70] P. Salinas. Adding systemic crosscutting and super-imposition to Composition Filters. Master's thesis, Vrije Universiteit Brussel, Aug. 2001.
[71] D. R. Spenkelink. Compose* incremental. Master's thesis, University of Twente, The Netherlands, 2006. To be released.
[72] T. Staijen. Towards safe advice: Semantic analysis of advice types in Compose*. Master's thesis, University of Twente, Apr. 2005.
[73] D. Stutz. The Microsoft shared source CLI implementation. 2002.
[74] P. Tarr, H. Ossher, S. M. Sutton, Jr., and W. Harrison. N degrees of separation: Multi-dimensional separation of concerns. In R. E. Filman, T. Elrad, S. Clarke, and M. Akşit, editors, Aspect-Oriented Software Development, pages 37–61. Addison-Wesley, Boston, 2005. ISBN 0-321-21976-7.
[75] J. W. te Winkel. Bringing Composition Filters to C. Master's thesis, University of Twente, The Netherlands, 2006. To be released.
[76] F. Tip. A survey of program slicing techniques. Journal of Programming Languages, 3:121–189, 1995. URL citeseer.ist.psu.edu/tip95survey.html.
[77] T. Rho, G. Kniesel, and M. Appeltauer. Fine-grained generic aspects. 2006. URL http://www.cs.iastate.edu/~leavens/FOAL/papers-2006/RhoKnieselAppeltauer.pdf.
[78] T. Tourwé, J. Brichau, and K. Gybels. On the existence of the AOSD-evolution paradox. In L. Bergmans, J. Brichau, P. Tarr, and E. Ernst, editors, SPLAT: Software Engineering Properties of Languages for Aspect Technologies, Mar. 2003. URL http://www.daimi.au.dk/~eernst/splat03/papers/Tom_Tourwe.pdf.
[79] M. D. W. van Oudheusden. Automatic Derivation of Semantic Properties in .NET. Master's thesis, University of Twente, The Netherlands, Aug. 2006.
[80] C. Vinkes. Superimposition in the Composition Filters model. Master's thesis, University of Twente, The Netherlands, Oct. 2004.
[81] N. Walkinshaw, M. Roper, and M. Wood. Understanding object-oriented source code from the behavioural perspective. Program Comprehension, 2005 (IWPC 2005), Proceedings of the 13th International Workshop on, pages 215–224, 2005.
[82] D. Watkins. Handling language interoperability with the Microsoft .NET Framework. Technical report, Monash University, Oct. 2000. URL http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dndotnet/html/interopdotnet.asp.
[83] D. A. Watt. Programming Language Concepts and Paradigms. Prentice Hall, 1990.
[84] M. D. Weiser. Program slices: formal, psychological, and practical investigations of an automatic program abstraction method. PhD thesis, 1979.
[85] J. C. Wichman. The development of a preprocessor to facilitate composition filters in the Java language. Master's thesis, University of Twente, 1999. URL http://trese.cs.utwente.nl/publications/paperinfo/wichman.thesis.pi.top.htm.
[86] R. Wuyts. A Logic Meta-Programming Approach to Support the Co-Evolution of Object-Oriented Design and Implementation. PhD thesis, 2001. URL citeseer.ist.psu.edu/wuyts01logic.html.
  • 164. APPENDIX A CIL Instruction Set In the following table, all the available operational codes of the CIL instruction set are listed with their description. OpCode Description nop Do nothing break Inform a debugger that a breakpoint has been reached. ldarg.0 Load argument 0 onto stack ldarg.1 Load argument 1 onto stack ldarg.2 Load argument 2 onto stack ldarg.3 Load argument 3 onto stack ldloc.0 Load local variable 0 onto stack. ldloc.1 Load local variable 1 onto stack. ldloc.2 Load local variable 2 onto stack. ldloc.3 Load local variable 3 onto stack. stloc.0 Pop value from stack into local variable 0. stloc.1 Pop value from stack into local variable 1. stloc.2 Pop value from stack into local variable 2. stloc.3 Pop value from stack into local variable 3. ldarg.s Load argument numbered num onto stack, short form. ldarga.s fetch the address of argument argNum, short form starg.s Store a value to the argument numbered num, short form ldloc.s Load local variable of index indx onto stack, short form. ldloca.s Load address of local variable with index indx, short form stloc.s Pop value from stack into local variable indx, short form. ldnull Push null reference on the stack ldc.i4.m1 Push -1 onto the stack as int32. ldc.i4.0 Push 0 onto the stack as int32. ldc.i4.1 Push 1 onto the stack as int32. ldc.i4.2 Push 2 onto the stack as int32. 142
  • 165. OpCode Description ldc.i4.3 Push 3 onto the stack as int32. ldc.i4.4 Push 4 onto the stack as int32. ldc.i4.5 Push 5 onto the stack as int32. ldc.i4.6 Push 6 onto the stack as int32. ldc.i4.7 Push 7 onto the stack as int32. ldc.i4.8 Push 8 onto the stack as int32. ldc.i4.s Push num onto the stack as int32, short form. ldc.i4 Push num of type int32 onto the stack as int32. ldc.i8 Push num of type int64 onto the stack as int64. ldc.r4 Push num of type float32 onto the stack as F. ldc.r8 Push num of type float64 onto the stack as F. dup Duplicate value on the top of the stack pop Pop a value from the stack jmp Exit current method and jump to specified method call Call method calli Call method indicated on the stack with arguments de- scribed by callsitedescr. ret Return from method, possibly returning a value br.s Branch to target, short form brfalse.s Branch to target if value is zero (false), short form brtrue.s Branch to target if value is non-zero (true), short form beq.s Branch to target if equal, short form bge.s Branch to target if greater than or equal to, short form bgt.s Branch to target if greater than, short form ble.s Branch to target if less than or equal to, short form blt.s Branch to target if less than, short form bne.un.s Branch to target if unequal or unordered, short form bge.un.s Branch to target if greater than or equal to (unsigned or unordered), short form bgt.un.s Branch to target if greater than (unsigned or unordered), short form ble.un.s Branch to target if less than or equal to (unsigned or un- ordered), short form blt.un.s Branch to target if less than (unsigned or unordered), short form br Branch to target brfalse Branch to target if value is zero (false) brtrue Branch to target if value is non-zero (true) beq Branch to target if equal bge Branch to target if greater than or equal to bgt Branch to target if greater than ble Branch to target if less than or equal to blt Branch to target if less than bne.un Branch to target if 
unequal or unordered bge.un Branch to target if greater than or equal to (unsigned or unordered) bgt.un Branch to target if greater than (unsigned or unordered) M.D.W. van Oudheusden 143
  • 166. OpCode Description ble.un Branch to target if less than or equal to (unsigned or un- ordered) blt.un Branch to target if less than (unsigned or unordered) switch jump to one of n values ldind.i1 Indirect load value of type int8 as int32 on the stack. ldind.u1 Indirect load value of type unsigned int8 as int32 on the stack. ldind.i2 Indirect load value of type int16 as int32 on the stack. ldind.u2 Indirect load value of type unsigned int16 as int32 on the stack. ldind.i4 Indirect load value of type int32 as int32 on the stack. ldind.u4 Indirect load value of type unsigned int32 as int32 on the stack. ldind.i8 Indirect load value of type int64 as int64 on the stack. ldind.i Indirect load value of type native int as native int on the stack ldind.r4 Indirect load value of type float32 as F on the stack. ldind.r8 Indirect load value of type float64 as F on the stack. ldind.ref Indirect load value of type object ref as O on the stack. stind.ref Store value of type object ref (type O) into memory at ad- dress stind.i1 Store value of type int8 into memory at address stind.i2 Store value of type int16 into memory at address stind.i4 Store value of type int32 into memory at address stind.i8 Store value of type int64 into memory at address stind.r4 Store value of type float32 into memory at address stind.r8 Store value of type float64 into memory at address add Add two values, returning a new value sub Subtract value2 from value1, returning a new value mul Multiply values div Divide two values to return a quotient or floating-point result div.un Divide two values, unsigned, returning a quotient rem Remainder of dividing value1 by value2 rem.un Remainder of unsigned dividing value1 by value2 and Bitwise AND of two integral values, returns an integral value or Bitwise OR of two integer values, returns an integer. 
xor Bitwise XOR of integer values, returns an integer shl Shift an integer left (shifting in zeros), return an integer shr Shift an integer right (shift in sign), return an integer shr.un Shift an integer right (shift in zero), return an integer neg Negate value not Bitwise complement conv.i1 Convert to int8, pushing int32 on stack conv.i2 Convert to int16, pushing int32 on stack conv.i4 Convert to int32, pushing int32 on stack 144 Automatic Derivation of Semantic Properties in .NET
  • 167. OpCode Description conv.i8 Convert to int64, pushing int64 on stack conv.r4 Convert to float32, pushing F on stack conv.r8 Convert to float64, pushing F on stack conv.u4 Convert to unsigned int32, pushing int32 on stack conv.u8 Convert to unsigned int64, pushing int64 on stack callvirt Call a method associated with obj cpobj Copy a value type from srcValObj to destValObj ldobj Copy instance of value type classTok to the stack. ldstr push a string object for the literal string newobj Allocate an uninitialized object or value type and call ctor castclass Cast obj to class isinst test if obj is an instance of class, returning null or an in- stance of that conv.r.un Convert unsigned integer to floating-point, pushing F on stack unbox Extract the value type data from obj, its boxed representa- tion throw Throw an exception ldfld Push the value of field of object, or value type, obj, onto the stack ldflda Push the address of field of object obj on the stack stfld Replace the value of field of the object obj with val ldsfld Push the value of field on the stack ldsflda Push the address of the static field, field, on the stack stsfld Replace the value of field with val stobj Store a value of type classTok from the stack into memory conv.ovf.i1.un Convert unsigned to an int8 (on the stack as int32) and throw an exception on overflow conv.ovf.i2.un Convert unsigned to an int16 (on the stack as int32) and throw an exception on overflow conv.ovf.i4.un Convert unsigned to an int32 (on the stack as int32) and throw an exception on overflow conv.ovf.i8.un Convert unsigned to an int64 (on the stack as int64) and throw an exception on overflow conv.ovf.u1.un Convert unsigned to an unsigned int8 (on the stack as int32) and throw an exception on overflow conv.ovf.u2.un Convert unsigned to an unsigned int16 (on the stack as int32) and throw an exception on overflow conv.ovf.u4.un Convert unsigned to an unsigned int32 (on the stack as int32) and throw an exception on overflow 
conv.ovf.u8.un Convert unsigned to an unsigned int64 (on the stack as int64) and throw an exception on overflow conv.ovf.i.un Convert unsigned to a native int (on the stack as native int) and throw an exception on overflow conv.ovf.u.un Convert unsigned to a native unsigned int (on the stack as native int) and throw an exception on overflow box Convert valueType to a true object reference M.D.W. van Oudheusden 145
OpCode           Description
newarr           Create a new array with elements of type etype
ldlen            Push the length (of type native unsigned int) of array on the stack
ldelema          Load the address of element at index onto the top of the stack
ldelem.i1        Load the element with type int8 at index onto the top of the stack as an int32
ldelem.u1        Load the element with type unsigned int8 at index onto the top of the stack as an int32
ldelem.i2        Load the element with type int16 at index onto the top of the stack as an int32
ldelem.u2        Load the element with type unsigned int16 at index onto the top of the stack as an int32
ldelem.i4        Load the element with type int32 at index onto the top of the stack as an int32
ldelem.u4        Load the element with type unsigned int32 at index onto the top of the stack as an int32
ldelem.i8        Load the element with type int64 at index onto the top of the stack as an int64
ldelem.i         Load the element with type native int at index onto the top of the stack as a native int
ldelem.r4        Load the element with type float32 at index onto the top of the stack as an F
ldelem.r8        Load the element with type float64 at index onto the top of the stack as an F
ldelem.ref       Load the element of type object at index onto the top of the stack as an O
stelem.i         Replace array element at index with the i value on the stack
stelem.i1        Replace array element at index with the int8 value on the stack
stelem.i2        Replace array element at index with the int16 value on the stack
stelem.i4        Replace array element at index with the int32 value on the stack
stelem.i8        Replace array element at index with the int64 value on the stack
stelem.r4        Replace array element at index with the float32 value on the stack
stelem.r8        Replace array element at index with the float64 value on the stack
stelem.ref       Replace array element at index with the ref value on the stack
conv.ovf.i1      Convert to an int8 (on the stack as int32) and throw an exception on overflow
conv.ovf.u1      Convert to an unsigned int8 (on the stack as int32) and throw an exception on overflow
OpCode           Description
conv.ovf.i2      Convert to an int16 (on the stack as int32) and throw an exception on overflow
conv.ovf.u2      Convert to an unsigned int16 (on the stack as int32) and throw an exception on overflow
conv.ovf.i4      Convert to an int32 (on the stack as int32) and throw an exception on overflow
conv.ovf.u4      Convert to an unsigned int32 (on the stack as int32) and throw an exception on overflow
conv.ovf.i8      Convert to an int64 (on the stack as int64) and throw an exception on overflow
conv.ovf.u8      Convert to an unsigned int64 (on the stack as int64) and throw an exception on overflow
refanyval        Push the address stored in a typed reference
ckfinite         Throw ArithmeticException if value is not a finite number
mkrefany         Push a typed reference to ptr of type class onto the stack
ldtoken          Convert metadata token to its runtime representation
conv.u2          Convert to unsigned int16, pushing int32 on stack
conv.u1          Convert to unsigned int8, pushing int32 on stack
conv.i           Convert to native int, pushing native int on stack
conv.ovf.i       Convert to a native int (on the stack as native int) and throw an exception on overflow
conv.ovf.u       Convert to a native unsigned int (on the stack as native int) and throw an exception on overflow
add.ovf          Add signed integer values with overflow check
add.ovf.un       Add unsigned integer values with overflow check
mul.ovf          Multiply signed integer values; signed result must fit in same size
mul.ovf.un       Multiply unsigned integer values; unsigned result must fit in same size
sub.ovf          Subtract native int from a native int; signed result must fit in same size
sub.ovf.un       Subtract native unsigned int from a native unsigned int; unsigned result must fit in same size
endfinally       End finally clause of an exception block
leave            Exit a protected region of code
leave.s          Exit a protected region of code, short form
stind.i          Store value of type native int into memory at address
conv.u           Convert to native unsigned int, pushing native int on stack
arglist          Return argument list handle for the current method
ceq              Push 1 (of type int32) if value1 equals value2, else 0
cgt              Push 1 (of type int32) if value1 > value2, else 0
cgt.un           Push 1 (of type int32) if value1 > value2, unsigned or unordered, else 0
clt              Push 1 (of type int32) if value1 < value2, else 0
clt.un           Push 1 (of type int32) if value1 < value2, unsigned or unordered, else 0
ldftn            Push a pointer to a method
OpCode           Description
ldvirtftn        Push address of virtual method mthd on the stack
ldarg            Load argument numbered num onto stack
ldarga           Fetch the address of argument argNum
starg            Store a value to the argument numbered num
ldloc            Load local variable of index indx onto stack
ldloca           Load address of local variable with index indx
stloc            Pop value from stack into local variable indx
localloc         Allocate space from the local memory pool
endfilter        End filter clause of SEH exception handling
unaligned.       Subsequent pointer instruction may be unaligned
volatile.        Subsequent pointer reference is volatile
tail.            Subsequent call terminates current method
initobj          Initialize a value type
cpblk            Copy data from memory to memory
initblk          Set a block of memory to a given byte
rethrow          Rethrow the current exception
sizeof           Push the size, in bytes, of a value type as an unsigned int32
refanytype       Push the type token stored in a typed reference

Table A.1: CIL instruction set
APPENDIX B
Evaluation Stack Types

The execution engine of the common language runtime implements a coarse type system for the evaluation stack. Only the types listed in Table B.1 can be present on the stack.

Type         Description
int32        Signed 4-byte integer
native int   Native integer, size dependent on the underlying platform
int64        Signed 8-byte integer
F            80-bit floating point number (covering both 32-bit and 64-bit values)
&            Managed or unmanaged pointer
O            Object reference

Table B.1: Evaluation stack types
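To make the interaction between Table A.1 and Table B.1 concrete, the following is a minimal, hypothetical sketch in Python (not Compose* code): a toy evaluation-stack interpreter whose method names mirror two CIL opcodes. The type tags are the coarse stack types of Table B.1; real CIL verification is far stricter than this.

```python
class EvalStack:
    """Toy model of the CLR evaluation stack; entries are (type_tag, value)."""

    def __init__(self):
        self._stack = []

    def push(self, tag, value):
        self._stack.append((tag, value))

    def pop(self):
        return self._stack.pop()

    def ceq(self):
        # ceq: push 1 (of type int32) if value1 equals value2, else 0
        (_, v2), (_, v1) = self.pop(), self.pop()
        self.push("int32", 1 if v1 == v2 else 0)

    def conv_i8(self):
        # conv.i8: widen the top of the stack to a signed 8-byte integer
        _, v = self.pop()
        self.push("int64", int(v))

s = EvalStack()
s.push("int32", 42)
s.push("int32", 42)
s.ceq()          # leaves ("int32", 1) on the stack
s.conv_i8()      # the same value, now tagged int64
print(s.pop())   # ('int64', 1)
```

Note how ceq always yields an int32 regardless of its operand types, matching the coarse typing of the real evaluation stack.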
APPENDIX C
Semantic Extractor Configuration File

The Semantic Extractor uses the provider design pattern to specify which provider handles the calls to the base class. The different providers are defined and stored in the app.config file; its contents are shown in Listing C.1.

<?xml version="1.0" encoding="utf-8" ?>
<configuration>
  <configSections>
    <section
      name="SemanticExtractors"
      type="SemanticLibrary.SemanticExtractorSection, SemanticLibrary"
      allowLocation="true" allowDefinition="Everywhere" />
  </configSections>
  <!-- Semantic Extractor Provider Settings -->
  <SemanticExtractors defaultProvider="phoenix">
    <providers>
      <clear/>
      <add name="cecil" description="Mono Cecil 0.2"
           type="SemanticExtractorCecil.SemanticExtractorCecil, SemanticExtractorCecil" />
      <add name="phoenix" description="Microsoft Phoenix"
           type="SemanticExtractorPhoenix.SemTexPhoenix, SemanticExtractorPhoenix" />
      <add name="rail" description="Runtime Assembly Instrumentation Library"
           type="SemanticExtractorRail.SemTexRail, SemanticExtractorRail" />
      <add name="postsharp" description="PostSharp reads .NET binary modules,
           represents them as a Code Object Model, lets plug-ins analyze and
           transform this model and writes it back to the binary form."
           type="SemanticExtractorPostSharp.SemTexPostSharp, SemanticExtractorPostSharp" />
    </providers>
  </SemanticExtractors>
</configuration>

Listing C.1: Contents of the app.config file
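As an illustration of the structure of Listing C.1, the sketch below reads an abbreviated copy of the configuration with Python's standard xml.etree module and relates the defaultProvider attribute to the registered <add> entries. This is not how the .NET provider framework itself loads the section; it only shows which data the file carries.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of Listing C.1 (configSections and two providers omitted);
# bytes are used because ElementTree rejects str input with an encoding declaration.
CONFIG = b"""<?xml version="1.0" encoding="utf-8" ?>
<configuration>
  <SemanticExtractors defaultProvider="phoenix">
    <providers>
      <clear/>
      <add name="cecil" description="Mono Cecil 0.2"
           type="SemanticExtractorCecil.SemanticExtractorCecil, SemanticExtractorCecil" />
      <add name="phoenix" description="Microsoft Phoenix"
           type="SemanticExtractorPhoenix.SemTexPhoenix, SemanticExtractorPhoenix" />
    </providers>
  </SemanticExtractors>
</configuration>"""

root = ET.fromstring(CONFIG)
section = root.find("SemanticExtractors")
default = section.get("defaultProvider")
# Map provider names to the assembly-qualified type that implements them
providers = {add.get("name"): add.get("type")
             for add in section.findall("providers/add")}

print(default)            # phoenix
print(sorted(providers))  # ['cecil', 'phoenix']
```

The defaultProvider value must match the name of one of the <add> entries; here the Microsoft Phoenix extractor is selected unless the caller asks for another provider by name.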
APPENDIX D
Semantic Actions

Table D.1 lists all the available semantic action kinds and their properties.

Assign (source, destination): Assignment of a value. The value of source is assigned to destination.
Negate (source, destination): Negates a value.
Add (source1, source2, destination): Add the value of source1 to source2 and store the result in destination.
Not (source1, destination): Bitwise complement of the source.
Multiply (source1, source2, destination): The values of source1 and source2 will be multiplied and placed in the destination.
Divide (source1, source2, destination): The value of source1 will be divided by the value of source2 and the result will be placed in the destination.
Remainder (source1, source2, destination): Divides source1 by source2 and places the remainder in the destination.
Subtract (source1, source2, destination): The value of source1 will be subtracted from the value of source2 and the result is placed in the destination.
And (source1, source2, destination): An and operation is performed on the values of source1 and source2 and the result is placed in the destination.
Or (source1, source2, destination): An or operation is performed on the values of source1 and source2 and the result is placed in the destination.
Xor (source1, source2, destination): A bitwise Xor operation is performed on the values of source1 and source2 and the result is placed in the destination.
Jump (labelname): A jump to the LabelName is performed.
Branch (conditionAction, truelabel, falselabel): A branch action is performed; based on the condition it will jump either to the TrueLabelName or the FalseLabelName. The condition can be found in the ConditionAction property.
Compare (comparisonType, source1, source2, destination): A comparison is performed on the source1 and source2 values using the ComparisonType. The resulting boolean value is placed in the destination.
Create (destination, operationname): A new instance or a new array is created. The new object or array is placed in the destination, in which you can find the type and name. If the create operation calls a constructor, then you can find this constructor in the operationname property.
Convert (source, destination): A conversion is performed from the source to the destination. The new type can be found in the destination properties, while the old type information is still present in the source.
Call (operationName, destination): A call to another operation is made. The name of this operation can be found in the OperationName property. Its return value, when available, will be placed in the destination.
Return (source): The control flow is returned to the calling operation. This means the end of the current operation. The return value can be placed in the source.
RaiseException (source): Raises an exception. The exception type is placed in source.
Test (source1, source2, destination): Test if source1 is equal to source2 and store the result in the destination.
Switch (source1, switchLabels): A switch construction where source1 defines a value indicating the label to jump to. The labels can be found in the SwitchLabels property.

Table D.1: Available semantic action kinds
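The action kinds of Table D.1 can be pictured as small records that a later analysis phase walks over. The following is a hypothetical Python sketch, not the Semantic Analyzer's actual metamodel: the class names and properties mirror two entries of the table (Assign and Add), and the tiny evaluator is purely illustrative.

```python
from dataclasses import dataclass

@dataclass
class Assign:
    # Assign: the value of source is assigned to destination
    source: str
    destination: str

@dataclass
class Add:
    # Add: source1 + source2, result stored in destination
    source1: str
    source2: str
    destination: str

def evaluate(actions, env):
    """Apply a list of semantic actions to a variable environment (a dict)."""
    for action in actions:
        if isinstance(action, Assign):
            env[action.destination] = env[action.source]
        elif isinstance(action, Add):
            env[action.destination] = env[action.source1] + env[action.source2]
    return env

# tmp = x; z = tmp + y
env = evaluate([Assign("x", "tmp"), Add("tmp", "y", "z")],
               {"x": 2, "y": 3})
print(env["z"])   # 5
```

Because each action names its sources and destination explicitly, analyses such as the resource-usage extraction of Appendix F reduce to scanning these records rather than decoding stack-based CIL.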
APPENDIX E
Semantic Types

Table E.1 lists all the available semantic common type kinds.

Type name          Description
Unknown            An unknown type. If a type cannot be determined or is subject to change, set the semantic type to Unknown.
Char               A single character.
String             Strings are used to hold text.
Byte               Represents an 8-bit signed integer.
Short              Represents a signed 16-bit integer.
Integer            Represents a signed 32-bit integer.
Long               Represents a signed 64-bit integer.
Float              The float keyword denotes a simple type that stores 32-bit floating-point values.
Double             The double keyword denotes a simple type that stores 64-bit floating-point values.
Boolean            Represents a boolean value.
Object             Represents a general object.
Unsigned Short     Represents a 16-bit unsigned integer.
Unsigned Integer   Represents a 32-bit unsigned integer.
Unsigned Long      Represents a 64-bit unsigned integer.
Unsigned Byte      Represents an 8-bit unsigned integer.
DateTime           A date and/or time field.

Table E.1: Available semantic common types
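A natural use of Table E.1 is mapping concrete CLR type names onto the language-neutral semantic types. The sketch below is hypothetical: the dictionary and the to_semantic_type helper are not part of the Semantic Analyzer, but the keys are standard .NET type names and the values are the type names of Table E.1 (note that CLR System.Byte is unsigned, so it maps to Unsigned Byte, while signed System.SByte maps to Byte).

```python
# Hypothetical lookup from CLR type names to the semantic common types
SEMANTIC_TYPES = {
    "System.Char": "Char",
    "System.String": "String",
    "System.SByte": "Byte",
    "System.Int16": "Short",
    "System.Int32": "Integer",
    "System.Int64": "Long",
    "System.Single": "Float",
    "System.Double": "Double",
    "System.Boolean": "Boolean",
    "System.Object": "Object",
    "System.UInt16": "Unsigned Short",
    "System.UInt32": "Unsigned Integer",
    "System.UInt64": "Unsigned Long",
    "System.Byte": "Unsigned Byte",
    "System.DateTime": "DateTime",
}

def to_semantic_type(clr_name):
    # Per Table E.1, fall back to Unknown when a type cannot be determined
    return SEMANTIC_TYPES.get(clr_name, "Unknown")

print(to_semantic_type("System.Int32"))  # Integer
print(to_semantic_type("MyApp.Widget"))  # Unknown
```

The Unknown fallback matches the table's advice for types that cannot be determined or are subject to change.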
APPENDIX F
SEMTEX Generated File

The ComposeStarAnalyserPlugin is called during the compilation process and analyzes the assemblies created by the weaver. It provides three main tasks: extracting the semantics of the ReifiedMessage, the resource usage of fields and arguments, and the calls made to other functions. See Section 8.4 for more information. The information is stored in an XML file so it can be imported and used by the other Compose* modules. An example of this XML file is shown in Listing F.1.

<?xml version="1.0" encoding="utf-8"?>
<SemanticContainers>
  <SemanticContainer name="pacman.ConcernImplementations.ScoreIncreaser"
      sourcefile="Pacman/obj/Weaver/pacman.ConcernImplementations.ScoreIncreaser.dll">
    <SemanticClass name="pacman.ConcernImplementations.ScoreIncreaser">
      <FullName>[pacman.ConcernImplementations.ScoreIncreaser.dll]pacman.ConcernImplementations.ScoreIncreaser</FullName>
      <BaseType>[mscorlib]System.Object - Object</BaseType>
      <SemanticMethod name="increase">
        <ReturnType>System.Void</ReturnType>
        <CallsToOtherMethods>
          <Call operationName="getArgs" className="Composestar.RuntimeCore.FLIRT.Message.ReifiedMessage" />
          <Call operationName="toString" className="com.ms.vjsharp.lang.ObjectImpl" />
          <Call operationName="parseInt" className="java.lang.Integer" />
          <Call operationName="handleReturnMethodCall" className="Composestar.RuntimeCore.FLIRT.MessageHandlingFacility" />
          <Call operationName="setArgs" className="Composestar.RuntimeCore.FLIRT.Message.ReifiedMessage" />
          <Call operationName="append" className="java.lang.StringBuffer" />
          <Call operationName="getSelector" className="Composestar.RuntimeCore.FLIRT.Message.ReifiedMessage" />
          <Call operationName="append" className="java.lang.StringBuffer" />
          <Call operationName="handleReturnMethodCall" className="Composestar.RuntimeCore.FLIRT.MessageHandlingFacility" />
          <Call operationName="append" className="java.lang.StringBuffer" />
          <Call operationName="append" className="java.lang.StringBuffer" />
          <Call operationName="handleReturnMethodCall" className="Composestar.RuntimeCore.FLIRT.MessageHandlingFacility" />
          <Call operationName="println" className="java.io.PrintStream" />
          <Call operationName="handleVoidMethodCall" className="Composestar.RuntimeCore.FLIRT.MessageHandlingFacility" />
        </CallsToOtherMethods>
        <ReifiedMessageBehaviour>
          <Semantic value="selector.read" />
        </ReifiedMessageBehaviour>
        <ResourceUsages>
          <ResourceUsage name="message" operandType="SemanticArgument" accessType="read" accessOccurence="AtLeastOnce" />
          <ResourceUsage name="this" operandType="SemanticArgument" accessType="read" accessOccurence="AtLeastOnce" />
          <ResourceUsage name="this" operandType="SemanticArgument" accessType="write" accessOccurence="AtLeastOnce" />
          <ResourceUsage name="this" operandType="SemanticArgument" accessType="read" accessOccurence="AtLeastOnce" />
          <ResourceUsage name="this" operandType="SemanticArgument" accessType="read" accessOccurence="MaybeMoreThenOnce" />
          <ResourceUsage name="message" operandType="SemanticArgument" accessType="read" accessOccurence="MaybeMoreThenOnce" />
          <ResourceUsage name="message" operandType="SemanticArgument" accessType="read" accessOccurence="MaybeMoreThenOnce" />
          <ResourceUsage name="this" operandType="SemanticArgument" accessType="read" accessOccurence="MaybeMoreThenOnce" />
          <ResourceUsage name="this" operandType="SemanticArgument" accessType="read" accessOccurence="MaybeMoreThenOnce" />
          <ResourceUsage name="this" operandType="SemanticArgument" accessType="read" accessOccurence="MaybeMoreThenOnce" />
        </ResourceUsages>
      </SemanticMethod>
    </SemanticClass>
  </SemanticContainer>
</SemanticContainers>

Listing F.1: Part of the SEMTEX file for the pacman example
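Since the SEMTEX file is plain XML, a consumer can read it with any XML library. The sketch below is an assumed consumer, not one of the actual Compose* modules: it parses an abbreviated fragment of Listing F.1 with Python's xml.etree and summarizes a method's outgoing calls and its write accesses.

```python
import xml.etree.ElementTree as ET

# Abbreviated fragment of Listing F.1 (most Call and ResourceUsage entries omitted)
SEMTEX = """<SemanticContainers>
  <SemanticContainer name="pacman.ConcernImplementations.ScoreIncreaser">
    <SemanticClass name="pacman.ConcernImplementations.ScoreIncreaser">
      <SemanticMethod name="increase">
        <CallsToOtherMethods>
          <Call operationName="getArgs" className="Composestar.RuntimeCore.FLIRT.Message.ReifiedMessage" />
          <Call operationName="parseInt" className="java.lang.Integer" />
        </CallsToOtherMethods>
        <ResourceUsages>
          <ResourceUsage name="message" operandType="SemanticArgument" accessType="read" accessOccurence="AtLeastOnce" />
          <ResourceUsage name="this" operandType="SemanticArgument" accessType="write" accessOccurence="AtLeastOnce" />
        </ResourceUsages>
      </SemanticMethod>
    </SemanticClass>
  </SemanticContainer>
</SemanticContainers>"""

root = ET.fromstring(SEMTEX)
method = root.find(".//SemanticMethod")
# Names of all operations called by this method, in document order
calls = [c.get("operationName")
         for c in method.findall("CallsToOtherMethods/Call")]
# Resources that the method writes to
writes = [r.get("name")
          for r in method.findall("ResourceUsages/ResourceUsage")
          if r.get("accessType") == "write"]

print(calls)   # ['getArgs', 'parseInt']
print(writes)  # ['this']
```

The same traversal pattern scales to the full file: a join point analysis can, for example, collect every method whose ResourceUsages contain a write access before deciding where a filter may safely be inlined.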