Data Structures in C#
SIVASANKAR GORANTLA

Asymptotic notation
 Before writing a program, we design a blueprint for it, which is called an
algorithm.
 A problem can usually be solved by several algorithms: A1, A2, A3, etc.
 We analyze each algorithm in terms of time and space complexity and, based on
that, select the best one.
 Asymptotic notation is the standard terminology computer scientists use to
express these complexities simply.
Types:
 Big O notation (O notation) – denotes an upper bound on growth; most often used to
describe the worst case of an algorithm. This is the one we are usually interested in.
 Omega notation (Ω notation) – denotes a lower bound on growth; often used to
describe the best case of an algorithm.
 Theta notation (Θ notation) – denotes a tight bound (both upper and lower); often
used when stating the average case of an algorithm.
 Ex: linear search in the array 5,4,2,6,8,9: best case Ω(1), worst case O(n), average
case Θ(n/2) = Θ(n)
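The best and worst cases of the array example can be made concrete with a small sketch. The class and method names below are hypothetical, chosen only for illustration; the method counts how many elements it examines before it stops.

```csharp
using System;

public static class LinearSearchDemo
{
    // Returns the index of target in items, or -1 if it is absent.
    // comparisons reports how many elements were examined.
    public static int Find(int[] items, int target, out int comparisons)
    {
        comparisons = 0;
        for (int i = 0; i < items.Length; i++)
        {
            comparisons++;
            if (items[i] == target)
            {
                return i;
            }
        }
        return -1;
    }
}
```

For the array 5,4,2,6,8,9, searching for 5 stops after 1 comparison (best case), while searching for 9 or for a missing value takes 6 comparisons (worst case).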

Mostly used Asymptotic notations
constant − Ο(1)
logarithmic − Ο(log n)
linear − Ο(n)
n log n − Ο(n log n)
quadratic − Ο(n^2)
cubic − Ο(n^3)
polynomial − n^Ο(1)
exponential − 2^Ο(n)

What is an ADT?
 To manage the complexity of problems and the problem-solving process,
computer scientists use abstractions to allow them to focus on the “big
picture” without getting lost in the details.
 An abstract data type, sometimes abbreviated ADT, is a logical description of
how we view the data and the operations that are allowed without regard to
how they will be implemented.
 Example : List, Map
 One ADT can have several implementations

Example of ADT
 Let's consider the interface System.Collections.IList.
 The basic operations it defines are:
 int Add(object) – adds an element at the end of the list
 void Insert(int, object) – inserts an element at a given position
in the list
 void Clear() – removes all elements from the list
 bool Contains(object) – checks whether the list contains the element
 void Remove(object) – removes the first occurrence of the element from the list
 void RemoveAt(int) – removes the element at a given position
 int IndexOf(object) – returns the position of the element
 this[int] – indexer, allows access to the element at a given position
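A minimal sketch of why the ADT view is useful: the hypothetical helper below is written only against IList, so it works with any implementation of that interface (for example ArrayList or List<object>).

```csharp
using System.Collections;

public static class IListDemo
{
    // Uses only operations defined by the IList abstraction.
    public static int AddAndLocate(IList list, object first, object second)
    {
        list.Clear();
        list.Add(first);         // append at the end
        list.Insert(0, second);  // insert at position 0, shifting 'first' to index 1
        return list.IndexOf(first);
    }
}
```

Calling AddAndLocate(new ArrayList(), "a", "b") and AddAndLocate(new List<object>(), "a", "b") both return 1, although the two lists are implemented differently.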

What is a data structure and why is it needed?
 A data structure is a systematic way of organizing data so that it can be used efficiently.
 Choosing the right data structure makes a program much more efficient: we can
save memory and execution time, and sometimes even the amount of code that we
write.
Need:
 As applications grow more complex, the amount of data grows too. Because of this,
applications today commonly face three problems:
 Data search
 Processing speed
 Multiple requests

Basic data structures in programming
 Linear – these include arrays (Array), lists (ArrayList, List<T>), stacks (Stack<T>),
queues (Queue<T>) and linked lists (LinkedList<T>)
 Non-linear:
 Dictionaries – key-value pairs organized in hash tables (Hashtable and
Dictionary<TKey, TValue>)
 Tree-like – tree, binary tree, AVL tree, spanning tree and heap
 Sets – unordered collections of unique elements
 Others – multi-sets, bags, multi-bags, priority queues, graphs…

Motivation behind inventing the array
 Let’s say you need to store 100 values in memory. How can
we store that many values without using arrays?
 What is the basic thing required to store a value in memory in a high-level
language?
A variable, which refers to a location in memory.
 So, to store 100 values in memory, do we need to create 100 variables
in the program?
 Even if 100 variables were acceptable, what if you need to store and access 10000 elements?

Array
 Arrays are one of the simplest and most commonly used data
structures in computer programming.
 All the elements of an array must be of the same type. Hence arrays are
homogeneous. (Why?)
 The contents of an array are stored in a contiguous memory
block. (Why?)
 All the elements can be accessed directly by index. (How?)
 Let’s take an example to understand how an array is stored on the
heap.
 Ex: bool[] booleanArray;
FileInfo[] files;    // FileInfo lives in System.IO
booleanArray = new bool[10];
files = new FileInfo[10];

Two dimensional arrays
 For example, a two-dimensional array with m x n values is stored in memory
as one contiguous block in row-major order: all elements of row 0, then row 1, and so on.
 A 3D array extends the same layout by one more dimension.
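A small sketch of the memory layout of a rectangular C# array, assuming row-major order (row after row in one block). The class and method names are hypothetical; Flatten copies a two-dimensional array into a flat one using the same offset formula.

```csharp
public static class RowMajorDemo
{
    // Offset (in elements) of [row, col] inside a block that stores rows one after another.
    public static int Offset(int row, int col, int cols)
    {
        return row * cols + col;
    }

    // Copies a rectangular array into a flat array in row-major order.
    public static int[] Flatten(int[,] grid)
    {
        int rows = grid.GetLength(0);
        int cols = grid.GetLength(1);
        int[] flat = new int[rows * cols];
        for (int r = 0; r < rows; r++)
        {
            for (int c = 0; c < cols; c++)
            {
                flat[Offset(r, c, cols)] = grid[r, c];
            }
        }
        return flat;
    }
}
```

For a 2 x 3 array, element [1, 1] lands at offset 1 * 3 + 1 = 4 in the flat block.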

Basic operation on Array
 Read an element by index: O(1)
Ex: int valueAtIndexTwo = array[2];
 Write an element by index: O(1)
Ex: array[10] = 12;
 Search for an element by value: O(n)
 Search for an element by value using binary search: O(log n), only
when the array is sorted
 http://eli.thegreenplace.net/2015/memory-layout-of-multi-dimensional-arrays/
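The two kinds of search can be tried with the built-in Array helpers. This is only a sketch; the wrapper class and its method names are hypothetical, while Array.IndexOf and Array.BinarySearch are the standard .NET methods.

```csharp
using System;

public static class ArrayOpsDemo
{
    // Linear scan from the start: O(n). Returns -1 when the value is not found.
    public static int SearchUnsorted(int[] items, int value)
    {
        return Array.IndexOf(items, value);
    }

    // Binary search: O(log n), valid only if the array is already sorted.
    // Returns a negative number when the value is not found.
    public static int SearchSorted(int[] sorted, int value)
    {
        return Array.BinarySearch(sorted, value);
    }
}
```

Array.Sort can be called first when the data is not yet sorted, at a cost of O(n log n).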

Array analysis
 Ordering – Guaranteed
 Contiguous – Yes
 Direct access – Yes, via index: O(1)
 Lookup efficiency – O(1) by index
 ArrayList has O(n) time complexity for add/remove at arbitrary indices, but O(1) for
the same operations at the end of the list.
 The running time of an array access is denoted O(1) because it is constant. That is,
regardless of how many elements are stored in the array, it takes the same amount
of time to look up an element.
 This constant running time is possible solely because an array's elements are stored
contiguously, hence a lookup only requires knowledge of the array's starting
location in memory, the size of each array element, and the index of the element to be
accessed.
 The .NET Framework automatically checks, on each element access,
whether the index is within the range of the array, and throws IndexOutOfRangeException if it is not.

Limitations of Array
 The size of an array is fixed when it is created.
 It can store only items of the same type.

ArrayList
 The ArrayList maintains an internal object array and provides
automatic resizing of the array as the number of elements added to
the ArrayList grows.
 Because the ArrayList uses an object array, developers can add any
type: strings, integers, FileInfo objects, Form instances, anything.
 As a result, even if you have an ArrayList that stores nothing but value
types, each ArrayList element is a reference to a boxed value type.
 The boxing and unboxing, along with the extra level of indirection,
that come with using value types in an ArrayList can hamper the
performance of your application when using large ArrayLists with
many reads and writes.

ArrayList memory representation

Basic operation on ArrayList
 Add(object) – adding a new element
 Insert(int, object) – adding a new element at a specified position
(index)
 Count – returns the count of elements in the list
 Remove(object) – removes a specified element
 RemoveAt(int) – removes the element at a specified position
 Clear() – removes all elements from the list
 this[int] – an indexer, allows accessing the elements by a given
position (index)

 ArrayList.Insert():
// Grow the backing array if it is already full.
if (_size == _items.Length)
{
    EnsureCapacity(_size + 1);
}
// Shift the elements from index onward one slot to the right.
if (index < _size)
{
    Array.Copy(_items, index, _items, index + 1, _size - index);
}
_items[index] = value;
_size++;
 Array.Copy copies a range of elements from a System.Array starting at the specified source index and pastes them to
another System.Array starting at the specified destination index. The length and the indexes are specified as 32-bit
integers.

 ArrayList.RemoveAt():
_size--;
// Shift the elements after index one slot to the left.
if (index < _size)
{
    Array.Copy(_items, index + 1, _items, index, _size - index);
}
 In the .NET Framework reference source, this Array.Copy overload in turn calls:
Copy(sourceArray, sourceIndex, destinationArray, destinationIndex,
length, false);
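The shifting done by Insert and RemoveAt can be tried out on a simplified list. The class below is a hypothetical, int-only sketch and not the real ArrayList source; it assumes a starting capacity of 4 and growth by doubling.

```csharp
using System;

// Hypothetical simplified list, for illustration only.
public class SimpleIntList
{
    private int[] _items = new int[4];
    private int _size;

    public int Count { get { return _size; } }

    public int this[int index]
    {
        get
        {
            if (index < 0 || index >= _size) throw new ArgumentOutOfRangeException(nameof(index));
            return _items[index];
        }
    }

    public void Insert(int index, int value)
    {
        if (index < 0 || index > _size) throw new ArgumentOutOfRangeException(nameof(index));
        if (_size == _items.Length)
        {
            Array.Resize(ref _items, _items.Length * 2); // assumed growth policy: doubling
        }
        if (index < _size)
        {
            // Shift [index, _size) one slot to the right: O(n) in the worst case.
            Array.Copy(_items, index, _items, index + 1, _size - index);
        }
        _items[index] = value;
        _size++;
    }

    public void RemoveAt(int index)
    {
        if (index < 0 || index >= _size) throw new ArgumentOutOfRangeException(nameof(index));
        _size--;
        if (index < _size)
        {
            // Shift the elements after index one slot to the left.
            Array.Copy(_items, index + 1, _items, index, _size - index);
        }
    }
}
```

Inserting at index Count needs no shifting, which is why adding at the end is cheap while inserting at index 0 moves every element.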

Analysis of ArrayList
 ArrayList has O(n) time complexity for add/remove at arbitrary
indices, but amortized O(1) for add/remove at the end of the list

Limitations of ArrayList
 The main problem with ArrayList is that it uses object, which means you
have to cast to and from whatever type you are storing.
 Implicit boxing happens whenever you use a value type: the value is
boxed when put into the ArrayList and unboxed when
read back.
 Since generics arrived in .NET 2.0, ArrayList has been superseded by List<T> and
is mainly needed for .NET 1.0/1.1 code.
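The casting and boxing cost can be seen by summing the same numbers from an ArrayList and from a List<int>. This is a sketch with hypothetical class and method names.

```csharp
using System.Collections;
using System.Collections.Generic;

public static class BoxingDemo
{
    public static int SumArrayList(ArrayList values)
    {
        int sum = 0;
        foreach (object boxed in values)
        {
            // The explicit cast unboxes the value; it throws InvalidCastException
            // if the element is not a boxed int.
            sum += (int)boxed;
        }
        return sum;
    }

    public static int SumList(List<int> values)
    {
        int sum = 0;
        foreach (int value in values)
        {
            sum += value; // no cast and no boxing: the list stores ints directly
        }
        return sum;
    }
}
```

Both methods return the same sum, but only the generic version is checked by the compiler for element type.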

List<T>
 The List<T> data structure was introduced in the .NET Framework 2.0 as part of
the new set of generic collections.
 The List<T> class is the generic equivalent of ArrayList.
 It implements the IList<T> generic interface by using an array whose size is
dynamically increased as required.
 It keeps its elements in memory as an array.
 It is an extremely efficient data structure when elements need to be appended
quickly and accessed by index. However, it is slow at
inserting and removing elements unless they are at the last position.
 It represents a strongly typed list of objects that can be accessed by index, and
provides methods to search, sort, and manipulate lists.
 Elements in this collection are accessed using a zero-based integer index.

Operations on List<T>
 We already explained that the List<T> class uses an inner array for keeping
the elements and the array doubles its size when it gets overfilled. Such
implementation causes the following good and bad sides:
 - The search by index is very fast – we can access with equal speed each
of the elements, regardless of the count of elements.
 - The search for an element by value works with as many comparisons as
the count of elements (in the worst case), i.e. it is slow.
 - Inserting and removing elements is a slow operation – when we add or
remove elements, especially if they are not in the end of the array, we
have to shift the rest of the elements and this is a slow operation.
 - When adding a new element, sometimes we have to increase the
capacity of the array, which is a slow operation, but it happens seldom
and the average speed of insertion to List does not depend on the count
of elements, i.e. it works very fast.
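How seldom the inner array grows can be observed through the Capacity property of List<T>. The helper below is a hypothetical sketch; it counts how many times Capacity changes while items are appended.

```csharp
using System.Collections.Generic;

public static class ListGrowthDemo
{
    // Adds 'count' items and returns how many times Capacity changed along the way.
    public static int CountResizes(int count)
    {
        var list = new List<int>();
        int resizes = 0;
        int lastCapacity = list.Capacity;
        for (int i = 0; i < count; i++)
        {
            list.Add(i);
            if (list.Capacity != lastCapacity)
            {
                resizes++;
                lastCapacity = list.Capacity;
            }
        }
        return resizes;
    }
}
```

For 1000 appended items the number of resizes is far smaller than 1000, which is why the average cost of Add stays constant.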

Analysis of List
 Best for small list where direct access is required

Linked List
 A linked list is a sequence of data structures connected together via
links.
 In other words, a linked list is a sequence of links, each of which contains an item.
 Each link contains a connection to another link. The linked list is the second most used
data structure after the array.
 The following terms are important for understanding linked lists.
 Link − Each link of a linked list stores a piece of data called an element.
 Next − Each link of a linked list contains a reference to the next link, called Next.
 LinkedList − A LinkedList contains a reference to the first link, called First.

Advantages and disadvantages of LinkedList<T>
 The append operation is very fast, because the list always knows its
last element (tail).
 Inserting a new element at an arbitrary position in the list is very fast
(unlike List<T>) if we have a reference to this position, e.g. if we insert at
the list start or at the list end.
 Searching for elements by index or by value in LinkedList<T> is a slow
operation, as we have to scan the elements one by one,
beginning from the start of the list.
 Removing an element by value is a slow operation, because it includes
searching.
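A short sketch of these trade-offs using the built-in LinkedList<T>. The wrapper class is hypothetical; the calls themselves are standard LinkedList<T> methods.

```csharp
using System.Collections.Generic;

public static class LinkedListDemo
{
    public static int[] Build()
    {
        var list = new LinkedList<int>();
        list.AddLast(1);                              // O(1) append at the tail
        LinkedListNode<int> three = list.AddLast(3);  // keep a reference to the node
        list.AddFirst(0);                             // O(1) insert at the head
        list.AddBefore(three, 2);                     // O(1): we already hold the node
        list.Remove(three);                           // O(1) removal by node reference
        list.Remove(1);                               // O(n): removal by value has to search
        var result = new int[list.Count];
        list.CopyTo(result, 0);
        return result;
    }
}
```

The list goes through 1 → 1,3 → 0,1,3 → 0,1,2,3 → 0,1,2 → 0,2, so Build returns the two elements 0 and 2.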

Analysis of LinkedList
 Ordering – User has precise control over element
ordering
 Contiguous – No
 Direct access – No
 Lookup efficiency – O(n)
 Best for lists where inserting/deleting in the middle is common and no
direct access is required

Queue
 Queue is an abstract data type in which elements are inserted
at one end, called the REAR (also called tail), and existing
elements are removed from the other end, called
the FRONT (also called head).
 This makes the queue a FIFO data structure, which means that the element
inserted first will also be removed first.
 The process of adding an element to the queue is called Enqueue.
 The process of removing an element from the queue is
called Dequeue.
 The process of reading the element at the head without removing it is called Peek.

The Queue – Basic Operations
 Queue<T> class provides the basic operations, specific for the data
structure queue. Here are some of the most frequently used:
 - Enqueue(T) – inserts an element at the end of the queue
 - Dequeue() – retrieves the element from the beginning of the
queue and removes it
 - Peek() – returns the element from the beginning of the queue
without removing it
 - Clear() – removes all elements from the queue
 - Contains(T) – checks if the queue contains the element
 - Count – returns the amount of elements in the queue
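A minimal usage sketch of these operations; the wrapper class and the sample values are only for illustration.

```csharp
using System.Collections.Generic;

public static class QueueDemo
{
    public static string Run()
    {
        var queue = new Queue<string>();
        queue.Enqueue("first");
        queue.Enqueue("second");
        queue.Enqueue("third");
        string peeked = queue.Peek();      // reads the head, which stays in the queue
        string removed = queue.Dequeue();  // reads the head and removes it
        return peeked + "," + removed + "," + queue.Peek() + "," + queue.Count;
    }
}
```

Run returns "first,first,second,2": the element enqueued first is the one that leaves first.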

.NET implementation of the Queue
 In .NET, the queue is implemented using a circular buffer.
 Circular buffer: a memory allocation scheme where memory is
reused (reclaimed) when an index, incremented modulo the buffer size, writes over
a previously used location.
 Internally it uses an array to hold the elements of the queue.
 Enqueue advances the tail: _tail = (_tail + 1) % _array.Length;
 Dequeue advances the head: _head = (_head + 1) % _array.Length;
 Is full: the number of stored elements equals _array.Length
 Is empty: the number of stored elements is 0
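The wrap-around of head and tail can be sketched with a small fixed-capacity queue. This class is hypothetical and simplified: it assumes a separate element counter (_size) to tell full from empty, and it throws when full, whereas the real queue grows its array.

```csharp
using System;

// Hypothetical fixed-capacity circular buffer, for illustration only.
public class RingQueue
{
    private readonly int[] _array;
    private int _head;  // index of the next element to dequeue
    private int _tail;  // index where the next element will be written
    private int _size;  // number of stored elements

    public RingQueue(int capacity)
    {
        _array = new int[capacity];
    }

    public bool IsEmpty { get { return _size == 0; } }
    public bool IsFull { get { return _size == _array.Length; } }

    public void Enqueue(int value)
    {
        if (IsFull) throw new InvalidOperationException("Queue is full.");
        _array[_tail] = value;
        _tail = (_tail + 1) % _array.Length;  // wraps back to 0 at the end of the array
        _size++;
    }

    public int Dequeue()
    {
        if (IsEmpty) throw new InvalidOperationException("Queue is empty.");
        int value = _array[_head];
        _head = (_head + 1) % _array.Length;  // wraps back to 0 at the end of the array
        _size--;
        return value;
    }
}
```

With capacity 3, after enqueuing 1, 2, 3 and dequeuing once, a fourth value is written to slot 0, reusing the space freed by the dequeue.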

Analysis on Queue
 Enqueue : O(1)
 Dequeue : O(1)

Stack
 Stack is an abstract data type (a linear data structure) in which
the last element added is removed first.
 This makes the stack a LIFO data structure, which means that the element
inserted last will be removed first.
 The process of adding an element to the stack is called Push.
 The process of removing an element from the stack is called Pop.

Stack<T> – Basic Operations
 Push(T) – adds a new element on the top of the stack
 Pop() – returns the top element and removes it from the stack
 Peek() – returns the top element without removing it
 Count – returns the number of elements in the stack
 Clear() – removes all elements from the stack
 Contains(T) – checks whether the stack contains the element
 ToArray() – returns an array containing all elements of the stack
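A minimal usage sketch of these operations; the wrapper class and the sample values are only for illustration.

```csharp
using System.Collections.Generic;

public static class StackDemo
{
    public static string Run()
    {
        var stack = new Stack<int>();
        stack.Push(1);
        stack.Push(2);
        stack.Push(3);
        int top = stack.Peek();    // reads the top element, which stays on the stack
        int popped = stack.Pop();  // reads the top element and removes it
        // ToArray returns the remaining elements in the order they would be popped.
        return top + "," + popped + "," + string.Join(",", stack.ToArray());
    }
}
```

Run returns "3,3,2,1": the element pushed last is the one that leaves first.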

.NET implementation of Stack
Push :
// Pushes an item to the top of the stack.
//
public virtual void Push(Object obj) {
    //Contract.Ensures(Count == Contract.OldValue(Count) + 1);
    if (_size == _array.Length) {
        Object[] newArray = new Object[2*_array.Length];
        Array.Copy(_array, 0, newArray, 0, _size);
        _array = newArray;
    }
    _array[_size++] = obj;
    _version++;
}

.NET implementation of Stack
Pop :
// Pops an item from the top of the stack. If the stack is empty, Pop
// throws an InvalidOperationException.
public virtual Object Pop() {
    if (_size == 0)
        throw new InvalidOperationException(Environment.GetResourceString("InvalidOperation_EmptyStack"));
    //Contract.Ensures(Count == Contract.OldValue(Count) - 1);
    Contract.EndContractBlock();
    _version++;
    Object obj = _array[--_size];
    _array[_size] = null; // Free memory quicker.
    return obj;
}

Dictionary data structures
 Hashtable
 Dictionary<TKey, TValue>

What is a hash table?
 Problem with ordinal indexing?

 A hash table combines the random access ability of an array with the dynamism of
a linked list,
i.e. insertion, deletion and lookup can be done with O(1)
complexity on average if it is implemented correctly.
 To achieve this, we create a data structure where, while inserting data, the
data itself gives us a clue about where to store it.
 A hash table is a combination of two things:
 First, a hash function, which returns a non-negative value called the hash code.
 Second, an array capable of storing the data that we want to place into the structure.
 The idea is that we run our data through the hash function and then store the
data in the element of the array indicated by the returned hash code.
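The idea can be sketched with a toy hash function. Both methods below are hypothetical and deliberately simplistic (summing character codes is easy to follow but distributes keys poorly); real hash functions are more elaborate.

```csharp
public static class HashIndexDemo
{
    // Toy hash: the sum of the character codes of the key.
    public static int ToyHash(string key)
    {
        int sum = 0;
        foreach (char ch in key)
        {
            sum += ch;
        }
        return sum;
    }

    // Maps the hash code to a valid index of an array with bucketCount elements.
    public static int BucketIndex(string key, int bucketCount)
    {
        return ToyHash(key) % bucketCount;
    }
}
```

For example, ToyHash("John") is 74 + 111 + 104 + 110 = 399, so with 5 buckets the key is stored at index 399 % 5 = 4.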

 As elements are added to a Hashtable, the actual load factor of
the Hashtable increases. When the actual load factor reaches the specified
load factor, the number of buckets in the Hashtable is automatically increased
to the smallest prime number that is larger than twice the current number
of Hashtable buckets.
 For very large Hashtable objects, you can increase the maximum capacity to 2
billion elements on a 64-bit system by setting the enabled attribute of the
configuration element to true in the run-time environment.

 How insertion happens in Hashtable
 How lookup works in a hash table
 Ex: to search for “John” in the hash table, we pass the key; the table hashes that key and gets
the same hash code that was generated when “John” was inserted, which is 4.
 It then looks for “John” at index 4 of the hash table and returns true, as “John” is present at that index.
 Each element is a key/value pair stored in a DictionaryEntry object.
 private struct DictionaryEntry {
    public TKey key;
    public TValue value;
    public int hashCode;
    public int next;
}

How to define the Hash function?
 There is no limit on the number of possible hash functions.
 However, some characteristics are expected of an
efficient hash function.
 Deterministic – Every time we pass exactly the same piece of data into the
hash function, we always get the same hash code.
 Uniformly distributed – Different values should rarely produce the same hash code;
hash codes should be spread evenly over the available range.
 Ex of hash function

What if we came across this situation
 Do you see any problem in the following hash table?
 We call this a collision.
 A collision occurs when two pieces of data run through the hash function and
get the same hash code.
 We want to store both pieces of data and don’t want to overwrite the existing
one with the new one.

Collision resolution techniques
 Linear probing: if a collision occurs, we try to place the data in the
next consecutive index until we find a vacant slot. It suffers from clustering.
 Quadratic probing: if slot s is taken, rather than checking slot s + 1, then s + 2,
and so on as in linear probing, quadratic probing checks slot s + 1^2 first, then s –
1^2, then s + 2^2, then s – 2^2, then s + 3^2, and so on. However, even quadratic
probing can lead to clustering.
 Chaining (used in Dictionary<TKey, TValue>): here the linked list comes into the picture. Instead of
storing one value in each element of the hash table, each element holds a pointer to a
linked list. So each element of the array is a pointer to the head of a linked list.
 Rehashing (used in Hashtable): a series of different hash functions (H1, H2, ..., Hn) is tried when a
collision occurs.
 Ex: Hk(key) =
[GetHash(key) + k * (1 + (((GetHash(key) >> 5) + 1) % (hashsize - 1)))] % hashsize
When to use what?
 Do you need a sequential list where the element is typically discarded after its
value is retrieved?
 If yes, consider using the Queue class or the Queue<T> generic class if you need first-in,
first-out (FIFO) behavior. Consider using the Stack class or the Stack<T> generic class if
you need last-in, first-out (LIFO) behavior. For safe access from multiple threads, use the
concurrent versions ConcurrentQueue<T> and ConcurrentStack<T>.
 If not, consider using the other collections.
 Do you need to access the elements in a certain order, such as FIFO, LIFO, or
random?
 The Queue class and the Queue<T> or ConcurrentQueue<T> generic class offer FIFO
access. For more information, see When to Use a Thread-Safe Collection.
 The Stack class and the Stack<T> or ConcurrentStack<T> generic class offer LIFO
access. For more information, see When to Use a Thread-Safe Collection.
 The LinkedList<T> generic class allows sequential access either from the head to the tail,
or from the tail to the head.

 Do you need to access each element by index?
 The ArrayList and StringCollection classes and the List<T> generic class offer access
to their elements by the zero-based index of the element.
 The Hashtable, SortedList, ListDictionary, and StringDictionary classes, and
the Dictionary<TKey, TValue> and SortedDictionary<TKey, TValue> generic classes
offer access to their elements by the key of the element.
 The NameObjectCollectionBase and NameValueCollection classes, and
the KeyedCollection<TKey, TItem> and SortedList<TKey, TValue> generic classes
offer access to their elements by either the zero-based index or the key of the
element.
 Will each element contain one value, a combination of one key and one
value, or a combination of one key and multiple values?
 One value: Use any of the collections based on the IList interface or
the IList<T> generic interface.
 One key and one value: Use any of the collections based on
the IDictionary interface or the IDictionary<TKey, TValue> generic interface.
 One value with embedded key: Use the KeyedCollection<TKey, TItem> generic
class.
 One key and multiple values: Use the NameValueCollection class.

 Do you need to sort the elements differently from how they were entered?
 The Hashtable class sorts its elements by their hash codes.
 The SortedList class and the SortedDictionary<TKey, TValue> and SortedList<TKey,
TValue> generic classes sort their elements by the key, based on implementations
of the IComparer interface and the IComparer<T> generic interface.
 ArrayList provides a Sort method that takes an IComparer implementation as a
parameter. Its generic counterpart, the List<T> generic class, provides
a Sort method that takes an implementation of the IComparer<T> generic
interface as a parameter.
 Do you need fast searches and retrieval of information?
 ListDictionary is faster than Hashtable for small collections (10 items or fewer).
The Dictionary<TKey, TValue> generic class provides faster lookup than
the SortedDictionary<TKey, TValue> generic class. The multi-threaded
implementation is ConcurrentDictionary<TKey,
TValue>. ConcurrentBag<T> provides fast multi-threaded insertion for unordered
data. For more information about both multi-threaded types, see When to Use a
Thread-Safe Collection.
