
Table of contents
1 Introduction
1.1 The Node
1.2 Building a Chain from Nodes
1.3 The Principle of Induction
1.4 Induction on a Summation
2 Asymptotic Notation
2.1 How asymptotic notation relates to analyzing complexity
2.2 A few examples of asymptotic notation
3 Arrays
3.1 type
3.2 bounds
3.3 bounds check
3.4 Array Operations
3.5 Declaring Array Types
3.6 Array Access
3.7 Language Specifics
4 List Structures and Iterators
4.1 Syntactic Sugar
4.2 Implementations
4.2.1 Singly Linked List
4.2.2 Vector
4.3 Bidirectional Lists
4.4 Doubly Linked List Implementation
4.5 Vector Implementation
4.6 Tradeoffs
4.7 Advantages / Disadvantages
5 Stacks and Queues
5.1 Stacks
5.1.1 Linked List Implementation
5.1.1.1 Performance Analysis
5.1.2 Array Implementation
5.1.2.1 Performance Analysis
5.1.3 Related Links
5.2 Queues
5.2.1 Linked List Implementation
5.2.1.1 Performance Analysis
5.2.2 Array Implementation
5.2.2.1 Performance Analysis
5.2.3 Related Links
5.3 Deques
6 Trees
7 Min and Max Heaps
7.1 Compute the extreme value
7.2 Removing the Extreme Value
7.3 Inserting a value into the heap
7.4 TODO
8 Graphs
8.1 Representations
8.2 Graph searching
8.2.1 Depth-first
8.2.2 Breadth-first
9 Hash Tables
10 Sets
11 Tradeoffs
Introduction
Data structures are abstractions we use to manage large amounts of information and the relationships different pieces of information have with each other. Sometimes we use data structures to allow us to do more: for example, to accomplish fast searching or sorting of data. Other times, we use data structures so that we can do less: for example, the concept of the stack is a limited form of a more general data structure. These limitations provide us with guarantees that allow us to reason about our programs more easily. Data structures also provide guarantees about algorithmic complexity; choosing an appropriate data structure for a job is crucial to writing good software.
Because data structures are higher-level abstractions, they present to us operations on groups of data, such as adding an item to a list, or looking up the highest-priority item in a queue. When a data structure provides operations, we can call the data structure an abstract data type (sometimes abbreviated as ADT). Abstract data types can minimize dependencies in your code, which is important when your code needs to be changed. Because you are abstracted away from lower-level details, some of the higher-level commonalities one data structure shares with a different data structure can be used to replace one with the other.
Our programming languages come equipped with a set of built-in types, such as integers and floating-point numbers, that allow us to work with data objects for which the machine's processor has native support. These built-in types are abstractions of what the processor actually provides because built-in types hide details both about their execution and limitations.
For example, when we use a floating-point number we are primarily concerned with its value and the operations that can be applied to it. Consider computing the length of a hypotenuse:
let c := sqrt(a * a + b * b)
The machine code generated from the above would use common patterns for computing these values and accumulating the result. In fact, these patterns are so repetitious that high-level languages were created to avoid this redundancy and to allow programmers to think about what value was computed instead of how it was computed.
Two useful and related concepts are at play here:
Abstraction is when common patterns are grouped together under a single name and then parameterized, in order to achieve a higherlevel understanding of that pattern. For example, the multiplication operation requires two source values and writes the product of those two values to a given destination. The operation is parameterized by both the two sources and the single destination.
Encapsulation is a mechanism to hide the implementation details of an abstraction away from the users of the abstraction. When we multiply numbers, for example, we don't need to know the technique actually used by the processor, we just need to know its properties.
A programming language is both an abstraction of a machine and a tool to encapsulate away the machine's inner details. For example, a program written in a programming language can be compiled to several different machine architectures when that programming language sufficiently encapsulates the user away from any one machine.
In this book, we take the abstraction and encapsulation that our programming languages provide another step further: When applications get to be more complex, the abstractions programming languages provide become too low-level to effectively manage. Thus, we build our own abstractions on top of these lower-level constructs. We can even build further abstractions on top of those abstractions. Each time we build upwards, we lose access to the lower-level implementation details. While losing such access might sound like a bad trade-off, it is actually quite a bargain: we are more concerned with solving the problem at hand than we are with any trivial decisions that could have just as arbitrarily been replaced with a different decision. When we can think on higher levels, we relieve ourselves of these burdens.
Each data structure that we cover in this book can be thought of as a single unit that has a set of values and a set of operations that can be performed to either access or change these values. The data structure itself can be understood as a set of the data structure's operations together with each operation's properties (i.e., what the operation does and how long we could expect it to take).
The Node
The first data structure we look at is the node structure. A node is simply a container for a value, plus a pointer to a "next" node (which may be null).
Node Operations
makenode(v, node next):node
Create a new node, with v as its contained value and next as the value of the next pointer
getvalue(node n): element
Returns the node n's contained value
getnext(node n): node
Returns the value of node n's next pointer
setvalue(node n, v)
Sets the contained value of n to be v
setnext(node n, node newnext)
Sets the value of node n's next pointer to be newnext
All operations can be performed in time that is O(1).
The above is an abstraction of a structure:
structure node {
element value // holds the value
node next // pointer to the next node; possibly null
}
In some languages, structures are called records or classes. Some other languages provide no direct support for structures, but instead allow them to be built from other constructs (such as tuples or lists).
Here, we are only concerned that nodes contain values of some form, so we simply say its type is "element" because the type is not important. In some programming languages no type ever needs to be specified (as in dynamically typed languages, like Scheme, Smalltalk or Python). In other languages the type might need to be restricted to integer or string (as in statically typed languages like C). In still other languages, the decision of the type of the contained element can be delayed until the type is actually used (as in languages that support generic types, like C++ and Java). In any of these cases, translating the pseudocode into your own language should be relatively simple.
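For instance, with C++'s generic types the "element" placeholder becomes a template parameter. A minimal sketch (the identifiers Node, value, and next are our own choices here, not prescribed by the text):

```cpp
#include <cstddef>

// A minimal generic node: the "element" type is a template parameter.
template <typename T>
struct Node {
    T value;        // holds the contained value
    Node<T>* next;  // pointer to the next node; possibly null
};
```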
Each of the node operations specified can be implemented quite easily:
function makenode(v, node next): node
    let result := new node {v, next}
    return result
end

function getvalue(node n): element
    return n.value
end

function getnext(node n): node
    return n.next
end

function setvalue(node n, v)
    n.value := v
end

function setnext(node n, newnext)
    n.next := newnext
end
Principally, we are more concerned with the operations and the implementation strategy than we are with the structure itself and the low-level implementation. For example, we are more concerned about the time requirement specified, which states that all operations take time that is O(1). The above implementation meets this criterion, because the length of time each operation takes is constant. Another way to think of constant-time operations is to think of them as operations whose analysis is not dependent on any variable. (The notation O(1) is mathematically defined in the next chapter. For now, it is safe to assume it just means constant time.)
Building a Chain from Nodes
While the node structure is simple, it actually allows us to compute things that we couldn't have computed with just fixed-size integers alone. First, we'll look at a program that doesn't need to use nodes. The following program will read in (either from the user or a file) a series of numbers until the end-of-file is reached and output the largest number and the average of all numbers:
program(inputstream in, outputstream out)
let total := 0
let count := 0
let largest := -infinity
while hasnextinteger(in):
let i := readinteger(in)
total := total + i
count := count + 1
largest := max(largest, i)
repeat
println out "Maximum: " largest
if count != 0:
println out "Average: " (total / count)
fi
end
But now consider solving a similar task: read in a series of numbers until the end-of-file is reached, and output the largest number and the average of all numbers that evenly divide the largest number. This problem is different because it's possible the largest number will be the last one entered: if we are to compute the average of all numbers that divide that number, we'll need to somehow remember all of them. We could use variables to remember the previous numbers, but variables would only help us solve the problem when there aren't too many numbers entered.
For example, suppose we were to give ourselves 200 variables to hold the state input by the user. And further suppose that each of the 200 variables had 64 bits. Even if we were very clever with our program, it could only compute results for 2^12800 different types of input. While this is a very large number of combinations, a list of 300 64-bit numbers would require even more combinations to be properly encoded. (In general, the problem is said to require linear space. All programs that need only a finite number of variables can be solved in constant space.)
Instead of building in limitations that complicate coding (such as having only a constant number of variables), we can use the properties of the node abstraction to allow us to remember as many numbers as our computer can hold:
program(inputstream in, outputstream out)
let largest := -infinity
let nodes := null
while hasnextinteger(in):
let i := readinteger(in)
nodes := makenode(i, nodes) // contain the value i,
// and remember the previous numbers too
largest := max(largest, i)
repeat
println out "Maximum: " largest
// now compute the averages of all factors of largest
let total := 0
let count := 0
while nodes != null:
let j := getvalue(nodes)
if j divides largest:
total := total + j
count := count + 1
fi
nodes := getnext(nodes)
repeat
if count != 0:
println out "Average: " (total / count)
fi
end
Above, if n integers are successfully read there will be n calls made to makenode. This will require n nodes to be made (which require enough space to hold the value and next fields of each node, plus internal memory-management overhead), so the memory requirement will be on the order of O(n). Similarly, we construct this chain of nodes and then iterate over the chain again, which will require O(n) steps to make the chain, and then another O(n) steps to iterate over it.
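As a sketch of how the second program might look in C++ (a vector stands in for the input stream here, and the function name is our own illustrative choice):

```cpp
#include <vector>
#include <algorithm>

struct IntNode {
    int value;
    IntNode* next;
};

// Build a chain from the input (newest number first), then average the
// input numbers that evenly divide the maximum.
double average_of_factors_of_max(const std::vector<int>& input) {
    IntNode* nodes = nullptr;
    int largest = input.empty() ? 0 : input[0];
    for (int i : input) {
        nodes = new IntNode{i, nodes};   // remember every number seen so far
        largest = std::max(largest, i);
    }
    long total = 0;
    int count = 0;
    while (nodes != nullptr) {
        if (largest % nodes->value == 0) { total += nodes->value; ++count; }
        IntNode* dead = nodes;
        nodes = nodes->next;
        delete dead;                      // free the chain as we walk it
    }
    return count == 0 ? 0.0 : static_cast<double>(total) / count;
}
```

Both the chain construction and the second pass over it are O(n), matching the analysis above.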
Note that when we iterate over the numbers in the chain, we are actually looking at them in reverse order. For example, assume the numbers input to our program are 4, 7, 6, 30, and 15. After EOF is reached, the nodes chain will look like this:
15 -> 30 -> 6 -> 7 -> 4 -> null
Such chains are more commonly referred to as linked lists. However, we generally prefer to think in terms of lists or sequences, which aren't as low-level: the linking concept is just an implementation detail. While a sequence can be made with a chain, in this book we cover several other ways to make a sequence. For the moment, we care more about the abstraction capabilities of the node than we do about one of the ways it is used.
The Principle of Induction
The chains we can build from nodes are a demonstration of the principle of mathematical induction:
Mathematical Induction
1. Suppose you have some property of numbers P(n).
2. If you can prove that when P(n) holds, P(n + 1) must also hold, then
3. all you need to do is prove that P(1) holds to show that P(n) holds for all n.
For example, let the property P(n) be the statement that "you can make a chain that holds n numbers". This is a property of natural numbers, because the sentence makes sense for specific values of n:
you can make a chain that holds 5 numbers
you can make a chain that holds 100 numbers
you can make a chain that holds 1,000,000 numbers
Instead of proving that we can make chains of length 5, 100, and one million, we'd rather prove the general statement P(n) instead. Step 2 above is called the Inductive Hypothesis; let's show that we can prove it:
Assume that P(n) holds. That is, that we can make a chain of n elements. Now we must show that P(n + 1) holds.
Assume chain is the first node of the nelement chain. Assume i is some number that we'd like to add to the chain to make an n + 1 length chain.
The following code can accomplish this for us:
let biggerchain := makenode(i, chain)
Here, we have the new number i that is now the contained value of the first link of the biggerchain. If chain had n elements, then biggerchain must have n + 1 elements.
Step 3 above is called the Base Case; let's show that we can prove it:
We must show that P(1) holds. That is, that we can make a chain of one element.
The following code can accomplish this for us:
let chain := makenode(i, null)
The principle of induction says, then, that we have proven that we can make a chain of n elements for all values of n ≥ 1. How is this so? Probably the best way to think of induction is that it's actually a way of creating a formula to describe an infinite number of proofs. After we prove that the statement is true for P(1), the base case, we can apply the inductive hypothesis to that fact to show that P(2) holds. Since we now know that P(2) holds, we can apply the inductive hypothesis again to show that P(3) must hold. The principle says that there is nothing to stop us from doing this repeatedly, so we should assume it holds for all cases.
Induction may sound like a strange way to prove things, but it's a very useful technique. What makes the technique so useful is that it can take a hard-sounding statement like "prove P(n) holds for all n" and break it into two smaller, easier to prove statements. Typically base cases are easy to prove because they are not general statements at all. Most of the proof work is usually in the inductive hypothesis, which can often require clever ways of reformulating the statement to "attach on" a proof of the n + 1 case.
You can think of the contained value of a node as a base case, and the next pointer of the node as the inductive hypothesis. Just as in mathematical induction, we can break the hard problem of storing an arbitrary number of elements into an easier problem of just storing one element and then having a mechanism to attach on further elements.
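The inductive structure can be mirrored directly in code. In this hypothetical C++ sketch, the two branches of make_chain correspond to the base case and the inductive step of the proof:

```cpp
#include <cstddef>

struct Chain {
    int value;
    Chain* next;
};

// Base case: a one-element chain, P(1).
// Inductive step: a chain of n elements is one new node attached to a
// chain of n - 1 elements, i.e. P(n) follows from P(n - 1).
Chain* make_chain(int n) {
    if (n <= 1) return new Chain{1, nullptr};
    return new Chain{n, make_chain(n - 1)};
}

int chain_length(Chain* c) {
    int len = 0;
    for (; c != nullptr; c = c->next) ++len;
    return len;
}
```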
Induction on a Summation
[TODO: show the closed form of the sum of the first n numbers. This is useful as another example of induction, and also for when insertion sort is covered.]
Asymptotic Notation
In order to analyze which data structure or algorithm is the best suited for the job, it is necessary to understand how a data structure or algorithm behaves over time and space. An important tool in this analysis is asymptotic notation.
Big Oh Notation
For nonnegative functions, f(n) and g(n), if there exists an integer n0 and a constant c > 0 such that for all integers n > n0, f(n) ≤ cg(n), then f(n) is Big Oh of g(n). This is denoted as "f(n) = O(g(n))".
If graphed, g(n) serves as an upper bound to the curve you are analyzing, f(n). It describes the worst that can happen for a given data size.
Big Omega Notation
For nonnegative functions, f(n) and g(n), if there exists an integer n0 and a constant c > 0 such that for all integers n > n0, f(n) ≥ cg(n), then f(n) is omega of g(n). This is denoted as "f(n) = Ω(g(n))".
This is almost the same definition as Big Oh, except that "f(n) ≥ cg(n)" makes g(n) a lower bound function instead of an upper bound function. It describes the best that can happen for a given data size.
Theta Notation
For nonnegative functions, f(n) and g(n), f(n) is theta of g(n) if and only if f(n) = O(g(n)) and f(n) = Ω(g(n)). This is denoted as "f(n) = Θ(g(n))".
This is basically saying that the function f(n) is bounded both from the top and bottom by the same function, g(n).
Little Oh Notation
For nonnegative functions, f(n) and g(n), f(n) is little oh of g(n) if and only if f(n) = O(g(n)), but f(n) ≠ Θ(g(n)). This is denoted as "f(n) = o(g(n))".
This represents a loose bounding version of Big Oh. g(n) bounds from the top, but it does not bound the bottom.
Little Omega Notation
For nonnegative functions, f(n) and g(n), f(n) is little omega of g(n) if and only if f(n) = Ω(g(n)), but f(n) ≠ Θ(g(n)). This is denoted as "f(n) = ω(g(n))".
Much like Little Oh, this is the equivalent for Big Omega. g(n) is a loose lower boundary of the function f(n); it bounds from the bottom, but not from the top.
How asymptotic notation relates to analyzing complexity
If you think of the amount of time and space your algorithm uses as a function of the size of your data (time and space are usually analyzed separately), you can analyze how the time and space requirements grow as you introduce more data to your program.
This is important in data structures because you want a structure that behaves efficiently as you increase the amount of data it handles. Keep in mind though that algorithms that are efficient with large amounts of data are not always simple and efficient for small amounts of data. So if you know you are working with only a small amount of data and you have concerns for speed and code space, a trade-off can be made for a function that does not behave well for large amounts of data.
A few examples of asymptotic notation
Generally, we use asymptotic notation as a convenient way to examine what can happen in a function in the worst case or in the best case. For example, if you want to write a function that searches through an array of numbers and returns the smallest one:
function findmin(array a[1..n])
let j := infinity
for i := 1 to n:
j := min(j, a[i])
repeat
return j
end
Regardless of how big or small the array is, every time we run findmin, we have to initialize the i and j integer variables and return j at the end. Therefore, we can just think of that part of the function as a constant and ignore it.
So, how can we use asymptotic notation to discuss the findmin function? If we search through an array with 87 elements, then the for loop iterates 87 times, even if the very first element we hit turns out to be the minimum. Likewise, for n elements, the for loop iterates n times. Therefore we say the function runs in time O(n).
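A C++ sketch of findmin, assuming a non-empty input and using the largest representable int as a stand-in for infinity:

```cpp
#include <vector>
#include <algorithm>
#include <limits>

// O(n): one comparison per element, plus constant setup and return.
int find_min(const std::vector<int>& a) {
    int j = std::numeric_limits<int>::max();  // stands in for "infinity"
    for (int x : a) j = std::min(j, x);
    return j;
}
```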
What about this function:
function findminplusmax(array a[1..n])
// First, find the smallest element in the array
let j := infinity
for i := 1 to n:
j := min(j, a[i])
repeat
let min := j
// Now, find the biggest element, add it to the smallest and
// return the sum of the two
j := -infinity
for i := 1 to n:
j := max(j, a[i])
repeat
let max := j
return min + max;
end
What's the running time for findminplusmax? There are two for loops that each iterate n times, so the running time is clearly O(2n). Because 2 is a constant, we throw it away and write the running time as O(n). Why can you do this? If you recall the definition of Big-O notation, the function whose bound you're testing can be multiplied by some constant. If f(x) = 2x, we can see that if g(x) = x, then the Big-O condition holds. Thus O(2n) = O(n). This rule is general for the various asymptotic notations.
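A possible C++ version of findminplusmax, again with extreme int values standing in for the infinities:

```cpp
#include <vector>
#include <algorithm>
#include <limits>

// Two O(n) passes over the array, i.e. O(2n) = O(n) overall.
int find_min_plus_max(const std::vector<int>& a) {
    int mn = std::numeric_limits<int>::max();
    for (int x : a) mn = std::min(mn, x);   // first pass: smallest element
    int mx = std::numeric_limits<int>::min();
    for (int x : a) mx = std::max(mx, x);   // second pass: largest element
    return mn + mx;
}
```

The two loops could just as well be fused into a single pass; either way the bound is O(n), which is exactly the point of discarding the constant.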
Arrays
[TODO: Remove language-specific parts; put the yellow box ADT description closer to the top of the page; make indexes only integers; discuss matrices as an application of two-dimensional arrays; discuss insertion sort on an array; discuss binary search on an array; perhaps mutable vs. immutable? (and their speeds?)]
An array is a particular method of storing elements of indexed data. Elements of data are stored sequentially in blocks within the array. Each element is referenced by an index, or subscript.
The index is usually a number used to address an element in the array. For example, if you were storing information about each day in August, you would create an array with an index capable of addressing 31 values, one for each day of the month. Indexing rules are language dependent; however, most languages use either 0 or 1 as the first element of an array.
The concept of an array can be daunting to the uninitiated, but it is really quite simple. Think of a notebook with pages numbered 1 through 12. Each page may or may not contain information on it. The notebook is an array of pages. Each page is an element of the array 'notebook'. Programmatically, you would retrieve information from a page by referring to its number or subscript, i.e., notebook(4) would refer to the contents of page 4 of the array notebook.
The notebook (array) contains 12 pages (elements)
Arrays can also be multi-dimensional: instead of accessing an element of a one-dimensional list, elements are accessed by two or more indices, as in a matrix or tensor.
Multi-dimensional arrays are as simple as our notebook example above. To envision a multi-dimensional array, think of a calendar. Each page of the calendar, 1 through 12, is an element, representing a month, which contains approximately 30 elements, which represent days. Each day may or may not have information in it. Programmatically then, calendar(4,15) would refer to the 4th month, 15th day. Thus we have a two-dimensional array. To envision a three-dimensional array, break each day up into 24 hours. Now calendar(4,15,9) would refer to the 4th month, 15th day, 9th hour.
A simple 6 element by 4 element array
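A sketch of the calendar example in C++ (the helper names are our own). Since C++ indexes from 0, the 4th month, 15th day of the example maps to calendar[3][14]:

```cpp
#include <array>
#include <string>

// A 12-by-31 grid of notes, one slot per (month, day) pair.
using Calendar = std::array<std::array<std::string, 31>, 12>;

// Translate human-friendly 1-based dates to 0-based array indices.
void set_note(Calendar& cal, int month, int day, const std::string& note) {
    cal[month - 1][day - 1] = note;
}

std::string get_note(const Calendar& cal, int month, int day) {
    return cal[month - 1][day - 1];
}
```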
Arrays guarantee constant-time read and write access, O(1); however, many lookup operations (find_min, find_max, find_index) of an instance of an element are linear time, O(n). Arrays are very efficient in most languages, as operations compute the address of an element via a simple formula based on the base address of the array.
The implementation of arrays differs greatly between languages: some languages allow arrays to be resized automatically, or even to contain elements of differing types. Other languages are very strict and require the type and length information of an array to be known at compile time.
Arrays typically map directly to contiguous storage locations within your computer's memory and are therefore the "natural" storage structure for most higher-level languages.
Simple linear arrays are the basis for most of the other data structures. Many languages do not allow you to allocate any structure except an array; everything else must be implemented on top of the array. The exception is the linked list, which is typically implemented as individually allocated objects, but it is possible to implement a linked list within an array.
type
The array index needs to be of some type. Usually, the standard integer type of that language is used, but there are also languages such as Ada and Pascal which allow any discrete type as an array index. Scripting languages often allow any type as an index (associative array).
bounds
The array index consists of a range of values with a lower bound and an upper bound.
In some programming languages only the upper bound can be chosen while the lower bound is fixed to be either 0 (C, C++, C#, Java) or 1 (FORTRAN).
In other programming languages (Ada, PL/I, Pascal), both the upper and lower bounds can be freely chosen (they can even be negative).
bounds check
The third aspect of an array index is the check for valid ranges and what happens when an invalid index is accessed. This is a very important point since the majority of computer worms and computer viruses attack by using invalid array bounds.
There are three options open:
Most languages (Ada, PL/I, Pascal, Java) will check the bounds and raise some error condition when an element is accessed which does not exist.
A few languages (C, C++) will not check the bounds and will return or set some arbitrary value when an element outside the valid range is accessed.
Scripting languages often automatically expand the array when data is written to an index that was not valid until then.
Array Operations
createarray (index_type lowerbound, index_type upperbound)
Create an array of elements (in some languages the elements are homogeneous and the element type must be specified), indexed from lowerbound to upperbound, inclusive. The number of elements in the array, also known as the size of the array, is equal to upperbound - lowerbound + 1.
getelement (index_type index) : element_type
Returns the value of the element at the given index. When an array has a constant size, then the index must be in bounds: lowerbound <= index <= upperbound. This operation is also known as subscripting.
setelement (index_type index, element_type newvalue)
Sets the element of the array at the given index to be equal to newvalue.
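One way to sketch these three operations in C++, with a caller-chosen lower bound as in Ada or Pascal. BoundedArray is our own illustrative name, and the element type is fixed to int for brevity:

```cpp
#include <vector>
#include <stdexcept>

// create-array / get-element / set-element with a chosen lower bound.
class BoundedArray {
    int lower_;
    std::vector<int> data_;
public:
    // size = upperbound - lowerbound + 1, as in the ADT description.
    BoundedArray(int lowerbound, int upperbound)
        : lower_(lowerbound), data_(upperbound - lowerbound + 1) {}

    int size() const { return static_cast<int>(data_.size()); }

    // at() performs the bounds check and throws on an invalid index.
    int get_element(int index) const { return data_.at(index - lower_); }
    void set_element(int index, int newvalue) { data_.at(index - lower_) = newvalue; }
};
```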
Declaring Array Types
The declaration of array type depends on how many features the array in a particular language has.
The easiest declaration is when the language has a fixed lower bound and fixed index type. If you need an array to store the monthly income you could declare in C
typedef double Income[12];
This gives you an array with indices in the range 0 to 11. For a full description of arrays in C see Programming:C_arrays.
If you use a language where you can choose both the lower bound as well as the index type, the declaration is, of course, more complex. Here are two examples in Ada:
type Month is range 1 .. 12;
type Income is array (Month) of Float;
or shorter:
type Income is array (1 .. 12) of Float;
For a full description of arrays in Ada see Programming:Ada:Types:array.
Array Access
We generally write arrays with a name, followed by the index in some brackets, square '[]' or round '()'. For example, august[3] is the method used in the C programming language to refer to a particular day in the month.
Because the C language starts the index at zero, august[3] is the 4th element in the array. august[0] actually refers to the first element of this array. Starting an index at zero is natural for computers, whose internal representations of numbers begin with zero, but for humans, this unnatural numbering system can lead to problems when accessing data in an array. When fetching an element in a language with zero-based indexes, make sure to keep in mind the true length of an array, lest you find yourself fetching the wrong data. This is the disadvantage of programming in languages with fixed lower bounds: the programmer must always remember that "[0]" means "1st" and, when appropriate, add or subtract one from the index. Languages with variable lower bounds will take that burden off the programmer's shoulders.
We use indexes to store related data. If our C language array is called august, and we wish to store that we're going to the supermarket on the 1st, we can say, for example
august[0] = "Going to the shops today"
In this way, we can go through the indexes from 0 to 30 and get the related tasks for each day in august.
Language Specifics
Programming:Ada:Types:array
Programming:C arrays
Programming:Perl_Array_Variables
List Structures and Iterators
We have seen now two different data structures that allow us to store an ordered sequence of elements. However, they have two very different interfaces. The array allows us to use getelement() and setelement() functions to access and change elements. The node chain requires us to use getnext() until we find the desired node, and then we can use getvalue() and setvalue() to access and modify its value. Now, what if you've written some code, and you realize that you should have been using the other sequence data structure? You have to go through all of the code you've already written and change one set of accessor functions into another. What a pain! Fortunately, there is a way to localize this change into only one place: by using the List Abstract Data Type (ADT).
List<itemtype> ADT
getbegin():List Iterator<itemtype>
Returns the list iterator (we'll define this soon) that represents the first element of the list. Runs in O(1) time.
getend():List Iterator<itemtype>
Returns the list iterator that represents one element past the last element in the list. Runs in O(1) time.
prepend(newitem:itemtype)
Adds a new element at the beginning of a list. Runs in O(N) time.
insertafter(iter:List Iterator<itemtype>, newitem:itemtype)
Adds a new element immediately after iter. Runs in O(N) time.
removefirst()
Removes the element at the beginning of a list. Runs in O(N) time.
removeafter(iter:List Iterator<itemtype>)
Removes the element immediately after iter. Runs in O(N) time.
isempty():Boolean
True iff there are no elements in the list. Has a default implementation. Runs in O(1) time.
getsize():Integer
Returns the number of elements in the list. Has a default implementation. Runs in O(N) time.
getnth(n:Integer):itemtype
Returns the nth element in the list, counting from 0. Has a default implementation. Runs in O(N) time.
setnth(n:Integer, newvalue:itemtype)
Assigns a new value to the nth element in the list, counting from 0. Has a default implementation. Runs in O(N) time.
The iterator is another abstraction that encapsulates both access to a single element and incremental movement around the list. Its interface is very similar to the node interface presented in the introduction, but since it is an abstract type, different lists can implement it differently.
List Iterator<itemtype> ADT
getvalue():itemtype
Returns the value of the list element that this iterator refers to.
setvalue(newvalue:itemtype)
Assigns a new value to the list element that this iterator refers to.
movenext()
Makes this iterator refer to the next element in the list.
equal(otheriter:List Iterator<itemtype>):Boolean
True iff the other iterator refers to the same list element as this iterator.
All operations run in O(1) time.
There are several other aspects of the List ADT's definition that need more explanation. First, notice that the getend() operation returns an iterator that is "one past the end" of the list. This makes its implementation a little trickier, but allows you to write loops like:
var iter:List Iterator := list.getbegin()
while(not iter.equal(list.getend()))
# Do stuff with the iterator
iter.movenext()
end while
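C++'s standard iterators follow the same convention: end() refers one past the last element, so a loop runs until the iterator compares equal to it. For example, with std::list:

```cpp
#include <list>

// std::list uses the same "one past the end" convention: end() is not a
// valid element, so the loop stops after visiting the last real element.
int sum_list(const std::list<int>& l) {
    int total = 0;
    for (std::list<int>::const_iterator it = l.begin(); it != l.end(); ++it)
        total += *it;
    return total;
}
```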
Second, each operation gives a worst-case running time. Any implementation of the List ADT is guaranteed to be able to run these operations at least that fast. Most implementations will run most of the operations faster. For example, the node chain implementation of List can run insertafter() in O(1).
Third, some of the operations say that they have a default implementation. This means that these operations can be implemented in terms of other, more primitive operations. They're included in the ADT so that certain implementations can implement them faster. For example, the default implementation of getnth() runs in O(N) because it has to traverse all of the elements before the nth. Yet the array implementation of List can implement it in O(1) using its getelement() operation. The other default implementations are:
abstract type List<itemtype>
method isempty()
return getbegin().equal(getend())
end method
method getsize():Integer
var size:Integer := 0
var iter:List Iterator<itemtype> := getbegin()
while(not iter.equal(getend()))
size := size+1
iter.movenext()
end while
return size
end method
helper method findnth(n:Integer):List Iterator<itemtype>
if n >= getsize()
error "The index is past the end of the list"
end if
var iter:List Iterator<itemtype> := getbegin()
while(n > 0)
iter.movenext()
n := n-1
end while
return iter
end method
method getnth(n:Integer):itemtype
return findnth(n).getvalue()
end method
method setnth(n:Integer, newvalue:itemtype)
findnth(n).setvalue(newvalue)
end method
end type
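The default implementations above can be sketched in Python. This is an illustration, not part of the ADT: the class names `ListDefaults`, `PyList`, and `PyListIterator` are ours, and a plain Python list stands in for a concrete list type so the iterator-only defaults have something to run against.

```python
# Sketch of the ADT's default implementations, written purely in terms of
# getbegin()/getend() iterators with getvalue(), setvalue(), movenext(),
# and equality comparison.
class ListDefaults:
    def isempty(self):
        return self.getbegin() == self.getend()

    def getsize(self):            # O(N): walks the whole list
        size, it = 0, self.getbegin()
        while it != self.getend():
            size += 1
            it.movenext()
        return size

    def _findnth(self, n):        # helper method, O(N)
        if n >= self.getsize():
            raise IndexError("The index is past the end of the list")
        it = self.getbegin()
        for _ in range(n):
            it.movenext()
        return it

    def getnth(self, n):
        return self._findnth(n).getvalue()

    def setnth(self, n, newvalue):
        self._findnth(n).setvalue(newvalue)

# A tiny concrete list so the defaults can be exercised.
class PyListIterator:
    def __init__(self, items, index):
        self.items, self.index = items, index
    def getvalue(self):
        return self.items[self.index]
    def setvalue(self, v):
        self.items[self.index] = v
    def movenext(self):
        self.index += 1
    def __eq__(self, other):
        return self.items is other.items and self.index == other.index

class PyList(ListDefaults):
    def __init__(self, items):
        self.items = items
    def getbegin(self):
        return PyListIterator(self.items, 0)
    def getend(self):
        return PyListIterator(self.items, len(self.items))
```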
[edit]
Syntactic Sugar
Occasionally throughout this book we'll introduce an abbreviation that will allow us to write, and you to read, less pseudocode. For now, we'll introduce an easier way to compare iterators and a specialized loop for traversing sequences.
Instead of using the equal() method to compare iterators, we'll overload the == operator. To be precise, the following two expressions are equivalent:
iter1.equal(iter2)
iter1 == iter2
Second, we'll use the for keyword to express list traversal. The following two blocks are equivalent:
var iter:List Iterator<itemtype> := list.getbegin()
while(not iter == list.getend())
operations on iter
iter.movenext()
end while
for iter in list
operations on iter
end for
[edit]
Implementations
In order to actually use the List ADT, we need to write a concrete data type that implements its interface. There are two standard data types that naturally implement List: the node chain described in the Introduction, normally called a Singly Linked List; and an extension of the array type called Vector, which automatically resizes itself to accommodate inserted nodes.
[edit]
Singly Linked List
type Singly Linked List<itemtype> implements List<itemtype>
head refers to the first node in the list. When it's null, the list is empty.
data head:Node<itemtype>
Initially, the list is empty.
constructor()
head := null
end constructor
method getbegin():Sll Iterator<itemtype>
return new SllIterator(head)
end method
The "one past the end" iterator is just a null node. To see why, think about what you get when you have an iterator for the last element in the list and you call movenext().
method getend():Sll Iterator<itemtype>
return new SllIterator(null)
end method
method prepend(newitem:itemtype)
head := makenode(newitem, head)
end method
method insertafter(iter:Sll Iterator<itemtype>, newitem:itemtype)
var newnode:Node<itemtype> := makenode(newitem, iter.node.getnext())
iter.node.setnext(newnode)
end method
method removefirst()
head := head.getnext()
end method
This takes the node the iterator holds and makes it point to the node two nodes later.
method removeafter(iter:Sll Iterator<itemtype>)
iter.node.setnext(iter.node.getnext().getnext())
end method
end type
If we want to make getsize() be an O(1) operation, we can add an Integer data member that keeps track of the list's size at all times. Otherwise, the default O(N) implementation works fine.
An iterator for a singly linked list simply consists of a reference to a node.
type Sll Iterator<itemtype>
data node:Node<itemtype>
constructor(_node:Node<itemtype>)
node := _node
end constructor
Most of the operations just pass through to the node.
method getvalue():itemtype
return node.getvalue()
end method
method setvalue(newvalue:itemtype)
node.setvalue(newvalue)
end method
method movenext()
node := node.getnext()
end method
For equality testing, we assume that the underlying system knows how to compare nodes for equality. In nearly all languages, this would be a pointer comparison.
method equal(otheriter:List Iterator<itemtype>):Boolean
return node == otheriter.node
end method
end type
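The chapter's Singly Linked List and its iterator can be sketched together in Python. This is an illustrative translation: `None` stands in for the null node, Python's `is` supplies the pointer comparison, and the `to_list()` helper is our own addition for inspection, not part of the ADT.

```python
class Node:
    def __init__(self, value, next=None):
        self.value, self.next = value, next

class SllIterator:
    """An iterator is just a reference to a node; None means one past the end."""
    def __init__(self, node):
        self.node = node
    def getvalue(self):
        return self.node.value
    def setvalue(self, v):
        self.node.value = v
    def movenext(self):
        self.node = self.node.next
    def equal(self, other):
        return self.node is other.node   # pointer comparison

class SinglyLinkedList:
    def __init__(self):
        self.head = None                 # empty list

    def getbegin(self):
        return SllIterator(self.head)

    def getend(self):
        return SllIterator(None)         # one past the end

    def prepend(self, item):
        self.head = Node(item, self.head)

    def insertafter(self, it, item):
        it.node.next = Node(item, it.node.next)

    def removefirst(self):
        self.head = self.head.next

    def removeafter(self, it):
        # Make the iterator's node point to the node two nodes later.
        it.node.next = it.node.next.next

    def to_list(self):                   # convenience helper for inspection
        out, n = [], self.head
        while n is not None:
            out.append(n.value)
            n = n.next
        return out
```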
[edit]
Vector
Let's write the Vector's iterator first. It will make the Vector's implementation clearer.
type Vector Iterator<itemtype>
data array:Array<itemtype>
data index:Integer
constructor(my_array:Array<itemtype>, my_index:Integer)
array := my_array
index := my_index
end constructor
method getvalue():itemtype
return array.getelement(index)
end method
method setvalue(newvalue:itemtype)
array.setelement(index, newvalue)
end method
method movenext()
index := index+1
end method
method equal(otheriter:List Iterator<itemtype>):Boolean
return array==otheriter.array and index==otheriter.index
end method
end type
We implement the Vector in terms of the primitive Array data type. It is inefficient to always keep the array exactly the right size (think of how much resizing you'd have to do), so we store both a size, the number of logical elements in the Vector, and a capacity, the number of spaces in the array. The array's valid indices will always range from 0 to capacity-1.
type Vector<itemtype>
data array:Array<itemtype>
data size:Integer
data capacity:Integer
We initialize the vector with a capacity of 10. Choosing 10 was fairly arbitrary. If we'd wanted to make it appear less arbitrary, we would have chosen a power of 2, and innocent readers like you would assume that there was some deep, binary-related reason for the choice.
constructor()
array := createarray(0, 9)
size := 0
capacity := 10
end constructor
method getbegin():Vector Iterator<itemtype>
return new VectorIterator(array, 0)
end method
The end iterator has an index of size. That's one more than the highest valid index.
method getend():List Iterator<itemtype>
return new VectorIterator(array, size)
end method
We'll use this method to help us implement the insertion routines. After it is called, the capacity of the array is guaranteed to be at least newcapacity. A naive implementation would simply allocate a new array with exactly newcapacity elements and copy the old array over. To see why this is inefficient, think what would happen if we started appending elements in a loop. Once we exceeded the original capacity, each new element would require us to copy the entire array. That's why this implementation at least doubles the size of the underlying array any time it needs to grow.
helper method ensurecapacity(newcapacity:Integer)
If the current capacity is already big enough, return quickly.
if capacity >= newcapacity
return
end if
Now, find the new capacity we'll need,
var allocatedcapacity:Integer := max(capacity*2, newcapacity)
var newarray:Array<itemtype> := createarray(0, allocatedcapacity - 1)
copy over the old array,
for i in 0..size-1
newarray.setelement(i, array.getelement(i))
end for
and update the Vector's state.
array := newarray
capacity := allocatedcapacity
end method
This method uses a normally-illegal iterator, which refers to the item one before the start of the Vector, to trick insertafter() into doing the right thing. By doing this, we avoid duplicating code.
method prepend(newitem:itemtype)
insertafter(new VectorIterator(array, -1), newitem)
end method
insertafter() needs to copy all of the elements between iter and the end of the Vector. This means that in general, it runs in O(N) time. However, in the special case where iter refers to the last element in the vector, we don't need to copy any elements to make room for the new one. An append operation can run in O(1) time, plus the time needed for the ensurecapacity() call. ensurecapacity() will sometimes need to copy the whole array, which takes O(N) time. But much more often, it doesn't need to do anything at all.
Amortized Analysis
In fact, if you think of a series of append operations starting immediately after ensurecapacity() increases the Vector's capacity (call the capacity here C), and ending immediately after the next increase in capacity, you can see that there will be exactly C appends. At the later increase in capacity, it will need to copy C elements over to the new array. So this entire sequence of C function calls took O(C) operations. We call this situation, where there are O(N) operations for O(N) function calls, "amortized O(1) time".
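The doubling argument can be checked empirically. This sketch (the function `count_copies` is our own, hypothetical helper) counts how many element copies a doubling vector performs over a run of appends; the total stays below 2N, which is what "amortized O(1)" means in practice:

```python
# Counts element copies done by a doubling vector over n_appends appends.
# With capacity doubling, the total number of copies stays under 2 * n_appends.
def count_copies(n_appends, initial_capacity=10):
    size, capacity, copies = 0, initial_capacity, 0
    for _ in range(n_appends):
        if size + 1 > capacity:
            copies += size          # grow: copy every existing element over
            capacity *= 2
        size += 1
    return copies

print(count_copies(1000))
```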
method insertafter(iter:Vector Iterator<itemtype>, newitem:itemtype)
ensurecapacity(size+1)
This loop copies each element after iter into the spot one index up. We loop backwards in order to make room for each successive element just before we copy it.
for i in size-1 .. iter.index+1 step -1
array.setelement(i+1, array.getelement(i))
end for
Now that there is an empty space in the middle of the array, we can put the new element there.
array.setelement(iter.index+1, newitem)
And update the Vector's size.
size := size+1
end method
Again, this cheats a little bit to avoid duplicating code.
method removefirst()
removeafter(new VectorIterator(array, -1))
end method
Like insertafter(), removeafter needs to copy all of the elements between iter and the end of the Vector. So in general, it runs in O(N) time. But in the special case where iter refers to the last element in the vector, we can simply decrement the Vector's size, without copying any elements. A removelast operation runs in O(1) time.
method removeafter(iter:List Iterator<itemtype>)
for i in iter.index+1 .. size-2
array.setelement(i, array.getelement(i+1))
end for
size := size1
end method
This method has a default implementation, but we're already storing the size, so we can implement it in O(1) time, rather than the default's O(N).
method getsize():Integer
return size
end method
Because an array allows constant-time access to elements, we can implement getnth() and setnth() in O(1), rather than the default implementation's O(N).
method getnth(n:Integer):itemtype
return array.getelement(n)
end method
method setnth(n:Integer, newvalue:itemtype)
array.setelement(n, newvalue)
end method
end type
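The Vector can be sketched in Python as follows. This is an illustration, not the book's implementation: a fixed-size Python list plays the role of the primitive Array, and for brevity the iterators are replaced by plain integer indices (with -1 as the "one before the start" position that prepend() and removefirst() exploit).

```python
class Vector:
    """Growable array: `size` logical elements inside `capacity` slots."""
    def __init__(self):
        self.capacity = 10
        self.array = [None] * self.capacity
        self.size = 0

    def _ensurecapacity(self, newcapacity):
        if self.capacity >= newcapacity:
            return                                   # already big enough
        allocated = max(self.capacity * 2, newcapacity)
        newarray = [None] * allocated
        newarray[:self.size] = self.array[:self.size]  # copy old elements
        self.array, self.capacity = newarray, allocated

    def insertafter(self, index, item):
        """Insert after position `index`; index -1 means prepend."""
        self._ensurecapacity(self.size + 1)
        for i in range(self.size - 1, index, -1):    # shift right, backwards
            self.array[i + 1] = self.array[i]
        self.array[index + 1] = item
        self.size += 1

    def prepend(self, item):
        self.insertafter(-1, item)                   # the -1 trick

    def removeafter(self, index):
        for i in range(index + 1, self.size - 1):    # shift left
            self.array[i] = self.array[i + 1]
        self.size -= 1

    def removefirst(self):
        self.removeafter(-1)

    def getnth(self, n):
        return self.array[n]                         # O(1)

    def setnth(self, n, value):
        self.array[n] = value                        # O(1)

    def getsize(self):
        return self.size                             # O(1): size is stored
```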
[edit]
Bidirectional Lists
[TODO: This is very terse. It should be fleshed out later.]
Sometimes we want to move backward in a list too.
Bidirectional List<itemtype> ADT
getbegin():Bidirectional List Iterator<itemtype>
Returns the list iterator (we'll define this soon) that represents the first element of the list. Runs in O(1) time.
getend():Bidirectional List Iterator<itemtype>
Returns the list iterator that represents one element past the last element in the list. Runs in O(1) time.
insert(iter:Bidirectional List Iterator<itemtype>, newitem:itemtype)
Adds a new element immediately before iter. Runs in O(N) time.
remove(iter:Bidirectional List Iterator<itemtype>)
Removes the element immediately referred to by iter. After this call, iter will refer to the next element in the list. Runs in O(N) time.
isempty():Boolean
True iff there are no elements in the list. Has a default implementation. Runs in O(1) time.
getsize():Integer
Returns the number of elements in the list. Has a default implementation. Runs in O(N) time.
getnth(n:Integer):itemtype
Returns the nth element in the list, counting from 0. Has a default implementation. Runs in O(N) time.
setnth(n:Integer, newvalue:itemtype)
Assigns a new value to the nth element in the list, counting from 0. Has a default implementation. Runs in O(N) time.
Bidirectional List Iterator<itemtype> ADT
getvalue():itemtype
Returns the value of the list element that this iterator refers to. Undefined if the iterator is pasttheend.
setvalue(newvalue:itemtype)
Assigns a new value to the list element that this iterator refers to. Undefined if the iterator is pasttheend.
movenext()
Makes this iterator refer to the next element in the list. Undefined if the iterator is pasttheend.
moveprevious()
Makes this iterator refer to the previous element in the list. Undefined if the iterator refers to the first list element.
equal(otheriter:List Iterator<itemtype>):Boolean
True iff the other iterator refers to the same list element as this iterator.
All operations run in O(1) time.
[edit]
Doubly Linked List Implementation
[edit]
Vector Implementation
The vector we've already seen has a perfectly adequate implementation to be a Bidirectional List. All we need to do is add the extra member functions to it and its iterator; the old ones don't have to change.
type Vector<itemtype>
... # alreadyexisting data and methods
Implement this in terms of the original insertafter() method. After that runs, we have to adjust iter's index so that it still refers to the same element.
method insert(iter:Bidirectional List Iterator<itemtype>, newitem:itemtype)
insertafter(new VectorIterator(iter.array, iter.index-1), newitem)
iter.movenext()
end method
Also implement this one in terms of an old function. After removeafter() runs, the index will already be correct.
method remove(iter:Bidirectional List Iterator<itemtype>)
removeafter(new VectorIterator(iter.array, iter.index-1))
end method
end type
[edit]
Tradeoffs
[TODO: Refactor useful information from below into above]
[TODO: following outline from talk page:]
Common list operations & example
set(pos, val), get(pos), remove(pos), append(val)
[Perhaps discuss operations on Nodes, versus operations on the List
itself: example, Nodes have a next operation, but the List itself has
a pointer to the head and tail node, and an integer for the number of
elements. This view would work well in both the Lisp and OO worlds.]
Doubly linked list
Vectors (resizeable arrays)
In order to choose the correct data structure for the job, we need to have some idea of what we're going to *do* with the data.
Do we know that we'll never have more than 100 pieces of data at any one time, or do we need to occasionally handle gigabytes of data?
How will we read the data out? Always in chronological order? Always sorted by name? Randomly accessed by record number?
Will we always add/delete data at the end or at the beginning? Or will we be doing a lot of insertions and deletions in the middle?
We must strike a balance between the various requirements. If we need to frequently read data out in 3 different ways, pick a data structure that allows us to do all 3 things nottooslowly. Don't pick some data structure that's unbearably slow for *one* way, no matter how blazingly fast it is for the other ways.
Often the shortest, simplest programming solution for some task will use a linear (1D) array.
If we keep our data as an ADT, that makes it easier to temporarily switch to some other underlying data structure, and objectively measure whether it's faster or slower.
[edit]
Advantages / Disadvantages
For the most part, an advantage of an array is a disadvantage of a linkedlist, and vice versa.
Array Advantages (vs. Linked Lists)
Index - Fast access to every element in the array using an index []; not so with a linked list, where elements at the beginning must be traversed to reach your desired element.
Faster - In general, it is faster to access an element in an array than to access an element in a linked list.
Array Disadvantages (vs. Linked Lists)
No Resizing - Cannot easily resize the array to accommodate more elements in an already full array (easily done in a linked list).
No Insertion - Cannot easily insert an element in the middle of an array (much easier to do in a linked list).
Linked List Advantages (vs. Arrays)
Resize - Can easily resize the linked list by adding elements without affecting the majority of the other elements in the list.
Insertion - Can easily insert an element in the middle of a linked list (the element is created and then pointers are set to link it to the other element(s) in the list).
Linked List Disadvantages (vs. Arrays)
No Index - Elements cannot be accessed directly with []. Although it is possible to code a layer that provides [] index functionality, it is much slower.
Slower - In general, it is slower to access an element in a linked list than to access an element in an array.
Sidenote: How to insert an element in the middle of an array. If the array is not full, you take all the elements after the spot or index where you want to insert, move them forward by 1, then insert your element. If the array is already full and you want to insert an element, you would have to, in a sense, resize the array: a new array would have to be made one size larger than the original array, then all the elements of the original array are copied to the new array, taking into consideration the spot or index at which to insert your element, and then your element is inserted.
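The sidenote's full-array case can be sketched as follows (the function name `insert_into_full_array` is our own, illustrative choice):

```python
# Insert into a "full" array by allocating a one-larger array and copying
# the elements around the insertion point, as the sidenote describes.
def insert_into_full_array(arr, index, value):
    new_arr = [None] * (len(arr) + 1)
    new_arr[:index] = arr[:index]          # elements before the spot
    new_arr[index] = value                 # the new element
    new_arr[index + 1:] = arr[index:]      # later elements, moved forward by 1
    return new_arr

print(insert_into_full_array([1, 2, 4, 5], 2, 3))  # [1, 2, 3, 4, 5]
```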
[TODO: queue implemented as an array: circular and fixedsized]
[edit]
Stacks and Queues
[edit]
Stacks
A stack is a basic data structure that is used throughout programming. The idea is to think of your data as a stack of plates or books, where you can only take the top item off the stack in order to remove things from it.
A stack is also called a LIFO (Last In First Out) to demonstrate the way it accesses data.
Stack<itemtype> Operations
push(newitem:itemtype)
Adds an item onto the stack.
top():itemtype
Returns the last item pushed onto the stack.
pop()
Removes the mostrecentlypushed item from the stack.
isempty():Boolean
True iff no more items can be popped and there is no top item.
isfull():Boolean
True iff no more items can be pushed.
getsize():Integer
Returns the number of elements on the stack.
All operations except getsize() can be performed in O(1) time. getsize() runs in at worst O(N).
[edit]
Linked List Implementation
The linked list implementation is one of the easiest stack implementations you can write. Structurally, it is simply a linked list.
type Stack<item_type>
data list:Singly Linked List<item_type>
constructor()
list := new SinglyLinkedList()
end constructor
Most operations are implemented by passing them through to the underlying linked list. When you want to push something onto the stack, you simply add it to the front of the linked list. The previous top is then "next" from the item being added, and the list's front pointer points to the new item.
method push(new_item:item_type)
list.prepend(new_item)
end method
To look at the top item, you just examine the first item in the linked list.
method top():item_type
return list.getbegin().getvalue()
end method
When you want to pop something off the list, simply remove the first item from the linked list.
method pop()
list.removefirst()
end method
A check for emptiness is easy. Just check if the list is empty.
method isempty():Boolean
return list.isempty()
end method
A check for full is simple. Linked lists are considered to be limitless in size.
method isfull():Boolean
return False
end method
A check for the size is again passed through to the list.
method getsize():Integer
return list.getsize()
end method
end type
A real Stack implementation in a published library would probably reimplement the linked list in order to squeeze the last bit of performance out of the implementation by leaving out unneeded functionality. The above implementation gives you the ideas involved, and any optimization you need can be accomplished by inlining the linked list code.
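As a sketch of that idea, here is an illustrative Python stack with the linked list inlined, plus a size counter so getsize() runs in O(1) (the counter is our own addition, one of the time/space tradeoffs discussed below):

```python
class Node:
    def __init__(self, value, next=None):
        self.value, self.next = value, next

class Stack:
    """LIFO stack backed by an inlined singly linked chain."""
    def __init__(self):
        self.head = None
        self._size = 0

    def push(self, item):
        self.head = Node(item, self.head)   # prepend to the chain: O(1)
        self._size += 1

    def top(self):
        return self.head.value              # first item in the chain

    def pop(self):
        self.head = self.head.next          # drop the first item
        self._size -= 1

    def isempty(self):
        return self.head is None

    def isfull(self):
        return False                        # a linked stack never fills up

    def getsize(self):
        return self._size                   # O(1): counter kept by push/pop
```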
[edit]
Performance Analysis
In a linked list, accessing the first element is an O(1) operation because the list contains a pointer directly to it. Therefore, push, pop, and top are quick O(1) operations.
The checks for empty/fullness as done here are also O(1).
The performance of getSize() depends on the performance of the corresponding operation in the linked list implementation. It could be either O(n), or O(1), depending on what time/space tradeoff is made. Most of the time, users of a Stack do not use the getSize() operation, and so a bit of space can be saved by not optimizing it.
[edit]
Array Implementation
[edit]
Performance Analysis
[edit]
Related Links
Stack (Wikipedia)
[edit]
Queues
A queue is a basic data structure that is used throughout programming. You can think of it as a line in a grocery store. The first one in the line is the first one to be served.
A queue is also called a FIFO (First In First Out) to demonstrate the way it accesses data.
Queue<itemtype> Operations
enqueue(newitem:itemtype)
Adds an item onto the end of the queue.
front():itemtype
Returns the item at the front of the queue.
dequeue()
Removes the item from the front of the queue.
isempty():Boolean
True iff no more items can be dequeued and there is no front item.
isfull():Boolean
True iff no more items can be enqueued.
getsize():Integer
Returns the number of elements in the queue.
All operations except getsize() can be performed in O(1) time. getsize() runs in at worst O(N).
[edit]
Linked List Implementation
The basic linked list implementation uses a singlylinked list with a tail pointer to keep track of the back of the queue.
type Queue<item_type>
data list:Singly Linked List<item_type>
data tail:List Iterator<item_type>
constructor()
list := new SinglyLinkedList()
tail := list.getbegin() # null
end constructor
When you want to enqueue something, you simply insert it after the item referred to by the tail iterator. The previous tail's "next" is then the item being added, and the tail iterator moves on to the new item. If the list was empty, this doesn't work, since the tail iterator doesn't refer to anything.
method enqueue(new_item:item_type)
if isempty()
list.prepend(new_item)
tail := list.getbegin()
else
list.insertafter(tail, new_item)
tail.movenext()
end if
end method
The front item on the queue is just the one referred to by the linked list's head pointer
method front():item_type
return list.getbegin().getvalue()
end method
When you want to dequeue something off the list, simply remove the first item from the linked list; the old head item is the one you removed. If the list is now empty, we have to fix the tail iterator.
method dequeue()
list.removefirst()
if isempty()
tail := list.getbegin()
end if
end method
A check for emptiness is easy. Just check if the list is empty.
method isempty():Boolean
return list.isempty()
end method
A check for full is simple. Linked lists are considered to be limitless in size.
method isfull():Boolean
return False
end method
A check for the size is again passed through to the list.
method getsize():Integer
return list.getsize()
end method
end type
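The same structure can be sketched in illustrative Python, with the chain inlined and the tail kept as a direct node reference rather than an iterator (a simplification of the pseudocode above):

```python
class Node:
    def __init__(self, value, next=None):
        self.value, self.next = value, next

class Queue:
    """FIFO queue: singly linked chain plus a tail reference for O(1) enqueue."""
    def __init__(self):
        self.head = None
        self.tail = None

    def enqueue(self, item):
        node = Node(item)
        if self.head is None:        # empty list: tail refers to nothing yet
            self.head = node
        else:
            self.tail.next = node    # append after the old tail
        self.tail = node

    def front(self):
        return self.head.value       # the item at the head of the chain

    def dequeue(self):
        self.head = self.head.next   # drop the first item
        if self.head is None:        # list emptied: fix the tail reference
            self.tail = None

    def isempty(self):
        return self.head is None
```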
[edit]
Performance Analysis
In a linked list, accessing the first element is an O(1) operation because the list contains a pointer directly to it. Therefore, enqueue, front, and dequeue are quick O(1) operations.
The checks for empty/fullness as done here are also O(1).
The performance of getSize() depends on the performance of the corresponding operation in the linked list implementation. It could be either O(n), or O(1), depending on what time/space tradeoff is made. Most of the time, users of a Queue do not use the getSize() operation, and so a bit of space can be saved by not optimizing it.
[edit]
Array Implementation
[edit]
Performance Analysis
[edit]
Related Links
Queue (Wikipedia)
[edit]
Dequeues
As a start:
Dequeue
[edit]
Trees
As a start:
wikipedia:binary tree
wikipedia:AVL tree
[TODO: cover binary search trees, tree traversals, iterators for trees, general trees, and balanced binary trees]
[edit]
Min and Max Heaps
A heap is an efficient data structure which supports two operations:
INSERT(heap, element)
element REMOVE_MIN(heap)
(We discuss min-heaps, but there's no real difference between min and max heaps, except in how the comparison is interpreted.)
A heap is implemented using an array that is indexed from 1 to N, where N is the number of elements in the heap.
At any time, the heap must satisfy the heap invariants
array[n] <= array[2*n]
and
array[n] <= array[2*n+1]
whenever the indices are in the array's bounds.
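The invariant can be checked mechanically. This illustrative Python helper (the name `is_min_heap` is ours) uses a 1-indexed array with slot 0 unused, as in the text:

```python
# Checks the min-heap invariant on a 1-indexed array (index 0 unused):
# array[n] <= array[2*n] and array[n] <= array[2*n+1]
# whenever those indices are in the array's bounds.
def is_min_heap(array):
    n = len(array) - 1               # number of elements; array[0] is unused
    for i in range(1, n + 1):
        for child in (2 * i, 2 * i + 1):
            if child <= n and array[i] > array[child]:
                return False
    return True

print(is_min_heap([None, 1, 3, 2, 7, 4]))  # True
```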
[edit]
Compute the extreme value
We will prove that array[1] is the minimum element in the heap. We prove it by seeing a contradiction if some other element is less than the first element. Suppose array[i] is the first instance of the minimum, with array[j] > array[i] for all j < i, and i >= 2. Then floor(i / 2) < i, so array[floor(i / 2)] > array[i]; but by the heap invariant, array[floor(i / 2)] <= array[i]: this is a contradiction.
Therefore, it is easy to compute MIN(heap):
MIN(heap)
return heap.array[1];
[edit]
Removing the Extreme Value
To remove the minimum element, we must adjust the heap to fill heap.array[1]. This process is called percolation. Basically, we move the hole from node i to either node 2i or 2i + 1. If we pick the minimum of these two, the heap invariant will be maintained; suppose
