Archive for March, 2011

Sort Algorithms

So, I’ve been preparing for some job interviews lately. Of the highly technical nature. These are not the ones where they ask you what you’ve been working on or what your experience is in. These are the ones where they ask you questions to test the depth of your knowledge in Computer Science, algorithms, data structures, theory and practice. Stuff you learned as a first and second year undergrad and haven’t actually put into practice much. Not because you didn’t need to use these concepts or they were of no real importance. Rather because most languages and libraries have these things built in and you don’t normally have to implement them except in the most extreme cases.

I started going over my old college notes (yes, I still have them) to refresh my memory. I was surprised by how much I had forgotten. This post I’ll start with sorting algorithms, mainly as a refresher for me and hopefully this proves helpful to someone else. Each algorithm is implemented as a C function that takes two parameters. The first parameter is an array of integers. The second parameter is the length (number of elements) of the array.

Note that some of these examples do some things that break the actual efficiency of the algorithm (for instance using memmove() in insertion sort). This is done to focus simply on the sort algorithm without having to get into too much detail about implementing more efficient data structures that are more suited to the algorithm. Those instances are noted below.

Bubble Sort

The first sort is probably the simplest and most straight forward of all the sorts. It is the bubble sort. The bubble sort starts at the beginning of the array and compares the first two values. If the first value is greater than the second, they are swapped. It then compares the second and third elements swapping them if needed and continues with each pair of adjacent values in the array. Once the first pass is complete, the largest value is in the last position in the array. The algorithm starts again and repeats until no values need to be swapped.

Bubble sort has an average and worse case performance complexity of `0(n^2)`. Because of this poor performance it is not very useful for large data sets. However, it is easy to understand and easy to implement so it can be useful on smaller data sets in situations where efficiency isn’t terribly important.

void bubble_sort(int* arr, int len)
{
	int i, j, swp;
	for(i = 0; i < len; i++)
	{
		for(j = 0; j < len - i - 1; j++) 
		{ 
			if(arr[j] > arr[j + 1])
			{
				swp = arr[j];
				arr[j] = arr[j + 1];
				arr[j + 1] = swp;
			}
		}
	}
}

Selection Sort

The selection sort starts at the beginning of the data and scans for (selects) the minimum value. Once the minimum value is found, it is swapped with the value in the first position. The algorithm starts again at the second position and repeats. Selection sort also has performance complexity of `0(n^2)`. However, since it performs no more than `n` swaps it performs better than bubble sort.

void selection_sort(int* arr, int len)
{
	int i, j, si, swp;
	for(i = 0; i < len - 1; i++)
	{
		si = i;
		for(j = i + 1; j < len; j++)
		{
			if(arr[j] < arr[si])
				si = j;
		}

		if(si != i)
		{
			swp = arr[i];
			arr[i] = arr[si];
			arr[si] = swp;
		}
	}
}

Insertion Sort

Insertion sort is similar to selection sort in that it selects each value and inserts it into its proper place. When an element is inserted into its position, all the following elements must be shifted by one. This algorithm is useful in instances where insertion is fast such as with linked lists or in data that is already mostly sorted.

Note that this implementation I use an array using memmove() to make room for the insert. I do this for simplicity to demonstrate the algorithm. I wouldn’t recommend using this on an array in practice.

void insertion_sort(int* arr, int len)
{
	int i, j, si, swp;
	for(i = 0; i < len - 1; i++)
	{
		si = i;
		for(j = i + 1; j < len; j++)
		{
			if(arr[j] < arr[si])
				si = j;
		}

		if(si != i)
		{
			swp = arr[si];
			memmove(&arr[i + 1], &arr[i], (si - i) * sizeof(int));
			arr[i] = swp;
		}
	}
}

Shell Sort

The shell sort is a more advanced sorting algorithm that arranges the input data into a series of columns and then sorts each column using insertion sort. By sorting data in columns, the data is able to be moved over greater distances more quickly than would happen with the standard insertion sort. Here is the algorithm written in C. Following, I’ll walk through an application of the algorithm on a random data set.

void shell_sort(int* arr, int len)
{
	int i, j, gap, swp;
	gap = len / 2;
	while(gap > 0)
	{
		for(i = gap; i < len; i++)
		{
			swp = arr[i];
			j = i;
			while(j >= gap && arr[j - gap] > swp)
			{
				arr[j] = arr[j - gap];
				j -= gap;
			}

			arr[j] = swp;
		}

		gap = gap / 2;
	}
}

The random data set is an array of 15 randomly generated numbers. We first calculate the gap and arrange the data into columns based on the gap.
amath
[33, 38, 73, 29, 6, 72, 71, 19, 53, 57, 43, 38, 5, 83, 7]

l = 15, g = l / 2 = 7

[[33, 38, 73, 29, 6, 72, 71],[19, 53, 57, 43, 38, 5, 83],[7,,,,,,]]
endamath

In this algorithm we start in the first column and save the 19 in swp. We then check each value in the column above the 19 and if it is greater, move the number down the column by one. Finally we put the number in its correct position, in this case at the beginning of the column which will produce the following:
amath
[[19, 38, 73, 29, 6, 72, 71],[33, 53, 57, 43, 38, 5, 83],[7,,,,,,]]
endamath

We repeat for each column:
amath
[[19, 38, 57, 29, 6, 5, 71],[33, 53, 73, 43, 38, 72, 83],[7,,,,,,]]
endamath

Note that `j` began with the second element in each column and wraps to the third row (then 4th, etc… depending on how many elements are in each column) which looks at the `7` and puts it in its place.
amath
[[7, 38, 57, 29, 6, 5, 71],[19, 53, 73, 43, 38, 72, 83],[33,,,,,,]]

[7, 38, 57, 29, 6, 5, 71, 19, 53, 73, 43, 38, 72, 83, 33]
endamath

Notice now that the `7` which was last is now in the first position. Values have the potential to jump great distances quite quickly. Now we recalculate the gap and start again.
amath
g = g / 2 = 3

[[7, 38, 57], [29, 6, 5], [71, 19, 53], [73, 43, 38], [72, 83, 33]] => [[7, 6, 5], [29, 38, 57], [71, 19, 53], [73, 43, 38], [72, 83, 33]] => [[7, 6, 5], [29, 19, 53], [71, 38, 57], [73, 43, 38], [72, 83, 33]] => [[7, 6, 5], [29, 19, 38], [71, 38, 53], [73, 43, 57], [72, 83, 33]] => [[7, 6, 5], [29, 19, 33], [71, 38, 38], [72, 43, 53], [73, 83, 57]]

[7, 6, 5, 29, 19, 33, 71, 38, 38, 72, 43, 53, 73, 83, 57]
endamath

Finally, when gap is `1`, shell sort is essentially an insertion sort on a mostly sorted array for which it performs well.
amath
g = g / 2 = 1

[[7], [6], [5], [29], [19], [33], [71], [38], [38], [72], [43], [53], [73], [83], [57]] => [[6], [7], [5], [29], [19], [33], [71], [38], [38], [72], [43], [53], [73], [83], [57]] => [[5], [6], [7], [29], [19], [33], [71], [38], [38], [72], [43], [53], [73], [83], [57]] => [[5], [6], [7], [19], [29], [33], [71], [38], [38], [72], [43], [53], [73], [83], [57]] => [[5], [6], [7], [19], [29], [33], [38], [71], [38], [72], [43], [53], [73], [83], [57]] => [[5], [6], [7], [19], [29], [33], [38], [38], [71], [72], [43], [53], [73], [83], [57]] => [[5], [6], [7], [19], [29], [33], [38], [38], [43], [71], [72], [53], [73], [83], [57]] => [[5], [6], [7], [19], [29], [33], [38], [38], [43], [53], [71], [72], [73], [83], [57]] => [[5], [6], [7], [19], [29], [33], [38], [38], [43], [53], [57], [71], [72], [73], [83]]

[5, 6, 7, 19, 29, 33, 38, 38, 43, 53, 57, 71, 72, 73, 83]
endamath

Merge Sort

Merge sort is an `O(nlogn)` sorting algorithm that gets its name from merging two already sorted list into a single sorted list. The algorithm begins by dividing the data in two and recursively sorting the two sub lists. When the recursive calls return, the sub lists are sorted and can then be merged into a final list.

Note, this implementation is inefficient using an inplace array and memmove. In practice, merge sort works best where insertions are inexpensive such as linked lists or copying the two sub arrays into a separate large array.

void merge_sort(int* arr, int len)
{
	int *l, *r;
	int i, swp, lenl, lenr;

	if(len <= 1)
		return;

	lenl = len / 2;
	lenr = len - lenl;

	l = arr;
	r = arr + lenl;

	merge_sort(l, lenl);
	merge_sort(r, lenr);

	for(i = 0; i < len && lenl > 0 && lenr > 0; i++)
	{
		if(*l > *r)
		{
			swp = *r;
			memmove(&arr[i + 1], &arr[i], (r - &arr[i]) * sizeof(int));
			arr[i] = swp;
			r++; lenr--;
			l++;
		}
		else
		{
			l++; lenl--;
		}
	}
}

Heap Sort

Heap sort uses a heap data structure to determine the minimum or maximum value in the data. The data is first placed in a heap structure, in this example a max-heap. Once in this structure, the largest value is at the top of the heap. The heap here is represented in the array where array[0] is the root node, array[1] is the left child, array [2] is the right child, array [3] is the left child of array[1], array[4] is the right child of array[1], etc…

Once the data is heapified, the largest value (root) is swapped with the last value in the array (its final position) and the array is again heapified (excluding the value in now in its correct position). This process is repeated until the array is completely sorted.

void sift_heap(int* arr, int r, int len)
{
	int c, si, swp;

	while(r * 2 + 1 < len)
	{
		c = r * 2 + 1;
		si = r;
		if(arr[si] < arr[c])
			si = c;
		if(c < len - 1 && arr[si] < arr[c + 1])
			si = c + 1;
		if(si != r)
		{
			swp = arr[r];
			arr[r] = arr[si];
			arr[si] = swp;
			r = si;
		}
		else
			break;
	}
}

void heapify(int* arr, int len)
{
	int i, r, c, si, swp;
	for(i = len / 2 - 1; i >= 0; i--)
	{
		sift_heap(arr, i, len);
	}
}

void heap_sort(int* arr, int len)
{
	int i, swp;
	heapify(arr, len);
	for(i = len; i > 1; i--)
	{
		swp = arr[0];
		arr[0] = arr[i - 1];
		arr[i - 1] = swp;

		sift_heap(arr, 0, i - 1);
	}

}

Quick Sort

Quick sort works by choosing an “pivot” element and moving all elements with a value lower than the pivot value before the pivot and all elements higher than the pivot after. Once the values have been moved, the pivot is in its final position and quick sort is recursively called on each sub list before and after the pivot.

Quick sort has an average complexity of `O(nlogn)` but can have a worst case `o(n^2)` with poorly chosen pivots (for example if the pivot chosen is always the largest or smallest value).

void quick_sort(int* arr, int len)
{
	int i, j, p, swp;
	if(len <= 1)
		return;

	p = 0;
	j = 1;
	for(i = 1; i < len; i++)
	{
		if(arr[i] < arr[p])
		{
			swp = arr[i];
			arr[i] = arr[j];
			arr[j] = swp;
			j++;
		}
	}

	swp = arr[p];
	arr[p] = arr[j - 1];
	arr[j - 1] = swp;

	quick_sort(arr, j);
	quick_sort(arr + j, len - j);
}

Bucket Sort

Bucket sort organizes the data into buckets that hold a range of values, for example 0-10, 11-20, 21-30, etc... Then each bucket is sorted using a different algorithm, in this case, insertion sort.

void bucket_sort(int* arr, int len)
{
	int i, j, k, n, max, **buckets;
	max = arr[0];
	for(i = 1; i < len; i++)
		if(arr[i] > max)
			max = arr[i];

	n = (int)floor(sqrt(max));
	n = max / n;

	buckets = (int**)malloc(sizeof(int*) * (n + 1));
	memset(buckets, 0, sizeof(int*) * (n + 1));

	for(i = 0; i < len; i++)
	{
		if(buckets[arr[i] / n] == NULL)
		{
			buckets[arr[i] / n] = (int*)malloc(sizeof(int) * len);
			memset(buckets[arr[i] / n], -1, sizeof(int) * len);
		}

		for(j = 0; buckets[arr[i] / n][j] != -1; j++);
		buckets[arr[i] / n][j] = arr[i];
	}

	k = 0;
	for(i = 0; i <= n; i++)
	{
		if(buckets[i])
		{
			for(j = 0; buckets[i][j] != -1; j++)
			{
				arr[k++] = buckets[i][j];
			}

			free(buckets[i]);
		}
	}

	free(buckets);

	insertion_sort(arr, len);
}

Tags: , ,

Sunday, March 6th, 2011 Algorithms, C, Programming No Comments