Linear Search versus Binary Search

There are many ways of approaching the problem of searching for an item within an array. A linear search simply looks through every single item in a list in order until the relevant item is found. If we imagine searching a dictionary for a word, a linear search would start at “A” and look through every word beginning with that letter, then move onto the “B” section, then the “C” section and so on until the word is found.

This is not the most efficient way in which to use the dictionary though. If the word we are looking for is “transistor”, we’ll have to trawl through nearly the entire book to find it as it’s nearer the end. A faster approach is to take advantage of the fact that the dictionary is in a sorted order. This is what a binary search does.

Let’s say the dictionary has 200 pages. To perform a binary search we start by dividing the pages of the dictionary into two even sets right down the middle. So we go to the centre of the dictionary and have exactly 100 pages to the left and 100 pages to the right. At the exact centre of the dictionary we find the word “golf”.

Now we ask the question is transistor in the left set of 100 pages or the right set? We know due to the sorted order of the words it must be on a later page than golf so it’s in the right-hand set of 100 pages. Now we consider the right pages and divide them in two parts exactly down the middle. So we are now at page 150 of the dictionary with 50 pages to the left and 50 pages to the right. We find ourselves at the word “psychic”. Now we ask again is transistor in the left or right set of pages? It’s in the right set, so we divide the right hand set of pages in two again. We go the middle (page 175) and now have 25 pages to the left and 25 pages to the right. Now we are at the word “steamroller”. Transistor is still further along, so we repeat the process again and end up at page 187 with the word “trusted”. We have 12 (and a half) pages to the left and the same to the right.

Transistor is to left of trusted in the dictionary. So we go to the left hand set of 12.5 pages and divide it in two again exactly down the middle. We have 6.25 pages either side and are at the word “tautology”. Dividing into two groups again, transistor is to the the right. Then we have a choice between two groups of three pages and finally we have whittled it down to almost the exact page that the word transistor is on.

We could keep going below the size of one page and consider parts of a page but when we get fairly close to what we are looking for it is reasonable to stop doing a binary search, switch to a linear search and just simply look through the few remaining items in order.

A binary search works only if the data is in sorted order. If the data is not sorted, then we will have to resort to a linear search. If we only want to look through the data once, sorting it could take longer than performing a linear search. However, if we want to search the data several times, then it may be faster to sort it first and then do a binary search for each of of the items we want to look up.

Implementation of a Binary Search in Python

The following code shows an implementation of a binary search in Python, which looks for an animal within a list of animals in sorted order:

#An implementation of a binary search algorithm to search an array of items
def binary_search_algorithm( data, search_query ):

	#Start with the upper bound being the last item in the data
	upper_bound = len(data) - 1
	#The lower bound is the first item in the data
	lower_bound = 0
	
	while lower_bound <= upper_bound:
	
		#Get the array index of the middle item 
		middle_index = int(( lower_bound + upper_bound )/2)
		
		#Check if the middle item is the item we are currently searching for
		if data[middle_index] == search_query:
			#if so, return the array index where it is found
			return middle_index
		
		#Is the item we are looking for to the left of middle?
		elif search_query < data[middle_index]:
			#Change the upper bound to be (just to the left of) the middle but keep the lower bound the same
			upper_bound = middle_index - 1
			
		#Otherwise the item we are looking for is to the right of middle
		else:
			#Change the lower bound to be (just to the right of) middle but keep the upper bound the same
			lower_bound = middle_index + 1
			
	#The item does not exist in the data
	return None

if __name__ == "__main__":

	#This is the list of data items to search within
	data_to_search = [ "aardvark", "cat", "dog", "duck", "fox", "hippo", "pigeon", "rabbit", "squirrel", "zebra" ]
	
	#This is the specific item we are looking for
	search_for_item = "hippo"
	
	#Run the binary search
	search_result = binary_search_algorithm( data_to_search, search_for_item )
	
	if search_result is None:
		print(f'{search_for_item} was not found in the data')
	else:
		print(f'{search_for_item} was found in the data at array index {search_result}')