Abstract Data Types
All programming languages come with a set of basic (or primitive) data types. These normally include at least the following, although there can be many others:
- Integers - whole numbers
- Floats - that is, floating point data types which are numbers that may have a decimal point.
- Strings - sequences of characters or text
In many programming languages when you make a variable (declare it) you are asked to specify which of these types it is going to be. Some programming languages, like Python, are weakly typed which means you don’t actually have to directly specify the type if you don’t want to and can let the programming language automatically figure it out from the context of whatever value you put into the variable (assigned to it). If you put some text into the variable, then the language realises the variable type must be a string. If you put a decimal number in then it realises it must be a float. You can be assured though that under the hood in the internals of the language, the variable will have a type even if you don’t specify it or really know what it is. The opposite of weak typing is strong typing where the programming language requires you to state the type of the every variable in advance.
Primitive types are conceptually very simple. You have a variable (a storage box) into which you can put a value. You can retrieve the value from the box and replace it with a different value and that’s about all you can do. Modern programming languages though come bundled with more complex Abstract Data Types, which are kinds of storage boxes that have more sophisticated funtionality than the primitive types.
One such kind of storage box is the list. These store a collection of values instead of just a single value. You can put as many items as you want into the box (as long as you have enough memory). The list usually allows you to add an item to the list, retrieve an item from a specific position in the list (e.g. get the 3rd item) and also insert items into the list.
A list is somewhat like an array. One difference is an array is a pre-allocated set of empty slots. You ask for an array with 100 slots and can immediately insert at position 99 if you want. A list usually starts with no items allocated and you must extend it to the length you need. If you want to add to the list at position 99 you must first insert 98 preceding items in order to insert the 99th item.
An advantage of the list is when an array is created with 100 items, it then immediately consumes the memory necessary to store 100 items but a list does not. A list only uses the memory necessary to store 100 items when that many are inserted. With an array you must know exactly how many slots you need in advance, but with a list you do not need to specify this. A list is therefore useful when it is not known how many items need to be stored at the point it is created. A list though usually requires more memory than an array to store the same number of items due to the costs of the internal housekeeping it does. Therefore, if the precise number of items that need to be stored is known in advance, an array could be a more efficient choice.
Another advantage of the list is that it usually provides features to easily insert and remove items in the middle, whereas an array does not. If you have 100 items in an array and need to insert at item 50, firstly you have to perform a complicated and costly operation to move items 50 - 100 one place to the right to make a space in which the new item can be inserted. This is not necessary with a list and you can just tell the programming language to insert at item 50 and it works out how to do it. One method by which lists are implemented is through the linked list algorithm.
A queue data type is a more restrictive kind of list. You can add to the end of the list and remove from the front. You cannot usually change what goes on the middle. This is somewhat like a supermarket queue where you join the end and are served when you reach the front. Each person is served in the order they originally arrived. A queue is also known as a First-In-First-Out structure (or FIFO).
Queues are commonly used in programming to provide interfaces between operations that may work at different speeds. A computer can prepare pages of text in a fraction of a second, but a printer can only print the text comparatively slowly, possibly taking several minutes. Pages of text to be printed may be held on a queue pending printing. The computer inserts the pages at the end of the queue very rapidly and the printer retrieves the pages one at a time from the front as it is able. When the computer has finished inserting all the pages into the queue, it may move onto other tasks even if the printer has not yet printed all the pages, so it does not become stuck waiting for the slow printer to finish.
A stack data type works like a pile of paper. You can add a piece of paper only to the top of the pile. You can also remove items but again only from the top of the pile. If you want to read the item from the bottom of the pile, you have to remove all of the pieces of paper from the top one-by-one until you get to the last item at the bottom. A stack is also known as a Last-In-First-Out (or LIFO) queue.
The list type in a programming language may provide the features of both a stack and queue, however the latter types may exist separately for specialist reasons.
A tree maintains the items in the box in an ordered way and is used to speed up searching. With a list, if a programmer wants to find a specific item they must start at the beginning of the list and work their way along until the item is found (which might be right at the end). This is not necessary in a tree. There are several kinds of tree but one common kind is the binary tree.
A dictionary is another structure for speeding up searching of items. Items placed in the box can be quickly retrieved by item name, rather than having to search through them one at a time. A common method of implementing the dictionary is with a hash table. In some programming languages dictionaries are called hash tables instead after the implementation method.
In most programming languages it is possible to make your own abstract data types. You could make a new abstract type that works differently to the above to solve a specific problem. In older programming languages you used to have to make all of them yourself, but these days at least the above (and a lot more) are usually available as standard.
Photo Credit: Jim Champion