8 data structures every programmer must know

A data structure is a special way of organizing and storing data that allows us to perform operations on the stored data more efficiently. Data structures have wide and varied uses in computer science and software engineering.

Almost every program or software system developed uses data structures. Furthermore, data structures are fundamental to computer science and software engineering. This is a key topic when it comes to software engineering interview questions. Therefore, as developers, we must have a good understanding of data structures.

In this article, I will briefly explain 8 common data structures that every programmer must know.

1. Array

Arrays are fixed-size structures that hold items of the same data type. An array can hold integers, floats, strings, or even other arrays (for example, a 2D array). Arrays are indexed, which means random access is possible.

Fig 1. Visualization of basic Terminology of Arrays

Array operations
· Traversal: Traverse all elements and print them.

· Insert: Insert one or more elements into an array.

· Delete: Remove elements from an array.

· Search: Search for an element in an array. You can search for elements by their value or index.

· Update: Update the value of an existing element at a given index.
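
The operations listed above can be sketched in Python. This is only an illustration under one assumption: a Python list stands in for the array, although a list is resizable rather than fixed-size. The variable names are made up for the example.

```python
# Illustrative only: a Python list stands in for an array here,
# although it is resizable rather than fixed-size.
numbers = [10, 20, 30, 40]

# Traversal: visit every element in order.
visited = [value for value in numbers]

# Insert: place 25 at index 2; later elements shift right.
numbers.insert(2, 25)          # [10, 20, 25, 30, 40]

# Delete: remove the element at index 0; later elements shift left.
del numbers[0]                 # [20, 25, 30, 40]

# Search by value (linear scan) and by index (direct access).
position_of_30 = numbers.index(30)
second_element = numbers[1]

# Update: overwrite the element at index 3.
numbers[3] = 45                # [20, 25, 30, 45]
```

Note that insertion and deletion shift the elements that follow, while access by index does not need to touch any other element.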

Applications of Arrays
· Used as a basis for building other data structures, such as array lists, heaps, hash tables, vectors, and matrices.

· Used for different sorting algorithms such as insertion sort, quick sort, bubble sort and merge sort.

2. Linked list

A linked list is a sequential structure consisting of a linear sequence of items linked to each other. So you have to access the data sequentially and random access is not possible. Linked lists provide a simple and flexible representation of dynamic sets.

Let us consider the following terms regarding linked lists. You can get a clear idea by referring to Figure 2.

· The elements in a linked list are called nodes.

· Each node contains a key and a pointer to its successor node (called next).

· The attribute named head points to the first element of the linked list.

· The last element of the linked list is called the tail.

Fig 2. Visualization of basic Terminology of Linked Lists

Below are the various types of linked lists available.

· Singly linked list: Items can only be traversed in the forward direction.

· Doubly linked list: Items can be traversed in both the forward and backward directions. Each node has an additional pointer called previous, which points to the previous node.

· Circular linked list: A linked list in which the next pointer of the tail points to the head. In the doubly linked variant, the previous pointer of the head also points to the tail.

Linked list operations
· Search: Find the first element with key k in the given linked list through a simple linear search and return a pointer to that element.

· Insert: Insert a key into the linked list. Insertion can be done in 3 different ways: at the beginning of the list, at the end of the list, or in the middle of the list.

· Delete: Remove an element x from the given linked list. A node cannot be deleted in a single step, because the pointer of its neighboring node must be updated as well. Deletion can be done in 3 different ways: from the beginning of the list, from the end of the list, or from the middle of the list.
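
As a minimal sketch of the search, insert, and delete operations described above, the following Python code implements a singly linked list. The class and method names (Node, SinglyLinkedList, insert_front, and so on) are hypothetical and chosen only for this example; only insertion at the beginning is shown.

```python
class Node:
    """One element of the list: a key plus a pointer to its successor."""
    def __init__(self, key, next=None):
        self.key = key
        self.next = next


class SinglyLinkedList:
    def __init__(self):
        self.head = None  # points to the first node, or None when empty

    def insert_front(self, key):
        """Insert a key at the beginning of the list."""
        self.head = Node(key, self.head)

    def search(self, key):
        """Linear search: return the first node holding key, or None."""
        node = self.head
        while node is not None and node.key != key:
            node = node.next
        return node

    def delete(self, key):
        """Unlink the first node holding key; return True if one was found."""
        previous, node = None, self.head
        while node is not None and node.key != key:
            previous, node = node, node.next
        if node is None:
            return False
        if previous is None:
            self.head = node.next      # deleting the head
        else:
            previous.next = node.next  # bypass the deleted node
        return True

    def keys(self):
        """Collect the keys from head to tail."""
        result, node = [], self.head
        while node is not None:
            result.append(node.key)
            node = node.next
        return result


items = SinglyLinkedList()
for key in (3, 2, 1):
    items.insert_front(key)   # list is now 1 -> 2 -> 3
items.delete(2)               # list is now 1 -> 3
```

Deletion has to find the node before the one being removed so that its next pointer can be redirected, which is why it takes more than one step.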

Applications of linked lists
· Used for symbol table management in compiler design.

· Used to switch between programs using Alt+Tab (implemented using a circular linked list).

3. Stack

A stack is a LIFO (last in, first out) structure: the element placed last is accessed first. It is commonly found in many programming languages. The structure is called a "stack" because it resembles a real-world stack, such as a stack of plates.
Stack Operations
Given below are 2 basic operations that can be performed on the stack. Please refer to Figure 3 for a better understanding of stack operations.

· Push: Insert an element at the top of the stack.

· Pop: Remove the top element and return it.

Fig 3. Visualization of basic Operations of Stacks

In addition, the following functions are provided to check the status of the stack.

· Peek: Returns the top element of the stack without removing it.

· isEmpty: Checks whether the stack is empty.

· isFull: Checks whether the stack is full.
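
The stack operations above can be sketched as a small Python class. This is an illustrative sketch, not a standard library API: the Stack class, its capacity parameter, and the method names are assumptions made for the example, with a Python list as the underlying storage.

```python
class Stack:
    """A bounded LIFO stack backed by a Python list (hypothetical example class)."""
    def __init__(self, capacity):
        self.capacity = capacity
        self._items = []

    def is_empty(self):
        return len(self._items) == 0

    def is_full(self):
        return len(self._items) == self.capacity

    def push(self, item):
        """Insert an element at the top of the stack."""
        if self.is_full():
            raise OverflowError("stack is full")
        self._items.append(item)

    def pop(self):
        """Remove the top element and return it."""
        if self.is_empty():
            raise IndexError("pop from empty stack")
        return self._items.pop()

    def peek(self):
        """Return the top element without removing it."""
        if self.is_empty():
            raise IndexError("peek at empty stack")
        return self._items[-1]


stack = Stack(capacity=2)
stack.push("a")
stack.push("b")
top = stack.peek()      # "b" stays on the stack
popped = stack.pop()    # removes and returns "b"
```

The end of the list plays the role of the top of the stack, so push and pop both work on the same end.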

Applications of the stack
· For expression evaluation (for example: the shunting yard algorithm for parsing and evaluating mathematical expressions).

· Used to implement function calls in recursive programming.

4. Queue

A queue is a FIFO (first in, first out) structure: the element placed first is accessed first. It is commonly found in many programming languages. The structure is called a "queue" because it resembles a real-world queue, such as people waiting in line.
Queue Operations
Given below are 2 basic operations that can be performed on a queue. Please refer to Figure 4 for a better understanding of queue operations.

· Enqueue: Insert an element at the end of the queue.

· Dequeue: Remove the element at the beginning of the queue.

Fig 4. Visualization of Basic Operations of Queues
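
A brief sketch of enqueue and dequeue in Python, assuming the standard library's collections.deque as the underlying container (the article itself does not prescribe an implementation):

```python
from collections import deque

queue = deque()

# Enqueue: add elements at the end of the queue.
queue.append("first")
queue.append("second")
queue.append("third")

# Dequeue: remove the element at the beginning of the queue.
served = queue.popleft()
remaining = list(queue)
```

A deque is used here rather than a plain list because removing from the front of a list shifts every remaining element.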

Queue applications
· Used to manage threads in multithreading.

· Used to implement queuing systems (e.g. priority queues).

5. Hash table

A hash table is a data structure that stores values, each of which has an associated key. It supports efficient lookup when we know the key associated with a value, so insertions and searches are very efficient on average, regardless of the size of the data.

Direct addressing uses a one-to-one mapping between keys and table slots when storing values in a table. However, this approach is problematic when there are a large number of key-value pairs. The table will have many records, be very large, and may be impractical or even impossible to store given the memory available on a typical computer. To avoid this problem, we use a hash table.

Hash Function
A special function called a hash function (h) is used to overcome the above problem in direct addressing.

In direct addressing, the value with key k is stored in slot k. Using a hash function, we instead compute the index of the table slot in which each value is stored. The value computed by the hash function for a given key is called the hash value, and it gives the index of the slot to which the value is mapped.

h(k) = k mod m

· h: Hash function

· k: The key whose hash value should be determined

· m: size of the hash table (number of available slots). A prime number that is not close to an exact power of 2 is a good choice for m.


Fig 5. Representation of a Hash Function

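For example, the following keys map to these table indices (key → h(key) → table index). The indices shown are consistent with h(k) = k mod 20, that is, a table with m = 20 slots:

· 1 → 1 mod 20 → 1

· 5 → 5 mod 20 → 5

· 23 → 23 mod 20 → 3

· 63 → 63 mod 20 → 3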

From the last two examples given above, we can see that collisions occur when a hash function generates the same index for multiple keys. We can resolve collisions by choosing a suitable hash function h and using techniques such as chaining and open addressing.
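
To make collision handling concrete, here is a minimal sketch of a hash table that resolves collisions by chaining. It assumes the hash function h(k) = k mod m with m = 20, which is consistent with the indices in the examples above but is an assumption of this sketch; the ChainedHashTable class and its put/get methods are hypothetical names.

```python
class ChainedHashTable:
    """Hash table that resolves collisions by chaining (illustrative sketch)."""
    def __init__(self, m=20):
        self.m = m                                # number of slots (assumed)
        self.slots = [[] for _ in range(m)]       # one chain (list) per slot

    def _h(self, k):
        return k % self.m                         # assumed hash function

    def put(self, k, value):
        chain = self.slots[self._h(k)]
        for i, (existing_key, _) in enumerate(chain):
            if existing_key == k:
                chain[i] = (k, value)             # overwrite an existing key
                return
        chain.append((k, value))                  # new key joins the chain

    def get(self, k):
        for existing_key, value in self.slots[self._h(k)]:
            if existing_key == k:
                return value
        raise KeyError(k)


table = ChainedHashTable(m=20)
for key in (1, 5, 23, 63):
    table.put(key, f"value-{key}")
```

Keys 23 and 63 both land in slot 3, so that slot's chain holds two entries, and a lookup compares keys within the chain to find the right one.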

Applications of hash tables
· Used to implement database indexes.

· Used to implement associative arrays.

· Used to implement the "set" data structure.

6. Tree

A tree is a hierarchical structure in which data items are linked together in parent-child relationships. This structure differs from a linked list, in which items are linked in linear order.

Over the past few decades, various types of trees have been developed to suit certain applications and meet certain constraints. Some examples are binary search trees, B-trees, red-black trees, splay trees, AVL trees, and n-ary trees.

Binary Search Tree
As the name suggests, a Binary Search Tree (BST) is a binary tree in which data is organized in a hierarchical structure. This data structure stores values in sorted order, which we will study in detail in this course.

Each node in a binary search tree contains the following properties.

· key: the value stored in the node.

· left: Pointer to the left child.

· right: Pointer to the right child.

· p: Pointer to the parent node.

Binary search trees have a unique property that distinguishes them from other trees. This property is called the binary-search-tree property.

Let x be a node in the binary search tree.

· If y is a node in the left subtree of x, then y.key ≤ x.key

· If y is a node in the right subtree of x, then y.key ≥ x.key


Fig 6. Visualization of Basic Terminology of Trees.
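
The node attributes and the binary-search-tree property described above can be sketched in Python as follows. The attribute names key, left, right, and p follow the text; the function names bst_insert, bst_search, and inorder are hypothetical, and this sketch sends equal keys to the right subtree.

```python
class BSTNode:
    def __init__(self, key, p=None):
        self.key = key      # value stored in the node
        self.left = None    # pointer to the left child
        self.right = None   # pointer to the right child
        self.p = p          # pointer to the parent node


def bst_insert(root, key):
    """Insert key while keeping the BST property; return the root."""
    if root is None:
        return BSTNode(key)
    node = root
    while True:
        if key < node.key:
            if node.left is None:
                node.left = BSTNode(key, p=node)
                return root
            node = node.left
        else:
            if node.right is None:
                node.right = BSTNode(key, p=node)
                return root
            node = node.right


def bst_search(root, key):
    """Follow left/right pointers until key is found or a leaf is passed."""
    node = root
    while node is not None and node.key != key:
        node = node.left if key < node.key else node.right
    return node


def inorder(node):
    """In-order traversal yields the keys in sorted order."""
    if node is None:
        return []
    return inorder(node.left) + [node.key] + inorder(node.right)


root = None
for key in (8, 3, 10, 1, 6):
    root = bst_insert(root, key)
```

Because every left subtree holds smaller-or-equal keys and every right subtree holds larger-or-equal keys, an in-order traversal visits the keys in sorted order.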

Applications of trees
· Binary tree: used to implement expression parsers and expression solvers.

· Binary search trees: used in many search applications where data is constantly entering and leaving.

· Heap: used by the JVM (Java Virtual Machine) to store Java objects.

· Treap: used in wireless networking.

7. Heap

A heap is a special case of a binary tree where the values of parent nodes are compared with their children and arranged accordingly.

Let's see how to represent a heap. Heaps can be represented using trees and arrays. Figures 7 and 8 show how we use binary trees and arrays to represent binary heaps.

Fig 7. Binary Tree Representation of a Heap


Fig 8. Array Representation of a Heap

Heaps can be of 2 types.

· Min-heap: The key of a parent is less than or equal to the keys of its children. This is called the min-heap property. The root contains the minimum value of the heap.

· Max-heap: The key of a parent is greater than or equal to the keys of its children. This is called the max-heap property. The root contains the maximum value of the heap.
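
As a rough sketch of the array representation and the two heap properties, the code below assumes 0-based indexing, where the children of the element at index i sit at indices 2i + 1 and 2i + 2 (a 1-based layout would use different formulas). The function names and sample arrays are made up for the example.

```python
def parent(i):
    return (i - 1) // 2

def left(i):
    return 2 * i + 1

def right(i):
    return 2 * i + 2

def is_min_heap(a):
    """True if every non-root element is >= its parent (min-heap property)."""
    return all(a[parent(i)] <= a[i] for i in range(1, len(a)))

def is_max_heap(a):
    """True if every non-root element is <= its parent (max-heap property)."""
    return all(a[parent(i)] >= a[i] for i in range(1, len(a)))

min_heap = [1, 3, 2, 7, 4]   # root 1 is the minimum
max_heap = [9, 7, 8, 1, 4]   # root 9 is the maximum
```

With this layout the tree needs no explicit pointers: the position of an element in the array determines its parent and children.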

Applications of Heap
· Used to implement priority queues because priority values can be ordered based on heap properties.

· Priority-queue operations such as inserting an element and removing the highest-priority element can be implemented using a heap in O(log n) time.

· Used to find the k smallest (or largest) values in a given array.

· Used in the heap sort algorithm.
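
The applications above can be tried out with Python's standard heapq module, which maintains a min-heap inside a plain list. The task names and values below are made up, and heap_sort is a hypothetical helper that sorts by repeatedly popping the minimum rather than the classic in-place variant.

```python
import heapq

# Priority queue: (priority, task) tuples, smallest priority value first.
tasks = []
heapq.heappush(tasks, (2, "write report"))
heapq.heappush(tasks, (1, "fix outage"))
heapq.heappush(tasks, (3, "refill coffee"))
most_urgent = heapq.heappop(tasks)

# k smallest (or largest) values in a given array.
values = [9, 4, 7, 1, 8, 2]
three_smallest = heapq.nsmallest(3, values)
two_largest = heapq.nlargest(2, values)

# Heap sort (simplified): build a heap, then pop the minimum repeatedly.
def heap_sort(items):
    heap = list(items)
    heapq.heapify(heap)
    return [heapq.heappop(heap) for _ in range(len(heap))]
```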

8. Graph

A graph consists of a finite set of vertices or nodes and a set of edges connecting these vertices.

The order of a graph is the number of vertices in the graph. The size of a graph is the number of edges in the graph.

If two nodes are connected to each other by the same edge, they are called adjacent nodes.

Directed Graph
A graph G is called a directed graph if all its edges have a direction indicating the start vertex and the end vertex.

We say that an edge (u, v) is incident from, or leaves, vertex u and is incident to, or enters, vertex v.

Self-loop: an edge from a vertex to itself.

Undirected Graph
If none of the edges of a graph G has a direction, it is called an undirected graph. Each edge can be traversed in both directions between its two vertices.

A vertex is said to be isolated if it is not connected to any other node in the graph.

Fig 9. Visualization of Terminology of Graphs
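
The terminology above (order, size, adjacency, directed and undirected edges, isolated vertices) can be illustrated with an adjacency-set representation. This is one possible sketch; the Graph class and its method names are hypothetical.

```python
class Graph:
    def __init__(self, directed=False):
        self.directed = directed
        self.adjacent = {}     # vertex -> set of adjacent vertices
        self.edge_count = 0

    def add_vertex(self, v):
        self.adjacent.setdefault(v, set())

    def add_edge(self, u, v):
        self.add_vertex(u)
        self.add_vertex(v)
        if v in self.adjacent[u]:
            return             # ignore duplicate edges
        self.adjacent[u].add(v)
        if not self.directed:
            self.adjacent[v].add(u)   # undirected: usable in both directions
        self.edge_count += 1

    def order(self):
        """Number of vertices."""
        return len(self.adjacent)

    def size(self):
        """Number of edges."""
        return self.edge_count

    def isolated_vertices(self):
        """Vertices that are not connected to any other vertex."""
        connected = set()
        for u, targets in self.adjacent.items():
            for v in targets:
                if u != v:     # a self-loop does not connect to another vertex
                    connected.update((u, v))
        return {v for v in self.adjacent if v not in connected}


g = Graph()                    # undirected
g.add_edge("A", "B")
g.add_edge("B", "C")
g.add_vertex("D")              # no edges, so D is isolated

d = Graph(directed=True)
d.add_edge("u", "v")           # edge leaves u and enters v
```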

Applications of graphs
· Used to represent social media networks. Each user is a vertex, and an edge is created when users are connected.

· Used by search engines to represent web pages and links. Web pages on the Internet are linked to each other through hyperlinks. Each page is a vertex, and the hyperlink between two pages is an edge. Used for page ranking in Google.

· Used to represent locations and routes in GPS. Locations are vertices, and routes connecting locations are edges. Used to calculate the shortest path between two locations.
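
As a simplified illustration of the shortest-path use case, the sketch below runs a breadth-first search over an unweighted graph and returns a path with the fewest edges. Real navigation systems work with weighted edges and algorithms such as Dijkstra's, so this is only a stand-in for the idea; the function name shortest_path and the location names are invented for the example.

```python
from collections import deque

def shortest_path(adjacent, start, goal):
    """Return a path with the fewest edges from start to goal, or None."""
    previous = {start: None}          # also serves as the visited set
    frontier = deque([start])
    while frontier:
        vertex = frontier.popleft()
        if vertex == goal:
            path = []
            while vertex is not None: # walk back to the start
                path.append(vertex)
                vertex = previous[vertex]
            return path[::-1]
        for neighbour in adjacent.get(vertex, ()):
            if neighbour not in previous:
                previous[neighbour] = vertex
                frontier.append(neighbour)
    return None

# Hypothetical locations (vertices) and routes (edges).
roads = {
    "home": ["cafe", "park"],
    "cafe": ["home", "office"],
    "park": ["home", "library"],
    "library": ["park", "office"],
    "office": ["cafe", "library"],
}
route = shortest_path(roads, "home", "office")
```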