Back to home
## Graph Data Structure

Logicmojo - Updated Dec 19, 2022

Graphs are flexible data structures that may represent and solve a wide range of real-world problems. Electronic circuit boards, highways and intersections inside a city, airline routes or roadways connecting different cities, product catalogues, movies and actors, and so on can all be represented using graphs. Interview questions on graphs are frequently asked in programming interviews due to their utility in representing a wide range of real-world circumstances.

Graphs are not basic data structures in software programming, but they are used to simulate graph representations using other core data structures such as arrays, sets, and so on.

If you're interviewing for a job as a Java Developer or a software engineer, you can be asked multiple questions about this topic. Answering questions
based on graph and related programmes might be difficult for applicants. In this article, we'll look at some of the most often
requested graph interview questions. This article will

assist you in better comprehending the notion and preparing you to land your dream job!

A graph is a data structure made up of a finite number of nodes (or vertices) and the edges that connect them. An edge is a pair of vertices (x,y) that conveys that the x vertex connects to the y vertex.

Graphs are used to address real-world problems in which the problem area is represented as a network. Telephone networks, circuit networks, and social networks are examples of networks (like LinkedIn, Facebook etc.).

Graph | Tree |
---|---|

It is non-linear data structure | It is non-linear data structure. |

Each node can have any number of edges. | Nodes in general trees can have any number of child nodes. In binary trees, however, each node can have no more than two children. |

In a graph, there is no such thing as a root node. | In trees, there is a unique node known as the root. |

A cycle can be formed on graph. | In tree cycle can't be formed |

It is used to discover the shortest path in a networking graph. | The tree is utilised in applications such as game trees and decision trees. |

EdgeList: We have a two-dimensional array of vertex numbers, or an array of objects holding the vertex numbers of the vertices where the edges intersect (plus weight). Edge lists are straightforward, but if we want to see if the graph contains a specific edge, we must search the edge list. A linear search through E edges occurs when the edges appear in the edge list in no particular order.

Example: [ [0,1], [0,6], [0,8], [1,4], [1,6], [1,9], [2,4], [2,6], [3,4], [3,5], [3,8], [4,5], [4,9], [7,8], [7,9] ]

Adjacency Matrices: We can query if edge(i j) is in the graph with an adjacency matrix by looking at the relevant item in the matrix - we can query whether edge(i j) is in the graph by looking at graph[i][j] value. The adjacency matrix for a sparse graph is primarily 0s, and we require a lot of space to describe only a few edges. The adjacency matrix for an undirected graph is symmetric.

[ [0, 1, 0, 0, 0, 0, 1, 0, 1, 0], [1, 0, 0, 0, 1, 0, 1, 0, 0, 1], [0, 0, 0, 0, 1, 0, 1, 0, 0, 0], [0, 0, 0, 0, 1, 1, 0, 0, 1, 0], [0, 1, 1, 1, 0, 1, 0, 0, 0, 1], [0, 0, 0, 1, 1, 0, 0, 0, 0, 0], [1, 1, 1, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 1, 1], [1, 0, 0, 1, 0, 0, 0, 1, 0, 0], [0, 1, 0, 0, 1, 0, 0, 1, 0, 0] ]

Adjacency Lists: Keep an array of the vertices adjacent to each vertex i. (or array of tuples for weighted graph). We go to i's adjacency list in constant time and look for j in i's adjacency list to see if an edge (i,j) is present in the graph.

[ [1, 6, 8], // 0 [0, 4, 6, 9], // 1 [4, 6], // 2 [4, 5, 8], [1, 2, 3, 5, 9], [3, 4], [0, 1, 2], [8, 9], [0, 3, 7], [1, 4, 7] ] // N

The depth-first search algorithm is used to traverse or search data structures such as trees and graphs. The algorithm starts at the root node (in the case of a graph, any random node can be used as the root node) and examines each branch as far as feasible before backtracking.

Each vertex of the graph is classified into one of two groups in a basic DFS implementation.

The algorithm's goal is to avoid cycles by marking each vertex as visited.

Algorithm

1. Place any of the graph's vertices on top of a stack to begin.

2. Add the top item in the stack to the list of things you've seen.

3. Make a list of the nodes that are adjacent to that vertex. Place those that aren't on the visited list at the top of the stack.

4. Repeat steps 2 and 3 until the stack is exhausted.

Graphs are used in computer science to depict the flow of computation.

Google Maps builds transportation systems using graphs, in which the intersection of two (or more) roads is referred to as a vertex, and the road connecting two vertices is referred to as an edge. As a result, their navigation system is based on an algorithm that calculates the shortest path between two vertices.

Users on Facebook are referred to as vertices, and if they are friends, there is an edge connecting them. Graph theory is used in Facebook's Friend recommendation system. Facebook is an example of undirected graph.

We come across the Resource Allocation Graph in the Operating System, where each process and resource are regarded vertices. From resources to assigned processes, or from the requesting process to the desired resource, edges are drawn. A stalemate will develop if this results in the establishment of a cycle.

BFS is a graph traversal technique in which you start at a specific node (source or starting node) and traverse the graph layer by layer, thus examining the neighbours (nodes which are directly connected to source node). Then you must proceed to the next-level neighbour nodes.

The following is how the algorithm works:

1. Put any of the graph's vertices at the end of a queue to begin.

2. Add the item at the front of the queue to the visited list.

3. Make a list of the nodes that are adjacent to that vertex. Place those that aren't on the visited list at the bottom of the queue.

4. Steps 2 and 3 should be repeated until the queue is empty.

BFS | DFS |
---|---|

For determining the shortest path, BFS (Breadth First Search) employs a Queue data structure. | For determining the shortest path, DFS (Breadth First Search) employs a Stack data structure. |

Because BFS reaches a vertex with the fewest number of edges from a source vertex, it can be used to identify single source shortest path in an unweighted graph. | To get from a source vertex to a destination vertex in DFS, we may have to traverse through additional edges. |

BFS is better at finding vertices that are close to the specified source. | When there are solutions that are not available at the source, DFS is a better option. |

BFS has a time complexity of O(V + E) when using an Adjacency List and O(V2) when using an Adjacency Matrix, where V stands for vertices and E stands for edges. | DFS has a time complexity of O(V + E) when using an Adjacency List and O(V2) when using an Adjacency Matrix, where V stands for vertices and E stands for edges. |

BFS necessitates a larger amount of memory. | DFS has a lower memory requirement. |

A bipartite graph, also known as a bi-graph, is a set of graph vertices, or locations where several lines intersect, that has been decomposed into two disjoint sets, meaning they have no element in common and no two graph vertices inside the same set are adjacent.

DFS's time complexity is not O(V+E).

In reality, it is dependent on the data structure you use to represent your graph. If you express your graph with an adjacency matrix, for example, it will be O(V2).i.e. independent
of E; alternatively, if you express your graph with an edge list, it will be O. (VE)

It's only O(V+E) if you express your graph with an adjacency list. Assume you have an adjacency list in which each node's neighbours are kept as a link list (or may be vector.) Each node and its neighbours will be explored exactly once in DFS. Exploring node i takes O(1) time, while exploring its neighbours takes O(degree(i)) time.

O(Σ(1+degree(i))) 1<=i<=V

As Σdegree(i) = 2*E = O(E)

So O( Σ(1+degree(i)) ) = O(V+E)

DFS is a recursive algorithm; you can construct it using a stack, but that doesn't change the fact that it's recursive, thus DO NOT USE it on infinitely deep graphs. For infinitely deep graphs, DFS is not a complete method (it does not guarantee to reach the goal if there is any).

Using DFS is not a smart idea even if your graph is very deep but you know your aim is shallow. Also, because BFS requires all current nodes to be kept in memory, DO NOT USE it on graphs with a large branching factor, otherwise your machine will quickly run out of memory.

In general, depending on the graph, it may or may not be possible. A stack is used in a depth-first search, which contains nodes from the root to the node being searched. As a result, the graph's radius is the maximum.

A queue is used in a breadth-first search, which contains nodes at the front of the search. As a result, at most all nodes at d.

In general, all you can say is that it is at least all nodes in the tree in both cases. However, if you have a balanced (or nearly balanced) k-ary tree, its depth, or radius, will be just O(log(n)), but the lowest layer will have O(n) nodes (n/2 for binary trees and even more for higher degrees).

As a result, depth-first search will utilise just O(log(n)) memory, while breadth-first search would use O(n) memory (n). For balanced k-ary trees; other situations may provide different conclusions (but for most common graphs diameter will still be significantly less than circumference).

For a Directed Acyclic Graph (DAG), topological sorting is a linear ordering of vertices in which vertex u occurs before v in the ordering for any directed edge u v. If the graph is not a DAG, topological sorting is not possible.

Topological sorting has a wide range of applications, including ranking problems such as the feedback arc set. Topological sorting is achievable even if the DAG has disconnected components.

It's easy to find cycles in a basic graph, as in the first two examples in this article. We may go over all of the vertices and see whether any of them are connected to each other or to another connected vertex. However, in more complex cases, this isn't likely to work.

So, one well-known way for locating cycles is to use Depth-First-Search (DFS). DFS Trees are the result of traversing a graph using DFS. The DFS Tree is primarily a reorganisation of graph edges and vertices. We've classed the edges as tree edges, forward edges, back edges, and cross edges once we've built the DFS trees.

For a single graph, we can have many DFS trees. In our example, the cross edge is the edge that connects the vertex of one DFS tree to the vertex of another DFS tree in our graph. If a cross edge does not connect a parent and child, it can be found within the same DFS tree (an edge between different branches of the tree).

Let's look at two binary tree nodes, n1 and n2.

The shared ancestor of n1 and n2 that is positioned farthest from the root is known as the lowest common ancestor (LCA).

To locate the LCA, use the approach described below.

Create an array with the route from the root node to n1.

Create an array with the route from the root node to n2.

Continue along both pathways until the value in both arrays is the same.

Check if a graph is bipartite with this algorithm:

Using Breadth First Search, the following is a simple technique for determining whether a given graph is Bipartite or not (BFS).

Assign the source vertex the colour RED (putting into set U).

Color all of your neighbours in BLUE (putting into set V).

Color all of your neighbours' neighbours in RED (putting into set U).

Assign colour to all vertices in this fashion so that it meets all of the criteria of the m-way colouring problem, where m = 2.

If we find a neighbour with the same colour as the current vertex while assigning colours, the graph cannot be coloured with two vertices (or graph is not Bipartite)

We can use Dijkstra's technique to discover the shortest path between any two network vertices.

Because the shortest distance between two vertices may not include all of the graph's vertices, it differs from the least spanning tree.

Algorithm Steps:

Except for the source vertex, which has a distance of 0, set all vertice distances to infinity.

Push the source vertex into a min-priority queue in the form (distance, vertex), because the min-priority queue's comparison will be based on vertices distances.

Pop the vertex with the shortest distance from the priority queue (the popped vertex was initially the source).

In the case of "current vertex distance + edge weight next vertex distance," update the distances of the related vertices to the popped vertex, then push the vertex.

due to the increased distance from the priority queue

Continue without using the popped vertex if it has already been visited.

Repeat the process until the priority queue is no longer full.

A data structure that keeps track of a collection of elements partitioned into a number of distinct (non-overlapping) subsets is known as a disjoint-set data structure. An algorithm that performs two beneficial actions on such a data structure is known as a union-find algorithm.

Find: Determine which subset a specific element belongs to. This is useful for detecting whether two elements belong to the same subset.

Union: Join two subsets together to form a single subset. First, we must determine whether the two subsets are members of the same set. If the answer is no, we will be unable to form a union.

A * algorithm is a path-finding algorithm that looks for the shortest path between the starting and ending states. It's utilised in a variety of applications, including maps.

The A* algorithm is used in maps to find the shortest distance between a source (initial state) and a destination (final state) (final state).

Consider a square grid with numerous obstacles strewn about at random. The first and last cells are presented. The goal is to go to the last cell in the quickest time possible.

There are three parameters in the A* algorithm:

g: the cost of travelling from the first to the second cell. It's basically the sum of all the cells that have been visited since the initial one.

h: The projected cost of travelling from the present cell to the end cell, also known as the heuristic value. The final cell must be reached before the exact cost can be determined. As a result, h represents the expected cost. We must make certain that the cost is never underestimated.

f is equal to the sum of g and h. As a result, f = g + h

In a full graph with N vertices, the total number of potential edges can be expressed as,

Total number of edges in a complete graph of N vertices = (n * (n – 1)) / 2

If a path connects all pairs of vertices in a directed graph, it is said to be strongly linked. A maximal strongly connected subgraph is a strongly connected component (SCC) of a directed graph.

Using Kosaraju's approach, we can locate all strongly connected components in O(V+E) time.

If deleting a vertex (and the edges connecting it) disconnects an undirected connected graph, it is called an articulation point (or cut vertex). Articulation points are single points in a connected network that, if they fail, break the network into two or more components. They are beneficial in the creation of dependable networks.

We have now reached the end of this page. With the knowledge from this page, you should be able to create your own programmes with some research, and it's in fact recommended to hone your programming skills with small projects. There is no way to cover all the information you need to be a successful programmer in one course. In fact, programming is a constant learning process, regardless of whether you are a seasoned professional developer or a newbie.