You may be wondering what questions you'll face in your next data structure interview. Just remember that data structure interviewers aren't trying to trick you and don't expect perfection, but it's their opportunity to ascertain your knowledge before they invest in your employment. Proper preparation is always advised.
Data structures interview questions are an essential part of any programming interview, mainly one for Data Science and Java-based role. Thorough understanding and deep knowledge of data structures and algorithms will serve as a benefit for you to stand out from other candidates. The following Data Structures interview questions will help you crack your next interview!
Many a times Computer Science graduates devalue the importance of learning data structures and algorithms considering it as complicated, irrelevant or a waste of time. However they soon get a reality check when they enter the real-world for job hunting. Big Companies like Amazon, Google, Microsoft often ask questions related to algorithms and data structures to check the problem-solving abilities of the candidates
So, Prepare yourself for tech interview by learning Data Structures and Algorithms Concepts
It's really important to understand the real-world significance of algorithms and its properties because using different ideas one can design many algorithms for computing a solution to a given problem. Key important questions in algorithms are
ðŸš€ How do we design good algorithms?
ðŸš€ How do we know that our algorithm is efficient ?
ðŸš€ How to efficiently implement algorithms in a programming language?
Let's look at some important product-based companies that test data Structures and algorithms skills in coding rounds:
|Company||Average Package in INR|
Congratulations, you are ready to put your skills into practice! In a real coding interview, you will be given a technical question (or questions) by the interviewer, write code in a real-time collaborative editor (phone screen/virtual onsite) or on a whiteboard (onsite) to solve the problem within 30-45 minutes. This is where the real fun begins!
Many candidates jump into coding the moment they hear the question. That is usually a big mistake. Take a moment to repeat the question back at the interviewer and make sure that you understand exactly what they are asking. Repeating back/rephrasing the question will reduce chances of miscommunication.
Some common questions you can ask:
1.How big is the size of the input?
2.Are there duplicates within the input?
3.Can I destroy the original array/graph/data structure?
4.What kind of values are there? Are there negative numbers? Floating points? Will there be empty inputs?
After you have sufficiently clarified the scope and intention of the problem, start with a brute force approach, communicate it to the interviewer, explain the time and space complexity and why it is bad. It is unlikely that the brute force approach will be one that you will be coding.
At this point, the interviewer will usually pop the dreaded "Can we do better?" question, meaning that they are looking for a more optimal approach. In general, look for repeated work and try to optimize them by potentially caching the calculated result somewhere and reference it later, rather than having to compute it all over again.
Getting stuck during coding interviews is extremely common. But do not worry, that is part of the process and is a test of your problem solving abilities. Here are some tips to try out when you are stuck:
ðŸš€ Talk through what you initially thought might work and explain why it doesn't
ðŸš€ Come up with more test cases and write them down
ðŸš€ Think about how you would solve it without a program
ðŸš€ Recall past questions related to the topic, what similar questions in the past have you encountered and what techniques did you use to solve them?
ðŸš€ Enumerate through the common data structures and whether they can be applied to the question. There really aren't that many - stack, queue, dictionary, heap, graph, etc.
ðŸš€ Look out for repeated work and determine if you can cache those computations.
Write your code with a neat coding style (consistent indentation, spacing around your operators). Reading code written by others is usually not an enjoyable task. Use clear variable names, avoid single letter names unless they are for iteration.
Always be explaining what you are currently writing/typing to the interviewer. This is not about literally reading out what you are typing to the interviewer. Talk about the section of the code you are currently implementing at a higher level, explain why it is written as such and what it is trying to achieve.
After you have finished coding, do not immediately announce to the interviewer that you are done. In most cases, your code is usually not perfect and contains some bugs or syntax errors. What you need to do now is to review your code.
Next, take the initiative to come up with small test cases and step through the code (not your algorithm!) with those sample input. What interviewers usually do after you have finished coding would be to get you to write tests. It is a huge plus if you write tests for your code even before they prompt you to do so.
Lastly, give the time/space complexity of your code and explain why it is such. If your interviewer is happy with the solution, the interview usually ends here. It is also not uncommon that the interviewer asks you extension questions, such as how you would handle the problem if the whole input is too large to fit into memory, or if the input arrives as a stream.
Being able to solve these extension questions will indicate that you are a strong candidate which will likely lead to a better offer.
A data structure offers a convenient way of organizing as well as manipulating the data. Simply put, it allows the data to be used in an effective manner. There is a galore of data structures and each of them is suitable for a distinct set of applications.
It is a fundamental concept of any programming language, essential for algorithmic design.
Data structure isn't a programming language like C, C++, java, etc. It is a set of algorithms that can be used in any programming language to organize the data in the memory.
This is one of the most frequently asked data structures interview questions where the interviewer expects you to give a clear answer. Data structures are of two types:
Linear Data Structure: Linear data structures are those whose elements are in sequential and in ordered way. For example: Array, Linked list
Non-Linear Data Structures: The Non-linear data structure does not form a sequence i.e. each item or element is connected with two or more other items in a non-linear arrangement. The data elements are not arranged in the sequential structure. For Example : Graphs, Trees
In data structures, data is organized in a way that makes it efficient to be used. Some practical applications of data structures are:
Storing data in a tabular form. For example, contact details of a person. This is done through arrays.
Arrays are widely used in image processing and speech processing.
Music players and image sliders use Linked Lists to move to next or previous items.
A Queue is used for job scheduling, the arrangement of data packets for communication.
Technologies like Blockchain, cryptography are based on Hashing algorithms.
Matrices are widely used to represent data and plotting graphs, performing statistical analysis.
File Structure: A hard disk or external device (such as USB), stores data that remains intact till manually deleted. Such a representation of data into secondary or auxiliary memory is called a file structure.
Storage Structure: In this type of structure, data (variables, constants, etc.) are stored in the main memory, i.e. RAM, and is deleted once the function that uses this data gets completed.
Following are the various operations that can be performed on a data structure:
Deletion: Deleting an existing element from the data structure
Insertion: Adding a new element to the data structure
Searching: Find the location of an element, if it exists, in the data structure
Sorting: Arranging elements of the data structure in:
Traversal: Accessing each element of the data structure once for processing
This is one of the most frequently asked data structure interview questions where the interviewer expects you to give a thorough answer. Try to explain as much as possible rather than finishing your answer in a sentence!
A linked list is a data structure that has sequence of nodes where every node is connected to the next node by means of a reference pointer. The elements are not stored in adjacent memory locations. They are linked using pointers to form a chain.
Each node has two fields:
Data field: For storing data values.
Reference field: For storing address.
The entry point in a linked list is called the head. Where the list is empty, the head is a null reference and the last node has a reference to null.
A linked list is a dynamic data structure, where the number of nodes is not fixed, and the list has the ability to grow and shrink on demand.
Linked lists are considered both linear and non-linear data structures depending upon the application they are used for. When used for access strategies, it is considered as a linear data-structure. When used for data storage, it is considered a non-linear data structure.
Linked Lists are better than arrays in the following ways:
Array sizes are fixed at run-time and can't be modified later, but Linked Lists can be expanded in real-time, as per the requirements.
Linked Lists are not stored contiguously in the memory, as a result, they are a lot more memory efficient than arrays that are statically stored.
As the size of a linked list can grow or shrink based on the needs of the program, there is no memory wasted because it is allocated in runtime.
Insertion and deletion process is expensive in an array as the room has to be created for the new elements and existing elements must be shifted.
Following are the scenarios where we use linked list over array:
Less number of random access operations.
When we want to insert items anywhere in the middle of the list, such as when implementing a priority queue, linked list is more suitable.
When we do not know the exact number of elements beforehand.
Below are the cases where we use arrays over the linked list:
When we need speed while iterating over the elements in the sequence.
When we need to index or randomly access elements more frequently.
Due to the nature of arrays and linked list, it is safe to say that filled arrays use less memory than linked lists.
In a nutshell, requirements of space, time, and ease of implementation are considered while deciding which data structure has to be used over what.
It is a complex type (double-ended LL) of a linked list in which a node has two links, one that connects to the next node in the sequence and another that connects to the previous node. This allows traversal across the data elements in both directions.
The browser cache with BACK-FORWARD visited pages
The undo and redo functionality on platforms such as word, paint etc, where you can reverse the node to get to the previous page.
Stack is a linear data structure that follows LIFO (Last In First Out) approach for accessing elements.Push, pop, and top (or peek) are the basic operations of a stack.
Some notable applications of a stack are:
Check for balanced parentheses in an expression
Reverse a string
Evaluation of different expressions
A queue is a form of linear structure that follows the FIFO (First In First Out) approach for accessing elements. Dequeue, enqueue, front, and rear are basic operations on a queue. Like a stack, a queue can be implemented using arrays and linked lists.
In a stack, the item that is most recently added is removed first. Contrary to this, the item least recently added is removed first in case of a queue.
Heap is a special tree-based non-linear data structure in which the tree is a complete binary tree. A binary tree is said to be complete if all levels are completely filled except possibly the last level and the last level has all elements towards as left as possible. Heaps are of two types:
In a Max-Heap the data element present at the root node must be greatest among all the data elements present in the tree.
In a Min-Heap the data element present at the root node must be the smallest (or minimum) among all the data elements present in the tree.
Hashmap is a data structure that uses implementation of hash table data structure which allows access of data in constant time (O(1)) complexity if you have the key.
The time complexity is O(1) assuming that the hash function used in hash map distributes elements uniformly among the buckets.
In a stack, the item that is most recently added is removed first. Contrary to this, the item least recently added is removed first in case of a queue.
This is one of the favourite interview question where the interviewer demands a comprehensive explanation of the working principle.Try to explain it in a step-by-step manner.
Computers have cache memory that temporarily stores the most frequently used data. It's a great way to get the data that is used most often because the retrieval process is super fast
However, cache memory is limited in size and there needs to be a way to manage what data needs to be removed from the cache in order to store new data. That's where LRU cache comes in.
It's a cache replacement algorithm that removes the least recently used data in order to make room for new data. In order to achieve this, two data structures are used:
Queue: This is implemented using a doubly-linked list. The maximum size of the queue is determined by the cache size, i.e by the total number of available frames. The least recently used pages will be near the front end of the queue whereas the most recently used pages will be towards the rear end of the queue.
Hashmap: Hashmap stores the page number as the key along with the address of the corresponding queue node as the value.
The implementation is tricky. It exercises many of your programming fundamental skills and knowledge like using object oriented programming.
-You can checkout our course video explanation!
As the name suggests, we have to design a cache which will evict the least frequently used item when the cache is full and a new item needs to be added in the cache, also the operations get(K) and put(K, V) should take O(1) time.
To implement LFU cache we will use the combination of a oubly linked list and a hash map data structures. Define a node in the doubly linked list. Each node will contain information like key, value, frequency(number of times the key has been accessed), left node, and right node. For every new key, a new node will be added to the linked list. At any point in time, the nodes in the doubly linked list will be sorted based on their frequency of access. The head of the linked list will be the node that is least frequently accessed. This will help us get the node which is least frequently accessed in O(1) time.
Now, how do we access a key in constant time? For this purpose, we will use a hash map that will store the node's (node in the doubly linked list)
let's see an example of LFUCache
cache = LFUCache(2). # The Cache is empty1.cache.put(1, 1)
A new DataNode(key = 1, val = 1, frequency = 1) will be inserted in the cache.The cache becomes :
[frequency = 1] ->[key = 1, value = 1]
Second DataNode(key = 2, val = 2, frequency = 1) will be inserted in the # cache.
[frequency = 1] ->[key = 1, value = 1] , [key = 2, value = 2]
Returns 1(for key = 1, value = 1), DataNode with key = 1 is promoted,
[frequency = 1] -> [key = 2, value = 2]
[frequency = 2] -> [key = 1, value = 1]
Since the cache is full and key 3 is not present, key with least frequency(2 in # this case) will be evicted and a new DataNode(key = 3, val = 3, frequency #= 1) will be added.
[frequency = 1] -> [key = 3, value = 3]
[frequency = 2] -> [key = 1, value = 1]
Since key 2 is not present in the cache None will be returned. Cache remains as it was.
The implementation is tricky. It exercises many of your programming fundamental skills and knowledge like using object oriented programming.
-You can checkout our course video explanation!
Tree is a recursive, non-linear data structure consisting of the set of one or more data nodes where one node is designated as the root and the remaining nodes are called as the children of the root.
Some of the applications of trees are:
Filesystems: files inside folders that are inturn inside other folders.
Comments on social media: comments, replies to comments, replies to replies etc form a tree representation.
The maximum nodes are : 2k
Consider that every node of a tree represents a class called Node as given below:
Then the height of the binary tree can be found as follows:
Tree traversal is a process of visiting all the nodes of a tree. Since root (head) is the first node and all nodes are connected via edges (or links) we always start with that node. There are three ways which we use to traverse a tree:
Traverse the left subtree, i.e., call Inorder(root.left)
Visit the root, i.e. print(root.val)
Traverse the right subtree, i.e., call Inorder(root.right)
Visit the root, i.e. print(root.val)
Traverse the left subtree, i.e., call Preorder(root.left)
Traverse the right subtree, i.e., call Preorder(root.right)
Traverse the left subtree, i.e., call Postorder(root.left)
Traverse the right subtree, i.e., call Postorder(root.right)
Visit the root, i.e. print(root.val)
This is an interesting type of tree, i.e binary search tree because it follows certain parameters.
The left subtree of a node contains only nodes with keys lesser than the node's key.
The right subtree of a node contains only nodes with keys greater than the node's key.
The left and right subtree each must also be a binary search tree.
A graph is a common data structure that consists of a finite set of nodes (or vertices) and a set of edges connecting them. A pair (x,y) is referred to as an edge, which communicates that the x vertex connects to the y vertex.
BFS and DFS both are the traversing methods for a graph. Graph traversal is nothing but the process of visiting all the nodes of the graph.
The main difference between BFS and DFS is that BFS traverses level by level whereas DFS follows first a path from the starting to the end node, then another path from the start to end, and so on until all nodes are visited.
BFS uses queue data structure for storing the nodes whereas DFS uses the stack for traversal of the nodes for implementation.
The topological sort algorithm takes a directed graph and returns an array of the nodes where each node appears before all the nodes it points to.
Only applicable in Directed Acyclic graph(DAG)
Here is the implementation part in Python:
Time and Space Complexity
All together, the time complexity is O(M+N), (where M is the number of edges),(where N is the number of nodes). That's the fastest time we can expect, since we'll have to look at all the nodes and edges at least once.
All in all, we have three structures and they're all O(N)space. Overall space complexity: O(N)
A single sorting algorithm can't be considered best, as each algorithm is designed for a particular data structure and data set. However, the QuickSort algorithm is generally considered the fastest because it has the best performance for most inputs.
Its advantages over other sorting algorithms include the following:
Cache-efficient: It linearly scans and linearly partitions the input. This means we can make the most of every cache load.
Efficient even in worst-case input sets, as the order is generally random.
Easy adaption to already- or mostly-sorted inputs.
Merge sort is a divide-and-conquer algorithm for sorting the data. It works by merging and sorting adjacent data to create bigger sorted lists, which are then merged recursively to form even bigger sorted lists until you have one single sorted list.
It will be more clear if you go through below animation
Here is the implementation in python
A binary search is more efficient than a linear search because we perform fewer comparisons. With linear search, we can only eliminate one element per comparison each time we fail to find the value we are looking for, but with the binary search, we eliminate half the set with each comparison.
Binary search runs in O(log n) time compared to linear search's O(n) time. This means that the more of the elements present in the search array, the faster is binary search compared to a linear search.
The major data structures used are as follows:
RDBMS - Array (i.e. Array of structures)
Network data model - Graph
Hierarchical data model - Trees
Stack data structure is used in recursion due to its last in first out nature. Operating system maintains the stack in order to save the iteration variables at each function call
Most basic example of recursion is Fibonacci Series. Fibonacci series is a series of numbers formed by the addition of the preceding two numbers in the series. The first two terms are zero and one respectively. The terms after this are generated by simply adding the previous two terms
2 is calculated by adding the two numbers preceding it (1+1), 3 is calculated by adding the two numbers preceding it (1+2), 5 is (2+3), and so on..
Below in image we show how to design Fibonacci series using recursion. Recursion internally uses in-memory stack
Searching Algorithms are designed to check for an element or retrieve an element from any data structure where it is stored.
Different types of search algorithms are:
A binary search is an algorithm that is best applied to search a list when the elements are already in order or sorted. The list is searched starting in the middle, such that if that middle value is not the target search key, it will check to see if it will continue the search on the lower half of the list or the higher half. The split and search will then continue in the same manner.
Binary Search Algorithm is one of the widely used searching techniques. It can be used to sort arrays. This searching technique follows the divide and conquer strategy. The search space always reduces to half in every iteration. Binary Search Algorithm is a very efficient technique for searching but it needs some order on which partition of the array will occur.
This one was a tricky concept. We haven't come across any practical use case of this one yet but just knowing the concept is good from the interview perspective.
In a stable sorting algorithm, the order of the same element remains the same even after sorting. In other words, stable sorting maintains the position of two equals elements relative to one another but during the unstable sorting algorithm, this changes.
Some examples of stable algorithms are Merge Sort, Insertion Sort, Bubble Sort. While, QuickSort, Heap Sort, and Selection sort are unstable sorting algorithms.
Another tricky question which is easy if you know the trick. Well we can swap two numbers without using a temporary or third variable if we can store the sum of numbers in one number and then minus the sum with other number something like
Solution 1 - Using Addition and Subtraction
But it will not work in all conditions. Integer will overflow if the addition is more than the maximum value of int primitive as defined by Integer.MAX_VALUE and if subtraction is less than the minimum value, i.e. Integer.MIN_VALUE.
Solution 2 - Using XOR trick
If you remember the XOR truth table then you know that A XOR B will return 1 if A and B are different and 0 if A and B are the same. This property gives birth to the following code, popularly known as the XOR trick:
Difference Between List and Tuple in Python:
|Lists are mutable. They can be modified after creation||Tuples are immutable. Once a tuple is created it cannot by changed Lists have order.|
|Lists consume more memory||Tuple consume less memory as compared to the list|
|The list is better for performing operations, such as insertion and deletion.||Tuple data type is appropriate for accessing the elements|
Due to the fact that a backtracking algorithm takes all possible outcomes for a decision it is similar from this point of view with the brute force algorithm. The difference consist in the fact that sometimes a backtracking algorithm can detect that an exhaustive search is unnecessary and therefore it can perform better.
Brute force method actually takes a lot of time in solving problems because in this we generate all those solutions also which are actually not a solution. And this makes it time-consuming. But the idea of Backtracking is different. In Backtracking, we don't generate all possible solutions, rather than we keep go on checking whether the solution is correct or not at every step. If it is correct, we continue generating subsequent solutions to the problem. If it is incorrect we backtrack to the previous step and check for the other solutions.
Why is Quick Sort preferred for Arrays?
One of the main reasons for efficiency in quick sort is locality of reference, which makes it easy for the computer system to access memory locations that are near to each other, which is faster than memory locations scattered throughout the memory which is the case in merge sort.
Quick sort is an in-place sorting algorithm i.e. it does not require any extra space, whereas Merge sort requires an additional linear space, which may be quite expensive. In merge sort, the allocation and deallocation of the extra space increases the running time of the algorithm.
Why is Merge Sort preferred for Linked Lists?
In the case of linked lists, the nodes may not be present at adjacent memory locations, therefore Merge Sort is used.
Unlike arrays, in linked lists, we can insert items in the middle in O(1) extra space and O(1) time if we are given a reference/pointer to the previous node. Therefore, we can implement the merge operation in the merge sort without using extra space.
A String is said to be a palindrome if the reverse of String is equal to itself like "aba" is a palindrome because the opposite of "aba" is also "aba", but "abc" is not a palindrome because the reverse of "abc" is "cba" which is not equal.
The program is simple, and here are steps to find palindrome String:
1.Reverse the given String
2.Check if the reverse of String is equal to itself; if yes, then given String is a palindrome.
Here is the implementation in Python:
Initially, reverse the individual words of the given string one by one, for example, after reversing individual words the string should be "i ekil siht margorp yrev hcum". Reverse the whole string from start to end to get the desired output "much very program this like i" in this example.
Two String is called anagram, if they contain the same characters but on different order e.g. army and mary, stop and pots, etc. Anagrams are actually a mix-up of characters in String. If you are familiar with String API, i.e. java.lang.String then you can easily solve this problem.
In order to check if Strings are anagrams, you need to get their character array and see if they are equal or not. Though you can also use indexOf(), substring(), and StringBuffer or StringBuilder class to solve this question.
An undirected graph is tree if it has following properties.
1.There is no cycle
2.The graph is connected. For an undirected graph we can either use BFS or DFS to detect above two properties.
Here is the implementation in Python:
Dynamic programming is an extension of Divide and Conquer paradigm.
They both work by recursively breaking down a problem into two or more sub-problems. The solutions to the sub-problems are then combined to give a solution to the original problem
In Divide and conquer the sub-problems are independent of each other. Mergesort is a classic example of divide and conquer. The main difference between this example and the Fibonacci example is that in a mergesort, the division can (theoretically) be arbitrary, and no matter how you slice it up, you are still merging and sorting.
In dynamic programming the sub-problem are not independent. So to calculate new Fib number you have to know two previous values. For Merge sort you don't need to know the sorting order of previously sorted sub-array to sort another one.
Dynamic programming approach may be applied to the problem only if the problem has certain restrictions or prerequisites:
Optimal substructure: optimal solution can be constructed from optimal solutions of its subproblems
Overlapping sub-problems: problem can be broken down into subproblems which are reused several times or a recursive algorithm for the problem solves the same subproblem over and over rather than always generating new subproblems.
You can say both let's see how.
It's greedy because we always mark the closest vertex.
It's dynamic because distances are updated using previously calculated values.
But would say it's definitely closer to dynamic programming than to a greedy algorithm. To find the shortest distance from A to B, it does not decide which way to go step by step. Instead, it finds all places that one can go from A
The optimal decisions are not made greedily, but are made by exhausting all possible routes that can make a distance shorter. Therefore, it's a dynamic programming algorithm, the only variation being that the stages are not known in advance, but are dynamically determined during the course of the algorithm.
The following would be considered DP, but without recursion (using bottom-up or tabulation DP approach).
You can find the count of total occurrences of a character in a string by following the approach below:
1.Initialize a counter variable to store the count of total occurrences of a character in a string.
2.Traverse the string character by character.
3.If the character of the string matches with the given character, increment the value of the count variable. Finally, return the counter variable.
The Bellman-Ford algorithm finds single source shortest paths by repeatedly relaxing distances until there are no more distances to relax. Relaxing distances is done by checking if an intermediate point provides a better path than the currently chosen path.
This algorithm has the advantage over Dijkstra because it can handle graphs with negative edges, while Dijkstra is limited to non-negative ones. The only limitation it has are graphs with cycles that have an overall negative path, but this would just mean that there is no finite solution.
There are two approaches basically.
The first approach we can use Hash Table in hash table basically we record each node's reference (or memory address) in a hash table. if the current node is null we reached the end of the Linked List and its not a cycle otherwise if the node's reference present in the Hash table, then there is a loop.
This is the second approach Basically Floyd's Cycle-Finding Algorithm which is the fastest algorithm described below
1.Traverse Linked List using two-pointer.
2.Move slow pointer by one position
3.Move fast pointer by two positions.
4.If these two pointers meet at the same node then there is a Loop. If they don't meet in the same position (pointer) then LinkedList doesn't have a loop.
Bit manipulation allows a programmer to work with bits as opposed to the abstractions used in modern programming languages. They normally use this kind of programming to detect and correct algorithms.
Bit manipulation makes use of bitwise operations, which is a level of operation that works with individual bits, which are the smallest units of data.
When two integers have opposite signs, it means that their most significant bits are different. The most significant bit of any negative number will always be one, whereas the most significant bit for positive numbers will always be zero. If you apply the bitwise operator exclusive OR (XOR "^") to two integers with opposite signs, you will get a negative number. This means that if the XOR operator is applied to two integers and returns less than zero, the two numbers have opposite signs.
A Simple Solution is to traverse all bits one by one. For every pair of bits, check if both are same, set the corresponding bit as 0 in output, otherwise set as 1.
A Better Solution can find XOR without using a loop. The expression (x | y) - (x & y) is equivalent to x ^ y
Minimum number of queues needed to implement priority queue is two. One queue is used for the actual storing of data, and the other one is used for storing the priorities.
The main problem, while dealing with an array is not finding duplicates, it's about removing them. Since an array is a static, fixed-length data structure, you can not change its length. This means, deleting an element from an array requires creating a new array and copying content into that array.Algorithm:
1.Traverse the given array from start to end.
2.For every element in the array increment the arr[i]%n'th element by n.
3.Now traverse the array again and print all those indexes i for which arr[i]/n is greater than 1. Which guarantees that the number n has been added to that index
This approach works because all elements are in the range from 0 to n-1 and arr[i] would be greater than n only if a value "i" has appeared more than once.
Here is the implementation in Python:
The correct answer is DFS. A DFS algorithm, (i.e. depth first search) as its name implies, traverses nodes at levels below the current node before it processes the rest of the nodes at the same level.
BFS does the opposite-- exhausts all known nodes at a level before it traverses a node in the level below. Since T is full, all of its leaves are at the same tree level. BFS reaches a leaf only after all internal nodes are traversed, while DFS reaches a leaf the soonest can be.
No, duplicate keys are not allowed in HashMap. The key in a
HashMap does not allow for duplicate keys however, it does allow duplicate values. This means there can be multiple different keys with the same values.
HashMap also allows a null key, however as this is a key there can only be one null key per HashMap. You can have multiple null values.
This is an open-ended question to test your knowledge of in-place algorithms also to see whether it is thorough enough for you generate an example explaining it.
An in-place algorithm is an algorithm that doesn't use additional memory to process the input while it could use the input itself for its execution. The purpose of an in-place algorithm is to save memory space during execution.
A typical example is reversing an array. Consider an algorithm to reverse the values stored in an array 'inputArray' so that, in the 'outputArray' the first cell will hold value at the last cell of 'inArray', and so on.
The main differences between pointers and reference parameters are -
1.References are used to refer an existing variable in another name whereas pointers are used to store address of variable
2.References cannot have a null value assigned but pointer can.
3.A reference variable can be referenced by pass by value whereas a pointer can be referenced by pass by reference.
4.A reference must be initialized on declaration while it is not necessary in case of pointer.
5.A reference shares the same memory address with the original variable but also takes up some space on the stack whereas a pointer has its own memory address and size on the stack
A null pointer means it is not pointing to anything. If, there is no address that is assigned to a pointer, then set it to null.
A pointer type, i.e., int *, char * each have a null pointer value.
The syntax is as follows-
A void pointer is nothing but the one who does not have any data type with it. It is also called as a general purpose pointer. It can hold the addresses of any data type.
The syntax is as follows -
B-trees are a type of self-balancing tree structure designed for storing huge amounts of data for fast query and retrieval.
They can be often confused with their close relation - the Binary Search Tree. Although they're both a type of m-way search tree, the Binary Search Tree is considered to be a special type of B-tree.
B-trees can have multiple key/value pairs in a node, sorted ascendingly by key order. We call this key/value pair a payload.
B-trees are considered to be so advantageous because they provide logarithmic O(log(n)) time complexity when it comes to insert, delete, and search operations.
The most well-known version of the B-tree is the B+tree. What distinguishes the B+tree from the B-tree are two main aspects:
All leaf nodes are linked together in a doubly-linked list. Satellite data is stored on the leaf nodes only. Internal nodes only hold keys and act as routers to the correct leaf node
In B+trees, data stored on the leaf node makes the search more efficient since we can store more keys in internal nodes - this means we need to access fewer nodes
Deleting data from a B+tree is easier and less time consuming because we only need to remove data from leaf nodes
In fact, 99% of database management systems use B+trees for indexing.
Data abstraction refers to providing only essential information to the outside world and hiding their background details, i.e., to represent the needed information in program without presenting the details. Data abstraction is a programming (and design) technique that relies on the separation of interface and implementation.
For example, your program can make a call to the sort() function without knowing what algorithm the function actually uses to sort the given values. In fact, the underlying implementation of the sorting functionality could change between releases of the library, and as long as the interface stays the same, your function call will still work.
In the case of signed numbers, the first bit is used to indicate whether positive or negative, which leaves you with one bit short. With unsigned numbers, you have all bits available for that number. The effect is best seen in the number range (unsigned 8 bit number has a range 0-255, while 8-bit signed number has a range -128 to +127.
To find the target key in a linked list, you have to apply sequential search. Each node is traversed and compared with the target key, and if it is different, then it follows the link to the next node. This traversal continues until either the target key is found or if the last node is reached.
The way to write arithmetic expressions is known as notation. There are three types of notations used in an arithmetic expression, i.e., without changing the essence or output of expression. These notations are:
Prefix (Polish) Notation - In this, the operator is prefixed to operands, i.e. ahead of operands.
Infix Notation - In this, operators are used in between operands.
Postfix (Reverse-Polish) Notation - In this, the operator is postfixed to the operands, i.e., after the operands.
|x + y||+xy||xy+|
|(x + y)*z||*+xyz||xy+z*|
|x * (y + z)||*x+y||xyz+*|
The amount of space or memory is occupied or allocated depends upon the data type of the variables that are declared. So let's explain the same by considering an example: Let's say the variable is declared as an integer type then 32 bits of memory storage is allocated for that particular variable. So based on the data type of the variable, the memory space will be allocated.
Amortised time explained in simple terms: If you do an operation say a million times, you don't really care about the worst-case or the best-case of that operation - what you care about is how much time is taken in total when you repeat the operation a million times. So it doesn't matter if the operation is very slow once in a while, as long as "once in a while" is rare enough for the slowness to be diluted away.
Essentially amortised time means "average time taken per operation, if you do many operations". Amortised time doesn't have to be constant; you can have linear and logarithmic amortised time or whatever else.
Trie is an efficient information retrieval data structure. Using Trie, search complexities can be brought to optimal limit (key length). If we store keys in binary search tree, a well balanced BST will need time proportional to M * log N, where M is maximum string length and N is number of keys in tree. Using Trie, we can search the key in O(M) time.
Every node of Trie consists of multiple branches. Each branch represents a possible character of keys. We need to mark the last node of every key as end of word node. A Trie node field isEndOfWord is used to distinguish the node as end of word node. A simple structure to represent nodes of the English alphabet can be as following,
Tries support ordered iteration, whereas iteration over a hash table will result in a pseudorandom order given by the hash function (also, the order of hash collisions is implementation defined), which is usually meaningless.
Tries tend to be faster on average at insertion than hash tables because hash tables must rebuild their index when it becomes full - a very expensive operation. Tries therefore have much better bounded worst case time costs, which is important for latency sensitive programs.
Tries facilitate longest-prefix matching, but hashing does not, as a consequence of the above. Performing such a "closest fit" find can, depending on implementation, be as quick as an exact find.
Hashing: is simply the act of turning a data chunk of arbitrary length into a fixed-width value (hereinafter called a hash value) that can be used to represent that chunk in situations where dealing with the original data chunk would be inconvenient.
A hash table is one of those situations; it simply stores references to such data chunks in a table indexed by each chunk's hash value. This way, instead of potentially comparing your desired key chunk against a huge number of chunks, you simply compute the hash value of the key chunk and do a much faster lookup with that hash value.
Heaps are structures meant to allow quick access to the min or the max. It allows you to pull the smallest or the biggest and quickly know the NEXT smallest or biggest. That's why it's called a Priority Queue
Use it whenever you need quick access to the largest (or smallest) item, because that item will always be the first element in the array or at the root of the tree. However, the remainder of the array is kept partially unsorted. Thus, instant access is only possible to the largest (smallest) item. Insertions are fast, so it's a good way to deal with incoming events or data and always have access to the earliest/biggest.
To construct a stack using queue we need two queues (q1, q2), and need to simulate the stack operations by using queue operations:
1. if q1 is empty, enqueue E to q1
2. if q1 is not empty, enqueue all elements from q1 to q2, then enqueue E to q1, and enqueue all elements from q2 back to q1
dequeue an element from q1
As we see, q1 acts as the main source for the stack, while q2 is just a helper queue that we use to preserve the order expected by the stack.
The very first approach of the problem is to sort the array in O(nlogn) time and take last 100 numbers. But the interviewer will not be happy with this solution so we need to think of a better solution.
We can keep a priority queue of the 100 biggest numbers, iterate through the billion numbers, whenever you encounter a number greater than the smallest number in the queue (the head of the queue), remove the head of the queue and add the new number to the queue.
With a priority queue implemented with a heap, the complexity of insertion to queue is O(log N). In the worst case you get billion*log2(100) which is better than billion*log2(billion)
The following are the main advantages of tries over binary search trees (BSTs):
Looking up keys is faster. Looking up a key of length m takes worst case O(m) time. A BST performs O(log(n)) comparisons of keys, where n is the number of elements in the tree, because lookups depend on the depth of the tree, which is logarithmic in the number of keys if the tree is balanced. Hence in the worst case, a BST takes O(m log n) time. Moreover, in the worst case log(n) will approach m. Also, the simple operations tries use during lookup, such as array indexing using a character, are fast on real machines.
The number of internal nodes from root to leaf equals the length of the key. Balancing the tree is therefore no concern.
Tries are more space efficient when they contain a large number of short keys, because nodes are shared between keys with common initial subsequences.
|String refers to a sequence of characters represented as a single data type.||Character Array is a sequential collection of data type char.|
|Strings are immutable.||Character Arrays are mutable.|
|Strings can be stored in any any manner in the memory.||Elements in Character Array are stored contiguously in increasing memory locations.|
|Not preferred for storing passwords in Java.||Preferred for storing passwords in Java.|
|All Strings are stored in the String Constant Pool.||All Character Arrays are stored in the Heap.|
In an index type is a b-tree and access an element in a b-tree in logarithmic time. On the other hand, accessing an element in a hash table is in O(1) then why we are using b-tree in database. The reason is you can only access elements by their primary key in a hashtable.This is faster than a tree algorithm(O(1) instead of O(logn)) but you cannot select ranges(everything in between x and y). Tree algorithms support this in logn whereas hash indexes can result in full table scan O(n).
This is another fascinating coding problem based on a binary tree that is frequently asked of new programmers. It's rather simple to solve if you have any familiarity
with binary tree-related problems. We can use recursion here. Because recursion is used in most binary tree problems, we apply it to both left and right subtrees.
To answer this problem, you must first understand what a leaf node is, since if you do not, you will be unable to solve it. So a leaf node is a node whose left and right child nodes are null.
So we can print the leaf nodes of a BST checking each node if their both left and right child is null then print the node.
⮞ The very first case if given tree node or root is null then return
⮞ print the node if both of its children is null
⮞ repeat the process in both left and right subtree using recursion.
Because doubly linked lists have immediate access to both the front and back ends of the list, they can insert and delete data on either side in O(1) time.
Doubly linked lists are the ideal underlying data structure for a queue because they can insert data at the end in O(1) time and delete data from the front in O(1) time.
⮞ Doubly linked list is used in LRU cache design since we need to remove least recently items frequently. Deletion operation is faster.
⮞ In singly linked list all operations require an extra pointer previous to be maintained. For example, in insertion, we need to modify previous pointers together with next pointers.
Floyd's Cycle Detection Algorithm is the foundation for this problem. The idea is to employ two pointers, one that moves quickly and the other that moves slowly.
Both points travel the linked list at different speeds, and when they meet, it indicates that the LinkedList has a cycle. Save the address of this node and start incrementing
a counter with 1 while traversing the LinkedList using another pointer from the common point. We will have our number of nodes in cycle count in the pointer whenever we reach
the common pointer again. This number will be returned to you.
⮞ Take two pointers, a fast pointer, and a slow pointer pointing to the head initially.
Traverse both the pointers as slowptr = slowptr->next (1 node at a time), and fastptr = fastptr->next->next (2 nodes at a time).
⮞ When slowptr == fastptr, the common point is the node for the head of the cycle.
⮞ Fix one pointer to this node and take count = 0 and move the other pointer from the common point one by one in the linked list and increment the counter by 1 in each step
⮞ When the other pointer reaches the common point then stop the iteration and return the count.
Let’s consider a graph G(V, E). The graph G is a bipartite graph if:
• The vertex set V of G can be partitioned into two disjoint and independent sets V1 and V2
• All the edges from the edge set E have one endpoint vertex from the set V1 and another endpoint vertex from the set V2
Some important points regarding bipartite graph
• If a graph is a bipartite graph then it’ll never contain odd cycles.
• The subgraphs of a bipartite graph are also bipartite.
• In an undirected bipartite graph, the degree of each vertex partition set is always equal.
To check whether graph is bipartite or not we use graph coloring and BFS to determine it.
Here are the steps of algorithm:
⮞ Assign a red color to the starting vertex start
⮞ Find the neighbors of the starting vertex and assign a different color(let's say blue)
⮞ Find the neighbor’s neighbor and assign a red color
⮞ Continue this process until all the vertices in the graph get colored
⮞ During this process if a neighbour vertex and coming vertex are of same color then we return graph is not a bipartite graph
Sometimes an interviewer asks you some tricky questions just to test how far you can go. So this question is one of them.
We all know how to find the fibonacci number in recursion or iteration and for larger numbers we can use memoization technique to store the overlapping values.
The time complexity for each of the methods are as follows:
Recursion: O(2N) This is the most naive approach to calculate fibonacci number and recursion.
Iterative or using memoization: O(n) Storing these values prevent us from constantly using memory space in the Stack.
Binet Formula: This approach will fail for higher values of n. For eg. The result given by below code for n = 71 is 308061521170129, while the correct answer is 308061521170130. So, this method will work well only for small values of n and is rarely used practically.
So, The nth Fibonacci number is given by the following formula:
goldenRatio = (1 + sqrt(5)) / 2
return round(pow(goldenRatio, n) / sqrt(5))
Time complexity of this method is O(1)
Bucket sort, also known as bin sort, is a sorting algorithm that splits the contents of an array into many buckets. The buckets are then sorted one by one,
either with a separate sorting algorithm or by applying the bucket sorting process recursively.
Create an empty array of size n(n empty buckets).
Loop through the original array and put each array element in a “bucket”.
Sort each of the non-empty buckets using insertion sort.
Visit the buckets in order and put all elements back into the original array
Worst case time complexity:Θ(n2)
Average case time complexity:Θ(n+k)
Best case time complexity:Θ(n+k)
HashMap and Hashtable store key and value pairs. When utilising a Hashtable or HashMap, we specify an object to serve as a key and the value to be associated with that key.
|No method is synchronized.||Every method is synchronized.|
|Threads are not required to wait and hence relatively performance is high.||It increases the waiting time of the thread and hence performance is low.|
|Null is allowed for both key and value||Null is not allowed for both key and value. Otherwise, we will get a null pointer exception.|
|HashMap inherits AbstractMap class.||Hashtable inherits Dictionary class.|
The Boyer Moore algorithm is a searching algorithm that looks for a string with length n and a pattern with length m. It prints all of the pattern's occurrences in the Text. Boyer Moore employs both a bad character heuristic and a positive character heuristic. Each of them is utilised to look for the pattern on its own. Pattern processing is utilised to create separate arrays for both heuristics in this approach, and the best heuristic is used at each stage. Boyer Moore begins by matching the previous pattern, which differs from KMP and Naive's method.
Because, when analysing performance, we are usually only interested in the worst-case scenario. Knowing the upper bound is therefore sufficient. When it performs better than expected for given input. Some Algorithms don't have tight bound at all. Moreover, tight bounds are often more difficult to compute.
|Backtracking is a technique for discovering all viable solutions to a problem. When it realises it has made a faulty decision, it reverses the previous decision by backing it up. It searches the state space tree until a solution to the problem is found.||Optimisation challenges are solved with Branch-and-Bound. It abandons the pre-solution when it realises that it already has a better optimal solution that the pre-solution leads to. To find the best answer, it searches the entire state space tree.|
|Backtracking involves feasibility function.||Branch-and-Bound involves a bounding function.|
|Backtracking is used for solving Decision Problem.||Branch-and-Bound is used for solving Optimisation Problem.|
|It is more efficient||It is less efficient|
|Useful in solving N-Queen Problem, Subset Sum.||Useful in solving Travelling Salesman Problem, Knapsack Problem.|
These are some of the most frequently asked Data Structures interview questions. The questions depend on your level of experience and the position you are interviewing for. In general, companies try to assess how good you are in things you claim to know.
We hope that the above Data Structures interview questions have sufficed in explaining to you how important it is to study Data Structures and Algorithms. Do not be disheartened if you don't know how to take the necessary steps in improving your knowledge on DSA and getting the job you had in mind while answering these Data Structures interview questions.
You can join Logicmojo's Data Structures and Algorithms course to upskill your technical knowledge, experience, and guidance that will help you in securing the dream job.
A. for(i = 0; i < n; i++)
B. for(i = 0; i < n; i += 2)
C. for(i = 1; i < n; i *= 2)
D. for(i = n; i > -1; i /= 2)