
Top 30 Python Data Science Interview Questions
Apr 18, 2025 (Last Updated)
Welcome to this last-minute preparation guide on Python to help you ace your data science interview. Python plays an important role in nearly every data science role and its interview process. It is used for everything from analyzing and visualizing data and building models to creating user interfaces.
In this blog, we will look at the top 30 Python interview questions for data science roles, covering topics from basic to advanced level. Let's get started!
Table of contents
- Basic Python Data Science Interview Questions
- What built-in data types are used in Python?
- How are data analysis libraries used in Python? What are some of the most common libraries?
- What is negative indexing in Python? [with example]
- What is dictionary comprehension in Python? [with example]
- Is Python an object-oriented programming language?
- Which library would you prefer for plotting: Seaborn or Matplotlib?
- What is the difference between lists and tuples in Python?
- How would you sort a dictionary in Python?
- What is the difference between a Series and a DataFrame in Pandas?
- Is memory de-allocated when you exit Python?
- Intermediate Python Data Science Interview Questions
- What is a KeyError in Python?
- Given two arrays, write a Python function to return the intersection of the two. For example, X = [1,5,9,0] and Y = [3,0,2,9] it should return [9,0]
- How do map, reduce, and filter functions work?
- What is the difference between del, clear(), remove(), and pop()?
- Given an integer n and an integer k, output a list of all of the combinations of k numbers chosen from 1 to n. For example, if n=3 and k=2, return [[1,2],[1,3],[2,3]]
- Given two strings, string1 and string2, write a function is_subsequence to find out if string1 is a subsequence of string2.
- What is the difference between pass, continue, and break?
- Write a function that can take a string and return a list of bigrams.
- What are namespaces in Python? [explain in brief]
- What is the difference between 'is' and '=='?
- Advanced Python Data Science Interview Questions
- Write a function to generate N samples from a normal distribution and plot them on the histogram.
- Write a function that takes in a list of dictionaries with both a key and a list of integers, and returns a dictionary with the standard deviation of each list.
- Given a list of stock prices in ascending order by datetime, write a function that outputs the maximum profit by buying and selling at a specific interval.
- Given a positive integer X, return an integer that is the factorial of X. If a negative integer is provided, return -1. Implement the solution by using a recursive function.
- Given a dataset of test scores, write Pandas code to return cumulative bucketed scores of <50, <75, <90, and <100.
- Given a data frame of students’ favorite colors and test scores, write a function to select only those rows (students) where their favorite color is blue or red and their test grade is above 80.
- Write a function that returns the maximum number in the list.
- Given an array, find all the duplicates in this array. For example: input: [1,2,3,1,3,6,5] output: [1,3]
- Given a dictionary with keys of letters and values of a list of letters, write a function nearest_key to find the key with the input value closest to the beginning of the list.
- Develop a k-means clustering algorithm in Python from the ground up.
- Concluding Thoughts...
- FAQs
- Is pursuing a career in data science still advisable in 2025?
- What is a 'list' in the context of Python programming during interviews?
- How is Python described in interviews?
Basic Python Data Science Interview Questions
The basic level covers Python topics such as data types, object-oriented programming, memory management, and data manipulation and analysis. Let's look at the questions.
1. What built-in data types are used in Python?
Python offers several built-in data types that are foundational for data manipulation and programming. These include:
- int: Used for integer values.
- float: Handles floating-point numbers.
- str: Manages strings of characters.
- bool: The Boolean values True and False.
- list: A mutable sequence of elements.
- tuple: An immutable sequence of elements.
- set: An unordered collection of unique elements.
- dict: A collection of key-value pairs.
Understanding these data types is crucial as they form the basis of Python programming, especially in data science, where data manipulation and analysis are key.
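A quick way to see these types in action is to create one literal of each and inspect it with the built-in type() function. The variable names below are illustrative only.

```python
# One literal of each built-in type listed above (names are illustrative)
values = {
    "int": 42,
    "float": 3.14,
    "str": "data",
    "bool": True,
    "list": [1, 2, 3],
    "tuple": (1, 2, 3),
    "set": {1, 2, 3},
    "dict": {"a": 1},
}

for expected_name, value in values.items():
    # type(value).__name__ gives the type's name as a string, e.g. "int"
    print(expected_name, type(value).__name__)

# bool is a subclass of int, a common interview follow-up
print(isinstance(True, int))  # True
```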
2. How are data analysis libraries used in Python? What are some of the most common libraries?
Python is renowned for its robust libraries that simplify data analysis, including:
- Pandas: Offers data structures like DataFrames and Series for easy data manipulation.
- NumPy: Provides support for large, multi-dimensional arrays and matrices.
- Matplotlib: A plotting library useful for creating static, interactive, and animated visualizations.
- Seaborn: Built on top of Matplotlib, it provides a high-level interface for drawing attractive statistical graphics.

These libraries are integral for performing complex data analysis tasks efficiently in Python.
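As a small sketch of how two of these libraries fit together, the snippet below uses NumPy for a vectorized computation and pandas for a labeled, grouped summary. The column names and values are made up for illustration.

```python
import numpy as np
import pandas as pd

# NumPy: vectorized arithmetic on a whole array at once
prices = np.array([10.0, 12.5, 9.0, 11.0])
print(prices.mean())  # 10.625

# pandas: a labeled table with a per-group aggregate
df = pd.DataFrame({
    "store": ["A", "A", "B", "B"],
    "price": prices,
})
print(df.groupby("store")["price"].mean())  # A: 11.25, B: 10.0
```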
3. What is negative indexing in Python? [with example]
Negative indexing in Python allows access to list elements from the end. For instance, consider the list a = [1, 2, 3, 4, 5]:
- a[-1] returns the last element, which is 5.
- a[-2] returns 4, the second-to-last element.
This feature is particularly useful for quickly accessing data from the end without needing to know the length of the list.
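Negative indices also work in slices and with any sequence type, such as strings and tuples. A short sketch:

```python
a = [1, 2, 3, 4, 5]

# a[-k] is equivalent to a[len(a) - k]
assert a[-1] == a[len(a) - 1]

print(a[-3:])         # [3, 4, 5]        -> the last three elements
print(a[:-2])         # [1, 2, 3]        -> everything except the last two
print(a[::-1])        # [5, 4, 3, 2, 1]  -> a reversed copy
print("python"[-3:])  # hon
```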
4. What is dictionary comprehension in Python? [with example]
Dictionary comprehension offers a concise way to create dictionaries. The syntax is {key: value for vars in iterable}. For example:
squares = {x: x*x for x in range(6)}
This creates a dictionary squares where each key is an integer from 0 to 5 and its value is the square of the key.
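A comprehension can also include a condition, or build a dictionary from two parallel lists with zip(). The names and scores below are illustrative:

```python
# Keep only even keys by adding an if clause
even_squares = {x: x * x for x in range(6) if x % 2 == 0}
print(even_squares)  # {0: 0, 2: 4, 4: 16}

# Pair up two lists into a dictionary
names = ["alice", "bob"]
scores = [91, 78]
score_by_name = {name: score for name, score in zip(names, scores)}
print(score_by_name)  # {'alice': 91, 'bob': 78}
```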
5. Is Python an object-oriented programming language?
Yes. Python fully supports object-oriented programming (OOP) through classes and objects, although it is a multi-paradigm language that also supports procedural and functional styles. It provides inheritance, encapsulation (largely by naming convention), and polymorphism, which are fundamental to creating reusable and modular code.
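A minimal sketch of inheritance and polymorphism follows. The Shape, Circle, and Square classes are hypothetical examples, not part of any library.

```python
import math

class Shape:
    def area(self):
        raise NotImplementedError  # subclasses are expected to override this

class Circle(Shape):  # inheritance: a Circle is a Shape
    def __init__(self, radius):
        self._radius = radius  # leading underscore marks it "internal" by convention

    def area(self):
        return math.pi * self._radius ** 2

class Square(Shape):
    def __init__(self, side):
        self._side = side

    def area(self):
        return self._side ** 2

# Polymorphism: the same method call works on any Shape subclass
shapes = [Circle(1), Square(2)]
print([round(s.area(), 2) for s in shapes])  # [3.14, 4]
```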

6. Which library would you prefer for plotting: Seaborn or Matplotlib?
Choosing between Seaborn and Matplotlib depends on the specific needs:
- Matplotlib provides extensive control and customization over plots.
- Seaborn is preferable for making attractive statistical plots quickly and provides themes and high-level interfaces.
For detailed customization, Matplotlib is ideal, while for high-level statistical plotting, Seaborn is more convenient.
7. What is the difference between lists and tuples in Python?
The primary difference is mutability:
- Lists are mutable, meaning they can be modified after creation (e.g., adding or removing elements).
- Tuples are immutable, meaning their contents cannot be changed once created.

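This distinction affects performance and usage: tuples are typically slightly faster and more memory-efficient, and because a tuple of hashable items is itself hashable, it can be used as a dictionary key or set element. Tuples are therefore a good fit for fixed data.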
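A short sketch of the difference in practice; the grid dictionary is illustrative:

```python
nums_list = [1, 2, 3]
nums_list[0] = 10            # lists can be modified in place
nums_list.append(4)
print(nums_list)             # [10, 2, 3, 4]

nums_tuple = (1, 2, 3)
try:
    nums_tuple[0] = 10       # tuples reject item assignment
except TypeError as exc:
    print("TypeError:", exc)

# A tuple of hashable items is hashable, so it can be a dictionary key
grid = {(0, 0): "origin", (1, 2): "point"}
print(grid[(1, 2)])          # point
```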
8. How would you sort a dictionary in Python?
Dictionaries can be sorted by keys or values using sorted():
my_dict = {'one': 1, 'three': 3, 'five': 5}
sorted_by_key = {k: my_dict[k] for k in sorted(my_dict)}
sorted_by_value = {k: v for k, v in sorted(my_dict.items(), key=lambda item: item[1])}
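Because dictionaries preserve insertion order (guaranteed since Python 3.7), rebuilding them from sorted items produces dictionaries sorted by key and by value, respectively.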
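To sort in descending order, pass reverse=True; operator.itemgetter(1) is a common alternative to the lambda. A sketch reusing the same sample data:

```python
from operator import itemgetter

my_dict = {'one': 1, 'three': 3, 'five': 5}

# Highest value first
by_value_desc = dict(sorted(my_dict.items(), key=itemgetter(1), reverse=True))
print(by_value_desc)  # {'five': 5, 'three': 3, 'one': 1}

# sorted() on a dict by itself returns only the sorted keys, as a list
print(sorted(my_dict))  # ['five', 'one', 'three']
```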
9. What is the difference between a Series and a DataFrame in Pandas?
- Series: A one-dimensional array with labels. It can hold any data type.
- DataFrame: A two-dimensional table with row and column labels. It resembles a spreadsheet or SQL table and is suitable for representing complex data relationships.
Understanding these structures is fundamental for effective data manipulation in Pandas.
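A brief sketch of both structures; the names and scores are made up for illustration. Note that selecting a single column of a DataFrame gives back a Series.

```python
import pandas as pd

# A Series: one labeled column of values
s = pd.Series([90, 85, 77], index=["alice", "bob", "cara"], name="score")

# A DataFrame: several columns sharing the same row index
df = pd.DataFrame({"score": [90, 85, 77], "age": [21, 22, 20]},
                  index=["alice", "bob", "cara"])

print(s.ndim, df.ndim)    # 1 2
print(type(df["score"]))  # selecting one column returns a Series
print(df.shape)           # (3, 2)
```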
10. Is memory de-allocated when you exit Python?
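Not always. While a program runs, CPython frees most objects through reference counting, and its cyclic garbage collector reclaims groups of objects that only reference each other. When Python exits, it tries to clean up every object, but objects referenced from the global namespaces of modules, especially those involved in circular references, are not always deallocated, and some memory allocated by the C library cannot be freed. Once the process ends, the operating system reclaims its memory. If specific cleanup must run at exit, you can register it with the atexit module.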
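The snippet below is a CPython-oriented sketch of those two mechanisms: reference counting alone cannot free a reference cycle, while the cyclic garbage collector (the gc module) can. The Node class is hypothetical, and automatic collection is paused only so the effect is visible.

```python
import gc
import weakref

class Node:
    def __init__(self):
        self.partner = None

def make_cycle():
    a, b = Node(), Node()
    a.partner, b.partner = b, a  # a -> b -> a: a reference cycle
    return weakref.ref(a)        # a weak reference does not keep `a` alive

gc.disable()                      # pause automatic cycle collection for the demo
ref = make_cycle()
alive_before = ref() is not None  # still alive: refcounts never dropped to zero
gc.collect()                      # the cycle detector finds and frees the pair
alive_after = ref() is not None
gc.enable()

print(alive_before, alive_after)  # True False (on CPython)
```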
Intermediate Python Data Science Interview Questions
The intermediate level covers topics such as built-in functions, errors, data structures, and string manipulation. Let's dive into the questions.
11. What is a KeyError in Python?
A KeyError is raised when you try to access a dictionary key that does not exist. Dictionary lookups are hash-based, so Python does not scan the whole dictionary; it raises KeyError as soon as the lookup finds no matching key. For example, suppose a student dictionary maps roll numbers 1 to 10 to student names. If you try to access roll number 11, Python raises a KeyError because no such key exists.
To avoid this error, you can use the get() method, which returns None (or a default value you supply) when the key is missing. Alternatively, wrap the lookup in a try/except block and handle the missing key in the except KeyError branch; this works well when missing keys are expected to be rare.
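A sketch of the roll-number example; the student names are placeholders made up for illustration:

```python
# Roll numbers 1-10 mapped to placeholder student names
students = {roll: f"student_{roll}" for roll in range(1, 11)}

# Option 1: get() returns None, or a default you supply, for a missing key
print(students.get(11))             # None
print(students.get(11, "unknown"))  # unknown

# Option 2: try/except handles the KeyError explicitly
try:
    name = students[11]
except KeyError:
    name = "unknown"
print(name)  # unknown

# Option 3: check membership first with the `in` operator
print(11 in students)  # False
```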
12. Given two arrays, write a Python function to return the intersection of the two. For example, X = [1,5,9,0] and Y = [3,0,2,9] it should return [9,0]
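Python lists do not have an intersect() method, so this problem is usually solved with a set. Converting Y to a set makes each membership check O(1) on average, and looping over X keeps the result in X's order:
X = [1, 5, 9, 0]
Y = [3, 0, 2, 9]
y_set = set(Y)
answer = [x for x in X if x in y_set]  # elements of X that also appear in Y
print(answer)  # [9, 0]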
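If the inputs are NumPy arrays, numpy.intersect1d() performs the same task. Note that it returns the unique common elements in sorted order, so the result order differs from the list-based answer:

```python
import numpy as np

X = np.array([1, 5, 9, 0])
Y = np.array([3, 0, 2, 9])

common = np.intersect1d(X, Y)  # sorted, de-duplicated common values
print(common)  # [0 9]
```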
13. How do map, reduce, and filter functions work?
- map(): Applies a function to every item of an iterable and returns a lazy iterator (wrap it in list() to get a list). Example:
items = [1, 2, 3, 4, 5]
squared = list(map(lambda x: x**2, items))  # [1, 4, 9, 16, 25]
- reduce(): Applies a two-argument function cumulatively to the items of a sequence, reducing it to a single value. It lives in the functools module:
from functools import reduce
result = reduce(lambda x, y: x * y, items)  # 120
- filter(): Returns an iterator over the items for which a function returns True:
even_items = list(filter(lambda x: x % 2 == 0, items))  # [2, 4]
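Many Python developers prefer comprehensions and built-ins over map() and filter(). A sketch of equivalent code for the same items list (math.prod requires Python 3.8 or later):

```python
import math

items = [1, 2, 3, 4, 5]

squared = [x ** 2 for x in items]              # same as list(map(...))
even_items = [x for x in items if x % 2 == 0]  # same as list(filter(...))
product = math.prod(items)                     # same as reduce(lambda x, y: x * y, items)

print(squared, even_items, product)  # [1, 4, 9, 16, 25] [2, 4] 120
```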

14. What is the difference between del, clear(), remove(), and pop()?
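- del: A statement (not a function) that deletes list items by index or slice, dictionary entries by key, or entire variables.
- clear(): Empties the entire list in place.
- remove(): Removes the first item equal to a given value; raises ValueError if the value is not present.
- pop(): Removes the item at a given index (the last item by default) and returns it.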
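A short sketch showing each operation on the same list; the values are illustrative:

```python
nums = [10, 20, 30, 20, 40]

del nums[0]          # delete by index (slices work too): [20, 30, 20, 40]
nums.remove(20)      # delete the first matching value:   [30, 20, 40]
last = nums.pop()    # remove and return the last item:   40
first = nums.pop(0)  # remove and return the item at 0:   30
print(nums, last, first)  # [20] 40 30

nums.clear()         # empty the list but keep the same list object
print(nums)          # []

x = 5
del x                # del can also unbind a variable name entirely
```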
15. Given an integer n and an integer k, output a list of all of the combinations of k numbers chosen from 1 to n. For example, if n=3 and k=2, return [[1,2],[1,3],[2,3]]
To find all the combinations, we can use the combinations() function from the standard-library itertools module, which yields each combination as a tuple:
from itertools import combinations

def find_combinations(k, n):
    # combinations() yields tuples in lexicographic order; convert each to a list
    return [list(comb) for comb in combinations(range(1, n + 1), k)]

print(find_combinations(2, 3))  # Output: [[1, 2], [1, 3], [2, 3]]
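Interviewers sometimes ask for a solution without itertools. A minimal backtracking sketch follows (the function name find_combinations_manual and the helper backtrack are illustrative); it builds each combination in increasing order, so no combination is produced twice.

```python
def find_combinations_manual(k, n):
    result = []

    def backtrack(start, current):
        # A complete combination has exactly k numbers
        if len(current) == k:
            result.append(current.copy())
            return
        # Only try numbers >= start so each combination stays in increasing order
        for num in range(start, n + 1):
            current.append(num)
            backtrack(num + 1, current)
            current.pop()  # undo the choice before trying the next number

    backtrack(1, [])
    return result

print(find_combinations_manual(2, 3))  # [[1, 2], [1, 3], [2, 3]]
```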
16. Given two strings, string1 and string2, write a function is_subsequence to find out if string1 is a subsequence of string2.
A function to determine if one string is a subsequence of another can be implemented as follows:
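def is_subsequence(s1, s2):
    iter_s2 = iter(s2)
    # `char in iter_s2` advances the iterator past each match,
    # so the characters of s1 must appear in s2 in the same order
    return all(char in iter_s2 for char in s1)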
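An equivalent two-pointer sketch avoids the iterator trick and makes the matching explicit; the function name is illustrative:

```python
def is_subsequence_two_pointer(s1, s2):
    i = 0  # index of the next character of s1 we still need to find
    for char in s2:
        if i < len(s1) and char == s1[i]:
            i += 1  # matched; move on to the next character of s1
    return i == len(s1)

print(is_subsequence_two_pointer("ace", "abcde"))  # True
print(is_subsequence_two_pointer("aec", "abcde"))  # False
```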
17. What is the difference between pass, continue, and break?
All three are control-flow statements, but they behave quite differently:
- pass: Does nothing; used as a placeholder where a statement is syntactically required.
- continue: Skips the rest of the loop’s current iteration and moves to the next iteration.
- break: Exits the loop entirely.
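A single loop that uses all three statements, as a sketch:

```python
collected = []
for n in range(10):
    if n == 0:
        pass      # placeholder: does nothing, execution continues below
    if n % 2 == 1:
        continue  # skip odd numbers and jump to the next iteration
    if n > 6:
        break     # leave the loop entirely once n exceeds 6
    collected.append(n)

print(collected)  # [0, 2, 4, 6]
```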
18. Write a function that can take a string and return a list of bigrams.
A function to extract bigrams from a string could look like this:
def find_bigrams(input_string):
    words = input_string.split()
    return [(words[i], words[i + 1]) for i in range(len(words) - 1)]
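An equivalent, more compact sketch pairs each word with its successor using zip(); the function name is illustrative:

```python
def find_bigrams_zip(input_string):
    words = input_string.split()
    # zip() stops at the shorter input, so the last word is not paired with anything
    return list(zip(words, words[1:]))

print(find_bigrams_zip("have free hours and love children"))
# [('have', 'free'), ('free', 'hours'), ('hours', 'and'), ('and', 'love'), ('love', 'children')]
```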
19. What are namespaces in Python? [explain in brief]
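Namespaces in Python are mappings from names to objects. Python keeps separate namespaces for built-in names, for each module's global names, and for each function call's local names, so the same name can refer to different objects in different contexts without conflict. Names are resolved in LEGB order: Local, Enclosing, Global, Built-in.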
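A small sketch of LEGB resolution, where the same name x lives in three different namespaces:

```python
x = "global"  # module (global) namespace

def outer():
    x = "enclosing"  # outer()'s local namespace, which encloses inner()

    def inner():
        x = "local"  # inner()'s own local namespace
        return x

    return inner(), x

print(outer())     # ('local', 'enclosing')
print(x)           # global: the function-level names never replaced it
print(len("abc"))  # 3: len is found in the built-in namespace
```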
20. What is the difference between ‘is’ and ‘==’?
- 'is': Checks whether two variables point to the same object in memory.
- '==': Checks whether the values of two variables are equal.
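A short sketch contrasting the two operators:

```python
a = [1, 2, 3]
b = [1, 2, 3]
c = a

print(a == b)  # True:  same contents
print(a is b)  # False: two distinct list objects
print(a is c)  # True:  c is another name for the same object

# Idiomatic use of `is`: comparing with singletons such as None
value = None
print(value is None)  # True
```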
Each of these questions and answers deepens your understanding of Python, preparing you for scenarios you might face in data science interviews.
Advanced Python Data Science Interview Questions
This section covers the advanced Python concepts needed for data science roles. It includes the Python libraries used across the data science lifecycle, such as NumPy, pandas, and Matplotlib, as well as statistical problems. Let's look into each of these.
21. Write a function to generate N samples from a normal distribution and plot them on the histogram.
To tackle this problem, you can use NumPy to generate the samples and Matplotlib or Seaborn to visualize them. Here's how you can create a function in Python:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

def generate_and_plot(N):
    # Generate N samples from the standard normal distribution (mean 0, std 1)
    samples = np.random.randn(N)
    # Plot a 20-bin histogram with a kernel density estimate overlaid
    sns.histplot(samples, bins=20, kde=True, color='blue')
    plt.show()
    return samples

# Example usage:
samples = generate_and_plot(1000)
This function not only generates the samples but also plots them, providing a visual understanding of the distribution.
22. Write a function that takes in a list of dictionaries with both a key and a list of integers, and returns a dictionary with the standard deviation of each list.
For this task, you can use NumPy to calculate the standard deviation:
import numpy as np

def calculate_std_dev(dict_list):
    result = {}
    for d in dict_list:
        for key, values in d.items():
            # np.std computes the population standard deviation (ddof=0) by default
            result[key] = np.std(values)
    return result

# Example usage:
dict_list = [{'a': [1, 2, 3]}, {'b': [4, 5, 6, 7]}]
std_devs = calculate_std_dev(dict_list)
This function processes each dictionary in the list and computes the standard deviation of the list associated with each key. Pass ddof=1 to np.std() if you need the sample standard deviation instead.
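A dependency-free sketch using the standard-library statistics module: pstdev() matches np.std()'s default (population standard deviation), while statistics.stdev() would give the sample version. The function name is illustrative.

```python
import statistics

def calculate_std_dev_stdlib(dict_list):
    result = {}
    for d in dict_list:
        for key, values in d.items():
            result[key] = statistics.pstdev(values)  # population standard deviation
    return result

std_devs = calculate_std_dev_stdlib([{'a': [1, 2, 3]}, {'b': [4, 5, 6, 7]}])
print(std_devs)  # approximately {'a': 0.8165, 'b': 1.1180}
```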
23. Given a list of stock prices in ascending order by datetime, write a function that outputs the maximum profit by buying and selling at a specific interval.
To maximize the profit from a single buy followed by a later sell, track the lowest price seen so far:
def max_profit(prices):
    min_price = float('inf')
    best = 0
    for price in prices:
        # Lowest price seen up to and including this point
        min_price = min(min_price, price)
        # Profit if we had bought at min_price and sold at this price
        best = max(best, price - min_price)
    return best

# Example usage:
prices = [9, 11, 8, 5, 7, 10]
profit = max_profit(prices)  # 5 (buy at 5, sell at 10)
This function makes a single pass over the prices, tracking the minimum price so far and updating the maximum profit at each step, so it runs in O(n) time.
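Because the question mentions a specific interval, you may also be asked when to buy and sell. A sketch that returns the profit together with the (buy, sell) indices; the function name best_trade is hypothetical:

```python
def best_trade(prices):
    best_profit, best_pair = 0, None
    min_idx = 0  # index of the lowest price seen so far
    for i in range(1, len(prices)):
        if prices[i] < prices[min_idx]:
            min_idx = i  # a cheaper buying day
        elif prices[i] - prices[min_idx] > best_profit:
            best_profit = prices[i] - prices[min_idx]
            best_pair = (min_idx, i)  # (buy index, sell index)
    return best_profit, best_pair

print(best_trade([9, 11, 8, 5, 7, 10]))  # (5, (3, 5))
```

If prices only fall, no profitable interval exists and the function returns (0, None).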
24. Given a positive integer X, return an integer that is the factorial of X. If a negative integer is provided, return -1. Implement the solution by using a recursive function.
def factorial(x):
    # Edge case: factorial is undefined for negative integers
    if x < 0:
        return -1
    # Base cases: 0! = 1! = 1
    if x <= 1:
        return 1
    # Recursive case: x! = x * (x - 1)!
    return x * factorial(x - 1)

answer = factorial(4)
print(answer)  # Output: 24
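CPython caps recursion depth (sys.getrecursionlimit() is typically 1000), so the recursive version fails for large inputs. An iterative sketch with the same -1 convention, cross-checked against math.factorial(); the function name is illustrative:

```python
import math

def factorial_iterative(x):
    if x < 0:
        return -1  # same convention as the recursive version
    result = 1
    for i in range(2, x + 1):
        result *= i
    return result

print(factorial_iterative(4))                         # 24
print(factorial_iterative(-3))                        # -1
print(factorial_iterative(20) == math.factorial(20))  # True
```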
25. Given a dataset of test scores, write Pandas code to return cumulative bucketed scores of <50, <75, <90, and <100.
You can use pandas to categorize the scores and calculate the cumulative percentages:
import pandas as pd

def bucket_scores(df):
    bins = [0, 50, 75, 90, 100]
    labels = ["<50", "<75", "<90", "<100"]
    # right=False makes the intervals [0, 50), [50, 75), [75, 90), [90, 100)
    df = df.assign(bucket=pd.cut(df['score'], bins=bins, labels=labels, right=False))
    # Count scores per bucket, then accumulate and convert to percentages
    counts = df.groupby('bucket', observed=False).size()
    cumulative = counts.cumsum() / len(df) * 100
    return cumulative.reset_index(name='cumulative_percentage')

# Example usage:
data = {'score': [39, 80, 73, 91, 92, 85, 41]}
df = pd.DataFrame(data)
result = bucket_scores(df)
This function assigns each score to a bucket and reports the cumulative percentage of scores below each bucket's upper bound. Note that with these bins, a score of exactly 100 falls outside every bucket.
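Because the buckets are cumulative, another sketch skips pd.cut() entirely: the mean of a boolean mask is the fraction of rows that satisfy it. The threshold list mirrors the buckets above.

```python
import pandas as pd

df = pd.DataFrame({'score': [39, 80, 73, 91, 92, 85, 41]})

thresholds = [50, 75, 90, 100]
cumulative = pd.DataFrame({
    'bucket': [f"<{t}" for t in thresholds],
    # (df['score'] < t) is a boolean Series; its mean is the share of True values
    'cumulative_percentage': [(df['score'] < t).mean() * 100 for t in thresholds],
})
print(cumulative)
```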
26. Given a data frame of students’ favorite colors and test scores, write a function to select only those rows (students) where their favorite color is blue or red and their test grade is above 80.
This selection can be done efficiently with pandas boolean indexing:
import pandas as pd

def select_students(df):
    # Combine both conditions with &, wrapping each one in parentheses
    return df[(df['favorite_color'].isin(['blue', 'red'])) & (df['test_grade'] > 80)]

# Example usage:
data = {'favorite_color': ['green', 'red', 'blue'], 'test_grade': [91, 89, 95]}
df = pd.DataFrame(data)
selected_students = select_students(df)  # keeps the 'red' and 'blue' rows
This function filters the data frame on both conditions, keeping only the matching students.
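The same filter can also be written as a string expression with DataFrame.query(), which some find easier to read. A sketch on the same sample data:

```python
import pandas as pd

df = pd.DataFrame({'favorite_color': ['green', 'red', 'blue'],
                   'test_grade': [91, 89, 95]})

# query() evaluates the condition string against the DataFrame's columns
selected = df.query("favorite_color in ['blue', 'red'] and test_grade > 80")
print(selected)
```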
27. Write a function that returns the maximum number in the list.
Using Python’s built-in functions, you can find the maximum number easily:
def find_max(numbers):
    return max(numbers)

# Example usage:
numbers = [1, 2, 3, 4, 5]
max_number = find_max(numbers)
This simple function returns the highest number in a list using the max() function.
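Interviewers often follow up by asking you to avoid max(). A sketch of a manual single-pass version; the function name is illustrative, and it raises ValueError on an empty list, as max() does:

```python
def find_max_manual(numbers):
    if not numbers:
        raise ValueError("find_max_manual() arg is an empty list")
    largest = numbers[0]
    for n in numbers[1:]:
        if n > largest:
            largest = n  # found a new maximum
    return largest

print(find_max_manual([3, 7, 2, 9, 4]))  # 9
```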
28. Given an array, find all the duplicates in this array. For example: input: [1,2,3,1,3,6,5] output: [1,3]
nums = [1, 2, 3, 1, 3, 6, 5]
seen = set()        # values encountered so far
duplicates = set()  # values encountered more than once
for n in nums:
    if n in seen:
        duplicates.add(n)
    else:
        seen.add(n)
print(sorted(duplicates))  # Output: [1, 3]
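An alternative sketch counts every value with collections.Counter and keeps those that appear more than once, in order of first appearance:

```python
from collections import Counter

nums = [1, 2, 3, 1, 3, 6, 5]
counts = Counter(nums)  # maps each value to how many times it appears
duplicates = [n for n, c in counts.items() if c > 1]
print(duplicates)  # [1, 3]
```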
29. Given a dictionary with keys of letters and values of a list of letters, write a function nearest_key to find the key with the input value closest to the beginning of the list.
This can be achieved by iterating through the dictionary and finding the closest match:
def nearest_key(target, dictionary):
    nearest = None
    min_index = float('inf')
    for key, values in dictionary.items():
        if target in values:
            idx = values.index(target)
            if idx < min_index:
                min_index = idx
                nearest = key
    return nearest

# Example usage:
dictionary = {'a': ['b', 'c', 'd'], 'b': ['a', 'd', 'e']}
nearest = nearest_key('d', dictionary)  # 'b' ('d' is at index 1 in its list)
This function searches for the target value in each list and keeps track of the key whose list contains the target at the smallest index.
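A more compact sketch collects (index, key) pairs and lets min() pick the smallest index. Because min() returns the first minimal item, ties keep dictionary order, just like the loop version. The function name is illustrative.

```python
def nearest_key_min(target, dictionary):
    # (index of target, key) for every list that contains the target
    candidates = [(values.index(target), key)
                  for key, values in dictionary.items() if target in values]
    if not candidates:
        return None
    return min(candidates, key=lambda pair: pair[0])[1]

print(nearest_key_min('d', {'a': ['b', 'c', 'd'], 'b': ['a', 'd', 'e']}))  # b
```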
30. Develop a k-means clustering algorithm in Python from the ground up.
Implementing k-means involves several steps including initializing centroids, assigning points to the nearest centroids, and updating centroids based on the mean of assigned points:
import numpy as np

def k_means(data, k, max_iters=100):
    # Initialize centroids by picking k distinct data points at random
    centroids = data[np.random.choice(len(data), k, replace=False)]
    for _ in range(max_iters):
        # Assignment step: put each point in the cluster of its nearest centroid
        clusters = {i: [] for i in range(k)}
        for point in data:
            distances = [np.linalg.norm(point - centroid) for centroid in centroids]
            clusters[int(np.argmin(distances))].append(point)
        # Update step: move each centroid to the mean of its points
        # (keep the old centroid if a cluster ends up empty)
        new_centroids = np.array([
            np.mean(clusters[i], axis=0) if clusters[i] else centroids[i]
            for i in range(k)
        ])
        # Stop once the centroids no longer move
        if np.allclose(centroids, new_centroids):
            break
        centroids = new_centroids
    return centroids, clusters

# Example usage:
data = np.random.rand(100, 2)  # 100 points in 2D space
centroids, clusters = k_means(data, 3)
This function initializes centroids randomly, then iteratively reassigns points to the nearest centroid and updates centroids based on the mean of points in each cluster until convergence.
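The per-point Python loop above is easy to follow but slow on large datasets. A vectorized sketch of the same algorithm uses NumPy broadcasting to compute all point-to-centroid distances at once; the function name k_means_vectorized and the seed parameter are illustrative additions.

```python
import numpy as np

def k_means_vectorized(data, k, max_iters=100, seed=None):
    rng = np.random.default_rng(seed)
    # Initialize centroids by picking k distinct data points at random
    centroids = data[rng.choice(len(data), size=k, replace=False)]
    for _ in range(max_iters):
        # (n, 1, d) - (1, k, d) -> (n, k, d); the norm over the last axis gives (n, k)
        distances = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)  # index of the nearest centroid per point
        # Mean of each cluster; keep the old centroid if a cluster is empty
        new_centroids = np.array([
            data[labels == i].mean(axis=0) if np.any(labels == i) else centroids[i]
            for i in range(k)
        ])
        if np.allclose(centroids, new_centroids):
            break
        centroids = new_centroids
    return centroids, labels

# Two well-separated groups of points
data = np.array([[0, 0], [0, 0.1], [0.1, 0], [10, 10], [10, 10.1], [10.1, 10]])
centroids, labels = k_means_vectorized(data, 2, seed=0)
print(labels)
```

Returning an array of labels instead of lists of points also makes the result easy to attach to a pandas DataFrame as a new column.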
These advanced Python data science interview questions and answers, complete with code snippets, will help you demonstrate your technical proficiency and problem-solving skills in your upcoming interviews.
Concluding Thoughts…
In conclusion, this blog is the perfect last-minute guide to ace your data science interview. It covers Python topics ranging from data types, object-oriented programming, and memory management to data manipulation, data analysis, built-in functions, data structures, and various Python libraries for data science. Mastering these concepts will help you not only in data science interviews but also in other roles that require Python programming. Happy Learning!
FAQs
Is pursuing a career in data science still advisable in 2025?
Absolutely. Choosing a career in data science continues to be a smart and profitable decision in 2025.
What is a 'list' in the context of Python programming during interviews?
In Python, a 'list' refers to an ordered collection of elements that can include various types. Lists are mutable, allowing modifications such as changing an element's value or adjusting the list's size by adding or removing elements. They are defined using square brackets with elements separated by commas.
How is Python described in interviews?
Python is described as a high-level, general-purpose programming language that supports object-oriented programming. Often referred to as a scripting language, Python is widely used for developing web applications, webpages, and graphical user interface (GUI) applications. Its popularity is largely due to its versatility.