
Data Science with Python Interview Questions and Answers (Updated 2022)

Q1. In Python, which built-in data types are used?

Python contains several built-in data types, including:

  • Numeric (int, float, and complex)
  • String (str)
  • Tuple (tuple)
  • Range (range)
  • List (list)
  • Set (set)
  • Dictionary (dict)

Data types are used to classify or categorize data in Python, and each value has its own data type.

Q2. How do Python’s data analysis libraries work? Which library is the most popular?

Python’s popularity in data science stems in large part from its extensive collection of data analysis libraries. These libraries provide functions, tools, and methods for data management and analysis. Python libraries are available for processing image and text data, data mining, and data visualization, among other data science tasks. The following are some of the most popular Python data analysis libraries:

  • Pandas
  • NumPy
  • SciPy
  • TensorFlow
  • scikit-learn
  • Seaborn
  • Matplotlib

Q3. What is the purpose of a negative index in Python?

Python uses negative indexes to access items in lists, strings, and arrays from the end, counting backward. Index -1 refers to the last item, -2 to the second-to-last item, and so on. Here is an example of a negative index:

b = "Python Coding Fun"

print(b[-1])

>> n

Q4. In Python, what is the difference between lists and tuples?

In Python, lists and tuples are classes that hold one or more objects or data. The following are some key distinctions:

Syntax: Tuples are enclosed in parentheses, and lists are enclosed in square brackets.

Immutable vs. Mutable: Lists are mutable, meaning they can be changed after they have been created. Tuples are immutable, which means they cannot be changed after they have been created.

Operations: Lists offer more features than tuples, such as insert and pop operations, as well as in-place sorting.

Memory and speed: Because tuples are immutable, they have a fixed size, typically take less memory than an equivalent list, and can be slightly faster to create and iterate over.
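The distinctions above can be seen in a short, illustrative sketch. The variable names are made up for this example, and the exact sizes reported by sys.getsizeof depend on the interpreter, so none are shown.

```python
import sys

point_list = [1, 2, 3]
point_tuple = (1, 2, 3)

# Lists can be modified in place.
point_list.append(4)
point_list[0] = 10

# Tuples reject in-place modification with a TypeError.
try:
    point_tuple[0] = 10
    tuple_changed = True
except TypeError:
    tuple_changed = False

print(point_list)     # [10, 2, 3, 4]
print(tuple_changed)  # False

# Compare the memory footprint of an equivalent list and tuple.
print(sys.getsizeof([1, 2, 3]), sys.getsizeof((1, 2, 3)))
```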

Q5. Is Python an object-oriented language? What does it mean to program in an object-oriented manner?

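Python is an object-oriented programming language (it also supports procedural and functional styles). Programming in an object-oriented manner means organizing code into objects, which are instances of classes. Each object bundles its data (attributes) together with the functions that operate on that data (methods).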

Q6. What is the difference between a Python module and a Python library?

A module is a single file containing functions, classes, and variables that accomplish a certain task. It is a file with the .py extension. A module can be imported at any point in a session, and Python loads it only once no matter how many times it is imported. There are two ways to import a module: import module_name or from module_name import name.

A library is a collection of reusable code, usually made up of several modules and packages, that lets us reuse functionality.

Q7. What exactly is PEP8?

PEP 8 is the coding convention for Python. It comprises style guidelines, a set of recommendations for making Python code more readable and consistent for others.

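Q8. Differentiate between mutable and immutable objects.

Mutability is the ability to change an object in place without having to create a new one. Lists, sets, and dictionaries are examples of mutable objects. Immutable objects, such as integers, floats, strings, and tuples, cannot be changed after creation; any operation that appears to modify them produces a new object.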

Q9. What is the lambda function, and what is it used for?

A lambda function is an unnamed (anonymous) function. It is typically used for short, throwaway functions that are passed as arguments, for example to map, filter, or sorted.

Because they are defined with the lambda keyword rather than the def keyword and need not be bound to a name, these functions are referred to as anonymous functions. The return keyword is not required: the value of the expression is returned automatically.

A lambda can take any number of parameters, but its body must be a single expression. It cannot contain statements such as assignments, loops, or return.

Because the body must be an expression, print could not be used inside a lambda in Python 2, where print was a statement. In Python 3, print() is a function call and is allowed.

Lambda functions have their own local namespace. Like any other function, they can read their parameters as well as variables from the enclosing and global scopes.

Example:
x = lambda i, j: i + j

print(x(7, 8))

Q10. What is the difference between map, reduce, and filter?

map applies the specified function to every element of an iterable and returns the results. In Python 3 it returns a lazy iterator, which can be converted to a list with list().

reduce (from the functools module) applies a two-argument function cumulatively to the items of a sequence. The result of each step becomes the first argument of the next step. Instead of a list, it returns a single value.

filter keeps only the elements of an iterable (list, set, or tuple) for which the function passed as an argument returns true. In Python 3 it also returns an iterator, which yields the filtered elements.
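As a minimal sketch of the three functions side by side (the list of numbers is an arbitrary example):

```python
from functools import reduce

numbers = [1, 2, 3, 4, 5]

# map: transform every element.
squares = list(map(lambda n: n * n, numbers))

# filter: keep only the elements for which the test is true.
evens = list(filter(lambda n: n % 2 == 0, numbers))

# reduce: fold the sequence into a single value.
total = reduce(lambda acc, n: acc + n, numbers)

print(squares)  # [1, 4, 9, 16, 25]
print(evens)    # [2, 4]
print(total)    # 15
```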

Q11. What is the difference between del, clear(), remove(), and pop()?

del: a statement that deletes by position (index or slice). It does not return the value that was removed. Deleting an element shifts the elements after it one position to the left. It can also delete the entire variable.

clear(): removes all items from the list.

remove(): deletes by value (the first matching occurrence), so it is useful if you know which value you want to get rid of.

pop(): removes the last element by default (or the element at a given index) and returns the value that was removed. It is useful when the removed value is still needed: we can save the return value in a variable and use it again later.
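The four operations can be compared on one small list. This is an illustrative sketch; the list contents are arbitrary.

```python
items = ["a", "b", "c", "d", "b"]

del items[0]          # delete by position       -> ['b', 'c', 'd', 'b']
items.remove("b")     # delete first matching "b" -> ['c', 'd', 'b']
last = items.pop()    # remove and return the last element
first = items.pop(0)  # remove and return the element at index 0
remaining = list(items)
items.clear()         # remove everything

print(last)       # b
print(first)      # c
print(remaining)  # ['d']
print(items)      # []
```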

Q12. What is the distinction between pass, continue, and break?

Pass: When you need a block of code syntactically but do not want it to do anything, you can use the pass statement. It is a null operation: nothing happens when it runs.

Continue: It lets you skip the rest of the current iteration when a certain condition is fulfilled, and control returns to the beginning of the loop. The loop does not end; instead, it moves on to the next iteration.

Break: It ends the loop when a condition is met, and control of the program is transferred to the statement immediately following the loop body. If the break statement is inside a nested loop (a loop within a loop), it terminates only the innermost loop.
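A small sketch of the three statements in action. The function name not_implemented_yet is a hypothetical placeholder used only to show where pass fits.

```python
visited = []
for n in range(1, 10):
    if n % 2 == 0:
        continue  # skip even numbers, go on to the next iteration
    if n > 7:
        break     # leave the loop at the first odd number above 7
    visited.append(n)

def not_implemented_yet():
    pass  # syntactically required body that does nothing

# In nested loops, break only ends the innermost loop.
pairs = []
for outer in range(3):
    for inner in range(3):
        if inner == 1:
            break
        pairs.append((outer, inner))

print(visited)  # [1, 3, 5, 7]
print(pairs)    # [(0, 0), (1, 0), (2, 0)]
```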

Q13. What does the with statement do?

When used with an open file, the with statement helps with exception handling as well as file processing. It is used this way:

with open("filename", "mode") as file_name:

We can open and process the file without having to close it explicitly. When the with block exits, the file object is closed automatically. The with statement ensures that the file is closed properly even if an exception is raised inside the block.
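The closing behavior can be checked directly. The sketch below writes to a temporary file (the file name example.txt is arbitrary) and raises a simulated error to show that the file is closed in both cases.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "example.txt")

    with open(path, "w") as handle:
        handle.write("hello")
    closed_after_block = handle.closed  # the block has exited normally

    try:
        with open(path) as handle:
            content = handle.read()
            raise ValueError("simulated failure")
    except ValueError:
        closed_after_error = handle.closed  # the block exited via an exception

print(closed_after_block)  # True
print(closed_after_error)  # True
print(content)             # hello
```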

Q14: Can you explain the differences between merge, join, and concatenate?

Merge joins two data frames on one or more common key columns. By default it performs an inner join, which keeps only the keys present in both data frames. Syntax: pd.merge(df1, df2, how='outer', on='custId')

Join combines data frames using their index. The default is a left join, which keeps every index value of the left data frame and fills in NaN where the right data frame has no matching row. Syntax: df1.join(df2)

Concatenate stacks data frames along rows (the default) or columns. Syntax: pd.concat([df1, df2])
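A minimal sketch of the three operations, assuming pandas is installed. The two small data frames and their column names (name, sales) are invented for illustration; only custId comes from the syntax shown above.

```python
import pandas as pd

df1 = pd.DataFrame({"custId": [1, 2, 3], "name": ["Ann", "Bob", "Cy"]})
df2 = pd.DataFrame({"custId": [2, 3, 4], "sales": [20, 30, 40]})

# merge: match rows on a key column (inner join by default).
inner = pd.merge(df1, df2, on="custId")
outer = pd.merge(df1, df2, how="outer", on="custId")

# join: match rows on the index (left join by default).
joined = df1.set_index("custId").join(df2.set_index("custId"))

# concat: stack along rows (default) or columns (axis=1).
stacked = pd.concat([df1, df2])
side_by_side = pd.concat([df1, df2], axis=1)

print(len(inner), len(outer))  # 2 4
print(joined)
print(stacked.shape, side_by_side.shape)  # (6, 3) (3, 4)
```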

Q15. What is the function apply()?

The apply() method applies a function to a pandas Series or DataFrame. On a Series, the function is called on each value. On a DataFrame, the function is called on each column by default (axis=0), or on each row with axis=1, and the results are combined into a new Series or DataFrame. It can be used with both built-in and custom functions, including lambda functions. df.apply(lambda x: x**2) is an example.

Q16. When should crosstab and pivot table be used?

CrossTab:

The crosstab function can be used on NumPy arrays, Series, or columns of a data frame.

By default, pandas counts how many occurrences of each combination there are.

Why use a crosstab function at all? The short answer is that it has a couple of useful capabilities for formatting and summarizing data more quickly.

The values and aggfunc arguments must be supplied together. Passing only one of them raises a ValueError: values cannot be used without an aggfunc, and aggfunc cannot be used without values.

CrossTab’s syntax is:

pd.crosstab(index=df[], columns=df[], values=df[], aggfunc=)

Pivot table:

If you do not explicitly specify values for a pivot table, it will use all of the numeric columns as values by default.

By default, the aggregation function is the mean.

Q17. Provide examples of categorical and distribution plots.

Distribution plots (Seaborn):

displot: A figure-level interface for drawing distribution plots onto a FacetGrid.

histplot: Displays dataset distributions as univariate or bivariate histograms.

kdeplot: Uses kernel density estimation to plot univariate or bivariate distributions.

Categorical plots (Seaborn):

catplot: A figure-level interface for drawing categorical plots onto a FacetGrid.

stripplot: Creates a scatter plot in which one variable is categorical.

swarmplot: Creates a categorical scatter plot with non-overlapping points.

boxplot: Displays distributions across categories using a box plot.

violinplot: Combines a boxplot and a kernel density estimate into a single plot.

boxenplot: Creates an enhanced box plot for larger datasets.

pointplot: Displays point estimates and confidence intervals using scatter plot markers.

barplot: Displays point estimates and confidence intervals as rectangular bars.

Q18. What exactly are boolean arrays? Write code to generate a boolean array using the NumPy library.

A boolean array is an array whose elements are all of the boolean data type. Keep in mind that the Python keywords and and or do not work element-wise on boolean arrays; use the & and | operators instead.

barr = np.array([True, True, False, True, False])
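In practice, boolean arrays usually come from comparisons rather than from typing out True and False by hand. The following sketch, assuming NumPy is installed and using an arbitrary array of integers, builds two masks, combines them element-wise, and uses the result to select values.

```python
import numpy as np

values = np.array([3, 8, 1, 9, 4])

is_large = values > 3      # comparison yields a boolean array
is_even = values % 2 == 0

# Element-wise logic uses & and | instead of the keywords and / or.
both = is_large & is_even
either = is_large | is_even

# A boolean array can be used as a mask to select elements.
selected = values[both]

print(is_large.dtype)  # bool
print(both)            # [False  True False False  True]
print(selected)        # [8 4]
```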

Q19. What precisely is fancy indexing?

In NumPy, you can index an array with a list (or array) of integers. For example, for a 4×4 array arr, arr[[2, 1, 0, 3]] returns the rows in the order given by the list.

Q20. What does NaT mean in the Pandas Python library?

The acronym NaT stands for “Not a Time.” For timestamp data, it’s the NA value.

Q21. What is NumPy array broadcasting?

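Broadcasting is the set of rules NumPy uses to carry out arithmetic between arrays of different shapes. The smaller array is virtually stretched to match the larger one, provided that each pair of dimensions, compared from the last dimension backward, is either equal or has size 1.

[Figure: NumPy array broadcasting]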
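A short sketch of broadcasting, assuming NumPy is installed. The arrays are arbitrary; the last case shows shapes that cannot be broadcast together.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])     # shape (2, 3)
row = np.array([10, 20, 30])       # shape (3,)
column = np.array([[100], [200]])  # shape (2, 1)

# The row is reused for every row of the matrix.
row_sum = matrix + row

# The column is reused for every column of the matrix.
column_sum = matrix + column

# Shapes (2, 3) and (2,) do not line up, so NumPy raises a ValueError.
try:
    matrix + np.array([1, 2])
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

print(row_sum)     # [[11 22 33] [14 25 36]]
print(column_sum)  # [[101 102 103] [204 205 206]]
print(mismatch_raised)  # True
```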

Q22. Write a function that can take a string and return a list of bigrams.

Example:

sentence = """
Have free hours and love children?
Drive kids to school, soccer practice
and other activities.
"""
output = [('have', 'free'),
('free', 'hours'),
('hours', 'and'),
('and', 'love'),
('love', 'children?'),
('children?', 'drive'),
('drive', 'kids'),
('kids', 'to'),
('to', 'school,'),
('school,', 'soccer'),
('soccer', 'practice'),
('practice', 'and'),
('and', 'other'),
('other', 'activities.')]
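One possible solution is sketched below. It assumes, based on the expected output, that words are lowercased and split on whitespace with punctuation left attached; the function name bigrams is our own choice.

```python
def bigrams(sentence):
    """Return consecutive word pairs from a sentence as a list of tuples."""
    words = sentence.lower().split()
    # Pair each word with the word that follows it.
    return list(zip(words, words[1:]))

sentence = """
Have free hours and love children?
Drive kids to school, soccer practice
and other activities.
"""

result = bigrams(sentence)
print(result[0])    # ('have', 'free')
print(result[-1])   # ('other', 'activities.')
print(len(result))  # 14
```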

Q23. Given two strings A and B, return whether or not A can be shifted some number of times to get B.

Example:

A = 'abcde'
B = 'cdeab'
can_shift(A, B) == True
A = 'abc'
B = 'acb'
can_shift(A, B) == False
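A compact way to solve this, offered as one possible approach: every rotation of A appears as a substring of A concatenated with itself, so B is a shifted version of A exactly when the lengths match and B occurs in A + A.

```python
def can_shift(a, b):
    """Return True if b can be obtained by rotating a some number of times."""
    # Every rotation of a is a substring of a + a.
    return len(a) == len(b) and b in a + a

print(can_shift("abcde", "cdeab"))  # True
print(can_shift("abc", "acb"))      # False
```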

Q24. Given two strings, string1 and string2, determine if there exists a one-to-one character mapping between each character of string1 to string2.

Example 1:

string1 = 'qwe'
string2 = 'asd'
string_map(string1, string2) == True
# q = a, w = s, and e = d

Example 2:

string1 = 'donut'
string2 = 'fatty'
string_map(string1, string2) == False
# t cannot map to two different values

Q25. What is PEP for Python?

PEP stands for Python Enhancement Proposal. It is a document that provides information related to new features of Python, its processes, or environments.

Q26. What do you mean by overfitting a dataset?

Overfitting a dataset means our model is fitting the training dataset so well that it performs poorly on the test dataset. One of the key reasons for overfitting could be that the model has learned the noise in the dataset.

Q27. What do you mean by underfitting a dataset?

Underfitting a dataset means our model fits the training dataset poorly, and as a result it also performs poorly on new data. It usually occurs when the model is too simple to capture the underlying pattern, for example when its parameters have not been tuned.

Q28. What is the difference between a test set and a validation set?

In supervised learning, we use a validation set to select a model and tune its hyperparameters based on the estimated prediction error. The test set, on the other hand, is held out until the end and used only to assess the accuracy of the finally chosen model.

Q29. What is the F1 score for a binary classifier? Which library in Python contains this metric?

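The F1-score combines precision and recall as the harmonic mean of the two quantities. It is given by the formula

$$F_1 = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$$

In Python, this metric is available as f1_score in the sklearn.metrics module of scikit-learn.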

Q30. Write a function for f1_score that takes True Positive, False Positive, True Negative, and False Negative as input and outputs f1_score.

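def f1_score(tp, fp, fn, tn):
    # tn is accepted for completeness but is not used by the F1 score
    p = tp / (tp + fp)  # precision
    r = tp / (tp + fn)  # recall
    return 2 * p * r / (p + r)  # harmonic mean of precision and recall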

Q31. Using sklearn library, how will you implement ridge regression?

>>> from sklearn import linear_model

>>> reg = linear_model.Ridge(alpha=0.5)

>>> # X_train (features) and y_train (targets) are placeholders for your own data

>>> reg.fit(X_train, y_train)

Q32. Using sklearn library, how will you implement lasso regression?

>>> from sklearn import linear_model

>>> reg = linear_model.Lasso(alpha=0.4)

>>> # X_train (features) and y_train (targets) are placeholders for your own data

>>> reg.fit(X_train, y_train)

Q33. How is correlation a better metric than covariance?

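Covariance is a metric that reflects how two variables (a and b) vary together around their respective average values ($\bar{a}$ and $\bar{b}$). It is given by

$$\mathrm{cov}(a, b) = \frac{1}{N} \sum_{i=1}^{N} (a_i - \bar{a})(b_i - \bar{b})$$

where N is the number of data points.

Correlation is a metric that also takes into account the standard deviations of the variables ($\sigma_a$ and $\sigma_b$). Mathematically, it is defined as

$$\rho(a, b) = \frac{\mathrm{cov}(a, b)}{\sigma_a \, \sigma_b}$$

Because of this normalization, correlation is unitless and always lies between -1 and 1, so its magnitude can be compared across pairs of variables, whereas the magnitude of covariance depends on the units of a and b.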
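The practical difference shows up when the units of a variable change. The sketch below uses hand-written helper functions with the population (divide by N) convention and made-up height and weight data: converting heights from meters to centimeters multiplies the covariance by 100 but leaves the correlation unchanged.

```python
from math import isclose, sqrt

def covariance(a, b):
    """Population covariance: average product of deviations from the means."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / n

def correlation(a, b):
    """Covariance divided by the product of the standard deviations."""
    return covariance(a, b) / sqrt(covariance(a, a) * covariance(b, b))

# Illustrative data only.
heights_m = [1.5, 1.6, 1.7, 1.8]
weights_kg = [50.0, 58.0, 63.0, 72.0]
heights_cm = [h * 100 for h in heights_m]

# Covariance scales with the units of the inputs.
print(covariance(heights_m, weights_kg), covariance(heights_cm, weights_kg))

# Correlation is the same regardless of units.
print(correlation(heights_m, weights_kg), correlation(heights_cm, weights_kg))
```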

Q34. Given two strings, string1 and string2, determine if there exists a one-to-one character mapping between each character of string1 to string2.

Example 1:

string1 = 'qwe'
string2 = 'asd'
string_map(string1, string2) == True
# q = a, w = s, and e = d

Example 2:

string1 = 'donut'
string2 = 'fatty'
string_map(string1, string2) == False
# t cannot map to two different values

In general, we know that both strings must be equal in length. If they are not, then there is no one-to-one mapping. Next, we look at an efficient way to determine True or False given the conditions.

If a single pair of characters violates the mapping, we can immediately return False. For the result to be True, however, we have to keep checking until every character has been examined. Given this, we loop through both strings together and build a key-value dictionary that maps each character of string1 to the character of string2 at the same index. If a character of string1 is already in the dictionary and the character at the current index of string2 does not equal the stored value, we return False. For the mapping to be one-to-one, the reverse direction must be checked in the same way: in Example 2, both n and u would map to t, so the result is False.
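The approach can be sketched as follows. Note that a forward dictionary alone would accept 'donut' and 'fatty', so this version also tracks the reverse direction; the second dictionary is our addition to make the second example return False.

```python
def string_map(string1, string2):
    """Return True if the characters of the two strings map one-to-one."""
    if len(string1) != len(string2):
        return False

    forward, backward = {}, {}
    for c1, c2 in zip(string1, string2):
        # setdefault stores the pairing on first sight, then returns it.
        if forward.setdefault(c1, c2) != c2:
            return False  # c1 was already mapped to a different character
        if backward.setdefault(c2, c1) != c1:
            return False  # c2 was already the image of a different character
    return True

print(string_map("qwe", "asd"))      # True
print(string_map("donut", "fatty"))  # False
```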

Q35. Given a string, return the first recurring character in it, or None if there is no recurring character.

Example:

input = "interviewquery"
output = "i"
input = "interv"
output = None
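One straightforward solution, sketched here with a function name of our choosing, keeps a set of characters seen so far and returns the first character that is already in the set.

```python
def first_recurring(text):
    """Return the first character that appears a second time, or None."""
    seen = set()
    for ch in text:
        if ch in seen:
            return ch
        seen.add(ch)
    return None

print(first_recurring("interviewquery"))  # i
print(first_recurring("interv"))          # None
```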

Q36. Explain zip() and enumerate() function.

The zip() function takes multiple iterables (such as lists) as input and combines them into a single sequence of tuples. It does so by pairing the corresponding elements of each input. In Python 3 it returns a lazy iterator, which can be converted to a list with list().

Example: We have two lists:

l1 = ['A', 'B', 'C', 'D'] and l2 = [50, 100, 150, 200].

list(zip(l1, l2))

Output: A list of four tuples: [('A', 50), ('B', 100), ('C', 150), ('D', 200)]

If the lists do not have the same length, zip() stops generating tuples once the shorter list ends.

The enumerate() function takes an iterable as input and produces tuples in which the first element is the position of an item and the second element is the item itself.

In short, enumerate() assigns an index to each item in an iterable, which makes it easier to keep track of position while looping. It yields (position, value) pairs and takes a single iterable at a time.

Example:

list2 = ["apple", "ball", "cat"]
e1 = enumerate(list2)
print(list(e1))
Output: [(0, 'apple'), (1, 'ball'), (2, 'cat')]

Q37. How do map, reduce and filter functions work?

The map function applies the given function to each element of an iterable and returns the transformed results (as an iterator in Python 3).

The reduce function, from the functools module, applies a two-argument operation cumulatively to the items of a sequence. It uses the result of each operation as the first parameter of the next operation. It returns a single item and not a list.

The filter function keeps only the items of an iterable (list, set, or tuple) for which another function, passed as an argument, returns true. Its output is the filtered sequence (an iterator in Python 3).

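Q38. What is the difference between range, xrange, and arange?

range(): a built-in function of base Python. In Python 2 it returns a list of integers; in Python 3 it returns a lazy range object.

xrange(): exists only in Python 2, where it returns a lazy xrange object that generates integers on demand. It was removed in Python 3, where range() behaves this way.

arange(): is a function in the NumPy library. It returns an array and can produce fractional values as well.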

Q39. What is the difference between pass, continue and break?

Pass: It is used when you need some block of code syntactically, but you want to skip its execution. This is a null operation. Nothing happens when this is executed.

Continue: It allows you to skip the rest of the current iteration when some specific condition is met, and control is transferred to the beginning of the loop. The loop does not terminate but continues with the next iteration.

Break: It allows the loop to terminate when some condition is met, and the control of the program flows to the statement immediately after the body of the loop. If the break statement is inside a nested loop (the loop inside another loop), then the break statement will terminate the innermost loop.

Q40. What is Regex? List some of the important Regex functions in Python.

A Regular Expression, or RegEx, is a sequence of characters that defines a search pattern. In Python, the following functions from the built-in re module are the most commonly used:

  • match(): it checks for a match only at the beginning of the string.
  • search(): it locates a substring matching the RegEx pattern anywhere in the string
  • sub(): searches for the pattern and replaces it with a new value
  • split(): it is used to split the text by the given RegEx pattern.
  • findall(): it is used to find all the sub-strings matching the RegEx pattern
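The functions listed above can be tried on one sample string. This is a small sketch using the standard re module; the text and patterns are arbitrary examples.

```python
import re

text = "Order 66 shipped on 2022-05-04, order 7 pending"

# match: only succeeds at the beginning of the string.
starts_with_order = re.match(r"Order", text) is not None
starts_with_shipped = re.match(r"shipped", text) is not None

# search: finds the first match anywhere in the string.
date = re.search(r"\d{4}-\d{2}-\d{2}", text).group()

# findall: returns every non-overlapping match.
numbers = re.findall(r"\d+", text)

# sub: replaces every match with a new value.
masked = re.sub(r"\d+", "#", text)

# split: cuts the text wherever the pattern matches.
parts = re.split(r",\s*", text)

print(starts_with_order, starts_with_shipped)  # True False
print(date)     # 2022-05-04
print(numbers)  # ['66', '2022', '05', '04', '7']
print(masked)   # Order # shipped on #-#-#, order # pending
print(parts)    # ['Order 66 shipped on 2022-05-04', 'order 7 pending']
```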

Q41. What are namespaces in Python?

A namespace is a naming system that ensures every name refers to a well-defined object. It can be pictured as a container that maps each variable name to the object it refers to. When we use a variable, Python searches this container for the name and retrieves the corresponding object. Python implements namespaces as dictionaries.

Q42. What is the difference between global and local variables?

Global variables are defined outside of any function and can be read from inside functions; assigning to one inside a function requires the global keyword. A variable declared inside a function’s body, that is, in the local scope, is known as a local variable and exists only while the function runs.
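A brief sketch of the difference, meant to be run as a script at module level. The variable and function names are invented for illustration.

```python
counter = 0  # global variable

def read_counter():
    return counter  # reading a global needs no declaration

def shadow_counter():
    counter = 99    # creates a new local variable; the global is untouched
    return counter

def increment_counter():
    global counter  # required to assign to the global variable
    counter += 1

shadow_result = shadow_counter()
after_shadow = read_counter()
increment_counter()
after_increment = read_counter()

print(shadow_result)    # 99
print(after_shadow)     # 0
print(after_increment)  # 1
```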

Q43. What is a default value?

A default argument value is the value a parameter takes when the caller does not pass an argument for it.

Q44. What do *args and **kwargs mean? When are these used?

*args and **kwargs are not keywords but naming conventions; the * and ** prefixes are what allow a function to take a variable number of arguments.

*args:

  • It is used to pass a variable number of positional arguments to a function
  • Inside the function, the arguments are collected into a tuple
  • It is used when we are not sure of how many arguments will be passed to a function.
  • The symbol * is used to indicate to take in a variable number of arguments

**kwargs:

  • It is used to pass a keyworded, variable-length argument list
  • It is used when we do not know how many keyword arguments will be passed to a function
  • The symbol ** indicates that keyword arguments are collected into a dictionary
  • In a function call, ** unpacks a dictionary into keyword arguments
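The points above can be illustrated with a few small functions. All function and variable names here are hypothetical and exist only for this sketch.

```python
def summarize(*args, **kwargs):
    # args is a tuple of positional arguments; kwargs is a dict of keyword arguments.
    return {"positional": args, "keyword": kwargs}

def total(*numbers):
    return sum(numbers)

def join_words(*words, sep=" "):
    return sep.join(words)

summary = summarize(1, 2, unit="kg")

# In a call, * unpacks a sequence and ** unpacks a dictionary.
values = [1, 2, 3]
options = {"sep": "-"}
values_total = total(*values)
joined = join_words("a", "b", **options)

print(summary)       # {'positional': (1, 2), 'keyword': {'unit': 'kg'}}
print(values_total)  # 6
print(joined)        # a-b
```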

Q45. What is the difference between print and return?

The print function does not store any value; it simply displays it. In contrast, return hands a value back to the caller, where it can be stored in a variable or a data structure.

Q46. What is the use of the With statement?

The with statement helps with exception handling and with processing files when used with an open file. It is used this way:

with open("filename", "mode") as file_name:

We can open and process the file, and we do not need to close the file explicitly. After the with block exits, the file object is closed. The with statement ensures that the file is closed properly even if an exception is raised.

Q47. What is the difference between conditionals and control flows?

Conditionals are statements (if, elif, else) that run a block of code only when certain conditions are met. Their purpose is to control whether a block of code executes, depending on whether the condition evaluates to true or false. Python also has a single-line form, the conditional expression (often called the ternary operator), which yields one of two values depending on a condition.

Control flow is the order in which the code is executed. In Python, control flow is regulated by conditional statements, loops, and function calls.

Q48. How is exception handling achieved in Python?

With the help of exception handling, we can prevent a program from crashing when an error occurs at run time. In Python, exception handling is implemented using two keywords: try and except.

Try runs the block of code that belongs to it.

Except is used after the try block and catches the specified errors raised while running the code under the try block. A bare except catches every exception, so naming the expected exception type is preferable.

For example, adding an integer to a string is not possible. Exception handling in such a scenario works like this:

try:
    a = 9 + 'Alphabet'
except TypeError:
    a = 40
print(a)

This will give the output 40.

If we had only given the commands:

a = 9 + 'Alphabet'
print(a)

Then, Python would raise the TypeError: unsupported operand type(s) for +: 'int' and 'str'

Q49. When to use for loop and while loop?

A for loop is used when you know beforehand which elements need to be iterated over. If you want to iterate over every element of a data structure, use a for loop. A while loop, on the other hand, is used when the loop should keep running as long as some condition on the variables holds. Here, we know the exact condition but do not know how many times the loop will run.
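A minimal sketch contrasting the two loops, using arbitrary example values: the for loop walks a known collection, while the while loop runs until a condition stops holding.

```python
# for: the elements to visit are known in advance.
total = 0
for price in [5, 10, 15]:
    total += price

# while: we know the stopping condition, not the number of iterations.
value = 100
halvings = 0
while value >= 1:
    value /= 2
    halvings += 1

print(total)     # 30
print(halvings)  # 7
```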

Q50. What is the difference between series and vectors?

Vectors: A vector can only use the index positions 0, 1, …, (n-1).

Series: A series has only one column. It can be given custom index labels for its values. Examples: cust_ID, cust_name, sales. A series can be created from lists, arrays, and dictionaries.

GoLogica Technologies Private Limited. All rights reserved 2024.