Pandas Interview Question

The document provides a comprehensive overview of various Pandas functionalities, including the differences between lists and tuples, DataFrames and Series, and methods for handling missing data. It covers operations such as merging DataFrames, renaming columns, and applying functions, as well as advanced topics like multi-indexing and time series data manipulation. Additionally, it highlights the importance of vectorization and provides examples of various methods and their use cases.

Uploaded by

akshat

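1. What are the differences between lists and tuples in Python, and how does this
distinction relate to Pandas operations?
Answer: Tuples are immutable, whereas lists are mutable. Because tuples are hashable,
Pandas treats a tuple as a single label (for example, one key of a MultiIndex),
while a list passed to an indexer selects several labels at once.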
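
One way the list/tuple distinction shows up in Pandas can be sketched as follows. The column names and values are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# A list selects several columns and returns a DataFrame.
subset = df[["a", "b"]]

# A tuple is hashable, so it can act as ONE label, e.g. in MultiIndex columns.
wide = pd.DataFrame(
    [[1, 2]],
    columns=pd.MultiIndex.from_tuples([("x", "left"), ("x", "right")]),
)
one_column = wide[("x", "left")]  # a single column, returned as a Series
```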

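2. What is a DataFrame in Pandas, and how does it differ from a Series?
Answer: A Series is a one-dimensional labeled array (a single column of values with
an index), while a DataFrame is a two-dimensional labeled table made up of one
or more columns, each of which is a Series.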

3. Can you explain how to handle missing data in Pandas, including the difference
between 'fillna()' and 'dropna()'?
Answer: fillna() replaces missing values with a specified value or fill method,
while dropna() deletes the rows (or, with axis=1, the columns) that contain
missing values. fillna() is often preferable because dropna() discards data,
which can compromise the integrity of the dataset.
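
A minimal sketch of the two approaches, using a small made-up table with one missing score:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"name": ["Ann", "Bob", "Cy"], "score": [90.0, np.nan, 70.0]})

# fillna: keep every row and replace the gap with the column mean.
filled = df.fillna({"score": df["score"].mean()})

# dropna: remove every row that contains a missing value.
dropped = df.dropna()
```

Here `filled` keeps all three rows, while `dropped` loses the row for Bob.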

4. Describe the process of renaming a column in a Pandas DataFrame.
Answer: Method 1: use the rename() function with a mapping of old to new names.
Method 2: assign a list of new column names to df.columns.
Method 3: replace text in the column names with df.columns.str.replace().
Method 4: use the set_axis() function.
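
The four methods above can be sketched on a toy DataFrame. The column names are invented for the example, and Method 3 is read here as `df.columns.str.replace()`, which is an interpretation of the original wording.

```python
import pandas as pd

df = pd.DataFrame({"first name": [1], "last name": [2]})

# Method 1: rename() with an old -> new mapping (other columns are untouched).
m1 = df.rename(columns={"first name": "first"})

# Method 2: assign a full list of new names (length must match).
m2 = df.copy()
m2.columns = ["first", "last"]

# Method 3: string replacement applied to every column name.
m3 = df.copy()
m3.columns = m3.columns.str.replace(" ", "_")

# Method 4: set_axis() returns a new DataFrame with the given labels.
m4 = df.set_axis(["first", "last"], axis=1)
```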

5. What is the purpose of the 'groupby' function in Pandas, and provide an example
of its usage?
Answer: groupby splits the rows into groups based on the values of one or more
columns so that an aggregation can be computed per group. For example, in a
college student dataset, to get the total number of students in each department,
group by the department column and count.
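
The student example could look like this; the names and departments are placeholder data.

```python
import pandas as pd

students = pd.DataFrame({
    "name": ["Asha", "Ben", "Chen", "Dia"],
    "department": ["CS", "CS", "Math", "CS"],
})

# Split rows by department, then count the students in each group.
counts = students.groupby("department")["name"].count()
```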

6. How can you merge two DataFrames in Pandas, and what are the different types of
joins available?
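Answer: Use the merge(), join(), or concat() functions.
There are five types of joins in Pandas:
- Inner Join (how='inner')
- Left Outer Join (how='left')
- Right Outer Join (how='right')
- Full Outer Join, or simply Outer Join (how='outer')
- Index Join (joining on the index instead of a column)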
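
A short sketch of the join types, using two made-up tables that share a `key` column:

```python
import pandas as pd

left = pd.DataFrame({"key": [1, 2, 3], "l": ["a", "b", "c"]})
right = pd.DataFrame({"key": [2, 3, 4], "r": ["x", "y", "z"]})

inner = left.merge(right, on="key", how="inner")    # keys in both tables
left_j = left.merge(right, on="key", how="left")    # every key from left
right_j = left.merge(right, on="key", how="right")  # every key from right
outer = left.merge(right, on="key", how="outer")    # keys from either table

# Index join: join() matches on the index (a left join by default).
index_j = left.set_index("key").join(right.set_index("key"))
```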

7. Explain the purpose of the 'apply' function in Pandas, and give an example of
when you might use it.
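Answer: The apply() method applies a function along one axis of the DataFrame.
The default is axis=0 (the index axis), which passes each column to the
function; axis=1 passes each row instead.
Example:
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [10, 20]})

def square(x):
    # x is one column (a Series) at a time; the multiplication is elementwise
    return x * x

df1 = df.apply(square)
print(df)   # original values
print(df1)  # squared values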

8. What is the difference between 'loc' and 'iloc' in Pandas, and when would you
use each?
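Answer: loc selects by label, that is, by row index labels and column names (it
also accepts boolean masks), while iloc selects only by integer position. Use
loc when you know the labels and iloc when you know the positions. Both are
indexers used with square brackets, not functions.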
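
A minimal sketch of the difference, with an invented index of names:

```python
import pandas as pd

df = pd.DataFrame({"age": [25, 32, 47]}, index=["ann", "bob", "cy"])

by_label = df.loc["bob", "age"]    # row label and column name
by_position = df.iloc[1, 0]        # second row, first column

# Label slices include both endpoints; position slices exclude the stop.
label_slice = df.loc["ann":"bob"]  # 2 rows
pos_slice = df.iloc[0:1]           # 1 row
```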

9. Explain the difference between a join and a merge in Pandas with examples.
Answer: Both join() and merge() combine two DataFrames. join() combines them on
their indexes by default, whereas merge() is more versatile and lets us specify
columns, in addition to the index, to join on for both DataFrames.
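
As an illustration, here is the same kind of lookup done both ways. The table and column names are hypothetical.

```python
import pandas as pd

# join(): rows are matched on the index of both DataFrames.
amounts = pd.DataFrame({"amount": [10, 20]}, index=["a", "b"])
names = pd.DataFrame({"name": ["Ann", "Bob"]}, index=["a", "b"])
joined = amounts.join(names)

# merge(): rows are matched on a named column.
orders = pd.DataFrame({"customer_id": [1, 2, 2], "amount": [10, 20, 30]})
customers = pd.DataFrame({"customer_id": [1, 2], "name": ["Ann", "Bob"]})
merged = orders.merge(customers, on="customer_id")
```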

10. How do you remove duplicates from a DataFrame in Pandas?
Answer: The drop_duplicates() method removes duplicate rows. By default it compares
all columns and keeps the first occurrence.
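
A small sketch with made-up rows, showing the default behavior and the `subset` and `keep` parameters:

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 1, 2], "city": ["Pune", "Pune", "Delhi"]})

# Default: compare all columns and keep the first of each duplicate set.
unique_rows = df.drop_duplicates()

# Compare only the "id" column and keep the last occurrence instead.
unique_ids = df.drop_duplicates(subset="id", keep="last")
```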

11. How do you join two DataFrames on multiple columns in Pandas?
Answer:

12. Discuss the use of the 'pivot_table' method in Pandas and provide an example
scenario where it is useful.
Answer:

13. Explain the difference between the 'agg' and 'transform' methods in groupby
operations.
Answer: An aggregation must return a reduced version of the data (one value per
group), whereas a transformation returns a transformed version of the full data
to recombine, so its output has the same shape as the input. A common example
is centering the data by subtracting the group-wise mean.
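
The centering example can be sketched like this, with toy groups and values:

```python
import pandas as pd

df = pd.DataFrame({
    "group": ["a", "a", "b", "b"],
    "value": [1.0, 3.0, 10.0, 20.0],
})

# agg: one mean per group (reduced output, 2 rows).
means = df.groupby("group")["value"].agg("mean")

# transform: the group mean repeated for every row (same length as the input),
# so it can be subtracted from the original column directly.
centered = df["value"] - df.groupby("group")["value"].transform("mean")
```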

14. Describe a method to handle large datasets in Pandas that do not fit into
memory.
Answer: Use chunking: read and process the data in pieces, for example with the
chunksize parameter of read_csv(). As long as each chunk fits in memory, you
can work with datasets that are much larger than memory. Chunking works well
when the operation you're performing requires zero or minimal coordination
between chunks. For more complicated workflows, you're better off using
another library.
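
A minimal sketch of chunked processing. An in-memory CSV stands in for a large file, and the chunk size of 4 is arbitrary.

```python
import io

import pandas as pd

# Stand-in for a file too large to load at once: one column, values 0..9.
csv_data = io.StringIO("value\n" + "\n".join(str(i) for i in range(10)))

total = 0
n_chunks = 0
# With chunksize set, read_csv yields DataFrames of at most 4 rows each.
for chunk in pd.read_csv(csv_data, chunksize=4):
    total += chunk["value"].sum()  # a sum needs no coordination between chunks
    n_chunks += 1
```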

15. How can you convert categorical data into 'dummy' or 'indicator' variables in
Pandas?
Answer:

16. What is the difference between 'concat' and 'append' methods in Pandas?
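Answer: append() added the rows of a second DataFrame to the end of the first and
returned a new DataFrame, so calling it repeatedly in a loop copied the data
each time. concat() combines a whole list of DataFrames in a single operation,
which makes it faster. DataFrame.append() was deprecated in pandas 1.4 and
removed in pandas 2.0, so concat() is the one to use in current versions.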
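
A sketch of the single-operation pattern with concat(): collect the pieces in a list first, then combine them once. The pieces here are tiny made-up frames.

```python
import pandas as pd

# Build the pieces first (in practice these might come from files or a loop).
parts = [pd.DataFrame({"x": [i]}) for i in range(3)]

# One concat call over the whole list; ignore_index renumbers the rows 0..n-1.
combined = pd.concat(parts, ignore_index=True)
```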

17. How would you use the 'melt' function in Pandas, and what is its purpose?
Answer: The melt() function changes a DataFrame from wide format to long format.
One or more columns are kept as identifiers, and the remaining columns are
unpivoted into rows, giving one row per identifier and original column.
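
For illustration, a wide table of scores melted into long format. The column names and the `subject` and `score` labels are chosen for the example.

```python
import pandas as pd

wide = pd.DataFrame({"name": ["Ann", "Bob"], "math": [90, 80], "art": [70, 60]})

# "name" stays as the identifier; "math" and "art" become rows.
long = wide.melt(id_vars="name", var_name="subject", value_name="score")
```

The result has one row per (name, subject) pair, so two people and two subjects give four rows.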

18. Describe how you would perform a vectorized operation on DataFrame columns.
Answer:

19. How can you set a column as the index of a DataFrame, and why would you want to
do this?
Answer:

20. Explain how to sort a DataFrame by multiple columns in Pandas.
Answer: df = df.sort_values(['attempts', 'name'], ascending=[True, True])
Here the sort_values() method sorts the DataFrame by two columns, 'attempts'
first and then 'name' to break ties. The ascending parameter is set to
[True, True] to indicate that the sorting should be done in ascending order
for both columns.
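
A runnable sketch using the same two column names with made-up rows. It mixes directions (ascending 'attempts', descending 'name') to show that each column gets its own flag.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Zed", "Amy", "Bob"], "attempts": [2, 2, 1]})

# Sort by attempts (ascending), then by name (descending) within ties.
sorted_df = df.sort_values(["attempts", "name"], ascending=[True, False])
```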

21. How do you deal with time series data in Pandas, and what functionalities
support its manipulation?
Answer:

22. What are some ways to optimize a Pandas DataFrame for better performance?
Answer:

23. Explain the purpose of the 'crosstab' function in Pandas and provide a use
case.
Answer:

24. How can you reshape a DataFrame in Pandas using the 'stack' and 'unstack'
methods?
Answer: stack(): moves the specified column level(s) into the row index, making the
data longer.
unstack(): moves the specified row index level(s) into the columns, making the
data wider.
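
A small round trip, sketched on an invented score table:

```python
import pandas as pd

df = pd.DataFrame({"math": [90, 80], "art": [70, 60]}, index=["Ann", "Bob"])

# stack: column labels become an inner level of the row index.
stacked = df.stack()

# unstack: the inner index level goes back to the columns.
restored = stacked.unstack()
```

After stacking, each value is addressed by a (row label, column label) pair such as `("Ann", "math")`.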

25. Describe how to use the 'query' method in Pandas and why it might be more
efficient than other methods.
Answer:

26. Discuss the importance of vectorization in Pandas and provide an example of a
non-vectorized operation versus a vectorized one.
Answer:

27. How would you export a DataFrame to a CSV file, and what are some common
parameters you might adjust?
Answer:

28. Explain the use of multi-indexing in Pandas and provide a scenario where it’s
beneficial.
Answer: Multi-indexing (a hierarchical index) gives the rows two or more index
levels. It can be applied at the time of concatenation to identify which
dataset each row came from.
Use case: in marketing data analysis, when concatenating two monthly sales
datasets, adding a key per dataset at concatenation time produces a result
with a two-level index: the month and the original row label.
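
The sales use case might be sketched as follows. The month keys, level names, and figures are all placeholders, and passing `keys` to `pd.concat` is one way to build such an index.

```python
import pandas as pd

jan = pd.DataFrame({"sales": [100, 150]})
feb = pd.DataFrame({"sales": [120, 130]})

# keys adds an outer index level that records the source dataset of each row.
sales = pd.concat([jan, feb], keys=["jan", "feb"], names=["month", "row"])

# Selecting on the outer level returns just that month's rows.
feb_sales = sales.loc["feb"]
```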

29. How can you handle different timezones in Pandas?
Answer:
