Data Science Question Bank and Concepts

Good data science question bank

Uploaded by

ankitforpython
Copyright: © All Rights Reserved

Question Bank for Introduction to Data Science

Two Marks Questions

1 What is Data Science? Mention any two of its applications.

2 What is meant by descriptive statistics? Give two examples.

3 Define probability distribution and give an example from real life.

4 What is data collection? Discuss the methods of data collection.

5 Explain the importance of exploratory data analysis (EDA).

6 Differentiate between structured and unstructured data.

7 Describe the importance of data cleaning.

8 What are the main challenges in data collection?

9 What is the significance of probability distributions?

10 Write the difference between Type I and Type II errors in hypothesis testing.

11 List any two interdisciplinary fields that contribute to Data Science.

12 What is the scope of Data Science in business decision-making?

13 Why is data preprocessing important before analysis?

14 Write any two types of probability distributions.

15 Define descriptive statistics.

16 What is p-value in hypothesis testing?

17 Differentiate between discrete random variables and continuous random variables.

18 Write the methods to evaluate the measures of central tendency.

19 Differentiate between descriptive and inferential statistics.

20 Write the properties of the Probability Mass Function and the Probability Density Function.

21 Explain data science in terms your grandmother would understand.

22 What is something that excites you most about learning data science?

23 Give an example of something that you believe to be random and cannot be predicted.

24 Explain why data literacy is important.

25 Explain the difference between raw data and trusted data.

26 How is descriptive statistics different from inferential statistics?


27 Consider a dataset consisting of one million ones. What is its standard deviation?

28 Explain how correlation is different from covariance.

29 What is sampling error? How does increasing the sample size affect the sampling error?

30 What is a probability distribution? What is its importance in Data Science?

31 Define Big Data and mention any two examples of Big Data sources.

32 What are Structured and Unstructured data? Give one example of each.

33 Why is Veracity important in Big Data analytics?

34 What is the Hadoop architecture? List its different layers.

35 What are MapReduce and HDFS, and how do they work in the Hadoop platform?

36 What is informed consent in data collection? Why is it necessary?

37 Define fairness in data science. Why is achieving fairness challenging?

38 Mention two strategies used to reduce bias in data science models.

39 What is GDPR? Mention any one way it helps protect user data.

40 What is algorithmic bias? Give one example of how it may occur.

41 What are the five V’s of Big Data?

42 Mention any two major challenges of Big Data.

43 What is the difference between batch and stream processing?

44 Define Data Wrangling.

45 Define bias in machine learning.

46 List two methods to protect data confidentiality.

47 What is the role of transparency in ethical data handling?

48 What is the significance of blockchain in data security?

49 Define data ethics and explain why it is important in data science.

50 What is data privacy? Give one example of a privacy concern in data science.

51 Differentiate between data bias and algorithmic fairness.

52 What is data security? Name any two techniques used to protect data.

53 Mention any two future trends in data science.

54 Define Big Data. What are its main characteristics?

55 What is data wrangling?


56 What is the purpose of Hadoop in big data processing?

57 List any two challenges of handling large-scale datasets.

58 What is a data pipeline?

59 What is Big Data?

60 Define Data Lake and Data Warehouse.

61 What is the role of Apache Spark in Big Data processing?

62 What is Data Privacy?

63 Define Data Integrity.

64 What are the disadvantages of data security?

Six Marks questions

1 Explain the scope of Data Science with suitable real-world applications.

2 A dataset of marks obtained by 11 students is: 12, 15, 20, 25, 30, 30, 35, 40, 45, 50, 95

(i) Find the Mean, Median, and Mode including the outlier.

(ii) Identify and remove the outlier.

(iii) Recalculate the Mean, Median, and Mode after removing the outlier.

(iv) Comment on the effect of the outlier on central tendency measures.
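
One way to check the arithmetic for parts (i) to (iii) is sketched below using Python's standard `statistics` module. The question does not say how the outlier should be identified, so the 1.5 × IQR rule used here is an assumption, and the helper names `summarize` and `iqr_outliers` are hypothetical.

```python
from statistics import mean, median, multimode, quantiles

# Marks of the 11 students, as given in the question.
marks = [12, 15, 20, 25, 30, 30, 35, 40, 45, 50, 95]

def summarize(values):
    """Return the three central tendency measures for a list of numbers."""
    return {
        "mean": mean(values),
        "median": median(values),
        "mode": multimode(values),  # a list, since data can have several modes
    }

def iqr_outliers(values):
    """Flag values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] (assumed rule)."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]

before = summarize(marks)                            # part (i)
outliers = iqr_outliers(marks)                       # part (ii)
cleaned = [v for v in marks if v not in outliers]
after = summarize(cleaned)                           # part (iii)

print("With outlier:   ", before)
print("Outliers found: ", outliers)
print("Without outlier:", after)
```

Under this rule only 95 is flagged. Removing it lowers the mean from about 36.09 to 30.2, while the median and mode both stay at 30, which is the contrast part (iv) asks about.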

4 Explain the steps involved in performing Exploratory Data Analysis (EDA) with an example dataset.

5 What are hypothesis tests? Explain the difference between null hypothesis (H₀) and alternative hypothesis (H₁) with examples.

6 What is Data Cleaning? Explain the methods involved in data cleaning.

7 What are the major steps in a Data Science project? Explain with an example.

8 Explain descriptive statistics with suitable examples.

9 Explain the difference between qualitative and quantitative data with examples.

10 Differentiate between descriptive and inferential statistics. Give examples.

11 Explain the role of visualization in Exploratory Data Analysis (EDA).

12 Discuss the role of Data Science in modern industries like healthcare, finance, and retail.

13 Explain the role of structured, unstructured, and semi-structured data in Data Science applications.

14 “Data is the backbone of Data Science.” Justify this statement with reference to data collection and cleaning methods.

15 Explain the importance of hypothesis testing in decision-making with a practical example from business or healthcare.

16 “EDA is the first step towards building any data-driven model.” Justify this statement with suitable arguments and examples.

17 Discuss in detail how descriptive statistics and EDA complement each other in the data analysis process.

18 How are the mean and variance of discrete and continuous random variables evaluated? Demonstrate with the help of a suitable example.

19 How is numerical data classified? Differentiate between qualitative data and quantitative data.

20 What are the different data gathering methods? Explain each method in detail.

21 What are the main methods used in data cleaning?

22 Data collection is the first phase in data science. Mention the various ways to collect data.

23 Differentiate between (1) discrete and continuous data, and (2) structured and unstructured data.

24 What is data visualization? What are the different types of graphs used to implement data visualization?

25 Explain the different types of probability distributions with suitable examples (two each from discrete and continuous).

26 Explain the characteristics of Big Data with examples.

27 Describe data wrangling and explain its key steps with suitable examples.

28 What is distributed computing? Explain how Hadoop and Spark use distributed architecture to handle large datasets.

29 Describe the importance of data pipelines. Explain the stages of an end-to-end data pipeline with a real-world example.

30 Describe the role of data privacy in modern digital systems and explain how organizations ensure user data protection.

31 Discuss the major challenges in ensuring data ethics in large-scale data-driven systems.

32 Discuss the importance of data security and describe methods to protect data confidentiality, integrity, and availability.

33 Elaborate on future trends in data science and how they will impact industries and society.

34 Describe the concept of data integration with suitable example.

35 What is data validation? Why is it important in ensuring data accuracy and reliability?

36 Explain in brief about different types of data pipeline.

37 Discuss different types of Data Wrangling tools in brief with example.

38 Explain the major ethical issues in data science such as privacy, bias, and fairness.

39 What is data security? Discuss the importance of maintaining data confidentiality and integrity.

40 Describe the concept of algorithmic bias. How does it affect decision-making systems?

42 Discuss the role of data science in modern real-world applications with suitable examples.

43 Explain any three emerging technologies in data science and their relevance.

44 Explain the 5 Vs of Big Data with suitable examples.

45 What is Hadoop? Describe the roles of HDFS and MapReduce in big data processing.

46 Explain the steps involved in data wrangling.

47 Compare Hadoop and Spark in terms of speed, processing model, and architecture.

48 Describe the key components of a modern data pipeline and their functions.

49 In the 21st century, data is known as the new currency. Mention the term used for this data and its characteristics.

50 Define the terms: Data Mining, Data Lake, and Data Mart.

51 Why is privacy the most important concern in Big Data? Mention several steps to achieve it.

52 What is Data Integrity? Mention the steps to achieve it.

53 Explain the characteristics of Big Data (the 5 Vs) in detail with suitable examples.

54 Describe the Data Wrangling process. Explain each step (Discovering, Structuring, Cleaning, Enriching, Validating, Publishing) with examples.

55 What is Hadoop? Explain its architecture and working with the help of HDFS and MapReduce.

56 Explain the concepts of Privacy, Bias, and Fairness in data science. How do they impact ethical decision-making?

57 Define Data Security and Data Integrity. Explain their advantages and disadvantages in detail.

58 Describe any four major Data Science trends (2025–2026) and explain their significance in modern data-driven industries.

Ten Marks Questions

1 Explain the 5 Vs of Big Data with the help of suitable examples.

2 Explain the working of the different components of the Hadoop Distributed File System (HDFS). Write the challenges and applications of Big Data analytics.

3 Explain the working of MapReduce. Implement the steps involved in MapReduce on the given input file. The input file is:
"Welcome to the Hadoop Class Hadoop Class is good
Haadoop class is bad Hadoop class can be classified into class
Hadoop can be good Class Hadoop can be Bad class"
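
The map, shuffle, and reduce steps for this input can be traced with a small single-machine sketch in plain Python. It is not Hadoop code. The function names `map_phase`, `shuffle_phase`, and `reduce_phase` are hypothetical, and the sketch assumes a word-count job in which each line of the file is one input record and words are compared case-insensitively. The spelling "Haadoop" is kept exactly as it appears in the input, so it is counted as a separate word.

```python
from collections import defaultdict

# Input file from the question, one record per line (assumed split).
INPUT_LINES = [
    "Welcome to the Hadoop Class Hadoop Class is good",
    "Haadoop class is bad Hadoop class can be classified into class",
    "Hadoop can be good Class Hadoop can be Bad class",
]

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: sum the grouped values for one key."""
    return key, sum(values)

mapped = [pair for line in INPUT_LINES for pair in map_phase(line)]
grouped = shuffle_phase(mapped)
word_counts = dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))

for word, count in word_counts.items():
    print(word, count)
```

With these assumptions the job emits 30 intermediate pairs and reduces them to 13 distinct keys, for example `class` 7, `hadoop` 5, and `can` 3. On a real cluster the map and reduce functions would run in parallel on different nodes, with the framework performing the shuffle between them.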

4 Explain the ethical considerations in data science with special reference to privacy, bias, and fairness. Also discuss the importance of data security in protecting data integrity and confidentiality, citing suitable examples.

5 Discuss the future trends in data science, highlighting the emerging technologies and their potential impact on industries. Explain how these advancements are expected to shape the prospects and evolution of data science in the coming years.

6 Large-scale data often contains missing, inconsistent or duplicate entries. Discuss the data cleaning and transformation steps involved in preparing such data for analysis.

7 Explain the major components of a modern end-to-end data processing pipeline. Describe how data flows from ingestion to storage, transformation, and final consumption.

8 What is data wrangling? List and briefly explain three common data wrangling techniques used for large-scale datasets.

9 Explain in detail about the three common threats to data security in organizations.

10 Analyse the ethical, social, and technical challenges of using AI in decision-making systems.

11 Discuss in detail the ethical considerations in data science. Explain privacy, bias, fairness, and the responsibilities of data scientists.

12 Explain various data security challenges and solutions. Discuss encryption, access control, authentication, and auditing.

13 Describe the real-world applications of data science in healthcare, finance, transportation, and social media.

14 What are the future trends in data science? Discuss technologies such as AI automation, quantum data processing, Edge-AI, and augmented analytics.

15 Explain how unethical use of data can impact society. Provide real-life case studies and propose preventive mechanisms.

16 Discuss the architecture of Hadoop in detail. Explain how HDFS, NameNode, DataNode, and MapReduce work together.

17 Explain Apache Spark architecture, including RDDs, transformations, actions, and Spark execution model.

18 What is Big Data? Explain its characteristics, challenges, and applications in industries such as healthcare, finance, and e-commerce.

19 Describe the process of designing an end-to-end data pipeline. Include data ingestion, transformation, storage, and visualization.

20 Elaborate the role of data engineering in big data processing. Discuss tools, storage systems, processing frameworks, and real-world use cases.

21 What is Hadoop? Explain its architecture, components (HDFS, MapReduce, YARN, Hadoop Common), working, and advantages & disadvantages.

22 Explain the characteristics of Big Data (5Vs) with suitable examples. Also discuss why these characteristics make Big Data different from traditional data.

23 Describe in detail the benefits, use cases, and challenges of Big Data in modern organizations. Give real-world examples to support your answer.

24 Discuss any five Big Data tools in detail (Hadoop, Spark, Cassandra, Hive, Storm, RapidMiner, Flink, OpenRefine). Explain their features and use cases.

25 Explain the ethical considerations in data science with special reference to privacy, bias, and fairness. Also discuss the importance of data security in protecting data integrity and confidentiality, citing suitable examples.

26 Discuss the importance of ethical considerations in Data Science. Explain privacy, bias, fairness, and the role of regulations like GDPR.

27 What is Data Security? Explain its advantages, disadvantages, and importance in modern data-driven organizations. Give examples of security risks and measures.

28 Explain the top Data Science Trends for 2025–2026 (AI, NLP, XAI, Edge Computing, Augmented Analytics, Synthetic Data, Graph Analytics, Quantum Computing, etc.) with examples.

29 Explain the entire Data Science process in detail. Discuss its importance, real-life applications, and the major industries where Data Science is used.

30 What is Data Preprocessing? Explain all the steps involved in Data Preprocessing (Data Cleaning, Data Integration, Data Transformation, and Data Reduction) with suitable examples.

31 Differentiate between Structured, Unstructured, and Semi-Structured Data. Explain their characteristics, examples, advantages, challenges, and applications in detail.

32 Describe the various Data Collection Methods in detail. Explain Interviews, Questionnaires, Observations, Experiments, Focus Groups, Case Studies, and Secondary Data with examples, advantages, and limitations.

33 What is Data Cleaning? Explain the Data Cleaning process step-by-step with the Titanic dataset example. Also discuss its benefits, disadvantages, and commonly used data cleaning tools.

34 Explain in detail the concepts of Population and Sample. Discuss Population Parameters vs Sample Statistics with examples. Also explain when to use population and when to use a sample.

35 Describe the different types of Descriptive Statistics. Explain Measures of Central Tendency and Measures of Variability with formulas and suitable examples.

36 Define and differentiate between Discrete and Continuous Variables. Explain each with examples. Also discuss Percentiles, Quartiles, and the 5-Number Summary with formulas.

37 Explain Covariance and Correlation in detail. Discuss their formulas, types, interpretation, and the difference between covariance and correlation with examples.

38 What is Probability Distribution? Explain the types of probability distributions: Discrete and Continuous. Discuss Bernoulli Trials, Binomial Distribution, Poisson Distribution, Uniform Distribution, and Normal Distribution with examples.
