Unit4 Data Science Questionans

Uploaded by

Atharv Kulkarni
Copyright
© All Rights Reserved
ARTIFICIAL INTELLIGENCE

QUESTION BANK – CLASS 10


UNIT 4: DATA SCIENCE

MCQs:

1. What is the primary component that AI depends on?
a) Algorithms
b) Hardware
c) Data
d) Software
Answer: c) Data
2. Which of the following is NOT a domain of AI mentioned in the document?
a) Data Sciences
b) Computer Vision
c) Natural Language Processing
d) Quantum Computing
Answer: d) Quantum Computing
3. Data Sciences primarily work around which type of data?
a) Image data
b) Numeric and alphanumeric data
c) Textual data
d) Speech data
Answer: b) Numeric and alphanumeric data
4. Which of the following is an example of an AI application in finance?
a) Airline route planning
b) Fraud and risk detection
c) Website recommendations
d) Personality prediction
Answer: b) Fraud and risk detection
5. How do search engines use data science?
a) To predict flight delays
b) To suggest movies
c) To deliver search results
d) To classify images
Answer: c) To deliver search results
6. What is the primary objective of data science in genomics?
a) To improve airline route planning
b) To enhance movie recommendations
c) To personalize treatment based on DNA
d) To predict stock prices
Answer: c) To personalize treatment based on DNA
7. What is the primary focus of targeted advertising?
a) Fraud detection
b) Customer segmentation
c) Maximizing click-through rates
d) Improving search results
Answer: c) Maximizing click-through rates
8. Which model is chosen for predicting food wastage in restaurants?
a) Classification
b) Clustering
c) Regression
d) Reinforcement
Answer: c) Regression
9. What does the AI project goal in the restaurant scenario aim to predict?
a) Customer behavior
b) Food wastage
c) Food quantity to be prepared
d) Restaurant ratings
Answer: c) Food quantity to be prepared
10. Which factor is NOT part of the system map for the food wastage problem?
a) Number of customers
b) Price of dish
c) Customer feedback
d) Quantity of unconsumed dish
Answer: c) Customer feedback
11. Which of the following is an offline data collection method?
a) Sensors
b) Surveys
c) Government portals
d) Kaggle
Answer: b) Surveys
12. What format is used to store tabular data with comma-separated values?
a) JSON
b) XML
c) CSV
d) SQL
Answer: c) CSV
13. What is the primary use of NumPy in Python?
a) Text processing
b) Web development
c) Arithmetic operations on arrays
d) Image recognition
Answer: c) Arithmetic operations on arrays
14. Which Python package is primarily used for data manipulation and analysis?
a) NumPy
b) Matplotlib
c) Pandas
d) TensorFlow
Answer: c) Pandas
15. Which data structure is used by Pandas to handle 2-dimensional data?
a) Series
b) DataFrame
c) Array
d) Tuple
Answer: b) DataFrame
16. Which visualization library is mentioned for creating 2D plots in Python?
a) NumPy
b) Matplotlib
c) Pandas
d) Seaborn
Answer: b) Matplotlib
17. What type of data is typically visualized using scatter plots?
a) Continuous data
b) Discontinuous data
c) Textual data
d) Image data
Answer: b) Discontinuous data
18. What is the key feature of a histogram?
a) Represents frequency distribution
b) Shows textual data
c) Displays social networks
d) Analyzes time series
Answer: a) Represents frequency distribution
19. What do box plots represent in data visualization?
a) Frequency of data
b) Quartiles and outliers
c) Discontinuous data
d) Categorical data
Answer: b) Quartiles and outliers
20. What is the key concept of the K-Nearest Neighbour algorithm?
a) Majority voting
b) Decision trees
c) Nearest neighbors classification
d) Predicting stock prices
Answer: c) Nearest neighbors classification
21. What is the primary application of K-Nearest Neighbour (KNN) in the document’s game activity?
a) Predicting stock market trends
b) Personality prediction
c) Flight delay prediction
d) DNA analysis
Answer: b) Personality prediction
22. In the personality prediction game, which axis represents being task-focused?
a) Positive X-axis
b) Negative X-axis
c) Positive Y-axis
d) Negative Y-axis
Answer: b) Negative X-axis
23. How many nearest neighbors are considered in KNN when K=1?
a) 1
b) 2
c) 3
d) 5
Answer: a) 1
24. As the value of K increases in KNN, predictions become:
a) Less stable
b) More stable
c) Faster
d) Less accurate
Answer: b) More stable
25. In the example of predicting fruit sweetness, when K=2, the prediction:
a) Is sweet
b) Is not sweet
c) Becomes uncertain
d) Depends on color
Answer: c) Becomes uncertain
26. What is the disadvantage of having K=1 in KNN?
a) More computational resources
b) Unstable predictions
c) Slower prediction times
d) Too many errors
Answer: b) Unstable predictions
27. What is an advantage of using Python in data science?
a) Complex syntax
b) Limited libraries
c) Predefined functions for statistics
d) Slow performance
Answer: c) Predefined functions for statistics
28. What is Mean in statistics?
a) The sum of all values divided by the number of values
b) The middle value of a dataset
c) The most frequent value in a dataset
d) The range of values in a dataset
Answer: a) The sum of all values divided by the number of values
29. What does the term ‘mode’ refer to in statistics?
a) The highest value
b) The most frequent value
c) The middle value
d) The sum of all values
Answer: b) The most frequent value
30. Standard deviation is used to measure:
a) The mean
b) The range
c) The variability of data
d) The mode
Answer: c) The variability of data
31. Which Python package is primarily used for data visualization?
a) NumPy
b) Pandas
c) Matplotlib
d) SciPy
Answer: c) Matplotlib
32. What type of plot is used to represent continuous data frequency?
a) Scatter plot
b) Box plot
c) Histogram
d) Line plot
Answer: c) Histogram
33. Outliers in a dataset are typically represented in box plots as:
a) Boxes
b) Whiskers
c) Circles or dots
d) Lines
Answer: c) Circles or dots
34. Which type of data is commonly used in data science projects?
a) Textual data
b) Numeric and alpha-numeric data
c) Visual data
d) Audio data
Answer: b) Numeric and alpha-numeric data
35. Data Science integrates methods from:
a) History and Literature
b) Statistics and Computer Science
c) Biology and Chemistry
d) Architecture and Engineering
Answer: b) Statistics and Computer Science
36. What is a common source of online data for data science projects?
a) Personal interviews
b) Kaggle
c) Classroom surveys
d) Newspapers
Answer: b) Kaggle
37. Which AI domain deals with image and visual data?
a) Data Sciences
b) Computer Vision
c) Natural Language Processing
d) Genetic Analysis
Answer: b) Computer Vision
38. What is the significance of NaN in a dataset?
a) Represents text data
b) Represents an error
c) Represents missing or invalid data
d) Represents numerical data
Answer: c) Represents missing or invalid data
39. What does CSV stand for?
a) Comma-Separated Values
b) Code-Specific Variables
c) Constant Search Value
d) Computer-Structured Variables
Answer: a) Comma-Separated Values
40. Which tool is used to predict flight delays in airlines?
a) Classification model
b) K-Nearest Neighbours
c) Regression model
d) Neural networks
Answer: c) Regression model
41. What is the primary benefit of using a box plot?
a) It shows continuous data
b) It provides frequency distribution
c) It displays data quartiles and outliers
d) It identifies trends over time
Answer: c) It displays data quartiles and outliers
42. How are errors in data typically represented?
a) Through graphs
b) Through incorrect or invalid values
c) Through statistical models
d) Through outliers
Answer: b) Through incorrect or invalid values
43. What is the role of Pandas in Python?
a) Image processing
b) Text analysis
c) Data manipulation and analysis
d) Speech recognition
Answer: c) Data manipulation and analysis
44. Which data format is primarily used for tabular data?
a) JSON
b) CSV
c) XML
d) SQL
Answer: b) CSV
45. What is a common use of regression models in data science?
a) Classification of text
b) Predicting numerical values
c) Recognizing speech
d) Processing images
Answer: b) Predicting numerical values
46. What is the importance of cleaning data before analysis?
a) To format the data
b) To ensure accuracy
c) To speed up the analysis
d) To display data visually
Answer: b) To ensure accuracy
47. Which of the following is NOT a statistical tool used in data analysis?
a) Mean
b) Mode
c) Regression
d) Histogram
Answer: d) Histogram
48. What is the key challenge in airline route planning?
a) Flight delays
b) Customer loyalty programs
c) Predicting customer satisfaction
d) Predicting profitability
Answer: a) Flight delays
49. Which of the following is an advantage of using NumPy arrays?
a) Can hold multiple data types
b) Homogeneous data collection
c) Simple to initialize
d) Require more memory
Answer: b) Homogeneous data collection
50. What is the purpose of website recommendation systems?
a) Predict user behavior
b) Improve user experience
c) Boost product sales
d) All of the above
Answer: d) All of the above
QUESTION-ANSWERS:

Explain how Artificial Intelligence (AI) depends on data.


AI fundamentally relies on data to function and improve its intelligence. Data
fed into AI systems allow them to identify patterns, make predictions, and learn.
Different types of data—numeric, visual, or textual—are used in various AI
domains like Data Science (numeric data), Computer Vision (visual data), and
Natural Language Processing (NLP) (textual data). Without data, AI cannot
operate effectively.
What are the key fields that Data Science integrates, and how do they contribute
to its functions?
Data Science integrates fields such as Mathematics, Statistics, Computer
Science, and Information Science. Mathematics provides theoretical
foundations, while statistics offers tools for analyzing data. Computer Science
enables the development of algorithms to process large datasets, and
Information Science focuses on managing and retrieving this data. Together,
these fields allow data scientists to analyze and make predictions from complex
datasets.
Describe the role of Data Science in fraud and risk detection in finance.
Data Science plays a crucial role in fraud and risk detection by analyzing
historical data, customer profiles, and expenditures. In finance, it helps
companies identify potential risks, detect fraud, and prevent bad debts. By
examining customer behavior and transaction patterns, data scientists create
predictive models that reduce financial losses and optimize risk management
strategies.
How does Data Science contribute to advancements in genetics and genomics?
In genetics and genomics, Data Science enables personalized treatments by
analyzing large-scale genomic data. It helps in understanding the relationship
between DNA and health, allowing researchers to predict how individuals may
react to certain drugs or be predisposed to diseases. This deeper insight into
human DNA helps in developing advanced genetic risk prediction models for
more tailored healthcare solutions.
Explain the use of data science algorithms in internet search engines.
Search engines like Google, Bing, and Yahoo use data science algorithms to
deliver accurate search results quickly. These algorithms analyze user queries,
rank relevant results, and improve over time through machine learning. With
massive amounts of data processed daily, such algorithms help filter, sort, and
rank results based on relevance, thereby enhancing the search experience.
What is targeted advertising, and how does Data Science enhance it?
Targeted advertising uses data science algorithms to personalize advertisements
based on user behavior and preferences. Data collected from online activity,
such as browsing history and past purchases, is analyzed to display relevant ads.
This increases the effectiveness of digital marketing campaigns by ensuring that
ads reach the right audience, leading to higher conversion rates compared to
traditional advertising methods.
What is the AI project cycle, and why is it important in Data Science?
The AI project cycle consists of several stages: problem scoping, data
acquisition, data exploration, modeling, and evaluation. This cycle is essential
for systematically addressing real-world problems using AI and Data Science. It
helps ensure that the problem is well understood, the right data is collected,
accurate models are built, and their predictions are properly evaluated for
effective solutions.
How can Data Science help reduce food wastage in restaurants?
Data Science can help predict the amount of food to be prepared by analyzing
historical data such as the number of customers, dish preferences, and past food
wastage. By using regression models, restaurants can estimate the amount of
food needed each day, minimizing wastage and reducing losses. Predictive
algorithms help restaurants optimize their supply chain and daily food
production.
Describe the significance of regression models in predicting food wastage.
Regression models, which are part of supervised learning, predict continuous
values by learning the relationship between input features and a numeric
target. In the restaurant scenario, regression models use historical data, such as
customer footfall and dish consumption, to predict future demand. By training
these models on previous data, restaurants can estimate the amount of food to
prepare more accurately, thus reducing excess wastage.
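As a rough illustration of the regression idea above, the sketch below fits a
straight line to a few days of invented data and uses it to estimate how much
of a dish to prepare. The figures, the `history` list, and the `fit_line` helper
are assumptions made up for this example; they do not come from the
question bank.

```python
# Hypothetical history: (customers served, kilograms of a dish consumed) per day.
history = [(40, 10.0), (50, 12.5), (60, 15.0), (80, 20.0)]


def fit_line(points):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Covariance-like and variance-like sums around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(history)

# Estimate the quantity to prepare for an expected footfall of 70 customers.
expected_customers = 70
predicted_kg = slope * expected_customers + intercept
print(f"Prepare about {predicted_kg:.1f} kg")
```

A real project would use more features (day of week, price of dish, and so
on) and would evaluate the model on data it was not trained on.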
What is the difference between offline and online data collection methods?
Offline data collection involves gathering data through traditional means such
as surveys, interviews, or observations, while online data collection draws on
digital sources such as sensors, government portals, and open-source websites
(e.g., Kaggle). Offline methods are useful for localized or manual data
collection, whereas online methods give access to larger datasets, often from
established sources, facilitating more comprehensive analysis in Data Science.
Explain the importance of data cleaning in Data Science.
Data cleaning is crucial because it ensures the accuracy and consistency of the
data used in analysis. It involves identifying and correcting errors such as
incorrect values, missing entries, and outliers. Without cleaning, faulty data can
lead to incorrect conclusions and poor model performance. Clean data enhances
the reliability of the models, ensuring accurate and meaningful insights.
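The cleaning steps described above can be sketched with Pandas as follows.
The column names and values are invented for illustration, and treating a
negative customer count as invalid is an assumption of this example, not a
rule from the question bank.

```python
import numpy as np
import pandas as pd

# Hypothetical daily records; NaN marks a missing entry and -5 is an invalid count.
df = pd.DataFrame({
    "customers": [40, 55, np.nan, -5, 60],
    "dish_kg": [10.0, 13.5, 12.0, 11.0, np.nan],
})

# Treat impossible values (negative customer counts) as missing.
df.loc[df["customers"] < 0, "customers"] = np.nan

# Count missing entries per column.
missing_per_column = df.isna().sum()

# Option 1: drop every row that has at least one missing value.
dropped = df.dropna()

# Option 2: keep all rows and fill gaps with each column's median.
filled = df.fillna(df.median())

print(missing_per_column)
print(filled)
```

Dropping rows is simpler but discards data; filling keeps every row at the
cost of introducing estimated values. Which is appropriate depends on the
dataset.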
What are the advantages of using Python’s NumPy for data analysis?
NumPy is a powerful library for numerical operations in Python, widely used in
Data Science for handling large datasets. It allows for efficient mathematical
computations, such as matrix operations and arithmetic functions, on arrays.
NumPy’s ability to process multi-dimensional data and its speed in handling
large datasets make it a preferred choice for data manipulation and analysis in
scientific computing.
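A small sketch of the array arithmetic mentioned above, using invented
quantities; the variable names are hypothetical and chosen only for this
example.

```python
import numpy as np

# Hypothetical quantities (kg) of three dishes prepared and consumed on one day.
prepared = np.array([12.0, 8.0, 15.0])
consumed = np.array([10.5, 8.0, 11.0])

# Element-wise arithmetic on whole arrays: no explicit loop is needed.
wasted = prepared - consumed
waste_pct = wasted / prepared * 100

# A 2-dimensional array: rows are days, columns are dishes.
week = np.array([[12.0, 8.0, 15.0],
                 [11.0, 9.0, 14.0]])
per_dish_mean = week.mean(axis=0)   # average per dish across days
totals_per_day = week.sum(axis=1)   # total prepared on each day

print(wasted, per_dish_mean, totals_per_day)
```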
How do box plots represent data distribution, and what insights do they offer?
Box plots graphically display the distribution of a dataset by dividing it into
quartiles. They show the minimum, first quartile, median, third quartile, and
maximum, along with any outliers. Box plots are useful for identifying the
spread and skewness of data, as well as detecting outliers. They offer a quick
visual summary of data distribution and are commonly used in exploratory data
analysis.
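The numbers behind a box plot can be computed directly. The sketch below
uses Python's standard `statistics` module on invented data and flags outliers
with the common 1.5 × IQR rule; that rule is an assumption of this example
(it is a widely used convention, not something stated in the question bank).

```python
import statistics

# Hypothetical daily customer counts; 40 is an unusually high value.
data = [12, 15, 14, 10, 18, 16, 13, 40]

# Quartiles: first quartile, median, third quartile.
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")

# Interquartile range and the fences beyond which points count as outliers.
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(q1, median, q3, outliers)
```

In a drawn box plot, the box spans the first to third quartile, the line
inside it marks the median, and points outside the fences are plotted
individually as circles or dots.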
What is the K-Nearest Neighbour (KNN) algorithm, and how does it work?
The KNN algorithm is a simple, supervised learning technique used for
classification and regression tasks. For classification, it labels a data point with
the majority label of its nearest neighbors. For instance, if K=3, the algorithm
looks at the 3 closest points to the unknown data and assigns it the most
common class among those neighbors. For regression, it typically averages the
neighbors' values instead. KNN is based on the principle that similar data points
exist near each other.
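A minimal sketch of KNN classification, assuming two numeric features and
Euclidean distance. The training points, labels, and the `knn_predict` helper
are hypothetical and exist only to illustrate the majority vote.

```python
from collections import Counter
import math

# Hypothetical labelled points: (feature_1, feature_2) -> class label.
training = [
    ((1.0, 1.0), "A"),
    ((1.5, 2.0), "A"),
    ((2.0, 1.0), "A"),
    ((6.0, 6.0), "B"),
    ((7.0, 5.5), "B"),
    ((6.5, 7.0), "B"),
]


def knn_predict(query, points, k=3):
    """Return the majority label among the k training points closest to query."""
    # Sort training points by Euclidean distance to the query point.
    by_distance = sorted(points, key=lambda item: math.dist(query, item[0]))
    nearest_labels = [label for _, label in by_distance[:k]]
    # Majority vote among the k nearest labels.
    return Counter(nearest_labels).most_common(1)[0][0]


prediction = knn_predict((2.0, 2.0), training, k=3)
print(prediction)
```

With an even K and two classes the vote can tie, which is one reason an
odd K is often chosen; with K=1 the prediction depends on a single
neighbor and can change easily.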
What are some common sources of open data for Data Science projects?
Common sources of open data for Data Science projects include government
portals, open-source platforms like Kaggle, and world organizations’ statistical
websites. These sources provide reliable datasets that can be used for various
purposes, from academic research to business analytics. Access to open data
ensures that data scientists have authentic and diverse datasets to work with for
model training and analysis.
