

Structured Data Classification MCQ's

The document contains multiple-choice questions (MCQs) related to structured data classification, covering topics such as hyperparameters, classification techniques, and data preprocessing. Key concepts discussed include the importance of model evaluation, handling imbalanced classes, and the use of various classifiers like Decision Trees and Naive Bayes. Additionally, it emphasizes the significance of using appropriate metrics and techniques for accurate model performance assessment.

Uploaded by

Gurram Anurag

Join our channel if you haven’t joined yet: https://t.me/fresco_milestone (@fresco_milestone)

Structured Data Classification MCQ's

Which of the given hyperparameter(s), when increased, may cause a random forest to overfit the
data?

Answer : Depth of Tree
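
The effect of tree depth can be checked empirically. The following is a minimal sketch, assuming scikit-learn is installed; the synthetic noisy dataset, the depth values (2 and unlimited) and the forest size are illustrative assumptions, not part of the quiz.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic data with 20% label noise (flip_y) so there is noise to memorise.
X, y = make_classification(n_samples=400, n_features=10, flip_y=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0
)

results = {}
for depth in (2, None):  # None lets every tree grow until its leaves are pure
    forest = RandomForestClassifier(n_estimators=25, max_depth=depth, random_state=0)
    forest.fit(X_train, y_train)
    results[depth] = {
        "train": forest.score(X_train, y_train),
        "test": forest.score(X_test, y_test),
    }

for depth, scores in results.items():
    print(depth, scores)
```

A large gap between the train and test scores for the deeper forest is the usual sign of overfitting.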

To view the first 3 rows of the dataset, which of the following commands are used? Download the
dataset from:
https://gist.githubusercontent.com/curran/a08a1080b88344b0c8a7/raw/d546eaee765268bf2f487608c537c05e22e4b221/iris.csv
to answer the question.

Answer : iris.head(3)
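
A minimal sketch of head(3), assuming pandas is installed. The small inline frame is a hypothetical stand-in for the iris CSV; its column names and values are illustrative and are not read from the file.

```python
import pandas as pd

# Hypothetical stand-in for the iris data; values are illustrative only.
iris = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 4.7, 4.6, 5.0],
    "species": ["setosa", "setosa", "setosa", "setosa", "setosa"],
})

# head(n) returns the first n rows; with no argument it returns the first 5.
first_three = iris.head(3)
print(first_three)
```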

Pruning is a technique associated with

Answer : Decision tree

High classification accuracy always indicates a good classifier.

Answer : False

Categorical variables have

Answer : no logical order

Cross-validation technique will provide accurate results when the training set and the testing set are
from two different populations.

Answer : False

Let's assume you are solving a classification problem with a highly imbalanced class. The majority
class is observed 99% of the time in the training data. Which of the following is true when your model
has 99% accuracy after taking the predictions on test data?

Answer : For imbalanced class problems, the accuracy metric is not a good idea.
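
Why accuracy misleads here can be shown with plain Python. This is a sketch under assumed data: a hypothetical test set of 100 samples with the 99:1 split from the question, and a "model" that always predicts the majority class.

```python
# Hypothetical test set: 99 majority-class (0) samples and 1 minority-class (1) sample.
y_true = [0] * 99 + [1]

# A "model" that ignores its input and always predicts the majority class.
y_pred = [0] * len(y_true)

# Accuracy: fraction of predictions that match the true label.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Recall for the minority class: fraction of actual positives that were found.
true_positives = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
actual_positives = sum(t == 1 for t in y_true)
minority_recall = true_positives / actual_positives

print(accuracy)         # 0.99
print(minority_recall)  # 0.0
```

The model scores 99% accuracy without ever detecting the minority class, which is why metrics such as recall or precision are preferred for imbalanced problems.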

Email spam detection is an example of

Answer : supervised classification

A technique used to depict the performance in a tabular form that has 2 dimensions namely “actual”
and “predicted” sets of data.

Answer : Confusion Matrix
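
A confusion matrix for a binary problem can be sketched with the standard library alone. The label lists are hypothetical; rows are taken as actual classes and columns as predicted classes, which is one common layout and an assumption here.

```python
from collections import Counter

# Hypothetical actual and predicted labels for a binary problem (1 = positive).
actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

# Count how often each (actual, predicted) pair occurs.
pairs = Counter(zip(actual, predicted))

tp = pairs[(1, 1)]  # actual positive, predicted positive
tn = pairs[(0, 0)]  # actual negative, predicted negative
fp = pairs[(0, 1)]  # actual negative, predicted positive
fn = pairs[(1, 0)]  # actual positive, predicted negative

# Rows are actual classes (0, 1); columns are predicted classes (0, 1).
matrix = [[tn, fp],
          [fn, tp]]
print(matrix)  # [[3, 1], [1, 3]]
```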

Choose the correct sequence for classifier building from the following:

Answer : Initialize -> Train -> Predict -> Evaluate
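
The four steps map onto scikit-learn calls roughly as follows. This is a sketch, assuming scikit-learn is installed; it uses the copy of the iris data bundled with the library rather than the CSV, and the choice of DecisionTreeClassifier and the 70/30 split are illustrative assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Bundled iris data: 150 samples, 4 numeric features, 3 classes.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

clf = DecisionTreeClassifier(random_state=0)  # 1. Initialize
clf.fit(X_train, y_train)                     # 2. Train
y_pred = clf.predict(X_test)                  # 3. Predict
accuracy = accuracy_score(y_test, y_pred)     # 4. Evaluate
print(accuracy)
```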

The commonly used package for machine learning in Python is

Answer : sklearn

A classifier that can compute using numeric as well as categorical values is

Answer : Decision Tree Classifier

Can we consider sentiment classification as a text classification problem?

Answer : yes

What kind of classification is the given case study (IRIS dataset)? Download the dataset from:
https://gist.githubusercontent.com/curran/a08a1080b88344b0c8a7/raw/d546eaee765268bf2f487608c537c05e22e4b221/iris.csv
to answer the question.

Answer : Multi class classification

Ensemble learning is used when you build component classifiers that are more accurate and
independent from each other.

Answer : true

Clustering is an example of

Answer : unsupervised classification

Model Tuning helps to increase the accuracy

Answer : True

Images and documents are examples of _________

Answer : Unstructured Data

Ordinal variables have

Answer : clear logical order

Which command is used to select all NUMERIC types in the dataset? Download the dataset from:
https://gist.githubusercontent.com/curran/a08a1080b88344b0c8a7/raw/d546eaee765268bf2f487608c537c05e22e4b221/iris.csv
to answer the question.

Answer : iris_num = iris_data.select_dtypes(include=[numpy.number])
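
A minimal sketch of select_dtypes, assuming pandas and NumPy are installed. The inline frame is a hypothetical stand-in for the CSV, and `np` is the usual alias for the `numpy` module named in the answer.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the iris data; values are illustrative only.
iris_data = pd.DataFrame({
    "sepal_length": [5.1, 7.0, 6.3],
    "petal_width": [0.2, 1.4, 2.5],
    "species": ["setosa", "versicolor", "virginica"],
})

# Keep only the columns whose dtype is numeric (ints and floats).
iris_num = iris_data.select_dtypes(include=[np.number])
numeric_columns = list(iris_num.columns)
print(numeric_columns)  # ['sepal_length', 'petal_width']
```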

The number of categorical attributes in the original dataset. Download the dataset from:
https://gist.githubusercontent.com/curran/a08a1080b88344b0c8a7/raw/d546eaee765268bf2f487608c537c05e22e4b221/iris.csv
to answer the question.

Answer : 1

Which classifier converges easily with less training data?

Answer : Naive Bayes Classifier

Imputing is a strategy to handle



Answer : Missing Values
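
Mean imputation, one common imputing strategy, can be sketched in plain Python. The values are hypothetical, and `None` is assumed to mark a missing entry.

```python
from statistics import mean

# Hypothetical column with two missing entries (None).
values = [20.0, None, 40.0, None, 30.0]

# Compute the mean of the observed values only.
observed = [v for v in values if v is not None]
fill_value = mean(observed)

# Replace each missing entry with that mean.
imputed = [fill_value if v is None else v for v in values]
print(imputed)  # [20.0, 30.0, 40.0, 30.0, 30.0]
```

Median or most-frequent-value imputation follows the same pattern with a different fill value.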

Classification where each data point is mapped to more than one class is called

Answer : Multi Label Classification

The fit(X, y) method is used to

Answer : Train the Classifier

Supervised learning differs from unsupervised learning as supervised learning requires __________

Answer : Labeled data

Clustering is a supervised classification.

Answer : False

Select the correct option which directly achieves multi-class classification (without the support of binary
classifiers).

Answer : K Nearest Neighbor

The classification where each data is mapped to more than one class is called ___________

Answer : Multi Label Classification

Email spam data is an example of __________

Answer : Unstructured Data

The most widely used package for machine learning in Python is _________

Answer : sklearn

Pruning is a technique associated with __________

Answer : Decision tree

What does the command sentiment_analysis_data['label'].value_counts() return?

Answer : counts of unique values in the 'label' column
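
A minimal sketch of value_counts(), assuming pandas is installed. The five-row frame and its "pos"/"neg" labels are hypothetical.

```python
import pandas as pd

# Hypothetical sentiment data; only the 'label' column matters here.
sentiment_analysis_data = pd.DataFrame({
    "label": ["pos", "neg", "pos", "pos", "neg"],
})

# value_counts() returns a Series: index = unique values, values = their counts,
# sorted from most to least frequent.
label_counts = sentiment_analysis_data["label"].value_counts()
print(label_counts)
```

The same call is a quick way to check for class imbalance before training.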

Select the pre-processing technique(s) from the following.

Answer : all

Which of the given hyperparameters, when increased, may cause a random forest to overfit the data?

Answer : depth of tree

Select the correct statement about Nonlinear classification.

Answer : Kernel tricks are used by Nonlinear classifiers to achieve maximum-margin hyperplanes.

Choose the correct sequence for classifier building from the following.

Answer : Initialize -> Train -> Predict -> Evaluate

What command should be given to tokenize a sentence into words?

Answer : from nltk.tokenize import word_tokenize; word_tokens = word_tokenize(sentence)

Choose the correct sequence from the following.

Answer : Data Analysis -> PreProcessing -> Model Building -> Predict

The following are all classification techniques, except ___________

Answer : StratifiedShuffleSplit

The commonly used package for machine learning in Python is

Answer : sklearn

How many new columns does the following command return?

iris_series = pd.get_dummies(iris['Species'])

Download the dataset from:
https://gist.githubusercontent.com/curran/a08a1080b88344b0c8a7/raw/d546eaee765268bf2f487608c537c05e22e4b221/iris.csv
to answer the question.

Answer : 3
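
A minimal sketch of why get_dummies yields one column per class, assuming pandas is installed. The four-row frame is a hypothetical stand-in for the CSV; the column name 'Species' is copied from the command in the question.

```python
import pandas as pd

# Hypothetical stand-in with the three iris class names.
iris = pd.DataFrame({
    "Species": ["setosa", "versicolor", "virginica", "setosa"],
})

# One-hot encoding: one indicator column per distinct value in the input.
iris_series = pd.get_dummies(iris["Species"])
print(list(iris_series.columns))  # ['setosa', 'versicolor', 'virginica']
```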

Identify the command used to view the dataset SIZE and what is the value returned? Download the
dataset from:
https://gist.githubusercontent.com/curran/a08a1080b88344b0c8a7/raw/d546eaee765268bf2f487608c537c05e22e4b221/iris.csv
to answer the question.

Answer : iris.shape,(150,6) (Incorrect)
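
For reference, pandas exposes two related attributes, `shape` and `size`. The sketch below uses a hypothetical 3 x 2 frame to show what each returns; it does not settle which option the quiz expects.

```python
import pandas as pd

# Hypothetical frame with 3 rows and 2 columns.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

dims = df.shape       # tuple of (number of rows, number of columns)
n_elements = df.size  # total number of cells: rows * columns

print(dims)        # (3, 2)
print(n_elements)  # 6
```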

Which type of cross validation is used for imbalanced dataset?

Answer : Stratified K-fold
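
For imbalanced data, the stratified variant of K-fold keeps the class ratio roughly equal in every fold. A minimal sketch, assuming scikit-learn and NumPy are installed; the 16:4 label split and the four folds are illustrative assumptions.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Hypothetical imbalanced labels: 16 negatives and 4 positives.
y = np.array([0] * 16 + [1] * 4)
X = np.arange(len(y)).reshape(-1, 1)  # placeholder feature column

skf = StratifiedKFold(n_splits=4, shuffle=True, random_state=0)

# Count the positives that land in each test fold.
positives_per_fold = [int(y[test_idx].sum()) for _, test_idx in skf.split(X, y)]
print(positives_per_fold)  # [1, 1, 1, 1]
```

Plain K-fold gives no such guarantee, so a fold could end up with no minority samples at all.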

To view the first 3 rows of the dataset, which of the following commands are used? Download the
dataset from:
https://gist.githubusercontent.com/curran/a08a1080b88344b0c8a7/raw/d546eaee765268bf2f487608c537c05e22e4b221/iris.csv
to answer the question.

Answer : iris.head(3)

Naive Bayes Algorithm is useful for :

Answer : in-depth analysis



A process used to identify data points that are simply unusual

Answer : Anomaly Detection

Is there a class imbalance problem in the given data set?

Answer : no

Which of the following is not a technique to process missing values?

Answer : One hot encoding

Images and documents are examples of

Answer : Unstructured Data

Email spam detection is an example of

Answer : supervised classification

Choose the correct sequence for classifier building from the following:

Answer : Initialize -> Train -> Predict -> Evaluate

Imagine you have just finished training a decision tree for spam classification and it is showing
abnormally bad performance on both your training and test sets. Assume that your implementation
has no bugs. What could be the reason for this problem?

Answer : All

Identify the structured data from the following.

Answer : Data from MySQL DB and Excel

True Negative is when the predicted instance and the actual instance are both positive.

Answer : False

What does the command iris['species'].value_counts() return? Download the dataset from
https://gist.githubusercontent.com/curran/a08a1080b88344b0c8a7/raw/d546eaee765268bf2f487608c537c05e22e4b221/iris.csv
to answer the question.

Answer : The count of each unique value in the iris['species'] column

A process used to identify unusual data points is _________

Answer : Anomaly Detection

The following are techniques to process missing values, except _______

Answer : of the options



How many classes will the following command return? (target classes in the dataset):
classes = list(iris['species'].unique())
Download the dataset from
https://gist.githubusercontent.com/curran/a08a1080b88344b0c8a7/raw/d546eaee765268bf2f487608c537c05e22e4b221/iris.csv
to answer the question.

Answer : 3

Cross-validation causes over-fitting.

Answer : False

True Positive is when the predicted instance and the actual instance are both not negative.

Answer : True

What kind of classification is our case study 'Churn Analysis'?

Answer : Binary

Which command is used to identify the unique values of a column?

Answer : unique()

Which preprocessing technique is used to make the data gaussian with zero mean and unit variance?

Answer : Standardisation
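
Standardisation rescales each value as (x - mean) / standard deviation, so the result has zero mean and unit variance. A minimal sketch in plain Python, with hypothetical values and the population standard deviation assumed:

```python
from statistics import mean, pstdev

# Hypothetical feature values.
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

mu = mean(values)       # 5.0
sigma = pstdev(values)  # population standard deviation: 2.0

# z-score: subtract the mean, then divide by the standard deviation.
standardised = [(v - mu) / sigma for v in values]
print(standardised)
```

The rescaled values keep the shape of the original distribution; only its location and spread change.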

Cross-validation technique is used to evaluate a classifier by dividing the data set into training set to
train the classifier and testing set to test the same.

Answer : True

What are the advantages of Naive Bayes?

Answer : Both the options

What kind of classification is the given case study (Iris dataset)? Download the dataset from
https://gist.githubusercontent.com/curran/a08a1080b88344b0c8a7/raw/d546eaee765268bf2f487608c537c05e22e4b221/iris.csv
to answer the question.

Answer : Multi class classification

Let's assume you are solving a classification problem with a highly imbalanced class. The majority
class is observed 99% of the time in the training data. Which of the following is true when your model
has 99% accuracy after taking the predictions on test data?

Answer : For imbalanced class problems, the accuracy metric is not a good idea.

The cross-validation technique will provide accurate results when the training set and the testing set
are from two different populations.

Answer : False
