AI and ML For Business Antim Prahar WITH ANSWERS

ANTIM PRAHAR

The Most Important Questions


By
Dr. Anand Vyas
1 Framework for building ML Systems: KDD process model

• Features of machine learning framework


• A machine learning framework allows enterprises to deploy, manage,
and scale their machine learning portfolio. Algorithmia, the platform
used as the example here, is positioned as a fast route to deployment
and aims to make it easy to securely govern machine learning
operations across a healthy ML lifecycle.
• With Algorithmia, you can connect your data and pre-trained models,
deploy and serve them as APIs, manage your models and monitor
performance, and secure your machine learning portfolio as it scales.
Management
• Manage MLOps using access controls and governance features that secure and audit the machine
learning models you have in production.
• Split machine learning workflows into reusable, independent parts and pipeline them together
with a micro services architecture.
• Operate your ML portfolio from one, secure location to prevent work silos with a robust ML
management system.
• Protect your models with access control.
• Usage reporting allows you to gain full visibility into server use, model consumption, and call
details to control costs.
Scaling
• A properly scaled machine learning lifecycle scales on demand, operates at peak performance,
and continuously delivers value from one MLOps center.
• Serverless scaling allows you to scale models on demand without latency concerns, with CPU
and GPU support.
• Reduce data security vulnerabilities by access controlling your model management system.
• Govern models and test model performance for speed, accuracy, and drift.
• Multi-cloud flexibility provides the options to deploy on Algorithmia, the cloud, or on-prem to
keep models near data sources.
KDD Process Model
• Data Cleaning: Data cleaning is defined as the removal of noisy and irrelevant data from the
collection.
• Cleaning in case of missing values.
• Cleaning noisy data, where noise is a random error or variance in a measured variable.
• Cleaning with data discrepancy detection and data transformation tools.
• Data Integration: Data integration is defined as combining heterogeneous data from multiple
sources into a common source (for example, a data warehouse).
• Data integration using Data Migration tools.
• Data integration using Data Synchronization tools.
• Data integration using the ETL (Extract, Transform, Load) process.
• Data Selection: Data selection is defined as the process where data relevant to the
analysis is decided and retrieved from the data collection.
• Data selection using Neural network.
• Data selection using Decision Trees.
• Data selection using Naive bayes.
• Data selection using Clustering, Regression, etc.
• Data Transformation: Data transformation is defined as the process of transforming data into
the form required by the mining procedure.
• Data transformation is a two-step process:
• Data Mapping: Assigning elements from the source to the destination to capture transformations.
• Code Generation: Creation of the actual transformation program.
• Data Mining: Data mining is defined as the application of techniques that extract potentially
useful patterns from the data.
• Transforms task-relevant data into patterns.
• Decides the purpose of the model using classification or characterization.
• Pattern Evaluation: Pattern evaluation is defined as identifying the truly interesting patterns
that represent knowledge, based on given interestingness measures.
• Finds the interestingness score of each pattern.
• Uses summarization and visualization to make the data understandable to the user.
• Knowledge Representation: Knowledge representation is defined as a technique that uses
visualization tools to present data mining results.
• Generate reports.
• Generate tables.
• Generate discriminant rules, classification rules, characterization rules, etc.
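The KDD steps above can be sketched as a chain of small functions. This is only an illustrative sketch: the two data sources, their field names, the mean-fill rule for missing values, and the "high-value customer" threshold are all made-up assumptions, not part of the original notes.

```python
from statistics import mean

# Two hypothetical sources with different schemas (toy data, assumed for illustration).
store_sales = [
    {"customer": "c1", "amount": 120.0},
    {"customer": "c2", "amount": None},   # missing value to be cleaned
    {"customer": "c3", "amount": 80.0},
]
web_sales = [
    {"cust_id": "c1", "total": 40.0},
    {"cust_id": "c4", "total": 200.0},
]

def clean(rows, key):
    """Data cleaning: fill missing values with the mean of the known values."""
    fill = mean(r[key] for r in rows if r[key] is not None)
    return [{**r, key: fill if r[key] is None else r[key]} for r in rows]

def integrate(store, web):
    """Data integration: map both sources onto one common schema."""
    unified = [{"customer": r["customer"], "amount": r["amount"]} for r in store]
    unified += [{"customer": r["cust_id"], "amount": r["total"]} for r in web]
    return unified

def select(rows):
    """Data selection: keep only the attributes relevant to the analysis."""
    return [(r["customer"], r["amount"]) for r in rows]

def transform(pairs):
    """Data transformation: aggregate to one total per customer."""
    totals = {}
    for customer, amount in pairs:
        totals[customer] = totals.get(customer, 0.0) + amount
    return totals

def mine(totals, threshold):
    """Data mining (toy pattern): customers whose total exceeds a threshold."""
    return sorted(c for c, t in totals.items() if t > threshold)

totals = transform(select(integrate(clean(store_sales, "amount"), web_sales)))
high_value = mine(totals, threshold=150.0)
print(totals)       # knowledge representation: a simple table of totals
print(high_value)   # and the list of flagged customers
```

Each function corresponds to one KDD stage, so a stage can be swapped (for example, a different cleaning rule) without touching the others.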
Connectivity
• Manage source code by pushing models into production directly from the
code repository
• Control data access by running models close to connectors and data
sources for optimal security
• Deploy models from wherever they are with seamless infrastructure
management
Deployment
• Machine learning models only achieve value once they reach production.
Efficient deployment capabilities reduce the time it takes your organization
to get a return on your ML investment.
• Deploy in any language and any format with flexible tooling capabilities.
• Serve models with a git push to a highly scalable API in seconds.
• Version models automatically with a framework that compares and
updates models while maintaining a dependable version for calls.
2 Data Science Vs Machine Learning
Data Science Vs Machine Learning
• Data Science and Machine Learning are closely related
but have different functions and different goals. At a glance, Data
Science is the field that studies approaches for finding insights in raw
data, whereas Machine Learning is a technique used by
data scientists to enable machines to learn automatically from
past data. To understand the difference in depth, compare the two
point by point.

Data Science | Machine Learning
--- | ---
It is used for discovering insights from the data. | It is used for making predictions and classifying the result for new data points.
It deals with understanding and finding hidden patterns or useful insights from the data, which helps to take smarter business decisions. | It is a subfield of data science that enables the machine to learn from the past data and experiences automatically.
It is a broad term that includes various steps to create a model for a given problem and deploy the model. | It is used in the data modelling step of the data science as a complete process.
A data scientist needs to have skills to use big data tools like Hadoop, Hive and Pig, statistics, programming in Python, R, or Scala. | Machine Learning Engineer needs to have skills such as computer science fundamentals, programming skills in Python or R, statistics and probability concepts, etc.
It can work with raw, structured, and unstructured data. | It mostly requires structured data to work on.
Data scientists spend a lot of time handling the data, cleansing the data, and understanding its patterns. | ML engineers spend a lot of time managing the complexities that occur during the implementation of algorithms and the mathematical concepts behind them.
3 Metrics for evaluating linear model, Multivariate regression
• Machine Learning is a branch of Artificial Intelligence. It contains many algorithms for solving various real-world problems. Building a
machine learning model is not the only goal of a data scientist; deploying a model that generalizes well is the target of every
machine learning engineer.
• Regression is one type of supervised machine learning.
Regression
• Regression is a type of machine learning that helps in finding the relationship between independent and dependent variables.
• In simple words, regression can be defined as a machine learning problem where we have to predict continuous values like price,
rating, fees, etc.
R Square/Adjusted R Square
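• R Square measures how much of the variability in the dependent variable can be explained by the model. In simple linear regression it
equals the square of the Correlation Coefficient (R), which is why it is called R Square. Adjusted R Square additionally penalizes the
number of predictors, so it does not increase merely because more variables are added.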
Mean Square Error (MSE)/Root Mean Square Error (RMSE)
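• While R Square is a relative measure of how well the model fits the dependent variable, Mean Square Error is an absolute measure of
the goodness of fit: it is the average of the squared differences between actual and predicted values. RMSE is its square root.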
Advantages of MSE
• MSE is a differentiable function of the predictions, so it can easily be used as a loss function.

Disadvantages of MSE
• The value you get after calculating MSE is in squared units of the output. For example, if the output variable is in meters (m), the
calculated MSE is in square meters.
• If the dataset has outliers, MSE penalizes them most heavily and the calculated MSE becomes larger. In short, MSE is not robust
to outliers, whereas Mean Absolute Error (MAE) is more robust.
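The metrics discussed above can be computed by hand in a few lines. The sketch below uses the standard textbook definitions (the notes do not spell out the formulas), and the sample values and the function name `regression_metrics` are made up for illustration.

```python
from math import sqrt

def regression_metrics(y_true, y_pred, n_predictors):
    """Compute R^2, adjusted R^2, MSE, RMSE and MAE from their standard definitions."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))   # unexplained variation
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)              # total variation
    r2 = 1 - ss_res / ss_tot
    # Adjusted R^2 penalizes the number of predictors k: 1 - (1 - R^2)(n - 1)/(n - k - 1)
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - n_predictors - 1)
    mse = ss_res / n
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    return {"r2": r2, "adj_r2": adj_r2, "mse": mse, "rmse": sqrt(mse), "mae": mae}

# Hypothetical actual and predicted values from a one-predictor model.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 9.0]
metrics = regression_metrics(y_true, y_pred, n_predictors=1)
print(metrics)
```

Note that MSE is in squared units of the output while RMSE and MAE are in the original units, which is why RMSE is often reported alongside MSE.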
Multivariate regression
• As the name implies, multivariate regression is a technique that
estimates a single regression model with more than one outcome
variable. When there is also more than one predictor variable in a
multivariate regression model, the model is a multivariate multiple
regression.
• Multivariate regression helps us to measure the relationship between more than
one independent variable and more than one dependent variable. It
finds the (linear) relation between the variables.
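In matrix form, a linear multivariate multiple regression can be written as below. The notation is assumed here rather than taken from the notes: Y is the n × q matrix of outcomes, X is the n × (p + 1) matrix of predictors including an intercept column, B is the (p + 1) × q coefficient matrix, and E is the n × q matrix of errors. The least-squares estimate shown assumes that the matrix XᵀX is invertible.

```latex
Y = XB + E, \qquad \hat{B} = (X^{\top}X)^{-1}X^{\top}Y
```

With q = 1 this reduces to ordinary multiple regression with a single outcome variable.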

4 Application of supervised learning in Solving
business problems
Pricing
Companies can mine their historical pricing data along with data sets on a host of other variables to understand how certain dynamics,
from time of day to weather to the seasons, impact demand for goods and services. Machine learning algorithms can learn from that
information and combine that insight with additional market and consumer data to help companies dynamically price their goods
based on those vast and numerous variables, a strategy that ultimately helps companies maximize revenue.
Customer Relationship management
Sales performance. Is there a way to understand why one middle-level sales executive brings twice as many lead conversions as
another middle-level exec sitting in the same office? Technically, they both send emails, set calls, and participate in conferences,
which somehow result in conversions or lack thereof. Any time we talk about what drives salespeople's performance, we make
assumptions prone to bias. A good example of ML use here is People.ai, a startup which tries to address the problem by tracking all
the sales data, including emails, calls, and CRM interactions, to use this data as a supervised learning set and predict which kinds of
actions bring better results. Basically, the algorithm aids in developing a playbook for sales reps based on successful cases.

Sales and Marketing


Digital marketing and online-driven sales are the first application fields that you may think of for machine learning adoption. People
interact with the web and leave a detailed footprint to be analyzed. While there are tangible results in unsupervised learning
techniques for marketing and sales, the largest value impact is in the supervised learning field. Let’s have a look.

Lifetime Value.
5 Density Based Methods DBSCAN, OPTICS
• Density-based clustering groups objects according to density (a local cluster
criterion), such as density-connected points. The basic ideas of density-based clustering involve a
number of new definitions. We present these definitions intuitively and then follow up with an
example.
• The neighborhood within a radius ε of a given object is called the ε-neighborhood of the object.
• Density Reachable:
• A point p is density-reachable from a point q wrt. Eps, MinPts if there is a chain of points p_1, …,
p_n with p_1 = q and p_n = p such that p_(i+1) is directly density-reachable from p_i.
• Density Connected:
• A point p is density-connected to a point q wrt. Eps, MinPts if there is a point o such that both p
and q are density-reachable from o wrt. Eps and MinPts.
• Working of Density-Based Clustering
• Given a set of objects D, we say that an object p is directly density-reachable from object q if p is
within the ε-neighborhood of q, and q is a core object, that is, the ε-neighborhood of q contains at least MinPts objects.
• Major features:
• It is used to discover clusters of arbitrary shape.
• It is also used to handle noise in the data clusters.
• It is a one scan method.
• It needs density parameters as a termination condition.
• DBSCAN
• It relies on a density-based notion of cluster: A cluster is defined as a
maximal set of density-connected points.
• It discovers clusters of arbitrary shape in spatial databases with noise.
• DBSCAN Algorithm
• Arbitrarily select a point p.
• Retrieve all points density-reachable from p wrt. Eps and MinPts.
• If p is a core point, a cluster is formed.
• If p is a border point, no points are density-reachable from p and DBSCAN
visits the next point of the database.
• Continue the process until all of the points have been processed.
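The DBSCAN steps above can be sketched in plain Python as follows. This is a minimal, unoptimized sketch (the neighborhood search is a brute-force scan); the sample points, the `eps` and `min_pts` values, and the convention of labelling noise as -1 are assumptions made for illustration.

```python
from math import dist

NOISE = -1

def region_query(points, i, eps):
    """Indices of all points in the eps-neighborhood of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    labels = [None] * len(points)            # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE                # not a core point; may become a border point later
            continue
        cluster += 1                         # i is a core point: a new cluster is formed
        labels[i] = cluster
        seeds = list(neighbors)
        while seeds:                         # collect everything density-reachable from i
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster          # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # j is also a core point, so keep expanding
                seeds.extend(j_neighbors)
    return labels

# Two dense groups and one isolated point (hypothetical data).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)  # [0, 0, 0, 1, 1, 1, -1]
```

The isolated point ends up labelled as noise because its ε-neighborhood contains fewer than MinPts points and no core point reaches it.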
6 Association rules: Introduction, Large Item
sets, Apriori Algorithms and applications
• Association rule learning is a type of unsupervised learning technique that checks for the dependency of one
data item on another data item and maps these dependencies so that they can be exploited profitably. It tries to find
interesting relations or associations among the variables of a dataset, using different rules to discover
them in the database.
• Association rule learning is a rule-based machine learning method for discovering interesting relations
between variables in large databases. It is intended to identify strong rules discovered in databases using
some measures of interestingness.
Apriori
• This algorithm uses frequent itemsets to generate association rules. It is designed to work on databases
that contain transactions. It uses a breadth-first search and a hash tree to count candidate itemsets
efficiently.
• It is mainly used for market basket analysis and helps to understand the products that can be bought
together. It can also be used in the healthcare field to find drug reactions for patients.
• Association rule learning works on the concept of an if-then statement, such as if A then B.
• If A -> Then B
• Here the If element is called the antecedent, and the Then element is called the consequent. A
relationship of this type, where we find an association between two single items, is said to have single
cardinality. As the number of items in a rule increases, the cardinality increases
accordingly. To measure the associations between thousands of data items, several metrics are used, such as support, confidence, and lift.
Apriori algorithm
• The Apriori algorithm is used for mining frequent itemsets and the
relevant association rules. Generally, it operates on a database containing a
huge number of transactions, for example, the items customers buy at a Big Bazaar.
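A level-wise (breadth-first) Apriori search can be sketched as below. This is a simplified sketch: it counts support by scanning the transactions directly rather than using a hash tree, and the shopping baskets, the `min_support` value, and the helper names are made up for illustration.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support} for every frequent itemset, found level by level."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item of the itemset.
        return sum(itemset <= t for t in transactions) / n

    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {}
    k = 1
    while current:
        for itemset in current:
            frequent[itemset] = support(itemset)
        k += 1
        # Join step: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step (Apriori property): all (k-1)-subsets must themselves be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k - 1))}
        current = {c for c in candidates if support(c) >= min_support}
    return frequent

def confidence(frequent, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent: support(A and B) / support(A)."""
    a = frozenset(antecedent)
    return frequent[a | frozenset(consequent)] / frequent[a]

baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"bread", "milk", "butter"},
]
frequent = apriori(baskets, min_support=0.6)
rule_conf = confidence(frequent, {"bread"}, {"milk"})
print(len(frequent), rule_conf)
```

Here the rule "if bread then milk" has bread as the antecedent and milk as the consequent; its confidence is the share of bread baskets that also contain milk.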

7 Types of layers (Convolutional Layers, Activation
function, Pooling, Fully connected)
Convolutional Layers
Convolutional layers are the major building blocks used in convolutional neural networks. A convolution is the simple
application of a filter to an input that results in an activation. Repeated application of the same filter across an input results in
a map of activations called a feature map, indicating the locations and strength of a detected feature in an input, such as
an image.
Activation function
An activation function decides whether a neuron should be activated or not, based on the weighted sum of its inputs
plus a bias. The purpose of the activation function is to introduce non-linearity into the output of a neuron.
A neural network has neurons that work according to their weights, biases, and respective activation functions. In a
neural network, we update the weights and biases of the neurons on the basis of the error at the output.
Pooling Layer
The pooling or downsampling layer is responsible for reducing the spatial size of the activation maps. In general, pooling layers are
used after multiple stages of other layers (i.e. convolutional and non-linearity layers) in order to reduce the
computational requirements progressively through the network as well as to minimize the likelihood of overfitting.
Fully connected
Fully connected layers connect every neuron in one layer to every neuron in another layer. It is the same as a traditional
multi-layer perceptron neural network (MLP). The flattened matrix goes through a fully connected layer to classify the
images. Fully connected neural networks (FCNNs) are a type of artificial neural network where the architecture is such
that all the nodes, or neurons, in one layer are connected to the neurons in the next layer.
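The four layer types described in this section can be chained in a tiny forward pass. The sketch below is written in plain Python for readability rather than speed; the 5 × 5 input, the vertical-edge filter, the choice of ReLU as the activation, 2 × 2 max pooling, and the fully connected weights are all hypothetical values chosen for illustration.

```python
def conv2d(image, kernel):
    """Slide the filter over the input (no padding, stride 1) to build a feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

def relu(feature_map):
    """Activation function: introduce non-linearity by zeroing negative activations."""
    return [[max(0.0, v) for v in row] for row in feature_map]

def max_pool(feature_map, size=2):
    """Pooling: keep the maximum of each non-overlapping size x size window."""
    return [[max(feature_map[i + a][j + b] for a in range(size) for b in range(size))
             for j in range(0, len(feature_map[0]) - size + 1, size)]
            for i in range(0, len(feature_map) - size + 1, size)]

def fully_connected(inputs, weights, biases):
    """Fully connected layer: every output neuron sees every input value."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

image = [[0, 0, 1, 1, 0] for _ in range(5)]   # a bright vertical stripe
kernel = [[-1, 1], [-1, 1]]                   # responds to dark-to-bright vertical edges

feature_map = conv2d(image, kernel)           # 4 x 4 map, rows of [0, 2, 0, -2]
activated = relu(feature_map)                 # the negative response is set to 0
pooled = max_pool(activated)                  # 2 x 2 map
flat = [v for row in pooled for v in row]     # flatten before the dense layer
weights = [[0.5, 0.0, 0.5, 0.0], [0.0, 1.0, 0.0, 1.0]]
biases = [0.0, 0.1]
scores = fully_connected(flat, weights, biases)
print(feature_map[0], pooled, scores)
```

The feature map is strongest where the stripe begins, the ReLU step discards the negative response at the stripe's far edge, and pooling halves each spatial dimension before the flattened values reach the fully connected layer.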
8 DEEP LEARNING
Q-learning is a model-free, off-policy reinforcement learning algorithm that finds the best course of action
given the current state of the agent. Depending on where the agent is in the environment, it
decides the next action to be taken.
• Deep learning (also known as deep structured learning) is part of a broader family of machine
learning methods based on artificial neural networks with representation learning. Learning can
be supervised, semi-supervised or unsupervised.
• Deep-learning architectures such as deep neural networks, deep belief networks, deep
reinforcement learning, recurrent neural networks and convolutional neural networks have been
applied to fields including computer vision, speech recognition, natural language processing,
machine translation, bioinformatics, drug design, medical image analysis, material inspection and
board game programs, where they have produced results comparable to and in some cases
surpassing human expert performance.
• The word “Deep” in “Deep learning” refers to the number of layers through which the data is
transformed. More precisely, deep learning systems have a substantial credit assignment path
(CAP) depth. The CAP is the chain of transformations from input to output. CAPs describe
potentially causal connections between input and output. For a feedforward neural network, the
depth of the CAPs is that of the network and is the number of hidden layers plus one (as the
output layer is also parameterized)
• Architectures:
• Deep Neural Network: It is a neural network with a certain level of complexity (having multiple
hidden layers in between input and output layers). They are capable of modeling and processing
non-linear relationships.
• Deep Belief Network (DBN): It is a class of Deep Neural Network composed of multiple layers of belief networks.
Steps for performing DBN:
• Learn a layer of features from visible units using the Contrastive Divergence algorithm.
• Treat activations of previously trained features as visible units and then learn features of features.
• Finally, the whole DBN is trained when the learning for the final hidden layer is achieved.

Applications:
• Healthcare: Helps in diagnosing various diseases and treating them.
• Automatic Text Generation: A corpus of text is learned, and from this model new text is generated
word-by-word or character-by-character. The model is capable of learning how to spell,
punctuate, and form sentences, and it may even capture the style.
• Automatic Machine Translation: Words, phrases or sentences in one language are
translated into another language (Deep Learning is achieving top results in the areas of text and
images).
9 Q Learning: Q Learning function, Q Learning
Algorithm
Q-learning is a model-free reinforcement learning algorithm to learn the value of an action
in a particular state. It does not require a model of the environment (hence “model-free”),
and it can handle problems with stochastic transitions and rewards without requiring
adaptations.

For any finite Markov decision process (FMDP), Q-learning finds an optimal policy in the
sense of maximizing the expected value of the total reward over any and all successive
steps, starting from the current state. Q-learning can identify an optimal action-selection
policy for any given FMDP, given infinite exploration time and a partly-random policy. “Q”
refers to the function that the algorithm computes: the expected reward for an action
taken in a given state.

Reinforcement learning involves an agent, a set of states S, and a set A of actions per state.
By performing an action a ∈ A, the agent transitions from state to state. Executing an action
in a specific state provides the agent with a reward (a numerical score).
Q Learning Algorithm
• Q-learning is a model-free reinforcement learning algorithm.
• Q-learning is a value-based learning algorithm. Value-based
algorithms update the value function based on an equation
(particularly the Bellman equation), whereas the other type, policy-based
algorithms, estimate the value function with a greedy policy obtained from the
last policy improvement.
Q-Table
• The Q-Table is the data structure used to store the maximum expected future reward for each action at each
state. Basically, this table guides us to the best action at each state. The Q-Learning algorithm is used to learn
each value of the Q-table.

• Step 1: initialize the Q-Table


• We will first build a Q-table. There are n columns, where n= number of actions. There are m rows, where m=
number of states. We will initialise the values at 0.

• Steps 2 and 3: choose and perform an action


• This combination of steps is repeated for an indefinite amount of time, that is, until we stop the training or the
training loop stops as defined in the code. We choose an action (a) in
the state (s) based on the Q-Table. But, as mentioned earlier, when the episode initially starts, every Q-value
is 0, so the first actions have to be chosen by exploration (for example, at random).

• Steps 4 and 5: evaluate


• Now we have taken an action and observed an outcome and reward. We need to update the function Q(s,a).
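The five steps above can be put together in a short training loop. This is a minimal sketch under stated assumptions: the corridor environment (states 0 to 4, reward 1 for reaching state 4), the epsilon-greedy action choice, and the values of the learning rate `ALPHA`, discount `GAMMA`, and exploration rate `EPSILON` are all hypothetical; the update line is the standard Q-learning form of the Bellman equation.

```python
import random

N_STATES = 5                             # hypothetical corridor: states 0..4, state 4 is the goal
ACTIONS = ["left", "right"]
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2    # illustrative values, not from the notes

def step(state, action):
    """Toy environment: move along the corridor; reward 1 on reaching the goal."""
    nxt = max(0, state - 1) if action == 0 else state + 1
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

def q_update(q, s, a, reward, s_next, done):
    """Q(s,a) <- Q(s,a) + alpha * (reward + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = 0.0 if done else max(q[s_next])
    q[s][a] += ALPHA * (reward + GAMMA * best_next - q[s][a])

def choose_action(q, s, rng):
    """Epsilon-greedy choice; ties between equal Q-values are broken at random."""
    if rng.random() < EPSILON:
        return rng.randrange(len(ACTIONS))
    best = max(q[s])
    return rng.choice([a for a, v in enumerate(q[s]) if v == best])

def train(episodes=500, max_steps=100, seed=0):
    rng = random.Random(seed)
    # Step 1: m rows (states) by n columns (actions), all initialised to 0.
    q = [[0.0] * len(ACTIONS) for _ in range(N_STATES)]
    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):
            a = choose_action(q, s, rng)               # steps 2 and 3: choose and perform
            s_next, reward, done = step(s, a)
            q_update(q, s, a, reward, s_next, done)    # steps 4 and 5: evaluate and update
            s = s_next
            if done:
                break
    return q

q_table = train()
policy = [ACTIONS[row.index(max(row))] for row in q_table[:-1]]
print(policy)
```

After training, reading the best action from each row of the Q-table gives the greedy policy, which in this corridor moves right in every non-terminal state.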
10 Reinforcement Learning, Learning Task,
• Reinforcement learning is an area of machine learning concerned
with how intelligent agents ought to take actions in an environment in
order to maximize the notion of cumulative reward. Reinforcement
learning is one of three basic machine learning paradigms, alongside
supervised learning and unsupervised learning.
