AI Notes
NEP- SEC
FOURTH SEM BBA
Prepared By
Dr. Subbulakshmi. S
Assistant Professor
UNIT -1
AZURE AI FUNDAMENTALS
Artificial Intelligence
Artificial intelligence (AI) refers to the simulation of human intelligence in machines that are
programmed to think like humans and mimic their actions. The term may also be applied to
any machine that exhibits traits associated with a human mind such as learning and problem-
solving. The ideal characteristic of artificial intelligence is its ability to rationalize and take
actions that have the best chance of achieving a specific goal.
A subset of artificial intelligence is machine learning (ML), which refers to the concept that
computer programs can automatically learn from and adapt to new data without being
assisted by humans.
AI is being used across different industries, including finance and healthcare. Weak AI tends to be
simple and single-task oriented, while strong AI carries out tasks that are more complex and human-like.
Types of AI
Weak artificial intelligence embodies a system designed to carry out one particular job. Weak
AI systems include video games and personal assistants such as Amazon's Alexa and Apple's
Siri.
Strong artificial intelligence systems are systems that carry out tasks considered to be
human-like. These tend to be more complex systems. They are programmed
to handle situations in which they may be required to problem solve without having a person
intervene. These kinds of systems can be found in applications like self-driving cars or in
hospital operating rooms.
AI in Business Management
• Spam filters
• Smart email categorisation
• Voice-to-text features
• Smart personal assistants, such as Siri and Google Now
• Automated responders and online customer support
• Process automation
• Sales and business forecasting
• Security surveillance
• Automated insights, especially for data-driven industries (e.g. financial services or e-commerce)
• Smart searches and relevance features
• Personalisation as a service
• Product recommendations and purchase predictions
• Fraud detection and prevention for online transactions
• Dynamic price optimisation
AI in Marketing
• Recommendations and content curation
• Personalisation of news feeds
• Pattern and image recognition
• Language recognition - to digest unstructured data from customers and sales prospects
• Ad targeting and optimised, real-time bidding
• Customer segmentation
• Social semantics and sentiment analysis
• Automated web design
• Predictive customer service
What is AI?
AI is the creation of software that imitates human behaviours and capabilities. Key workloads
include:
• Machine learning - This is often the foundation for an AI system, and is the way we
"teach" a computer model to make predictions and draw conclusions from data.
• Anomaly detection - The capability to automatically detect errors or unusual activity
in a system.
• Computer vision - The capability of software to interpret the world visually through
cameras, video, and images.
• Natural language processing - The capability for a computer to interpret written or
spoken language, and respond in kind.
• Knowledge mining - The capability to extract information from large volumes of
often unstructured data to create a searchable knowledge store.
Machine Learning
How does ML work?
Machines learn from data. In today's world, we create huge volumes of data as we go
about our everyday lives. From the text messages, emails, and social media posts we send
to the photographs and videos we take on our phones, we generate massive amounts of
information. More data still is created by millions of sensors in our homes, cars, cities,
public transport infrastructure, and factories.
Data scientists can use all of that data to train machine learning models that can make
predictions and inferences based on the relationships they find in the data.
For example, suppose an environmental conservation organization wants volunteers to
identify and catalogue different species of wildflower using a phone app. Machine learning can be
used to enable this scenario.
Anomaly Detection
• Imagine you're creating a software system to monitor credit card transactions and
detect unusual usage patterns that might indicate fraud.
• Or an application that tracks activity in an automated production line and identifies
failures.
• Or a racing car telemetry system that uses sensors to proactively warn engineers about
potential mechanical failures before they happen.
• These kinds of scenarios can be addressed by using anomaly detection - a machine
learning based technique that analyzes data over time and identifies unusual changes.
• Let's explore how anomaly detection might help in the racing car scenario, as sketched below.
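To make the idea concrete, the following minimal Python sketch flags an unusual telemetry reading by comparing it with the recent normal range. The sensor values and the z-score threshold of 3 are illustrative assumptions, not part of any Azure service.

```python
import numpy as np

# Hypothetical engine-temperature readings (°C) sampled once per second;
# the final value is far outside the normal operating range.
readings = np.array([101, 102, 101, 103, 102, 101, 102, 103, 102, 131])

mean, std = readings[:-1].mean(), readings[:-1].std()
z_score = abs(readings[-1] - mean) / std

if z_score > 3:  # flag values far outside the recent normal range
    print(f"Anomaly detected: {readings[-1]} °C (z-score {z_score:.1f})")
```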
Computer Vision
Computer Vision is an area of AI that deals with visual processing. Let's explore some of
the possibilities that computer vision brings.
The Seeing AI app is a great example of the power of computer vision. Designed for the
blind and low vision community, the Seeing AI app harnesses the power of AI to open
up the visual world and describe nearby people, text and objects.
Most computer vision solutions are based on machine learning models that can be
applied to visual input from cameras, videos, or images.
Computer Vision Services in Microsoft Azure
• Computer Vision: Analyze images and video, and extract descriptions, tags, objects, and text.
• Custom Vision: Train custom image classification and object detection models using your own images.
• Face: Build face detection and facial recognition solutions.
• Form Recognizer: Extract information from scanned forms and invoices.
In Microsoft Azure, you can use the following cognitive services to build natural language
processing solutions.
• Language: Access features for understanding and analyzing text, training language models that can
understand spoken or text-based commands, and building intelligent applications.
• Translator: Translate text between more than 60 languages.
• Speech: Recognize and synthesize speech, and translate spoken languages.
• Azure Bot Service: A platform for conversational AI - the capability of a software "agent" to
participate in a conversation. Developers can use the Bot Framework to create a bot and manage it
with Azure Bot Service, integrating back-end services like Language and connecting to channels for
web chat, email, Microsoft Teams, and others.
Knowledge Mining
• Knowledge mining is the term used to describe solutions that involve extracting
information from large volumes of often unstructured data to create a searchable
knowledge store.
• Azure Cognitive Search can utilize the built-in AI capabilities of Azure Cognitive
Services such as image processing, content extraction, and natural language processing to
perform knowledge mining of documents.
Challenges and risks with AI
• Bias can affect results: A loan-approval model discriminates by gender due to bias in the data with
which it was trained.
• Errors may cause harm: An autonomous vehicle experiences a system failure and causes a collision.
• Data could be exposed: A medical diagnostic bot is trained using sensitive patient data, which is
stored insecurely.
• Solutions may not work for everyone: A home automation assistant provides no audio output for
visually impaired users.
• Users must trust a complex system: An AI-based financial tool makes investment recommendations -
what are they based on?
• Who's liable for AI-driven decisions?: An innocent person is convicted of a crime based on evidence
from facial recognition - who's responsible?
Responsible AI
1. Fairness
AI systems should treat all people fairly, avoiding bias based on gender, ethnicity, or other factors.
2. Reliability and safety
AI systems should perform reliably and safely. AI-based software must be subjected to rigorous
testing and deployment management processes to ensure that they work as expected before release.
3. Privacy and security
AI systems should be secure and respect privacy, because they rely on large volumes of data that may
include personal details.
4. Inclusiveness
AI systems should empower everyone and engage people. AI should bring benefits to all
parts of society, regardless of physical ability, gender, sexual orientation, ethnicity, or other
factors.
5. Transparency
AI systems should be understandable. Users should be made fully aware of the purpose of the
system, how it works, and what limitations may be expected.
6. Accountability
People should be accountable for AI systems. Designers and developers of AI-based
solutions should work within a framework of governance and organizational principles that
ensure the solution meets ethical and legal standards that are clearly defined.
• A machine learning model encapsulates a function (f) that calculates an output value (the label, y)
from one or more input values (the features, x): y = f(x).
• The specific operation that the f function performs on x to calculate y depends on a
number of factors, including the type of model you're trying to create and the specific
algorithm used to train the model.
Types of machine learning: supervised ML and unsupervised ML.
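As a simple illustration of supervised ML, the sketch below fits a function f from example features x to known labels y and then predicts a label for new data; the numbers and the use of scikit-learn here are illustrative assumptions.

```python
from sklearn.linear_model import LinearRegression

x = [[1], [2], [3], [4], [5]]          # features, e.g. hours of sunshine
y = [12, 19, 29, 41, 52]               # known labels, e.g. ice creams sold

model = LinearRegression().fit(x, y)   # learn f so that y ≈ f(x)
print(model.predict([[6]]))            # apply f to predict the label for new data
```

Unsupervised ML, by contrast, would be trained on the features alone, without known label values.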
Azure Machine Learning Studio
Azure Machine Learning is a cloud-based service that helps simplify some of the tasks it
takes to prepare data, train a model, and deploy a predictive service.
• Prepare data
• Train a model
• Review performance
• Deploy a predictive service
Azure Machine Learning studio is a web portal for machine learning solutions in Azure. It
includes a wide range of features and capabilities that help data scientists prepare data, train
models, publish predictive services, and monitor their usage. To begin using the web portal,
you need to assign the workspace you created in the Azure portal to Azure Machine Learning
studio.
Compute targets are cloud-based resources on which you can run model training and data
exploration processes.
In Azure Machine Learning studio, you can manage the compute targets for your data science
activities. There are four kinds of compute resource you can create:
Compute Instances: Development workstations that data scientists can use to work
with data and models.
Compute Clusters: Scalable clusters of virtual machines for on-demand processing of
experiment code.
Inference Clusters: Deployment targets for predictive services that use your trained
models.
Attached Compute: Links to existing Azure compute resources, such as Virtual
Machines or Azure Databricks clusters.
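As an illustration, the following sketch provisions a compute cluster with the v1 azureml-core Python SDK; the workspace config file, VM size, and cluster name are assumptions for the example.

```python
from azureml.core import Workspace
from azureml.core.compute import AmlCompute, ComputeTarget

ws = Workspace.from_config()  # connect to the workspace described in config.json

compute_config = AmlCompute.provisioning_configuration(
    vm_size="STANDARD_DS11_V2",   # assumed VM size
    min_nodes=0, max_nodes=2)     # scale down to zero nodes when idle

cluster = ComputeTarget.create(ws, "cpu-cluster", compute_config)
cluster.wait_for_completion(show_output=True)
```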
Automated machine learning allows you to train models without extensive data science or
programming knowledge. For people with a data science and programming background, it
provides a way to save time and resources by automating algorithm selection and
hyperparameter tuning.
You can create an automated machine learning job in Azure Machine Learning studio.
In Azure Machine Learning, operations that you run are called jobs. You can configure
multiple settings for your job before starting an automated machine learning run. The run
configuration provides the information needed to specify your training script, compute target,
and Azure ML environment, and to run a training job.
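A minimal sketch of submitting an automated ML job with the v1 azureml Python SDK is shown below; the dataset name, label column, compute target, and exit/concurrency settings are hypothetical, and the exact AutoMLConfig parameters vary between SDK versions.

```python
from azureml.core import Workspace, Dataset, Experiment
from azureml.train.automl import AutoMLConfig

ws = Workspace.from_config()
training_data = Dataset.get_by_name(ws, "bike-rentals")   # hypothetical dataset

automl_config = AutoMLConfig(
    task="regression",
    primary_metric="normalized_root_mean_squared_error",
    training_data=training_data,
    label_column_name="rentals",          # hypothetical label column
    compute_target="cpu-cluster",
    experiment_timeout_hours=0.25,        # exit criterion
    max_concurrent_iterations=2)          # concurrency limit

run = Experiment(ws, "automl-regression").submit(automl_config)
run.wait_for_completion(show_output=True)
```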
1. Prepare data: Identify the features and label in a dataset. Pre-process, or clean and
transform, the data as needed.
2. Train model: Split the data into two groups, a training and a validation set. Train a
machine learning model using the training data set. Test the machine learning model
for performance using the validation data set.
3. Evaluate performance: Compare how close the model's predictions are to the known
labels.
4. Deploy a predictive service: After you train a machine learning model, you can deploy
the model as an application on a server or device so that others can use it.
These are the same steps in the automated machine learning process with Azure Machine
Learning.
Prepare data
Machine learning models must be trained with existing data. Data scientists expend a lot of
effort exploring and pre-processing data, and trying various types of model-training
algorithms to produce accurate models, which is time consuming, and often makes inefficient
use of expensive compute hardware.
In Azure Machine Learning, data for model training and other operations is usually
encapsulated in an object called a dataset. You can create your own dataset in Azure Machine
Learning studio.
Train model
The automated machine learning capability in Azure Machine Learning
supports supervised machine learning models - in other words, models for which the training
data includes known label values. You can use automated machine learning to train models
for:
Regression (predicting numeric values)
Time series forecasting (predicting numeric values at a future point in time)
In Automated Machine Learning, you can select configurations for the primary metric, type
of model used for training, exit criteria, and concurrency limits.
Importantly, AutoML will split data into a training set and a validation set. You can configure
the details in the settings before you run the job.
Evaluate performance
After the job has finished you can review the best performing model. In this case, you used
exit criteria to stop the job. Thus the "best" model the job generated might not be the best
possible model, just the best one found within the time allowed for this exercise.
The best model is identified based on the evaluation metric you specified, Normalized root
mean squared error.
A technique called cross-validation is used to calculate the evaluation metric. After the model
is trained using a portion of the data, the remaining portion is used to iteratively test, or cross-
validate, the trained model. The metric is calculated by comparing the predicted value from
the test with the actual known value, or label.
The difference between the predicted and actual value, known as the residuals, indicates the
amount of error in the model. The performance metric root mean squared error (RMSE), is
calculated by squaring the errors across all of the test cases, finding the mean of these
squares, and then taking the square root. What all of this means is that the smaller this value is,
the more accurate the model's predictions are. The normalized root mean squared
error (NRMSE) standardizes the RMSE metric so it can be used for comparison between
models which have variables on different scales.
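To make the arithmetic concrete, the small example below computes RMSE from the residuals and then normalizes it. The predictions are made up, and normalizing by the label range is one common convention, assumed here.

```python
import numpy as np

y_true = np.array([240, 310, 150, 420])        # actual label values
y_pred = np.array([250, 300, 180, 400])        # model predictions

residuals = y_pred - y_true                    # the errors
rmse = np.sqrt((residuals ** 2).mean())        # root mean squared error
nrmse = rmse / (y_true.max() - y_true.min())   # normalized by the label range
print(f"RMSE={rmse:.2f}  NRMSE={nrmse:.3f}")
```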
The Residual Histogram shows the frequency of residual value ranges. Residuals represent
variance between predicted and true values that can't be explained by the model, in other
words, errors. You should hope to see the most frequently occurring residual values clustered
around zero. You want small errors with fewer errors at the extreme ends of the scale.
The Predicted vs. True chart should show a diagonal trend in which the predicted value
correlates closely to the true value. The dotted line shows how a perfect model should
perform. The closer the line of your model's average predicted value is to the dotted line, the
better its performance. A histogram below the line chart shows the distribution of true values.
After you've used automated machine learning to train some models, you can deploy the best
performing model as a service for client applications to use.
Deploy a predictive service
In Azure Machine Learning, you can deploy a service as an Azure Container Instances (ACI)
or to an Azure Kubernetes Service (AKS) cluster. For production scenarios, an AKS
deployment is recommended, for which you must create an inference cluster compute target.
In this exercise, you'll use an ACI service, which is a suitable deployment target for testing,
and does not require you to create an inference cluster.
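For reference, a deployment to ACI might look like the following sketch using the v1 azureml-core SDK; the registered model name, entry script, and environment file are assumptions for illustration.

```python
from azureml.core import Workspace, Environment
from azureml.core.model import Model, InferenceConfig
from azureml.core.webservice import AciWebservice

ws = Workspace.from_config()
model = ws.models["bike-rental-model"]   # hypothetical registered model

env = Environment.from_conda_specification("inference-env", "env.yml")
inference_config = InferenceConfig(entry_script="score.py", environment=env)
aci_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=1)

service = Model.deploy(ws, "bike-rental-service", [model], inference_config, aci_config)
service.wait_for_deployment(show_output=True)
print(service.scoring_uri)
```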
Each designer project, known as a pipeline, has a left panel for navigation and a canvas on
your right hand side. To use designer, identify the building blocks, or components, needed for
your model, place and connect them on your canvas, and run a machine learning job.
Pipelines
Pipelines let you organize, manage, and reuse complex machine learning workflows across
projects and users. A pipeline starts with the dataset from which you want to train the model.
Each time you run a pipeline, the configuration of the pipeline and its results are stored in
your workspace as a pipeline job.
Components
An Azure Machine Learning component encapsulates one step in a machine learning
pipeline. You can think of a component as a programming function and as a building block
for Azure Machine Learning pipelines. In a pipeline project, you can access data assets and
components from the left panel's Asset Library tab.
Datasets
You can create data assets on the Data page from local files, a datastore, web files, and Open
Datasets. These data assets will appear along with standard sample datasets
in designer's Asset Library.
Azure Machine Learning Jobs
An Azure Machine Learning (ML) job executes a task against a specified compute target.
Jobs enable systematic tracking for your machine learning experimentation and workflows.
Once a job is created, Azure ML maintains a run record for the job. All of your jobs' run
records can be viewed in Azure ML studio.
In your designer project, you can access the status of a pipeline job using the Submitted
jobs tab on the left pane.
You can find all the jobs you have run in a workspace on the Jobs page.
1. Prepare data: Identify the features and label in a dataset. Pre-process, or clean and
transform, the data as needed.
2. Train model: Split the data into two groups, a training and a validation set. Train a
machine learning model using the training data set. Test the machine learning model
for performance using the validation data set.
3. Evaluate performance: Compare how close the model's predictions are to the known
labels.
4. Deploy a predictive service: After you train a machine learning model, you need to
convert the training pipeline into a real-time inference pipeline. Then you can deploy
the model as an application on a server or device so that others can use it.
Prepare data
Azure machine learning designer has several pre-built components that can be used to
prepare data for training. These components enable you to clean data, normalize features, join
tables, and more.
Train model
To train a regression model, you need a dataset that includes historical features,
characteristics of the entity for which you want to make a prediction, and known label values.
The label is the quantity you want to train a model to predict.
It's common practice to train the model using a subset of the data, while holding back some
data with which to test the trained model. This enables you to compare the labels that the
model predicts with the actual known labels in the original dataset.
You will use designer's Score Model component to generate the predicted class label value.
Once you connect all the components, you will want to run an experiment, which will use the
data asset on the canvas to train and score a model.
Evaluate performance
After training a model, it is important to evaluate its performance. There are many
performance metrics and methodologies for evaluating how well a model makes predictions.
You can review evaluation metrics on the completed job page by right-clicking on
the Evaluate model component.
Mean Absolute Error (MAE): The average difference between predicted values and
true values. This value is based on the same units as the label, in this case dollars. The
lower this value is, the better the model is predicting.
Root Mean Squared Error (RMSE): The square root of the mean squared difference
between predicted and true values. The result is a metric based on the same unit as
the label (dollars). When compared to the MAE (above), a larger difference indicates
greater variance in the individual errors (for example, with some errors being very
small, while others are large).
Relative Squared Error (RSE): A relative metric between 0 and 1 based on the square
of the differences between predicted and true values. The closer to 0 this metric is,
the better the model is performing. Because this metric is relative, it can be used to
compare models where the labels are in different units.
Relative Absolute Error (RAE): A relative metric between 0 and 1 based on the
absolute differences between predicted and true values. The closer to 0 this metric is,
the better the model is performing. Like RSE, this metric can be used to compare
models where the labels are in different units.
Coefficient of Determination (R2): This metric is more commonly referred to as R-
Squared, and summarizes how much of the variance between predicted and true
values is explained by the model. The closer to 1 this value is, the better the model is
performing.
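These metrics can be reproduced with a few lines of Python; the sketch below uses scikit-learn on made-up predictions to show MAE, RMSE, and R2 side by side.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = [240, 310, 150, 420]   # actual label values (e.g. dollars)
y_pred = [250, 300, 180, 400]   # model predictions

mae = mean_absolute_error(y_true, y_pred)
rmse = np.sqrt(mean_squared_error(y_true, y_pred))
r2 = r2_score(y_true, y_pred)
print(f"MAE={mae:.1f}  RMSE={rmse:.1f}  R2={r2:.3f}")
```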
Inference pipeline
To deploy your pipeline, you must first convert the training pipeline into a real-time inference
pipeline. This process removes training components and adds web service inputs and outputs
to handle requests.
The inference pipeline performs the same data transformations as the first pipeline
for new data. Then it uses the trained model to infer, or predict, label values based on its
features. This model will form the basis for a predictive service that you can publish for
applications to use.
You can create an inference pipeline by selecting the menu above a completed job.
Deployment
After creating the inference pipeline, you can deploy it as an endpoint. In the endpoints page,
you can view deployment details, test your pipeline service with sample data, and find
credentials to connect your pipeline service to a client application.
It will take a while for your endpoint to be deployed. The Deployment state on the Details tab
will indicate Healthy when deployment is successful.
On the Test tab, you can test your deployed service with sample data in a JSON format. The
test tab is a tool you can use to quickly check to see if your model is behaving as expected.
Typically it is helpful to test the service before connecting it to an application.
You can find credentials for your service on the Consume tab. These credentials are used to
connect your trained machine learning model as a service to a client application.
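Once the service is deployed, a client can call it over HTTP with the endpoint and key from the Consume tab. The sketch below is a hypothetical example; the JSON input schema depends on your pipeline.

```python
import json
import requests

endpoint = "https://<your-service>.azurecontainer.io/score"   # from the Consume tab
key = "<your-primary-key>"                                    # from the Consume tab

payload = {"Inputs": {"data": [{"feature1": 1.0, "feature2": 0.3}]}}  # hypothetical schema
headers = {"Content-Type": "application/json", "Authorization": f"Bearer {key}"}

response = requests.post(endpoint, data=json.dumps(payload), headers=headers)
print(response.json())
```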
2. Computer Vision
Computer vision is one of the core areas of artificial intelligence (AI), and focuses on creating
solutions that enable AI applications to "see" the world and make sense of it.
Of course, computers don't have biological eyes that work the way ours do, but they are capable of
processing images; either from a live camera feed or from digital photographs or videos. This ability
to process images is the key to creating software that can emulate human visual perception.
Content Organization: Identify people or objects in photos and organize them based on that
identification. Photo recognition applications like this are commonly used in photo storage
and social media applications.
Text Extraction: Analyze images and PDF documents that contain text and extract the text into
a structured format.
Spatial Analysis: Identify people or objects, such as cars, in a space and map their movement
within that space.
To an AI application, an image is just an array of pixel values. These numeric values can be used
as features to train machine learning models that make predictions about the image and its
contents.
To use the Computer Vision service, you need to create a resource for it in your Azure subscription.
You can use either of the following resource types:
Computer Vision: A specific resource for the Computer Vision service. Use this resource type if
you don't intend to use any other cognitive services, or if you want to track utilization and
costs for your Computer Vision resource separately.
Cognitive Services: A general cognitive services resource that includes Computer Vision along
with many other cognitive services; such as Text Analytics, Translator Text, and others. Use
this resource type if you plan to use multiple cognitive services and want to simplify
administration and development.
Whichever type of resource you choose to create, it will provide two pieces of information that you
will need to use it: a key that is used to authenticate client applications, and an endpoint that provides
the HTTP address at which your resource can be accessed.
After you've created a suitable resource in your subscription, you can submit images to the Computer
Vision service to perform a wide range of analytical tasks.
Describing an image
Computer Vision has the ability to analyze an image, evaluate the objects that are detected, and
generate a human-readable phrase or sentence that can describe what was detected in the image.
Depending on the image contents, the service may return multiple results, or phrases. Each returned
phrase will have an associated confidence score, indicating how confident the algorithm is in the
supplied description. The highest confidence phrases will be listed first.
To help you understand this concept, consider the following image of the Empire State building in
New York. The returned phrases are listed below the image in the order of confidence.
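For example, a client application could request a description using the Computer Vision Python SDK; the endpoint, key, and image URL below are placeholders.

```python
from azure.cognitiveservices.vision.computervision import ComputerVisionClient
from msrest.authentication import CognitiveServicesCredentials

client = ComputerVisionClient("https://<your-resource>.cognitiveservices.azure.com/",
                              CognitiveServicesCredentials("<your-key>"))

analysis = client.describe_image("https://example.com/empire-state.jpg")  # placeholder URL
for caption in analysis.captions:   # highest-confidence phrases first
    print(f"{caption.text} (confidence: {caption.confidence:.2f})")
```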
Tagging visual features
The image descriptions generated by Computer Vision are based on a set of thousands of recognizable
objects, which can be used to suggest tags for the image. These tags can be associated with the image
as metadata that summarizes attributes of the image; and can be particularly useful if you want to
index an image along with a set of key terms that might be used to search for images with specific
attributes or contents.
For example, the tags returned for the Empire State building image include:
skyscraper
tower
building
Detecting objects
The object detection capability is similar to tagging, in that the service can identify common objects;
but rather than tagging, or providing tags for the recognized objects only, this service can also return
what is known as bounding box coordinates. Not only will you get the type of object, but you will
also receive a set of coordinates that indicate the top, left, width, and height of the object detected,
which you can use to identify the location of the object in the image.
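Using the same Computer Vision client as in the earlier sketch, object detection returns each object's type together with its bounding box; the image URL is again a placeholder.

```python
detection = client.detect_objects("https://example.com/street-scene.jpg")  # placeholder URL
for obj in detection.objects:
    box = obj.rectangle   # x, y (top-left) plus width and height in pixels
    print(f"{obj.object_property}: x={box.x}, y={box.y}, w={box.w}, h={box.h}")
```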
Detecting brands
This feature provides the ability to identify commercial brands. The service has an existing database
of thousands of globally recognized logos from commercial brands of products.
When you call the service and pass it an image, it performs a detection task and determines if any of
the identified objects in the image are recognized brands. The service compares the brands against its
database of popular brands spanning clothing, consumer electronics, and many more categories. If a
known brand is detected, the service returns a response that contains the brand name, a confidence
score (from 0 to 1 indicating how positive the identification is), and a bounding box (coordinates) for
where in the image the detected brand was found.
For example, in the following image, a laptop has a Microsoft logo on its lid, which is identified and
located by the Computer Vision service.
Detecting faces
The Computer Vision service can detect and analyze human faces in an image, including the ability to
determine age and a bounding box rectangle for the location of the face(s). The facial analysis
capabilities of the Computer Vision service are a subset of those provided by the dedicated Face
Service. If you need basic face detection and analysis, combined with general image analysis
capabilities, you can use the Computer Vision service; but for more comprehensive facial analysis and
facial recognition functionality, use the Face service.
The following example shows an image of a person with their face detected and approximate age
estimated.
Categorizing an image
Computer Vision can categorize images based on their contents. The service uses a parent/child
hierarchy with a "current" limited set of categories. When analyzing an image, detected objects are
compared to the existing categories to determine the best way to provide the categorization. As an
example, one of the parent categories is people_. This image of a person on a roof is assigned a
category of people_.
A slightly different categorization is returned for the following image, which is assigned to the
category people_group because there are multiple people in the image:
When categorizing an image, the Computer Vision service supports two specialized domain models:
Celebrities - The service includes a model that has been trained to identify thousands of well-
known celebrities from the worlds of sports, entertainment, and business.
Landmarks - The service can identify famous landmarks, such as the Taj Mahal and the Statue
of Liberty.
For example, when analyzing the following image for landmarks, the Computer Vision service
identifies the Eiffel Tower, with a confidence of 99.41%.
The Computer Vision service can use optical character recognition (OCR) capabilities to detect
printed and handwritten text in images. This capability is explored in the Read text with the Computer
Vision service module on Microsoft Learn.
Additional capabilities
Detect image types - for example, identifying clip art images or line drawings.
Detect image color schemes - specifically, identifying the dominant foreground, background,
and overall colors in an image.
Generate thumbnails - creating small versions of images.
Moderate content - detecting images that contain adult content or depict violent, gory scenes.
Product identification: performing visual searches for specific products in online searches or
even in-store using a mobile device.
Disaster investigation: identifying key infrastructure for major disaster preparation efforts.
For example, identifying bridges and roads in aerial images can help disaster relief teams plan
ahead in regions that are not well mapped.
Medical diagnosis: evaluating images from X-ray or MRI devices could quickly classify specific
issues found as cancerous tumors, or many other medical conditions related to medical
imaging diagnosis.
Image classification is a machine learning technique in which the object being classified is an image,
such as a photograph.
To create an image classification model, you need data that consists of features and their labels. The
existing data is a set of categorized images. Digital images are made up of an array of pixel values,
and these are used as features to train the model based on the known image classes.
The model is trained to match the patterns in the pixel values to a set of class labels. After the model
has been trained, you can use it with new sets of features to predict unknown label values.
Most modern image classification solutions are based on deep learning techniques that make use
of convolutional neural networks (CNNs) to uncover patterns in the pixels that correspond to
particular classes. Training an effective CNN is a complex task that requires considerable expertise in
data science and machine learning.
Common techniques used to train image classification models have been encapsulated into
the Custom Vision cognitive service in Microsoft Azure; making it easy to train a model and publish it
as a software service with minimal knowledge of deep learning techniques. You can use the Custom
Vision cognitive service to train image classification models and deploy them as services for
applications to use.
You can perform image classification using the Custom Vision service, available as part of the Azure
Cognitive Services offerings. This is generally easier and quicker than writing your own model
training code, and enables people with little or no machine learning expertise to create an effective
image classification solution.
Creating an image classification solution with Custom Vision consists of two main tasks. First you
must use existing images to train the model, and then you must publish the model so that client
applications can use it to generate predictions.
For each of these tasks, you need a resource in your Azure subscription. You can use the following
types of resource:
Custom Vision: A dedicated resource for the Custom Vision service, which can be a training
resource, a prediction resource, or both.
Cognitive Services: A general cognitive services resource that includes Custom Vision along
with many other cognitive services. You can use this type of resource for training, prediction,
or both.
The separation of training and prediction resources is useful when you want to track resource
utilization for model training separately from client applications using the model to predict image
classes. However, it can make development of an image classification solution a little confusing.
The simplest approach is to use a general Cognitive Services resource for both training and prediction.
This means you only need to concern yourself with one endpoint (the HTTP address at which your
service is hosted) and key (a secret value used by client applications to authenticate themselves).
It's also possible to take a mix-and-match approach in which you use a dedicated Custom Vision
resource for training, but deploy your model to a Cognitive Services resource for prediction. For this
to work, the training and prediction resources must be created in the same region.
Model training
To train a classification model, you must upload images to your training resource and label them with
the appropriate class labels. Then, you must train the model and evaluate the training results.
You can perform these tasks in the Custom Vision portal, or if you have the necessary coding
experience you can use one of the Custom Vision service programming language-specific software
development kits (SDKs).
One of the key considerations when using images for classification is to ensure that you have
sufficient images of the objects in question, and that those images show the object from many
different angles.
Model evaluation
Model training is an iterative process in which the Custom Vision service repeatedly trains the
model using some of the data, but holds some back to evaluate the model. At the end of the training
process, the performance for the trained model is indicated by the following evaluation metrics:
Precision: What percentage of the class predictions made by the model were correct? For
example, if the model predicted that 10 images are oranges, of which eight were actually
oranges, then the precision is 0.8 (80%).
Recall: What percentage of the actual class instances did the model correctly identify? For example, if
there are 10 images of apples, and the model found 7 of them, then the recall is 0.7 (70%).
Average Precision (AP): An overall metric that takes into account both precision and recall.
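The arithmetic behind these two metrics is simple, as the short worked example below (reusing the numbers above) shows.

```python
# Precision: of the predictions made for a class, how many were correct?
predicted_oranges, correct_oranges = 10, 8
precision = correct_oranges / predicted_oranges   # 0.8 (80%)

# Recall: of the actual instances of a class, how many did the model find?
actual_apples, found_apples = 10, 7
recall = found_apples / actual_apples             # 0.7 (70%)

print(precision, recall)
```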
After you've trained the model, and you're satisfied with its evaluated performance, you can publish
the model to your prediction resource. When you publish the model, you can assign it a name (the
default is "IterationX", where X is the number of times you have trained the model).
To use your model, client application developers need the following information:
Project ID: The unique ID of the Custom Vision project you created to train the model.
Model name: The name you assigned to the model during publishing.
Prediction endpoint: The HTTP address of the endpoints for the prediction resource to which
you published the model (not the training resource).
Prediction key: The authentication key for the prediction resource to which you published the
model (not the training resource).
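Putting those four pieces of information together, a client application could request a prediction with the Custom Vision Python SDK along these lines; all identifiers and the image URL below are placeholders.

```python
from azure.cognitiveservices.vision.customvision.prediction import CustomVisionPredictionClient
from msrest.authentication import ApiKeyCredentials

credentials = ApiKeyCredentials(in_headers={"Prediction-key": "<prediction-key>"})
predictor = CustomVisionPredictionClient(
    "https://<your-prediction-resource>.cognitiveservices.azure.com/", credentials)

results = predictor.classify_image_url(
    project_id="<project-id>",
    published_name="Iteration1",                 # the model name assigned at publish time
    url="https://example.com/fruit.jpg")         # placeholder image URL

for prediction in results.predictions:
    print(f"{prediction.tag_name}: {prediction.probability:.2%}")
```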
Object Detection
What is object detection?
An object detection model identifies individual objects in an image and returns, for each detected
object, its class, a confidence score, and the bounding box coordinates of its location within the image.
You can create an object detection machine learning model by using advanced deep learning
techniques. However, this approach requires significant expertise and a large volume of training data.
The Custom Vision cognitive service in Azure enables you to create object detection models that
meet the needs of many computer vision scenarios with minimal deep learning expertise and fewer
training images.
Creating an object detection solution with Custom Vision consists of three main tasks. First you must
upload and tag images, then you can train the model, and finally you must publish the model so that
client applications can use it to generate predictions.
For each of these tasks, you need a resource in your Azure subscription. You can use the following
types of resource:
Custom Vision: A dedicated resource for the Custom Vision service, which can be a training
resource, a prediction resource, or both.
Cognitive Services: A general cognitive services resource that includes Custom Vision along
with many other cognitive services. You can use this type of resource for training, prediction,
or both.
The separation of training and prediction resources is useful when you want to track resource
utilization for model training separately from client applications using the model to detect objects.
However, it can make development of an object detection solution a little confusing.
The simplest approach is to use a general Cognitive Services resource for both training and prediction.
This means you only need to concern yourself with one endpoint (the HTTP address at which your
service is hosted) and key (a secret value used by client applications to authenticate themselves).
It's also possible to take a mix-and-match approach in which you use a dedicated Custom Vision
resource for training, but deploy your model to a Cognitive Services resource for prediction. For this
to work, the training and prediction resources must be created in the same region.
Image tagging
Before you can train an object detection model, you must tag the classes and bounding box
coordinates in a set of training images. This process can be time-consuming, but the Custom Vision
portal provides a graphical interface that makes it straightforward. The interface will automatically
suggest areas of the image where discrete objects are detected, and you can apply a class label to these
suggested bounding boxes or drag to adjust the bounding box area. Additionally, after tagging and
training with an initial dataset, the Custom Vision service can use smart tagging to suggest classes
and bounding boxes for images you add to the training dataset.
Key considerations when tagging training images for object detection are ensuring that you have
sufficient images of the objects in question, preferably from multiple angles; and making sure that the
bounding boxes are defined tightly around each object.
To train the model, you can use the Custom Vision portal, or if you have the necessary coding
experience you can use one of the Custom Vision service programming language-specific software
development kits (SDKs). Training an object detection model can take some time, depending on the
number of training images, classes, and objects within each image.
Model training is an iterative process in which the Custom Vision service repeatedly trains the
model using some of the data, but holds some back to evaluate the model. At the end of the training
process, the performance for the trained model is indicated by the following evaluation metrics:
Precision: What percentage of the class predictions made by the model were correct? For example,
if the model predicted that 10 images are oranges, of which eight were actually oranges, then
the precision is 0.8 (80%).
Recall: What percentage of the actual class instances did the model correctly identify? For
example, if there are 10 images of apples, and the model found 7 of them, then the recall is
0.7 (70%).
Mean Average Precision (mAP): An overall metric that takes into account both precision and
recall across all classes.
After you've trained the model, and you're satisfied with its evaluated performance, you can publish
the model to your prediction resource. When you publish the model, you can assign it a name (the
default is "IterationX", where X is the number of times you have trained the model).
To use your model, client application developers need the following information:
Project ID: The unique ID of the Custom Vision project you created to train the model.
Model name: The name you assigned to the model during publishing.
Prediction endpoint: The HTTP address of the endpoints for the prediction resource to which
you published the model (not the training resource).
Prediction key: The authentication key for the prediction resource to which you published the
model (not the training resource).
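Prediction works much like image classification, but returns bounding boxes as well. A sketch using the same Custom Vision prediction client as before, with placeholder identifiers and URL:

```python
results = predictor.detect_image_url(
    project_id="<project-id>",
    published_name="Iteration1",
    url="https://example.com/shelf.jpg")         # placeholder image URL

for prediction in results.predictions:
    box = prediction.bounding_box   # left, top, width, height as proportions of the image
    print(f"{prediction.tag_name} ({prediction.probability:.2%}): "
          f"left={box.left:.2f}, top={box.top:.2f}, w={box.width:.2f}, h={box.height:.2f}")
```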
Face Detection
Introduction
Face detection and analysis is an area of artificial intelligence (AI) in which we use algorithms to
locate and analyze human faces in images or video content.
Face detection
Face detection involves identifying regions of an image that contain a human face, typically by
returning bounding box coordinates that form a rectangle around the face, like this:
Facial analysis
Moving beyond simple face detection, some algorithms can also return other information, such as
facial landmarks (nose, eyes, eyebrows, lips, and others).
These facial landmarks can be used as features with which to train a machine learning model.
Facial recognition
A further application of facial analysis is to train a machine learning model to identify known
individuals from their facial features. This usage is more generally known as facial recognition, and
involves using multiple images of each person you want to recognize to train a model so that it can
detect those individuals in new images on which it wasn't trained.
There are many applications for face detection, analysis, and recognition. For example,
Security - facial recognition can be used in building security applications, and increasingly it is
used in smart phones operating systems for unlocking devices.
Social media - facial recognition can be used to automatically tag known friends in
photographs.
Intelligent monitoring - for example, an automobile might include a system that monitors the
driver's face to determine if the driver is looking at the road, looking at a mobile device, or
shows signs of tiredness.
Advertising - analyzing faces in an image can help direct advertisements to an appropriate
demographic audience.
Missing persons - using public camera systems, facial recognition can be used to identify if a
missing person is in the image frame.
Identity validation - useful at ports of entry kiosks where a person holds a special entry permit.
When used responsibly, facial recognition is an important and useful technology that can improve
efficiency, security, and customer experiences. Face is a building block for creating a facial
recognition system.
Microsoft Azure provides multiple cognitive services that you can use to detect and analyze faces,
including:
Computer Vision, which offers face detection and some basic face analysis, such as returning
the bounding box coordinates around a detected face.
Video Indexer, which you can use to detect and identify faces in a video.
Face, which offers pre-built algorithms that can detect, recognize, and analyze faces.
Face
Face can return the rectangle coordinates for any human faces that are found in an image, as well as a
series of attributes related to those faces such as:
Blur: how blurred the face is (which can be an indication of how likely the face is to be the
main focus of the image)
Exposure: aspects such as underexposed or overexposed; this applies to the face in the image
and not the overall image exposure
Glasses: if the person is wearing glasses
Head pose: the face's orientation in a 3D space
Noise: refers to visual noise in the image. If you have taken a photo with a high ISO setting for
darker settings, you would notice this noise in the image. The image looks grainy or full of tiny
dots that make the image less clear
Occlusion: determines if there may be objects blocking the face in the image
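As an illustration, the Face Python SDK can request some of these attributes when detecting faces. The endpoint, key, and image URL below are placeholders, and the set of attributes available may depend on the service version.

```python
from azure.cognitiveservices.vision.face import FaceClient
from msrest.authentication import CognitiveServicesCredentials

face_client = FaceClient("https://<your-face-resource>.cognitiveservices.azure.com/",
                         CognitiveServicesCredentials("<your-key>"))

faces = face_client.face.detect_with_url(
    url="https://example.com/people.jpg",                 # placeholder image URL
    return_face_attributes=["blur", "glasses", "headPose", "occlusion"])

for face in faces:
    rect = face.face_rectangle   # bounding box for the detected face
    print(rect.left, rect.top, rect.width, rect.height, face.face_attributes.glasses)
```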
To use Face, you must create one of the following types of resource in your Azure subscription:
Face: Use this specific resource type if you don't intend to use any other cognitive services, or
if you want to track utilization and costs for Face separately.
Cognitive Services: A general cognitive services resource that includes Computer Vision along
with many other cognitive services; such as Custom Vision, Form Recognizer, Language, and
others. Use this resource type if you plan to use multiple cognitive services and want to
simplify administration and development.
Whichever type of resource you choose to create, it will provide two pieces of information that you
will need to use it: a key that is used to authenticate client applications, and an endpoint that provides
the HTTP address at which your resource can be accessed.
The ability for computer systems to process written or printed text is an area of artificial intelligence
(AI) where computer vision intersects with natural language processing. You need computer vision
capabilities to "read" the text, and then you need natural language processing capabilities to make
sense of it.
The basic foundation of processing printed text is optical character recognition (OCR), in which a
model can be trained to recognize individual shapes as letters, numerals, punctuation, or other
elements of text. Much of the early work on implementing this kind of capability was performed by
postal services to support automatic sorting of mail based on postal codes. Since then, the state-of-the-
art for reading text has moved on, and it's now possible to build models that can detect printed or
handwritten text in an image and read it line-by-line or even word-by-word.
In this module, we'll focus on the use of OCR technologies to detect text in images and convert it into
a text-based data format, which can then be stored, printed, or used as the input for further processing
or analysis.
Uses of OCR
The ability to recognize printed and handwritten text in images, is beneficial in many scenarios such
as:
note taking
digitizing forms, such as medical records or historical documents
scanning printed or handwritten checks for bank deposits
Get started with the Read API on Azure
The ability to extract text from images is handled by the Computer Vision service, which also
provides image analysis capabilities.
The first step towards using the Computer Vision service is to create a resource for it in your Azure
subscription. You can use either of the following resource types:
Computer Vision: A specific resource for the Computer Vision service. Use this resource type if
you don't intend to use any other cognitive services, or if you want to track utilization and
costs for your Computer Vision resource separately.
Cognitive Services: A general cognitive services resource that includes Computer Vision along
with many other cognitive services; such as Text Analytics, Translator Text, and others. Use
this resource type if you plan to use multiple cognitive services and want to simplify
administration and development.
Whichever type of resource you choose to create, it will provide two pieces of information that you
will need to use it: a key that is used to authenticate client applications, and an endpoint that provides
the HTTP address at which your resource can be accessed.
If you create a Cognitive Services resource, client applications use the same key and endpoint
regardless of the specific service they are using.
Many times an image contains text. It can be typewritten text or handwritten. Some common
examples are images with road signs, scanned documents that are in an image format such as JPEG or
PNG file formats, or even just a picture taken of a white board that was used during a meeting.
The Computer Vision service provides an application programming interface (API) that you can use
to read text in images: the Read API.
The Read API uses the latest recognition models and is optimized for images that have a significant
amount of text or considerable visual noise.
The Read API can handle scanned documents that have a lot of text. It also has the ability to
automatically determine the proper recognition model to use, taking into consideration lines of text
and supporting images with printed text as well as recognizing handwriting.
Because the Read API can work with large documents, it works asynchronously so as not to block
your application while it is reading the content and returning results to your application. This means
that to use the Read API, your application must use a three-step process:
1. Submit an image to the API, and retrieve an operation ID in response.
2. Use the operation ID to check on the status of the image analysis operation, and wait until it
has completed.
3. Retrieve the results of the operation.
The results from the Read API are arranged into the following hierarchy:
Pages - One for each page of text, including information about the page size and orientation.
Lines - The lines of text on a page.
Words - The words in a line of text, including the bounding box coordinates and text itself.
Each line and word includes bounding box coordinates indicating its position on the page.
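The three-step pattern looks like the following sketch with the Computer Vision Python SDK; the endpoint, key, and document URL are placeholders.

```python
import time
from azure.cognitiveservices.vision.computervision import ComputerVisionClient
from azure.cognitiveservices.vision.computervision.models import OperationStatusCodes
from msrest.authentication import CognitiveServicesCredentials

client = ComputerVisionClient("https://<your-resource>.cognitiveservices.azure.com/",
                              CognitiveServicesCredentials("<your-key>"))

# 1. Submit the image and retrieve the operation ID from the Operation-Location header.
read_op = client.read("https://example.com/scanned-letter.jpg", raw=True)  # placeholder URL
operation_id = read_op.headers["Operation-Location"].split("/")[-1]

# 2. Poll until the asynchronous operation has completed.
result = client.get_read_result(operation_id)
while result.status in [OperationStatusCodes.running, OperationStatusCodes.not_started]:
    time.sleep(1)
    result = client.get_read_result(operation_id)

# 3. Retrieve the results, arranged as pages > lines > words.
if result.status == OperationStatusCodes.succeeded:
    for page in result.analyze_result.read_results:
        for line in page.lines:
            print(line.text)
```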
Introduction
Analyzing text is a process where you evaluate different aspects of a document or phrase, in order to
gain insights into the content of that text. For the most part, humans are able to read some text and
understand the meaning behind it. Even without considering grammar rules for the language the text
is written in, specific insights can be identified in the text.
As an example, you might read some text and identify some key phrases that indicate the main talking
points of the text. You might also recognize names of people or well-known landmarks such as the
Eiffel Tower. Although difficult at times, you might also be able to get a sense for how the person was
feeling when they wrote the text, also commonly known as sentiment.
Text analytics is a process where an artificial intelligence (AI) algorithm, running on a computer,
evaluates these same attributes in text, to determine specific insights. A person will typically rely on
their own experiences and knowledge to achieve the insights. A computer must be provided with
similar knowledge to be able to perform the task. There are some commonly used techniques that can
be used to build software to analyze text, including:
Statistical analysis of terms used in the text. For example, removing common "stop words"
(words like "the" or "a", which reveal little semantic information about the text), and
performing frequency analysis of the remaining words (counting how often each word
appears) can provide clues about the main subject of the text.
Extending frequency analysis to multi-term phrases, commonly known as N-grams (a two-
word phrase is a bi-gram, a three-word phrase is a tri-gram, and so on).
Applying stemming or lemmatization algorithms to normalize words before counting them -
for example, so that words like "power", "powered", and "powerful" are interpreted as being
the same word.
Applying linguistic structure rules to analyze sentences - for example, breaking down
sentences into tree-like structures such as a noun phrase, which itself
contains nouns, verbs, adjectives, and so on.
Encoding words or terms as numeric features that can be used to train a machine learning
model. For example, to classify a text document based on the terms it contains. This technique
is often used to perform sentiment analysis, in which a document is classified as positive or
negative.
Creating vectorized models that capture semantic relationships between words by assigning
them to locations in n-dimensional space. This modeling technique might, for example, assign
values to the words "flower" and "plant" that locate them close to one another, while
"skateboard" might be given a value that positions it much further away.
While these techniques can be used to great effect, programming them can be complex. In Microsoft
Azure, the Language cognitive service can help simplify application development by using pre-trained
models that can:
Detect the language of a document or text
Perform sentiment analysis on text to determine whether it is positive or negative
Extract key phrases from text that indicate its main talking points
Identify and categorize entities in text, such as people, places, organizations, dates, and quantities
The Language service is a part of the Azure Cognitive Services offerings that can perform advanced
natural language processing over raw text.
To use the Language service in an application, you must provision an appropriate resource in your
Azure subscription. You can choose to provision either of the following types of resource:
A Language resource - choose this resource type if you only plan to use natural language
processing services, or if you want to manage access and billing for the resource separately
from other services.
A Cognitive Services resource - choose this resource type if you plan to use the Language
service in combination with other cognitive services, and you want to manage access and
billing for these services together.
Language detection
Use the language detection capability of the Language service to identify the language in which text is
written. You can submit multiple documents at a time for analysis. For each document submitted, the
service will detect the language name, the ISO 639-1 language code (for example, "en"), and a score
indicating the confidence level of the language detection.
For example, consider a scenario where you own and operate a restaurant where customers can
complete surveys and provide feedback on the food, the service, staff, and so on. Suppose you have
received the following reviews from customers:
Review 1: "A fantastic place for lunch. The soup was delicious."
Review 3: "The croque monsieur avec frites was terrific. Bon appetit!"
You can use the text analytics capabilities in the Language service to detect the language for each of
these reviews; and it might respond with the following results:
Notice that the language detected for review 3 is English, despite the text containing a mix of English
and French. The language detection service will focus on the predominant language in the text. The
service uses an algorithm to determine the predominant language, such as length of phrases or total
amount of text for the language compared to other languages in the text. The predominant language
will be the value returned, along with the language code. The confidence score may be less than 1 as a
result of the mixed language text.
There may be text that is ambiguous in nature, or that has mixed language content. These situations
can present a challenge to the service. An ambiguous content example would be a case where the
document contains limited text, or only punctuation. For example, using the service to analyze the text
":-)", results in a value of unknown for the language name and the language identifier, and a score
of NaN (which is used to indicate not a number).
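In code, the azure-ai-textanalytics Python SDK exposes this capability; the endpoint and key are placeholders, and the documents are the review examples above.

```python
from azure.ai.textanalytics import TextAnalyticsClient
from azure.core.credentials import AzureKeyCredential

client = TextAnalyticsClient("https://<your-language-resource>.cognitiveservices.azure.com/",
                             AzureKeyCredential("<your-key>"))

reviews = ["A fantastic place for lunch. The soup was delicious.",
           "The croque monsieur avec frites was terrific. Bon appetit!"]

for doc in client.detect_language(reviews):
    print(doc.primary_language.name, doc.primary_language.iso6391_name,
          doc.primary_language.confidence_score)
```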
Sentiment analysis
The text analytics capabilities in the Language service can evaluate text and return sentiment scores
and labels for each sentence. This capability is useful for detecting positive and negative sentiment in
social media, customer reviews, discussion forums and more.
Using the pre-built machine learning classification model, the service evaluates the text and returns a
sentiment score in the range of 0 to 1, with values closer to 1 being a positive sentiment. Scores that
are close to the middle of the range (0.5) are considered neutral or indeterminate.
For example, the following two restaurant reviews could be analyzed for sentiment:
"We had dinner at this restaurant last night and the first thing I noticed was how courteous the staff
was. We were greeted in a friendly manner and taken to our table right away. The table was clean,
the chairs were comfortable, and the food was amazing."
and
"Our dining experience at this restaurant was one of the worst I've ever had. The service was slow,
and the food was awful. I'll never eat at this establishment again."
The sentiment score for the first review might be around 0.9, indicating a positive sentiment; while
the score for the second review might be closer to 0.1, indicating a negative sentiment.
Indeterminate sentiment
A score of 0.5 might indicate that the sentiment of the text is indeterminate, and could result from text
that does not have sufficient context to discern a sentiment or insufficient phrasing. For example, a list
of words in a sentence that has no structure, could result in an indeterminate score. Another example
where a score may be 0.5 is in the case where the wrong language code was used. A language code
(such as "en" for English, or "fr" for French) is used to inform the service which language the text is
in. If you pass text in French but tell the service the language code is en for English, the service will
return a score of precisely 0.5.
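Using the same Language client as in the earlier sketch, sentiment can be requested as follows; note that the current azure-ai-textanalytics SDK returns a sentiment label with per-class confidence scores rather than the single 0-1 score described above.

```python
reviews = ["The food was amazing and the staff were courteous.",
           "The service was slow, and the food was awful."]

for doc in client.analyze_sentiment(reviews):
    print(doc.sentiment, doc.confidence_scores.positive, doc.confidence_scores.negative)
```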
Key phrase extraction
Key phrase extraction is the process of evaluating the text of a document, or documents, and then
identifying the main talking points of the document(s). Consider the restaurant scenario discussed
previously. Depending on the volume of surveys that you have collected, it can take a long time to
read through the reviews. Instead, you can use the key phrase extraction capabilities of the Language
service to summarize the main points. For example, consider the following review:
"We had dinner here for a birthday celebration and had a fantastic experience. We were greeted by a
friendly hostess and taken to our table right away. The ambiance was relaxed, the food was amazing,
and service was terrific. If you like great food and attentive service, you should try this place."
Key phrase extraction can provide some context to this review by extracting the following phrases:
attentive service
great food
birthday celebration
fantastic experience
table
friendly hostess
dinner
ambiance
place
Not only can you use sentiment analysis to determine that this review is positive, you can use the key
phrases to identify important elements of the review.
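A minimal sketch of key phrase extraction with the same package is shown below; the endpoint, key, and review text are placeholders.

# Minimal key phrase extraction sketch (placeholder endpoint and key).
from azure.ai.textanalytics import TextAnalyticsClient
from azure.core.credentials import AzureKeyCredential

client = TextAnalyticsClient(
    endpoint="https://<your-language-resource>.cognitiveservices.azure.com/",
    credential=AzureKeyCredential("<your-key>"),
)

review = ("We had dinner here for a birthday celebration and had a fantastic "
          "experience. The ambiance was relaxed, the food was amazing, and "
          "service was terrific.")

result = client.extract_key_phrases(documents=[review])[0]
# Prints a list of phrases, for example ['birthday celebration', 'fantastic experience', ...]
print(result.key_phrases)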
Entity recognition
You can provide the Language service with unstructured text and it will return a list of entities in the
text that it recognizes. The service can also provide links to more information about that entity on the
web. An entity is essentially an item of a particular type or category and, in some cases, a subtype,
such as those shown in the following table.
Type SubType Example
Quantity Number "6" or "six"
Quantity Percentage "25%" or "fifty percent"
Quantity Ordinal "1st" or "first"
Quantity Age "90 day old" or "30 years old"
Quantity Currency "10.99"
Quantity Dimension "10 miles", "40 cm"
Quantity Temperature "45 degrees"
DateTime "6:30PM February 4, 2012"
DateTime Date "May 2nd, 2017" or "05/02/2017"
DateTime Time "8am" or "8:00"
DateTime DateRange "May 2nd to May 5th"
DateTime TimeRange "6pm to 7pm"
DateTime Duration "1 minute and 45 seconds"
DateTime Set "every Tuesday"
URL "https://www.bing.com"
Email "support@microsoft.com"
US-based Phone Number "(312) 555-0176"
IP Address "10.0.1.125"
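The following sketch shows entity recognition with the same package; the endpoint, key, and input text are placeholders. For entities with links to web information, the related recognize_linked_entities method can be used instead.

# Minimal entity recognition sketch (placeholder endpoint and key).
from azure.ai.textanalytics import TextAnalyticsClient
from azure.core.credentials import AzureKeyCredential

client = TextAnalyticsClient(
    endpoint="https://<your-language-resource>.cognitiveservices.azure.com/",
    credential=AzureKeyCredential("<your-key>"),
)

text = "I ate at the restaurant in Seattle last week for 10.99 at 6:30 PM."

result = client.recognize_entities(documents=[text])[0]
for entity in result.entities:
    # Each entity has a category and, in some cases, a subcategory.
    print(entity.text, entity.category, entity.subcategory, entity.confidence_score)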
Speech Recognition
Introduction
Increasingly, we expect artificial intelligence (AI) solutions to accept vocal commands and provide
spoken responses. Consider the growing number of home and auto systems that you can control by
speaking to them - issuing commands such as "turn off the lights", and soliciting verbal answers to
questions such as "will it rain today?"
To enable this kind of interaction, the AI system must support two capabilities: speech recognition
(the ability to detect and interpret spoken input) and speech synthesis (the ability to generate
spoken output).
Speech recognition
Speech recognition is concerned with taking the spoken word and converting it into data that can be
processed - often by transcribing it into a text representation. The spoken words can be in the form of
a recorded voice in an audio file, or live audio from a microphone. Speech patterns are analyzed in the
audio to determine recognizable patterns that are mapped to words. To accomplish this feat, the
software typically uses multiple types of models, including:
An acoustic model that converts the audio signal into phonemes (representations of specific
sounds).
A language model that maps phonemes to words, usually using a statistical algorithm that
predicts the most probable sequence of words based on the phonemes.
The recognized words are typically converted to text, which you can use for various purposes, such as
providing closed captions for recorded or live videos, creating a transcript of a phone call or
meeting, automated note dictation, or determining intended user input for further processing.
Speech synthesis
Speech synthesis is in many respects the reverse of speech recognition. It is concerned with vocalizing
data, usually by converting text to speech. A speech synthesis solution typically requires the
following information: the text to be spoken and the voice to be used to vocalize the speech.
To synthesize speech, the system typically tokenizes the text to break it down into individual words,
and assigns phonetic sounds to each word. It then breaks the phonetic transcription into prosodic units
(such as phrases, clauses, or sentences) to create phonemes that will be converted to audio format.
These phonemes are then synthesized as audio by applying a voice, which determines parameters such as
pitch and timbre, and generating an audio waveform that can be output to a speaker or written to a file.
You can use the output of speech synthesis for many purposes, including generating spoken responses to
user input, creating voice menus for telephone systems, reading email or text messages aloud in
hands-free scenarios, and broadcasting announcements in public locations.
Microsoft Azure offers both speech recognition and speech synthesis capabilities through
the Speech cognitive service, which includes the speech-to-text and text-to-speech application
programming interfaces (APIs) described below.
To use the Speech service in an application, you must create an appropriate resource in your Azure
subscription. You can choose to create either of the following types of resource:
A Speech resource - choose this resource type if you only plan to use the Speech service, or if
you want to manage access and billing for the resource separately from other services.
A Cognitive Services resource - choose this resource type if you plan to use the Speech service
in combination with other cognitive services, and you want to manage access and billing for
these services together.
The speech-to-text API
You can use the speech-to-text API to perform real-time or batch transcription of audio into a text
format. The audio source for transcription can be a real-time audio stream from a microphone or an
audio file.
The model used by the speech-to-text API is based on the Universal Language Model that was
trained by Microsoft. The data for the model is Microsoft-owned and deployed to Microsoft Azure.
The model is optimized for two scenarios, conversational and dictation. You can also create and train
your own custom models including acoustics, language, and pronunciation if the pre-built models
from Microsoft do not provide what you need.
Real-time transcription
Real-time speech-to-text allows you to transcribe text in audio streams. You can use real-time
transcription for presentations, demos, or any other scenario where a person is speaking.
In order for real-time transcription to work, your application will need to be listening for incoming
audio from a microphone, or other audio input source such as an audio file. Your application code
streams the audio to the service, which returns the transcribed text.
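As an illustration, the sketch below performs a single real-time transcription from the default microphone using the azure-cognitiveservices-speech package; the key and region are placeholders for your own Speech (or Cognitive Services) resource.

# Minimal real-time speech-to-text sketch (placeholder key and region).
import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(subscription="<your-key>", region="<your-region>")
# The recognizer uses the default microphone; an audio file could be supplied
# instead via speechsdk.audio.AudioConfig(filename="input.wav").
recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config)

print("Speak into your microphone...")
result = recognizer.recognize_once()

if result.reason == speechsdk.ResultReason.RecognizedSpeech:
    print("Transcription:", result.text)
else:
    print("No speech could be recognized:", result.reason)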
Batch transcription
Not all speech-to-text scenarios are real time. You may have audio recordings stored on a file share, a
remote server, or even on Azure storage. You can point to audio files with a shared access signature
(SAS) URI and asynchronously receive transcription results.
Batch transcription should be run in an asynchronous manner because the batch jobs are scheduled on
a best-effort basis. Normally a job will start executing within minutes of the request but there is no
estimate for when a job changes into the running state.
The text-to-speech API
The text-to-speech API enables you to convert text input to audible speech, which can either be
played directly through a computer speaker or written to an audio file.
When you use the text-to-speech API, you can specify the voice to be used to vocalize the text. This
capability offers you the flexibility to personalize your speech synthesis solution and give it a specific
character.
The service includes multiple pre-defined voices with support for multiple languages and regional
pronunciation, including standard voices as well as neural voices that leverage neural networks to
overcome common limitations in speech synthesis with regard to intonation, resulting in a more
natural sounding voice. You can also develop custom voices and use them with the text-to-speech API.
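A minimal text-to-speech sketch with the same package is shown below; the key, region, voice name, and text are placeholders (any supported voice can be assigned to speech_synthesis_voice_name).

# Minimal text-to-speech sketch (placeholder key, region, and voice).
import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(subscription="<your-key>", region="<your-region>")
speech_config.speech_synthesis_voice_name = "en-GB-RyanNeural"  # example neural voice

# By default the synthesized audio is played through the default speaker;
# an audio output configuration can direct it to a file instead.
synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)
result = synthesizer.speak_text_async("Thank you for calling. How can I help you today?").get()

if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
    print("Speech synthesized successfully.")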
Translation
As organizations and individuals increasingly need to collaborate with people in other cultures and
geographic locations, the removal of language barriers has become increasingly important.
One solution is to find bilingual, or even multilingual, people to translate between languages.
However, the scarcity of such skills and the number of possible language combinations can make this
approach difficult to scale. Increasingly, automated translation, sometimes known as machine
translation, is being employed to solve this problem.
Early attempts at machine translation applied literal translations. A literal translation is one where
each word is translated to the corresponding word in the target language. This approach presents some
issues. In some cases, there may not be an equivalent word in the target language; in others, a
literal translation can change the meaning of the phrase or fail to capture the context.
For example, the French phrase "éteindre la lumière" can be translated to English as "turn off the
light". However, in French you might also say "fermer la lumière" to mean the same thing. The French
verb fermer literally means "to close", so a literal translation based only on the words would render
the English as "close the light", which doesn't really make sense to the average English speaker. To
be useful, a translation service needs to take the semantic context into account and return the
English translation "turn off the light".
Artificial intelligence systems must be able to understand, not only the words, but also
the semantic context in which they are used. In this way, the service can return a more accurate
translation of the input phrase or phrases. The grammar rules, formal versus informal, and
colloquialisms all need to be considered.
Text translation can be used to translate documents from one language to another, translate email
communications that come from foreign governments, and even translate web pages on the Internet. You
will often see a Translate option for posts on social media sites, and the Bing search engine can
offer to translate entire web pages that are returned in search results.
Speech translation is used to translate between spoken languages, sometimes directly (speech-to-
speech translation) and sometimes by translating to an intermediary text format (speech-to-text
translation).
Microsoft Azure provides cognitive services that support translation. Specifically, you can
use the following services: the Translator service, which supports text-to-text translation, and the
Speech service, which enables speech-to-text and speech-to-speech translation.
There are dedicated Translator and Speech resource types for these services, which you can
use if you want to manage access and billing for each service individually.
Alternatively, you can create a Cognitive Services resource that provides access to both
services through a single Azure resource, consolidating billing and enabling applications to
access both services through a single endpoint and authentication key.
The Translator service supports text-to-text translation between more than 60 languages.
When using the service, you must specify the language you are translating from and the
language you are translating to using ISO 639-1 language codes, such as en for English, fr for
French, and zh for Chinese. Alternatively, you can specify cultural variants of languages by
extending the language code with the appropriate ISO 3166-1 culture code - for example, en-US for US
English, en-GB for British English, or fr-CA for Canadian French.
When using the Translator service, you can specify one from language with
multiple to languages, enabling you to simultaneously translate a source document into
multiple languages.
Optional Configurations
The Translator API offers some optional configuration to help you fine-tune the results that
are returned, including:
Profanity filtering. Without any configuration, the service will translate the input text
without filtering out profanity. Profanity levels are typically culture-specific, but you
can control profanity translation by either marking the translated text as profane or by
omitting it in the results.
Selective translation. You can tag content so that it isn't translated. For example, you
may want to tag code, a brand name, or a word/phrase that doesn't make sense when
localized.
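To make these ideas concrete, the sketch below sends a text-to-text translation request to the Translator REST API using the Python requests library, translating from one language into two target languages and marking profanity. The key, region, and text are placeholders.

# Minimal Translator REST call sketch (placeholder key and region).
import requests

endpoint = "https://api.cognitive.microsofttranslator.com/translate"
params = {
    "api-version": "3.0",
    "from": "en",                  # source language
    "to": ["fr", "de"],            # one or more target languages
    "profanityAction": "Marked",   # optional configuration discussed above
}
headers = {
    "Ocp-Apim-Subscription-Key": "<your-translator-key>",
    "Ocp-Apim-Subscription-Region": "<your-resource-region>",
    "Content-Type": "application/json",
}
body = [{"text": "Turn off the light, please."}]

response = requests.post(endpoint, params=params, headers=headers, json=body)
for translation in response.json()[0]["translations"]:
    print(translation["to"], ":", translation["text"])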
You can use the Speech Translation API to translate spoken audio from a streaming source,
such as a microphone or audio file, and return the translation as text or an audio stream. This
enables scenarios such as real-time closed captioning for a speech or simultaneous two-way
translation of a spoken conversation.
As with the Translator service, you can specify one source language and one or more target
languages to which the source should be translated. You can translate speech into over 60
languages.
The source language must be specified using the extended language and culture code format,
such as es-US for American Spanish. This requirement helps ensure that the source is
understood properly, allowing for localized pronunciation and linguistic idioms.
The target languages must be specified using a two-character language code, such as en for
English or de for German.
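The sketch below illustrates speech-to-text translation from the default microphone with the azure-cognitiveservices-speech package. The key and region are placeholders; note that the source language uses the extended language-culture format while the targets use short language codes, as described above.

# Minimal speech translation sketch (placeholder key and region).
import azure.cognitiveservices.speech as speechsdk

translation_config = speechsdk.translation.SpeechTranslationConfig(
    subscription="<your-key>", region="<your-region>")
translation_config.speech_recognition_language = "en-US"   # source (language-culture format)
translation_config.add_target_language("fr")               # target languages
translation_config.add_target_language("de")

recognizer = speechsdk.translation.TranslationRecognizer(
    translation_config=translation_config)

print("Speak into your microphone...")
result = recognizer.recognize_once()

if result.reason == speechsdk.ResultReason.TranslatedSpeech:
    print("Recognized:", result.text)
    for language, translated_text in result.translations.items():
        print(language, ":", translated_text)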
In today's connected world, people communicate with one another through a wide variety of channels,
including:
Voice calls
Messaging services
Online chat applications
Email
Social media platforms
Collaborative workplace tools
We've become so used to ubiquitous connectivity that we expect the organizations we deal
with to be easily contactable and immediately responsive through the channels we already
use. Additionally, we expect these organizations to engage with us individually, and to be able
to answer complex questions at a personal level.
Conversational AI
Many organizations publish support information and answers to frequently asked
questions (FAQs) that can be accessed through a web browser or dedicated app. However, the
complexity of the systems and services they offer means that answers to specific questions
are hard to find. Often, these organizations find their support personnel being overloaded
with requests for help through phone calls, email, text messages, social media, and other
channels.
Increasingly, organizations are turning to artificial intelligence (AI) solutions that make use
of AI agents, commonly known as bots, to provide a first line of automated support through
the full range of channels that we use to communicate. Bots are designed to interact with
users in a conversational manner, typically through a chat interface.
Conversations typically take the form of messages exchanged in turns; and one of the most
common kinds of conversational exchange is a question followed by an answer. This pattern
forms the basis for many user support bots, and can often be based on existing FAQ
documentation. To implement this kind of solution, you need:
A knowledge base of question and answer pairs - usually with some built-in natural
language processing model to enable questions that can be phrased in multiple ways
to be understood with the same semantic meaning.
A bot service that provides an interface to the knowledge base through one or more
channels.
You can create this kind of user support solution on Microsoft Azure using a combination of
two core services:
Language service. The Language service includes a custom question answering feature
that enables you to create a knowledge base of question and answer pairs that can be
queried using natural language input.
Note
The question answering capability in the Language service is a newer version of the
QnA Maker service - which is still available as a separate service.
Azure Bot service. This service provides a framework for developing, publishing, and
managing bots on Azure.
To create a knowledge base, you must first provision a Language service resource in your
Azure subscription.
Define questions and answers
After provisioning a Language service resource, you can use the Language Studio's custom
question answering feature to create a knowledge base that consists of question-and-answer
pairs. These questions and answers can be generated from an existing FAQ document or web page,
or entered and edited manually.
In many cases, a knowledge base is created using a combination of all of these techniques;
starting with a base dataset of questions and answers from an existing FAQ document and
extending the knowledge base with additional manual entries.
Questions in the knowledge base can be assigned alternative phrasing to help consolidate
questions with the same meaning. For example, you might include a question like "What are your
office hours?" and anticipate different ways it could be asked by adding alternative phrasings
such as "When do you open?" or "What times are you available?".
After creating a set of question-and-answer pairs, you must save it. This process analyzes
your literal questions and answers and applies a built-in natural language processing model to
match appropriate answers to questions, even when they are not phrased exactly as specified
in your question definitions. Then you can use the built-in test interface in the Language
Studio to test your knowledge base by submitting questions and reviewing the answers that
are returned.
When you're satisfied with your knowledge base, deploy it. Then you can use it over its
REST interface. To access the knowledge base, client applications require the knowledge base ID,
the knowledge base endpoint, and the knowledge base authorization key.
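As an illustration, a client application could query a deployed knowledge base with the azure-ai-language-questionanswering package, as sketched below. The endpoint, key, project name, deployment name, and question are placeholders for your own deployment.

# Minimal custom question answering query sketch (placeholder values).
from azure.ai.language.questionanswering import QuestionAnsweringClient
from azure.core.credentials import AzureKeyCredential

client = QuestionAnsweringClient(
    endpoint="https://<your-language-resource>.cognitiveservices.azure.com/",
    credential=AzureKeyCredential("<your-key>"),
)

response = client.get_answers(
    question="How can I cancel a reservation?",
    project_name="<your-knowledge-base-project>",
    deployment_name="production",   # the deployment created when you deployed the knowledge base
)

for answer in response.answers:
    print(answer.answer, "(confidence:", answer.confidence, ")")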
Create a bot for your knowledge base
You can create a custom bot by using the Microsoft Bot Framework SDK to write code that
controls conversation flow and integrates with your knowledge base. However, an easier
approach is to use the automatic bot creation functionality, which enables you to create a bot for
your deployed knowledge base and publish it as an Azure Bot Service application with just a
few clicks.
After creating your bot, you can manage it in the Azure portal, where you can extend the bot's
functionality by adding custom code, test the bot in an interactive test interface, and configure
logging, analytics, and integration with other services.
For simple updates, you can edit bot code directly in the Azure portal. However, for more
comprehensive customization, you can download the source code and edit it locally;
republishing the bot directly to Azure when you're ready.
Connect channels
When your bot is ready to be delivered to users, you can connect it to multiple channels;
making it possible for users to interact with it through web chat, email, Microsoft Teams, and
other common communication media.
Users can submit questions to the bot through any of its channels, and receive an appropriate
answer from the knowledge base on which the bot is based.
4. Knowledge Mining
Azure Cognitive Search is a cloud search service that provides knowledge mining solutions. Search
results contain only your data, which can include text inferred or extracted from images, or new
entities and key phrases detected through text analytics. It's a
Platform as a Service (PaaS) solution. Microsoft manages the infrastructure and availability,
allowing your organization to benefit without the need to purchase or manage dedicated
hardware resources.
Data from any source: Azure Cognitive Search accepts data from any source provided
in JSON format, with auto crawling support for selected data sources in Azure.
Full text search and analysis: Azure Cognitive Search offers full text search capabilities
supporting both simple query and full Lucene query syntax.
AI powered search: Azure Cognitive Search has Cognitive AI capabilities built in for
image and text analysis from raw content.
Multi-lingual: Azure Cognitive Search offers linguistic analysis for 56 languages to
intelligently handle phonetic matching or language-specific linguistics. Natural
language processors available in Azure Cognitive Search are also used by Bing and
Office.
Geo-enabled: Azure Cognitive Search supports geo-search filtering based on proximity
to a physical location.
Configurable user experience: Azure Cognitive Search has several features to improve
the user experience including autocomplete, autosuggest, pagination, and hit
highlighting.
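To show how the full text search capability is used from an application, the sketch below runs a simple query with the azure-search-documents package; the service endpoint, index name, key, and field names are placeholders that depend on your own index definition.

# Minimal Azure Cognitive Search query sketch (placeholder values).
from azure.search.documents import SearchClient
from azure.core.credentials import AzureKeyCredential

search_client = SearchClient(
    endpoint="https://<your-search-service>.search.windows.net",
    index_name="<your-index-name>",
    credential=AzureKeyCredential("<your-query-key>"),
)

# Fields produced by an enrichment skillset (for example key phrases or OCR text)
# are searchable like any other field in the index.
results = search_client.search(search_text="customer reviews")
for doc in results:
    print(doc["@search.score"], doc.get("metadata_storage_name"))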
AI processing is achieved by adding and combining skills in a skillset. A skillset defines the
operations that extract and enrich data to make it searchable. These AI skills can be either
built-in skills, such as text translation or Optical Character Recognition (OCR), or custom
skills that you provide.
Built in skills
Built-in skills are based on pre-trained models from Microsoft, which means you can't train
the model using your own training data. Skills that call the Cognitive Services APIs have a
dependency on those services and are billed at the Cognitive Services pay-as-you-go price
when you attach a resource. Other skills are metered by Azure Cognitive Search, or are utility
skills that are available at no charge.
Natural language processing skills: with these skills, unstructured text is mapped as
searchable and filterable fields in an index.
Key Phrase Extraction: uses a pre-trained model to detect important phrases based on
term placement, linguistic rules, proximity to other terms, and how unusual the term is
within the source data.
Text Translation Skill: uses a pre-trained model to translate the input text into various
languages for normalization or localization use cases.
Image processing skills: these skills create text representations of image content, making it
searchable using the query capabilities of Azure Cognitive Search.
Image Analysis Skill: uses an image detection algorithm to identify the content of an
image and generate a text description.
Optical Character Recognition Skill: allows you to extract printed or handwritten text
from images, such as photos of street signs and products, as well as from documents—
invoices, bills, financial reports, articles, and more.
Important Questions
2 Marks
What is AI?
What is ML?
What is computer vision?
What is NLP?
What is knowledge mining?
What are the two types of AI?
What is Anomaly detection?
What is Machine learning in Azure?
What is automated ML in azure?
What is Azure ML designer?
What are datasets?
What are pipelines?
What is cognitive search? And what are all the services available under cognitive search?
What is supervised ML?
What is unsupervised ML?
What is categorization?
What is classification?
What is regression?
What is resource?
What is workspace?
What are components?
What are machine learning jobs?
What is describing in computer vision?
What is tagging in computer vision?
What is detecting objects?
What is detecting brands?
What is a cognitive service?
Differentiate between computer vision and custom vision.
What are keys and endpoints?
What is precision in model evaluation?
What is recall in model evaluation?
What is object detection?
What is natural language processing?
What is speech recognition?
What is speech synthesis?
What is speech to text API?
What is text to speech API?
What is real time transcription?
What is batch transcription?
What is literal and semantic translation?
What is conversational AI?
5 Marks
What is AI? What are the applications of AI in business?
What are workloads? What are the key workloads in AI?
What are the steps in the wildflower identification ML model?
What are the components of computer vision?
What are the steps in natural language processing?
Explain the six components of responsible AI?
What are the challenges and issues with AI?
Explain the basis of mathematical model in ML with example?
Discuss the types of ML with examples?
What are the different steps / tasks involved in machine learning in Azure?
Discuss the steps involved in creating a resource in Azure Machine Learning
Discuss the steps involved in creating and loading the data set in Azure Machine Learning
studio.
Discuss the steps involved in creating the job configuration of classification training model
in Azure Machine Learning studio using Automated ML.
Discuss the steps involved in creating the job configuration of regression training model in
Azure Machine Learning studio using Automated ML.
Discuss the steps involved in creating the job configuration of regression training model in
Azure Machine Learning designer.
What are the uses of computer vision?
What are the different ways of analysing images?
What are the different ways of creating resources in Computer vision?
What are the uses of image classification?
What are the steps involved in image classification in custom vision?
What are the steps involved in object detection in custom vision?
What are the uses of object detection?
What are the uses of face detection and Analysis?
What are the attributes of face services?
What are the steps involved in reading text using OCR in computer vision?
What are the different statistical analyses that can be performed in text analysis?
What are the different aspects of text analysis in Azure?
Explain briefly about speech synthesis and speech recognition?
What are the steps involved in speech synthesis and speech recognition?
What are the different ways of creating resources in Natural language processing?
What are the steps involved in speech and text translation?
What are the steps involved in creating a conversational AI?
What are the steps involved in Knowledge mining?