Module 10 Applying Predictive Analytics - Elounge

The document is a workbook on applying predictive analytics, detailing the steps for machine learning model deployment, including training, validation, deployment, and monitoring. It also discusses the ethical considerations and risks associated with predictive models, emphasizing the importance of evaluating their impact in a social context. Additionally, it covers model testing methodologies, such as A/B testing, to ensure model performance post-deployment.


APPLYING PREDICTIVE

ANALYTICS
Module Workbook

Contents

Introduction
Module performance objectives
Lesson 1: Manage the predictive model deployment
    Machine learning model deployment
    Model testing
    Lesson Summary
    Notes
Lesson 2: Understanding risks
    Common ground considerations
    The ethical dimension
    Risks in predictive models
    Lesson Summary
    Notes
Lesson 3: Limits of prediction and ethical considerations
    Black Swans
    Good practices to ensure the ethical use of predictive analytics
    Lesson Summary
    Notes
References
Introduction

Module performance objectives

After completing the lessons, you should be able to:

• Describe the main steps and key arrangements for a successful machine learning
model deployment.
• Identify risks and apply good practices for the ethical use of predictive models.


Lesson 1: Manage the predictive model deployment

Machine learning model deployment

What is Machine Learning Model Deployment?

Definition
Model Deployment: Machine learning model deployment is the process of placing a
finished machine learning model into a live environment where it can be used for its
intended purpose.

From offline to online


As machine learning models are usually developed in an offline environment, they need to be deployed online to be used with live data. In a real-world situation, a single data project can produce many candidate models, and only the best-performing ones make it to the deployment stage. Both the development and deployment stages are resource intensive.

Steps of Machine Learning Model Deployment


Deploying a machine learning model requires integrating a combination of skills. Like model development, the deployment stage has four steps: training, validation, deployment, and monitoring.

1. Training
• Training environment setup
To begin, we must perform an analysis of the production environment, which entails
comprehending the model's forward and backward processes. This includes
identifying the platforms that provide the data, the resources required for the model
to operate effectively, and the applications that utilize the prediction results.

Once the analysis is complete, we can proceed to design an environment tailored to the specific needs of this model training, utilizing the tools and resources identified during the analysis phase.

• Training design
To train the model, we will first choose an appropriate algorithm, configure its parameters,
and use cleaned data to train it. The algorithm must be tailored to suit the deployment
environment, and its implementation will be managed by the deployment team.
Additionally, we will conduct frequent retraining to update the model and ensure it remains
relevant over time.
Once we have trained multiple models, we select the few that meet our desired objectives and advance them to the next stage; of the many models trained, only a select few are usually deemed suitable.
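The train-and-select flow described above can be sketched in plain Python. The candidate "models" below are deliberately trivial stand-ins (a mean predictor and a last-value predictor), and all data are made up for illustration; a real project would train proper algorithms on cleaned data.

```python
# Sketch: train several candidate models, then keep the one with the
# smallest validation error. The candidates are trivial stand-ins.

def train_mean(train_y):
    """Candidate 1: always predict the mean of the training targets."""
    mean = sum(train_y) / len(train_y)
    return lambda x: mean

def train_last(train_y):
    """Candidate 2: always predict the last observed training target."""
    last = train_y[-1]
    return lambda x: last

def mse(model, xs, ys):
    """Mean squared error of a model on a labelled dataset."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(ys)

train_y = [10.0, 12.0, 11.0, 13.0]                # cleaned training data
valid_x, valid_y = [0, 1, 2], [11.0, 12.0, 11.5]  # fresh validation data

candidates = {"mean": train_mean(train_y), "last": train_last(train_y)}
# Select the candidate with the smallest validation error.
best = min(candidates, key=lambda name: mse(candidates[name], valid_x, valid_y))
```

The same pattern scales up: train many candidates, score each on data it has not seen, and let only the best advance to validation.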
2. Validation
Having identified models that can produce favorable results during the training phase, our
next step is to validate their performance to ensure stability and reliability in producing
valuable outcomes.
During the validation process, we will retest the models using a new dataset and compare
the results against those obtained during the training phase. In addition, we will review the
methodology and data used during the training to ensure they align with the initial
requirements of the organization and users. By conducting this thorough analysis, we can
ensure that the selected models are suitable for deployment and will deliver the expected
outcomes.
What is validated in this step:

• Efficiency: the model continues to generate successful predictions.


• Ethics: The model meets the regulatory compliance and organizational governance.
Among the models that are validated, we will only deploy the most successful ones.

3. Deployment

Three sub-steps are required when we finally deploy the model to the live process:
1. Move the model into the deployment environment, where it can access the essential
resources and the live data.

2. Integrate the model into the organization's processes. Users will not run your code
directly; instead, they will use a website or application in which the model is
embedded.

3. End users start using the model by activating it, accessing the data, and interpreting
the output.

Common approaches to deploying the model


a. On-demand
Expose the model behind a REST API (application programming interface) that receives requests and passes the input to the model for prediction. This can provide inexpensive, near real-time predictions, with little concern about CPU (central processing unit) power when running on a cloud service. The model can easily be made available to other applications.
b. Batch
Collect input data in a storage file and write the predictions to a selected destination. This keeps the required resources to a minimum: it reduces dependencies on external data sources and cloud services, and sometimes local processing power is enough even for complex models.
c. Real-time
Package the model together with the application that uses its predictions. This approach generates real-time predictions while ensuring efficient network bandwidth, power consumption, and privacy. However, a new version of the application must be released whenever the model is updated.
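As an illustration of the on-demand approach (a), the sketch below exposes a stub model behind a minimal JSON-over-HTTP endpoint using only Python's standard library. The feature names, weights, and port are hypothetical placeholders, not part of the workbook; a production deployment would use a proper web framework and model server.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical stand-in for a trained model: a simple linear scorer.
WEIGHTS = {"income": 0.4, "tenure": 0.6}

def predict(features: dict) -> float:
    """Score one observation with the (stub) deployed model."""
    return sum(w * features.get(name, 0.0) for name, w in WEIGHTS.items())

class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON request body, score it, and return the prediction.
        length = int(self.headers.get("Content-Length", 0))
        features = json.loads(self.rfile.read(length))
        body = json.dumps({"prediction": predict(features)}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

# To serve on-demand predictions (blocking call):
#   HTTPServer(("localhost", 8000), PredictHandler).serve_forever()
```

Other applications can then POST feature payloads to the endpoint and receive near real-time predictions, which is what makes this approach easy to share across systems.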

4. Monitoring
After the model has been deployed, we will need to closely monitor its performance. This
will be part of the regular monitoring process of the organization, as not only the functions
of the model and effectiveness of the prediction will be monitored, but also the supporting
software and other resources.
The real-world setting is dynamic, and issues can arise at any time after deployment. The four problems below are the most common and deserve extra attention.
o Deployed data variance:
Live data is no longer cleaned. Unlike the training and testing data, real-time data carries many quality issues the model must be robust to; otherwise, these issues will degrade the model.
o Data integrity changes:
Data changes over time through manual interventions. The deployed data might arrive with new formats, new categories, or reordered indicators that no longer match the initial model.
o Data drift:
As the context evolves, the current scenario can become very different from the one in which we developed and trained the model. Data drift happens when the population demography changes, the market crashes, or conflicts emerge, making the model irrelevant to the present situation.
o Concept drift:
As the scenario shifts, end users' needs evolve, so the criteria for a successful output change with their expectations.
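One common way to automate the drift checks described above is the Population Stability Index (PSI), which compares a feature's distribution at training time with its live distribution. PSI is not named in the workbook; it is offered here as one illustrative metric, and the 0.2 alert threshold is a common rule of thumb, not a standard.

```python
import math

def psi(expected, actual, bins=10):
    """Population Stability Index: compares the distribution of a feature
    at training time (expected) with its live distribution (actual)."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a degenerate range

    def bucket_shares(values):
        counts = [0] * bins
        for v in values:
            i = min(int((v - lo) / width), bins - 1)
            i = max(i, 0)  # clamp live values that fall below the training range
            counts[i] += 1
        # A small floor avoids log(0) for empty buckets.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = bucket_shares(expected), bucket_shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

# Rule of thumb (an assumption, not from the workbook): PSI above roughly
# 0.2 signals significant drift and should trigger a retraining alert.
```

A scheduled job can compute PSI per feature on each batch of live data and raise an alert only when the threshold is crossed, which is exactly the kind of automation discussed below.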

The crucial part of monitoring is automating the process. Monitoring is routine work to evaluate model performance; it can easily be automated with industrial tools that track essential metrics, so you only need to act when alerts are raised on accuracy or precision.

Key concerns for model deployment


• Environment: Capacity demands change sharply when scaling up from a training
environment to real-world data.
• Infrastructure: Creating the right infrastructure and environment for the
deployment.
• Performance: In a real-world setting, it is much harder to monitor model accuracy
and efficiency.
• Interpretation: The model might produce predictions that are difficult to explain.
• Capacity: A potential skill and expertise gap between the scientists who developed
the model and the people who deploy it.

Model testing

In validation, we select the model that generates the smallest error among all the models obtained after training. However, over repeated rounds of testing, a model can start learning the noise within the dataset and fail to fit other datasets, which leads to overfitting.
We therefore usually carry out one more measurement to confirm the performance of the final model on a third, independent dataset; this is called model testing. It is conducted after we deploy the model to live data, during the monitoring step.

The motivation of Model Testing


A model’s performance in the post-deployment phase is often different from its
performance in the validation phase. Below are some of the common causes of this
discrepancy.
• The independent variables have developed new characteristics that the model does not capture
• The way we measure model performance may have changed since the beginning
• The model implementation has an unexpected impact on model performance
As part of the monitoring step, model testing addresses the following questions:
o Does the final model behave as expected?
o Is the final model indeed the best model, or does a different model perform better on
new, unseen data?
To answer these questions, we will introduce the most common frameworks to identify the
best-performing models among a competing set.

A/B Model Testing


The idea of A/B testing is straightforward. We test a new machine learning model (B) against the current model (A). By comparing their predictions, we decide whether the new model is an improvement.
We design a randomized experiment: randomly split the live data in two based on one indicator (e.g. location, device), and simultaneously apply Model A to the first batch and Model B to the second for a defined period of time.
The split does not always need to be 50-50; it depends on the deployment environment and the requirements of the application. For example, to safeguard the proper functioning of the application, we might direct 80% of incoming traffic to Model A and the remaining 20% to the challenger Model B. Below is an example of the testing.

When we deploy the model to operation, we often don't deploy just one. Instead, we simultaneously deploy and compare multiple models with similar training performances. Thus, most of the time we compare more than two models, e.g. Model A against Models B, C, D, …, N, and the A/B test becomes an A/B/C/D/.../N test.
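A common way to implement the traffic split described above is to hash a stable identifier, so each user is deterministically and consistently routed to the same model arm. This implementation detail is an assumption, not part of the workbook; the 20% challenger share mirrors the 80/20 example.

```python
import hashlib

def assign_model(user_id: str, challenger_share: float = 0.2) -> str:
    """Deterministically route a user to the current model (A) or the
    challenger (B); hashing keeps each user on the same arm across visits."""
    digest = hashlib.sha256(user_id.encode()).digest()
    u = int.from_bytes(digest[:8], "big") / 2**64  # uniform value in [0, 1)
    return "B" if u < challenger_share else "A"

# Roughly 80% of simulated traffic should land on Model A.
counts = {"A": 0, "B": 0}
for i in range(10_000):
    counts[assign_model(f"user-{i}")] += 1
```

Because the assignment depends only on the user ID, a user never flips between models mid-experiment, which keeps the comparison clean.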

A/B test steps

Glossary
Below are some key terms in experimentation. They will guide us in designing a proper model-testing experiment.
• Performance metric: Measures of the model quality, which completely depends on
the type of model and the implementation plan of the model.
• Test type: Options for statistical tests, e.g. One-tailed t-Test, Two-tailed t-Test, etc.
• Effect size: The difference between the two models’ performance metrics, often
normalized by standard deviation.

• Sample size: N, based on your choice of selected minimum effect size, significance
level, statistical power, and estimated sample variance.
• Statistical power (sensitivity): The probability of correctly rejecting the null
hypothesis, i.e., correctly identifying an effect when there is one. Often, the desired
power of an experiment is between 80-95%.
• Confidence level: The probability of correctly retaining the null hypothesis when no
difference in effects exists. Often, the confidence level is set to 95%.
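The glossary's sample-size relationship can be made concrete with the standard normal-approximation formula n = 2 * ((z_alpha + z_beta) / d)^2, where d is the effect size in standard-deviation units. The 5% significance, 80% power, and d = 0.1 used below are illustrative choices, not values from the workbook.

```python
import math
from statistics import NormalDist

def sample_size_per_arm(effect_size: float, alpha: float = 0.05,
                        power: float = 0.80) -> int:
    """Approximate sample size per model arm for a two-sided two-sample
    test, where effect_size is the difference in means divided by the
    standard deviation (normalized effect size)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value at significance alpha
    z_beta = z.inv_cdf(power)           # quantile for the desired power
    return math.ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)

# Detecting a small effect (0.1 SD) needs thousands of observations per arm.
n = sample_size_per_arm(0.1)
```

The formula makes the trade-off in the next paragraph concrete: halving the minimum detectable effect roughly quadruples the required sample size, and hence the cost and duration of the test.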

It is tricky to design a model test that balances the quality of the results against the costs of implementation. Higher accuracy requires more resources and a longer time to deliver. Sometimes even a slightly better model may require a huge, long-term improvement in computational infrastructure to support it. Is the improvement worth the investment?

Lesson Summary

IN SUMMARY:
Machine learning model deployment
• Machine learning models are developed in an offline environment; they need
to be deployed online to be used with live data.
• The model deployment stage has four steps: training, validation, deployment,
and monitoring.
Model testing
• Model testing is the post-deployment measurement to confirm the
performance of the final model with a third independent dataset.
• A/B testing is a simple and useful method to compare multiple models with
similar training performances.
• We need to conduct a cost-benefit analysis before moving to higher-accuracy
testing, which requires more resources and a longer time to deliver.

Notes:

Lesson 2: Understanding risks

Common ground considerations

Definition - Artificial intelligence


Artificial intelligence is a computer science discipline that focuses on creating intelligent programs, which try to mimic the conduct of the human brain.
For an intelligent system to be fully autonomous and adaptive, different aspects of
intelligent processing must be engaged and integrated, such as:
1. Reasoning focuses on how to extract conclusions from premises. Humans have
complex ways of reasoning, and some of our reasoning can be wrong or
misleading; however, we can find a common way of how the human brain works
when reasoning.
2. Knowledge representation is a complex area, where we try to understand how the
human brain stores information. How do we keep facts, ideas, and concepts in our
memory? How can we effectively use these?
3. Perception is about how humans relate to the environment and how changes in the
surroundings are acknowledged and stored as experience and knowledge. The
human brain shows excellent capabilities to understand the world around us, after
all, millions of years of evolution have allowed human beings to survive in hostile
environments.
4. Learning is related to perception and expertise. How do we learn? It is clear that
humans have superb learning capabilities, and we learn from experience. In the last
module, we explored how self-learning algorithms are programmed to refine their
own performance.

Machine learning is a subset of artificial intelligence with a specific goal: setting computers up to perform tasks without the need for explicit programming.
Deep learning is a sub-type of machine learning that uses vast volumes of data and complex algorithms to train a model.

Predictive analytics works with large amounts of complex data. To deal with this
complexity, we sometimes apply the self-learning algorithm methodologies developed in
Artificial Intelligence.
Artificial intelligence and intelligent systems present areas of ethical concern to society, such as privacy and transparency, bias and discrimination, and the role of human judgment. The last of these is considered the most difficult.

The ethical dimension

Intelligent systems and the social context


Ethics in intelligent systems is of utmost importance because those systems may operate in a social context and therefore have a significant impact on people. Each predictive model thus needs to be evaluated in its social context to identify potential ethical implications.

A system that predicts wind speed to improve the performance of a wind power plant has no ethical charge. It has risks, but no ethical impact, as the predictive system does not affect any person. The system can be accurate or inaccurate, but it cannot harm people. For this reason, we must consider the risks, but the ethical implications are limited here.
A system that predicts the credit risk score of an individual has risks (underperformance, costs, etc.). At the same time, it has ethical implications, as it can harm people (by denying loans) or discriminate (if a group of people is overweighted in the credit scoring).
For example, redlining is a discriminatory practice using predictive analytics. Services such as loans and healthcare are withheld from potential customers residing in neighbourhoods of racial and ethnic minorities and low-income residents, because the predictive model classified those neighbourhoods as "hazardous" to investment.

How can we define ethics in the predictive analytics context?


Ethics
Moral principles that govern a person's or a system's behaviour or the conducting of an activity.
Systems that have biases, incomplete data, or manage private data without consent have
a potential impact on Human Rights, such as the right to privacy, fairness of treatment,
and freedom from discrimination. Those elements become ethical concerns in predictive
applications.
We may think that by recording information that is not individualized, we are protecting
everyone. But in reality, predictive analytics inferences may violate privacy rights by using
the data provided anonymously to target and discriminate against specific groups.
When predictive algorithms try to predict individual behaviour, we are fully within the scope of ethical analysis.
Some impacts are not visible at first sight. Therefore, we need to put in place a system to
carefully review the ethical dimension of the models and work with experts.


Risks in predictive models

Benefits and risks of predictive analytics models


There are many benefits of using Predictive Analytics.
It offers the possibility to allocate resources in more effective and efficient ways, reveals
patterns and trends that are invisible to the human eye, performs more complex analyses
and responds to problems proactively rather than reactively.
However, there are also inherent risks deriving from, among other things, the technology applied, the data used, and the purpose of the model.

1. Risks related to Data Privacy


Data is an essential part of a predictive system. How to obtain the data (by polling, or by
accessing an existing dataset), the structure of the data, origin and content, are relevant
considerations and have an impact on the results, utility and sustainability of the model.
The challenge to data privacy is more complex with Predictive Analytics. By leveraging less sensitive and more readily available information (proxy data) left by millions of individuals, sensitive information about individuals and groups can be predicted without the data subjects' knowledge.
We can violate an individual's or group's predictive privacy if sensitive information about them is statistically estimated against their will or without their knowledge or consent.
In this regard, Rainer Muehlhoff has coined the concept of predictive privacy, as an ethical
principle that seeks to protect individuals and groups against unfair treatment and
infringements on their autonomy, dignity and well-being resulting from the use of predicted
information by predictive analytics systems.
Source: Muehlhoff, R. (2021). Predictive privacy: Towards an applied ethics of data analytics.

2. Risks related to biased algorithm


There are multiple decision points within the design and implementation of a Predictive Analytics model at which human prejudices and bias can affect accuracy and efficacy (Glaberson, 2019).
In fact, bias can be intentional or unintentional. Algorithms are expected to act without any
of the biases or prejudices that affect human decision-making. However, an algorithm can
unintentionally learn bias from a variety of different sources. For instance, the datasets
used to train the algorithm, or even the people who are developing the algorithm can be
sources of a biased algorithm.
There is a high risk of discrimination and unfair treatment in biased algorithms.
Applying Predictive Analytics

For example, in models that predict whether individuals should be detained or released on bail pending trial, a race- or gender-biased algorithm assessing an individual's likelihood of committing a future offence can produce longer detention periods for the discriminated groups.
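A simple audit for this kind of bias is to compare favourable-decision rates across groups, a metric known as demographic parity. The metric choice, group labels, and numbers below are illustrative assumptions, not from the workbook; real fairness audits combine several complementary metrics and expert review.

```python
def demographic_parity_gap(decisions):
    """Largest difference in favourable-decision rates between groups.
    decisions: (group, decision) pairs, where decision 1 is favourable
    (e.g. released on bail) and 0 is unfavourable."""
    totals, favourable = {}, {}
    for group, decision in decisions:
        totals[group] = totals.get(group, 0) + 1
        favourable[group] = favourable.get(group, 0) + decision
    rates = {g: favourable[g] / totals[g] for g in totals}
    return max(rates.values()) - min(rates.values())

# Hypothetical audit: group A is favoured 80% of the time, group B only 50%.
audit = ([("A", 1)] * 80 + [("A", 0)] * 20 +
         [("B", 1)] * 50 + [("B", 0)] * 50)
gap = demographic_parity_gap(audit)
```

A large gap does not prove discrimination on its own, but it flags the model for the kind of closer ethical review this lesson calls for.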

3. Risks related to biased datasets


Poor data quality is the main enemy of a predictive model. The negative implications of poor data are worse in predictive analytics than in other types of analytics, because data is used twice: first as the historical data used to train the predictive model, and a second time as the new data the model uses to make future predictions.
Thus, historical data must meet exceptionally broad and high-quality standards to properly
train a predictive model. We need to have the data 'right', meaning it is adequately
labelled, as well as the 'right' data, meaning it doesn't have bias and is representative of
our universe.

Lesson Summary

IN SUMMARY:
Common ground considerations
• Artificial intelligence and predictive analytics are intimately related and
sometimes used as synonyms. However, they are quite different.
• Artificial intelligence is the simulation of human intelligence processes by
machines, especially computer systems. Machine learning is a subset of
artificial intelligence. Many predictive models are based on machine learning
applications.
The ethical dimension
• Some ethical areas of concern include privacy and explainability risks,
biased algorithms, and unrepresentative data.
Risks in predictive models
• Risks in predictive models concern data privacy, biased algorithms,
and biased datasets.
• We can violate predictive privacy if sensitive information about an individual
is statistically estimated against their will or without their knowledge or
consent.
• Biased algorithms lead to discrimination and unfair treatment of minority
groups.
• Missing data, improper sample size, and misclassification or measurement
error are common reasons that lead to biased datasets.
Notes:

Lesson 3: Limits of prediction and ethical considerations

Black Swans

“Prediction is challenging, especially if it’s about the future.” (Niels Bohr)


In his 2007 book "The Black Swan", Nassim Taleb explains how an event qualifies as a black swan:

"First, a black swan event is an outlier, as it lies outside the realm of regular expectations
because nothing in the past can convincingly point to its possibility. Second, it carries an
extreme 'impact.' Third, despite its outlier status, human nature makes us concoct
explanations for its occurrence after the fact, making it explainable and predictable."

COVID-19 pandemic
Let's think for a moment about the definition of a black swan, and events that can be
considered black swan events. To what extent can we consider the COVID-19 pandemic a
black swan event?
This has been a controversial topic of discussion. Many researchers have argued that the
COVID-19 pandemic is not a black swan event as pandemics are unfortunately not new in
global history, and it was expected with a great level of certainty that a global pandemic
would eventually occur.

The graph below shows the minimum number of deaths from epidemics and their duration
from 1850 to date.


The Fukushima nuclear disaster.


A magnitude 9 earthquake caused the tsunami that struck Fukushima Daiichi (March 2011, 14 m wave). There had been two recorded events of similar magnitude in that area (Sanriku coast):
• 9th century: an earthquake of estimated magnitude 8.6
• 17th century: an earthquake of estimated magnitude 8.1, leading to a 20 m tsunami
These historical records seem not to have been considered when designing the Fukushima Daiichi nuclear power plant, whose design assumed a maximum wave height of 5.7 m. Had the available historical data been used, the plant could have been designed or sited to withstand the 2011 tsunami.
Applying Predictive Analytics

The largest crash in crypto history


In the first half of May 2022, the algorithmic Terra stablecoin experienced a dramatic
collapse. The drop in Terra stablecoin exacerbated weakness in investor sentiment,
causing Bitcoin to fall to its lowest level in over 14 months, resulting in the worst crypto
market crash to date.
This event was entirely unexpected, given the previous evolution and apparent health of the crypto market.


How are we supposed to cope with the unpredictable?


Research suggests avoiding attempting to predict the unpredictable. Black swans, properly
understood, are inevitable outliers.
It is also essential to remember that...
Black swans depend on circumstances and perspectives. By understanding our context and vulnerabilities, we may be better able to explore other options.
Going back to the black swan metaphor, Europeans were in no position to know about black swans until they encountered one. However, the people of Australia had been well aware of their existence all along!

Good practices to ensure the ethical use of predictive analytics

Purpose, infrastructure, data, and design practices


There are different trustworthy and ethical practices to guide the design and
implementation of predictive models. Here we propose a framework that focuses on four
main areas:


1. Purpose
Developing a vision and a plan for how the predictive model is going to be used will help
steer the direction of a predictive analytics effort.
Without such planning, predictive analytics may be used in a way that does more harm
than good for our target group, leaves out key stakeholders who should be included in the
planning process, and/or fails to identify how the success of this effort will be measured.
Consider the following factors when developing the plan:
a. The purposes of the Predictive Analytics model
The plan should include the questions the project expects to answer and the goals to
achieve. It should also explore the potential pitfalls of using existing data or collecting new
ones for intended purposes. The project should ensure that data will not be used for
discriminatory purposes.
b. The unintended consequences of Predictive Analytics
The plan should also include an analysis of any possible unintended consequences and
steps to mitigate them. For example, using Predictive Analytics could lead to removing
human judgment from decision-making. This may result in decisions typically requiring
holistic review becoming partly or wholly automated by data alone. The project should also
establish protocols to avoid underestimations, biased algorithms, or biased data.
c. The monitoring protocols for outcome measurement
The plan should lay out the measurable outcomes that the model is expecting to achieve
as a result of using Predictive Analytics. Strong monitoring protocols in place ensure
healthy data sources, identify valuable features, and check outcome quality. Otherwise,
poor input and output will lead our work in the wrong direction.
d. Costs and sustainability aspects
Predictive Analytics models can be expensive and may carry 'hidden' costs, for instance feasibility studies. Costs are also associated with training and retraining the model, since adjustments must be made periodically. It is imperative to plan and budget for implementation and monitoring well in advance, and to ensure the model is sustainable.
2. Infrastructure
A supportive infrastructure ensures that the benefits of Predictive Analytics are understood and welcomed by key stakeholders, and that processes and other supports are put in place to assist the data effort.
Consider the following enabling factors:
a. Strengthening communication practices

A data project is a complex cross-functional task. Communication matters for everyone who might be involved in, or benefit from, the project's results. Without a clear articulation of how predictive analytics can add value, well-devised plans may fail to receive the support they need to succeed.
b. Creating supportive culture and climate
With new tools often come new processes, reporting structures, people, and partners who
bring new skills. Therefore, it is vital to assess institutional capacity to use Predictive
Analytics. The appropriate technology, data infrastructure, talent, services, financial
resources, and data analysis skills are essential.
c. Involving robust change management processes
A clear change management mechanism will hugely benefit the organization by mitigating
potential risks from the changes. Transparent communication between data engineers, data
scientists, and software developers will save everyone effort and time in the end.
3. Data
Data is vital in predictive models. To build and use these predictive tools ethically, consider
the following aspects related to data.
a. Access accurate data
The best possible data generate the best possible predictions. It is hard to have enough
historical data to get a full picture, so for full context and perspective we will probably
need to incorporate external data sources. An advantage of the UN system is that vast
amounts of data exist across its organizations, which could be leveraged under the data
protocol.
b. Holistic data pipeline ensures complete data
Our datasets should be comprehensive and representative of our universe. What will
happen when you move from the small sample of the development stage to the large live
datasets of the production stage? This scale-up process and the data pipeline need to be
carefully designed from the beginning.
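One practical way to keep the pipeline holistic is to apply the same validation step to the small development sample and to the live production feed, so completeness problems surface before they reach the model. The sketch below is illustrative only; the field names are hypothetical assumptions:

```python
# Illustrative completeness check, applied identically in development and
# production. REQUIRED_FIELDS and the record layout are hypothetical.
REQUIRED_FIELDS = {"country", "year", "indicator_value"}

def validate_records(records):
    """Split a batch of dict records into (valid_records, error_messages)."""
    valid, errors = [], []
    for i, record in enumerate(records):
        missing = REQUIRED_FIELDS - record.keys()
        if missing:
            errors.append(f"record {i}: missing {sorted(missing)}")
        elif record["indicator_value"] is None:
            errors.append(f"record {i}: null indicator_value")
        else:
            valid.append(record)
    return valid, errors

batch = [
    {"country": "KEN", "year": 2021, "indicator_value": 4.2},
    {"country": "KEN", "year": 2022},                        # incomplete
    {"country": "UGA", "year": 2022, "indicator_value": None},
]
valid, errors = validate_records(batch)
print(len(valid), errors)  # 1 valid record, 2 errors
```

Because the same check runs at every stage, a dataset that looks complete in development cannot silently degrade when the pipeline scales up to live data.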
c. Understand the limits of prediction
Even with good quality data, understanding the limits of prediction is essential to set
realistic expectations about our predictive model. Predictive models are not certainties and
should not be treated as such.
d. Ensure data is accurately interpreted
Include team members who are knowledgeable about data and can accurately interpret
predictive models derived from this information. Those analyzing the data must consider
context and historical background.
e. Guarantee data privacy
Communicate with data stakeholders, including how consent is obtained and how long the
information will be stored. Make them aware that their data will be used for Predictive
Analytics. Be vigilant that data are well protected so that the information does not get into
the hands of those who intend to misuse it. It is especially important to protect the data
privacy of vulnerable groups.
f. Monitor data security
Security threats occur without notice. Therefore, monitoring threats and risks should be a
regular undertaking. Data security requires security protocols that adhere to individual
privacy laws and meet industry best practices. To keep institutional data secure, involve
your entity's information technology (IT) department.
4. Model design practices
Predictive models can help determine interventions and policymaking. Therefore, they
must be, at the very least, created to reduce rather than amplify bias and be tested for
their accuracy. It is key to ensure that models and algorithms are created in collaboration
with vendors who can commit to designing them in a way that does not intentionally codify
bias and that they can be tested for veracity.
In this regard, think about the following aspects of model design:
a. Design models to produce desirable outcomes
It is crucial to address bias in predictive models, ensure the statistical significance of
predictions beyond race, ethnicity, or socioeconomic status, and avoid at all costs using
algorithms that produce discriminatory results. An algorithm should never be designed to
pigeonhole any group. As such, being involved in the design and knowing how predictive
models are created are important to ensure desirable outcomes as determined by the
vision and plan. Failing to take this approach may lead to inadvertent discrimination.
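A first, partial way to surface such problems is to compare the model's positive-outcome rate across groups (a "demographic parity" style comparison). The sketch below uses hypothetical group labels and predictions; a real review also needs domain expertise and context, not just a metric:

```python
# Minimal, illustrative bias check: compare positive-prediction rates across
# groups. Group names and predictions are hypothetical assumptions.
def positive_rates(predictions):
    """predictions: list of (group, predicted_label) pairs, labels 0/1."""
    totals, positives = {}, {}
    for group, label in predictions:
        totals[group] = totals.get(group, 0) + 1
        positives[group] = positives.get(group, 0) + label
    return {g: positives[g] / totals[g] for g in totals}

preds = [("A", 1), ("A", 1), ("A", 0), ("A", 1),
         ("B", 0), ("B", 0), ("B", 1), ("B", 0)]
rates = positive_rates(preds)
gap = max(rates.values()) - min(rates.values())
print(rates, f"gap={gap:.2f}")  # {'A': 0.75, 'B': 0.25} gap=0.50
```

A large gap between groups does not prove discrimination by itself, but it is exactly the kind of signal that should trigger the holistic human review described above.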
b. Test and be transparent about predictive models
Before predictive models can be used, test them for accuracy, perhaps by an external
evaluator. Unfortunately, mistakes can be made every time we use information. Thus,
predictive tools should not be used without examining their potentially adverse effects.
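In practice, such a test means evaluating the model on a held-out dataset it never saw during training, against an acceptance threshold agreed in advance. The sketch below is illustrative: the stand-in model, data, and 0.8 threshold are all assumptions, not part of this module:

```python
# Sketch of a pre-deployment accuracy test on a held-out dataset.
# The rule-based "model", the data, and the threshold are hypothetical.
def accuracy(model, holdout):
    """holdout: list of (features, true_label) pairs."""
    correct = sum(1 for features, label in holdout if model(features) == label)
    return correct / len(holdout)

# Hypothetical stand-in for a trained model.
model = lambda features: 1 if features["score"] >= 0.5 else 0

holdout = [({"score": 0.9}, 1), ({"score": 0.2}, 0),
           ({"score": 0.7}, 1), ({"score": 0.6}, 0), ({"score": 0.1}, 0)]
acc = accuracy(model, holdout)
print(f"holdout accuracy = {acc:.2f}")  # 4 of 5 correct -> 0.80
assert acc >= 0.8, "model fails the acceptance threshold"
```

Running such a test through an external evaluator, on data the vendor did not choose, strengthens the independence of the result.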
c. Be aware of the risks of biased algorithms
No matter how carefully interventions are designed, biased algorithms always carry risks.
As we have discussed in the previous lesson, implicit bias may be heightened with
predictive systems because analytics may serve to "confirm" bias or make implicit bias
even more invisible. Data can empower individuals to assume they no longer have to
address their implicit biases because they are being led by numbers rather than by their
beliefs, acknowledged and unacknowledged.
d. Start small and evolve
You may also want to limit the variables used in predictive models to those that can be
easily explained and work to ensure algorithms can be understood by those who will be
impacted by them.
e. Choose vendors wisely
To ensure models and algorithms are sound, transparent, and free from bias, it is essential
to work with expert vendors who have deep knowledge of how predictive models and
algorithms are built, even if you plan to outsource the model development.
Many vendors are transparent about their models and algorithms and allow institutions to
work collaboratively. Not all vendors, however, take this approach: many consider their
models and algorithms proprietary, meaning institutions have little involvement in the
design process. Make transparency a key criterion when choosing to work with any vendor.
As we can see, there is not a single aspect to consider but many!
The powerful applications of predictive models should not overshadow the need to make
sure predictive tools are deployed purposefully and securely; have the proper support and
infrastructure to be sustainable; are built with quality data; do not further entrench
inequitable structures; and can be tested for and produce evidence of effectiveness.
Predictive analytics are already changing how multilateral institutions such as the UN
work. Addressing and ensuring their ethical use will become more and more critical.
Lesson Summary

IN SUMMARY:
Black Swan
• Some events are unpredictable, but unfortunately, they happen. The black
swan is a metaphor often used in predictive analytics to describe events that
we can’t predict with the information available to date.
Good practices to ensure the ethical use of predictive analytics
• Aspects to consider to ensure the ethical and trustworthy use of predictive
analytics should include: A purposeful model; Right support and
infrastructure to guarantee sustainability and use; Data security, privacy, and
quality protocols; People trained in the use and interpretation of data;
Transparency procedures and consistent ethical design practices.

• Decisions often require holistic review and sound interpretation of data that
cannot be left fully automated without human judgment.

Notes:
