Module 10: Applying Predictive Analytics
Module Workbook
Applying Predictive Analytics
Contents

Introduction
Module performance objectives
Lesson 1: Manage the predictive model deployment
    Machine learning model deployment
    Model testing
    Lesson Summary
Lesson 2: Understanding risks
    Common ground considerations
    The ethical dimension
    Risks in predictive models
    Lesson Summary
Lesson 3: Limits of prediction and ethical considerations
    Black Swans
    Good practices to ensure the ethical use of predictive analytics
    Lesson Summary
References
Module performance objectives
After completing the lessons, you should be able to:
• Describe the main steps and key arrangements for a successful machine learning
model deployment.
• Identify risks and apply good practices for the ethical use of predictive models.
Lesson 1: Manage the predictive model deployment

Machine learning model deployment
Definition
Model Deployment: Machine learning model deployment is the process of placing a
finished machine learning model into a live environment where it can be used for its
intended purpose.
1. Training
• Training environment setup
To begin, we must analyze the production environment, which entails understanding the processes upstream and downstream of the model: the platforms that provide the data, the resources required for the model to operate effectively, and the applications that consume the prediction results.
• Training design
To train the model, we will first choose an appropriate algorithm, configure its parameters,
and use cleaned data to train it. The algorithm must be tailored to suit the deployment
environment, and its implementation will be managed by the deployment team.
Additionally, we will conduct frequent retraining to update the model and ensure it remains
relevant over time.
We typically train several models, but only the few that meet our desired objectives are deemed suitable to advance to the next stage.
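To make this concrete, here is a minimal sketch of the candidate-training step. It assumes scikit-learn and substitutes a synthetic dataset for real, cleaned data; the candidate algorithms and the 0.80 accuracy objective are illustrative choices only.

```python
# A minimal sketch of training several candidate models on cleaned data.
# Synthetic data stands in for a real dataset; thresholds are illustrative.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.25,
                                                  random_state=0)

candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}

# Score each candidate with 5-fold cross-validation on the training data,
# then keep only the models that meet the minimum-accuracy objective.
scores = {name: cross_val_score(model, X_train, y_train, cv=5).mean()
          for name, model in candidates.items()}
selected = {name: candidates[name] for name, s in scores.items() if s >= 0.80}
print(scores, "->", list(selected))
```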
2. Validation
Having identified models that can produce favorable results during the training phase, our
next step is to validate their performance to ensure stability and reliability in producing
valuable outcomes.
During the validation process, we will retest the models using a new dataset and compare
the results against those obtained during the training phase. In addition, we will review the
methodology and data used during the training to ensure they align with the initial
requirements of the organization and users. By conducting this thorough analysis, we can
ensure that the selected models are suitable for deployment and will deliver the expected
outcomes.
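Continuing the sketch above (reusing `selected`, `scores`, and the held-out split), validation retests each surviving candidate on data it has never seen and compares the result with its training-phase score:

```python
# Retest the selected candidates on a fresh dataset and compare the
# results with the cross-validated scores from the training phase.
from sklearn.metrics import accuracy_score

for name, model in selected.items():
    model.fit(X_train, y_train)                       # final fit
    val_acc = accuracy_score(y_val, model.predict(X_val))
    # A large gap between training and validation scores suggests the
    # model is unstable or overfitting and should not advance.
    print(f"{name}: train={scores[name]:.3f}  validation={val_acc:.3f}")
```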
3. Deployment
Three sub-steps are required when we finally deploy the model to the live process:
1. Move the model into the deployed environment, so it will access the essential
resources and the live data.
2. Integrate the model into the organization's processes. Users will not run your
code directly; instead, they will use a website or application in which the model is
embedded.
3. End users start using the model by activating it, accessing the data, and interpreting
the output.
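To illustrate sub-step 2, one common pattern (by no means the only one) is to wrap the trained model in a small web service that the organization's website or software calls. This sketch assumes FastAPI and joblib; "model.joblib" and the endpoint path are placeholder names.

```python
# A minimal sketch of embedding a trained model behind an HTTP endpoint.
# Assumes FastAPI and joblib; "model.joblib" is a placeholder file name.
import joblib
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
model = joblib.load("model.joblib")   # the model selected during validation

class Features(BaseModel):
    values: list[float]               # one row of input features

@app.post("/predict")
def predict(features: Features):
    # End users never run the model code directly; the website or software
    # they use sends requests here and interprets the returned prediction.
    prediction = model.predict([features.values])[0]
    return {"prediction": int(prediction)}
```

Such a service is typically run with an ASGI server such as uvicorn, with the organization's application calling the /predict endpoint.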
4. Monitoring
After the model has been deployed, we will need to closely monitor its performance. This
will be part of the regular monitoring process of the organization, as not only the functions
of the model and effectiveness of the prediction will be monitored, but also the supporting
software and other resources.
The real-world setting is dynamic, and issues can arise at any time after deployment. The four problems below are the most common and deserve extra attention.
o Deployed data variance:
Unlike the training and testing data, live data is no longer cleaned. Real-time data carries issues (missing values, outliers, malformed records) that the model must handle; otherwise, these issues will distort the model's behavior.
o Data integrity changes:
Data changes over time through human intervention. The deployed data may arrive with new formats, new categories, or reordered indicators that no longer match what the initial model expects.
o Data drift:
As the context evolves, the current scenario can become entirely different from the one in which we developed and trained the model. Data drift happens when population demographics change, markets crash, or conflicts emerge, making the model irrelevant to the present situation.
o Concept drift:
As the scenario shifts, end users' needs evolve, and the criteria for a successful output change with their expectations.
The crucial part of monitoring is automating the process. Monitoring is routine work for evaluating model performance, and it can easily be automated with standard industrial tools that track essential metrics; you only need to take action when alerts are raised on accuracy or precision.
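As a small illustration of automated monitoring, the sketch below checks one input feature for data drift by comparing its live distribution against the training distribution with a two-sample Kolmogorov-Smirnov test. The data and alert threshold are synthetic and illustrative; production setups usually rely on dedicated monitoring tools.

```python
# A minimal sketch of an automated data-drift check using a two-sample
# Kolmogorov-Smirnov test (synthetic data; threshold illustrative).
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
train_feature = rng.normal(loc=0.0, scale=1.0, size=5000)  # training data
live_feature = rng.normal(loc=0.4, scale=1.0, size=1000)   # drifted live data

statistic, p_value = ks_2samp(train_feature, live_feature)
if p_value < 0.01:
    print(f"ALERT: possible data drift (KS={statistic:.3f}, p={p_value:.2g})")
```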
Model testing
During validation, we select the model that produces the smallest error among all the models obtained from training. However, over repeated rounds of testing, a model can start to learn the noise in the dataset and fail to fit other datasets; this is overfitting.
We therefore usually carry out one more measurement to confirm the performance of the final model on a third, independent dataset; this is called model testing. It is conducted after the model has been deployed to live data, during the monitoring step.
When we move a model into operation, we often do not deploy just one; instead, we simultaneously deploy and compare multiple models with similar training performance. Most of the time, more than two models are compared: Model A is compared to Models B, C, D, ..., N, and the A/B test becomes an A/B/C/D/.../N test.
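As an illustration, the sketch below compares challengers B and C against model A with two-tailed t-tests on per-batch accuracy, applying a Bonferroni correction for the multiple comparisons; the accuracy samples are synthetic placeholders.

```python
# A minimal sketch of an A/B/.../N comparison: pairwise two-tailed t-tests
# of each challenger against model A, Bonferroni-corrected (synthetic data).
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(1)
results = {                        # per-batch accuracy observed in production
    "A": rng.normal(0.80, 0.02, 50),
    "B": rng.normal(0.81, 0.02, 50),
    "C": rng.normal(0.79, 0.02, 50),
}

alpha = 0.05 / (len(results) - 1)  # Bonferroni-adjusted significance level
for name in ("B", "C"):
    t_stat, p = ttest_ind(results["A"], results[name])
    verdict = "significant" if p < alpha else "not significant"
    print(f"A vs {name}: t={t_stat:.2f}, p={p:.3f} ({verdict})")
```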
Glossary
Below are some key terms in experimentation. They will guide us in designing a proper model-testing experiment.
• Performance metric: A measure of model quality, which depends on the type of
model and its implementation plan.
• Test type: Options for statistical tests, e.g. One-tailed t-Test, Two-tailed t-Test, etc.
• Effect size: The difference between the two models’ performance metrics, often
normalized by standard deviation.
• Sample size: N, based on your chosen minimum effect size, significance
level, statistical power, and estimated sample variance.
• Statistical power (sensitivity): The probability of correctly rejecting the null
hypothesis, i.e., correctly identifying an effect when there is one. The desired
power of an experiment is often between 80% and 95%.
• Confidence level: The probability of correctly retaining the null hypothesis when
no difference in effects exists. The confidence level is often set to 95%.
It is tricky to design a balanced model test when weighing the quality of the results against the costs of implementation. A higher accuracy level requires more resources and a longer time to deliver. Sometimes even a slightly better model may require a major improvement in the computational infrastructure to support it in the long term. Is this improvement worth the investment?
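This trade-off can be quantified before the experiment is run. The sketch below uses a standard power analysis (here via statsmodels) to estimate how many observations are needed per model at a 5% significance level and 80% power; detecting smaller effects requires disproportionately larger, and therefore costlier, samples.

```python
# A minimal sketch of a power analysis: samples needed per model to detect
# a given standardized effect size (assumes statsmodels; values illustrative).
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
for effect_size in (0.5, 0.2, 0.1):   # standardized difference in metrics
    n = analysis.solve_power(effect_size=effect_size, alpha=0.05, power=0.8)
    print(f"effect size {effect_size}: ~{n:.0f} samples per model")
```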
Lesson Summary
IN SUMMARY:
Machine learning model deployment
• Machine learning models are developed in an offline environment; they need
to be deployed online to be used with live data.
• The model deployment stage has four steps: training, validation, deployment,
and monitoring.
Model testing
• Model testing is the post-deployment measurement to confirm the
performance of the final model with a third independent dataset.
• A/B testing is a simple and useful method to compare multiple models with
similar training performances.
• We need to conduct a cost-benefit analysis before moving to higher-accuracy
testing, which requires more resources and a longer time to deliver.
Lesson 2: Understanding risks

Common ground considerations
Predictive analytics works with large amounts of complex data. To deal with this
complexity, we sometimes apply the self-learning algorithm methodologies developed in
Artificial Intelligence.
Artificial intelligence and intelligent systems present areas of ethical concern to society, such as privacy and transparency, bias and discrimination, and the role of human judgment. The last of these is, in fact, considered the most difficult.
The ethical dimension
A system that predicts wind speed to improve the performance of a wind power plant carries no ethical weight. It has risks, but no ethical impact, because the predictive system does not affect any person. The system can be accurate or inaccurate, but it cannot harm people. For this reason, we must consider the risks, but the ethical implications are limited here.
A system that predicts an individual's credit risk score has risks (underperformance, costs, etc.) and, at the same time, ethical implications: it can harm people (by denying loans) or discriminate (if a group of people is overweighted in the credit scoring).
For example, redlining is a discriminatory practice built on predictive analytics: services such as loans and healthcare are withheld from potential customers who live in neighborhoods of racial and ethnic minorities or low-income residents, because the predictive model classified those neighborhoods as "hazardous" to investment.
Risks in predictive models
For example, in models that predict whether individuals should be detained or released on bail pending trial, a race- or gender-biased algorithm assessing an individual's likelihood of committing a future offence can produce longer detention periods for the discriminated groups.
Lesson Summary
IN SUMMARY:
Common ground considerations
• Artificial intelligence and predictive analytics are intimately related and
sometimes used as synonyms. However, they are quite different.
• Artificial intelligence is the simulation of human intelligence processes by
machines, especially computer systems. Machine learning is a subset of
artificial intelligence. Many predictive models are based on machine learning
applications.
The ethical dimension
• Some ethical areas of concern include privacy and explainability risks,
biased algorithms, and unrepresentative data.
• Risks in predictive models concern data privacy, biased algorithms,
and biased datasets.
• Predictive privacy is violated when sensitive information about an individual
is statistically estimated against their will or without their knowledge or
consent.
• Biased algorithms lead to discrimination and unfair treatment of minority
groups.
• Missing data, improper sample size, and misclassification or measurement
error are common reasons that lead to biased datasets.
Lesson 3: Limits of prediction and ethical considerations
Black Swans
"First, a black swan event is an outlier, as it lies outside the realm of regular expectations
because nothing in the past can convincingly point to its possibility. Second, it carries an
extreme 'impact.' Third, despite its outlier status, human nature makes us concoct
explanations for its occurrence after the fact, making it explainable and predictable."
COVID-19 pandemic
Let's think for a moment about the definition of a black swan, and events that can be
considered black swan events. To what extent can we consider the COVID-19 pandemic a
black swan event?
This has been a controversial topic of discussion. Many researchers argue that the COVID-19 pandemic is not a black swan event: pandemics are unfortunately not new in global history, and it was expected with a high level of certainty that a global pandemic would eventually occur.
[Figure: the minimum number of deaths from epidemics and their duration, from 1850 to date]

Good practices to ensure the ethical use of predictive analytics
1. Purpose
Developing a vision and a plan for how the predictive model is going to be used will help
steer the direction of a predictive analytics effort.
Without such planning, predictive analytics may be used in a way that does more harm
than good for our target group, leaves out key stakeholders who should be included in the
planning process, and/or fails to identify how the success of this effort will be measured.
Consider the following factors when developing the plan:
a. The purposes of the Predictive Analytics model
The plan should include the questions the project expects to answer and the goals it aims to achieve. It should also explore the potential pitfalls of using existing data, or collecting new data, for the intended purposes. The project should ensure that data will not be used for discriminatory purposes.
b. The unintended consequences of Predictive Analytics
The plan should also include an analysis of possible unintended consequences and steps to mitigate them. For example, using Predictive Analytics could remove human judgment from decision-making, so that decisions typically requiring holistic review become partly or wholly automated by data alone. The project should also establish protocols to avoid underestimations, biased algorithms, and biased data.
c. The monitoring protocols for outcome measurement
The plan should lay out the measurable outcomes the model is expected to achieve through the use of Predictive Analytics. Strong monitoring protocols ensure healthy data sources, identify valuable features, and check outcome quality. Without them, poor inputs and outputs will lead the work in the wrong direction.
d. Costs and sustainability aspects
Predictive Analytics models can be expensive and may carry 'hidden' costs, for instance feasibility studies. Costs are also associated with training and retraining the model, since adjustments must be made periodically. It is imperative to plan and budget the costs of implementation and monitoring well in advance, and to ensure the model is sustainable.
2. Infrastructure
A supportive infrastructure ensures that the benefits of Predictive Analytics are understood and welcomed by key stakeholders, and that processes and other supports are put in place to assist the data effort.
Consider the following enabling factors:
a. Strengthening communication practices
A data project is a complex, cross-functional task. Communication matters for everyone who might be involved in the project or benefit from its results. Without a clear articulation of how predictive analytics can add value, well-devised plans may fail to receive the support they need to succeed.
b. Creating a supportive culture and climate
With new tools often come new processes, reporting structures, people, and partners who
bring new skills. Therefore, it is vital to assess institutional capacity to use Predictive
Analytics. The appropriate technology, data infrastructure, talent, services, financial
resources, and data analysis skills are essential.
c. Involving robust change management processes
A clear change-management mechanism greatly helps the organization mitigate the potential risks arising from these changes. Transparent communication between data engineers, data scientists, and software developers saves everyone effort and time in the end.
3. Data
Data is vital in predictive models. To build and use these predictive tools ethically, consider
the following aspects related to data.
a. Access accurate data
The best possible data generate the best possible predictions. It is hard to gather enough historical data to get a full picture, so for full context and perspective we will probably need to incorporate external data sources. An advantage of the UN system is the wealth of data across its organizations, which can be leveraged under the data protocol.
b. A holistic data pipeline ensures complete data
Our datasets should be comprehensive and representative of our universe. What will
happen when you move from the small sample in the developing stage to the large live
datasets in the production stage? This scale-up process and data pipeline need to be
carefully designed in the beginning.
c. Understand the limits of prediction
Even with good quality data, understanding the limits of prediction is essential to set
realistic expectations about our predictive model. Predictive models are not certainties and
should not be treated as such.
d. Ensure data is accurately interpreted
Include team members who are knowledgeable about data and can accurately interpret
predictive models derived from this information. Those analyzing the data must consider
context and historical background.
e. Guarantee data privacy
Communicate with data stakeholders, including how consent is obtained and how long the
information will be stored. Make them aware that their data will be used for Predictive
Analytics. Be vigilant that data are well protected so that the information does not get into
the hands of those who intend to misuse it. It is especially important to protect the data
privacy of vulnerable groups.
f. Monitor data security
Security threats occur without notice. Therefore, monitoring threats and risks should be a
regular undertaking. Data security requires security protocols that adhere to individual
privacy laws and meet industry best practices. To keep institutional data secure, involve
your entity's information technology (IT) department.
4. Model design practices
Predictive models can help determine interventions and policymaking. Therefore, they must, at the very least, be created to reduce rather than amplify bias, and be tested for accuracy. It is key to ensure that models and algorithms are created in collaboration with vendors who commit to designing them in a way that does not intentionally codify bias, and who allow them to be tested for veracity.
In this regard, think about the following aspects of model design:
a. Design models to produce desirable outcomes
It is crucial to address bias in predictive models, to ensure the statistical significance of predictions beyond race, ethnicity, or socioeconomic status, and to avoid at all costs using algorithms that produce discriminatory results. An algorithm should never be designed to pigeonhole any group. Being involved in the design, and knowing how predictive models are created, is therefore important to ensure desirable outcomes as determined by the vision and plan. Failing to take this approach may lead to inadvertent discrimination; one simple check in this spirit is sketched below.
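The simple check referenced above compares the model's positive-prediction (selection) rates across groups, in the spirit of the widely used four-fifths rule. The groups, decisions, and threshold here are illustrative placeholders; a real fairness audit would go much further.

```python
# A minimal sketch of a disparate-impact check: compare positive-prediction
# rates across groups using the four-fifths rule (all data illustrative).
import numpy as np

rng = np.random.default_rng(2)
group = rng.choice(["group_1", "group_2"], size=1000)    # placeholder groups
# Placeholder model decisions, deliberately skewed between the two groups.
approved = rng.random(1000) < np.where(group == "group_1", 0.60, 0.40)

rates = {g: approved[group == g].mean() for g in ("group_1", "group_2")}
ratio = min(rates.values()) / max(rates.values())
print(rates, f"selection-rate ratio = {ratio:.2f}")
if ratio < 0.8:   # four-fifths rule of thumb
    print("WARNING: possible disparate impact; review the model design.")
```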
b. Test and be transparent about predictive models
Before predictive models can be used, test them for accuracy, perhaps by an external
evaluator. Unfortunately, mistakes can be made every time we use information. Thus,
predictive tools should not be used without examining their potentially adverse effects.
c. Be aware of the risks of biased algorithms
No matter how carefully interventions are designed, biased algorithms always carry risks. As discussed in the previous lesson, implicit bias may be heightened by predictive systems, because analytics may serve to "confirm" bias or make implicit bias even harder to see. Data can lead individuals to assume they no longer have to address their implicit biases, because they are being guided by numbers rather than by their beliefs, acknowledged and unacknowledged.
d. Start small and evolve
You may also want to limit the variables used in predictive models to those that can be easily explained, and work to ensure the algorithms can be understood by those who will be affected by them.
e. Choose vendors wisely
To ensure models and algorithms are sound, transparent, and free from bias, it is essential to work with expert vendors that have deep knowledge of how predictive models and algorithms are built, even if you plan to outsource the model development. Many vendors are transparent about their models and algorithms and work collaboratively with institutions. Not all vendors take this approach, however: many consider their models and algorithms proprietary, meaning institutions have little involvement in the design process. Make transparency a key criterion when choosing to work with any vendor.
We can see how there is not a single aspect to consider but many!
The powerful applications of predictive models should not overshadow the need to make
sure predictive tools are deployed purposefully and securely; have the proper support and
infrastructure to be sustainable; are built with quality data; do not further entrench
inequitable structures; and can be tested for and produce evidence of effectiveness.
Predictive analytics are already changing how multilateral institutions such as the UN
work. Addressing and ensuring their ethical use will become more and more critical.
Lesson Summary
IN SUMMARY:
Black Swan
• Some events are unpredictable, but unfortunately, they happen. The black
swan is a metaphor often used in predictive analytics to describe events that
we can’t predict with the information available to date.
Good practices to ensure the ethical use of predictive analytics
• Aspects to consider to ensure the ethical and trustworthy use of predictive
analytics should include: A purposeful model; Right support and
infrastructure to guarantee sustainability and use; Data security, privacy, and
quality protocols; People trained in the use and interpretation of data;
Transparency procedures and consistent ethical design practices.
• Decisions often require holistic review and sound interpretation of data that
cannot be left fully automated without human judgment.
References
• Davenport, T. H., & Ronanki, R. (2018). Artificial Intelligence for the Real World. Harvard Business Review. Retrieved December 13, 2022, from https://hbr.org/2018/01/artificial-intelligence-for-the-real-world
• Harvard Business Review. (2017). Deep learning will radically change the ways we interact with technology. Retrieved December 13, 2022, from https://hbr.org/2017/01/deep-learning-will-radically-change-the-ways-we-interact-with-technology
• Gianfrancesco, M. A., Tamang, S., Yazdany, J., et al. (2018). Potential Biases in Machine Learning Algorithms Using Electronic Health Record Data. JAMA Internal Medicine, 178(11), 1544-1547. doi:10.1001/jamainternmed.2018.3763
• International Telecommunication Union. (2021). United Nations Activities on Artificial Intelligence (AI). Retrieved December 13, 2022, from https://www.itu.int/hub/publication/s-gen-unact-2021/
• Knight, W. (2017). The dark secret at the heart of AI. MIT Technology Review. Retrieved December 13, 2022, from https://www.technologyreview.com/2017/04/11/5113/the-dark-secret-at-the-heart-of-ai/
• Mühlhoff, R. (2021). Predictive privacy: towards an applied ethics of data analytics. Ethics and Information Technology, 23, 675-690. https://doi.org/10.1007/s10676-021-09606-x
• UN Office of Information and Communication Technology. (2020, September). Alba Chatbot Skills Overview. Retrieved December 13, 2022, from https://oictpsgdettbotdevblob.blob.core.windows.net/en-asset-userguide/AlbaUserGuide.pdf
@UNSSC
facebook.com/UNSSC
linkedin.com/company/unssc
www.unssc.org