Credit Risk - Predictive Modelling
Predictive Modelling
4EK614
09 Feb 2022
With You Today
Our services
AQR Regulatory
Our projects
Model Development Data Analysis Stress Testing
Model Validation Data Mining Impairment
Study materials
Course Structure
1. PowerPoint slides, provided after the course
Day 1: Credit Risk, Market Risk, Climate Risk
Day 2: Predictive Modelling in Credit Risk
Day 3: Assignment + Seminar Data Walkthrough Prerequisites
Course Assessment
2. Outputs – PPT presentation or PDF, summarizing the abovementioned outputs, and scripts used.
3. Output presentation – Short (10-15 minute) presentation about results of this assessment.
Goal Resources
5. Lunch 11:50-13:00
Assets: Cash, Deposits at central bank
Liabilities: Deposits from customers, Loans from other banks
Balance
Insurance Other
▪ Probability of Default (PD): The likelihood that the borrower will default on its obligation, either over the life of the obligation or over a specified horizon (typically one year).
▪ Loss Given Default (LGD): The loss that the lender would incur in the event of the borrower’s default. It is the exposure that cannot be recovered through bankruptcy proceedings, collateral recovery or some other form of settlement. Usually expressed as a percentage of exposure at default.
▪ Exposure at Default (EAD): The exposure that the borrower would have at default. Takes into account both on-balance sheet (capital) and off-balance sheet (unused lines, derivatives or repo transactions) exposures and the payment schedule.
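As a small illustration of how the three parameters fit together, the sketch below uses the standard expected-loss decomposition EL = PD × LGD × EAD. The formula is not stated on the slide and the loan figures are hypothetical.

```python
def expected_loss(pd_: float, lgd: float, ead: float) -> float:
    """Expected loss under the standard decomposition EL = PD * LGD * EAD."""
    if not (0.0 <= pd_ <= 1.0 and 0.0 <= lgd <= 1.0):
        raise ValueError("PD and LGD must lie in [0, 1]")
    if ead < 0:
        raise ValueError("EAD must be non-negative")
    return pd_ * lgd * ead


# Hypothetical loan: 2% PD, 45% LGD, exposure at default of 100,000.
el = expected_loss(0.02, 0.45, 100_000)
print(round(el, 2))
```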
► Risk management function reshaping roadmap
► Credit risk strategy and linkage to business strategy
► Risk appetite framework and statements
► Credit risk processes and segregation of duties

► Diagnostics on the effectiveness & efficiency of the collections process
► Development of a collections strategy, strategic and tactical (cost-benefit) analysis of available outsourcing options

► Business model request specification
► Application scorecard design and validation
► Design and review of the application processes
► Support with application workflow technology

► Model design / validation / internal audit reviews
► Regulatory compliance
► PD estimation
► Model usage for business purposes
► Proprietary IT tools

► Design of impairment methodology in line with IFRS
► Effective interest rate and recognitions of fees and commissions
► Back-testing analyses
► Collateral valuation scenarios

► LGD estimates design and validation
► LGD (scoring) models design and validation
► LGD data warehouse specification
• Underwriting (UW) process is the processing of credit application and making a decision about
the final approval or decline of the application.
• Generally the UW process can end up in several different states: approval, decline, cancelation
from client side, non-eligibility (for example the applicant is not meeting minimum age criteria,
etc.)
Private individuals
• Usually automated process
• Scoring applications in order to assess riskiness of newly issued loans/credits
• Scoring client behavior on monthly basis on credit and deposit products
• Large data sets → statistical approach
• Need to verify income and over-indebtedness
• Credit registers (BRKI, NRKI, Solus)

Entrepreneurs, Freelancers
• Usually automated process with possibly manual inputs
• Scoring applications in order to assess riskiness of newly issued loans/credits
• Scoring client behavior on monthly basis
• Large data sets → statistical approach
• No need to verify income and over-indebtedness
• Credit registers

Small business
• Partially automated process, but mostly manual assessment
• Scoring applications for automated products
• Process for manual yearly rating (typically financial scoring, qualitative scoring and behavioral scoring)
• Sufficient data sets for statistical approach
• Credit registers (CRÚ, Cribis, Bisnode, etc.)

Corporates
• Typically manual assessment on yearly basis (rating process using financial, qualitative and behavioral scoring)
• Sometimes not sufficient data to use statistical approach – especially in case of project financing
• Industry dependent and seasonal
• Credit registers (CRÚ, Cribis, Bisnode, etc.)
• Financing housing needs
• Subject to consumer protection
• Requires real estate collateral and insurance
• Large financed amount
• Typically longer maturity
• More thorough and detailed UW process
• Partially manual assessment
• Loan to value condition
• Lower interest rates
• Fixation periods
• Co-applicants possible

• Purpose or non-purpose
• Subject to consumer protection
• Can have collaterals or guarantors, but usually does not
• Automated, easy and fast UW process
• Higher interest rates
• Co-applicants possible, but not as frequent as for mortgages
• Medium financed amount
• Medium maturity
• Medium risk

• Credit limit that can be utilized, but it is not a must
• Client can flexibly utilize whatever part of the limit he needs to
• Grace period
• High interest rates
• Typically no collaterals
• Lower financed amount
• Maturity is not specified (contract terminates on request when fully repaid)
• High risk
• Credit cards come with plastic card

• Typical financing for corporate and small business segments, but also for entrepreneurs
• Processed manually
• Very high financed amount
• Based on business and financial plan
• Usually with collaterals and guarantees
• Second step is the assessment of client eligibility for the given product and channel
• Is the client below prescribed age when applying for a long term product such as mortgage?
• Does the client have eligible income for the particular product and process?
• Does the client have all prescribed documents (valid ID card and valid second ID document)?
• Is the collateral for the issued loan eligible and sufficient (LTV threshold)?
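The eligibility questions above translate naturally into a rule check that runs before any scoring. The following is a minimal sketch only: the `Application` fields and all thresholds (`MAX_AGE`, `MIN_MONTHLY_INCOME`, `MAX_LTV`) are hypothetical and would in practice come from the product and channel policy.

```python
from dataclasses import dataclass


@dataclass
class Application:
    age: int                  # applicant's age in years at application
    monthly_income: float     # verified monthly income
    has_valid_id: bool        # valid ID card presented
    has_second_id: bool       # valid second ID document presented
    loan_amount: float
    collateral_value: float


# Hypothetical policy thresholds, chosen only for illustration.
MAX_AGE = 65
MIN_MONTHLY_INCOME = 20_000.0
MAX_LTV = 0.80


def eligibility_failures(app: Application) -> list:
    """Return the names of all eligibility rules the application fails."""
    failures = []
    if app.age > MAX_AGE:
        failures.append("age")
    if app.monthly_income < MIN_MONTHLY_INCOME:
        failures.append("income")
    if not (app.has_valid_id and app.has_second_id):
        failures.append("documents")
    # Loan-to-value: a missing or zero collateral value counts as a failure.
    if app.collateral_value <= 0 or app.loan_amount / app.collateral_value > MAX_LTV:
        failures.append("ltv")
    return failures


app = Application(age=40, monthly_income=50_000.0, has_valid_id=True,
                  has_second_id=True, loan_amount=4_500_000.0,
                  collateral_value=5_000_000.0)
print(eligibility_failures(app))  # LTV is 0.9, above the assumed 0.80 cap
```

Returning the list of failed rules, rather than a single yes/no, makes it easier to tell a decline from a non-eligibility outcome.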
• There are several laws and directives that affect the underwriting process
• Law on consumer loan
• Consumer protection
• Mortgage credit directive (MCD)
• Consumer credit directive (CCD)
• EBA guidelines
They serve three goals:
• Consumer needs to be protected from dishonest and malicious practices, including intentional but also non-intentional over-indebting – the responsibility for not over-indebting the client is now on the lender
• Market and economy need to be protected against adverse economic impacts originating in the financial system
• Society needs to be protected against criminal acts and terrorism
• Client authentication
• Anti-fraud module
• Expiry date check – ID not expired
• Check on validity in MPSV database
• Issue date consistency check (based on linear regression below)
• Check on issue date – not week-end or public holiday
• Check on address at MěÚ or OÚ
• Control on ID manipulation (color histogram, fonts)
• Check on consistency of bar-code and ID number
• Consistency of sex and birth number (third digit)
• Birth number divisible by 11 (for births after 1953)
• Overall control number check
• Expiry date control number check
• Birth date control number check
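Two of the checks above, the sex encoded in the third digit and the divisibility by 11, can be sketched in a few lines. This is a simplified illustration under stated assumptions: it handles only ten-digit Czech birth numbers, it ignores the historical exceptions to the modulo rule, and the sample numbers are synthetic.

```python
def parse_birth_number(birth_number: str) -> dict:
    """Extract the sex flag, the month, and the mod-11 check from a birth number.

    Assumes the ten-digit format YYMMDD/XXXX used for births after 1953.
    For women the month field is increased by 50, so the third digit is 5 or
    higher; a further +20 variant of the month field is also stripped here.
    """
    digits = birth_number.replace("/", "")
    if len(digits) != 10 or not digits.isdigit():
        raise ValueError("expected a ten-digit birth number")
    month_raw = int(digits[2:4])
    female = month_raw > 50
    month = month_raw % 50
    if month > 20:
        month -= 20
    return {
        "female": female,
        "month": month,
        "divisible_by_11": int(digits) % 11 == 0,
    }


# Synthetic examples, not real persons.
print(parse_birth_number("800101/0006"))
print(parse_birth_number("805101/0000"))
```

In an anti-fraud module the parsed sex and birth date would then be compared with what the applicant declared and with the ID card data.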
• Frequency checks in the on-line underwriting process: applications are tracked with respect to different identifiers and their combinations
• Device fingerprint (publicly available libraries)
Hardware: CPU architecture & device memory, GPU canvas, audio stack
Software: user agent, OS version
Storage: local storage, session storage
Display: color depth, screen size
Browser customizations: fonts, plug-ins, codecs, MIME types, time zone, user language
Miscellaneous: floating point calculations, callbacks / objects to DOM
• Phone number
• Account number
• ID card number
• E-mail address
• Birth number
• IP address
• Geolocation (via IP address and Google API) – can be used for anti-fraud as well as for scoring
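A frequency check of this kind can be sketched as counting applications per identifier within a sliding time window. The record layout (`ip`, `submitted_at`), the 24-hour window and the limit of three applications are assumptions made for illustration, not values from the slides.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def flag_frequent(applications, key, window=timedelta(hours=24), max_count=3):
    """Return identifier values seen more than max_count times within any window."""
    times_by_value = defaultdict(list)
    for app in applications:
        times_by_value[app[key]].append(app["submitted_at"])

    flagged = set()
    for value, times in times_by_value.items():
        times.sort()
        start = 0
        for end in range(len(times)):
            # Shrink the window from the left until it spans at most `window`.
            while times[end] - times[start] > window:
                start += 1
            if end - start + 1 > max_count:
                flagged.add(value)
                break
    return flagged


base = datetime(2022, 2, 9, 12, 0)
applications = [
    {"ip": "192.0.2.1", "submitted_at": base},
    {"ip": "192.0.2.1", "submitted_at": base + timedelta(minutes=10)},
    {"ip": "192.0.2.1", "submitted_at": base + timedelta(minutes=20)},
    {"ip": "192.0.2.1", "submitted_at": base + timedelta(minutes=30)},
    {"ip": "192.0.2.7", "submitted_at": base},
]
print(flag_frequent(applications, key="ip"))
```

The same function can be called with any other identifier from the list (phone number, account number, e-mail address), or with a combined key built from several of them.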
• Individuals / Entrepreneurs:
• BRKI – Banking Register of Client Information
• Information about applications and loan contracts shared among the banks operating in the Czech Republic. Generally only banks can access it.
• Information is stored in BRKI for the duration of the credit relationship and for 4 years after it terminates. If the contract with the bank was not signed, the information is stored in BRKI for one year.
• NRKI – Non-Banking Register of Client Information
• Information about applications and loan contracts shared among non-bank credit providers. Generally only those that participate in the sharing can access it.
• SOLUS
• Information about applications and loan contracts shared among participating credit providers and some other companies. Generally only those that participate in the sharing can access it. It contains both a register of negative information and a register of positive information.
• Telecom companies and utility providers also participate in SOLUS.
• Companies / Entrepreneurs:
• CRÚ – Centrální registr úvěrů (Central Credit Register)
• Information about loan contracts of entrepreneurs and companies – compulsory register operated by Czech National Bank.
• Scoring is one of the tools for measuring the creditworthiness of a business or person. Individual characteristics are rated on scales, and the scales are given different weights. This procedure results in a credit score
DATA
• Physical location, work location, household location
• Transactional profile, behavior on deposits
• Text analytics, friends, posts, activity, job history
• Credit registers, social security, health insurance, government
• Relatives, transactional networks (suppliers, cost structure)
• Device price, age and attractivity, level of user experience
[Figure: data table with one row per observation (i = 1, 2, 3, 4, …), predictor columns and a target column]
[Figure: LGD timeline from DD-12M to DD+21M in quarterly steps, where DD is the default date. Predictors' values are taken as at DD-12M. After default, the exposure at default is reduced by recoveries: cash flows CF 1 to CF 4 and collateral realization, collected until an arbitrarily chosen end of the recovery process. The recoveries are discounted, and the resulting LGD is the quantity to be predicted.]
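The discounting step can be illustrated with a workout-style LGD calculation: recoveries after default are discounted back to the default date and compared with the exposure at default. This is a sketch under assumptions, since the slides do not give the exact formula; the function name, the discount rate and the cash flows are hypothetical.

```python
def workout_lgd(ead, cash_flows, annual_rate):
    """LGD = 1 - (present value of recoveries at the default date) / EAD.

    cash_flows is a list of (months_after_default, amount) pairs; collateral
    realization is treated as just another recovery cash flow.
    The result is not clipped, so it can fall outside [0, 1].
    """
    present_value = sum(
        amount / (1.0 + annual_rate) ** (months / 12.0)
        for months, amount in cash_flows
    )
    return 1.0 - present_value / ead


# Hypothetical recoveries on an exposure of 100 at default.
recoveries = [(3, 10.0), (6, 20.0), (12, 30.0), (18, 15.0)]
print(workout_lgd(100.0, recoveries, annual_rate=0.0))
print(workout_lgd(100.0, recoveries, annual_rate=0.05))
```

Because later recoveries are worth less, a positive discount rate always gives a higher LGD than the undiscounted calculation on the same cash flows.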
LGD models
• “U-shape”
• It does not make sense to use average LGD = 45% for these clients
• Real LGD is lower than 10% for the best 1/3 of the clients and higher than 90% for the worst 1/3 of the clients
Predictive Modelling - Goal
[Figure: data table with one row per observation (i = 1, 2, 3, 4, …), predictor columns and a target column]
• 1) Data exclusions
• 2) Missing values analysis (exclude variables with more than 50% missing?)
• 3) Outlier treatment (cap values below the 5th / above the 95th percentile?)
• 4) Variable transformation (feature engineering), e.g. binning
• 5) Univariate analysis (drop variables with Gini below 0.2?)
• 6) Correlation analysis (Spearman correlation above 0.5?)
• 7) Modelling
• Selection of shortlist of variables
• Estimation of coefficients based
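Three of the screening steps (missing values, outlier treatment and univariate Gini) can be sketched in plain Python. The helper names and the nearest-rank quantile rule are illustrative choices, and the Gini is computed as 2·AUC − 1, which is a common definition but is assumed here rather than taken from the slides.

```python
def missing_share(values):
    """Share of missing (None) entries in a variable."""
    return sum(v is None for v in values) / len(values)


def winsorize(values, lower=0.05, upper=0.95):
    """Clip values to the empirical lower/upper quantiles (nearest rank)."""
    ordered = sorted(values)
    low = ordered[round(lower * (len(ordered) - 1))]
    high = ordered[round(upper * (len(ordered) - 1))]
    return [min(max(v, low), high) for v in values]


def gini(scores, defaults):
    """Gini = 2 * AUC - 1, with AUC taken over all (default, non-default) pairs.

    A higher score is assumed to mean higher risk; ties count as half a win.
    """
    bad = [s for s, d in zip(scores, defaults) if d == 1]
    good = [s for s, d in zip(scores, defaults) if d == 0]
    wins = sum((b > g) + 0.5 * (b == g) for b in bad for g in good)
    return 2.0 * wins / (len(bad) * len(good)) - 1.0


print(missing_share([1.0, None, 3.0, None]))
print(winsorize(list(range(101)))[:3])
print(gini([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0]))
```

The pairwise Gini is quadratic in the sample size, which is fine for a sketch; on large portfolios a rank-based calculation would be the usual choice.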
$$f(\vec{x}) := \alpha + \sum_{j} \beta_j x_j$$
• We can choose other functions, but market standard is to use the logit link function
• Using linear function is not proper as it can give estimates above 1 or below 0, which is not
convenient for estimating probability of default
• The reason for choosing logit function instead of others is mainly interpretational – the log-
odds ratio defined below is a linear combination of the predictors
$$\text{Log-odds ratio} = \ln\frac{PD}{1 - PD} = f(\vec{x})$$
• By central limit theorem under very general conditions the log-odds ratio distribution
converges in distribution to a normal distribution
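The relationship between the linear score and the PD can be checked numerically: applying the inverse logit to the score always gives a value between 0 and 1, and taking the log-odds of that value returns the score. The coefficients below are hypothetical.

```python
import math


def linear_score(alpha, betas, x):
    """f(x) = alpha + sum_j beta_j * x_j."""
    return alpha + sum(b * xj for b, xj in zip(betas, x))


def pd_from_score(f):
    """Inverse logit: maps any real score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-f))


def log_odds(pd_):
    """ln(PD / (1 - PD)), the quantity that is linear in the predictors."""
    return math.log(pd_ / (1.0 - pd_))


# Hypothetical intercept, coefficients and predictor values.
f = linear_score(-3.0, [0.8, -0.5], [1.0, 2.0])
pd_ = pd_from_score(f)
print(f, pd_, log_odds(pd_))
```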
• Let’s say we have processed our data (deduplication, formatting, primary keys, consistency
checks…)
• We could take advantage of models with some sort of elimination
• E.g. – Lasso regression
• Least absolute shrinkage and selection operator
• Performs both variable selection and regularization
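To show how the Lasso penalty eliminates variables, here is a minimal coordinate-descent sketch for the linear-regression Lasso. It is a simplification for brevity: a PD model would use an L1-penalized logistic regression instead, there is no intercept, and the data and penalty `lam` are made up.

```python
def soft_threshold(rho, lam):
    """Shrink rho towards zero by lam; values within [-lam, lam] become 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimise (1/2n) * sum((y - X beta)^2) + lam * sum(|beta_j|)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0
            z = 0.0
            for i in range(n):
                # Prediction from all predictors except j.
                partial = sum(beta[k] * X[i][k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - partial)
                z += X[i][j] ** 2
            beta[j] = soft_threshold(rho / n, lam) / (z / n)
    return beta


# y depends strongly on the first predictor and only weakly on the second.
X = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
y = [2.2, 1.8, -1.8, -2.2]
print(lasso_coordinate_descent(X, y, lam=0.5))
```

With this penalty the weak second predictor receives a coefficient of exactly zero, while the first is shrunk but kept, which is the combined selection and regularization described above.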
► Are powerful, but can be easily overfitted and can have high impact to reject inference (should be used as challengers)
► Support vector machines (SVM)
► Neural networks
PROS
► Higher prediction power than other methods
CONS
► Overfitting
► Not interpretable
► Not deterministic optimization

► Are based on developing many models on random subsamples or with different predictors and putting them together by ensemble rule (random forests, etc.)
PROS
► Higher prediction power than standard linear methods
► Sometimes higher stability
CONS
► Overfitting
► Not interpretable
► Not sufficient track record
• We found that the predictive power of the logistic regression model and more advanced approaches is in the same league
… reduced by the loss of information from categorization.
► Hence, the advantage of Random Forests to cover also nonlinearity in the model is only of minor importance.
► The features for the logistic model were selected by stepwise regression.
► We further used a Random Forest implementation in VBA in order to validate the result which we obtained using the H2O algorithm in R.
[Figure: cumulative % of defaults (y-axis) against cumulative % of all contracts (x-axis, 0% to 100%) for four curves: Log. regression, Random Forest, Perfect, Random]
• Why binning? solves leverage points, solves informative missings, solves non-
numerical (either ordinal or multinomial) variables, assesses robustness
• Why WoE transformation? normalizes predictors values, enables easy interpretation
(under reasonable conditions always attains negative and
positive values, zero value represents portfolio default rate)
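A WoE transformation over binned data can be sketched as follows. The sign convention used here, WoE = ln(share of goods / share of bads), is an assumption (the opposite sign is also used in practice), and the bin counts are hypothetical.

```python
import math


def woe_per_bin(bins):
    """Weight of evidence per bin.

    bins maps a bin label to (n_good, n_bad). WoE is ln(%good / %bad), so a
    bin whose default rate equals the portfolio default rate gets WoE = 0.
    Bins with zero goods or zero bads are not handled in this sketch.
    """
    total_good = sum(good for good, _ in bins.values())
    total_bad = sum(bad for _, bad in bins.values())
    return {
        label: math.log((good / total_good) / (bad / total_bad))
        for label, (good, bad) in bins.items()
    }


# Hypothetical bins; the portfolio default rate is 100 / 1000 = 10%.
bins = {
    "A": (180, 20),   # 10% default rate, same as the portfolio
    "B": (270, 50),   # riskier than the portfolio
    "C": (450, 30),   # safer than the portfolio
}
woe = woe_per_bin(bins)
print(woe)
```

Missing values can be given their own bin, which is how binning handles informative missings.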
• PSI (Population Stability Index) is a measure of difference between two discrete distributions
• It is typically used in order to assess representativity – i.e. assess whether distribution of a
binned variable differs in two different data samples which are typically from two different
time periods (threshold of 0.2 is frequently used)
$$PSI = \sum_{i=1}^{n} \left(\mathrm{Actual}\%_i - \mathrm{Expected}\%_i\right) \cdot \ln\frac{\mathrm{Actual}\%_i}{\mathrm{Expected}\%_i}$$
where n is the number of bins
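The PSI formula can be applied directly to bin counts from two samples. The sketch below is illustrative: the function name and the counts are hypothetical, and empty bins (which would make the logarithm undefined) are not handled.

```python
import math


def psi(expected_counts, actual_counts):
    """Population Stability Index between two binned distributions."""
    expected_total = sum(expected_counts)
    actual_total = sum(actual_counts)
    total = 0.0
    for expected, actual in zip(expected_counts, actual_counts):
        expected_pct = expected / expected_total
        actual_pct = actual / actual_total
        total += (actual_pct - expected_pct) * math.log(actual_pct / expected_pct)
    return total


# Hypothetical bin counts: development sample vs. a recent sample.
development = [100, 200, 300, 400]
recent = [400, 300, 200, 100]
value = psi(development, recent)
print(value, "unstable" if value > 0.2 else "stable")
```

Each term of the sum is non-negative, so the PSI is zero only when the two distributions coincide bin by bin.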