Alternative Data for Financial Insights

The document discusses alternative data analytics and describes modules for harnessing alternative data including a data mart, feature store, ML models, and use cases. It provides examples of alternative data sources like mobile, telecom, and social media data. It also describes how the feature store can be used to accelerate feature engineering for predictive modeling. Additionally, it provides examples of how natural language processing can be used to extract features from text data like SMS messages to generate customer insights and features for decisioning.

Uploaded by

Brijesh Kumar Giri

2 Alternative Data Analytics

Alternative Data Components
Modules for harnessing the power of Alternate Data

DataMart
• Mobile Device
• Telecom
• E-Commerce
• Utility and Payments (POS)
• Social Media
• E-Mail
• Insurance
• Others: Travel, Rent, Web, Tax, Government Records, Psychometrics etc.
• Bank Statement ***
• Alternative Lending Products Payment data
Leverage our AGGREGATOR DATAMART to accelerate data architecture and storage for decision analytics

Feature Store
• Seamless transformation of raw data to Features, to be used for predictive modelling
Leverage our FEATURE STORE to accelerate Feature Engineering for building predictive models

ML Models
• ML Algorithms
• Model Landscape
• Model Development
• Model Documentation
• Model Validation
• Model Deployment
• Independent Review
• Policy Framework
We use Advanced Machine Learning algorithms to build Explainable predictive models for Financial Institutions

Use Cases
• Customer Profiling and Segmentation
• Credit Scoring
• Income Estimation
• Pricing
• Propensity
Leverage our expertise for multiple use cases to get a 360 degree view of a customer relationship

*** Physical copies of bank statements have long been used for manual underwriting in consumer lending. However, this information typically does not flow as a feature into a credit scoring engine. In the Digital Lending paradigm, bank statements are being digitized and their information is being used for credit scoring.

Alternative Data Feature Store
Automated Feature Engineering

[Diagram: Raw Data → Automated Feature Engineering Layer (Feature Primitives, Feature Synthesis, Feature Classification, Pattern Matching, Expert Judgment) → Feature Store → Predictive Model]

Raw data points are transformed to features using Feature Synthesis (applying library of transformations to raw data) and Feature Mining using NLP (e.g. extraction of
features from Text data such as SMS, Email), with an overlay of expert judgement.
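
Feature Synthesis, as described above, can be illustrated with a minimal sketch. The raw rows, the customer IDs, the primitive names and the `synthesize_features` helper are all hypothetical and are not taken from the deck; the sketch only shows the idea of applying a small library of transformations to raw data per customer.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw data: (customer_id, transaction_amount) rows.
RAW_ROWS = [
    ("C1", 120.0), ("C1", 80.0), ("C1", 100.0),
    ("C2", 40.0), ("C2", 60.0),
]

# A small, assumed "library of transformations" (feature primitives).
PRIMITIVES = {
    "count": len,
    "sum": sum,
    "mean": mean,
    "max": max,
}


def synthesize_features(rows, primitives, column="amount"):
    """Apply every primitive to each customer's values and name the results."""
    grouped = defaultdict(list)
    for customer_id, value in rows:
        grouped[customer_id].append(value)
    return {
        customer_id: {
            f"{name}_{column}": fn(values) for name, fn in primitives.items()
        }
        for customer_id, values in grouped.items()
    }


features = synthesize_features(RAW_ROWS, PRIMITIVES)
print(features["C1"])
```

Each new primitive added to the dictionary produces one more candidate feature per customer, which is the sense in which such a layer can accelerate feature engineering; the expert-judgement overlay would then decide which candidates enter the feature store.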
Illustrative Feature Mining from SMS Data using NLP
Automated Feature Mining

Flow: Data → SMS Tagging → Data Insights → Feature Engg. → Decisioning

Data
• SMS1, SMS2, SMS3, SMS4, SMS5

SMS Tagging
• SMS classification to standard L1 and L2 categories
• L1 such as Savings, Current, Debit Card, Credit Card, E-Wallets etc.
• L2 such as Savings > Salary, Spend, Balance, Investment, Loan / EMI related, Account Info
Process
• NLP based classification (SMS embeddings using neural networks)

Data Insights
• Rules to extract information from each SMS such as ID, Amount, Transaction Type, Date etc.
Process
• Pattern matching based data extraction rules

Feature Engg.
• Roll-up of individual SMS level data at customer level to generate features for model training, such as:
• Monthly Income
• Total Loans O/s
• Total EMI
• Expected Monthly Spend and Savings
• Delinquency pattern
Process
• Feature engineering by data science team

Decisioning
Scoring Engine
Customer (Id / pool) | Risk Score
Customer1 | 0.99
Customer2 | 0.80
Customer3 | 0.50
Customer4 | 0.25
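
The pattern-matching extraction and the customer-level roll-up can be sketched as follows. The SMS texts, the message layout assumed by the regular expression, and the feature names are hypothetical; real bank SMS formats vary by sender, and the simple keyword check below only stands in for the NLP-based L1/L2 classification mentioned on the slide.

```python
import re
from collections import defaultdict

# Hypothetical SMS samples, all from a single month.
SMS_LOG = [
    ("Customer1", "INR 50000.00 credited to A/c XX1234 on 01-03-2024 towards SALARY"),
    ("Customer1", "INR 12000.00 debited from A/c XX1234 on 05-03-2024 towards EMI"),
    ("Customer1", "INR 3500.50 debited from A/c XX1234 on 09-03-2024 towards POS purchase"),
    ("Customer2", "INR 20000.00 credited to A/c XX9876 on 02-03-2024 towards SALARY"),
]

# Assumed extraction rule: amount, transaction type, account ID, date, purpose.
PATTERN = re.compile(
    r"INR (?P<amount>\d+(?:\.\d+)?) (?P<txn_type>credited|debited) (?:to|from) "
    r"A/c (?P<account>\w+) on (?P<date>\d{2}-\d{2}-\d{4}) towards (?P<purpose>.+)"
)


def extract(sms_text):
    """Return the extracted fields of one SMS, or None if no rule matches."""
    match = PATTERN.search(sms_text)
    if match is None:
        return None
    record = match.groupdict()
    record["amount"] = float(record["amount"])
    return record


def roll_up(sms_log):
    """Aggregate SMS-level records into customer-level features."""
    features = defaultdict(
        lambda: {"monthly_income": 0.0, "total_emi": 0.0, "other_spend": 0.0}
    )
    for customer_id, text in sms_log:
        record = extract(text)
        if record is None:
            continue  # unmatched SMS are skipped in this sketch
        purpose = record["purpose"].upper()
        if record["txn_type"] == "credited" and "SALARY" in purpose:
            features[customer_id]["monthly_income"] += record["amount"]
        elif record["txn_type"] == "debited" and "EMI" in purpose:
            features[customer_id]["total_emi"] += record["amount"]
        elif record["txn_type"] == "debited":
            features[customer_id]["other_spend"] += record["amount"]
    return dict(features)


customer_features = roll_up(SMS_LOG)
print(customer_features["Customer1"])
```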
Feature Mining: Bank Statement with Text Recognition and NLP
Aptivaa’s Bank Statement API supports English and Arabic Bank Statements

Digitization of the input statement
• Usage of Computer Vision and NLP algorithms for scanning & digitization
• Custom Neural Network Models for English and Arabic
• Support for both languages in the same sentence as well
• Easily trainable for specific font types and sizes
• Minimizes data errors through present validation rules and users’ validation as well

Pattern Recognition
• Identification of the language available in the statement and translation to English
• Identification of Text Patterns/Classification Rules in a master table (e.g. transaction descriptions containing ‘Salary’/‘Payroll’ are of type Salary)

Transaction Classification and Analysis
• Transaction comparison as per Text Patterns/Classification Rules into standard transaction types
• Auto-summary generation using various customizable, user-defined metrics exposed on user interface providing full control of analysis to user*
• Pivoting by different transaction types and other dimensions (such as Time period, Debit/Credit etc.)
• Final report analysis is available in both PDF and smart HTML formats

Feature Generation and AutoScoring
• Key insights generated around Income pattern, Customer behavior and Psychographic Segmentation and further, metrics generated for Risk Scoring
• Feature generation (for adding to Application Scorecard and creating internal Feature Store)
• Auto Scoring (automated scorecard, provided historical performance data)

[Diagram labels: Income Estimation, Spend Analytics, Fixed Obligations, Peer classification, Credit Analysis, Customer Score]
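
A master table of classification rules and the subsequent pivoting can be sketched as below. The rule table, the transaction rows and the function names are hypothetical; only the ‘Salary’/‘Payroll’ example comes from the slide, and this is not a description of the actual Bank Statement API.

```python
import re
from collections import defaultdict

# Hypothetical master table: keyword in the description -> transaction type.
CLASSIFICATION_RULES = [
    ("salary", "Salary"),
    ("payroll", "Salary"),
    ("emi", "Loan / EMI"),
    ("rent", "Rent"),
]

# Hypothetical digitized statement rows: (period, description, direction, amount).
TRANSACTIONS = [
    ("2024-03", "ACME PAYROLL MARCH", "Credit", 5000.0),
    ("2024-03", "HOME LOAN EMI", "Debit", 1200.0),
    ("2024-03", "GROCERY STORE", "Debit", 150.0),
    ("2024-04", "SALARY APRIL", "Credit", 5000.0),
]


def classify(description):
    """Return the type of the first rule whose keyword appears as a whole word."""
    text = description.lower()
    for keyword, txn_type in CLASSIFICATION_RULES:
        # Word boundaries avoid false hits such as "emi" inside "premium".
        if re.search(rf"\b{re.escape(keyword)}\b", text):
            return txn_type
    return "Other"


def pivot_by_type(transactions):
    """Total amount per (transaction type, Debit/Credit) pair."""
    totals = defaultdict(float)
    for _period, description, direction, amount in transactions:
        totals[(classify(description), direction)] += amount
    return dict(totals)


pivot = pivot_by_type(TRANSACTIONS)
print(pivot)
```

Keeping the rules in a table rather than in code means new patterns can be added without touching the classifier, which is presumably the motivation for a master table.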
Alternative Data Modelling
Explainable Machine Learning for superior predictive power with full model transparency

[Diagram: Feature Store (Feature 1, Feature 2, … Feature N) → ML Algorithms (XgBoost, Neural Net) → Important Features (Feature 1, Feature 2, … Feature M) → Features Discretization (each feature split into Bin 1 to Bin 4) → Explainable ML → Predictive Model]

Non-linear Machine Learning models are used for feature selection. The selected features are then discretized and transformed (for example with a WoE transformation) and passed as input to a linear algorithm or XgBoost (with monotonic constraints) to build fully explainable predictive models.
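
The WoE (Weight of Evidence) transformation of a discretized feature can be sketched as follows. The bin counts are made up for illustration, and the sign convention used here, WoE = ln(share of goods in the bin / share of bads in the bin), is an assumption; some practitioners use the inverse ratio.

```python
import math

# Hypothetical counts per bin of one discretized feature: (label, goods, bads).
BINS = [
    ("Bin 1", 100, 40),
    ("Bin 2", 200, 30),
    ("Bin 3", 300, 20),
    ("Bin 4", 400, 10),
]


def weight_of_evidence(bins):
    """Return {bin label: ln(share of goods / share of bads)}."""
    total_good = sum(good for _, good, _ in bins)
    total_bad = sum(bad for _, _, bad in bins)
    woe = {}
    for label, good, bad in bins:
        if good == 0 or bad == 0:
            # WoE is undefined for empty cells; such bins are usually merged first.
            raise ValueError(f"{label} has a zero count")
        woe[label] = math.log((good / total_good) / (bad / total_bad))
    return woe


woe = weight_of_evidence(BINS)
for label, value in woe.items():
    print(label, round(value, 3))
```

In this made-up example the WoE values rise steadily from Bin 1 to Bin 4. A monotonic pattern of this kind is what makes a downstream linear model, or a tree model with monotonic constraints, straightforward to explain.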
Alternative Data Model Landscape for different customer segments
Illustrative Model Landscape
Approach 1
Step 1: Alternate Data Model for all customers
Step 2: Alternate + Traditional Data for some segments
Some Segments (e.g. Medium Risk Customers) are rescored using a Combined Data Model (for Bureau Hit cases only)

Approach 2
Step 1: Alternate + Traditional Data Model for Bureau Hit Segment
Step 2: Alternate Data Model for No Hit Segment
Combined Model is used for Hit Segment and Standalone Alternate Data Model is used for No Hit Segment

The final approach is selected on the basis of product (ticket size, loan tenor), data cost (bureau pull, alternate data cost) and the marginal contribution of each data source to predictive power.
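
The two approaches amount to different routing rules, sketched below. The function names, the model labels and the medium-risk band boundaries (0.4 to 0.7) are hypothetical placeholders, not values from the deck.

```python
def route_approach_1(alt_score, bureau_hit, medium_band=(0.4, 0.7)):
    """Approach 1: everyone is scored on the alternate data model first;
    medium-risk customers with a bureau hit are rescored on the combined model."""
    low, high = medium_band  # assumed boundaries of the medium-risk segment
    if bureau_hit and low <= alt_score < high:
        return "combined_model"
    return "alternate_data_model"


def route_approach_2(bureau_hit):
    """Approach 2: combined model for the bureau hit segment,
    standalone alternate data model for the no hit segment."""
    return "combined_model" if bureau_hit else "alternate_data_model"


print(route_approach_1(alt_score=0.55, bureau_hit=True))
print(route_approach_2(bureau_hit=False))
```

Under Approach 1 a bureau pull is needed only for the medium-risk segment, which is one way the data-cost consideration mentioned above can enter the choice.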
Combining Alternative Data with Traditional Data
Prevalent methodologies to combine alternative data with traditional data

Approaches to combine Alternative and Traditional Data

Inputs: Traditional Data Features, Alternative Data Features

1. Single Model trained on combined dataset, with features from both sources
2. Alternative Model Score added as a feature to traditional data for model training
3. Traditional Model Score added as a feature to alternative data for model training
4. Two independent models are trained, and a matrix of scores from both models is used for decisioning
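
The last approach, a matrix of scores from two independent models, can be sketched as follows. The band cut-offs, the decision labels and the convention that a higher score means better credit quality are all assumptions made for illustration.

```python
def band(score, cutoffs=(0.33, 0.66)):
    """Map a score in [0, 1] to band 0 (low), 1 (medium) or 2 (high)."""
    return sum(score >= cutoff for cutoff in cutoffs)


# Hypothetical decision matrix.
# Rows: band of the traditional model score; columns: band of the alternative model score.
DECISION_MATRIX = [
    ["Decline", "Decline", "Refer"],
    ["Decline", "Refer", "Approve"],
    ["Refer", "Approve", "Approve"],
]


def decide(traditional_score, alternative_score):
    """Look up the decision for a pair of independently produced scores."""
    return DECISION_MATRIX[band(traditional_score)][band(alternative_score)]


print(decide(0.9, 0.9))
print(decide(0.1, 0.9))
```

Unlike the other three approaches, this one needs no joint training set: each model can be built and validated on its own population, and only the matrix has to be calibrated.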
Illustrative Alternative Data Use Case
Credit Scoring using Telco Data

Flow: Data Category → Feature Category → ML Algorithms → Scoring Engine

Data Category: Call Records, Location Data, User Info, Internet Usage, Top-Ups Data, VAS Data, Daily Balance, Postpaid Payment, SMS Data, Usage Duration, Mobile Wallet Txn, Device Info, Apps Data

Feature Category: Demographics, Income Related, Spend Related, Social Network, Employment

Illustrative Alternative Data Use Case
Credit Scoring using Device Data

Flow: Data Category → Feature Category → ML Algorithms → Scoring Engine

Data Category: Call Records, Location Data, SMS Data, Contacts Info, Device Info, Apps Info

Feature Category: Demographics, Income Related, Spend Related, Fixed Obligation, Social Network, Assets

ML Algorithms: XgBoost

Business Benefit of Analytics
Improved ROA

Use of predictive models instead of heuristic/rule-based models can significantly improve profitability, business volume and ROA

1. For instance, for a default prediction model, an improvement in the Gini coefficient from 40% to 50% would result in a lower Default Rate (DR) for the same approval rate (reduction to 1.3% DR from 3.0% DR at the same score cut-off for the ‘illustrative portfolio’) or a higher Approval Rate for the same default rate (improvement in Approval Rate from 72.7% to 89.1% at ~3% DR for the ‘illustrative portfolio’).

2. This would result in either higher business volumes at the same delinquency rates, or lower delinquency rates at the same business volume. In either case, ROA would improve significantly.

Illustrative portfolio (the first three result columns are for Gini = 40%, the last three for Gini = 50%):

Score Cut-Off Band | Applications | Defaults | DR for Approved Cases (Gini = 40%) | Approval Rate (Gini = 40%) | ROA (Gini = 40%) | DR for Approved Cases (Gini = 50%) | Approval Rate (Gini = 50%) | ROA (Gini = 50%)
1 | 10 | 8 | 5.7% | 98.2% | 0.1% | 5.6% | 98.2% | 0.2%
2 | 20 | 6 | 4.8% | 94.5% | 0.6% | 4.2% | 94.5% | 0.9%
3 | 30 | 5 | 4.1% | 89.1% | 1.0% | 2.9% | 89.1% | 1.6%
4 | 40 | 4 | 3.6% | 81.8% | 1.2% | 1.8% | 81.8% | 2.1%
5 | 50 | 4 | 3.0% | 72.7% | 1.5% | 1.3% | 72.7% | 2.4%
6 | 60 | 3 | 2.6% | 61.8% | 1.7% | 0.9% | 61.8% | 2.6%
7 | 70 | 3 | 2.2% | 49.1% | 1.9% | 0.7% | 49.1% | 2.6%
8 | 80 | 2 | 2.1% | 34.5% | 1.9% | 0.5% | 34.5% | 2.7%
9 | 90 | 2 | 2.0% | 18.2% | 2.0% | 0.0% | 18.2% | 3.0%
10 | 100 | 2 | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0%
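
The Approval Rate and DR columns for Gini = 40% can be reproduced from the Applications and Defaults columns, under the assumption that a cut-off at band k rejects bands 1 to k and approves the rest. That reading is inferred from the figures rather than stated in the deck. The ROA column is not reproduced, since its formula is not given.

```python
# Applications and Defaults per score band, as listed in the table (bands 1 to 10).
APPLICATIONS = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
DEFAULTS = [8, 6, 5, 4, 4, 3, 3, 2, 2, 2]


def cutoff_metrics(applications, defaults):
    """For each cut-off band k, return (k, DR % of approved cases, approval rate %)."""
    total_apps = sum(applications)
    rows = []
    for k in range(1, len(applications) + 1):
        approved_apps = sum(applications[k:])      # bands above the cut-off
        approved_defaults = sum(defaults[k:])
        approval_rate = 100 * approved_apps / total_apps
        default_rate = 100 * approved_defaults / approved_apps if approved_apps else 0.0
        rows.append((k, round(default_rate, 1), round(approval_rate, 1)))
    return rows


rows = cutoff_metrics(APPLICATIONS, DEFAULTS)
for band, default_rate, approval_rate in rows:
    print(band, default_rate, approval_rate)
```

For the Gini = 50% columns the approval rates are identical, but the defaults per band are not listed in the table, so those DR values cannot be recomputed from the figures shown.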
Challenges in using Alternative Data
Not all data is equal

1 Compliance with GDPR guidelines for expats
2 Data sparsity (incomplete datasets)
3 Data Integration challenges (e.g. customers will not have a common ID across data sources)
4 Unstructured formats (e.g. SMS data), not suitable for saving in RDBMS
5 Vendor Risk (e.g. financial strength of third-party data providers)
6 Data Quality and Veracity
7 Commercial Implications (Cost vs. Benefit)
8 Different predictive power for different data sources, so they cannot be used without performance assessment
