Chapter 9
Development of Data Processing
The history of DP can be divided into three phases as a result of technological advancements:
Manual DP: Manual DP involves processing data without much assistance from machines. Prior to the phase of mechanical DP, only small-scale data processing efforts were possible. Manual processing is still carried out in some special cases today, typically for data that cannot easily be read by machines, as in the case of retrieving data from outdated records.
Mechanical DP: This phase began in 1890 (Bohme et al., 1991), when a system made up of intricate punch card machines was installed in order to assist in processing a national population census, making it easier to search and compute the data than by hand.
Electronic DP: Finally, electronic DP replaced the other two phases, resulting in a fall in mistakes and rising efficiency. Most processing today is done by computers, and electronic DP is widely used in industry and research institutions.
The relevance of data processing and data science in the area of finance is increasing every day.
There are eleven significant areas where data science plays an important role.
Low Risk: This category includes data that is public and easily
recoverable. Such data poses minimal risk if accessed by
unauthorized individuals because it is intended for wide
distribution or has limited sensitivity.
Moderate Risk: Data in this category is not public but also not
critical to operations. It includes internal data such as
proprietary operating processes, product costs, and some
corporate documents. While this data is not intended for public
access, its exposure poses a lesser threat than high-risk data.
1. Determine Classification Criteria and Categories: Organizations need to establish clear criteria
and categories for classifying data. This involves understanding and defining the organization's
objectives for the data, and the implications of each category in terms of security, privacy, and
compliance requirements.
2. Set Up Operational Framework: Once categories are defined, organizations must outline the
roles and responsibilities of employees and third parties involved in data management. This
includes detailing how data should be stored, transferred, and retrieved within these roles.
3. Develop and Implement Policies and Procedures: Policies should clearly articulate the security
needs, confidentiality requirements, and handling procedures for each data type. These policies
need to be simple enough for all staff members to understand and follow, ensuring compliance
and mitigating security risks.
4. Understand the Current Setup: Before classifying data, it's essential to have a comprehensive
understanding of where all organizational data is stored and any relevant legislation that may
affect its handling. This step ensures that all data is accounted for and that the classification
aligns with legal requirements.
5. Create a Data Classification Policy: Developing a formal data classification policy is critical. This
policy serves as the backbone for all data classification efforts, providing guidelines that help
maintain compliance with data protection standards.
6. Prioritize and Organize Data: With a policy in place, data can be systematically categorized
according to its sensitivity and privacy requirements. This involves tagging data accurately and
prioritizing security measures based on the classification level of the data.
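As a rough illustration of step 6, the following Python sketch tags records with a sensitivity level. The category names, rule table, and record types are hypothetical, and a real policy engine would be considerably richer.

```python
from enum import Enum

# Hypothetical sensitivity levels mirroring the categories described above.
class Sensitivity(Enum):
    LOW = "low"            # public, easily recoverable data
    MODERATE = "moderate"  # internal, non-critical data
    HIGH = "high"          # data whose exposure would be damaging

# Illustrative classification rules; a real policy would be far more detailed.
RULES = {
    "press_release": Sensitivity.LOW,
    "product_costs": Sensitivity.MODERATE,
    "customer_pan": Sensitivity.HIGH,
}

def classify(record_type: str) -> Sensitivity:
    """Tag a record with a sensitivity level, defaulting to HIGH when unknown."""
    return RULES.get(record_type, Sensitivity.HIGH)

if __name__ == "__main__":
    for rec in ["press_release", "product_costs", "customer_pan", "unknown_export"]:
        print(rec, "->", classify(rec).value)
```

Defaulting unknown record types to the highest sensitivity is a conservative design choice; unclassified data is then protected until someone explicitly reviews it.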
Chapter – 9.3
Data Organization and Distribution
Data organization sorts unstructured data into clear categories and groups to facilitate easier access, analysis, and manipulation. As data volumes grow, organizing data becomes critical to reduce search times and enhance usability. In business contexts, both semi-structured and unstructured data are analyzed and integrated into comprehensive data stores that serve businesses across industries, turning data into valuable assets that drive decision-making and the overall enhancement of business models.
Data distribution organizes and quantifies the possible values of a variable and their probabilities of occurrence, which is essential for analysis. In practice, data distributions are often visualized using graphs such as histograms, box plots, and pie charts, which help to estimate the likelihood of specific observations within a data set. Probability distributions provide a framework for predicting the outcomes of the random variables involved, whether discrete or continuous, and support decision-making through statistical measures.
Types of distribution
(i) Discrete distributions: A discrete distribution is one that results from countable data and has a finite
number of potential values. Discrete distributions may be displayed in tables, and the values of the
random variable can be counted. Examples: rolling dice, obtaining a specific number of heads in a
series of coin tosses, etc.
Following are the discrete distributions of various types:
(a) Binomial distributions: The binomial distribution quantifies the chance of obtaining a specific
number of successes or failures in a fixed number of independent trials. The binomial distribution
applies to attributes that are categorised into two mutually exclusive and exhaustive classes, such as
success/failure or acceptance/rejection.
Example: When tossing a fair coin, the likelihood of the coin landing on heads is one-half and the
likelihood of it landing on tails is one-half.
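A minimal sketch of the coin-toss example, assuming SciPy is available; the number of tosses and the probabilities printed are illustrative only.

```python
from scipy.stats import binom

# Probability of k heads in 10 fair coin tosses (n = 10, p = 0.5).
n, p = 10, 0.5
for k in (0, 5, 10):
    print(f"P(exactly {k} heads) = {binom.pmf(k, n, p):.4f}")

# Probability of at most 3 heads, via the cumulative distribution function.
print(f"P(at most 3 heads) = {binom.cdf(3, n, p):.4f}")
```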
(b) Poisson distribution: The Poisson distribution is the discrete probability distribution that
quantifies the chance of a certain number of events occurring in a given time period, where the
events occur independently and at a constant average rate.
The Poisson distribution applies to attributes that can potentially take on large values but in practice
take on small ones.
Example: Number of flaws, mistakes, accidents, absentees etc.
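A short sketch of the flaw-count example, assuming SciPy is available and an illustrative average rate of 2 flaws per unit.

```python
from scipy.stats import poisson

# Suppose flaws occur at an average rate of 2 per unit (lambda = 2).
lam = 2
for k in range(5):
    print(f"P({k} flaws) = {poisson.pmf(k, lam):.4f}")

# Probability of observing more than 4 flaws in a unit.
print(f"P(more than 4 flaws) = {poisson.sf(4, lam):.4f}")
```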
(c) Hypergeometric distribution: The hypergeometric distribution is a discrete distribution that
assesses the chance of a certain number of successes in (n) trials drawn without replacement from a
sufficiently large population (N). The hypergeometric distribution is comparable to the binomial
distribution; the primary distinction between the two is that in the binomial distribution the chance
of success is the same for every trial, whereas in the hypergeometric distribution it changes from
trial to trial because sampling is done without replacement.
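A short sketch of sampling without replacement, assuming SciPy is available; the population size, number of defectives, and sample size are illustrative.

```python
from scipy.stats import hypergeom

# Population of N = 50 items containing K = 10 defectives; draw n = 5 without replacement.
N, K, n = 50, 10, 5
for k in range(3):
    # scipy's hypergeom.pmf(k, M, n, N) uses M = population size,
    # n = number of successes in the population, N = sample size.
    print(f"P({k} defectives in sample) = {hypergeom.pmf(k, N, K, n):.4f}")
```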
(d) Geometric distribution: The geometric distribution is a discrete distribution that assesses the
probability of the occurrence of the first success. A possible extension is the negative binomial
distribution.
Example: A marketing representative from an advertising firm chooses hockey players from
several institutions at random till he discovers an Olympic participant.
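A sketch of this example, assuming SciPy is available and an assumed qualification probability of 0.05 per player interviewed.

```python
from scipy.stats import geom

# Probability that the first Olympic participant is found on the k-th player
# interviewed, assuming each player independently qualifies with p = 0.05.
p = 0.05
for k in (1, 5, 10):
    print(f"P(first success on trial {k}) = {geom.pmf(k, p):.4f}")

# Expected number of players interviewed until the first success: 1/p.
print(f"Expected trials until first success = {geom.mean(p):.1f}")
```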
(ii) Continuous distributions: A distribution with an unlimited number of (variable) data points
that may be represented on a continuous measuring scale. A continuous random variable is a
random variable with an unlimited and uncountable set of potential values. It is more than a
simple count and is often described using a probability density function (pdf). The probability
density function describes the characteristics of the random variable: the frequency distribution
typically clusters around central values, and the probability density function can be viewed as the
distribution's "shape."
Following are the continuous distributions of various types:
(i) Normal distribution: Gaussian distribution is another name for the normal distribution. It is a bell-
shaped curve with a greater frequency (probability density) around the central point. As values move
away from the centre on either side, the frequency drops off sharply. In other words, features whose
measurements are expected to fall on either side of the target value with equal likelihood follow a
normal distribution.
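A brief sketch of these properties, assuming SciPy is available; the mean and standard deviation used are the standard normal values and are illustrative only.

```python
from scipy.stats import norm

# Standard normal: mean 0, standard deviation 1 (illustrative values).
mu, sigma = 0.0, 1.0

# About 68% of observations fall within one standard deviation of the mean.
within_one_sigma = norm.cdf(mu + sigma, mu, sigma) - norm.cdf(mu - sigma, mu, sigma)
print(f"P(|X - mu| <= sigma) = {within_one_sigma:.4f}")  # ~0.6827

# Density is highest at the centre and falls off symmetrically on both sides.
for x in (-2, -1, 0, 1, 2):
    print(f"pdf({x}) = {norm.pdf(x, mu, sigma):.4f}")
```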
(ii) Lognormal distribution: A continuous random variable x follows a lognormal distribution if the
distribution of its natural logarithm, ln(x), is normal. Such variables typically arise as the product of
many independent positive factors: as the number of factors rises, the distribution of the sum of
their logarithms approaches a normal distribution, regardless of the distributions of the individual
factors, so the product itself is approximately lognormal.
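A small simulation sketch, assuming NumPy is available; the mean and sigma of the underlying normal are illustrative, and the check simply confirms that ln(x) behaves like a normal variable.

```python
import numpy as np

rng = np.random.default_rng(0)

# Draw lognormal samples: exp of a normal variable with the given mean and sigma.
samples = rng.lognormal(mean=0.0, sigma=0.5, size=100_000)

# The natural log of the samples should itself look normal.
logs = np.log(samples)
print(f"mean of ln(x) ~ {logs.mean():.3f}, std of ln(x) ~ {logs.std():.3f}")
```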
(iii) F distribution: The F distribution is often employed to examine the equality of variances
between two normal populations. The F distribution is an asymmetric distribution with no
maximum value and a minimum value of 0. The curve approaches 0 but never reaches the
horizontal axis.
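A rough sketch of using the F distribution to compare two sample variances, assuming NumPy and SciPy are available; the sample sizes and the common standard deviation are illustrative.

```python
import numpy as np
from scipy.stats import f

rng = np.random.default_rng(1)

# Two samples from normal populations with equal variance.
a = rng.normal(loc=0, scale=2, size=30)
b = rng.normal(loc=0, scale=2, size=25)

# The ratio of sample variances follows an F distribution when the variances are equal.
F_stat = a.var(ddof=1) / b.var(ddof=1)
p_value = f.sf(F_stat, len(a) - 1, len(b) - 1)
print(f"F = {F_stat:.3f}, one-sided p-value = {p_value:.3f}")
```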
(iv) Chi square distributions: When independent variables with a standard normal distribution are
squared and added, the chi square distribution results. Example: y = Z₁² + Z₂² + Z₃² + Z₄² + ... + Zₙ²,
where each Zᵢ is a standard normal random variable. The chi square distribution is positively skewed,
is bounded below at zero, and approaches the shape of the normal distribution as the number of
degrees of freedom grows.
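A short simulation sketch of this construction, assuming NumPy and SciPy are available; the number of degrees of freedom and the sample count are illustrative.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(2)

# The sum of squares of k independent standard normals follows a chi-square
# distribution with k degrees of freedom.
k = 4
z = rng.standard_normal(size=(100_000, k))
y = (z ** 2).sum(axis=1)

print(f"simulated mean = {y.mean():.2f} (theory: {k})")
print(f"simulated variance = {y.var():.2f} (theory: {2 * k})")
print(f"P(Y <= {k}) from scipy = {chi2.cdf(k, df=k):.4f}")
```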
(v) Exponential distribution: The exponential distribution is one of the most frequently employed
continuous probability distributions. It is often used to represent products with a constant failure
rate and is closely connected to the Poisson distribution. Its failure rate is constant because the
distribution is memoryless: its shape characteristics do not change over time.
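A minimal sketch, assuming SciPy is available and an illustrative failure rate of 0.5 failures per hour; note that SciPy parameterises the exponential by its scale, the reciprocal of the rate.

```python
from scipy.stats import expon

# Constant failure rate lambda = 0.5 failures per hour (illustrative value);
# scipy parameterises the exponential by scale = 1 / lambda.
lam = 0.5
scale = 1 / lam

# Probability a unit survives beyond 3 hours, and its expected lifetime.
print(f"P(lifetime > 3 hours) = {expon.sf(3, scale=scale):.4f}")
print(f"expected lifetime = {expon.mean(scale=scale):.1f} hours")
```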
Data cleaning is a fundamental process in data management, consisting of several key steps to
ensure the accuracy and utility of a dataset:
1. Removal of Duplicate and Irrelevant Information: Identify and eliminate any duplicate entries
or data points that are not relevant to the study's focus, such as data from unrelated demographic
groups, to streamline the dataset and align it more closely with research objectives.
2. Fix Structural Errors: Correct inconsistencies in the data, such as typos, unusual naming
conventions, or inconsistent capitalization, which can lead to mislabeled categories or erroneous
classifications.
3. Filter Unwanted Outliers: Evaluate outliers to determine whether they represent errors or are
valid data points that could potentially confirm a hypothesis. Remove outliers only if they are
clearly erroneous or irrelevant to the analysis.
4. Handle Missing Data: Address missing values, which are problematic for many analytical
algorithms. Options include removing observations with missing data, which risks losing valuable
information, or imputing missing values based on other observations, which could introduce bias.
5. Validation and QA: After cleaning, validate the data to ensure it makes sense, adheres to
applicable standards, and supports or refutes the working hypothesis. Check if the data patterns
can generate further hypotheses, and establish a culture of data quality within the organization
to prevent the future generation of flawed data.
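A compact pandas sketch of steps 1 to 4 on a hypothetical dataset; the column names, outlier threshold, and fill value are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical raw dataset exhibiting the problems described above.
raw = pd.DataFrame({
    "customer": ["Alice", "alice", "Bob", "Carol", None],
    "region":   ["North", "north", "South", "East", "East"],
    "spend":    [120.0, 120.0, 90.0, 15000.0, 60.0],   # 15000.0 is a suspect outlier
})

df = raw.copy()
df["customer"] = df["customer"].str.strip().str.title()   # fix structural errors
df["region"] = df["region"].str.title()
df = df.drop_duplicates()                                  # remove duplicate rows
df = df[df["spend"] < 5000]                                # filter an implausible outlier
df["customer"] = df["customer"].fillna("Unknown")          # handle missing data
print(df)
```

Whether an outlier like the 15000.0 value is removed or kept would depend on the hypothesis being tested, as noted in step 3; here it is dropped purely for illustration.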
Benefits of quality data
Determining the quality of data requires an analysis of its properties and a weighting of those
attributes based on what is most essential to the company and the application(s) for which the
data will be utilised.
Ultimately, having clean data boosts overall productivity and provides the highest-quality
information for decision-making. Benefits include:
(i) Error correction when numerous data sources are involved.
(ii) Fewer mistakes result in happier customers and less irritated workers.
(iii) Capability to map the many functions and planned uses of your data.
(iv) Monitoring mistakes and improving reporting to determine where errors are originating can
make it easier to repair inaccurate or damaged data in future applications.
(v) Using data cleaning technologies will result in more effective corporate procedures and speedier
decision-making.
Data validation is a critical yet often overlooked step in data management that ensures the
accuracy, clarity, and relevance of data before its use. It involves verifying the precision and
appropriateness of both the data inputs and the data model itself. Modern data integration
systems can automate and integrate validation into the workflow, streamlining the process and
preventing it from being a bottleneck. Validating data helps avoid "garbage in, garbage out" issues
and ensures that decisions are based on reliable and current information, ultimately supporting
the validity of analytical conclusions and mitigating the risk of project failures.
1. Data type check: A data type check verifies that the entered data has the appropriate data
type. For instance, a field may only take numeric values. If this is the case, the system should
reject any data containing other characters, such as letters or special symbols.
2. Code check: A code check verifies that a field’s value is picked from a legitimate set of options
or that it adheres to specific formatting requirements. For instance, it is easy to verify the validity
of a postal code by comparing it to a list of valid codes. The same principle may be extended to
other things, including nation codes and NIC industry codes.
3. Range check: A range check determines whether or not input data falls inside a specified range.
Latitude and longitude, for instance, are frequently employed in geographic data. A latitude value
must fall between -90 and 90 degrees, whereas a longitude value must fall between -180 and
180 degrees. Outside of this range, values are invalid.
4. Format check: Numerous data kinds adhere to a set format. Date columns that are kept in a
fixed format, such as “YYYY-MM-DD” or “DD-MM-YYYY,” are a popular use case. A data
validation technique that ensures dates are in the correct format contributes to data and
temporal consistency.
5. Consistency check: A consistency check is a form of logical check that verifies that the data has
been input in a consistent manner. Checking whether a package’s delivery date is later than its
shipment date is one example.
6. Uniqueness check: Some data, such as PAN numbers or e-mail IDs, are unique by nature. These fields
should typically contain unique values in a database. A uniqueness check guarantees that an item is
not inserted into a database multiple times.
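The sketch below, written in plain Python with hypothetical field names and an illustrative country-code list, shows how several of these checks (data type, code, range, format, and consistency) can be combined into one validation routine; a uniqueness check would normally be enforced by a database constraint instead.

```python
from datetime import datetime

VALID_COUNTRY_CODES = {"IN", "US", "GB"}   # illustrative code list

def validate(record: dict) -> list:
    """Return a list of validation failures for one record (a minimal sketch)."""
    errors = []
    # Data type check: quantity must be numeric.
    if not isinstance(record.get("quantity"), (int, float)):
        errors.append("quantity must be numeric")
    # Code check: country must come from an approved list.
    if record.get("country") not in VALID_COUNTRY_CODES:
        errors.append("unknown country code")
    # Range check: latitude must lie between -90 and 90 degrees.
    if not -90 <= record.get("latitude", 0) <= 90:
        errors.append("latitude out of range")
    # Format check: dates must be in YYYY-MM-DD format.
    try:
        ship = datetime.strptime(record["ship_date"], "%Y-%m-%d")
        delivery = datetime.strptime(record["delivery_date"], "%Y-%m-%d")
        # Consistency check: delivery cannot precede shipment.
        if delivery < ship:
            errors.append("delivery date earlier than ship date")
    except (KeyError, ValueError):
        errors.append("dates missing or not in YYYY-MM-DD format")
    return errors

record = {"quantity": 3, "country": "IN", "latitude": 12.97,
          "ship_date": "2024-01-05", "delivery_date": "2024-01-03"}
print(validate(record))   # ['delivery date earlier than ship date']
```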