Business Intelligence

Business Intelligence (BI) can be defined as a set of mathematical models and analytical methodologies that leverage available data to generate information and knowledge, which are essential for supporting complex decision-making processes.

Effective and Timely Decision-Making in Organizations


1. Decision-Making in Complex Organizations
 Decisions are made continuously at various hierarchical levels in both public and private organizations.
 Decisions can have either short-term or long-term effects, and their importance varies.
 The ability of knowledge workers to make decisions greatly influences the performance and competitive
strength of an organization.
2. Traditional Decision-Making Approaches
 Knowledge workers often rely on intuitive and experience-based decision-making methodologies.
 This approach, while easy and intuitive, leads to stagnant decision-making in dynamic environments.
 Unstable conditions in the economic environment require more rigorous methods than intuition alone.
3. The Need for Analytical Methods and Mathematical Models
 Modern decision-making processes are too complex and dynamic to be handled through intuition alone.
 Advanced analytical methodologies and mathematical models are required for effective decision-making.
 The strategic value of analytics is recognized for gaining a competitive advantage.

Feature         | Timely Decisions                            | Effective Decisions
Definition      | Quick decisions made to respond to change   | Decisions that achieve desired outcomes
Focus           | Speed and responsiveness                    | Quality and accuracy
Characteristics | Reactive, prioritizes rapid response        | Analytical, considers multiple alternatives
Importance      | Crucial for survival in competitive markets | Essential for long-term success

Examples of Complex Decision-Making


Example 1.1 – Customer Retention in the Mobile Phone Industry
 Situation: The marketing manager of a mobile phone company faces high customer churn (attrition).
 Problem: Out of 2 million customers, the manager has a budget to target only 2,000 customers for a retention
campaign.
 Objective: Estimate the churn probability for each customer to target the ones most likely to leave.
 Solution: Use advanced mathematical models and data mining techniques to identify the customers with the
highest churn likelihood and maximize retention.
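A minimal sketch of how such churn scoring might look in Python with scikit-learn, assuming a hypothetical customers.csv whose feature columns (monthly_minutes, complaints, tenure_months) and churned label are invented for illustration; the text describes the approach only in general terms:

```python
# A sketch of churn scoring, assuming a hypothetical customers.csv with
# made-up feature columns and a historical "churned" label.
import pandas as pd
from sklearn.linear_model import LogisticRegression

customers = pd.read_csv("customers.csv")                        # hypothetical extract
features = ["monthly_minutes", "complaints", "tenure_months"]   # assumed columns

model = LogisticRegression(max_iter=1000)
model.fit(customers[features], customers["churned"])

# Estimate a churn probability per customer, then keep the 2,000 most
# at-risk customers for the retention campaign, as in Example 1.1.
customers["churn_prob"] = model.predict_proba(customers[features])[:, 1]
campaign_targets = customers.nlargest(2000, "churn_prob")
```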
Example 1.2 – Logistics Planning in Manufacturing
 Situation: A logistics manager needs to create a medium-term logistics plan.
 Problem: Decisions involve demand allocation, procurement, production planning, and distribution across many
suppliers, facilities, and products.
 Objective: Develop a plan for efficient resource allocation across a complex network of suppliers and production
sites.
 Solution: Apply optimization models to handle the complexity and develop the best logistics strategy.
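As a sketch of what such an optimization model could look like, the following uses scipy.optimize.linprog to split demand between two hypothetical plants; the costs, capacities, and demand figure are illustrative assumptions, not from the text:

```python
# Two hypothetical plants supply one product; minimize production cost
# subject to plant capacities and a total demand of 1,200 units.
from scipy.optimize import linprog

cost = [4.0, 5.5]              # unit cost at plant A and plant B (assumed)
A_eq, b_eq = [[1, 1]], [1200]  # x_A + x_B must equal demand
bounds = [(0, 800), (0, 700)]  # per-plant capacity limits (assumed)

res = linprog(c=cost, A_eq=A_eq, b_eq=b_eq, bounds=bounds, method="highs")
print(res.x)                   # optimal quantities, e.g. [800., 400.]
```

The solver naturally loads the cheaper plant to capacity first, which is exactly the kind of trade-off a medium-term logistics plan must resolve across many more products and sites.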

The Role of Business Intelligence Systems


1. Purpose of Business Intelligence Systems
 Business Intelligence (BI) systems provide tools and methodologies to support effective and timely decision-
making.
 BI helps in turning raw data into actionable insights, enhancing the overall quality of decisions.
2. Effective Decisions
 Rigorous analytical methods help decision-makers rely on more dependable information.
 These methods force decision-makers to clearly define criteria and mechanisms for evaluation, leading to more
informed decisions.
3. Timely Decisions
 In competitive and dynamic environments, timely reactions to market conditions are critical.
 BI systems enable faster analysis and reaction, improving the speed of decision-making.

Advantages of Business Intelligence Systems


1. Improved Decision Quality: BI allows the analysis of a larger number of alternative actions, leading to more
accurate conclusions.
2. Increased Effectiveness: BI enhances decision-making processes, resulting in more effective strategies and
achieving business objectives.
3. Faster Decision-Making: Mathematical models and algorithms speed up the evaluation of options, ensuring
decisions are both effective and timely.
By leveraging BI systems, organizations can improve their ability to handle complex decision-making processes and
adapt to changing economic environments efficiently.

Data, Information, and Knowledge in Decision-Making


1. Vast Amounts of Data in Organizations
 Public and private organizations have accumulated vast amounts of data.
 These data originate from internal transactions (administrative, logistical, commercial) and external sources.
 Despite being systematically gathered and stored, raw data cannot be directly used for decision-making.
2. Data Transformation for Decision-Making
 Data needs to be processed using extraction tools and analytical methods to transform it into information and
knowledge for decision-makers.

Feature         | Data                                | Information                                    | Knowledge
Definition      | Raw facts and figures               | Processed and organized data with meaning      | Understanding gained through analysis and experience
Characteristics | Unstructured, lacks context         | Contextualized, answers who, what, where, when | Interpreted data, applied understanding
Examples        | Sales numbers, temperature readings | Monthly sales reports, demographic statistics  | Market trends analysis, strategic insights
Role in BI      | Foundation for analysis             | Provides context for decision-making           | Leads to informed actions and strategies
Active vs. Passive Knowledge Extraction
1. Passive Knowledge Extraction
 Passive extraction occurs through analysis criteria defined by decision-makers.
2. Active Knowledge Extraction
 Active extraction involves applying mathematical models such as inductive learning or optimization.

Knowledge Management and Business Intelligence


1. Knowledge as an Intangible Asset
 Many organizations have developed formal mechanisms to gather, store, and share knowledge, recognizing it as
a valuable intangible asset.
2. Knowledge Management
 Knowledge management involves integrating decision-making processes with information technologies to
support knowledge workers.
 It focuses on unstructured and often implicit information, which is found in documents, conversations, and
experiences.
3. Business Intelligence
 Business Intelligence (BI) systems rely on structured and often quantitative information, typically stored in
databases.

Distinctions and Overlap Between BI and Knowledge Management


1. Key Differences
 Knowledge management deals with unstructured information, often implicit and stored in non-traditional
formats (e.g., documents, conversations).
 Business intelligence is focused on structured information, typically quantitative and organized in databases.
2. Blurring Boundaries
 The distinction between BI and knowledge management is fuzzy:
o With the advent of techniques like text mining, BI systems are increasingly capable of analyzing
unstructured information, such as emails and web pages.

The Role of Mathematical Models in Business Intelligence


1. Mathematical Models in Business Intelligence Systems
 Business Intelligence (BI) systems extract information and knowledge from data using mathematical models and
algorithms.
 Some analyses involve simple calculations like totals and percentages, while others require advanced
optimization and learning models.
2. Promoting a Scientific and Rational Approach
 The use of BI systems encourages a scientific and rational approach to enterprise management.
 Even basic tools, such as spreadsheets, push decision-makers to create a mental representation of processes
(e.g., financial flows).
3. Historical Use of Mathematical Models
 Classical scientific disciplines (e.g., physics) have long used mathematical models to represent real systems.
 In disciplines like operations research, scientific methods and models are applied to artificial systems (e.g.,
organizations).

Characteristics of Business Intelligence Analyses


1. Defining Objectives and Performance Indicators
 The first step is to identify the objectives of the analysis.
 Performance indicators are then defined to evaluate alternative options.
2. Developing Mathematical Models
 Mathematical models are developed based on relationships between system control variables, parameters, and
evaluation metrics.
3. Conducting What-If Analyses
 What-if analyses are performed to assess the impact of changes in control variables and parameters on
performance.
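A what-if analysis can be as small as the sketch below: a toy profit model in which price is the control variable and unit cost, base demand, and elasticity are parameters. All values are assumed for illustration:

```python
# A toy profit model: price is the control variable; unit cost, base
# demand, and elasticity are parameters (all values are illustrative).
def profit(price, unit_cost=6.0, base_demand=1000, elasticity=-1.5):
    demand = base_demand * (price / 10.0) ** elasticity
    return (price - unit_cost) * demand

# What-if analysis: evaluate the performance indicator at several prices.
for price in (8.0, 10.0, 12.0):
    print(f"price={price:.2f}  profit={profit(price):,.0f}")
```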

Advantages of Mathematical Models


1. Enhanced Decision-Making Effectiveness
 The primary goal of mathematical models is to improve the effectiveness of decision-making.
2. Deeper Understanding of the Domain
 Developing an abstract model forces decision-makers to focus on key features, leading to a deeper
understanding of the subject.
3. Long-Term Knowledge Transfer
 The knowledge gained from building a mathematical model can be more easily transferred within the
organization, leading to better preservation of knowledge.
4. Reusability of Models
 Mathematical models are general and flexible, allowing them to be applied to similar decision-making tasks in
the future.

Business Intelligence Architectures


1. Major Components of a Business Intelligence System
The architecture of a business intelligence (BI) system consists of three key components:
 Data Sources: Data is gathered from various primary and secondary sources, which include operational systems,
unstructured documents (like emails), and external providers. Integrating these heterogeneous sources requires
significant effort.
 Data Warehouses and Data Marts: Using Extract, Transform, Load (ETL) tools, data from various sources are
processed and stored in data warehouses or data marts, which are designed to support business intelligence
analysis.
 Business Intelligence Methodologies: Data is extracted and processed through mathematical models and
analysis methodologies to support decision-makers.

Business Intelligence Methodologies


1. Types of Methodologies
Several decision support applications can be implemented in a BI system:
 Multidimensional Cube Analysis: Allows for viewing data from multiple dimensions for in-depth analysis.
 Exploratory Data Analysis: Focuses on discovering patterns in data using statistical tools.
 Time Series Analysis: Involves analyzing data points collected over time.
 Inductive Learning Models for Data Mining: Uses algorithms to extract patterns and insights from data.
 Optimization Models: Help in determining the best course of action among alternatives.

BI System Pyramid Structure


1. Data Exploration (Third Level)
 Tools at this level involve query and reporting systems, as well as statistical methods.
 These are considered passive methodologies as they rely on decision-makers generating hypotheses and then
using tools to confirm their assumptions.
 Example: A sales manager noticing a revenue drop in a geographic area might use extraction and visualization
tools to test this hypothesis.
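A minimal sketch of this passive, hypothesis-driven exploration in pandas, assuming a hypothetical extract sales.csv with area, quarter, and revenue columns:

```python
# Aggregate revenue by area and quarter, then look for the suspected drop.
import pandas as pd

sales = pd.read_csv("sales.csv")   # hypothetical extract with area/quarter/revenue
by_area = (sales.groupby(["area", "quarter"])["revenue"]
                .sum()
                .unstack("quarter"))
print(by_area.pct_change(axis="columns"))  # quarter-over-quarter change per area
```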

2. Data Mining (Fourth Level)


 Active methodologies like machine learning, pattern recognition, and data mining extract information from
data without requiring a predefined hypothesis.
 The goal is to expand decision-makers’ knowledge by identifying patterns and trends in the data.
3. Optimization (Fifth Level)
 Optimization models are used to identify the best solution from a range of alternatives.
 Example: In logistics, optimization models can determine the best route or production plan.
4. Decision-Making (Top Level)
 The final stage in the pyramid is the decision-making process, where decision-makers choose and implement a
specific action.
 Decision-makers may integrate informal and unstructured information with BI system recommendations to
modify decisions.

Increasing Complexity and Roles in BI Systems


 As we move up the pyramid, BI systems offer more advanced and active support tools.
 The required expertise changes:
o At the bottom, database administrators handle the technical aspects of data.
o In the middle, analysts and experts in mathematical/statistical models manage data analysis.
o At the top, decision-makers responsible for the business domain lead the decision-making process.

Cycle of a Business Intelligence Analysis


The cycle of a business intelligence analysis follows a structured yet flexible path. While the specifics can vary based on
the application domain, decision-makers, and available tools, a typical business intelligence analysis evolves through four
main phases: Analysis, Insight, Decision, and Evaluation.

1. Analysis Phase
 Problem Identification: The first step is recognizing and clearly defining the problem at hand. Decision-makers
form a mental representation of the issue by identifying critical factors.
 Investigative Flexibility: Business intelligence methodologies, like multidimensional data cubes (discussed in
Chapter 3), provide the tools to explore different investigative paths. Decision-makers can flexibly adjust their
hypotheses as new insights emerge.
 Interactive Exploration: By using interactive tools, decision-makers can ask questions and get quick responses,
refining their understanding in a dynamic and iterative way.

2. Insight Phase
 Deep Understanding: In this phase, decision-makers go beyond surface-level observations to gain a deeper
understanding of the problem, often at a causal level. For example, after identifying a trend (e.g., customers
discontinuing an insurance policy), decision-makers look for common characteristics or profiles of the affected
group.
 Knowledge Extraction:
o Insight may stem from intuition and experience, using unstructured data and personal knowledge.
o Alternatively, structured data can be analyzed using inductive learning models to derive patterns and
trends.

3. Decision Phase
 Actionable Knowledge: The insights gained from the previous phase are translated into decisions. These
decisions, enabled by the faster analysis process provided by BI tools, lead to timely and effective actions.
 Reduction of Cycle Time: The availability of business intelligence tools accelerates the entire cycle, allowing
organizations to reduce the time between analysis, decision, and action. This enhances the quality of the
decision-making process and aligns better with organizational strategy.

4. Evaluation Phase
 Performance Measurement: After actions are implemented, the final phase involves assessing the effectiveness
of the decisions. Performance metrics should not focus solely on financial outcomes but also include other key
performance indicators (KPIs) relevant to different departments.
 Comprehensive Evaluation: As described elsewhere in the text, advanced methodologies enable comprehensive performance evaluations, providing a holistic view of the organization’s success across various dimensions.

Enabling Factors in Business Intelligence Projects


The success of business intelligence (BI) projects relies on several critical enabling factors: Technologies, Analytics, and
Human Resources. Each of these elements plays a pivotal role in the development and effectiveness of BI systems within
organizations.

1. Technologies
 Advanced Hardware and Software:
o The growth in computing capabilities, with microprocessors improving by an average of 100% every 18
months, has made advanced BI systems feasible.
o Reduced costs of technology enable the implementation of inductive learning methods and
optimization models, ensuring reasonable processing times.
 Data Visualization:
o Cutting-edge graphical visualization techniques, including real-time animations, enhance data
representation, making it easier for decision-makers to understand complex information.
 Mass Storage Capacity:
o The exponential increase in mass storage capabilities at decreasing costs allows organizations to store
terabytes of data, which is essential for effective BI systems.
 Network Connectivity:
o The establishment of Extranets and Intranets facilitates the flow of information and knowledge
extracted from BI systems, improving communication and data accessibility across departments.
 Integration of Technologies:
o The ability to integrate hardware and software from different suppliers or developed internally is crucial
for the effective deployment of data analysis tools.

2. Analytics
 Role of Mathematical Models:
o Mathematical models and analytical methodologies are vital for enhancing information and extracting
knowledge from data within organizations.
o While data visualization supports decision-making, it is considered a passive form of support.
 Active Analytical Models:
o To provide more substantial support, it is essential to implement advanced models of inductive learning
and optimization techniques. These allow organizations to actively analyze data, derive insights, and
improve the decision-making process.

3. Human Resources
 Organizational Competencies:
o The success of a BI system depends significantly on the skills and competencies of the individuals within
the organization. This collective knowledge forms the organizational culture.
 Knowledge Workers' Impact:
o Knowledge workers’ abilities to acquire information and translate it into practical actions greatly
influence the quality of decision-making.
o Even with an advanced BI system, the effectiveness of analyses and interpretation of results depends on
the skills and creativity of the human resources involved.
 Mental Agility and Adaptability:
o Companies that foster an environment where knowledge workers possess mental agility and are open to
changing their decision-making styles will have a competitive advantage.
o The willingness to embrace change and utilize analytical tools effectively can lead to innovative solutions
and successful action plans.

Development of a Business Intelligence System


The development of a business intelligence (BI) system is akin to managing a project that has specific objectives,
development timelines, costs, and the coordination of resources needed to execute planned activities. The typical
development cycle of a BI architecture encompasses several key phases, although individual organizations may adapt this
process based on their specific needs and existing infrastructure.
1. Analysis
 Identifying Organizational Needs:
o The initial phase involves a comprehensive assessment of the organization's needs regarding the BI
system.
o This is typically accomplished through interviews with knowledge workers who occupy various roles
within the organization.
 Project Objectives and Priorities:
o Decision-makers must clearly outline the general objectives and priorities of the BI project.
o The anticipated costs and benefits associated with developing the system are also defined during this
phase.

2. Design
1. Architecture Planning:
o Create a flexible plan for future growth and changes.
2. Infrastructure Assessment:
o Review existing systems to see what needs upgrading or building.
3. Decision-Making Processes:
o Analyze the processes BI will support to understand data needs.
4. Project Planning:
o Set up a clear plan with:
 Phases
 Priorities
 Timeline & Costs
 Roles & Resources

3. Planning
 Defining Functions:
o This stage includes a detailed definition of the functions of the BI system, ensuring all requirements are
captured.
 Data Assessment:
o Existing data is evaluated alongside potential external data sources to identify what can be integrated
into the BI architecture.
 Designing Information Structures:
o The information structures, including a central data warehouse and possibly satellite data marts, are
designed based on the assessed data.
 Mathematical Models:
o The mathematical models necessary for data analysis are defined, ensuring all required data is available
and that algorithms are efficient.
 Prototyping:
o A low-cost, limited-capability system prototype is created to identify discrepancies between actual
needs and project specifications early in the development process.

4. Implementation and Control


 Data Warehouse and Data Marts Development:
o The first sub-phase involves developing the data warehouse and each specific data mart, which serve as
the information infrastructure for the BI system.

 Creating Metadata Archives:


o A metadata archive is established to explain the data contained in the warehouse and the
transformations applied to the primary data.
 ETL Procedures:
o ETL (Extract, Transform, Load) procedures are set up to extract data from primary sources, transform it
as needed, and load it into the data warehouse and data marts.
 Developing Core Applications:
o The next step involves developing the core BI applications that facilitate planned analyses and support
decision-making.
 Testing and Usage:
o Finally, the entire system undergoes a phase of testing before being released for operational use.

Ethics and Business Intelligence


The implementation of business intelligence (BI) methodologies, data mining techniques, and decision support systems
presents several ethical challenges that warrant careful consideration. While advancements in these areas offer
numerous opportunities, they also bring potential distortions and risks that must be addressed through appropriate
control mechanisms.

Key Ethical Concerns


1. Respect for Privacy:
o Improper Data Use: Organizations must protect individuals' privacy when using their data.
o Invasive Investigations: Avoid collecting too much personal data that could harm consumers or
employees.
2. Power Dynamics:
o Imbalance of Power: BI can give companies too much power over customers or society, so there should
be rules to share benefits fairly.
3. Responsibility to Stakeholders:
o Considering All Stakeholders: Companies should think about investors, employees, and the
community—not just profits.
o Ethical Decisions: Profit-driven decisions can sometimes harm society. BI users must balance profit with
social impact.

Ethical Dilemmas in Business Intelligence


 Data Enrichment: Sharing private data without consent raises ethical concerns.
 Profit vs. Ethics: BI models might prioritize profit (like tax avoidance or cost-cutting in safety), which could lead to
unethical outcomes.

What Is a Data Warehouse?


 Definition: Data warehousing provides architectures and tools for business executives to systematically organize,
understand, and use their data for strategic decision-making. Data warehouses are valuable in today's
competitive landscape, with many firms investing heavily in building enterprise-wide systems.
 Understanding Data Warehouse: A data warehouse is loosely defined as a data repository maintained separately
from an organization's operational databases, allowing for integration of various application systems and
supporting information processing with a consolidated historical data platform for analysis.
Key Characteristics of a Data Warehouse:
1. Subject-Oriented:
o Focuses on a specific theme (e.g., sales, marketing), not daily operations.
o Helps with decision-making by removing unnecessary data and providing clear, thematic insights.
2. Integrated:
o Combines data from various sources (e.g., mainframes, databases) into a unified, consistent format.
o Ensures consistency in naming, formats, and codes across the system for effective analysis.
3. Time-Variant:
o Stores data over time (e.g., weekly, monthly) for long-term analysis.
o Data is fixed and cannot be updated once stored, making it useful for historical comparisons.
4. Non-Volatile:
o Data is permanent and not erased when new data is added.
o Data is read-only and refreshed periodically, maintaining historical records for long-term analysis.

 Purpose and Architecture: A data warehouse serves as a semantically consistent data store for strategic
decision-making. It integrates data from multiple sources to support structured and ad hoc queries, analytical
reporting, and overall decision-making processes.
 Data Warehousing Process: The construction and utilization of data warehouses involve data cleaning,
integration, and consolidation. Decision support technologies enable knowledge workers (e.g., managers,
analysts) to efficiently access data and make informed decisions.
Applications of Data Warehousing
1. Customer Focus: Analyze customer preferences and behavior.
2. Product Management: Track product sales and performance over time.
3. Operational Analysis: Identify profit sources in operations.
4. Customer Relationship Management: Adapt strategies based on customer data.

1. Integration of Different Databases:


o Data warehousing combines data from different sources into one system, making it easier to access and
use.
2. Comparison with Traditional Methods:
o Traditional databases require complex filtering when you want to use the data. Data warehouses prepare
the data in advance, making it faster and easier to analyze.
3. Advantages of Data Warehousing:
o Stores historical data, handles complex queries, and speeds up analysis by organizing data beforehand.
This makes it very popular for businesses.

Differences between Operational Database Systems and Data Warehouses


 Overview: Understanding the distinction between operational database systems and data warehouses is
facilitated by recognizing their unique purposes and functionalities. Operational database systems, known as
online transaction processing (OLTP) systems, focus on day-to-day operations, while data warehouses, referred
to as online analytical processing (OLAP) systems, are designed for data analysis and decision-making.
 Primary Functions:
o OLTP Systems: Designed for online transaction and query processing, covering activities such as
purchasing, inventory management, banking, payroll, and accounting.
o OLAP Systems: Serve knowledge workers by organizing and presenting data for analysis, facilitating
informed decision-making through diverse data formats.
 Distinguishing Features:
o Users and System Orientation:
 OLTP: Customer-oriented, used primarily for transaction processing by clerks, clients, and IT
professionals.
 OLAP: Market-oriented, used for data analysis by knowledge workers like managers, executives,
and analysts.
o Data Contents:
 OLTP: Manages current, detailed data that is often too granular for effective decision-making.
 OLAP: Handles large volumes of historical data, offering summarization and aggregation
facilities, making it more suitable for informed decision-making at various levels of granularity.
o Database Design:
 OLTP: Typically employs an entity-relationship (ER) model with an application-oriented database
design.
 OLAP: Often utilizes a star or snowflake model along with a subject-oriented database design,
facilitating analytical processes.
o View of Data:
 OLTP: Focuses primarily on current data within an organization or department, generally
excluding historical data.
 OLAP: Often encompasses multiple versions of a database schema, integrating information from
various sources and organizations to support broader analytical queries.
o Access Patterns:
 OLTP: Characterized by short, atomic transactions that necessitate concurrency control and
recovery mechanisms.
 OLAP: Primarily involves read-only operations (since most data warehouses contain historical information), although these operations can involve complex queries.
Data Warehousing: A Multitiered Architecture
 Overview: Data warehousing commonly employs a three-tier architecture to effectively manage and analyze
large volumes of data. This architecture enhances data processing and decision-making capabilities within
organizations.
 Architecture Tiers:


o Bottom Tier:
 Warehouse Database Server: Typically a relational database system that serves as the core data
repository.
 Back-End Tools and Utilities: These components extract data from operational databases and
external sources (e.g., customer profiles). They perform essential functions such as data
extraction, cleaning, and transformation to unify data from different sources.
 Loading and Refresh Functions: These processes update the data warehouse to ensure that it
contains current information.
 Gateways: Application program interfaces (APIs) that facilitate data extraction, allowing client
programs to execute SQL code on the server. Examples include:
 ODBC (Open Database Connectivity)
 OLEDB (Object Linking and Embedding Database) by Microsoft
 JDBC (Java Database Connectivity)
 Metadata Repository: This component stores critical information about the data warehouse and
its contents, providing context and facilitating data management.
o Middle Tier:
 OLAP Server: This tier is responsible for data analysis and is typically implemented using one of
two models:
 Relational OLAP (ROLAP): An extended relational DBMS that maps operations on
multidimensional data to standard relational operations.
 Multidimensional OLAP (MOLAP): A special-purpose server designed to directly
implement multidimensional data and operations, optimizing analytical performance.
 OLAP Servers Discussion: Further details about OLAP servers are covered in Section 4.4.4,
highlighting their role in data analysis.
o Top Tier:
 Front-End Client Layer: This layer includes various tools that facilitate user interaction with the
data warehouse. It typically contains:
 Query and Reporting Tools: Allow users to retrieve and present data in a meaningful
way.
 Analysis Tools: Tools for conducting in-depth analysis of the data.
 Data Mining Tools: Techniques for exploring data patterns, trends, and predictions (e.g.,
trend analysis, forecasting).

Data Warehouse Models: Enterprise Warehouse, Data Mart, and Virtual Warehouse
 Overview: There are three primary data warehouse models from an architectural perspective: the enterprise
warehouse, the data mart, and the virtual warehouse. Each model serves different organizational needs and
provides varying levels of data integration and accessibility.
 Models:
o Enterprise Warehouse:
 Definition: An enterprise warehouse encompasses all information across the organization,
providing comprehensive corporate-wide data integration from multiple operational systems and
external information sources.
 Scope: Cross-functional, containing both detailed and summarized data.
 Size: Ranges from a few gigabytes to terabytes or more.
 Implementation: Can be built on traditional mainframes, computer superservers, or parallel
architecture platforms.
 Design Process: Requires extensive business modeling and can take years to design and
construct.
o Data Mart:
 Definition: A data mart is a subset of corporate-wide data tailored for a specific group of users,
focusing on selected subjects relevant to that group.
 Examples: A marketing data mart may concentrate on data related to customers, items, and
sales.
 Data Characteristics: Typically contains summarized data, making it easier to analyze.
 Implementation: Often deployed on low-cost departmental servers using Unix/Linux or
Windows, with implementation cycles measured in weeks rather than months or years.
 Types:
 Independent Data Marts: Sourced from data captured in operational systems, external
providers, or locally generated data within specific departments.
 Dependent Data Marts: Directly sourced from enterprise data warehouses.
o Virtual Warehouse:
 Definition: A virtual warehouse consists of a set of views over operational databases that provide an efficient means of querying data.
 Materialization: Only select summary views may be materialized for effective query processing.
 Construction: Easy to build but demands excess capacity on operational database servers.
 Development Approaches:
o Top-Down Approach:
 Advantages: Provides a systematic solution, minimizing integration issues.
 Disadvantages: High cost, lengthy development time, and limited flexibility due to the challenge
of establishing a consistent data model across the organization.
o Bottom-Up Approach:
 Advantages: Offers flexibility, low cost, and rapid return on investment.
 Disadvantages: May result in challenges when integrating various independent data marts into a
cohesive enterprise data warehouse.
 Recommended Development Method:
o Incremental and Evolutionary Implementation:
 Initial Step: Define a high-level corporate data model within a short timeframe (one or two
months) to ensure a consistent, integrated view of data across various subjects.
 Second Step: Implement independent data marts in parallel with the enterprise warehouse
based on the established corporate data model.
 Third Step: Construct distributed data marts to integrate various data marts using hub servers.
 Final Step: Build a multitier data warehouse where the enterprise warehouse acts as the sole
custodian of all warehouse data, distributing it to various dependent data marts.

ETL Tools
 Overview: ETL (Extract, Transform, Load) tools are software applications designed to automate three primary
functions: extraction, transformation, and loading of data into a data warehouse. These tools play a crucial role
in data warehousing by ensuring that the data is efficiently collected, cleaned, and made ready for analysis.
 Functions:
o Extraction:
 Definition: The first phase involves extracting data from various internal and external sources.
 Initial vs. Incremental Extraction:
 Initial Extraction: Involves populating the empty data warehouse with all historical data.
 Incremental Extraction: Involves updating the data warehouse with new data as it
becomes available over time.
 Data Selection: The selection of data for import is guided by the data warehouse design, which
is influenced by the information requirements of business intelligence analyses and decision
support systems relevant to specific application domains.
o Transformation:
 Goal: The purpose of the transformation phase is to enhance the quality of the extracted data by
addressing inconsistencies, inaccuracies, and missing values.
 Common Issues Addressed:
 Inconsistencies: Corrections made for discrepancies between values recorded in
different attributes that share the same meaning.
 Data Duplication: Removal of duplicate records.
 Missing Data: Identification and handling of missing data points.
 Inadmissible Values: Addressing the presence of unacceptable or invalid values.
 Cleaning Process:
 Automatic Rules: Predefined rules are applied to correct recurring errors.
 Dictionaries: Valid term dictionaries are used to replace incorrect terms based on
similarity levels.
 Additional Data Conversions:
 Ensures homogeneity and integration among various data sources.
 Involves data aggregation and consolidation to generate summaries, improving response
times for subsequent queries and analyses.
o Loading:
 Definition: The final phase where extracted and transformed data is loaded into the tables of the
data warehouse.
 Purpose: Makes the data readily accessible to analysts and decision support applications,
facilitating data-driven decision-making.
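A minimal end-to-end ETL sketch using only the Python standard library; the file name, column names, and cleaning rules are assumptions for illustration, not a specific tool's behavior:

```python
# Extract rows from a hypothetical sales.csv, clean them, and load them
# into SQLite; the column names and cleaning rules are assumed.
import csv
import sqlite3

# Extract
with open("sales.csv", newline="") as f:
    rows = list(csv.DictReader(f))

# Transform: drop duplicates and inadmissible records, normalize a code.
seen, clean = set(), []
for r in rows:
    if not r["order_id"] or r["order_id"] in seen:
        continue                                  # missing key or duplicate
    seen.add(r["order_id"])
    r["country"] = r["country"].strip().upper()   # consistency rule
    clean.append(r)

# Load
con = sqlite3.connect("warehouse.db")
con.execute("CREATE TABLE IF NOT EXISTS sales (order_id TEXT, country TEXT, amount REAL)")
con.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                [(r["order_id"], r["country"], float(r["amount"])) for r in clean])
con.commit()
```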

Metadata Repository
 Definition: Metadata refers to data about data, serving as a crucial component in a data warehouse. It defines
and describes warehouse objects and their characteristics, providing context and meaning to the data stored
within the warehouse.
 Role in Data Warehousing:
o Storage Location: Metadata is maintained within the metadata repository, typically located in the
bottom tier of the data warehousing architecture.
o Creation: Metadata is generated for various aspects of the data warehouse, including:
 Data names and definitions.
 Timestamps for extracted data.
 Source information for extracted data.
 Missing fields added during data cleaning or integration processes.
 Contents of a Metadata Repository:
1. Data Warehouse Structure:
 Description: Includes warehouse schema, views, dimensions, hierarchies, derived data
definitions, and details about data mart locations and their contents.
2. Operational Metadata:
 Data Lineage: Tracks the history of migrated data and the sequence of transformations applied.
 Currency of Data: Indicates whether data is active, archived, or purged.
 Monitoring Information: Contains warehouse usage statistics, error reports, and audit trails.
3. Summarization Algorithms:
 Definition Algorithms: Includes measures and dimension definitions, data granularity, partitions,
subject areas, aggregation, summarization, and predefined queries and reports.
4. Mapping Information:
 Operational to Warehouse Mapping: Covers source databases and their contents, gateway
descriptions, data partitions, data extraction rules, cleaning and transformation rules, data
refresh and purging rules, and security protocols (user authorization and access control).
5. System Performance Data:
 Indices and Profiles: Enhance data access and retrieval performance, along with rules for the
timing and scheduling of refresh, update, and replication cycles.
6. Business Metadata:
 Terms and Definitions: Includes business-specific terms, data ownership information, and
charging policies.
 Levels of Data Summarization: A data warehouse contains various levels of data summarization, of which
metadata is one aspect. Other levels include:
o Current detailed data (usually stored on disk).
o Older detailed data (often archived on tertiary storage).
o Lightly summarized data.
o Highly summarized data (may or may not be physically stored).
 Importance of Metadata:
o Directory Function: Metadata serves as a directory for decision support system analysts, helping them
locate the contents of the data warehouse.
o Data Mapping Guide: Provides guidance for mapping data when transforming it from the operational
environment to the data warehouse environment.
o Summarization Guidance: Acts as a reference for the algorithms used for summarizing data at different
levels (e.g., from current detailed data to lightly summarized data).
o Persistent Storage: Metadata should be stored and managed persistently, typically on disk, to ensure its
availability and reliability.

OLAP stands for Online Analytical Processing. It is a software technology that allows users to analyze information from multiple database systems at the same time. It is based on a multidimensional data model and allows the user to query multidimensional data (e.g., Delhi -> 2018 -> sales data). OLAP databases are divided into one or more cubes, and these cubes are known as hypercubes.

OLAP operations:

There are five basic analytical operations that can be performed on an OLAP cube:

1. Drill down: The drill-down operation converts less detailed data into more detailed data. It can be done
by:

 Moving down in the concept hierarchy

 Adding a new dimension


In the cube given in the overview section, the drill-down operation is performed by moving down in the concept hierarchy
of the Time dimension (Quarter -> Month).

2. Roll up: It is just opposite of the drill-down operation. It performs aggregation on the OLAP cube. It can be done
by:

 Climbing up in the concept hierarchy

 Reducing the dimensions

In the cube given in the overview section, the roll-up operation is performed by climbing up in the concept hierarchy
of Location dimension (City -> Country).

3. Dice: It selects a sub-cube from the OLAP cube by selecting two or more dimensions. In the cube given in the
overview section, a sub-cube is selected by selecting following dimensions with criteria:

 Location = “Delhi” or “Kolkata”

 Time = “Q1” or “Q2”

 Item = “Car” or “Bus”


4. Slice: It selects a single value for one dimension of the OLAP cube, which results in a new sub-cube. In the cube
given in the overview section, slice is performed on the dimension Time = “Q1”.

5. Pivot: It is also known as rotation operation as it rotates the current view to get a new view of the
representation. In the sub-cube obtained after the slice operation, performing pivot operation gives a new view
of it.
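The following pandas sketch illustrates roll-up, slice, and dice on a tiny in-memory cube whose dimensions mirror the examples above; the numbers are invented:

```python
# A tiny cube as a pandas DataFrame (dimensions: Location, Time, Item;
# measure: sales). The numbers are invented.
import pandas as pd

cube = pd.DataFrame({
    "Location": ["Delhi", "Delhi", "Kolkata", "Kolkata"],
    "Time":     ["Q1",    "Q2",    "Q1",      "Q2"],
    "Item":     ["Car",   "Bus",   "Car",     "Bus"],
    "sales":    [120,     80,      95,        60],
})

rollup = cube.groupby("Time")["sales"].sum()    # roll up: aggregate away Location/Item
slice_q1 = cube[cube["Time"] == "Q1"]           # slice: fix Time = "Q1"
dice = cube[cube["Location"].isin(["Delhi", "Kolkata"])
            & cube["Time"].isin(["Q1", "Q2"])]  # dice: sub-cube by criteria
print(rollup)
```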

A schema is a logical description of the entire database. It includes the name and description of records of all record types,
including all associated data items and aggregates. Much like a database, a data warehouse also requires a
schema to be maintained. A database uses a relational model, while a data warehouse uses the Star, Snowflake, or Fact Constellation schema.
In this chapter, we will discuss the schemas used in a data warehouse.

Star Schema
 Each dimension in a star schema is represented with only one dimension table.

 This dimension table contains the set of attributes.

 The following diagram shows the sales data of a company with respect to the four dimensions, namely time,
item, branch, and location.

 There is a fact table at the center. It contains the keys to each of four dimensions.

 The fact table also contains the attributes, namely dollars sold and units sold.

Note − Each dimension has only one dimension table, and each table holds a set of attributes. For example, the location
dimension table contains the attribute set {location_key, street, city, province_or_state, country}. This constraint may
cause data redundancy. For example, “Vancouver” and “Victoria” are both cities in the Canadian province of British
Columbia; the entries for such cities may cause data redundancy along the attributes province_or_state and country.
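A minimal sketch of how the star schema above might be queried, using SQL through Python's sqlite3; the table and column names follow the text's example, while the sample rows are invented:

```python
# Build the text's star schema in an in-memory SQLite database and run
# a typical aggregation across the fact and dimension tables.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE location (location_key INT PRIMARY KEY, city TEXT, country TEXT);
CREATE TABLE time_dim (time_key INT PRIMARY KEY, quarter TEXT);
CREATE TABLE sales_fact (time_key INT, location_key INT,
                         dollars_sold REAL, units_sold INT);
""")
con.executemany("INSERT INTO location VALUES (?, ?, ?)",
                [(1, "Vancouver", "Canada"), (2, "Victoria", "Canada")])
con.execute("INSERT INTO time_dim VALUES (1, 'Q1')")
con.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?)",
                [(1, 1, 605.0, 25), (1, 2, 825.0, 30)])

# The fact table holds the keys and measures; dimensions supply context.
query = """
SELECT l.city, t.quarter, SUM(f.dollars_sold), SUM(f.units_sold)
FROM sales_fact f
JOIN location l ON f.location_key = l.location_key
JOIN time_dim t ON f.time_key = t.time_key
GROUP BY l.city, t.quarter;
"""
for row in con.execute(query):
    print(row)
```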

Snowflake Schema

 Some dimension tables in the Snowflake schema are normalized.

 The normalization splits up the data into additional tables.

 Unlike the star schema, the dimension tables in a snowflake schema are normalized. For example, the item
dimension table in the star schema is normalized and split into two dimension tables, namely the item and supplier
tables.

 Now the item dimension table contains the attributes item_key, item_name, type, brand, and supplier-key.

 The supplier key is linked to the supplier dimension table. The supplier dimension table contains the attributes
supplier_key and supplier_type.

Note − Due to normalization in the Snowflake schema, redundancy is reduced; therefore, it becomes easier to
maintain and saves storage space.

Fact Constellation Schema

 A fact constellation has multiple fact tables. It is also known as galaxy schema.
 The following diagram shows two fact tables, namely sales and shipping.

 The sales fact table is the same as that in the star schema.

 The shipping fact table has five dimensions, namely item_key, time_key, shipper_key, from_location, and
to_location.

 The shipping fact table also contains two measures, namely dollars sold and units sold.

 It is also possible to share dimension tables between fact tables. For example, time, item, and location dimension
tables are shared between the sales and shipping fact table.

Measures: Their Categorization and Computation


Overview of Measures in Data Cubes
 Definition: A data cube measure is a numeric function evaluated at each point in the data cube space, defined by
a set of dimension–value pairs (e.g., ⟨time = “Q1”, location = “Vancouver”, item = “computer”⟩).
 Computation: The measure value is computed by aggregating data corresponding to the defined dimension–
value pairs for that point.
Categorization of Measures
Measures can be categorized into three main types based on the aggregate functions used:
1. Distributive Measures
 Definition: An aggregate function is distributive if it can be computed in a distributed manner.
 Process:
o If data is partitioned into n sets, apply the function to each partition, yielding n aggregate values.
o If the result from these aggregates matches the result obtained by applying the function to the entire
dataset, the function is distributive.
 Examples:
o Sum: sum() can be calculated by summing results from subcubes.
o Count: Counts non-empty base cells as 1; total count for a cube is the sum of counts from its child cells.
o Min and Max: Both can be computed similarly.
 Efficiency: Distributive measures are computationally efficient due to partitioning.
2. Algebraic Measures
 Definition: An aggregate function is algebraic if it can be computed using a bounded number M of
arguments, each derived from a distributive aggregate function.
 Examples:
o Average (avg): Computed as sum()/count().
o Min N and Max N: Find the N minimum and maximum values from a set.
o Standard Deviation: Also an algebraic aggregate function.
 Efficiency: Algebraic measures rely on the efficiency of their underlying distributive functions.
3. Holistic Measures
 Definition: An aggregate function is holistic if it cannot be characterized by a fixed number of arguments. There
is no algebraic function that can describe the computation.
 Examples:
o Median: The middle value in a sorted dataset.
o Mode: The most frequently occurring value.
o Rank: The relative standing of a value in a dataset.
 Efficiency: Holistic measures are generally difficult to compute efficiently, but approximation techniques exist.
Efficient Computation Techniques
 Distributive and Algebraic Measures: Many efficient computation techniques are available, making them
manageable in large data cube applications.
 Holistic Measures: These are harder to compute efficiently, but approximation methods (e.g., estimating the
median using statistical formulas) can help overcome challenges.
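A small Python sketch contrasting the three categories over a partitioned data set (the values are invented):

```python
# Contrast the three measure categories over two partitions of the data.
import statistics

p1, p2 = [3, 7, 1], [9, 5]

# Distributive: sums over partitions compose into the global sum.
assert sum(p1) + sum(p2) == sum(p1 + p2)

# Algebraic: avg is derived from two distributive aggregates, sum and count.
avg = (sum(p1) + sum(p2)) / (len(p1) + len(p2))

# Holistic: the global median cannot be composed from the partitions'
# medians; it needs all the values at once.
med = statistics.median(p1 + p2)
print(avg, med)   # 5.0 5
```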
UNIT 2

Detailed Analysis of User Types in Business Intelligence


1. Power Users
 Characteristics:
o Highly skilled and analytical individuals.
o Comfortable with complex BI tools and statistical analysis.
o Often have backgrounds in data science, statistics, or specialized domains.
 Expectations:
o Advanced Tools: Desire access to sophisticated analytics and modeling tools (e.g., R, Python, Tableau).
o Flexibility: Require the ability to customize reports and analyses to fit specific business needs.
o Deep Insights: Expect the capability to perform multi-dimensional analysis and drill down into granular
data.
 Impact on BI Design:
o BI systems should support advanced analytics capabilities, allowing for custom scripts and algorithms.
o The interface should enable deep exploration of data, with interactive visualizations that can handle
complex queries.
2. Business Users
 Characteristics:
o Typically domain-specific employees (e.g., marketing, finance) who may not have a technical
background.
o Rely on insights provided by power users but seek to perform their own analyses as needed.
 Expectations:
o User-Friendly Interface: Require intuitive tools that do not necessitate advanced technical skills.
o Ad Hoc Queries: Need the ability to create simple queries on the fly and access raw data when required.
o Contextual Reporting: Prefer reports tailored to their specific business functions, often summarized for
quick insights.
 Impact on BI Design:
o BI solutions should include user-friendly dashboards with drag-and-drop features for report creation.
o Training sessions may be necessary to help business users maximize the use of available tools.
3. Casual Users
 Characteristics:
o Employees who occasionally interact with BI tools, often requiring insights for decision-making.
o Typically represent various operational areas or functions within the organization.
 Expectations:
o Simplicity: Look for straightforward metrics and insights that do not require extensive analysis.
o Pre-Designed Reports: Rely on existing dashboards, scorecards, or reports for summarized information.
o Accessibility: Expect easy access to information without needing to navigate complex systems.
 Impact on BI Design:
o Systems should focus on providing clear, concise visualizations and straightforward navigation.
o Regular updates to reports and dashboards are necessary to keep content relevant.
4. Data Aggregators or Information Providers
 Characteristics:
o Organizations that specialize in collecting, processing, and selling data.
o Enhance and reorganize data for industry-specific insights.
 Expectations:
o Comprehensive Data: Expect access to a wide variety of data sources for aggregation and enhancement.
o Customization: Need the ability to tailor reports based on client requirements and industry standards.
o Quality Assurance: Require robust processes for ensuring the accuracy and reliability of data.
 Impact on BI Design:
o BI tools must support complex data integration processes, with capabilities for data cleaning and
validation.
o Customizable reporting functionalities should allow for the creation of client-specific insights.
5. Operational Analytics Users
 Characteristics:
o Individuals who rely on real-time data insights to guide immediate actions within operational contexts.
o Typically work in roles that require quick adjustments based on analytical insights (e.g., customer
service, sales).
 Expectations:
o Real-Time Insights: Require access to live data and analytics to make prompt decisions.
o Embedded Analytics: Look for analytics embedded within their operational tools (e.g., CRM systems).
o Actionable Recommendations: Prefer tools that suggest next steps based on analytics.
 Impact on BI Design:
o BI systems should focus on real-time data processing and integration with operational applications.
o Analytics should be context-sensitive and provide actionable insights directly within workflow tools.
6. Extended Enterprise Users
 Characteristics:
o External stakeholders such as customers, partners, or regulatory bodies requiring access to specific data
insights.
o Their needs vary widely based on their relationship with the organization.
 Expectations:
o Transparency: Seek clear insights into the organization’s data relevant to their interactions.
o Data Security: Require assurance that their access to data is secure and compliant with regulations.
o User-Friendly Reports: Expect easy-to-understand reports that provide the necessary insights without
overwhelming detail.
 Impact on BI Design:
o BI systems should include secure access layers tailored for external users.
o Reporting tools need to balance detail with clarity to meet diverse stakeholder needs.
7. IT Users
 Characteristics:
o IT professionals responsible for the development, maintenance, and support of BI systems.
o Typically involved in ensuring data integrity, system performance, and user support.
 Expectations:
o Robust Infrastructure: Need systems that can handle large volumes of data and complex queries
efficiently.
o Support Tools: Require tools for monitoring system performance and troubleshooting issues.
o Documentation: Seek comprehensive documentation for system architecture and processes.
 Impact on BI Design:
o BI solutions should include administrative tools for monitoring system performance and managing user
access.
o Comprehensive training and documentation are essential for effective system management.

Standard Reports in Business Intelligence


Overview of Standard Reports
Standard reports represent one of the most fundamental approaches to the presentation of information in BI. They are
characterized by their static nature and structured layout, which facilitates the delivery of consistent and comparable
data views. Typically generated in batch mode and provided on a scheduled basis, standard reports serve to inform
stakeholders about specific business metrics in a straightforward format.
Structure of Standard Reports
 Two-Dimensional Grid: The basic format consists of a grid layout defined by rows and columns.
o Columns: These represent the items or characteristics being measured (e.g., sales figures, customer
counts).
o Rows: These correspond to the divisions or hierarchies for which the measures are reported (e.g.,
different geographical regions, product categories).
 Intersection of Rows and Columns: The cell at the intersection of a specific row and column delivers the specific
measure relevant to that combination. For instance, if we look at a report from the U.S. Census Bureau, we might
find metrics such as:
o Owner-Occupied Housing Units with a Mortgage: The estimated count of such units.
o Margin of Error: Indicates the reliability of the estimate.
Hierarchical Breakdown
Standard reports often include hierarchical categorizations to provide more granular insights. For example, the report
may include:
 Groups of Items: These might include:
o The number of owner-occupied housing units with a mortgage.
o The value of the houses.
o Mortgage status.
o Household income for the previous 12 months.
 Further Hierarchies: Within these groups, there can be additional breakdowns, such as:
o Dollar Groupings for Value: Categories that might range from low to high-value properties.
o Categories for Mortgage Status: Such as “fully paid” or “under mortgage.”
These hierarchical structures allow users to drill down into specific data categories, but they remain within the confines
of the predefined report format.
Characteristics of Standard Reports
 Canned Reports: Standard reports are often referred to as "canned" reports due to their static, pre-prepared
nature. They are designed to provide a consistent view of data without the need for customization at the point of
delivery.
 Scheduled Delivery: These reports are typically generated at regular intervals (e.g., daily, weekly, monthly) and
delivered through standard web interfaces or email distributions, ensuring that stakeholders receive the
necessary data without delay.
Limitations of Standard Reports
 Static Nature: The primary limitation of standard reports lies in their static design, which may not accommodate
the dynamic needs of users seeking deeper insights or real-time data analysis.
 Limited Analytical Depth: While standard reports provide analytical results, they may fall short of offering
actionable insights. Analysts may find that these reports do not answer specific business questions or help
identify trends unless the reported figures deviate significantly from expected norms.
 Prescriptive Insight: The presumption is that standard reports serve merely as a foundation for additional
analysis. Users seeking deeper insights often need to employ more flexible and interactive BI tools that enable
exploratory data analysis.

Interactive Analysis and Ad Hoc Querying


 Drilling Into Data:
BI users looking for additional details regarding information delivered in standard reports may opt to drill into
the data, either with broader visibility into the existing data or with a finer level of granularity. Both options are
intended to go beyond the relatively strict format of the standard report, even if they open up different views
into the data.
 First Option – Downloading Data:
The first option involves taking data formatted into a standard report and downloading it into a framework that
allows users to slice and dice the existing data more freely.
o Example: One example involves extracting data from the report into a desktop spreadsheet tool that
provides organization around hierarchies.
o Dimensional Analysis: This precursor to dimensional analysis provides some level of interactive analysis
and is often manifested as a pivot table.
o Pivot Tables: These pivot tables enable broader flexibility in grouping data within ordered hierarchies,
developing static graphs and charts, or just perusing the data from different angles (a small sketch follows this list).
 Second Option – Ad Hoc Querying:
The second option is more powerful in enabling finer granularity by allowing more sophisticated users to execute
their own queries into the analytical data platform.
o User Skills: Users with an understanding of the data warehouse’s data inventory and who have some skill
at articulating their queries can either run ad hoc queries directly using SQL or can use tools that help
users describe the data sets they’d like to review.
o Query Tools: These tools reformulate those requests into SQL queries that are executed directly. The
result sets are also suitable for loading into desktop tools for further organization and analysis, as well as
forming the basis for static charts and graphs.
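A minimal pivot-table sketch in pandas, assuming report data already downloaded into a DataFrame with hypothetical region, product, and revenue columns:

```python
# Reshape downloaded report data for interactive slicing and dicing.
import pandas as pd

report = pd.DataFrame({
    "region":  ["East", "East", "West", "West"],
    "product": ["A",    "B",    "A",    "B"],
    "revenue": [100,    150,    90,     120],
})
pivot = report.pivot_table(values="revenue", index="region",
                           columns="product", aggfunc="sum")
print(pivot)
```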
Caveats for Ad Hoc Querying
However, there are some caveats when allowing users to formulate and execute ad hoc queries:
 Performance:
Writing efficient queries is a skill, and many queries involve joins across multiple tables that can bring a system’s
performance to its knees. Users should be highly trained before being let loose to write their
own queries.
 Semantic Consistency:
Allowing users to write their own queries implies they know and understand the meanings of the data elements
they have selected to include in their result sets. However, without comprehensive, standardized business term
glossaries and metadata repositories, users may see data element names and impute their definitions,
potentially assigning meanings that are different than what was intended by the data creators. These
discrepancies may impact the believability of the results.
 Repeatability:
The ad hoc process is a sequence of iterations, each consisting of two phases: formulating a query and
reviewing its result set. This allows the analyst to effectively follow a thread or train of thought, but without a
means of capturing the reasoning that drives each step, the intuition behind the ultimate result is lost. In other
words, the sequence may yield useful results, yet it may be difficult to replicate the process a second or third time.
Conclusion
 Role of Standard Reports:
Standard reports can provide knowledge to a broad spectrum of consumers, even if those consumers must have
contextual knowledge to identify the key indicators and take action.
 Ad Hoc Queries:
Ad hoc queries enable greater drill-down and potential for insight.
 Evolution of Reporting:
However, given the growth of data into the petabytes, coupled with the complexity and performance impacts of
ad hoc queries, standard reporting is rapidly yielding to more organized methods for delivering results:
parameterized reporting, dimensional analysis, and notifications, alerts, and exception reporting.
Parameterized Reports and Self-Service Reporting


1. Identifying Query Patterns:
o Users from similar categories often execute very similar ad hoc queries.
o Isolated execution of similar queries can degrade overall system performance.
2. System Optimization:
o Knowledge of similar query patterns allows system managers to:
 Optimize the environment: Reorganizing data to enhance performance for recurring queries.
 Preprocess queries: Improving efficiency by preparing aspects of queries in advance.
 Cache data: Reducing memory access and network latency by storing frequently accessed data.
3. Template Queries:
o Queries that differ only by specific variables (e.g., location) can be standardized into template queries.
o These templates allow users to fill in conditions using precomputed lists or form-based drop-downs.
o Parameterized reports serve as a bridge between static reports and ad hoc queries (a minimal sketch follows this list).
4. Self-Service Business Intelligence (Self-Service BI):
o Parameterized reports enable self-service BI by simplifying the report generation process:
 Data Discovery: Presenting users with a palette of accessible data sets.
 Data Access Methods: Masking or virtualizing access to data for easier querying.
 Documentation: Collaboratively documenting the structure and makeup of reports to share
knowledge among analysts.
 Presentation Layer Development: Simplifying the creation of reports, whether in basic
row/column format or using advanced visualization techniques.
5. Reducing IT Bottleneck:
o Self-service BI aims to reduce reliance on IT for report generation.
o As BI programs gain acceptance, demand for IT resources increases, creating a bottleneck.
o Delays in report development can hinder timely decision-making; self-service BI addresses this by
empowering users to generate their own reports quickly.
o Example: A call center manager needing timely insights on product performance must have reports
generated swiftly to facilitate immediate adjustments.
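As referenced in the template-queries item above, here is a minimal sketch of a parameterized report, again using sqlite3 and the same hypothetical sales table: the SQL text is fixed, and the user supplies only the region value, for example from a form-based drop-down populated by a precomputed list.

```python
# A minimal sketch, assuming the same hypothetical "sales" table.
import sqlite3

TEMPLATE = """
    SELECT item_class, period, SUM(amount) AS total_sales
    FROM sales
    WHERE region = ?
    GROUP BY item_class, period
"""

def run_report(conn: sqlite3.Connection, region: str) -> list:
    # The region is bound as a parameter, never spliced into the SQL text.
    return conn.execute(TEMPLATE, (region,)).fetchall()

conn = sqlite3.connect("warehouse.db")
rows = run_report(conn, "Northeast")  # value chosen from a drop-down list
```

Because every user runs the same query text with only the parameter varying, the system manager can optimize, preprocess, or cache for this one recurring pattern rather than for an open-ended stream of ad hoc SQL.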
Dimensional Analysis
1. Overview of Multidimensional Analysis:
o Multidimensional analysis enhances the pivot table functionality found in desktop spreadsheet tools.
o OLAP tools provide the capability to "slice and dice" relationships between different variables across
various levels of their hierarchies.
2. Examples of Analysis:
o Analysts can review data in various dimensions:
 Item sales by time period by region: Analyzing sales data for different items over specified time
periods and across various regions.
 Product availability by product classification by supplier by location: Examining product
availability categorized by classification, supplier, and location.
3. Data Viewing and Grouping:
o The use of the term "by" indicates pivot points for viewing data:
 Data can be grouped by various hierarchies, allowing for flexible analysis.
 For example, data can be viewed by item classification and then by time period and region, or
conversely, by region and then by time period.
4. Drill Up and Down Functionality:
o OLAP enables analysts to drill up and down along hierarchical dimensions to reveal hidden relationships.
o This functionality allows users to explore different levels within a dimension, enhancing the depth of
analysis.
5. OLAP Cube Structure:
o OLAP queries are organized around partial aggregations along different dimensions, typically structured
in an OLAP cube.
o This cube structure allows for rapid response to queries that involve slicing or dicing the data.
6. Slicing and Dicing:
o Slicing: Fixes one dimension's value while providing data for all other dimensions. For example, fixing the
region (Northeast) while reviewing item sales grouped by classification and time period.
o Dicing: Involves subselecting components of one or more dimensions, such as choosing specific item
classifications and presenting them by time period and location.
7. Drill-Through Capability:
o Users can drill through (or drill down) along different levels of a dimension's hierarchy.
o For instance, after selecting the Northeast region, analysts can review sales data at a more granular level,
such as by each state in the Northeast (a short sketch of these operations follows this list).
8. Presentation Layer in OLAP:
o OLAP environments align data along chosen dimensions and provide a palette for visualization.
o Users can pivot dimensions around each other and choose how to present data:
 Data can be displayed in a grid format similar to standard reports.
 Alternatively, data can be visualized through graphical components to enhance understanding.
9. Flexibility for Users:
o The slicing, dicing, and drill-through features of OLAP provide significant flexibility for:
 Power users: Engaging in detailed data discovery.
 Business users: Analyzing data to identify anomalies or search for patterns.
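A minimal sketch of slicing, dicing, and drill-through (referenced from the items above), using pandas on a small in-memory stand-in for an OLAP cube; the dimension values and sales figures are invented for illustration.

```python
# A minimal sketch: a tiny "cube" with region, state, item_class, and
# period dimensions and a sales measure; all values are invented.
import pandas as pd

cube = pd.DataFrame({
    "region":     ["Northeast", "Northeast", "South", "South"],
    "state":      ["NY", "MA", "TX", "FL"],
    "item_class": ["Audio", "Video", "Audio", "Video"],
    "period":     ["Q1", "Q1", "Q1", "Q1"],
    "sales":      [120, 95, 80, 130],
})

# Slice: fix one dimension's value (region = Northeast) and view the rest
northeast = cube[cube["region"] == "Northeast"]
print(northeast.pivot_table(index="item_class", columns="period",
                            values="sales", aggfunc="sum"))

# Dice: subselect components of one or more dimensions
diced = cube[cube["item_class"].isin(["Audio"])]

# Drill through: descend the region hierarchy to the state level
print(northeast.groupby("state")["sales"].sum())
```

A real OLAP engine precomputes partial aggregations along each dimension, so these operations answer from the cube structure rather than rescanning the detail rows each time.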
Alerts/Notifications
1. Purpose of Alerts:
o In many cases, users reviewing standard reports are only interested in one or two key variables.
o The focus is often on verifying if a specific value is within an expected range or determining if it is outside
that range and requires action.
2. Example Scenario:
o A national call center manager may regularly check average hold times by region:
 If hold times are within the acceptable range (e.g., 30–60 seconds), no action is needed.
 If hold times exceed a threshold (e.g., over 60 seconds), the manager must take action by
contacting the regional manager to investigate the issue (a sketch of this threshold check follows this list).
3. Triggered Action Based on Specific Variables:
o In many business cases, action is only required when certain variables reach specific values.
o Instead of reviewing entire reports, users only need to be notified when critical thresholds are breached,
making full report reviews unnecessary.
4. Alerts as an Alternative to Full Reports:
o Alerts or notifications serve as an alternative to full reports by delivering only the actionable
knowledge.
o This approach focuses solely on critical variables and when they require action, allowing other
information to be ignored unless needed.
5. Suitability for Operational Environments:
o Alerts are particularly useful in operational environments, where timely information delivery is crucial.
o Notifications can be delivered through various methods, including:
 Email
 Instant messages
 Direct messages through internal systems or social media platforms
 Smartphones or other mobile devices
 Radio transmissions
 Visual cues, such as:
 Scrolling message boards
 Light banks
 Visual consoles
6. Context-Driven Notifications:
o The method of notification can provide context; for example:
 A flashing amber light not only delivers the message but also acts as the medium for the alert.
 This approach enhances the delivery of critical information and minimizes the need for manually
inspecting reports.
7. Enabling Rapid Actions:
o By simplifying the delivery of critical information, alerts reduce the effort required to inspect key
variables.
o This method enables quicker responses to potential issues, allowing businesses to take rapid action
when necessary.
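A minimal sketch of the hold-time scenario above: the check inspects only the key variable and fires a notification when the threshold is breached. The data values and the delivery channel (a printed message standing in for email, an instant message, or a light bank) are hypothetical.

```python
# A minimal sketch, assuming average hold times arrive as a simple
# mapping from region name to seconds; all values are invented.
HOLD_TIME_MAX_SECONDS = 60  # acceptable range is 30-60 seconds

def notify(region: str, seconds: float) -> None:
    # Stand-in for the real delivery channel (email, message, light bank).
    print(f"ALERT: {region} average hold time {seconds:.0f}s exceeds "
          f"{HOLD_TIME_MAX_SECONDS}s - contact the regional manager")

def check_hold_times(hold_times_by_region: dict) -> None:
    # Notify only when a region breaches the threshold; everything
    # within range is silently ignored, so no full report is needed.
    for region, seconds in hold_times_by_region.items():
        if seconds > HOLD_TIME_MAX_SECONDS:
            notify(region, seconds)

check_hold_times({"Northeast": 45, "South": 72})  # only "South" alerts
```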
Visualization: Charts, Graphs, Widgets
1. Importance of Presentation:
o While previous sections focused on delivering analytical results, effective presentation methods are
crucial for conveying messages and prompting appropriate actions.
o Different visualization methods can enhance the comparison of analytical results.
2. Types of Visualizations:
o Line Chart:
 Displays points connected by line segments on a grid.
 Useful for showing trends over time (e.g., gas price changes over 36 months); see the sketch after this list.
o Bar Chart:
 Represents values with rectangles whose lengths correspond to the values.
 Effective for comparing different values across contexts (e.g., average life expectancy in different
countries).
o Pie Chart:
 A circular chart divided into sectors representing percentages of a whole.
 Good for illustrating distributions within a single domain (e.g., owner-occupied homes by
ethnicity).
o Scatter Plot:
 Graphs points to show relationships between two variables (independent and dependent).
 Helps identify correlations (e.g., age vs. weight).
o Bubble Chart:
 A variation of scatter plots where the size of the bubble represents a third variable.
 Useful for displaying multi-variable relationships (e.g., sales volume by items sold with market
share represented by bubble size).
o Gauge:
 Indicates magnitude within critical value ranges.
 Ideal for conveying the status of critical variables (e.g., fuel gauge in a car).
o Directional Indicators (Arrows):
 Used for comparing current values to previous ones, indicating improvement, stability, or
decline.
 Often utilized in stock price presentations.
o Heat Map:
 Tiles a two-dimensional space with varying sizes and colors to display multiple values.
 Highlights specific data points effectively (e.g., clicks on webpage links).
o Spider or Radar Chart:
 Displays multiple variable values across dimensions, with each axis representing a variable.
 Facilitates quick comparisons between different observations (e.g., product characteristics).
o Sparkline:
 Small line graphs that lack axes and coordinates.
 Useful for relative trend comparisons (e.g., stock price trends across companies).
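As a minimal sketch, the following matplotlib snippet renders two of the chart types above: a line chart for a trend over time and a bar chart for a cross-context comparison. All numbers are invented for illustration.

```python
# A minimal sketch using matplotlib; every value here is invented.
import matplotlib.pyplot as plt

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
gas_prices = [3.10, 3.25, 3.40, 3.35, 3.50, 3.65]        # a trend over time
life_expectancy = {"Country A": 72, "Country B": 79, "Country C": 84}

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 3))

# Line chart: points connected by segments, good for trends over time
ax1.plot(months, gas_prices, marker="o")
ax1.set_title("Gas price trend")

# Bar chart: bar length corresponds to the value, good for comparisons
ax2.bar(list(life_expectancy), list(life_expectancy.values()))
ax2.set_title("Life expectancy by country")

plt.tight_layout()
plt.show()
```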
Bar Chart
 Structure: A bar chart uses rectangles (bars) whose lengths correspond to the values being represented.
 Purpose: Bar charts are effective for comparing different values of the same variable across various contexts.
 Example: An example given is a chart illustrating the average life expectancy in years across different countries.
 Visualization: The focus is on comparing the height or length of the bars to understand differences in values.
Pie Chart
 Structure: A pie chart is represented as a circle divided into sectors, with each sector representing a percentage
of the whole.
 Purpose: Pie charts are good for showing distributions of values across a single domain, highlighting the relative
size of parts to a whole.
 Example: An example given is displaying the relative percentages of owner-occupied homes by ethnicity within a
Zip code area.
 Visualization: The emphasis is on how each slice represents a percentage of the total, with all components
adding up to 100%.
Feature | Gauge Chart | Heat Map
Purpose | Indicates the status of a single KPI | Visualizes intensity or frequency of multiple values
Visualization | Single value with a pointer in a range | Color-coded grid showing values across two dimensions
Use Cases | Monitoring specific metrics (e.g., sales) | Analyzing patterns in large datasets (e.g., web clicks)
Complexity | Simple and straightforward | More complex, displaying relationships in data
Scorecards and Dashboards
1. Overview:
o The evolution of notifications and alerts leads to more comprehensive visual presentations of analytical
results.
o Simplified presentation of key performance metrics enables knowledge workers to shift from merely
seeing past performance to understanding necessary changes for business process improvement.
2. Scorecards:
o Definition:
 Scorecards present key performance indicators (KPIs) along with indicators that reflect whether
KPI values are acceptable.
o Features:
 May include historical trends to show the performance of KPIs over time.
 Typically updated periodically (e.g., daily or hourly).
o Purpose:
 Helps users assess the health of business processes at a glance by summarizing critical
performance metrics (a minimal status-indicator sketch follows this list).
3. Dashboards:
o Definition:
 Dashboards offer a flexible presentation of summarized performance metrics tailored to
individual user preferences.
o Features:
 Users can choose from a variety of presentation graphics (e.g., charts, graphs) based on their
operational needs.
 Capable of connecting to real-time data sources for up-to-date information.
o Functionality:
 Allows for continuous monitoring of performance metrics throughout the day.
 Facilitates drilling down into key indicators to identify emerging opportunities.
 Enables integrated actions through process-flow and communication tools.
o Delivery Mechanisms:
 Dashboards can be delivered across various channels, including traditional browser formats and
mobile devices.
4. Mashups:
o Definition:
 Mashups allow knowledge consumers to combine analytics and reports with external data
streams (e.g., news feeds, social networks) in a customizable visualization framework.
o Functionality:
 Provides a means to integrate diverse data sources into a coherent view that supports specific
business needs and objectives.
o Purpose:
 Empowers users to create tailored analytics environments that blend internal and external
information for enhanced decision-making.
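A minimal sketch of the scorecard idea referenced above: each KPI value is paired with a traffic-light indicator computed from thresholds. The KPI names and threshold values are invented for illustration.

```python
# A minimal sketch: map each KPI value onto a traffic-light status.
def kpi_status(value: float, green_min: float, amber_min: float) -> str:
    # Higher is assumed better for both KPIs in this invented example.
    if value >= green_min:
        return "GREEN"
    if value >= amber_min:
        return "AMBER"
    return "RED"

# (value, green threshold, amber threshold) - all invented
kpis = {
    "on-time delivery %": (96.5, 95.0, 90.0),
    "customer retention %": (88.0, 95.0, 90.0),
}

for name, (value, green_min, amber_min) in kpis.items():
    print(f"{name:22s} {value:6.1f}  {kpi_status(value, green_min, amber_min)}")
```

A dashboard would render these same statuses as live, drillable graphics; the scorecard form simply summarizes them on a periodic refresh.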
Here are some quick guidelines to keep in mind when laying out a BI dashboard:

Choose the right visualization graphic

 Don’t let the shiny graphics fool you into using a visual component that does not properly convey the
intended result.
 For example, line charts are good for depicting historical trends of the same variable over time, but bar
charts may not be as good a choice.

Manage your "real estate"

 The available screen space limits what can be displayed at one time, and this is referred to as screen
"real estate."
 Different delivery channels allow different amounts of real estate. A regular desktop screen affords more
presentation area than a laptop screen, which in turn is larger than a portable tablet or smartphone.
 Consider the channel and the consumer when employing visualization components, ensuring they fit
within the available space yet still deliver actionable knowledge.

Maintain context

 Recognize that the presentation of a value is subject to variant interpretations when there is no external
context defining its meaning.
 For example, a dial-gauge can convey the variable’s magnitude but not whether the value is good, bad,
or indifferent.
 Adding a red zone (for bad values) and a green zone (for good values) provides context to the displayed
magnitude.

Be consistent

 When self-service dashboard development is in the hands of many data consumers, biases can lead to
varied ways of representing the same or similar ideas.
 Consistent representations and selections of standard visualization graphics will help ensure consistent
interpretations across different users.

Keep it simple

 Avoid inundating the presentation with fancy-looking graphics that don’t add value to the decision-
making process.
 Often, the simpler the presentation, the more easily the content is conveyed.

Engage

 Engage the user community to agree on standards, practices, and create a guidebook.

Aspect | Scorecard | Dashboard
Definition | Presents key performance indicators (KPIs) along with indicators reflecting whether those KPI values are acceptable. | Provides a flexible presentation of summarized performance metrics tailored to individual user preferences.
Features | Displays KPI values with performance indicators (acceptable/unacceptable); may include historical trends to show performance over time; typically updated on a periodic basis (e.g., daily, hourly). | Allows users to select from various presentation graphics (charts, graphs) based on their operational needs; connects to real-time data sources for up-to-date information.
Purpose | Helps users quickly assess the health of business processes by summarizing critical performance metrics. | Enables continuous monitoring of performance metrics throughout the day and facilitates drilling down into key indicators to identify opportunities.
Interactivity | Generally less interactive; focuses on presenting key metrics and trends. | Highly interactive; allows users to drill down and take integrated actions through process-flow and communication tools.
Customization | Limited customization; primarily displays predetermined metrics and indicators. | Offers high customization; users can craft presentations that suit their individual business needs and objectives.
Delivery Mechanisms | Primarily delivered in a static format, emphasizing summary and trend analysis. | Delivered through various channels, including browser formats and mobile devices, providing flexibility in access.
Significance of Geographic Visualization

1. Better Understanding of Data


 Visual Context: Geographic visualization helps users see data in relation to a physical location, making it easier to
understand patterns and trends. For example, displaying population statistics on a map allows quick
identification of areas with high or low populations.

 Layering Information: It enables different data sets to be layered on a map, such as combining weather patterns
with insurance risk areas, helping organizations assess potential risks effectively.

2. Interactive Exploration

 Drill-Down Features: Users can interact with the map, clicking on specific areas to see more detailed
information. This helps analysts focus on specific regions and uncover insights that may not be apparent in
regular reports.

 Dynamic Updates: When users select different data in related charts or tables, the map can automatically
update to reflect those changes, allowing for real-time analysis.

3. Risk Management

 Assessing Hazards: Geographic visualization can show risk factors, like areas prone to natural disasters, helping
businesses adjust their strategies accordingly. For example, an insurance company can overlay hazard zones on a
map to evaluate potential risks.
 Resource Allocation: By identifying high-risk areas, organizations can better allocate resources and manage risks
effectively.

4. Comparative Analysis

 Heat Maps: Geographic visualizations can use heat maps to represent data intensity, such as customer activity or
sales volume in different areas, helping businesses identify where to focus their efforts (see the sketch at the end of this section).
 Data Cross-Referencing: Combining maps with other data displays, like charts or tables, allows for easy
comparison and deeper insights into different aspects of the data.

5. Informed Decision-Making

 Data-Driven Choices: Geographic visualization supports decision-making by providing clear insights based on
spatial data, leading to better strategies and operations. For example, a retail company might use geographic
data to decide where to open new stores based on customer density.

 Effective Communication: Maps and geographic visuals make complex data easier to understand and share with
others, helping stakeholders grasp important insights quickly.
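As a minimal sketch of the layering idea above, the snippet below uses a plain matplotlib scatter plot: point position encodes location and color encodes intensity, a lightweight stand-in for a true mapping tool. Coordinates and sales volumes are invented for illustration.

```python
# A minimal sketch: longitude/latitude positions with a color-encoded
# sales intensity; all coordinates and figures are invented.
import matplotlib.pyplot as plt

longitudes = [-74.0, -71.1, -87.6, -95.4, -122.4]
latitudes  = [ 40.7,  42.4,  41.9,  29.8,   37.8]
sales      = [  120,    60,    95,    40,   150]   # intensity per location

plt.scatter(longitudes, latitudes, c=sales, s=100, cmap="Reds")
plt.colorbar(label="Sales volume")
plt.xlabel("Longitude")
plt.ylabel("Latitude")
plt.title("Sales intensity by store location")
plt.show()
```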

Integrated Analytics
Integrated Analytics refers to the seamless incorporation of analytical results into operational activities, allowing users to
benefit from Business Intelligence (BI) tools without needing extensive training. The key aspects and characteristics of
integrated analytics are as follows:
Characteristics of Integrated Analytics
1. Distinct Performance Objectives:
o Each business process has specific goals that analytical results aim to achieve.
2. Decision Points:
o There are critical points within the process where decisions must be made by individuals or teams.
3. Impact of Information Absence:
o Lack of timely information can hinder the performance of the business process.
4. Ill-informed Decisions:
o Poor decisions, often due to inadequate information, can impair the effectiveness of the process.
5. Improvement through Informed Decision-Making:
o The process can be enhanced by making decisions based on well-informed analytics.
6. User Accessibility:
o Participants do not need to be tech-savvy or have advanced technical skills to understand the information provided.
Implementation Considerations
For integrated analytics to be effective, certain conditions must be met:
 Real-time Data Integration:
o Data from various sources (analytics and operational data) must be integrated in real-time to provide the
necessary insights.
 Timely Delivery:
o Actionable knowledge must be delivered to the right person at the right time to facilitate effective
decision-making.
 Seamless Presentation:
o The presentation of analytical results should align with everyday business operations and integrate
smoothly with commonly used productivity tools.
 Event-Driven Notifications:
o Using alerts and notifications allows analytics to be embedded directly within operational processes,
enhancing responsiveness and actionability (a minimal sketch follows at the end of this section).
Benefits of Integrated Analytics
 Reduced Training Needs:
o End-users can operate effectively without extensive training in BI tools.
 Enhanced Decision-Making:
o Facilitates better-informed decisions, leading to improved business outcomes.
 Increased Efficiency:
o Streamlines processes by embedding analytics directly into workflows.
 Wider Adoption of BI Services:
o Lower barriers to deployment and integration can lead to broader acceptance and utilization of BI across
organizations.
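A minimal sketch of the event-driven idea above: an analytic score is consumed inside an operational step, so a call-center agent receives actionable guidance without opening a BI tool. The scoring rule and the record layout are hypothetical.

```python
# A minimal sketch: the analytic result is embedded in the workflow step.
def churn_risk(customer: dict) -> float:
    # Stand-in for a model score delivered by the analytics platform.
    return 0.8 if customer["complaints"] > 2 else 0.1

def handle_support_call(customer: dict) -> str:
    # The score triggers guidance at the decision point itself, rather
    # than appearing in a separate report the agent must look up.
    if churn_risk(customer) > 0.5:
        return "Offer a retention incentive before closing the call"
    return "Close the call normally"

print(handle_support_call({"name": "A. Jones", "complaints": 3}))
```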
Considerations: Optimizing the Presentation for the Right Message


When designing Business Intelligence (BI) dashboards and visualizations, it's essential to focus on optimizing the
presentation to effectively convey the intended message. Here are some key considerations to keep in mind:
1. Choose the Right Visualization Graphic:
o Select visual components that accurately represent the data. For instance, use line charts for showing
historical trends over time, while bar charts may be less effective for that purpose.
2. Manage Your "Real Estate":
o Screen space, or "real estate," is limited, so use it wisely. Different devices provide varying amounts of
display space (e.g., desktop vs. smartphone). Choose visualization components that fit well within the
available area while still conveying actionable insights.
3. Maintain Context:
o Values presented in isolation can be misinterpreted. Provide external context to define the meaning of
data points. For example, using a dial gauge with color zones (red for poor performance, green for good
performance) adds clarity to the displayed values.
4. Be Consistent:
o Ensure that the representations of similar data are consistent across different users and platforms.
Inconsistent visualizations can lead to confusion. Standardizing graphics helps maintain a uniform
understanding of the data.
5. Keep It Simple:
o Avoid overwhelming users with overly complex or flashy graphics that do not enhance decision-making.
Often, simpler presentations convey information more effectively.
6. Engage the User Community:
o Involve users in discussions about standards, practices, and guidelines for visualization. Collaborative
input ensures that the developed visualizations meet the needs and expectations of the user community.
Static Reports
Definition: Static reports are fixed documents that present data in a set format. They are usually created on a regular
schedule (like daily or monthly) and do not allow for changes once they are made. Examples include monthly sales
reports and financial statements.
Limitations of Static Reports
1. No Interactivity:
o Users can't explore the data further or customize the information presented. They have to accept the
report as it is, which limits deeper analysis.
2. Outdated Information:
o Since these reports are generated on a schedule, the data can quickly become old. This is a problem in
fast-paced environments where timely data is important.
3. Limited Detail:
o Static reports might not provide enough detail or context for users to fully understand the data.
Important information can be overlooked in a standardized format.
4. Passive Use:
o Users engage with static reports by just reading them. There’s no way to interact with the data or ask
questions, making it harder to apply the information to real situations.
5. Resource Intensive:
o Creating static reports often takes a lot of time and effort from IT teams or analysts. Keeping them
updated can also be a heavy workload.