
Department of Computer Science and Engineering

Assignment 1
Subject: BUSINESS INTELLIGENCE

M.E. – 1st Semester


[Batch: 2024-26]

UIET – University Institute of Engineering and Technology,


Panjab University,
Chandigarh

Submitted to: Prof. Sukhwinder Singh

Submitted by: Parminder Kaur [24-317]
Question 1: Survey the scientific, technology, and professional literature from
the past six months to find application examples from the field of enterprise
information systems and data warehouses where data mining and data analytics
have been used for economic and financial purposes.
Answer: The following are application examples, drawn from the past six months, of data mining and data analytics used for economic and financial purposes in enterprise information systems and data warehouses.
1. Risk Management and Fraud Detection in Financial Institutions
Application in Banking and Insurance
Over the last six months, financial institutions have increasingly utilized data mining and
analytics to predict and manage risks. Banks, insurance companies, and investment firms
have adopted advanced machine learning models to forecast risks such as loan defaults, stock
market volatility, and insurance fraud.
 Example: Loan Default Prediction
Banks use data analytics models that consider vast customer datasets—including
income, payment history, and market conditions—to predict the probability of loan
defaults. These predictive models allow banks to adjust interest rates dynamically
based on individual risk profiles, improving profitability while mitigating risks.
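To make this concrete, here is a minimal sketch of such a default-prediction model in scikit-learn; the features (income, delinquencies, loan-to-income ratio) and the labels are synthetic stand-ins for the datasets banks actually use:

```python
# A minimal sketch of a loan-default classifier (illustrative only).
# All feature names and data below are synthetic, not a real bank dataset.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
n = 1_000
X = np.column_stack([
    rng.normal(50_000, 15_000, n),  # annual income
    rng.poisson(1.0, n),            # past payment delinquencies
    rng.uniform(0.1, 0.8, n),       # loan-to-income ratio
])
# Synthetic label: delinquencies and a high ratio raise the default odds
logit = -3.0 + 0.8 * X[:, 1] + 2.5 * X[:, 2]
y = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = make_pipeline(StandardScaler(), LogisticRegression()).fit(X_train, y_train)

# Predicted default probabilities could feed risk-based interest-rate pricing
p_default = model.predict_proba(X_test)[:, 1]
print("Mean predicted default probability:", round(float(p_default.mean()), 3))
```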
2. Customer Segmentation and Personalization in Retail and E-Commerce
Application in Retail
Retailers and e-commerce platforms rely heavily on enterprise information systems to track
customer behavior and preferences. Over the past six months, there has been a significant
uptick in the use of advanced customer segmentation techniques. Companies such as
Amazon, Walmart, and Alibaba use data mining algorithms to divide customers into segments
based on purchasing patterns, website behavior, and even social media interactions.
 Example: Personalization Engines
Retail giants have increasingly employed data mining for real-time customer
personalization. By using recommendation engines powered by machine learning
algorithms like collaborative filtering and deep neural networks, they can personalize
product recommendations, which increases the conversion rates of online stores.
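To illustrate collaborative filtering in miniature, the sketch below recommends items via item-to-item cosine similarity; the ratings matrix is invented, and production engines operate on far larger, sparser data:

```python
# A minimal item-based collaborative filtering sketch (illustrative only).
# The ratings matrix is hypothetical: rows = users, columns = products.
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

ratings = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [0, 1, 5, 4],
])

# Similarity between items, computed over the user dimension
item_sim = cosine_similarity(ratings.T)

def recommend(user_idx, top_n=2):
    """Rank a user's unrated items by similarity-weighted rating sums."""
    user_ratings = ratings[user_idx]
    scores = item_sim @ user_ratings
    unrated = np.where(user_ratings == 0)[0]
    ranked = unrated[np.argsort(scores[unrated])[::-1]]
    return ranked[:top_n]

print("Recommended item indices for user 0:", recommend(0))
```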
3. Supply Chain Optimization and Demand Forecasting
Application in Manufacturing and Logistics
In the manufacturing and logistics sectors, data warehouses are increasingly used to
consolidate and analyze data across multiple supply chain systems. Over the past six months,
the use of predictive analytics in supply chain optimization has gained momentum,
particularly in light of global supply chain disruptions caused by the pandemic and
geopolitical tensions.
 Example: Demand Forecasting
Companies like Procter & Gamble and Unilever have started using predictive
analytics to forecast demand more accurately. By analyzing historical sales data,
customer behavior, and external factors like weather patterns, companies can optimize
their production schedules and reduce inventory costs. In addition, real-time data from
sensors embedded in the supply chain (part of IoT technology) is helping companies
anticipate and mitigate potential delays.
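A minimal sketch of lag-based demand forecasting on a synthetic weekly sales series follows; real pipelines would add weather, promotions, and IoT signals as described above:

```python
# A minimal demand-forecasting sketch using lagged sales (illustrative).
# The sales series is synthetic; real pipelines add many external factors.
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(7)
weeks = pd.date_range("2024-01-01", periods=104, freq="W")
sales = 100 + 10 * np.sin(np.arange(104) * 2 * np.pi / 52) + rng.normal(0, 5, 104)
df = pd.DataFrame({"sales": sales}, index=weeks)

# Lag features: demand one week ago and one year (52 weeks) ago
df["lag_1"] = df["sales"].shift(1)
df["lag_52"] = df["sales"].shift(52)
train = df.dropna()

model = LinearRegression().fit(train[["lag_1", "lag_52"]], train["sales"])

# One-step-ahead forecast from the most recent observations
x_next = pd.DataFrame({"lag_1": [train["sales"].iloc[-1]],
                       "lag_52": [train["sales"].iloc[-52]]})
print("Next-week forecast:", round(float(model.predict(x_next)[0]), 1))
```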
4. Financial Forecasting and Stock Market Analysis
Application in Investment and Portfolio Management
Financial institutions are employing data mining and analytics to gain insights into stock
market movements, economic trends, and investment strategies. In the last six months, a
major trend has been the increased use of quantitative analysis driven by data analytics
platforms. Hedge funds and portfolio managers now rely on big data analytics to process vast
datasets from financial markets, economic reports, and even social media sentiment.
 Example: Predictive Stock Market Models
Hedge funds are utilizing advanced data analytics platforms to predict stock price
movements based on large, real-time datasets. Machine learning models trained on
historical price data, company earnings reports, and even news sentiment can
anticipate market trends and optimize investment portfolios for higher returns.
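As a hedged sketch of this kind of model, the example below classifies next-day return direction from lagged returns; the prices are simulated random noise, so near-chance accuracy is the expected and honest outcome:

```python
# A minimal sketch of a return-direction classifier (illustrative).
# Prices are simulated, so accuracy near 0.5 is expected here; real funds
# add earnings, order-flow, and news-sentiment features.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(11)
prices = 100 * np.exp(np.cumsum(rng.normal(0.0002, 0.01, 600)))
returns = np.diff(np.log(prices))

# Features: the previous 5 daily returns; label: next day's direction
window = 5
X = np.array([returns[i - window:i] for i in range(window, len(returns))])
y = (returns[window:] > 0).astype(int)

split = int(0.8 * len(X))  # time-ordered split, no shuffling
model = GradientBoostingClassifier(random_state=0).fit(X[:split], y[:split])
print("Out-of-sample directional accuracy:",
      round(model.score(X[split:], y[split:]), 3))
```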
5. Regulatory Compliance and Reporting in Financial Services
Application in Compliance
One area that has seen significant development over the past six months is the application of
data mining and analytics in regulatory compliance. Banks and financial institutions are
increasingly using these tools to ensure adherence to global regulations such as GDPR, Basel
III, and anti-money laundering (AML) standards.
 Example: AML and KYC Compliance
Banks use data mining algorithms to analyze transactions in real-time and flag
suspicious activities that may indicate money laundering or fraud. Machine learning
models help automate the Know Your Customer (KYC) process by verifying client
identity and ensuring compliance with international regulations.
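As a hedged illustration, the sketch below flags outlying transactions with an isolation forest on synthetic features; real AML systems combine rules, network analysis, and supervised models:

```python
# A minimal sketch of transaction anomaly flagging for AML (illustrative).
# Features and their distributions are hypothetical.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(1)
# Hypothetical features: amount, transactions per day, share sent abroad
normal = np.column_stack([
    rng.lognormal(4, 0.5, 980),  # typical amounts
    rng.poisson(3, 980),         # typical daily frequency
    rng.beta(1, 9, 980),         # small cross-border share
])
suspicious = np.column_stack([
    rng.lognormal(7, 0.3, 20),   # unusually large amounts
    rng.poisson(20, 20),         # bursts of activity
    rng.beta(8, 2, 20),          # mostly cross-border
])
X = np.vstack([normal, suspicious])

detector = IsolationForest(contamination=0.02, random_state=0).fit(X)
flags = detector.predict(X)      # -1 marks outliers for analyst review
print("Transactions flagged for review:", int((flags == -1).sum()))
```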

Question 2: What is the difference between “traditional data mining” (in the
context of data warehouses) and Big Data analytics?
Answer: The difference between traditional data mining and Big Data analytics can be
understood by comparing their scope, scale, technologies, and data sources. Let’s break down
these differences:
Key Differences:
Feature               | Traditional Data Mining                    | Big Data Analytics
----------------------|--------------------------------------------|-----------------------------------------------
Data Type             | Structured (relational databases)          | Structured, semi-structured, unstructured
Data Volume           | Smaller (GB to TB)                         | Huge (TB to PB and beyond)
Data Sources          | Limited, internal sources                  | Many sources (social media, IoT, logs, etc.)
Technologies          | SQL, OLAP, data warehouses                 | Hadoop, Spark, NoSQL, distributed computing
Processing Mode       | Batch processing, historical data          | Real-time or near-real-time
Analytical Techniques | Basic statistical analysis, decision trees | Advanced machine learning, AI, deep learning
Storage               | Centralized (SQL, relational databases)    | Distributed (HDFS, cloud storage)
Scope                 | Descriptive, diagnostic                    | Predictive, prescriptive, real-time decisions

Question 3: Categorize the underlying data structure and the data processing of
technologies as OLTP, OLAP, or Big Data analytics.
Answer:
1. OLTP (Online Transaction Processing)
OLTP systems are designed for managing transactional data that is generated by day-to-day
operations. These systems focus on real-time processing of individual transactions, and their
underlying data structure is typically highly structured and normalized for efficiency in
storage and retrieval.
Characteristics of OLTP:
 Data Structure: Structured (relational databases like MySQL, PostgreSQL)
 Processing: Real-time transaction processing, high volume, fast inserts/updates
 Examples:
o ERP systems: Handle payroll, inventory management, procurement.

o Banking systems: Record deposits, withdrawals, fund transfers.

o E-commerce websites: Handle online orders, payment processing.

Technologies:
 Database Systems: Relational Databases (e.g., MySQL, Oracle, SQL Server)
 Key Operations: CRUD operations (Create, Read, Update, Delete)
 Use Cases: Real-time order processing, inventory tracking, banking transactions.
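The CRUD cycle that OLTP systems are built around can be sketched with Python's built-in sqlite3 module; the accounts table and values below are hypothetical:

```python
# A minimal CRUD sketch against an in-memory SQLite database (illustrative).
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, owner TEXT, balance REAL)")

# Create: insert a new account
cur.execute("INSERT INTO accounts (owner, balance) VALUES (?, ?)", ("Alice", 500.0))

# Read: fetch the account
print(cur.execute("SELECT owner, balance FROM accounts").fetchone())

# Update: record a deposit
cur.execute("UPDATE accounts SET balance = balance + ? WHERE owner = ?", (250.0, "Alice"))
conn.commit()

# Delete: close the account
cur.execute("DELETE FROM accounts WHERE owner = ?", ("Alice",))
conn.commit()
conn.close()
```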

2. OLAP (Online Analytical Processing)


OLAP systems are optimized for complex queries and reporting, allowing for the analysis of
large datasets over time. OLAP focuses on multidimensional data modeling and
aggregation for business intelligence, where users can drill down, slice, and dice the data for
deeper insights. Data in OLAP systems is usually stored in data warehouses and is
denormalized to improve query performance.
Characteristics of OLAP:
 Data Structure: Multidimensional, typically in star or snowflake schema,
denormalized.
 Processing: Batch processing, heavy read operations, complex queries for
aggregating large datasets.
 Examples:
o Business Intelligence (BI) tools: Such as Tableau, Power BI, and QlikView,
used for dashboards and reporting.
o Data Warehouses: Used for storing historical sales, customer, and financial
data.
Technologies:
 Database Systems: Data Warehouses (e.g., Amazon Redshift, Google BigQuery,
Microsoft SQL Server Analysis Services)
 Key Operations: Aggregation, roll-up, drill-down, slice and dice.
 Use Cases: Sales forecasting, financial reporting, customer behavior analysis.
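The core OLAP operations can be imitated in miniature with pandas; this sketch, with made-up sales data, shows a roll-up, a slice, and a small pivoted cube view:

```python
# A minimal sketch of OLAP-style roll-up and slicing in pandas (illustrative).
import pandas as pd

sales = pd.DataFrame({
    "region":  ["North", "North", "South", "South"],
    "quarter": ["Q1", "Q2", "Q1", "Q2"],
    "revenue": [120, 150, 90, 110],
})

# Roll-up: aggregate revenue from (region, quarter) up to region
print(sales.groupby("region")["revenue"].sum())

# Slice: fix one dimension (quarter == "Q1")
print(sales[sales["quarter"] == "Q1"])

# A small two-dimensional "cube" view
cube = sales.pivot_table(values="revenue", index="region",
                         columns="quarter", aggfunc="sum")
print(cube)
```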
3. Big Data Analytics
Big Data Analytics is designed to handle massive volumes of structured, semi-structured,
and unstructured data coming from a variety of sources like social media, IoT sensors,
clickstreams, and machine logs. Big Data analytics processes this data using distributed
computing frameworks like Hadoop and Spark, which can process the data in real-time or
in batch mode.
Characteristics of Big Data Analytics:
 Data Structure: Unstructured, semi-structured, structured (NoSQL databases, data
lakes, HDFS)
 Processing: Distributed processing, real-time/near-real-time or batch mode, handles
large data volumes, variety, and velocity.
 Examples:
o Hadoop: Used for distributed storage and processing of large datasets.

o Apache Spark: Real-time analytics, machine learning, and data processing.

o NoSQL Databases: Like MongoDB, Cassandra, which handle semi-structured data.
Technologies:
 Storage: HDFS (Hadoop Distributed File System), Data Lakes (Amazon S3, Azure
Data Lake)
 Processing Engines: Apache Hadoop, Apache Spark, Apache Flink
 Key Operations: Distributed file storage, parallel data processing, real-time data
streaming.
 Use Cases: Predictive analytics, fraud detection, real-time recommendation engines,
IoT analytics.
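The distributed-processing model can be sketched with PySpark; this is a hedged example that assumes a local Spark installation, and the HDFS path and "page" field are hypothetical stand-ins for real clickstream data:

```python
# A minimal PySpark sketch of distributed batch analytics (illustrative).
# Assumes Spark is installed; the input path and "page" field are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("clickstream-demo").getOrCreate()

# Read semi-structured JSON logs into a DataFrame partitioned across the cluster
logs = spark.read.json("hdfs:///data/clickstream/*.json")

# The aggregation runs in parallel on all partitions: events per page
counts = logs.groupBy("page").agg(F.count("*").alias("events"))
counts.orderBy(F.desc("events")).show(10)

spark.stop()
```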

Comparison Summary of OLTP, OLAP, and Big Data Analytics

Feature         | OLTP                                   | OLAP                                         | Big Data Analytics
----------------|----------------------------------------|----------------------------------------------|------------------------------------------------
Data Structure  | Structured (normalized, relational DB) | Structured, multidimensional (denormalized)  | Structured, semi-structured, unstructured
Data Source     | Transactional data (real-time)         | Historical data (batch)                      | Various sources (social media, sensors, logs)
Primary Use     | Transaction processing (day-to-day)    | Complex queries, historical analysis         | Real-time/batch analysis, large data volumes
Technologies    | MySQL, PostgreSQL, Oracle              | Data warehouses, OLAP cubes                  | Hadoop, Spark, NoSQL (MongoDB, Cassandra)
Processing Type | Real-time                              | Batch processing                             | Distributed, real-time/batch
Operations      | CRUD (Create, Read, Update, Delete)    | Aggregation, roll-up, drill-down             | Predictive analytics, machine learning
Examples        | ERP systems, banking, e-commerce       | Business intelligence tools, reporting       | Social media analysis, IoT data, streaming data
Question 4: In the case of a project aimed at the introduction of tools for BI, Data
Analytics, and data science into an enterprise, which elements of the analytics
ecosystems can and should be combined?
Answer: In a project aimed at introducing tools for Business Intelligence (BI), Data
Analytics, and Data Science into an enterprise, several elements of the analytics ecosystem
should be combined to create a cohesive and efficient analytics environment. The following
key components of the analytics ecosystem can and should be integrated:
1. Data Sources
 Internal Data Sources:
These include transactional databases, CRM systems, ERP systems, and HR systems,
which store structured data related to business operations, customers, finance, and
human resources.
 External Data Sources:
Data from external sources, such as social media, third-party data providers, market
reports, and web logs, is essential for enhancing the enterprise's analytical capabilities
by bringing in additional insights.
 Combination Strategy:
Combine both internal and external data sources to create a comprehensive dataset.
This allows the enterprise to make more informed decisions by integrating operational
data with customer sentiment, market trends, and competitor analysis.

2. Data Integration Tools


 ETL Tools (Extract, Transform, Load):
These tools, such as Informatica, Talend, and Microsoft SSIS, are used to extract data
from different sources, transform it into a consistent format, and load it into data
warehouses or data lakes.
 Data Virtualization Tools:
Tools like Denodo and TIBCO Data Virtualization allow access to data from various
sources without needing to physically move or replicate it. This is useful when
integrating real-time data sources.
 Combination Strategy:
Use ETL processes to bring structured and semi-structured data into a data warehouse
for analytical queries. For real-time and dynamic data, integrate data virtualization
tools to provide unified access to multiple sources without redundancy.
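As a toy illustration of the extract-transform-load pattern (not a real Informatica or Talend workflow), the sketch below uses pandas with SQLite as a stand-in warehouse; the file name and column names are hypothetical:

```python
# A minimal ETL sketch in pandas (illustrative only).
# "crm_export.csv" and its columns are hypothetical; SQLite stands in
# for the data warehouse that dedicated ETL tools would load at scale.
import sqlite3
import pandas as pd

# Extract: pull raw records from a source system export
raw = pd.read_csv("crm_export.csv")

# Transform: normalize formats into a consistent schema
raw["email"] = raw["email"].str.strip().str.lower()
raw["signup_date"] = pd.to_datetime(raw["signup_date"])
clean = raw.drop_duplicates(subset="customer_id")

# Load: write the cleaned records into a warehouse-style table
conn = sqlite3.connect("warehouse.db")
clean.to_sql("dim_customer", conn, if_exists="replace", index=False)
conn.close()
```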
3. Data Storage Solutions
 Data Warehouses:
For structured, historical data, tools like Amazon Redshift, Google BigQuery, or
Snowflake are ideal. Data warehouses allow the enterprise to store large volumes of
processed data in a manner that supports BI and OLAP queries.
 Data Lakes:
For unstructured and semi-structured data, data lakes such as Amazon S3, Azure Data
Lake, and Hadoop provide scalable storage. Data lakes are ideal for storing raw data
that can later be processed by analytics tools.
 Combination Strategy:
Combine both data warehouses (for structured, historical data and reporting) and data
lakes (for semi-structured and unstructured data) to ensure flexibility in handling
different data types. This also supports advanced analytics and machine learning
applications by providing raw data for data scientists to experiment with.

4. BI Tools
 Dashboard and Reporting Tools:
Tools like Power BI, Tableau, and QlikView are widely used for generating interactive
dashboards and reports, enabling users to visualize data in real-time.
 Self-Service BI:
Tools like Looker and Microsoft Power BI enable non-technical users to create their
own reports and dashboards, giving business users more autonomy to analyze data.
 Combination Strategy:
Integrate BI tools into the analytics ecosystem to provide real-time reporting and self-
service analytics. By doing so, different departments (finance, sales, marketing) can
access the insights they need without heavy reliance on IT or data teams.

5. Data Analytics and Machine Learning Tools


 Advanced Analytics Platforms:
Platforms like R, Python (with libraries like Pandas and Scikit-learn), SAS, and SPSS are essential for conducting advanced statistical analysis and building machine learning models.
 Data Science Platforms:
Tools such as DataRobot, H2O.ai, and Azure Machine Learning Studio provide end-
to-end platforms for building, training, and deploying machine learning models.
 Predictive Analytics Tools:
Platforms like IBM Watson or SAP Predictive Analytics help build models that
predict future outcomes, allowing businesses to make proactive decisions.
 Combination Strategy:
Combine data analytics platforms with data science tools to create an ecosystem
where data scientists and analysts can collaborate on building models and developing
insights. For example, use Python or R for exploratory data analysis, then transition
the models to platforms like Azure ML for deployment and scalability.
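To illustrate that handoff, here is a minimal sketch of exploring a model in Python and serializing it for a deployment platform to load; scikit-learn's bundled iris data is used purely as a stand-in for enterprise data:

```python
# A minimal sketch of the explore-then-deploy handoff (illustrative).
# The iris dataset is a stand-in; real projects would use enterprise data.
from joblib import dump
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(random_state=0)

# Exploratory validation before promoting the model
print("CV accuracy:", round(cross_val_score(model, X, y, cv=5).mean(), 3))

# Fit on all data and serialize; the artifact is what a serving
# platform (e.g., Azure ML) would register and deploy
model.fit(X, y)
dump(model, "model.joblib")
```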
6. Data Governance and Security
 Data Governance Tools:
Tools like Collibra and Informatica Data Governance ensure data quality, integrity,
and compliance with regulations such as GDPR or HIPAA.
 Security Frameworks:
Security tools like Microsoft Azure Security Center or AWS Identity and Access
Management (IAM) are essential to secure access to sensitive data and ensure proper
authentication and encryption.
 Combination Strategy:
Integrate data governance and security frameworks into the analytics ecosystem to
ensure that data privacy and compliance standards are maintained. Data governance
tools also help in tracking data lineage and ensuring that analytics are based on
accurate and verified data sources.

Question 5: For the paper titled “Data Warehousing Supports Corporate Strategy at First American Corporation” (by Watson, Wixom, and Goodhue), go to https://www.researchgate.net/publication/2385545_Data_Warehousing_supports_Corporate_Strategy_at_First_American_Corporation. Read the paper, and answer the following questions:
Answer: The paper "Data Warehousing Supports Corporate Strategy at First American
Corporation" by Watson, Wixom, and Goodhue describes how First American Corporation
(FAC) implemented a data warehouse called VISION to support their shift to a customer-
centric strategy. Below are the answers to the questions based on the paper:
a. What were the drivers for the DW/BI project in the company?
The primary drivers for the Data Warehousing (DW) and Business Intelligence (BI) project at
FAC were:
1. Customer-Centric Strategy: FAC shifted from a traditional banking approach to a customer
relationship-oriented strategy, where customers were placed at the center of their business
model.
2. Need for Data Integration: FAC wanted to integrate disparate data sources across various
systems to improve decision-making capabilities and provide comprehensive insights into
customer behavior.
3. Competitive Pressures: The financial services industry was becoming increasingly
competitive, and FAC needed better insights into customer needs and preferences to gain a
competitive edge.
b. What strategic advantages were realized?
FAC achieved several strategic advantages through the DW/BI project:
1. Enhanced Customer Segmentation: The data warehouse enabled FAC to segment its
customer base more effectively, allowing for personalized marketing and product offerings.
2. Increased Customer Retention: By understanding customer behavior better, FAC was able
to improve customer satisfaction and loyalty, leading to increased retention rates.
3. Competitive Advantage: FAC's use of data-driven insights allowed them to tailor their
services better than competitors, positioning them as a leader in the financial services
industry.
c. What operational and tactical advantages were achieved?
On the operational and tactical levels, the DW/BI project provided:
1. Improved Decision-Making: The integration of data across the organization allowed for
faster, more informed decision-making by executives and managers.
2. Operational Efficiency: The data warehouse streamlined reporting processes, reduced the
time spent on manual data entry, and improved overall operational efficiency.
3. Marketing Optimization: FAC was able to create targeted marketing campaigns, leading to
more effective use of marketing resources and improved return on investment (ROI).
d. What were the critical success factors for the implementation?
The successful implementation of the DW/BI project at FAC was attributed to several critical
success factors:
1. Strong Leadership: Senior management at FAC provided clear support and direction for the
project, ensuring alignment with corporate goals.
2. Clear Vision: The project had a well-defined vision that linked the DW/BI initiative
directly to FAC’s strategic objectives, particularly the customer-centric approach.
3. Cross-Functional Collaboration: The success of the data warehouse depended on
collaboration across different departments, ensuring that all relevant data was captured and
utilized effectively.
4. Technology Integration: Proper integration of the DW/BI system with existing IT
infrastructure was key to ensuring seamless data flow and accessibility.

Question 6: Go to the article “Predictive Analytics—Saving Lives and Lowering Medical Bills” (https://pubsonline.informs.org/do/10.1287/LYTX.2012.01.03/full/) from the January/February 2012 edition titled “Special Issue: The Future of Healthcare.” Read the article and answer the following questions:
a. What problem is being addressed by applying predictive analytics?
The primary problem being addressed is medication non-adherence, which refers to patients not following prescribed medication regimens. Non-adherence leads to poor health outcomes, increased hospitalizations, and higher healthcare costs. Predictive analytics aims to identify patients who are likely to deviate from their prescribed medication plans, allowing healthcare providers to intervene and improve adherence rates.
b. What is the FICO Medication Adherence Score?
The FICO Medication Adherence Score is a predictive tool used to estimate the likelihood
that a patient will follow their prescribed medication regimen. It is similar to credit scores but
focuses on healthcare data, including medication history, to predict adherence levels.
c. How is a prediction model trained to predict the FICO Medication Adherence Score?
Did the prediction model classify the FICO Medication Adherence Score?
The prediction model is trained using historical data such as medication history, patient
demographics, and socioeconomic factors. Machine learning algorithms are applied to
identify patterns that influence whether a patient is likely to adhere to their medications. The
model effectively classifies patients based on their predicted adherence levels, helping
providers target interventions.
d. Zoom in on Figure 4, and explain what kind of techniques were applied to the
generated results.
Although the specific details of Figure 4 are not available, typical techniques in such
predictive analytics cases include logistic regression, classification models, and decision
trees. These techniques help categorize patients based on risk levels, and predictive models
are often trained on large datasets to enhance accuracy and precision in identifying non-
adherence.
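Since the figure itself is unavailable, here is a hedged sketch of one such technique, a shallow decision tree trained on synthetic adherence data; the features and labels are invented, as the actual inputs to the FICO score are proprietary:

```python
# A minimal decision-tree adherence classifier (illustrative only).
# Features and labels are synthetic, not the FICO score's real inputs.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(3)
n = 500
X = np.column_stack([
    rng.integers(20, 90, n),   # age
    rng.integers(1, 10, n),    # number of medications
    rng.poisson(2, n),         # prior refill gaps
])
# Synthetic label: more refill gaps lowers the odds of adherence (1 = adherent)
y = (X[:, 2] + rng.normal(0, 1, n) < 3).astype(int)

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(export_text(tree, feature_names=["age", "n_meds", "refill_gaps"]))
```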
e. List some of the actionable decisions that were based on the prediction results.
Some actionable decisions based on predictive analytics include:
1. Targeted Patient Interventions: Identifying at-risk patients allows healthcare providers to
implement personalized intervention plans such as follow-up appointments, education on the
importance of adherence, or adjusting medication schedules.
2. Resource Allocation : Predictive insights help healthcare organizations allocate resources
more efficiently, ensuring that patients with a higher likelihood of non-adherence receive the
necessary support.
3. Improved Patient Outcomes: By proactively addressing non-adherence, healthcare
providers can reduce hospitalizations and overall healthcare costs while improving patient
health outcomes.
Question 7: Go to the article “Big Data, Analytics and Elections” (https://pubsonline.informs.org/do/10.1287/LYTX.2013.01.03/full/) from the January/February 2013 edition titled “Work Social.” Read the article and answer the following questions:
a. What kinds of Big Data were analyzed in the article? Comment on some of the
sources of Big Data.
The article analyzes various kinds of Big Data in the context of elections. This includes:
1. Voter Demographic Data: Information such as age, gender, income level, and education,
collected from public records and voter registration databases.
2. Social Media Data: Data from platforms like Facebook and Twitter, capturing public
sentiment, political opinions, and social behavior.
3. Behavioral Data: This includes browsing history, clickstream data, and engagement with
political content online.
4. Polling Data: Data from election polls, surveys, and focus groups, combined with
predictive models to refine voter targeting.
b. Explain the term integrated system. What is the other technical term that suits an
integrated system?
An integrated system in the context of Big Data and elections refers to a unified platform
where different data sources and analytical tools are combined to provide a single, cohesive
environment for processing and analyzing data. This system allows campaigns to gather data
from multiple channels (such as social media, polling data, and voter databases) into one
place for analysis and decision-making.
c. What kinds of data analysis techniques are employed in the project? Comment on
some initiatives that resulted from data analysis.
The article outlines several data analysis techniques used in election campaigns:
1. Sentiment Analysis: This technique is used to assess public sentiment on social media platforms about candidates and political issues.
2. Predictive Analytics: Historical voting patterns, combined with demographic and
behavioral data, are used to predict voter turnout and preferences.
3. Cluster Analysis: Voters are segmented into groups based on their demographics and
behavior, allowing for more targeted messaging.
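A minimal sketch of the cluster-analysis step follows, using k-means on synthetic voter features; age, engagement score, and turnout history are hypothetical stand-ins for real campaign data:

```python
# A minimal voter-segmentation sketch with k-means (illustrative only).
# All features are synthetic stand-ins for demographic and behavioral data.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(5)
n = 300
X = np.column_stack([
    rng.integers(18, 85, n),   # age
    rng.uniform(0, 1, n),      # online political engagement score
    rng.integers(0, 6, n),     # elections voted in (of the last 6)
])

# Scale features so no single dimension dominates the distance metric
X_scaled = StandardScaler().fit_transform(X)
segments = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X_scaled)
print("Voters per segment:", np.bincount(segments))
```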
d. What are the different prediction problems answered by the models?
The predictive models used in the election context answer several key problems:
1. Voter Turnout Prediction: Identifying which voters are likely to participate in the election
and which may need further outreach.
2. Voter Preference Prediction: Forecasting which candidates or policies individual voters
will support based on their data profiles.
3. Campaign Impact: Analyzing the effectiveness of different campaign strategies (e.g., media
ads, rallies) in influencing voter behavior and opinion.
e. List some of the actionable decisions taken that were based on the prediction results.
Based on the prediction results, the following actionable decisions were taken:
1. Targeted Voter Outreach: Campaigns were able to focus their efforts on undecided or swing
voters by tailoring specific messages to different voter segments.
2. Campaign Resource Allocation: Resources such as time, money, and volunteers were
allocated to regions or districts where predictive models indicated the greatest potential
impact.
3. Message Personalization: Political campaigns adjusted their messaging based on the
predicted preferences of different voter groups, making communication more effective.
f. Identify two applications of Big Data analytics that are not listed in the article.
1. Fundraising Optimization: Big Data analytics can be used to predict donor behavior, helping campaigns target high-value donors more effectively and optimize their fundraising strategies.
2. Real-time Debate Analysis: Using natural language processing (NLP) and sentiment analysis, Big Data can be applied to provide real-time feedback during political debates, helping campaigns adjust their strategies based on public reaction.

Question 8: Read the article “Just-in-Time Logistics: What It Means and Why It Matters” (https://pubsonline.informs.org/do/10.1287/LYTX.2023.04.01/full/) and answer the following questions:
a. What are the problems being faced by JIT logistics?
Some of the major challenges faced by Just-in-Time (JIT) logistics include:
1. Accurate Demand Forecasting: JIT systems rely heavily on precise demand forecasting.
Any errors in forecasting can lead to stockouts or overproduction, which disrupt the entire
supply chain.
2. Supply Chain Disruptions: Since JIT operates with minimal inventory, even minor
disruptions—such as delayed shipments, equipment breakdowns, or supplier issues—can
cause significant delays in production.
3. Hidden Costs: While JIT reduces inventory holding costs, other hidden costs like frequent
small shipments and the need for expedited orders can increase transaction and transportation
costs.
b. What strategic advantages were realized?
JIT logistics provides several strategic advantages:
1. Cost Reduction: By minimizing inventory and holding costs, companies save money on
warehouse space and reduce waste.
2. Faster Deliveries: JIT ensures that materials are delivered exactly when needed, reducing
lead times and improving delivery speed for finished products.
3. Increased Responsiveness: The system allows businesses to quickly adapt to market
changes and new customer demands, enhancing their agility in a competitive market.
c. What operational and tactical advantages were achieved?
Operational and tactical benefits include:
1. Improved Efficiency: JIT logistics optimizes production cycles, ensuring a smoother and
more efficient workflow by reducing downtime and unnecessary movement of goods.
2. Waste Minimization: It eliminates excess inventory and reduces the risk of holding
obsolete stock, contributing to leaner operations.
3. Better Supplier Relationships: The system fosters closer collaboration with suppliers to
ensure timely and accurate deliveries.
d. What were the critical success factors for the implementation?
The critical factors for successful JIT implementation include:
1. Accurate Demand Forecasting: To avoid production delays, accurate forecasting of demand
is crucial.
2. Supplier Collaboration: Reliable relationships with suppliers are essential to ensure
consistent quality and timely deliveries.
3. Process Optimization: Identifying bottlenecks and inefficiencies in the supply chain and
addressing them with data-driven insights is key to maintaining JIT's smooth operation.
