
BBA 202 BA Unit 3

Uploaded by

ritwik.h25

BBA 202 Business Analytics

Unit 3:
Data Visualization: Definition, Visualization Techniques (Tables, Cross
Tabulation, Charts, Tableau); Data Modeling: Concept, Role and Techniques

Data Visualization:

Organizations regularly generate an overabundance of data that is
essential for decision-making.

Data visualizations play an important role in helping people understand
complex data and observe patterns and trends over a period of time.

Data visualization is an important skill for data professionals, which
usually goes hand in hand with storytelling, aiming to communicate
observations effectively and inform decisions.

➢ Data visualization is the graphical representation of information
and data using visual elements such as charts, graphs, and maps.
➢ Data visualization tools provide an accessible way to see and
understand trends, outliers, and patterns in data.

Importance of Data Visualization
➢ Visualizing large, complex data sets with graphs and charts is
easier than studying spreadsheets and reports.
➢ Data visualization is an easy and quick way to convey concepts
universally.

Uses of Data Visualization

➢ To highlight trends, patterns and correlations in data.
➢ To make well-informed decisions backed by data.
➢ To make sense of large datasets more easily and combine different
datasets from various sources.
➢ To use storytelling as an effective way of communicating
data-backed ideas.
➢ Data visualizations need to hold the target audience’s attention
while being easy to understand and interpret.
➢ Data visualizations make it easier to monitor important metrics
and keep an eye on key performance indicators (KPIs).

Best Practices for Data Visualization

Here are a few best practices to keep in mind when creating visualiza-
tions.

1. Clear Goal in Mind

When creating data visualizations, keep in mind the information
you want to communicate, its importance and the audience you
are presenting to.
2. Selection of Right Visualization Tool

After setting goals for your visualization, consider the right tools
to help you present your data. There are code libraries and no-
code/low-code platforms, depending on the use case, and they
each have their own strengths.

No-Code/Low-Code Platforms

With little to no knowledge of how to code, you can create engaging
data visualizations that capture the message you’re trying to
communicate. Here are a few popular visualization platforms that
can enable you to create visualizations, dashboards and reports:

a. Tableau: Tableau is a visual analytics platform that enables
users to create interactive charts, maps and dashboards. The
drag-and-drop functionality of the platform enables users to
quickly create interactive visualizations with a wide range of
charts, graphs and interactive elements, while also enabling
users to integrate programming languages such as Python and R.

b. Power BI: Power BI is a popular business intelligence platform
that allows professionals to create interactive dashboards,
charts and graphs, offering a wide range of options and
interactive elements while being user-friendly and intuitive.

c. Looker Studio (formerly Google Data Studio): Looker Studio
is a platform that allows users to create customizable dashboards
and reports. It provides users with an effective option to
create professional-quality visualizations for free and without
the need to write code.

Data Visualization Libraries

On the other hand, depending on the use case, you can use code to
create data visualizations. For visualizations, you can build and use
packages created in JavaScript, Python and R.

R is a programming language and software environment used for
statistical computing and visualizations. With R, you’ll have access
to several libraries and packages for creating a wide range
of visualizations, including simple plots and interactive graphics.

Here are a few widely used R packages for data visualizations:

• ggplot2
• Highcharter
• Plotly
Type of Visualizations

Picking the right type of visualization can greatly improve clarity
and readability, and ensure your visualizations are engaging.
When using these visualizations, ensure that you make wise
comparisons and use charts well-suited for the data type. The right
visualization depends on your goals, data type and audience.

Here are some charts and what they’re most useful for:

Tables, Charts and Cross Tabulation

Data can be presented in the form of a table. For example, the results
of the BBA class of the 2019-22 batch across all semesters can be
presented in a table, as shown below. There are four variables: total
students appeared, students passed, students passed with First Division
(scoring 60% to 75% marks), and students passed with distinction
(scoring above 75% marks).

When several variables are analysed together in one table, as here, it
is called cross tabulation. The data can also be presented in visual
form as a bar chart.

Semester                                        I    II   III    IV     V

Total Students Appeared                        28    28    28    28    27
Total Students Pass                            28    28    28    28    27
First Division (60%-75%)                        9    19    24     3    17
First Division with Distinction (Above 75%)     0     7     2    25     6

[Figure: bar chart "Result Analysis BBA 2019-22 Batch". X-axis: Semester
(I to V). Y-axis: Students. Series: Total Students Appeared, Total
Students Pass, First Division (60%-74%), First Division with Distinction
(Above 75%).]
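
As a rough illustration of how the cross-tabulated results can be turned into a chart, the sketch below stores the table in plain Python and draws a text-only bar chart. The numbers are copied from the table; the function name `text_bar_chart` and the derived `distinction_share` figure are assumptions added for illustration, not part of the original notes.

```python
# Figures copied from the result table (BBA 2019-22 batch).
semesters = ["I", "II", "III", "IV", "V"]
results = {
    "Total Students Appeared": [28, 28, 28, 28, 27],
    "Total Students Pass": [28, 28, 28, 28, 27],
    "First Division (60%-75%)": [9, 19, 24, 3, 17],
    "First Division with Distinction (Above 75%)": [0, 7, 2, 25, 6],
}


def text_bar_chart(label, values, categories):
    """Return a text bar chart: one line per category, one '#' per student."""
    lines = [label]
    for category, value in zip(categories, values):
        lines.append(f"{category:>4} | {'#' * value} {value}")
    return "\n".join(lines)


# Derived measure (illustrative): percentage of appearing students
# who earned a distinction in each semester.
distinction_share = {
    sem: round(100 * distinction / total, 1)
    for sem, distinction, total in zip(
        semesters,
        results["First Division with Distinction (Above 75%)"],
        results["Total Students Appeared"],
    )
}
```

Calling `print(text_bar_chart(name, values, semesters))` for each row of `results` gives one small chart per variable, which makes the jump in distinctions in Semester IV easy to spot.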

Bar charts: Bar charts are graphs with rectangular bars used for
creating visuals for categorical data. They are useful for showing
distributions and comparisons. They are the most commonly used chart
type, as they’re a quick way of communicating information and
comparing values.

Vertical Comparative

Horizontal Comparative

Stacked Bar Chart


Line graphs: Line graphs use lines to connect different data points.
They’re helpful when creating a graph that presents trends and pat-
terns in data, such as time series data. Some examples are changes
in weather, stock prices, sales, etc.
Scatter plots: Scatter plots show relationships between variables and
how variables influence each other, and they help identify patterns
in data.

Pie charts: Pie charts are simple, effective charts that use a circular
diagram, with each portion of the pie representing the relative size of
the data. They show how a quantity or percentage is distributed.

Heatmaps: Heatmaps are colored matrices used to present data values.
Typically, darker or warmer colors represent higher values, while
lighter or cooler colors represent lower values. Heatmaps are useful
for identifying noteworthy variations in data as well as pointing out
patterns and trends.

Making Visuals More Effective:

Succinct (Brief and Clear) Labels and Titles

➢ It’s important to use labels and titles for your visualizations to
make them easy to understand.
➢ These labels should provide context and inform the reader what
the graph is trying to communicate.
➢ Titles and labels should use fonts that are easy to read and large
enough, be positioned carefully, and use colors that are also easy
to read.
➢ Avoid placing too many labels close together, which creates clutter
and affects the readability of the text. Avoid unnecessary
abbreviations; if you need to use abbreviations, explain what they
mean in a key within the report.

The Right Color Scheme:

➢ Colors play an important role in data visualizations as they help
keep your audience engaged with the content of your dashboard or
report and draw the audience’s attention to the important
information you are trying to point out.
➢ When picking a color scheme, choose to stay on brand (for example,
using your company’s brand colors) while keeping in mind
the readability of titles, labels, charts, etc.
➢ Avoid using too many colors, and ensure your visualizations are
appealing. At the same time, consider the psychology of color, as
it influences your audience’s reaction. Finally, consider people
with visual impairments when picking color schemes for your
visualization.
➢ In essence, stick to a clear and consistent color scheme for your
visualizations.

Avoid Clutter and Unnecessary Visual Elements

➢ A good data visualization should be easy to understand, engaging
and uncluttered. Too many things going on at once in your
visualization can distract your audience from the insights they’re
supposed to take away from your work.
➢ To improve the readability of your visualization, use simple
designs and avoid unnecessary elements.
➢ Unnecessary elements could be too many labels, distracting
background images and patterns, unnecessary data points or gridlines
that aren’t relevant.
➢ Titles and labels should be easy to read and comprehend, and there
should be adequate spacing between charts and other components.
Keep your designs simple and focused on the insights you’re
trying to get across.

Use Data That Is Clean and Up to Date

➢ Use clean and pre-processed data to ensure it’s free from errors
and anomalies. This process may include removing missing or du-
plicate values, data normalization, etc.
➢ Using uncleaned data could result in misinterpretation or incor-
rect conclusions — not to mention how difficult it is to create ef-
fective visualizations with it.
➢ Use the most recent and relevant data available to ensure your
visualization is not only current but also accurate. Avoid outdated
data.
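
The cleaning steps mentioned above (removing missing or duplicate values, then normalizing) can be sketched in a few lines of plain Python. The sample records and the function names `clean` and `min_max_normalize` are hypothetical, and min-max scaling is only one of several possible normalization methods.

```python
# Hypothetical raw records: (student_id, marks); None marks a missing value.
raw = [(1, 72.0), (2, None), (3, 55.0), (1, 72.0), (4, 91.0)]


def clean(records):
    """Drop rows with missing values, then drop exact duplicates (order kept)."""
    seen = set()
    cleaned = []
    for row in records:
        if any(value is None for value in row):
            continue  # missing value: skip the row
        if row in seen:
            continue  # exact duplicate: skip the row
        seen.add(row)
        cleaned.append(row)
    return cleaned


def min_max_normalize(values):
    """Rescale values to the range 0..1 (min-max normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


rows = clean(raw)
scaled = min_max_normalize([marks for _, marks in rows])
```

Here `rows` keeps three of the five records, and `scaled` places the lowest mark at 0 and the highest at 1, so values from different scales can share one chart axis.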

Characteristics for Good Data Visualizations:

Here are some criteria for creating good data visualizations:

Easy to understand: A good data visualization shows complex data
connections in a way that’s easy to understand, clear, concise and
without clutter.

Clear Insight: Insights should be easily absorbed by the audience.
The visualizations should effectively communicate the information
and ideas in the data using the right visual elements.

Neat and Clear Presentation: Good data visualizations should consider
the needs of various audiences, while being accessible and inclusive
by using clear and legible fonts and text sizes. Use appropriate
color choices and contrasts. Avoid relying on red-green contrasts,
since red-green color blindness is the most common form.

A good data visualization is simple and straightforward without
unnecessary distractions or elements.

Accurate and Reliable Data: Good data visualizations are based on
accurate, current and reliable data.

Data modelling

A data model is a visual representation of either a whole information
system or parts of it, used to communicate the connections between
different kinds of related data.

The goal is to illustrate the types of data used and stored within the
system and the relationships among these data types, i.e. the ways the
data can be grouped and organized, and its formats and attributes.

The data modelling process begins by collecting information about
business requirements from stakeholders and end users. These business
rules are then translated into data structures to formulate a concrete
database design.

Types of Data Models:

Data models can generally be divided into three categories, which vary
according to their degree of abstraction (level of detail).

The process starts with a conceptual model, progresses to a logical
model and concludes with a physical model.

Not every team will necessarily follow all three strictly. Often, all
three (conceptual, logical, and physical data models) are compressed
into one modeling exercise.

However, breaking the process down into these three levels can be
valuable. Each step lays down a foundation for the next:

1. Conceptual – the “what” model
2. Logical – the “how” of the details
3. Physical – the “how” of the implementation

Each level (conceptual, logical, and physical) can involve different
roles from your team. Each type of data model is discussed in more
detail below.

Conceptual data models

Conceptual models are usually created to gather initial project
requirements. They include:

➢ Defining entity classes (the types of things that are important
for the business to represent in the data model), their
characteristics and constraints.
➢ Defining the attributes of the entities, along with relevant
security and data integrity requirements. The notation is
typically simple.
➢ Defining relationships among entities, that is, how these different
kinds of data are related to each other.

Logical data models: These are a further extension of the conceptual
data model and provide more detail about the concepts and
relationships in the domain under consideration. These models indicate
data attributes, such as data types and their corresponding lengths,
and show the relationships among entities. Logical data models don’t
specify any technical system requirements.

Physical data models: These provide a schema for how the data will
be physically stored within a database, and they are the most detailed.
They offer a finalized design that can be implemented as a relational
database, including associative tables that illustrate the
relationships among entities as well as the primary keys and foreign
keys that will be used to maintain those relationships. Physical data
models can include database management system (DBMS)-specific
properties, including performance tuning.

Conceptual Data Model Example

Here’s an example of a conceptual diagram that involves three core
entities (travel routes, airlines and schedules) and their respective
attributes:

Logical Data Model

This step involves filling in the details of the conceptual model by
defining the data type of each attribute.

Decide the details of each individual field/column and relationship as
well. This includes data types, sizes, lengths, arrays, nested objects,
etc.

The logical model is typically created by architects and business analysts.
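
One informal way to write down a logical model is as typed record definitions. The sketch below uses Python dataclasses for the three entities named in the conceptual example. The attribute names, types and lengths are all assumptions made for illustration, since the notes do not list them.

```python
from dataclasses import dataclass, fields


# Hypothetical attributes: the notes name the entities (airline, route,
# schedule) but the attribute lists below are assumed for illustration.
@dataclass
class Airline:
    airline_id: int
    name: str                  # assumed maximum length: 50 characters


@dataclass
class Route:
    route_id: int
    airline_id: int            # refers to Airline.airline_id
    source_airport: str        # assumed 3-letter airport code
    destination_airport: str   # assumed 3-letter airport code


@dataclass
class Schedule:
    schedule_id: int
    route_id: int              # refers to Route.route_id
    day_of_week: int           # assumed range: 0 to 6
    departure_time: str        # assumed format: "HH:MM"


def describe(entity):
    """List (attribute name, type name) pairs for an entity class."""
    return [
        (f.name, getattr(f.type, "__name__", str(f.type)))
        for f in fields(entity)
    ]
```

`describe(Route)` lists each attribute with its data type, which is the kind of detail a logical model adds on top of the conceptual one. Nothing here says how or where the data is stored; that is left to the physical model.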

Logical Data Model Example

For instance, if going with a relational model, the logical model might
look like this:

Physical Data Model

Once a logical model has been defined, it’s time to implement it in a
real database.

If you decide on a relational model, options include SQL Server, Oracle,
PostgreSQL, MySQL, etc.

The physical data model should include:

➢ A specific DBMS.
➢ How data is stored (on disk, in RAM, hybrid, etc. Couchbase has a
built-in cache to provide the speed of RAM with the durability of
disk).
➢ How to accommodate replications, shards, partitions, etc. (For
Couchbase, sharding and partitioning are automatic. Replication is
a drop-down box to select how many replicas you want.)

The physical data model is typically created by DBAs and/or
developers.
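
To make the step from logical to physical model concrete, here is a minimal sketch that creates tables with primary keys, a foreign key and an index. SQLite (through Python's built-in `sqlite3` module) is used only because it runs without a server; it is not one of the DBMS options listed above, and the table and column names are hypothetical.

```python
import sqlite3

# SQLite is an assumption made so the sketch runs without a server;
# table and column names are hypothetical.
DDL = """
CREATE TABLE airline (
    airline_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL
);
CREATE TABLE route (
    route_id            INTEGER PRIMARY KEY,
    airline_id          INTEGER NOT NULL REFERENCES airline(airline_id),
    source_airport      TEXT NOT NULL,
    destination_airport TEXT NOT NULL
);
-- A DBMS-specific tuning choice: index the foreign key used in joins.
CREATE INDEX idx_route_airline ON route(airline_id);
"""

physical_db = sqlite3.connect(":memory:")
physical_db.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
physical_db.executescript(DDL)
physical_db.execute("INSERT INTO airline VALUES (1, 'Example Air')")
physical_db.execute("INSERT INTO route VALUES (10, 1, 'AAA', 'BBB')")


def insert_orphan_route(connection):
    """Try to insert a route for a non-existent airline; True if rejected."""
    try:
        connection.execute("INSERT INTO route VALUES (11, 99, 'AAA', 'CCC')")
    except sqlite3.IntegrityError:
        return True
    return False
```

The foreign key is what maintains the relationship between the two tables: `insert_orphan_route(physical_db)` is rejected because airline 99 does not exist.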

Physical Data Model Example

Here’s an example of a physical model for Couchbase:


Please note: above we defined data models; now we will define
database models.

Database Models

Database Model: A database model is the logical structure of a
database, which determines how data can be stored, organized and
manipulated.

There are four common types of database model, each useful for
different types of data or information.

➢ Hierarchical
➢ Network
➢ Relational
➢ Object oriented

1. Hierarchical databases

The hierarchical model was developed by IBM for its Information
Management System (IMS). In a hierarchical database model, the data is
organized into a tree-like structure. This type of database model is
rarely used nowadays. Its structure is like a tree, with nodes
representing records and branches representing fields.

The following figure shows the generalized structure of the
hierarchical database model, in which data is stored in a tree-like
structure (data is represented or stored in a root node, parent nodes
and child nodes).

Advantages

➢ The model allows easy addition and deletion of information.
➢ Data at the top of the hierarchy is very fast to access.
➢ It worked well with linear data storage mediums such as tapes.
➢ It relates well to anything that works through one-to-many
relationships.

Disadvantages

➢ It requires data to be repetitively stored in many different entities.
➢ Linear data storage mediums such as tapes are no longer in common
use.
➢ Searching for data requires the DBMS to run through the entire
model from top to bottom until the required information is found,
making queries very slow.
➢ This model supports only one-to-many relationships; many-to-many
relationships are not supported.
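
The tree structure and the top-to-bottom search described above can be sketched as follows. The `Node` class, the `find` function and the college example are hypothetical; real hierarchical DBMSs are far more elaborate.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One record in the hierarchy; each node has exactly one parent."""
    name: str
    children: list = field(default_factory=list)

    def add(self, child):
        """Attach a child record under this node and return it."""
        self.children.append(child)
        return child


def find(root, name, visited=None):
    """Depth-first search from the root, recording every node it visits."""
    if visited is not None:
        visited.append(root.name)
    if root.name == name:
        return root
    for child in root.children:
        hit = find(child, name, visited)
        if hit is not None:
            return hit
    return None


# Hypothetical college hierarchy: root -> programmes -> semesters.
college = Node("College")
bba = college.add(Node("BBA"))
bca = college.add(Node("BCA"))
bba.add(Node("Semester I"))
bca.add(Node("Semester II"))

visited = []
found = find(college, "Semester II", visited)
```

The root is reached immediately, but finding "Semester II" walks through most of the tree first, which illustrates why data near the top is fast to access while deep searches are slow.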

Network databases

This model resembles the hierarchical database model, so it is often
described as a modified version of the hierarchical database. The
network database model organises data more like a graph, and a node
can have more than one parent. The network model is a database model
conceived as a flexible way of representing objects and their
relationships.

Advantages

➢ The network model is conceptually simple and easy to design.
➢ The network model can represent redundancy in data more
effectively than the hierarchical model.
➢ The network model can handle one-to-many and many-to-many
relationships, which is a real help in modelling real-life situations.
➢ Data access is easier and more flexible than in the hierarchical
model.
➢ The network model is better than the hierarchical model at isolating
programs from complex physical storage details.

Disadvantages

➢ All the records are maintained using pointers, and hence the whole
database structure becomes very complex.
➢ The insertion, deletion and updating of any record require a
large number of pointer adjustments.
➢ Structural changes to the database are very difficult.
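
A small sketch can show both the strength of the network model (a record may have several parents) and its cost (pointers must be repaired on every change). The `NetworkModel` class and the student/course records are hypothetical, and Python sets stand in for the pointers a real network DBMS would maintain.

```python
from collections import defaultdict


class NetworkModel:
    """Records linked by owner -> member pointers; a member may have several owners."""

    def __init__(self):
        self.members = defaultdict(set)  # owner  -> set of members
        self.owners = defaultdict(set)   # member -> set of owners

    def link(self, owner, member):
        """Create a pointer in both directions."""
        self.members[owner].add(member)
        self.owners[member].add(owner)

    def delete(self, record):
        """Remove a record and repair every pointer that refers to it."""
        adjusted = 0
        for owner in self.owners.pop(record, set()):
            self.members[owner].discard(record)
            adjusted += 1
        for member in self.members.pop(record, set()):
            self.owners[member].discard(record)
            adjusted += 1
        return adjusted


network = NetworkModel()
network.link("Student A", "Course X")
network.link("Student B", "Course X")   # Course X now has two parents
network.link("Student A", "Course Y")   # Student A has two children
```

Deleting "Course X" returns the number of pointers that had to be adjusted, one for each record that referred to it.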

RELATIONAL DATABASE

The relational model was proposed by E. F. Codd in 1970. In this model,
data is organised in two-dimensional tables made up of rows (records)
and columns (fields), and relationships are maintained by storing
a common field.

Three key terms are heavily used in the relational model: relations,
attributes, and domains.

A relation is simply a table with rows and columns. The named columns
of the relation are called attributes, and the domain is the set of
values the attributes can take.

Relational databases are designed using Entity-Relationship Diagrams,
in which more than one table is related to each other through a
common field.
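
As a minimal sketch of two relations linked by a common field, the example below joins a student table and a result table on `roll_no`. It uses Python's built-in `sqlite3` module; the table names, column names and sample rows are hypothetical.

```python
import sqlite3

relational_db = sqlite3.connect(":memory:")
relational_db.executescript("""
CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE result  (roll_no INTEGER, semester TEXT, marks REAL);
INSERT INTO student VALUES (1, 'Asha'), (2, 'Ravi');
INSERT INTO result  VALUES (1, 'I', 72.5), (2, 'I', 58.0), (1, 'II', 80.0);
""")

# roll_no is the common field that relates the two tables.
first_division = relational_db.execute("""
    SELECT s.name, r.semester, r.marks
    FROM student AS s
    JOIN result  AS r ON r.roll_no = s.roll_no
    WHERE r.marks >= 60
    ORDER BY r.semester
""").fetchall()
```

Each row of `student` is a record, each column is an attribute, and the set of valid roll numbers is the domain of `roll_no`.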

ER Model

An Entity-Relationship Model represents the structure of the database
with the help of a diagram. ER modelling is a systematic process for
designing a database, as it requires you to analyze all data
requirements before implementing your database.

ER Diagram

An Entity Relationship Diagram (ER Diagram) pictorially explains the
relationships between the entities to be stored in a database.
Fundamentally, the ER Diagram is a structural design of the database.
It acts as a framework created with specialized symbols for the
purpose of defining the relationships between the database entities.

An ER diagram is built from three principal components: entities,
attributes, and relationships.

Entities

An entity can be either a living or non-living component. An entity
is shown as a rectangle in an ER diagram.

For example, in a student study course, both the student and the course
are entities.

Attribute

Attributes are the properties of an entity. An attribute is
represented by an oval in an ER diagram.

Here, Student is the entity, and Name, Address, Roll No and Age are
its attributes.

Relationship

A diamond shape represents a relationship in the ER diagram.
It depicts the relationship between two entities.

In the example below, both the student and the course are entities, and
study is the relationship between them.
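
The three ER components can also be mirrored in code. In this sketch the Student attributes follow the notes, while the Course attributes, the sample values and the helper `courses_of` are assumptions added for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Student:      # entity (rectangle in the ER diagram)
    roll_no: int    # attributes (ovals in the ER diagram)
    name: str
    address: str
    age: int


@dataclass(frozen=True)
class Course:       # entity
    code: str       # hypothetical attributes
    title: str


@dataclass(frozen=True)
class Study:        # relationship (diamond) linking two entities
    student: Student
    course: Course


asha = Student(1, "Asha", "Delhi", 19)
analytics = Course("BBA202", "Business Analytics")
study_links = [Study(asha, analytics)]


def courses_of(student, relationships):
    """Return the titles of all courses the given student studies."""
    return [r.course.title for r in relationships if r.student == student]
```

`courses_of(asha, study_links)` follows the study relationship from a student entity to the related course entities.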

Example of ER models of RDBMS

The star schema and the snowflake schema are examples of models used
with relational databases.

Advantages and Limitations of Relational Models

Advantages:

1 – Simplicity of Model

In contrast to other types of database models, the relational database
model is much simpler. Query processing and structuring are simple
enough that SQL queries suffice to handle the data.

2 – Ease of Use

Users can easily access or retrieve the information they need within
seconds without dealing with the complexity of the database.
Structured Query Language (SQL) is used to execute complex queries.

3 – Accuracy

A key feature of relational databases is that they’re strictly defined
and well-organized, so data doesn’t get duplicated. This structure,
free of duplicated data, is what makes relational databases accurate.

4 – Data Integrity

RDBMS databases are also widely used for data integrity, as they
provide consistency across all tables. Data integrity in turn supports
features such as accuracy and ease of use.

5 – Easy Normalization

As data becomes more and more complex, the need for efficient ways of
storing it increases. Normalization is a method that breaks information
down into manageable chunks to reduce storage size. It proceeds through
successive levels (normal forms), and the data must satisfy one level
before moving on to the next.

Database normalization also ensures that a relational database has a
consistent structure and can be manipulated accurately. This ensures
that integrity is maintained when using data from this database
for your business decisions.
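
The idea of breaking information into smaller, non-repeating chunks can be shown with a toy example. The flat rows, field names and the `normalize` function below are hypothetical, and the sketch covers only the basic step of separating repeated facts into their own tables, not the formal normal forms.

```python
# Hypothetical denormalized rows: student names and course titles repeat.
flat = [
    {"roll_no": 1, "name": "Asha", "course_code": "BBA202", "course_title": "Business Analytics"},
    {"roll_no": 2, "name": "Ravi", "course_code": "BBA202", "course_title": "Business Analytics"},
    {"roll_no": 1, "name": "Asha", "course_code": "BBA204", "course_title": "Marketing"},
]


def normalize(rows):
    """Split flat rows into student, course and enrolment tables."""
    students, courses, enrolments = {}, {}, []
    for row in rows:
        students[row["roll_no"]] = row["name"]              # one entry per student
        courses[row["course_code"]] = row["course_title"]   # one entry per course
        enrolments.append((row["roll_no"], row["course_code"]))  # keys only
    return students, courses, enrolments


students, courses, enrolments = normalize(flat)
```

After the split, each student name and course title is stored once, and the enrolment table holds only the keys that link them. Correcting a course title then means changing one entry instead of every row that mentions it.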

6 – Collaboration

Multiple users can access the database to retrieve information at the
same time, even while data is being updated.

7 – Security

Data is secure because a Relational Database Management System allows
only authorized users to directly access the data. No unauthorized
user can access the information.

Limitations:

Although relational databases have many benefits, they also have some
limitations. The main limitations or disadvantages of using a
relational database are listed below.

1 – Maintenance Problem

The maintenance of a relational database becomes difficult over time
as the volume of data increases. Developers and programmers have to
spend a lot of time maintaining the database.

2 – Cost

The relational database system is costly to set up and maintain. The initial
cost of the software alone can be quite pricey for smaller businesses, but
it gets worse when you factor in hiring a professional technician who must
also have expertise with that specific kind of program.

3 – Physical Storage

A relational database is composed of rows and columns, which requires
a lot of physical memory because each operation performed depends on
separate storage. The physical memory requirements may increase
as the data grows.

4 – Lack of Scalability

When a relational database is spread over multiple servers, its
structure changes and becomes difficult to handle, especially when the
quantity of data is large. Because of this, the data does not scale
well across different physical storage servers. Ultimately,
performance suffers, for example through reduced data availability
and longer load times.

5 – Complexity in Structure

Relational databases can only store data in tabular form, which makes
it difficult to represent complex relationships between objects. This
is an issue because many applications require more than one table to
store all the data needed by their application logic.

6 – Decrease in performance over time

A relational database can become slower over time, and not just
because of its reliance on multiple tables. A large number of tables
and a large volume of data increase the complexity of the system. This
can lead to slow query response times, or even to queries failing
completely, depending on how many people are logged into the server
at a given time.

Object Oriented Model

The Object-Oriented Model in DBMS, or OODM, is the data model where
data is stored in the form of objects. This model is used to represent
real-world entities. In the Object-Oriented Model, the data and the
data relationships are stored together in a single entity known as an
object.

Here, we can store pictures, audio, video, and other types of data
that are difficult to store with the relational approach (even though
video and audio can be stored in a relational database, it is
generally not recommended).

Example: the entities Bus, Ship and Plane stored as pictures.


Components of Object-Oriented Database:

The main components of an object-oriented database model are:

Objects

In an object-oriented database model, data is organized and stored as
objects, which are self-contained units that contain both data and the
operations or methods that can be performed on that data.

Classes

Objects in an object-oriented database model are organized into classes,
which define the properties and behavior of the objects. Classes can
inherit properties and behavior from other classes, which allows for the
efficient reuse of code and data structures.

Inheritance

Inheritance is a key concept in object-oriented database models, which
allows classes to inherit properties and behavior from other classes. This
allows for the efficient reuse of code and data structures, and simplifies
the development and management of complex data structures.

Polymorphism

Polymorphism is another key concept in object-oriented database models,
which allows objects to take on different forms or behaviors depending on
the context in which they are used. This allows for the efficient
representation and manipulation of complex data structures and
relationships.

Persistence

In an object-oriented database model, objects are persistent, which
means that they are stored in the database and can be accessed and
manipulated by applications and users. This allows for the efficient
management and manipulation of complex data structures and relationships.
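
The four ideas above (objects and classes, inheritance, polymorphism, persistence) can be sketched with the Bus, Ship and Plane entities mentioned earlier. The class layout, attribute names and the JSON-based `save`/`load` functions are assumptions for illustration. A real object-oriented DBMS stores objects directly, and JSON text is used here only as a simple stand-in for persistence.

```python
import json


class Vehicle:
    """Base class: data (name, capacity) plus the methods that act on it."""

    def __init__(self, name, capacity):
        self.name = name
        self.capacity = capacity

    def medium(self):
        return "unknown"

    def describe(self):
        return f"{self.name} carries {self.capacity} by {self.medium()}"


class Bus(Vehicle):       # inherits attributes and describe() from Vehicle
    def medium(self):     # polymorphism: same call, class-specific behaviour
        return "road"


class Ship(Vehicle):
    def medium(self):
        return "sea"


class Plane(Vehicle):
    def medium(self):
        return "air"


REGISTRY = {cls.__name__: cls for cls in (Bus, Ship, Plane)}


def save(objects):
    """Persistence stand-in: serialize objects, with their class, to JSON text."""
    return json.dumps([{"class": type(o).__name__, **vars(o)} for o in objects])


def load(text):
    """Rebuild the objects, restoring each one to its original class."""
    return [REGISTRY[d.pop("class")](**d) for d in json.loads(text)]


fleet = load(save([Bus("B1", 40), Ship("S1", 500), Plane("P1", 180)]))
```

After the round trip through `save` and `load`, each object in `fleet` still belongs to its own class, so calling `describe()` on every item gives a class-specific answer through the same method call.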

Example of Object oriented database
