Data Management

The document provides information on data management, including data, information, and knowledge. It defines data as raw values, information as processed data that provides context and meaning, and knowledge as accumulated experience and insight. It also discusses structured, unstructured, and semi-structured data. Big data is described as large, complex data sets with the key properties of volume, variety, and velocity. The document outlines various sources of data like internal data, third party analytics, external data, and open data. It provides examples for each type. Finally, it provides an overview of data warehousing as organizing data into a database and data mining as extracting meaningful patterns from databases.

UNIT –VI REF CLASS NOTES

Dr. J. Rai, IIBM Patna UNIT VI CRM 1


Data, Information & Knowledge

❑Data are the raw alphanumeric values obtained through different acquisition methods. In their
simplest form, data are unprocessed values that carry no context of their own.

❑Information is created when data are processed, organized, or structured to provide context and
meaning. Information is essentially processed data.

❑Knowledge is what we know. Knowledge is unique to each individual and is the accumulation of past
experience and insight that shapes the lens by which we interpret, and assign meaning to, information.
For knowledge to result in action, an individual must have the authority and capacity to make and
implement a decision. Knowledge (and authority) are needed to produce actionable information that
can lead to impact.
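
The step from data to information can be sketched in a few lines of Python. This is only an illustration: the rows in `raw_data` and the function name `to_information` are invented here, not taken from the notes. The raw strings are data; the per-customer totals are information, because processing has given the values context.

```python
from collections import defaultdict

# Hypothetical raw data: alphanumeric values as they might arrive from a till system.
raw_data = ["C001,2024-01-05,250", "C002,2024-01-05,120", "C001,2024-01-09,300"]

def to_information(rows):
    """Process raw rows into information: total spend per customer."""
    totals = defaultdict(int)
    for row in rows:
        customer_id, _date, amount = row.split(",")
        totals[customer_id] += int(amount)
    return dict(totals)

spend_by_customer = to_information(raw_data)
print(spend_by_customer)
```

Deciding what to do about a customer's spending still requires knowledge, which the code does not supply.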

Types of Data
1. Structured and unstructured data
2. Semi-structured data
3. Big data


1. Structured and unstructured data

Structured data
•Data has a machine-readable format.
•Data adheres to a predefined data model.
•Data is in a tabular / rectangular format (columns display different attributes or variables,
rows display a particular record).
•Data can be entered, stored, queried, or analysed by machines.
•Analysts can rely on the model to know how data is recorded: it defines the attributes present
and provides information about their data types and the restrictions on their values.
•Examples: Names, dates, phone numbers, currency or prices, heights or weights, word count or
file size of a document, credit card numbers, and so on.

Unstructured data
•Data requires a human to interpret.
•Data need not adhere to any predefined model.
•Data is in the form of social media feeds, results of research and development, surveys, call
records, and so on.
•Data requires human help to catalogue manually.
•Analysts can use machines to read each word or sentence, but not to interpret the meaning.
(This is where machine learning and other elements of artificial intelligence come into play.)
•Examples: Images (both human- and machine-generated), video files, audio files, social media
posts, product reviews, mobile SMS, and so on.
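
The contrast above can be sketched in Python. The `Customer` model, the sample records, and the review text are all hypothetical. The point is that a predefined model lets a machine query structured records directly, while for free text it can only read the words.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Customer:
    """Hypothetical predefined data model: each attribute has a name and a type."""
    name: str
    joined: date
    phone: str

# Structured: rows follow the model, so a machine can query them directly.
customers = [
    Customer("Asha", date(2023, 4, 1), "9000000001"),
    Customer("Ravi", date(2024, 2, 15), "9000000002"),
]
joined_2024 = [c.name for c in customers if c.joined.year == 2024]

# Unstructured: a machine can split the text into words, but the words alone
# do not say whether the customer is satisfied.
review = "Delivery was late but the support team sorted it out quickly"
words = review.lower().split()

print(joined_2024)
print(len(words))
```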


2. Semi-structured data

Some data is neither fully structured nor fully unstructured; this is called semi-structured data.
Email is an example. Email headers contain metadata like the date, language, and recipient’s email
address, which are structured data. However, the email body, which contains your message, is
unstructured.
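
As a small illustration of the email example, Python's standard `email` module can separate the two parts. The message below is invented for this sketch: the header fields come back as named values, while the body is just a block of free text.

```python
from email import message_from_string

# Hypothetical message, invented for illustration.
raw_email = (
    "From: student@example.com\n"
    "To: tutor@example.com\n"
    "Date: Mon, 01 Jan 2024 10:00:00 +0000\n"
    "Subject: Unit VI notes\n"
    "\n"
    "Hello, could you share the slides on data warehousing?\n"
)

msg = message_from_string(raw_email)

# Structured part: header fields with fixed names.
headers = {name: msg[name] for name in ("From", "To", "Date", "Subject")}

# Unstructured part: free text with no predefined model.
body = msg.get_payload()

print(headers["To"])
print(body)
```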

3. Big data (Social Media Data)

The term ‘big data’ is used to describe large, complex data sets of any type – structured, unstructured,
or even semi-structured. While big data sets have been around since the 1960s, in the last 20 years
there has been a considerable increase in the amount of data being created, or made available,
especially by large online services (YouTube, Netflix, Salesforce, etc.).

Big data has three key properties: volume, variety, and velocity. Each of these properties presents
unique challenges.


Volume: The amount of data matters. With big data, you’ll have to process high volumes of
low-density, unstructured data. This can be data of unknown value, such as Twitter data feeds,
clickstreams on a web page or a mobile app, or sensor-enabled equipment. For some organizations,
this might be tens of terabytes of data. For others, it may be hundreds of petabytes.

Velocity: Velocity is the fast rate at which data is received and (perhaps) acted on. Normally,
the highest velocity of data streams directly into memory versus being written to disk. Some
internet-enabled smart products operate in real time or near real time and will require real-time
evaluation and action.

Variety: Variety refers to the many types of data that are available. Traditional data types were
structured and fit neatly in a relational database. With the rise of big data, data comes in new
unstructured data types. Unstructured and semi-structured data types, such as text, audio, and
video, require additional preprocessing to derive meaning and support metadata.
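
To make the velocity property concrete, here is a minimal sketch of evaluating a stream in memory as it arrives. The function name `monitor`, the window size, the limit, and the readings are all assumptions chosen for illustration. Only the most recent readings are held, and action (here, an alert value) is taken as soon as a condition is met.

```python
from collections import deque

def monitor(readings, window=3, limit=80.0):
    """Yield the rolling average whenever the last `window` readings average above `limit`."""
    recent = deque(maxlen=window)  # only the latest readings are kept in memory
    for value in readings:
        recent.append(value)
        if len(recent) == window and sum(recent) / window > limit:
            yield round(sum(recent) / window, 1)

# Hypothetical temperature stream from a sensor-enabled device.
stream = [70.0, 75.0, 78.0, 85.0, 90.0, 72.0]
alerts = list(monitor(stream))
print(alerts)
```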


Sources of Data

1. Internal Data
2. Third Party Analytics
3. External Data
4. Open Data


1. Internal data

Internal data is data captured by your organizational processes. Your organization may have
machine-generated data available from sensors or devices used to manufacture a product, or
recorded by the product itself (e.g. smartphones or IoT devices).

For example:

1. transactional data (customer purchases and staff pay)

2. email marketing metrics (email opens, click rates)

3. information in customer profiles (names, addresses)

4. records of customer interactions (email queries, support calls)

5. online activity (placing items in an online shopping cart)


2. Third-party analytics

In some cases, you may not have the capacity to capture data, in which case third-party
analytics can be used. Third-party web analytics services can provide cost-effective
collection and analysis and evaluate how your website performs over time, or against
averages across the provider’s customer base.

For example:

Google Analytics is a popular tool that gives businesses the ability to analyse and better
understand how users find and use their websites and pages. For more privacy-focused
analysis, of the kind that government or health-sector organizations may choose to use,
Piwik PRO Analytics is an alternative.


3. External data

External data can include almost anything from historical demographic data to market
prices, or weather conditions to social media trends. Organizations use external data to
analyze and model economic, political, social, or environmental factors that influence
their business.

For example:

•Open sources (data.gov.uk)

•Social media data (Twitter, Facebook, or LinkedIn)

•Paid sources (Thomson Reuters or Westlaw)


4. Open data

Open data is accessible to everyone and free to use. However, if it’s high-level data, or it’s heavily
summarized and aggregated, it might not be very relevant to you. It might also not be in the format you
need, or it might be very difficult to make sense of. Overcoming these challenges can take a lot of
time before the data is usable.

For example:

•Government data: data.gov (US), data.gov.uk (UK), data.gov.in (India).

•Health and scientific data: World Health Organization (WHO), Nature.com scientific data, Open Science
Data Cloud (OSDC), Center for Open Science.

•Social media: Google Trends (e.g. look at national trends on search terms), Yahoo Finance (useful for
stock market information), Twitter (allows you to search by tags and users; results can be downloaded
using the Twitter APIs).


Data Warehousing & Data Mining
Data warehousing is a method of organizing and compiling data into one
database, whereas data mining deals with extracting important information from
databases. Data mining attempts to uncover meaningful patterns, and it depends
on the data that is compiled in the data warehouse.


A data warehouse is where data is collected for mining purposes, usually with large storage
capacity. Data from an organization’s various systems is held in the data warehouse, where it can
be fetched as needed.

Source → Extract → Transform → Load → Target

(Data warehouse process)
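
The extract, transform, and load steps can be sketched in Python. This is only a sketch under stated assumptions: the source rows, the `sales` table, and the function names are hypothetical, and an in-memory SQLite database stands in for a real warehouse.

```python
import sqlite3

# Hypothetical source rows from two operational systems with inconsistent formats.
source_rows = [
    {"customer": " asha ", "amount": "250.00", "source_system": "web"},
    {"customer": "RAVI", "amount": "120.50", "source_system": "store"},
    {"customer": "Asha", "amount": "300", "source_system": "store"},
]

def extract():
    """Extract: read rows from the source (here, an in-memory list)."""
    return list(source_rows)

def transform(rows):
    """Transform: clean names and convert amounts to numbers for consistency."""
    return [
        (r["customer"].strip().title(), float(r["amount"]), r["source_system"])
        for r in rows
    ]

def load(conn, rows):
    """Load: write the cleaned rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL, source_system TEXT)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()

warehouse = sqlite3.connect(":memory:")  # stand-in for a real warehouse
load(warehouse, transform(extract()))

totals = warehouse.execute(
    "SELECT customer, SUM(amount) FROM sales GROUP BY customer ORDER BY customer"
).fetchall()
print(totals)
```

Because the transform step normalizes " asha " and "Asha" to the same name, the target table can report one consistent total per customer.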

Data warehouses consolidate data from several sources and ensure data accuracy, quality, and
consistency. In a data warehouse, data is organized into a consistent format by type, as needed.
Query tools then examine the data in a variety of ways.

Data warehouses store historical data and handle analytical requests faster, which supports online
analytical processing (OLAP), whereas an operational database is used to store the current
transactions of a business process, which is called online transaction processing (OLTP).
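
The difference between the two workloads can be illustrated with a short sketch. The `orders` table and its values are invented, and a single in-memory SQLite database is used for both styles purely for illustration; in practice the transactional database and the warehouse are usually separate systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, year INTEGER, amount REAL)"
)

# OLTP-style work: many small writes, each recording one current transaction.
with conn:
    conn.execute("INSERT INTO orders (year, amount) VALUES (?, ?)", (2023, 100.0))
    conn.execute("INSERT INTO orders (year, amount) VALUES (?, ?)", (2023, 50.0))
    conn.execute("INSERT INTO orders (year, amount) VALUES (?, ?)", (2024, 200.0))

# OLAP-style work: one read-only query that aggregates over historical data.
sales_by_year = conn.execute(
    "SELECT year, SUM(amount) FROM orders GROUP BY year ORDER BY year"
).fetchall()
print(sales_by_year)
```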


Data warehouses help analysts and senior executives analyze, organize, and use
data for decision making.
They are used in the following fields:
•Consumer goods
•Banking services
•Financial services
•Manufacturing
•Retail sectors


In the data mining process, data is extracted and analyzed to obtain useful
information. Hidden patterns in the dataset are sought in order to predict
future behavior. Data mining is also used to discover relationships within
the data.

Data mining uses statistics, artificial intelligence, machine learning, and
database systems to find hidden patterns in the data. It supports business-related
queries that are time-consuming to resolve. E.g. a box plot in SPSS.
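
The notes mention a box plot in SPSS. As a rough equivalent, the statistics behind a box plot can be computed in Python. The purchase amounts and the function name `box_plot_summary` are hypothetical, and the quartile method used here (the standard library's "inclusive" method with the common 1.5 × IQR rule) is an assumption; SPSS and other tools may compute quartiles slightly differently.

```python
import statistics

def box_plot_summary(values):
    """Return the quartiles, whisker ends, and outliers that a box plot would show."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "whisker_low": min(inside),
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_high": max(inside),
        "outliers": [v for v in values if v < low_fence or v > high_fence],
    }

# Hypothetical purchase amounts for one customer segment.
purchases = [12, 15, 14, 10, 13, 16, 15, 14, 95]
summary = box_plot_summary(purchases)
print(summary)
```

The single value flagged as an outlier is the kind of pattern an analyst would then investigate further.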
