U1 D CLSRM

The document provides an overview of Big Data, including its types (structured, unstructured, semi-structured), characteristics, and the analytic process which consists of phases like business understanding, data collection, preparation, modeling, evaluation, and deployment. It distinguishes between reporting and analysis, emphasizing their different purposes and outputs. Additionally, it mentions various modern data analytic tools available for organizations to consider.

Uploaded by

lolrofl102938

Lecture-7

Big Data (KCS-061)


Unit 1: Introduction to Big Data
• Types of digital data
• History of Big Data innovation
• Introduction to Big Data platform, drivers for Big Data
• Big Data architecture and characteristics
• 5 Vs of Big Data
• Big Data technology components
• Big Data importance and applications
• Big Data features – security, compliance, auditing and protection
• Big Data privacy and ethics
• Big Data Analytics
• Challenges of conventional systems
• Intelligent data analysis, nature of data, analytic processes and tools, analysis vs
reporting, modern data analytic tools
Nature of Data

• The data in Big Data can be any of the following:

• Structured

• Unstructured

• Semi-structured
• Most data is in an unstructured format, which makes extracting
information from it difficult.
• According to Merrill Lynch, 80–90% of business data is either
unstructured or semi-structured.
• Gartner also estimates that unstructured data constitutes 80% of the
whole enterprise data.
Formats of Digital Data
Here is a percent distribution of the three forms of data
• Structured
By structured data, we mean data that can be processed, stored, and retrieved in a fixed
format. It refers to highly organized information that can be readily stored in and
retrieved from a database using simple search algorithms. For instance, the employee
table in a company database is structured: employee details, job positions, salaries,
etc., are stored in an organized manner.

• Unstructured
Unstructured data lacks any specific form or structure, which makes it difficult and
time-consuming to process and analyze. Email is an example of unstructured data.
• Semi-structured
Semi-structured data combines elements of both formats mentioned above, that is,
structured and unstructured data. To be precise, it refers to data that has not been
organized into a particular repository (database), yet contains tags or markers that
separate the individual elements within the data.
Example: data in an XML file
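
To make the three formats concrete, here is a minimal Python sketch using only the standard library. The sample records, field names, and the regular expression are illustrative assumptions and are not taken from the lecture.

```python
import csv
import io
import re
import xml.etree.ElementTree as ET

# Structured: fixed columns, so every field can be addressed by name.
# (Hypothetical employee table, echoing the example above.)
structured = "name,position,salary\nAsha,Analyst,52000\nRavi,Engineer,61000\n"
employees = list(csv.DictReader(io.StringIO(structured)))
salaries = [int(row["salary"]) for row in employees]

# Semi-structured: no fixed table, but tags separate the elements.
# Records may carry different numbers of fields.
semi_structured = """
<employees>
  <employee><name>Asha</name><skill>SQL</skill><skill>R</skill></employee>
  <employee><name>Ravi</name></employee>
</employees>
"""
root = ET.fromstring(semi_structured)
skills = {
    e.findtext("name"): [s.text for s in e.findall("skill")]
    for e in root.findall("employee")
}

# Unstructured: free text; information has to be extracted heuristically.
unstructured = "Hi team, Asha's salary was revised to 55000 from next month."
numbers_in_email = re.findall(r"\d+", unstructured)

print(salaries)
print(skills)
print(numbers_in_email)
```

The structured and semi-structured records can be queried by field or tag name, while the email text needs a pattern match that may miss or misread information.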
The Analytic Process
• An analysis process contains all or some of the following phases:

• Business Understanding

• Data Collection and Understanding

• Data Preparation

• Modeling

• Evaluation

• Deployment
1. Business Understanding:

• This step focuses on understanding the business in all its aspects. It consists of the
following steps.

a) Identify the goal and frame the business problem.

b) Gather information on resources, constraints, assumptions, risks, etc.

c) Prepare the analytical goal.

d) Prepare a flow chart.
2. Data Collection:

• The process of collecting data is an important task in executing a project plan accurately.

• In this phase, data from different data sources is first collected and then described in terms of
its application and the needs of the project.

• This process is also called data exploration.

• Exploration of data is required to ensure the quality of the collected data.


3. Data Preparation:

• In this step, the provided data is prepared and cleaned.

• In other words, unnecessary or unwanted data is removed in this phase.
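
As an illustration of the preparation step, the following Python sketch removes duplicate, missing, and invalid records and normalizes a text field. The records, the field names, and the specific cleaning rules are assumptions chosen for the example; real projects define their own rules.

```python
# Hypothetical raw records: one duplicate, one missing value, one invalid age.
raw_records = [
    {"id": 1, "age": 34, "city": "Delhi"},
    {"id": 2, "age": None, "city": "Pune"},
    {"id": 1, "age": 34, "city": "Delhi"},
    {"id": 3, "age": -5, "city": "Mumbai"},
    {"id": 4, "age": 41, "city": " chennai "},
]


def prepare(records):
    """Return cleaned records: no duplicates, no missing or invalid ages."""
    seen_ids = set()
    cleaned = []
    for record in records:
        if record["id"] in seen_ids:
            continue  # drop duplicate rows
        if record["age"] is None or record["age"] < 0:
            continue  # drop rows with missing or impossible values
        seen_ids.add(record["id"])
        # Normalize text fields so later steps see a consistent format.
        cleaned.append({**record, "city": record["city"].strip().title()})
    return cleaned


clean_records = prepare(raw_records)
print(clean_records)
```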

4. Data Modeling:

• In this phase, a model is created by using a data modeling technique.

• The data model is used to analyze the relationship between different selected objects in the data.

• Test cases are created to assess the applicability of the model, and the data is structured
according to the model.
5. Data Evaluation:

• The results obtained from the different test cases are evaluated and reviewed for errors.

• After validating the results, analysis reports are created for determining the next plan of action.
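
The modeling and evaluation phases can be sketched together in Python. This example assumes a simple straight-line model fitted by ordinary least squares and uses mean absolute error as the evaluation measure; the lecture does not prescribe a particular modeling technique or metric, and the data points are hypothetical.

```python
# Hypothetical data: advertising spend (x) and sales (y).
train = [(1.0, 3.1), (2.0, 4.9), (3.0, 7.2), (4.0, 8.8)]
# Held-out test cases used to assess how well the model applies.
test_cases = [(5.0, 11.0), (6.0, 13.1)]


def fit_line(points):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_absolute_error(model, cases):
    """Average absolute gap between predictions and observed values."""
    slope, intercept = model
    return sum(abs(slope * x + intercept - y) for x, y in cases) / len(cases)


model = fit_line(train)                         # modeling
error = mean_absolute_error(model, test_cases)  # evaluation
print(model, error)
```

A small error on the test cases suggests the model captures the relationship; a large one would send the analyst back to data preparation or modeling.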

6. Deployment:

• In this phase, the plan is finalized for deployment.

• The deployed plan is continuously monitored for errors and maintained.

• This process is also termed reviewing the project.


• Phases of analysis:
Analysis vs Reporting
• Sometimes the line between reporting and analysis tends to blur.
• We need to be able to distinguish between these two areas.

• Reporting:
• It is a process in which data is organized and summarized in an easy-to-understand format.
• Reports enable organizations to monitor various performance parameters and improve customer
satisfaction.

• Analysis:
• It is a process in which data and reports are examined to get insights from them.
• These insights help an organization to perform important tasks in a timely manner, such as
planning a strategy, taking important business decisions, introducing a new product, and
improving customer satisfaction.
• In simple words, reporting can be considered a process in which raw data is transformed into
useful information, and analysis a process that transforms information into insights.
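
The distinction can be illustrated with a small Python sketch: one function summarizes raw data into a report, the other examines the same data for an insight. The sales figures, region names, and the "falls every month" rule are hypothetical choices made for this example.

```python
from collections import defaultdict

# Hypothetical raw data in chronological order: (month, region, sales).
raw_sales = [
    ("Jan", "North", 120), ("Jan", "South", 80),
    ("Feb", "North", 130), ("Feb", "South", 60),
    ("Mar", "North", 140), ("Mar", "South", 45),
]


def build_report(rows):
    """Reporting: organize and summarize raw data (total sales per region)."""
    totals = defaultdict(int)
    for _month, region, sales in rows:
        totals[region] += sales
    return dict(totals)


def analyse(rows):
    """Analysis: look for an insight (regions whose sales fall every month)."""
    by_region = defaultdict(list)
    for _month, region, sales in rows:
        by_region[region].append(sales)
    return [
        region
        for region, series in by_region.items()
        if all(later < earlier for earlier, later in zip(series, series[1:]))
    ]


report = build_report(raw_sales)
declining = analyse(raw_sales)
print(report)
print(declining)
```

The report states what happened in each region; the analysis points to where action may be needed.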

• While both draw upon the same collected online data, reporting and analysis are very different in
terms of their purpose, tasks, outputs, delivery, and value as shown below:
Modern Data Analytic Tools
• Various types of analytical tools are available in the market, but no company can buy and
implement all of them.

• Some of the open-source analytical tools are as follows:


✔ GridGain
✔ HPCC
✔ Storm
✔ Terrastore
✔ Neo4j

• The decision to invest in an analytical tool is a crucial one and needs careful consideration on the
part of a company on various parameters.

• The following are some popular analytical tools:

✔ The R Project for Statistical Computing

✔ IBM SPSS

✔ SAS
Thank You
