Data science

2.Data Warehouse

Multi-Dimensional Data Model

A multidimensional model views data in the form of a data-cube. A data cube enables data to be
modeled and viewed in multiple dimensions. It is defined by dimensions and facts.

 The dimensions are the perspectives or entities concerning which an organization keeps
records. For example, a shop may create a sales data warehouse to keep records of the
store's sales for the dimensions time, item, and location.
 These dimensions allow the store to keep track of things such as monthly sales of items
and the locations at which the items were sold. Each dimension has a table related to it,
called a dimension table, which describes the dimension further. For example, a dimension
table for an item may contain the attributes item name, brand, and type.

A multidimensional data model is organized around a central theme, for example, sales. This theme
is represented by a fact table. Facts are numerical measures. The fact table contains the names of
the facts (measures) as well as keys to each of the related dimension tables.

Consider the data of a shop for items sold per quarter in the city of Delhi. The data is shown in the
table. In this 2D representation, the sales for Delhi are shown for the time dimension (organized in
quarters) and the item dimension (classified according to the types of items sold).
The fact or measure displayed is rupees_sold (in thousands).

Now suppose we want to view the sales data with a third dimension. For example, consider the
data according to time and item, as well as location, for the cities Chennai, Kolkata, Mumbai, and
Delhi. These 3D data are shown in the table, where the 3D data are represented as a series of 2D
tables.

Conceptually, the same data may also be represented in the form of a 3D data cube, as shown in
the figure.
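
As a rough sketch of this idea (assuming pandas is available and using purely illustrative sales figures, not the values from the tables above), the cube can be built by grouping the fact rupees_sold over the three dimensions and then slicing out one 2D table per location:

import pandas as pd

# Illustrative sales facts; dimensions: time, item, location; fact: rupees_sold
sales = pd.DataFrame({
    "time":        ["Q1", "Q1", "Q2", "Q2", "Q1", "Q2"],
    "item":        ["phone", "computer", "phone", "computer", "phone", "computer"],
    "location":    ["Delhi", "Delhi", "Delhi", "Delhi", "Mumbai", "Mumbai"],
    "rupees_sold": [605, 825, 680, 952, 818, 1024],
})

# The data cube: the fact aggregated over all three dimensions
cube = sales.groupby(["location", "time", "item"])["rupees_sold"].sum()

# One 2D table (time x item) per location -- the "series of 2D tables" above
delhi_2d = cube.loc["Delhi"].unstack("item")
print(delhi_2d)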


Data Cleaning in Data Mining


• Data cleaning is an essential step in the data mining process and is crucial to the construction
of a model. Although required, this step is frequently overlooked.
• Data quality is the major problem in quality information management. Problems with
data quality can happen at any place in an information system. Data cleansing offers a
solution to these issues.
• Data cleaning is the process of correcting or deleting inaccurate, damaged, improperly
formatted, duplicated, or incomplete data from a dataset.


Steps for Cleaning Data

1. Remove duplicate or irrelevant observations

Remove duplicate, pointless, or otherwise undesirable observations from your dataset. The
majority of duplicate observations arise during data gathering. Duplicate data can be produced
when you merge data sets from several sources, scrape data, or receive data from clients or other
departments.
Deduplication is one of the most important considerations in this procedure. Observations are
deemed irrelevant when they do not pertain to the particular issue you are attempting to analyze.
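
A minimal sketch of this step, assuming pandas and an invented dataset (the column names customer_id, region, and amount are illustrative assumptions):

import pandas as pd

df = pd.DataFrame({
    "customer_id": [1, 1, 2, 3],
    "region":      ["North", "North", "South", "West"],
    "amount":      [100, 100, 250, 80],
})

df = df.drop_duplicates()            # remove exact duplicate observations
df = df[df["region"] != "West"]      # drop observations irrelevant to this analysis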

2. Fix structural errors

Structural errors arise when you measure or transfer data and find odd naming conventions,
typos, or inconsistent capitalization. These inconsistencies can result in mislabeled categories or
classes. For instance, "N/A" and "Not Applicable" might both appear on a given sheet, but they
ought to be analyzed under the same heading.
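
A small sketch of fixing such structural errors with pandas (the status column and its values are hypothetical):

import pandas as pd

df = pd.DataFrame({"status": ["N/A", "Not Applicable", "Approved", " approved "]})

# Strip stray whitespace, unify capitalization, and merge equivalent labels
df["status"] = (df["status"].str.strip()
                            .str.lower()
                            .replace({"n/a": "not applicable"}))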

3. Filter unwanted outliers

There will frequently be isolated observations that, at first glance, do not seem to fit the data you
are analyzing. Removing an outlier when you have a good reason to, such as incorrect data
entry, will improve the performance of the analysis.
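
One common filter for such outliers is the interquartile-range (1.5 * IQR) rule; the sketch below assumes pandas and a made-up series in which 400 is a data-entry error:

import pandas as pd

values = pd.Series([12, 14, 15, 13, 16, 400])

q1, q3 = values.quantile([0.25, 0.75])
iqr = q3 - q1
mask = (values >= q1 - 1.5 * iqr) & (values <= q3 + 1.5 * iqr)
filtered = values[mask]              # 400 is removed as an outlier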

4. Handle missing data

Because many algorithms won't tolerate missing values, you can't overlook missing data. There
are a few options for handling it. Neither is ideal, but both can be considered: you can drop the
observations that have missing values, or you can impute the missing values based on the other
observations, for example:
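
A brief sketch of both options with pandas (the age and income columns are invented for the example):

import numpy as np
import pandas as pd

df = pd.DataFrame({"age": [25, np.nan, 40], "income": [50000, 62000, np.nan]})

dropped = df.dropna()                              # option 1: drop incomplete rows
imputed = df.fillna(df.mean(numeric_only=True))    # option 2: impute with column means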

5. Validate and QA

• As part of fundamental validation, you ought to be able to respond to the following queries
once the data cleansing procedure is complete:

• Are the data coherent?

• Does the data abide by the regulations that apply to its particular field?

• Does it support or refute your working theory? Does it offer any new information?

• To support your next theory, can you identify any trends in the data?

• If not, is there a problem with the data's quality?

Inaccurate or noisy data can lead to false conclusions that inform poor company strategy and
decision-making. False conclusions can also result in an embarrassing situation in a reporting
meeting when you find out your data couldn't withstand further scrutiny.
Establishing a culture of quality data in your organization is crucial before you reach that point.
To achieve this, document the tools you might employ to develop this plan.

Techniques for Cleaning Data


The data should be passed through one of the various data-cleaning procedures available. The
procedures are explained below:

1. Ignore the tuples: This approach is not very practical, as it is only suitable when a tuple
has several attributes with missing values.

2. Fill in the missing value: This strategy is also not very practical or effective, and it can be
time-consuming. The missing value must be filled in, most commonly manually, but it can
also be filled using the attribute mean or the most probable value.

3. Binning method: This strategy is fairly easy to comprehend. The data is first sorted and
split into several equal-sized bins; the values in each bin are then smoothed using
neighbouring values, for example by bin means, bin medians, or bin boundaries (see the
sketch after this list).

4. Regression: The data is smoothed out with the use of a regression function. The regression
may be linear (one independent variable) or multiple (more than one independent
variable).
5. Clustering: This technique groups similar data values into "groups" or "clusters", and the
values that fall outside the clusters are treated as outliers.
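
As an illustration of the binning method from step 3, here is a minimal sketch (assuming NumPy and equal-frequency bins) that replaces each value with the mean of its bin:

import numpy as np

data = np.sort(np.array([4.0, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34]))

bins = np.array_split(data, 3)                       # three equal-sized bins
smoothed = np.concatenate([np.full(len(b), b.mean()) for b in bins])
print(smoothed)                                      # each value replaced by its bin mean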

Process of Data Cleaning

The data cleaning process for data mining proceeds through the following steps.

1. Monitoring the errors: Keep track of the areas where errors seem to occur most frequently.
This will make it simpler to identify and correct inaccurate or corrupt information, which is
particularly important when integrating a potential substitute with current management
software.

2. Standardize the mining process: To help lower the likelihood of duplication, standardize the
point of data entry.

3. Validate data accuracy: Analyse the data and invest in data cleaning software. Artificial
intelligence-based tools can be utilized to thoroughly check for accuracy.

4. Research on data: Before this step, our data needs to be vetted, standardized, and
duplicate-checked. Numerous vetted and approved third-party sources can extract data
straight from our databases; they assist in gathering the data and cleaning it up so that it is
reliable, accurate, and comprehensive for use in business decisions.

5. Communicate with the team: Keeping the group informed will help with client
development and strengthening as well as giving more focused information to potential
clients.

Usage of Data Cleaning in Data Mining

The following are some examples of how data cleaning is used in data mining:

Data Integration: Since it is challenging to guarantee quality with low-quality data, data
integration plays a crucial role in resolving this issue.
• The process of merging information from various data sets into one is known as data
integration. Before the data is transferred to its final location, this step makes sure that the
combined data set is standardized and formatted using data cleansing tools.
Data Migration: The process of transferring a file from one system, format, or
application to another is known as data migration.


• To ensure that the resulting data has the correct format, structure, and consistency
without any duplication at the destination, it is crucial to maintain the data's quality,
security, and consistency while it is in transit.

Data Transformation: The data must be transformed before being uploaded to its destination.
Data cleansing ensures the transformation takes into account system requirements for
formatting, organizing, etc.
Data Debugging in ETL Processes:
To prepare data for reporting and analysis throughout the extract, transform, and load (ETL)
process, data cleansing is essential. Thanks to data purification, only high-quality data are used
for decision-making and analysis.
• Cleaning data is essential. For instance, a retail business could receive inaccurate or
duplicate data from different sources, including CRM or ERP systems. A reliable data
debugging tool would find and fix the data discrepancies. The cleansed data would then be
transformed into a common format and transferred to the target database.

Characteristics of Data Cleaning

To ensure the correctness, integrity, and security of corporate data, data cleaning is a requirement.
These may be of varying quality depending on the properties or attributes of the data. The key
components of data cleansing in data mining are as follows:

• Accuracy: The business's database must contain only extremely accurate data. Comparing
them to other sources is one technique to confirm their veracity. The stored data will also
have issues if the source cannot be located or contains errors.
• Coherence: To ensure that the information on a person or body is the same throughout all
types of storage, the data must be consistent with one another.
• Validity: There must be rules or limitations in place for the stored data. The information
must also be confirmed to support its veracity.
• Uniformity: A database's data must all share the same units or values. This is a crucial
component of the Data Cleansing process because it keeps the process from becoming
complicated.
• Data Verification: Every step of the process, including its appropriateness and
effectiveness, must be checked. The study, design, and validation stages all play a role in
the verification process. Shortcomings frequently become obvious only after the data has
been put through a certain number of changes.
• Clean Data Backflow: After quality issues have been addressed, the cleaned data should
flow back to replace the dirty data in the original sources, so that legacy applications can
also profit from it and a subsequent data-cleaning effort is avoided.

Tools for Data Cleaning in Data Mining


Data cleansing tools can be very helpful if you are not confident about cleaning the data yourself or
have no time to clean up all your data sets. You might need to invest in these tools, but it is worth
the expenditure. There are many data cleaning tools on the market. Here are some top-ranked data
cleaning tools:

1. OpenRefine

2. Trifacta Wrangler

3. Drake

4. Data Ladder

5. DataCleaner
6. Cloudingo

7. IBM InfoSphere QualityStage

8. TIBCO Clarity

9. WinPure

Benefits of Data Cleaning


• Removal of inaccuracies when several data sources are involved.

• Clients are happier and employees are less annoyed when there are fewer mistakes.
• The capacity to map out the many functions and the planned uses of your data.

• Monitoring mistakes and improving reporting make it easier to resolve inaccurate or
damaged data for future applications by allowing users to identify where issues are coming
from.
• Making decisions more quickly and with greater efficiency will be possible with the use of
data cleansing tools.

Data Integration in Data Mining

Data integration is the process of merging data from several disparate sources. While performing
data integration, you must work on data redundancy, inconsistency, duplicity, etc.

In data mining, data integration is a data preprocessing technique that merges data from multiple
heterogeneous data sources into a coherent store in order to retain and provide a unified view of
the data.

These sources may include multiple data cubes, databases, or flat files.

Ms. Navya shree A, Asst Professor,Dept of BCA, SSIBM,Tumkuru Page 10


Data science
The data integration approach is formally characterized as a triple (G, S, M), where:

G represents the global schema,

S represents the set of heterogeneous source schemas,

M represents the mappings between queries on the source and global schemas.

Data Integration Approaches


There are mainly two types of approaches for data integration. These are as follows:

Tight Coupling

It is the process of using ETL (Extraction, Transformation, and Loading) to combine data from
various sources into a single physical location.

Loose Coupling

In loose coupling, the data is kept in the actual source databases. This approach provides an
interface that takes a query from the user, transforms it into a format that the source databases
can understand, and then sends the query directly to the source databases to obtain the result.

Issues in Data Integration

Entity Identification Problem


As you know, the records are obtained from heterogeneous sources, so how can you 'match the
real-world entities from the data'? For example, customer_id in one database and customer_number
in another may refer to the same entity.

• For example, assume that the discount is applied to the entire order in one machine, but in
every other machine, the discount is applied to each item in the order. This distinction
should be noted before the information from those assets is included in the goal system.

Redundancy and Correlation Analysis


 One of the major issues during data integration is redundancy. Redundant data is data that
is unimportant or no longer needed. Redundancy can also arise when an attribute can be
derived from another attribute in the data set.

Tuple Duplication
In addition to redundancy, data integration must also handle duplicate tuples. Duplicate tuples
may appear in the integrated data if a denormalized table was used as a source for the
integration.

Data Value Conflict Detection and Resolution


When records from several sources are combined, attribute values for the same real-world entity
may differ. The disparity may be related to the fact that the values are represented differently in
the different data sets. For example, in different cities the price of a hotel room might be expressed
in a different currency. This type of issue is detected and fixed during the data integration process.

Data Integration Techniques


There are various data integration techniques in data mining. Some of them are as follows:

Manual Integration
This method avoids using automation during data integration. The data analyst collects, cleans,
and integrates the data to produce meaningful information. This strategy is suitable for a small
organization with a limited data set. However, it becomes tedious for large, sophisticated, and
recurring integrations, because the entire process must be carried out manually.

Middleware Integration
The middleware software is used to take data from many sources, normalize it, and store it in the
resulting data set. This technique is used when an enterprise needs to integrate data from legacy
systems into modern systems. The middleware acts as a translator between the legacy and
advanced systems, like an adapter that allows two systems with different interfaces to be
connected. It is only applicable to certain systems.

Application-based integration
It uses software applications to extract, transform, and load data from disparate sources. This
strategy saves time and effort, but it is a little more complicated because building such an
application requires technical understanding.

Uniform Access Integration


This method combines data from disparate sources without altering the data's position; the data
stays in its original location. The technique merely generates a unified view of the integrated data.
The integrated data does not need to be stored separately because the end user only sees the
integrated view.

Data Transformation in Data Mining


Data transformation is a technique used to convert the raw data into a suitable format that
efficiently eases data mining and retrieves strategic information. Data transformation includes
data cleaning techniques and a data reduction technique to convert the data into the appropriate
form.
This could mean that data transformation may be:

• Constructive: The data transformation process adds, copies, or replicates data.



• Destructive: The system deletes fields or records.

• Aesthetic: The transformation standardizes the data to meet requirements or parameters.

• Structural: The database is reorganized by renaming, moving, or combining columns.

Data Transformation Techniques

There are several data transformation techniques that can help structure and clean up the data
before analysis or storage in a data warehouse. Let's study all techniques used for data
transformation, some of which we have already studied in data reduction and data cleaning.

1. Data Smoothing
 Data smoothing is a process used to remove noise from the dataset with the help of certain
algorithms.

 The concept behind data smoothing is that it can identify simple changes and thereby help
predict different trends and patterns. This helps analysts or traders who need to look at a lot
of data, which can often be difficult to digest, to find patterns they would not otherwise see.

 We have seen how noise is removed from the data using techniques such as binning,
regression, and clustering.

 Binning: This method splits the sorted data into a number of bins and smooths the data
values in each bin using the neighbouring values around them.
 Regression: This method identifies the relation between two attributes so that, given one
attribute, it can be used to predict the other (see the sketch below).
 Clustering: This method groups similar data values into clusters. The values that lie
outside a cluster are known as outliers.
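
A minimal sketch of regression-based smoothing (using NumPy's polyfit as one possible choice, with invented x/y values): fit a linear function to the noisy attribute and use the fitted values as the smoothed ones.

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])       # noisy measurements of the other attribute

slope, intercept = np.polyfit(x, y, deg=1)    # fit y ~ slope * x + intercept
smoothed = slope * x + intercept              # predicted (smoothed) values of y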

2. Attribute Construction
 In the attribute construction method, new attributes are constructed from the existing
attributes to build a data set that eases data mining. New attributes are created and applied
to assist the mining process, which simplifies the original data and makes the mining more
efficient.
 For example, suppose we have a data set containing measurements of different plots, i.e.,
we may have the height and width of each plot. Here we can construct a new attribute
'area' from the attributes 'height' and 'width', as sketched below. This also helps in
understanding the relations among the attributes in a data set.
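
A minimal sketch of attribute construction with pandas (the plot measurements are invented):

import pandas as pd

plots = pd.DataFrame({"height": [10, 12, 8], "width": [20, 15, 25]})

# New attribute constructed from the existing ones
plots["area"] = plots["height"] * plots["width"]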

3. Data Aggregation
 Data collection or aggregation is the method of storing and presenting data in a summary
format. The data may be obtained from multiple data sources and integrated into a single
description for analysis. This is a crucial step since the accuracy of data analysis insights is
highly dependent on the quantity and quality of the data used.
 Gathering accurate data of high quality and in a large enough quantity is necessary to
produce relevant results. Aggregated data is useful for everything from decisions
concerning product financing and business strategy to pricing, operations, and marketing.
 For example, we have a data set of sales reports of an enterprise that has quarterly sales of
each year. We can aggregate the data to get the enterprise's annual sales report.
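
A small sketch of this aggregation with pandas, rolling hypothetical quarterly sales up into an annual report:

import pandas as pd

quarterly = pd.DataFrame({
    "year":    [2023, 2023, 2023, 2023, 2024, 2024, 2024, 2024],
    "quarter": ["Q1", "Q2", "Q3", "Q4", "Q1", "Q2", "Q3", "Q4"],
    "sales":   [120, 150, 130, 170, 140, 160, 155, 180],
})

annual = quarterly.groupby("year")["sales"].sum()   # annual sales per year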


4. Data Normalization
Normalizing the data refers to scaling the data values to a much smaller range such as [-1, 1] or
[0.0, 1.0]. There are different methods to normalize the data, as discussed below.

Consider that we have a numeric attribute A and we have n number of observed values for attribute
A that are V1, V2, V3, ….Vn.

o Min-max normalization: This method implements a linear transformation on the original
data. Let minA and maxA be the minimum and maximum values observed for attribute A,
and let Vi be a value of attribute A that has to be normalized. Min-max normalization maps
Vi to V'i in a new, smaller range [new_minA, new_maxA] using the formula:

V'i = ((Vi - minA) / (maxA - minA)) * (new_maxA - new_minA) + new_minA

For example, suppose $12,000 and $98,000 are the minimum and maximum values for the
attribute income, and [0.0, 1.0] is the range into which we have to map the value $73,600.
Min-max normalization transforms $73,600 to
((73,600 - 12,000) / (98,000 - 12,000)) * (1.0 - 0.0) + 0.0 = 0.716.

o Z-score normalization: This method normalizes the values of attribute A using the mean
and standard deviation:

V'i = (Vi - Ᾱ) / σA

Here Ᾱ and σA are the mean and standard deviation of attribute A, respectively. For
example, if the mean and standard deviation for attribute A are $54,000 and $16,000, the
value $73,600 is normalized to (73,600 - 54,000) / 16,000 = 1.225.

o Decimal Scaling: This method normalizes the values of attribute A by moving the decimal
point. The movement of the decimal point depends on the maximum absolute value of A:

V'i = Vi / 10^j

Here j is the smallest integer such that max(|V'i|) < 1.

For example, suppose the observed values for attribute A range from -986 to 917, so the
maximum absolute value of A is 986. To normalize each value of attribute A using decimal
scaling, we divide each value by 1000, i.e., j = 3. The value -986 is normalized to -0.986 and
917 to 0.917. Normalization parameters such as the mean, standard deviation, and
maximum absolute value must be preserved so that future data can be normalized uniformly.
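
A short sketch of the three normalization methods in Python, reusing the figures from the examples above (min = 12,000, max = 98,000, mean = 54,000, standard deviation = 16,000):

v = 73_600.0

# Min-max normalization to [0.0, 1.0]
min_a, max_a = 12_000.0, 98_000.0
v_minmax = (v - min_a) / (max_a - min_a) * (1.0 - 0.0) + 0.0   # ~0.716

# Z-score normalization
mean_a, std_a = 54_000.0, 16_000.0
v_zscore = (v - mean_a) / std_a                                # 1.225

# Decimal scaling for values ranging from -986 to 917 (j = 3)
v_decimal = -986 / 10 ** 3                                     # -0.986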

5. Data Discretization
 This is the process of converting continuous data into a set of data intervals. Continuous
attribute values are substituted by small interval labels, which makes the data easier to study
and analyze. If a data mining task handles a continuous attribute, its values can be replaced
by these discrete interval labels, which improves the efficiency of the task.

 This method is also called a data reduction mechanism as it transforms a large dataset into
a set of categorical data. Discretization also uses decision tree-based algorithms to produce
short, compact, and accurate results when using discrete values.
 Data discretization can be classified into two types: supervised discretization, where the
class information is used, and unsupervised discretization, which does not use class
information. Depending on the direction in which the process proceeds, discretization can
follow a 'top-down splitting strategy' or a 'bottom-up merging strategy'.
 For example, the values for the age attribute can be replaced by the interval labels such as
(0-10, 11-20…) or (kid, youth, adult, senior).
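
A minimal sketch of discretization with pandas, replacing raw ages with interval labels (the bin edges are one possible choice, not prescribed by the text):

import pandas as pd

ages = pd.Series([4, 15, 27, 45, 70])
labels = pd.cut(ages, bins=[0, 10, 20, 60, 120],
                labels=["kid", "youth", "adult", "senior"])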

6. Data Generalization
 It converts low-level data attributes to high-level data attributes using a concept hierarchy.
This conversion from a lower conceptual level to a higher one is useful for getting a clearer
picture of the data. Data generalization can be divided into two approaches:
o Data cube (OLAP) approach.
o Attribute-oriented induction (AOI) approach.

 For example, age data may appear in a dataset as numeric values such as 20 or 30. It can
be transformed to a higher conceptual level as a categorical value such as young or old.

Data Transformation Process

The entire process for transforming data is known as ETL (Extract, Transform, and Load).
Through the ETL process, analysts can convert data to its desired format. Here are the steps
involved in the data transformation process:


1. Data Discovery: During the first stage, analysts work to understand and identify data in its
source format. To do this, they will use data profiling tools. This step helps analysts decide
what they need to do to get data into its desired format.

2. Data Mapping: During this phase, analysts perform data mapping to determine how
individual fields are modified, mapped, filtered, joined, and aggregated. Data mapping is
essential to many data processes, and one misstep can lead to incorrect analysis and ripple
through your entire organization.

3. Data Extraction: During this phase, analysts extract the data from its original source.
These may include structured sources such as databases or streaming sources such as
customer log files from web applications.

4. Code Generation and Execution: Once the data has been extracted, analysts need to write
code to complete the transformation. Often, analysts generate the code with the help of data
transformation platforms or tools.
5. Review: After transforming the data, analysts need to check it to ensure everything has
been formatted correctly.

6. Sending: The final step involves sending the data to its target destination. The target might
be a data warehouse or a database that handles both structured and unstructured data.


Advantages of Data Transformation


Transforming data can help businesses in a variety of ways. Here are some of the essential
advantages of data transformation:

• Better Organization: Transformed data is easier for both humans and computers to use.

• Improved Data Quality: There are many risks and costs associated with bad data. Data
transformation can help your organization eliminate quality issues such as missing values
and other inconsistencies.

• Perform Faster Queries: You can quickly and easily retrieve transformed data thanks to
it being stored and standardized in a source location.

• Better Data Management: Businesses are constantly generating data from more and more
sources. If there are inconsistencies in the metadata, it can be challenging to organize and
understand it. Data transformation refines your metadata, so it's easier to organize and
understand.
• More Use Out of Data: While businesses may be collecting data constantly, a lot of that
data sits around unanalyzed. Transformation makes it easier to get the most out of your
data by standardizing it and making it more usable.


Disadvantages of Data Transformation

While data transformation comes with a lot of benefits, there are still some challenges to
transforming data effectively, such as:
• Data transformation can be expensive. The cost is dependent on the specific
infrastructure, software, and tools used to process data. Expenses may include licensing,
computing resources, and hiring necessary personnel.
• Data transformation processes can be resource-intensive. Performing transformations
in an on-premises data warehouse after loading or transforming data before feeding it into
applications can create a computational burden that slows down other operations. If you
use a cloud-based data warehouse, you can do the transformations after loading because
the platform can scale up to meet demand.
• Lack of expertise and carelessness can introduce problems during transformation. Data
analysts without appropriate subject matter expertise are less likely to notice incorrect data
because they are less familiar with the range of accurate and permissible values.
• Enterprises can perform transformations that don't suit their needs. A business might
change information to a specific format for one application only to then revert the
information to its prior format for a different application.

Data Reduction in Data Mining

• Data mining is applied to selected data in large databases. When data analysis and mining
are performed on a huge amount of data, they take a very long time, making the process
impractical and infeasible.
• Data reduction techniques ensure the integrity of data while reducing the data. Data
reduction is a process that reduces the volume of original data and represents it in a much
smaller volume. Data reduction techniques are used to obtain a reduced representation of
the dataset that is much smaller in volume by maintaining the integrity of the original data.

By reducing the data, the efficiency of the data mining process is improved, which produces
the same analytical results.
• Data reduction does not affect the result obtained from data mining. That means the result
obtained from data mining before and after data reduction is the same or almost the same.
• Data reduction aims to represent the data more compactly. When the data size is smaller, it
is simpler to apply sophisticated and computationally expensive algorithms. The reduction
of the data may be in terms of the number of rows (records) or the number of columns
(dimensions).

Techniques of Data Reduction

