[go: up one dir, main page]

0% found this document useful (0 votes)
43 views63 pages

Lecture 5 Clones

The document discusses the challenges of code cloning in software development. It notes that while cloning code is easy to do, it can have negative effects such as code bloat, copied defects, and increased work for future changes. Studies have found that normal code bases contain 8-12% duplicated code, while more than 15-25% duplication is considered a lot. Common types of duplicated code include code segments found in different files, functions, or within the same function. Managing duplicated code is important for software quality and maintenance.

Uploaded by

berlin.asd123
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
43 views63 pages

Lecture 5 Clones

The document discusses the challenges of code cloning in software development. It notes that while cloning code is easy to do, it can have negative effects such as code bloat, copied defects, and increased work for future changes. Studies have found that normal code bases contain 8-12% duplicated code, while more than 15-25% duplication is considered a lot. Common types of duplicated code include code segments found in different files, functions, or within the same function. Managing duplicated code is important for software quality and maintenance.

Uploaded by

berlin.asd123
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
You are on page 1/ 63

Clones

SOEN 6431 1
Human
Cloning is
challenging!
Right?

What about
cloning in
software
development?

As easy as:

SOEN 6431 2
Code is
Copied [432] NS_IMETHODIMP [467] NS_IMETHODIMP [497] NS_IMETHODIMP
Small Example from the Mozilla [433]
[434]
LocationImpl::GetPathname(nsString
{
[468]
[469]
LocationImpl::SetPathname(const nsString
{
[498]
[499]
LocationImpl::GetPort(nsString& aPort)
{
Distribution (Milestone 9) [435]
[436]
nsAutoString href;
nsIURI *url;
[470]
[471]
nsAutoString href;
nsIURI *url;
[500]
[501]
nsAutoString href;
nsIURI *url;
[437] nsresult result = NS_OK; [472] nsresult result = NS_OK; [502] nsresult result = NS_OK;
Extract from [438]
[439] result = GetHref(href);
[473]
[474] result = GetHref(href);
[503]
[504] result = GetHref(href);
/dom/src/base/nsLocation.cpp [440]
[441]
if (NS_OK == result) {
#ifndef NECKO
[475]
[476]
if (NS_OK == result) {
#ifndef NECKO
[505]
[506]
if (NS_OK == result) {
#ifndef NECKO
[442] result = NS_NewURL(&url, href); [477] result = NS_NewURL(&url, href); [507] result = NS_NewURL(&url, href);
[443] #else [478] #else [508] #else
[444] result = NS_NewURI(&url, href); [479] result = NS_NewURI(&url, href); [509] result = NS_NewURI(&url, href);
[445] #endif // NECKO [480] #endif // NECKO [510] #endif // NECKO
[446] if (NS_OK == result) { [481] if (NS_OK == result) { [511] if (NS_OK == result) {
[447] #ifdef NECKO [482] char *buf = aPathname.ToNewCString(); [512] aPort.SetLength(0);
[448] char* file; [483] #ifdef NECKO [513] #ifdef NECKO
[449] result = url->GetPath(&file); [484] url->SetPath(buf); [514] PRInt32 port;
[450] #else [485] #else [515] (void)url->GetPort(&port);
[451] const char* file; [486] url->SetFile(buf); [516] #else
[452] result = url->GetFile(&file); [487] #endif [517] PRUint32 port;
[453] #endif [488] SetURL(url); [518] (void)url->GetHostPort(&port);
[454] if (result == NS_OK) { [489] delete[] buf; [519] #endif
[455] aPathname.SetString(file); [490] NS_RELEASE(url); [520] if (-1 != port) {
[456] #ifdef NECKO [491] } [521] aPort.Append(port, 10);
[457] nsCRT::free(file); [492] } [522] }
[458] #endif [493] [523] NS_RELEASE(url);
[459] } [494] return result; [524] }
[460] NS_IF_RELEASE(url); [495] } [525] }
[461] } [496] [526]
[462] } [527] return result;
[463] [528] }
[464] return result; [529]
[465] }
[466]

3
General negative effect
• Code bloat

Negative effects on Software


Maintenance
Why Clones • Copied Defects
• Changes take double, triple, quadruple, ...
are bad?! Work
• Dead code
• Add to the cognitive load of future maintainers

Copying as additional source of defects


• Errors in the systematic renaming produce
unintended aliasing

4
How Much
Code is Duplication
with
Case Study LOC without
Duplicated? comments
comments

Usual estimates: 8 to 12% in


gcc 460’000 8.7% 5.6%
normal industrial code
15 to 25 % is already a lot!
Database 245’000 36.4% 23.3%
Server

Payroll 40’000 59.3% 25.4%

Message 6’500 29.4% 17.4%


Board

5
What is Duplicated Code?
 Duplicated Code = Source code segments that are found in different places of a system.
 in different files
 in the same file but in different functions
 in the same function
 The segments must contain some logic or structure that can be abstracted, i.e.,

... ... is not considered


computeIt(a,b,c,d); computeIt(w,x,y,z); duplicated code.
... ...

... ... could be abstracted


getIt(hash(tail(z))); getIt(hash(tail(a))); to a new function
... ...

 Copied artifacts range from expressions, to functions, to data structures, and to entire
subsystems.

6
Development Time
◦ Cloning a procedure rather than extracting a common part
may save on time

Communication
Why Cloning ◦ A code set may be borrowed but its working might not be
clear
Occurs Structural
◦ Code borrowed from an un-modifiable subsystem.

Coincidence
◦ Look Alikes and clones are difficult to differentiate.

7
Definitions
Clone Class/Set: Set of equivalent
Clones
Clone-pair

Precision: Percent of reported


clones that are genuine

Recall: Percent of genuine clones


that are reported

8
Clones?

02/22/2024 SOEN 6431 9


Type 1
Clones

SOEN 6431 02/22/2024 10


Clones?

02/22/2024 11
Type 2
clones

02/22/2024 12
Clones?

02/22/2024 13
Type 3 Clones

02/22/2024 14
Clone Definition
(Source Code
Clone)
Similar code fragments
◦ Type 1: Identical except
whitespaces …
◦ Type 2: Identical except
variable names ...
◦ Type 3: Identical except a
few missing…
◦ Type 4: Similar
functionality
Alternative Classification
02/22/2024 16
Code Detection
02/22/2024 SOEN 6431 17
Detection Techniques
Array a = new Array(10);

String Matching – Represents and


evaluates code using string comparisons.

Token Parsing – Code transformation


into tokens for comparison.

Graph Matching – Pattern matching on


graph representations of code.

18
02/22/2024 SOEN 6431 19
02/22/2024 SOEN 6431 20
Detection Strategies
21
Exact String
Matching
Definition
◦ Two sections of code are said to be a maximal
exact match if their lines match exactly
character by character but the preceding lines
do not match and the following lines do not
match.

22
Parameterized String Example
 Was found in the X-Window C code
 Fragment 1:
copy-number (&pmin, &pmax ,
pfi->min-bounds.lbearing,
pfi->max-bounds.lbeaing);
*pmin++ = *pmax++ = J , J ;
copy-number(&pmin, kpmax,
pfi->min-bounds.rbearing,
pf i->max-bounds .rbearing) ;
*pmin++ = *pmax++ = J , J ;

 Fragment 2:
copy-number(&pmin, &pmax,
pfh->min-bounds.left,
pfh->max-bounds.left);
*pmin++ = *pmax++ = J , J ;
copy-number(&pmin, &pmax,
pfh->min-bounds.right,
pfh->max-bounds.right); 23
Substring Matching
Substring Matching provides a faster search algorithm.
Phases
1.Normalization
2.Substring Generation
3.Matching
4.Consolidation
5.Reporting

24
Token Parsing
Techniques
Transforms code into tokens by using
language specific constructs

Find similarities within this token string

Transform token clones back into code clones


for presentation

25
Token Parsing Example

26
Token Parsing Example

27
Token Parsing Example

28
Token Parsing Example

29
Graph
Matching
Techniques
Form machine
representation of
code

Identify clones as
identical subgraphs

30
Abstract Syntax Tree Matching

31
Vertices are lines of code

Edges are attributed with different types


Program of dependences (control flow, data, etc)

Dependence
Graph
NP complete in general,
Matching k-cutoff in maximal Experiments
graph size used to limit determine k=20 best
runtime

O(|V|2) possible graph starting points,


reduced via heuristic

32
02/22/2024 SOEN 6431 33
Need to evaluate different clone detection
techniques
Hard to know the real number of clones in
a non-trivial application
How to compare different types of clones?

Metrics? LOC: Line number count

SLOC: Line number count after the removal


of blanks
%LOC: Percent of lines with clones in them

%FILE: Percent of files with clones in them

35
Comparison
of Clone
Detectors

36
Comparison of Clone Detectors
38
Clone Management

39
Clone Genealogies

Refers to the life cycle of clones through various versions of a program.


Evolution Patterns of a code clone and their relationship are seen.

40
How do Clones evolve
Consistently changing clones - all lineages in
the clone genealogy include at least one
“consistent change pattern”
Volatile clones – measured based on
presence across various versions. “K-volatile”
Clone
Locally Non-refactorable clones –
genealogy Programmer cannot refactor using pull-up or
extract methods.
Long Lived Clones – Clones that lived across
various versions of the program. Ideal for
refactoring

42
What
can we
do about
clones?

02/22/2024 43
Developers continuously modify, enhance
and adapt software.

As software evolves and strays away from


its original design, three things happen.
Software • Decreased understandability
Restructuring: • Decreased reliability
General Idea • Increased maintenance cost

Decreased understandability is due to


• Increased complexity of code
• Out-of-date documentation
• Code not conforming to standards

44
Decrease the complexity of software by improving its
internal quality by restructuring the software.

Restructuring applied on object-oriented software is called


refactoring.

SOFTWARE Restructuring means reorganizing software (source code +


documentation) to give it a different look, or structure.
RESTRUCT
URING: Source code is restructured to improve some of its non-
functional requirements:
GENERAL •

Readability
Extensibility
IDEA •

Maintainability
Modularity

Restructuring does not modify the software’s


functionalities.

Restructuring can be performed while adding new features.

45
Software restructuring is informally stated as
the modifications of software to make it
• easier to understand;
• easier to change;
• easier to change its documentation;
• less susceptible to faults when changes are made to it.
SOFTWARE A higher level goal of restructuring is to
RESTRUCT increase the software value
URING: • external software value: fewer faults in software is seen
to be better by customers
CORE IDEA • internal software value: a well-structured system is less
expensive to maintain

Simple examples of restructuring


• Pretty printing
• Meaningful names for variables
• One statement per line of source code

46
Developers and managers need to be aware
of restructuring for the following reasons
• better understandability
• keep pace with new structures
• better reliability
• longer lifetime
• automated analysis
SOFTWARE
RESTRUCT Characteristics of restructuring and
refactoring
URING: • The objective of restructuring and refactoring is to
CORE IDEA improve the internal and external values of software.
• Restructuring preserves the external behavior of the
original program.
• Restructuring can be performed without adding new
requirements.
• Restructuring generally produces a program in the same
language.
• Example: a C program is restructured into another C
program.

47
To restructure a software system, one follows a
process with well defined activities.
• Identify what to refactor.
• Determine which refactorings to apply.
• Ensure that refactoring preserves the software’s
behavior.
• Apply the refactorings to the chosen entities.
• Evaluate the impacts of the refactorings.
Activities in a • Maintain consistency.
Refactoring The programmer identifies what to refactor
Process from a set of high-level software artifacts.
• source code;
• design documents; and
• requirements documents.

Next, focus on specific portions of the chosen


artifact for refactoring.
• Specific modules, functions, classes, methods, and data
can be identified for refactoring.

48
The concept of code smell is applied to
source code to detect what should be
refactored. (Fowler)

A code smell is any symptom in source


code that possibly indicates a deeper
problem.
Identify what
to refactor
Examples of code smell are:

• duplicate code;
• long parameter list;
• long methods;
• large classes;
• message chain.

49
Tool support is needed to identify a feasible
subset of refactorings.

The following two techniques can be used to


analyze a set of refactorings to select a feasible
Determine subset.

which • Critical pair analysis


• Given a set of refactorings, analyze each pair for conflicts.
refactorings A apir is said to be conflicting if both of them cannot be
applied together.
to apply • Example: R4 and R6 constitute a conflicting pair.
• Sequential dependency analysis
• In order to apply a refactoring, one or more refactorings
must be applied before.
• If one refactoring has already been applied, a mutually
exclusive refactoring cannot be applied anymore.
• Example: after applying R1, R2, and R3, R4 becomes
applicable. Now, if R4 is applied, then R6 is not applicable
anymore.

50
• Ideally, the input/output behavior of a
program after refactoring is the same as the
behavior before refactoring.
• In many applications, preservation of non-
functional requirements is necessary.
• A non-exclusive list of such non-functional
Ensure that requirements is as follows:
refactoring ◦ Temporal constraints: A temporal constraint
over a sequence of operations is that the
preserves operations occur in a certain order.
the ◦ For real-time systems, refactorings should preserve
temporal constraints.
software’s ◦ Resource constraints: The software after
behavior. refactoring does not demand more resources:
memory, energy, communication bandwidth,
and so on.
◦ Safety constraints: It is important that the
software does not lose its safety properties
after refactoring.

51
Two pragmatic ways of showing that
refactoring preserves the software’s
behavior.

Ensure that
refactoring Testing
preserves
the • Exhaustively test the software before and after
applying refactorings, and compare the
software’s observed behavior on a test-by-test basis.
behavior.
Verification of preservation of call
sequence
• Ensure that the sequence(s) of method calls
are preserved in the refactored program.

52
“Extract” -
Use standard Make a
procedure
Refactoring
methods
“Pull Up” -
Make an
superclass
Code
Clone
Refactoring of code clones
Refactorin
advocated strongly by many
practitioners including
g
Fowler.

53
Extract

54
Pull Up

55
Unfactorable Code Clone
56
What Else can we do about Clones (cont.)

02/22/2024 57
Clones should be
avoided, no matter
what !!

Clones are good! We Controversial


should not remove all
clones !! statements

Clones/code
quality/maintenance is
there any relationship?

58
Why Clones can be good
02/22/2024 59
Bad News
02/22/2024 60
CLONE DETECTION TOOLS

61
CCFinder
62
Found 6 duplicate lines in the following files:

Between lines 201 and 207 in


simian/build/dist/src/java/awt/image/WritableRaster.j
ava
Between lines 1305 and 1311 in
simian/build/dist/src/java/awt/image/Raster.java

Simian Found 6 duplicate lines in the following files:

Between lines 920 and 926 in


simian/build/dist/src/com/sun/imageio/plugins/jpeg/J
FIFMarkerSegment.java
Between lines 908 and 914 in
simian/build/dist/src/com/sun/imageio/plugins/jpeg/J
FIFMarkerSegment.java
...

63

You might also like