Lecture 5 Clones
Lecture 5 Clones
SOEN 6431 1
Human
Cloning is
challenging!
Right?
What about
cloning in
software
development?
As easy as:
SOEN 6431 2
Code is
Copied [432] NS_IMETHODIMP [467] NS_IMETHODIMP [497] NS_IMETHODIMP
Small Example from the Mozilla [433]
[434]
LocationImpl::GetPathname(nsString
{
[468]
[469]
LocationImpl::SetPathname(const nsString
{
[498]
[499]
LocationImpl::GetPort(nsString& aPort)
{
Distribution (Milestone 9) [435]
[436]
nsAutoString href;
nsIURI *url;
[470]
[471]
nsAutoString href;
nsIURI *url;
[500]
[501]
nsAutoString href;
nsIURI *url;
[437] nsresult result = NS_OK; [472] nsresult result = NS_OK; [502] nsresult result = NS_OK;
Extract from [438]
[439] result = GetHref(href);
[473]
[474] result = GetHref(href);
[503]
[504] result = GetHref(href);
/dom/src/base/nsLocation.cpp [440]
[441]
if (NS_OK == result) {
#ifndef NECKO
[475]
[476]
if (NS_OK == result) {
#ifndef NECKO
[505]
[506]
if (NS_OK == result) {
#ifndef NECKO
[442] result = NS_NewURL(&url, href); [477] result = NS_NewURL(&url, href); [507] result = NS_NewURL(&url, href);
[443] #else [478] #else [508] #else
[444] result = NS_NewURI(&url, href); [479] result = NS_NewURI(&url, href); [509] result = NS_NewURI(&url, href);
[445] #endif // NECKO [480] #endif // NECKO [510] #endif // NECKO
[446] if (NS_OK == result) { [481] if (NS_OK == result) { [511] if (NS_OK == result) {
[447] #ifdef NECKO [482] char *buf = aPathname.ToNewCString(); [512] aPort.SetLength(0);
[448] char* file; [483] #ifdef NECKO [513] #ifdef NECKO
[449] result = url->GetPath(&file); [484] url->SetPath(buf); [514] PRInt32 port;
[450] #else [485] #else [515] (void)url->GetPort(&port);
[451] const char* file; [486] url->SetFile(buf); [516] #else
[452] result = url->GetFile(&file); [487] #endif [517] PRUint32 port;
[453] #endif [488] SetURL(url); [518] (void)url->GetHostPort(&port);
[454] if (result == NS_OK) { [489] delete[] buf; [519] #endif
[455] aPathname.SetString(file); [490] NS_RELEASE(url); [520] if (-1 != port) {
[456] #ifdef NECKO [491] } [521] aPort.Append(port, 10);
[457] nsCRT::free(file); [492] } [522] }
[458] #endif [493] [523] NS_RELEASE(url);
[459] } [494] return result; [524] }
[460] NS_IF_RELEASE(url); [495] } [525] }
[461] } [496] [526]
[462] } [527] return result;
[463] [528] }
[464] return result; [529]
[465] }
[466]
3
General negative effect
• Code bloat
4
How Much
Code is Duplication
with
Case Study LOC without
Duplicated? comments
comments
5
What is Duplicated Code?
Duplicated Code = Source code segments that are found in different places of a system.
in different files
in the same file but in different functions
in the same function
The segments must contain some logic or structure that can be abstracted, i.e.,
Copied artifacts range from expressions, to functions, to data structures, and to entire
subsystems.
6
Development Time
◦ Cloning a procedure rather than extracting a common part
may save on time
Communication
Why Cloning ◦ A code set may be borrowed but its working might not be
clear
Occurs Structural
◦ Code borrowed from an un-modifiable subsystem.
Coincidence
◦ Look Alikes and clones are difficult to differentiate.
7
Definitions
Clone Class/Set: Set of equivalent
Clones
Clone-pair
8
Clones?
02/22/2024 11
Type 2
clones
02/22/2024 12
Clones?
02/22/2024 13
Type 3 Clones
02/22/2024 14
Clone Definition
(Source Code
Clone)
Similar code fragments
◦ Type 1: Identical except
whitespaces …
◦ Type 2: Identical except
variable names ...
◦ Type 3: Identical except a
few missing…
◦ Type 4: Similar
functionality
Alternative Classification
02/22/2024 16
Code Detection
02/22/2024 SOEN 6431 17
Detection Techniques
Array a = new Array(10);
18
02/22/2024 SOEN 6431 19
02/22/2024 SOEN 6431 20
Detection Strategies
21
Exact String
Matching
Definition
◦ Two sections of code are said to be a maximal
exact match if their lines match exactly
character by character but the preceding lines
do not match and the following lines do not
match.
22
Parameterized String Example
Was found in the X-Window C code
Fragment 1:
copy-number (&pmin, &pmax ,
pfi->min-bounds.lbearing,
pfi->max-bounds.lbeaing);
*pmin++ = *pmax++ = J , J ;
copy-number(&pmin, kpmax,
pfi->min-bounds.rbearing,
pf i->max-bounds .rbearing) ;
*pmin++ = *pmax++ = J , J ;
Fragment 2:
copy-number(&pmin, &pmax,
pfh->min-bounds.left,
pfh->max-bounds.left);
*pmin++ = *pmax++ = J , J ;
copy-number(&pmin, &pmax,
pfh->min-bounds.right,
pfh->max-bounds.right); 23
Substring Matching
Substring Matching provides a faster search algorithm.
Phases
1.Normalization
2.Substring Generation
3.Matching
4.Consolidation
5.Reporting
24
Token Parsing
Techniques
Transforms code into tokens by using
language specific constructs
25
Token Parsing Example
26
Token Parsing Example
27
Token Parsing Example
28
Token Parsing Example
29
Graph
Matching
Techniques
Form machine
representation of
code
Identify clones as
identical subgraphs
30
Abstract Syntax Tree Matching
31
Vertices are lines of code
Dependence
Graph
NP complete in general,
Matching k-cutoff in maximal Experiments
graph size used to limit determine k=20 best
runtime
32
02/22/2024 SOEN 6431 33
Need to evaluate different clone detection
techniques
Hard to know the real number of clones in
a non-trivial application
How to compare different types of clones?
35
Comparison
of Clone
Detectors
36
Comparison of Clone Detectors
38
Clone Management
39
Clone Genealogies
40
How do Clones evolve
Consistently changing clones - all lineages in
the clone genealogy include at least one
“consistent change pattern”
Volatile clones – measured based on
presence across various versions. “K-volatile”
Clone
Locally Non-refactorable clones –
genealogy Programmer cannot refactor using pull-up or
extract methods.
Long Lived Clones – Clones that lived across
various versions of the program. Ideal for
refactoring
42
What
can we
do about
clones?
02/22/2024 43
Developers continuously modify, enhance
and adapt software.
44
Decrease the complexity of software by improving its
internal quality by restructuring the software.
45
Software restructuring is informally stated as
the modifications of software to make it
• easier to understand;
• easier to change;
• easier to change its documentation;
• less susceptible to faults when changes are made to it.
SOFTWARE A higher level goal of restructuring is to
RESTRUCT increase the software value
URING: • external software value: fewer faults in software is seen
to be better by customers
CORE IDEA • internal software value: a well-structured system is less
expensive to maintain
46
Developers and managers need to be aware
of restructuring for the following reasons
• better understandability
• keep pace with new structures
• better reliability
• longer lifetime
• automated analysis
SOFTWARE
RESTRUCT Characteristics of restructuring and
refactoring
URING: • The objective of restructuring and refactoring is to
CORE IDEA improve the internal and external values of software.
• Restructuring preserves the external behavior of the
original program.
• Restructuring can be performed without adding new
requirements.
• Restructuring generally produces a program in the same
language.
• Example: a C program is restructured into another C
program.
47
To restructure a software system, one follows a
process with well defined activities.
• Identify what to refactor.
• Determine which refactorings to apply.
• Ensure that refactoring preserves the software’s
behavior.
• Apply the refactorings to the chosen entities.
• Evaluate the impacts of the refactorings.
Activities in a • Maintain consistency.
Refactoring The programmer identifies what to refactor
Process from a set of high-level software artifacts.
• source code;
• design documents; and
• requirements documents.
48
The concept of code smell is applied to
source code to detect what should be
refactored. (Fowler)
• duplicate code;
• long parameter list;
• long methods;
• large classes;
• message chain.
49
Tool support is needed to identify a feasible
subset of refactorings.
50
• Ideally, the input/output behavior of a
program after refactoring is the same as the
behavior before refactoring.
• In many applications, preservation of non-
functional requirements is necessary.
• A non-exclusive list of such non-functional
Ensure that requirements is as follows:
refactoring ◦ Temporal constraints: A temporal constraint
over a sequence of operations is that the
preserves operations occur in a certain order.
the ◦ For real-time systems, refactorings should preserve
temporal constraints.
software’s ◦ Resource constraints: The software after
behavior. refactoring does not demand more resources:
memory, energy, communication bandwidth,
and so on.
◦ Safety constraints: It is important that the
software does not lose its safety properties
after refactoring.
51
Two pragmatic ways of showing that
refactoring preserves the software’s
behavior.
Ensure that
refactoring Testing
preserves
the • Exhaustively test the software before and after
applying refactorings, and compare the
software’s observed behavior on a test-by-test basis.
behavior.
Verification of preservation of call
sequence
• Ensure that the sequence(s) of method calls
are preserved in the refactored program.
52
“Extract” -
Use standard Make a
procedure
Refactoring
methods
“Pull Up” -
Make an
superclass
Code
Clone
Refactoring of code clones
Refactorin
advocated strongly by many
practitioners including
g
Fowler.
53
Extract
54
Pull Up
55
Unfactorable Code Clone
56
What Else can we do about Clones (cont.)
02/22/2024 57
Clones should be
avoided, no matter
what !!
Clones/code
quality/maintenance is
there any relationship?
58
Why Clones can be good
02/22/2024 59
Bad News
02/22/2024 60
CLONE DETECTION TOOLS
61
CCFinder
62
Found 6 duplicate lines in the following files:
63