3 Programs & Programming
SECURITY IN
COMPUTING,
FIFTH EDITION
Chapter 3: Programs and Programming
Code and data separated, with the heap growing up toward high addresses and the stack growing down from the high addresses.
The same hex value in the same spot in memory can either be a meaningful data value or a meaningful instruction depending on whether the computer treats it as code or data.
Buffer Overflows
• Occur when data is written beyond the space allocated for it, such as a
10th byte in a 9-byte array
• The trick for an attacker is finding buffer overflow opportunities that lead
to overwritten memory being executed, and finding the right code to input
char sample[10];
int i;
for (i=0; i<=9; i++)
    sample[i] = 'A';
sample[10] = 'B';
This is a very simple buffer overflow. Character B is placed in memory that wasn’t allocated by or for this procedure.
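A bounds-checked variant of the fragment above might look like the following sketch. The helper names put_char and fill_sample are hypothetical and only illustrate the idea of checking an index against the array size before writing.

```c
#include <stddef.h>

/* Hypothetical helper: store c at buf[idx] only if idx lies inside the
   array. Returns 1 on success, 0 if the write would be out of bounds. */
int put_char(char *buf, size_t size, size_t idx, char c)
{
    if (idx >= size)
        return 0;
    buf[idx] = c;
    return 1;
}

/* Fill a 10-byte array the way the slide does, then attempt the
   out-of-bounds write. Returns 1 if that write was refused. */
int fill_sample(void)
{
    char sample[10];
    size_t i;

    for (i = 0; i < sizeof sample; i++)
        put_char(sample, sizeof sample, i, 'A');

    /* Index 10 is one past the end of a 10-byte array. */
    return !put_char(sample, sizeof sample, 10, 'B');
}
```

Using sizeof on the array, rather than repeating the literal 10, keeps the check correct if the declaration changes later.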
Memory Organization
Examples of buffer overflow effects in the context of the earlier AAAAAAAAAAB example.
The memory that’s overwritten depends on where the buffer resides.
When procedure A calls procedure B, procedure B gets added to the stack along
with a pointer back to procedure A. In this way, when procedure B is finished
running, it can get popped off the stack, and procedure A will just continue
executing where it left off.
Compromised Stack
At its simplest, then, a stack overflow can result in some form of denial-of-service
attack on a system.
Of more interest to the attacker, rather than immediately crashing the program, is
to have it transfer control to a location and code of the attacker’s choosing.
The simplest way of doing this is for the input causing the buffer overflow to
contain the desired target address at the point where it will overwrite the saved
return address in the stack frame.
Then, when the attacked function finishes and executes the return instruction,
it jumps to the supplied address rather than returning to the calling function,
and executes instructions from there.
Stack Buffer Overflows
Note in this program that the buffers are both the same size. This is quite a
common practice in C programs.
Indeed, the standard C IO library has a defined constant BUFSIZ, which is the
default size of the input buffers it uses.
The problem that may result, as it does in this example, occurs when data are
being merged into a buffer that includes the contents of another buffer, such
that the space needed exceeds the space available.
For the first run in 10.7b, the value read is small enough that the merged
response didn’t corrupt the stack frame.
For the second run, the supplied input was much too large. However, because
a safe input function was used, only 15 characters were read, as shown in the
following line. When this was then merged with the response string, the result
was larger than the space available in the destination buffer.
It overwrote the saved frame pointer, but not the return address. So the
function returned, as shown by the message printed by the main() function.
But when main() tried to return, because its stack frame had been corrupted
and was now some random value, the program jumped to an illegal address
and crashed.
In this case the combined result was not long enough to reach the return
address, but this would be possible if a larger buffer size had been used.
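One way to avoid the merge problem described above is to let the formatting routine enforce the destination size and report truncation. The sketch below assumes a 16-byte destination and a "read val: " prefix purely for illustration; format_response is a hypothetical name, not the program discussed in these slides.

```c
#include <stddef.h>
#include <stdio.h>

/* Merge a fixed prefix and an externally supplied value into dst.
   Returns 1 if the whole result fit, 0 if it would have been truncated
   (dst then holds a shortened, NUL-terminated string) or on error. */
int format_response(char *dst, size_t dst_size, const char *val)
{
    /* snprintf never writes more than dst_size bytes and returns the
       length the full result would have needed. */
    int needed = snprintf(dst, dst_size, "read val: %s\n", val);

    return needed >= 0 && (size_t)needed < dst_size;
}
```

A caller can treat a return value of 0 as an input error instead of silently continuing with a corrupted or shortened response.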
Unsafe C Standard Library Routines
This shows that when looking for buffer overflows, all possible places where externally sourced
data are copied or merged have to be located.
Note that these do not even have to be in the code for a particular program, they can (and
indeed do) occur in library routines used by programs, including both standard libraries and
third-party application libraries.
Thus, for both attacker and defender, the scope of possible buffer overflow locations is very
large.
A list of some of the most common unsafe standard C Library routines is given in Table 10.2 .
These routines are all suspect and should not be used without checking the total size of data
being transferred in advance, or better still by being replaced with safer alternatives.
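As an example of replacing an unsafe routine with a safer alternative, the sketch below reads a line with fgets() instead of gets(). The helper name read_line and its return convention are assumptions made for this illustration.

```c
#include <stdio.h>
#include <string.h>

/* Read one line into buf without ever writing more than size bytes.
   Returns 1 if a whole line was read (newline stripped), 0 on EOF/error
   or if the line was too long. Over-long input is drained so the next
   call starts on a fresh line. */
int read_line(FILE *in, char *buf, size_t size)
{
    size_t len;
    int c;

    if (size == 0 || fgets(buf, (int)size, in) == NULL)
        return 0;

    len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';
        return 1;
    }

    /* No newline stored: the input ended, or the line did not fit. */
    c = fgetc(in);
    if (c == EOF || c == '\n')
        return 1;
    while (c != '\n' && c != EOF)
        c = fgetc(in);
    return 0;
}
```

Unlike gets(), the caller always states how much space is available, and an over-long line is reported rather than written past the end of the buffer.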
Machine code
• Specific to processor and operating system. Traditionally needed good assembly language skills to create.
Metasploit Project
• Provides useful information to people who perform penetration and exploit research.
This is the major reason why buffer overflow attacks are usually targeted at a
specific piece of software running on a specific operating system.
However, more recently a number of sites and tools have been developed that
automate this process, thus making the development of shellcode exploits available
to a much larger potential audience.
Stack Overflow Variants
Target program can be:
• A trusted system utility
• Commonly used library code
Shellcode functions:
• Launch a remote shell when connected to
• Create a reverse shell that connects back to the hacker
• Break out of a chroot (restricted execution) environment, giving full access to the system
The targeted program need not be a trusted system utility. Another possible target is a program providing a network service; that is, a network
daemon. A common approach for such programs is listening for connection requests from clients and then spawning a child process to handle
that request. The child process typically has the network connection mapped to its standard input and output. This means the child program’s
code may use the same type of unsafe input or buffer copy code as we’ve seen already.
This was indeed the case with the stack overflow attack used by the Morris Worm back in 1988. It targeted the use of gets() in the fingerd
daemon handling requests for the UNIX finger network service (which provided information on the users on the system).
Yet another possible target is a program, or library code, which handles common document formats (e.g., the library routines used to decode and
display GIF or JPEG images). In this case, the input is not from a terminal or network connection, but from the file being decoded and displayed.
If such code contains a buffer overflow, it can be triggered as the file contents are read, with the details encoded in a specially corrupted image.
This attack file would be distributed via e-mail, instant messaging, or as part of a Web page. Because the attacker is not directly interacting with
the targeted program and system, the shellcode would typically open a network connection back to a system under the attacker’s control, to
return information and possibly receive additional commands to execute.
All of this shows that buffer overflows can be found in a wide variety of programs, processing a range of different input, and with a variety of
possible responses.
Packetstorm
Packet Storm includes a large collection of packaged shellcode, including code that can
1. Dehashed: View leaked credentials.
2. SecurityTrails: Extensive DNS data.
3. DorkSearch: Really fast Google dorking.
4. ExploitDB: Archive of various exploits.
5. ZoomEye: Gather information about targets.
6. Pulsedive: Search for threat intelligence.
7. GrayHatWarfare: Search public S3 buckets.
8. PolySwarm: Scan files and URLs for threats.
9. Fofa: Search for various threat intelligence.
10. LeakIX: Search publicly indexed information.
11. DNSDumpster: Search for DNS records quickly.
12. FullHunt: Search and discovery attack surfaces.
13. AlienVault: Extensive threat intelligence feed.
14. ONYPHE: Collects cyber-threat intelligence data.
15. Grep App: Search across a half million git repos.
16. URL Scan: Free service to scan and analyse websites.
17. Vulners: Search vulnerabilities in a large database.
18. WayBackMachine: View content from deleted websites.
19. Shodan: Search for devices connected to the internet.
20. Netlas: Search and monitor internet connected assets.
21. CRT sh: Search for certs that have been logged by CT.
22. Wigle: Database of wireless networks, with statistics.
23. PublicWWW: Marketing and affiliate marketing research.
24. Binary Edge: Scans the internet for threat intelligence.
25. GreyNoise: Search for devices connected to the internet.
26. Hunter: Search for email addresses belonging to a website.
27. Censys: Assessing attack surface for connected devices.
28. IntelligenceX: Search Tor, I2P, data leaks, domains, and emails.
29. Packet Storm Security: Browse latest vulnerabilities and exploits.
30. SearchCode: Search 75 billion lines of code from 40m projects.
Buffer Overflow Defenses
Two broad defense approaches:
• Compile-time: aim to harden programs to resist attacks in new programs
• Run-time: aim to detect and abort attacks in existing programs
There is a need to defend systems against such attacks by either preventing them, or at least detecting and aborting such
attacks. These can be broadly classified into two categories:
• Compile-time defenses, which aim to harden programs to resist attacks in new programs
• Run-time defenses, which aim to detect and abort attacks in existing programs
While suitable defenses have been known for a couple of decades, the very large existing base of vulnerable software and
systems hinders their deployment. Hence the interest in run-time defenses, which can be deployed in operating systems and
updates and can provide some protection for existing vulnerable programs.
Compile-time defenses aim to prevent or detect buffer overflows by instrumenting programs when they are compiled. The
possibilities for doing this range from choosing a high-level language that does not permit buffer overflows, to encouraging safe
coding standards, using safe standard libraries, or including additional code to detect corruption of the stack frame.
Compile-Time Defenses: Programming Language
• Requires support from memory management unit (MMU)
• Long existed on SPARC / Solaris systems
• Recent on x86 Linux/Unix/Windows systems
• Support for executable stack code
• Special provisions are needed
• Programmers often make assumptions about the maximum expected size of input
• Allocated buffer size is not confirmed
• Resulting in buffer overflow
• The 2014 Heartbleed OpenSSL bug is a recent example of a failure to check the validity of a binary
input value
Heartbeat Protocol
• A periodic signal generated by hardware or software to
indicate normal operation or to synchronize other parts of a
system
• Typically used to monitor the availability of a protocol entity
• Runs on top of the TLS Record Protocol
• Use is established during Phase 1 of the Handshake
Protocol
• Each peer indicates whether it supports heartbeats
• Serves two purposes:
• Assures the sender that the recipient is still alive
• Generates activity across the connection during idle periods
Heartbleed
• The Heartbleed bug allows anyone on the Internet to read the memory of the systems
protected by the vulnerable versions of the OpenSSL software.
• OpenSSL is the most popular open source cryptographic library and TLS implementation
used to encrypt traffic on the Internet.
• The most notable software using OpenSSL are the open source web servers like Apache and
nginx. The combined market share of just those two out of the active sites on the Internet was
over 66% at the time.
• This compromises the secret keys used to identify service providers & to encrypt traffic, the
names & passwords of users & actual content. This allows attackers to eavesdrop on
communications, steal data directly from the services & to impersonate services and users.
• OpenSSL was patched quickly, but OS vendors, appliance vendors, and independent software
vendors had to adopt the fix and notify their users.
• This bug left a large number of private keys & other secrets exposed on the Internet.
THE CODE SEGMENT
The Heartbleed bug is in OpenSSL’s TLS heartbeat implementation.
When a TLS heartbeat is sent, it comes with a couple of notable pieces of information:
• Some arbitrary payload data. This is intended to be repeated back to the sender so the sender
can verify that the connection is still alive and that the right data is being transmitted through the
communication channel.
• The length of that data, in bytes: len_payload.
On receiving a heartbeat request, OpenSSL sends the heartbeat response (with len_payload bytes of payload) back to the original sender.
The problem is that the vulnerable OpenSSL implementation never checked that len_payload was actually
correct, that is, that the request really contained that many bytes of payload. So, a malicious person could send
a heartbeat request indicating a payload length of up to 2^16 - 1 (65535), but actually send a shorter payload.
What happens in this case is that memcpy ends up copying beyond the bounds of the payload into the
response, giving up to 64 KB of OpenSSL’s memory contents to an attacker.
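The missing check can be illustrated with a simplified model of a heartbeat handler. This is a sketch, not OpenSSL's code: the three-byte header layout, the function name echo_heartbeat_payload, and the omission of padding are all assumptions made to keep the example short.

```c
#include <stddef.h>
#include <string.h>

/* Simplified heartbeat request layout (an assumption for this sketch):
   byte 0      message type
   bytes 1-2   len_payload, big-endian
   bytes 3..   payload */
#define HB_HEADER_LEN 3

/* Copy the payload into out only if the claimed length is backed by
   bytes that were actually received. Returns the number of payload
   bytes copied, or -1 if the request must be discarded. */
long echo_heartbeat_payload(const unsigned char *req, size_t req_len,
                            unsigned char *out, size_t out_cap)
{
    size_t len_payload;

    if (req_len < HB_HEADER_LEN)
        return -1;

    len_payload = ((size_t)req[1] << 8) | req[2];

    /* The check Heartbleed lacked: never trust the length field alone. */
    if (len_payload > req_len - HB_HEADER_LEN || len_payload > out_cap)
        return -1;

    memcpy(out, req + HB_HEADER_LEN, len_payload);
    return (long)len_payload;
}
```

Without the comparison against req_len, the memcpy would read whatever happens to follow the request in memory, which is the over-read described above.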
Overflow Countermeasures Summary
• Staying within bounds
• Check lengths before writing
• Confirm that array subscripts are within limits
• Double-check boundary condition code for off-by-one errors
• Limit input to the number of acceptable characters
• Limit programs’ privileges to reduce potential harm
• Many languages have overflow protections
• Code analyzers can identify many overflow vulnerabilities
• Canary values in stack to signal modification
Malware
• Programs planted by an agent with malicious intent to
cause unanticipated or undesired effects
• Virus
• A program that can replicate itself and pass on malicious code to
other nonmalicious programs by modifying them
• Worm
• A program that spreads copies of itself through a network
• Trojan horse
• Code that, in addition to its stated effect, has a second,
nonobvious, malicious effect
Types of Malware
Malware Activation
• One-time execution (implanting)
• Boot sector viruses
• Memory-resident viruses
• Application files
• Code libraries
Virus Effects
Virus Detection
• Virus scanners look for signs of malicious code infection
using signatures in program files and memory
• Traditional virus scanners have trouble keeping up with
new malware—detect about 45% of infections
• Detection mechanisms:
• Known string patterns in files or memory
• Execution patterns
• Storage patterns
Virus Signatures
Code Testing
• Unit testing
• Integration testing
• Function testing
• Performance testing
• Acceptance testing
• Installation testing
• Regression testing
• Penetration testing
• Flaws relating to invalid handling of input data, specifically when program input data can
accidentally or deliberately influence the flow of execution of the program
• Most often occur in scripting languages
• Encourage reuse of other programs and system utilities where possible to save coding effort
• Often used as Web CGI scripts
• Consider excerpt of PHP code from a CGI script shown which takes a name provided as input to the script, typically from a
form field similar to that shown in Figure 11.2b .
• It uses this value to construct a request to retrieve the records relating to that name from the database. The vulnerability in
this code is very similar to that in the command injection example.
• The difference is that SQL metacharacters are used, rather than shell metacharacters. If a suitable name is provided, for
example, Bob, then the code works as intended, retrieving the desired record.
• However, an input such as Bob'; drop table suppliers results in the specified record being retrieved, followed by deletion of
the entire table.
• To prevent this type of attack, the input must be validated before use. Any metacharacters must either be escaped,
canceling their effect, or the input rejected entirely.
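The escaping option mentioned above can be sketched as follows. The function escape_sql_quotes is hypothetical and handles only the single quote, by doubling it so that it is read as a literal character inside a quoted SQL string. Real database interfaces usually provide their own escaping routines or parameterized queries, which are generally the safer choice.

```c
#include <stddef.h>

/* Copy src into dst, doubling every single quote. Returns 1 on success,
   0 if dst is too small (dst is then set to an empty string). */
int escape_sql_quotes(char *dst, size_t dst_size, const char *src)
{
    size_t out = 0;

    if (dst_size == 0)
        return 0;

    for (; *src != '\0'; src++) {
        size_t need = (*src == '\'') ? 2 : 1;

        if (out + need >= dst_size) {   /* keep room for the terminator */
            dst[0] = '\0';
            return 0;
        }
        if (*src == '\'')
            dst[out++] = '\'';          /* extra quote cancels the metacharacter */
        dst[out++] = *src;
    }
    dst[out] = '\0';
    return 1;
}
```

With this transformation, the input Bob'; drop table suppliers stays inside the quoted string as data instead of terminating it.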
From Security in Computing, Fifth Edition, by Charles P. Pfleeger, et al. (ISBN: 9780134085043). Copyright 2015 by Pearson Education, Inc. All rights reserved.
Real World Hacking & validation of tools used in
lab
• https://www.fidusinfosec.com/tp-link-remote-code-execution-cve-2017-13772
Code Injection
A third common variant is the code injection attack, where the input includes code that is executed by the attacked system.
Figure 11.4a shows start of a vulnerable PHP calendar script. The flaw results from the use of a variable to construct the name of a file
that is then included into the script. Note that this script was not intended to be called directly. Rather, it is a component of a larger,
multifile program. The main script set the value of the $path variable to refer to the main directory containing the program and all its code
and data files.
Using this variable elsewhere in the program meant that customizing and installing the program required changes to just a few lines.
Unfortunately, attackers do not play by the rules. Just because a script is not supposed to be called directly does not mean it is not
possible. The access protections must be configured in the Web server to block direct access to prevent this. Otherwise, if direct access
to such scripts is combined with two other features of PHP, a serious attack is possible.
The first is that PHP originally assigned the value of any input variable supplied in the HTTP request to global variables with the same
name as the field. This made the task of writing a form handler easier for inexperienced programmers. Unfortunately, there was no way
for the script to limit just which fields it expected. Hence a user could specify values for any desired global variable and they would be
created and passed to the script. In this example, the variable $path is not expected to be a form field. The second PHP feature concerns
the behavior of the include command. Not only could local files be included, but if a URL is supplied, the included code can be sourced
from anywhere on the network. Combine all of these elements, and the attack may be implemented using a request similar to that shown
in Figure 11.4b .
This results in the $path variable containing the URL of a file containing the attacker’s PHP code. It also defines another variable, $cmd,
which tells the attacker’s script what command to run. In this example, the extra command simply lists files in the current directory.
However, it could be any command the Web server has the privilege to run. This specific type of attack is known as a PHP remote code
injection or PHP file inclusion vulnerability.
Cross Site Scripting (XSS) Attacks
• Attacks where input provided by one user is subsequently output to another user
• Commonly seen in scripted Web applications
• Vulnerability involves the inclusion of script code in the HTML content
• Script code may need to access data associated with other pages
• Browsers impose security checks and restrict data access to pages originating from the same site
• Exploit assumption that all content from one site is equally trusted and hence is permitted to interact with other content from the site
• XSS reflection vulnerability
• Attacker includes the malicious script content in data supplied to a site
• https://www.youtube.com/watch?v=-LDzOi1dyAA
Validating Input Syntax
• It is necessary to ensure that data conform with any assumptions made about the data before subsequent use
• Input data should be compared against what is wanted
• Alternative is to compare the input data with known dangerous values
• By only accepting known safe data the program is more likely to remain secure
Given that the programmer cannot control the content of input data, it is necessary to ensure that such data conform with any
assumptions made about the data before subsequent use. If the data are textual, these assumptions may be that the data
contain only printable characters, have certain HTML markup, are the name of a person, a userid, an e-mail address, a
filename, and/or a URL.
Alternatively, the data might represent an integer or other numeric value. A program using such input should confirm that it
meets these assumptions. An important principle is that input data should be compared against what is wanted, accepting only
valid input. The alternative is to compare the input data with known dangerous values.
The problem with this approach is that new problems and methods of bypassing existing checks continue to be discovered. By
trying to block known dangerous input data, an attacker using a new encoding may succeed. By only accepting known safe
data, the program is more likely to remain secure.
This type of comparison is commonly done using regular expressions. It may be explicitly coded by the programmer or may be
implicitly included in a supplied input processing routine. Figures 11.2d and 11.3b show examples of these two approaches. A
regular expression is a pattern composed of a sequence of characters that describe allowable input variants. Some characters
in a regular expression are treated literally, and the input compared to them must contain those characters at that point. Other
characters have special meanings, allowing the specification of alternative sets of characters, classes of characters, and
repeated characters. Details of regular expression content and usage vary from language to language. An appropriate reference
should be consulted for the language in use.
If the input data fail the comparison, they could be rejected. In this case a suitable error message should be sent to the source
of the input to allow it to be corrected and reentered. Alternatively, the data may be altered to conform. This generally involves
escaping metacharacters to remove any special interpretation, thus rendering the input safe.
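The allow-list comparison described above can be sketched in C with the POSIX regular expression functions regcomp(), regexec(), and regfree(). The pattern used here (a letter followed by up to 31 letters, spaces, or hyphens) is only an assumed policy for a personal name, and the function names are hypothetical.

```c
#include <regex.h>
#include <stddef.h>

/* Return 1 if input matches the allow-list pattern, 0 otherwise
   (including when the pattern itself fails to compile). */
int matches_pattern(const char *pattern, const char *input)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return 0;
    rc = regexec(&re, input, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}

/* Assumed policy: the whole input must be a letter followed by at most
   31 letters, spaces, or hyphens. Anything else is rejected. */
int is_valid_name(const char *input)
{
    return matches_pattern("^[A-Za-z][A-Za-z -]{0,31}$", input);
}
```

Because the pattern is anchored at both ends and lists only the characters that are wanted, quotes, semicolons, and other metacharacters are rejected without having to be enumerated.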
Alternate Encodings
• May have multiple means of encoding text
• Growing requirement to support users around the globe and to interact with them using their own languages
Unicode used for internationalization
• Uses 16-bit value for characters
• UTF-8 encodes as 1-4 byte sequences
• Many Unicode decoders accept any valid equivalent sequence
Canonicalization
• Transforming input data into a single, standard, minimal representation
• Once this is done the input data can be compared with a single representation of acceptable input values
The issue of multiple, alternative encodings of the input data could occur because the data are encoded in HTML or some
other structured encoding that allows multiple representations of characters.
There is a growing requirement to support users around the globe and to interact with them using their own languages. The
Unicode character set is now widely used for this purpose. It is the native character set used in the Java language, for example.
It is also the native character set used by operating systems such as Windows XP and later. Unicode originally used a 16-bit
value to represent each character; the current standard defines code points of up to 21 bits.
However, many programs, databases, and other computer and communications applications assume an 8-bit character
representation, with the first 128 values corresponding to ASCII. To accommodate this, a Unicode character can be encoded
as a 1- to 4-byte sequence using the UTF-8 encoding. Any specific character is supposed to have a unique encoding.
However, if the strict limits in the specification are ignored, common ASCII characters may have multiple encodings. For example, the
forward slash character “/”, used to separate directories in a UNIX filename, has the hexadecimal value “2F” in both ASCII
and UTF-8. A lenient decoder may also accept longer, non-minimal encodings of the same character, such as “C0 AF” and “E0 80 AF”.
Consider the consequences of multiple encodings when validating input. There is a class of attacks that attempt to supply an
absolute pathname for a file to a script that expects only a simple local filename. The common check to prevent this is to
ensure that the supplied filename does not start with “/” and does not contain any “../” parent directory references.
If this check assumes only the correct, shortest UTF-8 encoding of slash, then an attacker using one of the longer encodings
could avoid this check. This technique was used against Microsoft’s IIS Web server in the late 1990s.
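A filename check that is not fooled by longer encodings of “/” might reject any overlong UTF-8 sequence before looking for path separators. The sketch below is a minimal illustration under that assumption. It flags overlong forms only and is not a full UTF-8 validator; both function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Return 1 if the bytes contain a sequence that can only be an overlong
   (non-shortest) UTF-8 encoding, 0 otherwise. */
int has_overlong_utf8(const unsigned char *s, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (s[i] == 0xC0 || s[i] == 0xC1)
            return 1;   /* 2-byte form of a value that fits in 1 byte */
        if (s[i] == 0xE0 && i + 1 < n && (s[i + 1] & 0xE0) == 0x80)
            return 1;   /* 3-byte form of a value that fits in 2 bytes */
        if (s[i] == 0xF0 && i + 1 < n && (s[i + 1] & 0xF0) == 0x80)
            return 1;   /* 4-byte form of a value that fits in 3 bytes */
    }
    return 0;
}

/* Accept only a simple local filename: non-empty, no overlong encodings,
   no leading "/" and no "../" parent directory reference. */
int is_simple_filename(const char *name)
{
    size_t n = strlen(name);

    if (n == 0 || has_overlong_utf8((const unsigned char *)name, n))
        return 0;
    if (name[0] == '/')
        return 0;
    if (strstr(name, "../") != NULL)
        return 0;
    return 1;
}
```

Rejecting the non-canonical forms first means the later comparisons only ever see the single, shortest representation of each character.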
Input Fuzzing
• Developed by Professor Barton Miller at the University of
Wisconsin Madison in 1989
• Software testing technique that uses randomly generated data as
inputs to a program
• Range of inputs is very large
• Intent is to determine if the program or function correctly handles abnormal
inputs
• Simple, free of assumptions, cheap
• Assists with reliability as well as security
• Can also use templates to generate classes of known problem
inputs
• Disadvantage is that bugs triggered by other forms of input would be missed
• Combination of approaches is needed for reasonably comprehensive
coverage of the inputs
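The random-input idea above can be sketched as a very small harness. The target parse_record is a hypothetical stand-in for the code under test, and the 64-byte input limit is arbitrary. In practice the harness would be pointed at a real parser and run under a debugger or sanitizer so that crashes are caught.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical function under test: accepts a record whose first byte
   equals the number of bytes that follow it. */
int parse_record(const unsigned char *data, size_t len)
{
    if (len == 0)
        return 0;
    return (size_t)data[0] == len - 1;
}

/* Feed `rounds` random inputs of up to 64 bytes to the target and count
   how many were accepted. A crash during the loop is the signal a
   fuzzer is looking for. */
unsigned fuzz(int (*target)(const unsigned char *, size_t),
              unsigned rounds, unsigned seed)
{
    unsigned char buf[64];
    unsigned accepted = 0;
    unsigned r;
    size_t i, len;

    srand(seed);    /* a fixed seed keeps any failure reproducible */
    for (r = 0; r < rounds; r++) {
        len = (size_t)rand() % (sizeof buf + 1);
        for (i = 0; i < len; i++)
            buf[i] = (unsigned char)(rand() & 0xFF);
        if (target(buf, len))
            accepted++;
    }
    return accepted;
}
```

Purely random bytes rarely reach deeply nested parsing code, which is why the slide also mentions templates for generating classes of known problem inputs.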
Vulnerable Compiled Programs
Programs can be vulnerable to PATH variable manipulation
• Must reset to “safe” values
If dynamically linked may be vulnerable to manipulation of LD_LIBRARY_PATH
• Used to locate suitable dynamic library
• Must either statically link privileged programs or prevent use of
this variable
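Resetting these variables at program start might look like the following sketch, which uses the POSIX setenv() and unsetenv() calls. The PATH value shown is an assumption, and the function name sanitize_environment is hypothetical. Suitable directories depend on the target system.

```c
#define _POSIX_C_SOURCE 200112L
#include <stdlib.h>

/* Reset PATH to a known value and drop LD_LIBRARY_PATH before a
   privileged program runs anything else. Returns 0 on success,
   -1 if either call fails. */
int sanitize_environment(void)
{
    /* Assumed safe search path; adjust for the target system. */
    if (setenv("PATH", "/usr/bin:/bin", 1) != 0)
        return -1;
    if (unsetenv("LD_LIBRARY_PATH") != 0)
        return -1;
    return 0;
}
```

Calling this before any external command is run means an attacker-controlled environment can no longer redirect which programs are found through PATH.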
Least privilege
• Run programs with least privilege needed to complete their function
• Security vulnerabilities can result unless care is taken with this interaction
• Such issues are of particular concern when the program being used did not adequately identify all the security
concerns that might arise
• Occurs with the current trend of providing Web interfaces to programs
• Burden falls on the newer programs to identify and manage any security issues that may arise
Detection and handling of exceptions and errors generated by interaction is also important
from a security perspective
Summary
• Buffer overflow attacks can take advantage of the fact that
code and data are stored in the same memory in order to
maliciously modify executing programs
• Programs can have a number of other types of
vulnerabilities, including off-by-one errors, incomplete
mediation, and race conditions
• Malware can have a variety of harmful effects depending
on its characteristics, including resource usage, infection
vector, and payload
• Developers can use a variety of techniques for writing and
testing code for security