Chapter 1 CGI

A University CEP (Continuing Education Program) Students Payment Management System

Uploaded by

ephitsegaye7878
Copyright
© © All Rights Reserved
CHAPTER 1

SERVER SIDE PROGRAMMING


Static vs Dynamic Pages & Client Side vs Server
Side Scripting

 The most basic type of Web page is a completely static, text-based one,
written entirely in HTML.
 The contents of the HTML file on the server are exactly the same as the
source code of the page on the client.

<html>
<head>
<title>An Average Website</title>
</head>
<body bgcolor="#003399" text="#ffcc33">
<h1>An Average Website</h1>
<p>This is an average website.</p>
</body>
</html>
The above HTML code is static.
Static vs Dynamic Pages

 If the user reloads a static website, they would see the exact same
content every time.
 Its content was written directly by an author, and when the user goes
to the site, that code is downloaded into a browser and interpreted.

 On their own, client-side technologies cannot do anything that requires
connecting to a back end server.
 For example, JavaScript running in the browser cannot by itself assemble a
customized drop-down list on the fly (when a page is requested) from user
preferences stored in a database.
 If a change is needed in the list, the Web developer must go and edit
the page by hand.
 This gap is filled by server-side programming.
Static vs Dynamic Pages …

Client-Side Technology | Main Use | Example Effects
Cascading Style Sheets, Dynamic HTML | Formatting pages: controlling size, color, placement, layout, timing of elements | Overlapping, different colored/sized fonts; layers, exact positioning
Client-side scripting (JavaScript, VBScript) | Event handling: controlling consequences of defined events | Link that changes color on mouseover; mortgage calculator
Java applets | Delivering small standalone applications | Moving logo; crossword puzzle
Flash animations | Animation | Short cartoon film
Static vs Dynamic Pages

 What does dynamic web page mean?
 A basic distinction exists between static and dynamic Web pages.
 But dynamic can mean almost anything beyond plain HTML.
 Web developers use the term to describe both client- and server-side
functions.
 On the client, it can mean
 multimedia presentations,
 scrolling headlines,
 pages that update themselves automatically, or
 elements that appear and disappear.
 On the server, the term generally denotes content assembled on the fly,
at the time the page is requested.
 If you display the current date and time on a page, for
Static vs Dynamic Pages

 In contrast to a static website, a dynamic website is one whose content is
regenerated every time a user visits or reloads the site.
 Server-side web scripting is mostly about connecting Web sites to back end
servers, such as databases.
 This enables the following types of two-way
communication:
 Server to client: Web pages can be assembled from back
end-server output.
 Client to server: Customer-entered information can be acted
upon.
 Common examples of client-to-server interaction are
online forms with some drop-down lists that the script
assembles dynamically on the server.
Static vs Dynamic Pages

 There are two ways to generate dynamic content.
 One method is to use server-side scripting languages and technologies.
 Popular ones are:
 PHP
 Java Servlets
 Active Server Pages (ASP)
 ColdFusion
 The other method is to use CGI (Common Gateway Interface).
What is CGI?
 The Common Gateway Interface (CGI) is a method used by web servers to run
external programs known as CGI scripts.
 This is done most often to generate web content dynamically.
 Whenever a web page queries a database, or a user submits a form, a CGI
script is usually called upon to do the work.
 A plain HTML document that the Web daemon retrieves is static.
 This means it exists in a constant state: a text file that doesn't change.
 A CGI program, on the other hand, is executed in real time (when the
request is sent), so that it can output dynamic information.
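To make the contrast with a static file concrete, here is a minimal C++ sketch of the core of a CGI program whose output changes on every request because it embeds the server's current time. The function names currentTimeText and buildTimePage are illustrative assumptions, not part of any CGI library.

```cpp
#include <ctime>
#include <string>

// Returns the server's current local time as text, e.g. "2024-01-31 14:05:09".
std::string currentTimeText()
{
    std::time_t now = std::time(nullptr);
    char buffer[32];
    std::strftime(buffer, sizeof(buffer), "%Y-%m-%d %H:%M:%S", std::localtime(&now));
    return std::string(buffer);
}

// Builds the complete CGI output: a header line, a blank line, then the HTML body.
std::string buildTimePage(const std::string& timeText)
{
    std::string page = "Content-Type: text/html\r\n\r\n";
    page += "<html><body>\n";
    page += "<p>Server time: " + timeText + "</p>\n";
    page += "</body></html>\n";
    return page;
}
```

A main function would simply write buildTimePage(currentTimeText()) to std::cout; the web server then forwards that output to the browser.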
What is CGI?
 CGI is simply a specification, which defines a standard way for
web servers to run CGI scripts and for those programs to send
their results back to the server.
 The job of the CGI script is to read information that the
browser has sent and to generate some form of valid output.
 Once it has completed its task, the CGI script finishes and
exits.

 CGI is not a language but a protocol.
 It is a simple protocol that can be used to communicate between Web forms
and your program.
 A CGI program can be written in any programming language:
C/C++, Python, Perl, Visual Basic, etc.
 Perl is a very popular language for CGI scripting because of its
unrivalled text-handling abilities, easy scripting, and relative
What is CGI?
 Here is what happens during execution of CGI:
 Client sends a request with a URL plus additional information.
 Web server receives the request.
 Web server identifies the request as a CGI request.
 Web server locates the handler program.
 Web server starts up the handler program. This is heavyweight process
creation.
 Web server feeds the request parameters to the handler. This happens
through stdin or environment variables.
 Handler program executes.
 Output of the handler is sent to the Web server via stdout. Output is
typically a web page.
 Web server returns the output to the requesting web browser.
The Hello world Test
 A CGI program must always send back at least one
header line indicating the data type of the content
(usually text/html).
 The header line should be the first output
statement in the program.
 The web server will typically add a few header lines
of its own like Date, Server, Connection, etc.
 The following program just prints Hello world.
 It is preceded by HTTP headers as required by the CGI
interface.
 Here the header specifies that the data is HTML text (MIME type text/html).
The Hello world Test…
#include <iostream>
using namespace std;

int main()
{
    // The header line and the blank line after it are required.
    cout << "Content-Type: text/html\r\n\r\n";
    cout << "<p>Hello world</p>\n";
    return 0;
}

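 CGI programs can send back an HTTP status line.
 However, the web server will send one if you don't.
 Many servers expect the status as a CGI header instead (Status: 200 OK) and
accept a full status line only from non-parsed-header scripts.
cout << "HTTP/1.1 200 OK\r\n"; // HTTP status line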
The Hello world Test…
Other HTTP header lines are:
Header | Description
Content-type: type | A MIME string defining the format of the file being returned. Example: Content-type: text/html
Expires: Date | The date the information becomes invalid. This should be used by the browser to decide when a page needs to be refreshed. A valid date string should be in the format 01 Jan 1998 12:00:00 GMT.
Location: URL | The URL that should be returned instead of the URL requested. You can use this field to redirect a request to any file.
Last-modified: Date | The date of last modification of the resource.
Content-length: N | The length, in bytes, of the data being returned. The browser uses this value to report the estimated download time for a file.
Set-Cookie: String | Sets the cookie passed through the string.
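As a rough illustration of how these header lines fit together, the following C++ sketch assembles a response with Content-Type and Content-Length headers, and a separate redirect response using Location. The helper names buildResponse and buildRedirect are assumptions made for this example.

```cpp
#include <string>

// Builds a CGI response that tells the server how many bytes of body follow.
std::string buildResponse(const std::string& body)
{
    std::string response = "Content-Type: text/html\r\n";
    response += "Content-Length: " + std::to_string(body.size()) + "\r\n";
    response += "\r\n";   // a blank line ends the header section
    response += body;
    return response;
}

// Builds a body-less CGI response that redirects the browser to another URL.
std::string buildRedirect(const std::string& url)
{
    return "Location: " + url + "\r\n\r\n";
}
```

In both cases the headers come first and a blank line separates them from whatever follows, which is the same rule the Hello world program obeys.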


Getting Data and Other Information

 Much of the most crucial information needed by CGI applications is made
available via environment variables.
 Programs can access this information as they would any environment
variable (e.g. getenv("name") in C++).
Environment variable | Description
GATEWAY_INTERFACE | The revision of the CGI that the server uses.
SERVER_NAME | The server's hostname or IP address.
SERVER_SOFTWARE | The name and version of the server software that is answering the client request.
SERVER_PROTOCOL | The name and revision of the information protocol the request came in with.
SERVER_PORT | The port number of the host on which the server is running.
REQUEST_METHOD | The method with which the information request was issued.
PATH_INFO | Extra path information passed to a CGI program.
PATH_TRANSLATED | The translated version of the path given by the variable PATH_INFO.
SCRIPT_NAME | The virtual path (e.g., /cgi-bin/program.pl) of the script being executed.
DOCUMENT_ROOT | The directory from which Web documents are served.
QUERY_STRING | The query information passed to the program.
REMOTE_HOST | The remote hostname of the user making the request.
REMOTE_ADDR | The remote IP address of the user making the request.
AUTH_TYPE | The authentication method used to validate a user.
REMOTE_USER | The authenticated name of the user.
REMOTE_IDENT | The user making the request. This variable will only be set if the NCSA IdentityCheck flag is enabled, and the client machine supports the RFC 931 identification scheme.
CONTENT_TYPE | The MIME type of the query data, such as "text/html".
CONTENT_LENGTH | The length of the data (in bytes or the number of characters) passed to the CGI program through standard input.
HTTP_FROM | The email address of the user making the request. Most browsers do not support this variable.
HTTP_ACCEPT | A list of the MIME types that the client can accept.
HTTP_USER_AGENT | The browser the client is using to issue the request.
HTTP_REFERER | The URL of the document that the client points to before accessing the CGI program.
#include <cstdlib>
#include <iostream>
using namespace std;

// Names of the environment variables to display.
const char* ENV[ 24 ] = {
    "COMSPEC", "DOCUMENT_ROOT", "GATEWAY_INTERFACE", "HTTP_ACCEPT", "HTTP_ACCEPT_ENCODING",
    "HTTP_ACCEPT_LANGUAGE", "HTTP_CONNECTION", "HTTP_HOST", "HTTP_USER_AGENT", "PATH", "QUERY_STRING",
    "REMOTE_ADDR", "REMOTE_PORT", "REQUEST_METHOD", "REQUEST_URI", "SCRIPT_FILENAME", "SCRIPT_NAME",
    "SERVER_ADDR", "SERVER_ADMIN", "SERVER_NAME", "SERVER_PORT", "SERVER_PROTOCOL", "SERVER_SIGNATURE",
    "SERVER_SOFTWARE" };

int main () {
    cout << "Content-type:text/html\r\n\r\n";
    cout << "<html>\n";
    cout << "<head>\n";
    cout << "<title>CGI Environment Variables</title>\n";
    cout << "</head>\n";
    cout << "<body>\n";
    cout << "<table border = \"0\" cellspacing = \"2\">";
    // One table row per variable: its name and its value (if set).
    for ( int i = 0; i < 24; i++ ) {
        cout << "<tr><td>" << ENV[ i ] << "</td><td>";
        const char *value = getenv( ENV[ i ] );
        if ( value != 0 )
            cout << value;
        else
            cout << "Environment variable does not exist.";
        cout << "</td></tr>\n";
    }
    cout << "</table>\n";
    cout << "</body>\n";
    cout << "</html>\n";
    return 0;
}
Processing a Simple Form

 For forms that use METHOD="GET", the data is passed to the script or
program in an environment variable called QUERY_STRING.
 How a program accesses the value of an environment variable depends on
the scripting or programming language used.
 In the C++ language, you would use the library function
getenv("envvarname") to access the value as a string.
 getenv("envvarname") is declared in the standard header stdlib.h
(cstdlib in C++).
 You might then use various techniques to pick up
Processing a Simple
Form…
 The output that the script or program writes to its "primary output
stream" (such as cout in C++) is handled in a special way.
 Effectively, it is directed so that it gets sent back to the browser.
 Thus, by writing a C++ program that writes an HTML document onto its
standard output, you will make that document appear on the user's screen
as a response to the form submission.
GET vs POST
 The difference between GET and POST is how the information from the form
is passed from the server to the CGI program.
 A GET will provide the user's input to the CGI program as an environment
variable called QUERY_STRING.
 The CGI program would read this environment variable using the C/C++
getenv() function and parse it to get the user's input.

 A GET method will show the input data to the user in the URL area
of the browser, showing a string like:
www.check.com/cgi-bin/test.cgi?name=Tola&sex=male&age=25.
 The GET method is acceptable for small amounts of data.
 It is also the default method when a CGI program is run via a link.
GET vs POST…
 With GET, there is a practical limit on how large a URL can be.
 The HTTP standard itself does not fix a maximum URL length, but browsers
and servers impose their own limits.
 A long URL may still work, but servers are not obliged to accept it.
 In POST, the query string is encoded in the HTTP request body.
 It is not part of the URL.
 As a result, it is not subject to the URL length limit.
 Unlike GET, POST allows much longer form data to be communicated.
 Arguments usually do not appear in server logs.
GET vs POST…
POST / HTTP/1.1
Host: localhost:1888
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.7.12) Gecko/20051010 Firefox/1.0.7 (Ubuntu package 1.0.7)
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Referer: http://springer/~s133ar/cform1.html
Content-Type: application/x-www-form-urlencoded
Content-Length: 25

name=Tola&sex=male&age=25
GET vs POST…
 Your CGI program should inspect the REQUEST_METHOD environment variable
to determine whether the form was submitted with the GET or POST method.
 Then it can take the appropriate action to retrieve the form data.
 The CGI program can get the request method, POST or GET, using getenv()
and the environment variable REQUEST_METHOD.
 Here is how this can be done in C/C++:
char *method;
method = getenv("REQUEST_METHOD");
if (method == NULL) { /* error! */ }
else if (strcmp(method, "GET") == 0) { /* read QUERY_STRING */ }
else if (strcmp(method, "POST") == 0) { /* read standard input */ }
GET vs POST…
 Example GET handler:
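#include <iostream>
#include <cstdlib>
#include <cstring>
using namespace std;

int main()
{
    const char *method, *query = "";   // stays empty unless a GET query string is found
    method = getenv("REQUEST_METHOD");
    if (method == NULL) { /* error! */ }
    else if (strcmp(method, "GET") == 0)
    {
        query = getenv("QUERY_STRING");
        if (query == NULL) query = "";
    }
    cout << "Content-type: text/html\r\n\r\n";
    cout << "<H1>Your query was " << query << " </H1>\n";
    return 0;
}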
GET vs POST…
 A POST will provide the user's input to the CGI program as if it were
typed at the keyboard, using the standard input device, or stdin.
 If POST is used, then an environment variable called CONTENT_LENGTH
indicates how much data is being sent.
 You can read this data into a buffer, by doing something like:
char *method, *query = NULL;
method = getenv("REQUEST_METHOD");
if (method != NULL && strcmp(method, "POST") == 0)
{
    char *lenText = getenv("CONTENT_LENGTH");
    int len = (lenText != NULL) ? atoi(lenText) : 0;
    if (len < 0) len = 0;
    query = new char[len + 1];          // one extra byte for the terminator
    size_t got = fread(query, 1, len, stdin);
    query[got] = '\0';                  // terminate the string
}
cout << "Content-type: text/html\r\n\r\n";
cout << "<H1>Your query was " << (query ? query : "") << " </H1>\n";
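The same idea can be sketched with C++ streams. The helper below, whose name readBody is an assumption for this example, reads at most the announced number of bytes from any input stream, so it can be exercised with a std::istringstream as well as with std::cin.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Reads at most `length` bytes of request body from `in`
// (std::cin in a real CGI program) and returns what was actually read.
std::string readBody(std::istream& in, std::size_t length)
{
    std::string body(length, '\0');
    in.read(&body[0], static_cast<std::streamsize>(length));
    body.resize(static_cast<std::size_t>(in.gcount()));  // drop bytes that never arrived
    return body;
}
```

In a CGI program the length would come from CONTENT_LENGTH, for example readBody(std::cin, len). Reading no more than that amount matters because the server is not required to signal end-of-file after the body.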
Data Parsing
 Now we have the data passed from a form stored in a
string variable and we want to use it.
 However, the data is still in unusable form as it is URL
encoded.
 If you have a form with two input fields, let’s call them
name and email, declared as follows
<INPUT TYPE="text" MAXLENGTH=30 NAME="name">
<INPUT TYPE="text" MAXLENGTH=20 NAME="email">
 Suppose the user types John David into name and then
david@host.domain into email.
 What will then be read in by your program is
name=John+David&email=david%40host.domain
Data Parsing…
 As you can see, the data from the fields is not in a particularly usable
form because it is URL encoded.
 Hence, you have to do some further processing to get the data you need.
 Data String Formatting:
 The string consists of the name of the input (e.g. name) followed by the
value that input takes (e.g. John+David).
 The field name is separated from the data value by an "=".
 One field-and-value pair (e.g. name=John+David) is separated from the
next (e.g. email=david%40host.domain) by an "&".
 Spaces in the input data are replaced by "+".
 Characters other than letters and numbers are replaced by "%xx", where
"xx" is the hex value corresponding to that character, e.g. "@" becomes "%40".
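To see where such a string comes from, here is a minimal C++ sketch of the encoding a browser applies to a single form value. The function name urlEncode is an assumption for this example, and the set of characters passed through unchanged (letters, digits, "-", "_", ".", "*") follows common browser behaviour rather than anything stated in these slides.

```cpp
#include <cctype>
#include <string>

// URL-encodes one form value in the style of
// application/x-www-form-urlencoded data.
std::string urlEncode(const std::string& value)
{
    const char hexDigits[] = "0123456789ABCDEF";
    std::string encoded;
    for (unsigned char c : value) {
        if (std::isalnum(c) || c == '-' || c == '_' || c == '.' || c == '*') {
            encoded += static_cast<char>(c);   // safe characters pass through
        } else if (c == ' ') {
            encoded += '+';                    // a space becomes "+"
        } else {
            encoded += '%';                    // everything else becomes %xx
            encoded += hexDigits[c >> 4];
            encoded += hexDigits[c & 0x0F];
        }
    }
    return encoded;
}
```

With this sketch, urlEncode("John David") gives "John+David" and urlEncode("david@host.domain") gives "david%40host.domain", matching the string read by the program above.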
Data Parsing…
 URLs can only be sent over the Internet using the ASCII character set.
 Since URLs often contain characters outside the ASCII set, or characters
with a special meaning in URLs, the URL has to be converted into a valid
ASCII format.
 URL encoding replaces such characters with a "%" followed by two
hexadecimal digits, e.g. "&" becomes "%26" and "@" becomes "%40".
 URLs cannot contain spaces.
 URL encoding normally replaces a space with a + sign.
 The CGI program then has to "decode" this information in order to access
the form data.
 The encoding scheme is the same for both GET and POST.
Character URL-encoding
€ %80
£ %A3
© %A9
® %AE
À %C0
Á %C1
Â %C2
à %C3
Ä %C4
Å %C5
! %21
" %22
# %23
$ %24
% %25
& %26
' %27
( %28
) %29
Understanding the Decoding
Process
 In order to access the information contained within the form, a decoding
must be applied to the data.
 The algorithm for decoding form data follows:
 Determine the request method (either GET or POST) by checking the
REQUEST_METHOD environment variable.
 If the method is GET, read the query string from QUERY_STRING and/or the
extra path information from PATH_INFO.
 If the method is POST, determine the size of the request using
CONTENT_LENGTH and read that amount of data from the standard input.
 Split the query string on the "&" character, which separates key-value
pairs (the format is key=value&key=value...).
 Decode the hexadecimal and "+" characters in each key-value pair.
 Create a key-value table with the key as the index.
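The last three steps of the algorithm can be sketched in C++ with std::string and std::map. This is one possible arrangement, not the approach used in the listings that follow; the names hexValue, urlDecode and parseQuery are assumptions for this example.

```cpp
#include <cctype>
#include <map>
#include <string>

// Converts one hexadecimal digit to its value, or -1 if it is not a hex digit.
int hexValue(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Decodes "+" and "%xx" sequences in a single key or value.
std::string urlDecode(const std::string& text)
{
    std::string decoded;
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] == '+') {
            decoded += ' ';
        } else if (text[i] == '%' && i + 2 < text.size()
                   && hexValue(text[i + 1]) >= 0 && hexValue(text[i + 2]) >= 0) {
            decoded += static_cast<char>(hexValue(text[i + 1]) * 16 + hexValue(text[i + 2]));
            i += 2;
        } else {
            decoded += text[i];   // ordinary character, or a malformed "%"
        }
    }
    return decoded;
}

// Splits the raw query on "&" and "=", then decodes each key and value.
std::map<std::string, std::string> parseQuery(const std::string& query)
{
    std::map<std::string, std::string> fields;
    std::size_t start = 0;
    while (start <= query.size()) {
        std::size_t end = query.find('&', start);
        if (end == std::string::npos) end = query.size();
        std::string pair = query.substr(start, end - start);
        if (!pair.empty()) {
            std::size_t eq = pair.find('=');
            std::string key = urlDecode(pair.substr(0, eq));
            std::string value = (eq == std::string::npos) ? "" : urlDecode(pair.substr(eq + 1));
            fields[key] = value;   // key-value table indexed by key
        }
        start = end + 1;
    }
    return fields;
}
```

Splitting before decoding matters: a value that contains an encoded "&" (%26) or "=" (%3D) would otherwise be mistaken for a separator.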
Example for removing URL encoding:
// Requires <cstring> and <cctype>.
// Decodes the URL-encoded string in place and returns it. The decoded text is
// never longer than the encoded text, so no separate buffer is needed (returning
// a pointer to a local array, as a temporary buffer would require, is unsafe).
char* changeSpecialCharacters (char *query)
{
    int t = 0;
    int n = strlen(query);
    const char digits[] = "0123456789ABCDEF";
    for(int i = 0; i < n; i++)
    {
        if(query[i] == '+') //if space
            query[t++] = ' ';
        else if(query[i] == '%' && i + 2 < n) //if hexadecimal encoded character is found
        {
            char hex[2], ch = 0;
            hex[0] = toupper(query[++i]); //accept lowercase hex digits too
            hex[1] = toupper(query[++i]);
            for(int j = 0; j < 16; j++)
            {
                if(hex[0] == digits[j]) //convert first hexadecimal digit to num
                    ch = ch + 16 * j;
                if(hex[1] == digits[j]) //convert second hexadecimal digit to num
                    ch = ch + j;
            }
            query[t++] = ch;
        }
        else
            query[t++] = query[i];
    }
    query[t] = '\0';
    return query;
}
#include <iostream>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <cctype>
using namespace std;

char *method, *query;
char str[1000], temp[1000]; // temp holds the decoded query, so it is as large as str
int prevo = -1;
char sepr[100];
char* separate(char []);
char* GET(char *);
char* POST(char*);
void changeSpecialCharacters();

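void getQuery()
{
    static char empty[] = "";
    static char unknown[] = "unknown";
    query = empty;
    method = getenv("REQUEST_METHOD");
    if (method == NULL)              // not started by a web server
        return;
    for (int i = 0; i < strlen(method); i++)
        method[i] = toupper(method[i]);
    if (strcmp(method, "GET") == 0)
    {
        char *qs = getenv("QUERY_STRING");
        if (qs != NULL)
            query = qs;
    }
    else if (strcmp(method, "POST") == 0)
    {
        char *lenText = getenv("CONTENT_LENGTH");
        int len = (lenText != NULL) ? atoi(lenText) : 0;
        if (len < 0) len = 0;
        query = new char[len + 1];   // one extra byte for the terminator
        size_t got = fread(query, 1, len, stdin);
        query[got] = '\0';
    }
    else
        query = unknown;
}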
void changeSpecialCharacters() {
    int t = 0;
    char hex[4], digits[18], ch;
    strcpy(digits, "0123456789ABCDEF");
    int n = strlen(query);
    // Stop early rather than write past the end of temp.
    for(int i = 0; i < n && t < (int)sizeof(temp) - 1; i++)
    {
        if(query[i] == '+')
            temp[t++] = ' ';
        else if(query[i] == '%' && i + 2 < n)
        {
            hex[0] = toupper(query[++i]);   // accept lowercase hex digits too
            hex[1] = toupper(query[++i]);
            hex[2] = '\0';
            ch = 0;
            for(int j = 0; j < strlen(digits); j++) {
                if(hex[0] == digits[j])
                {
                    ch = 16 * j;
                    break;
                }
            }
            for(int j = 0; j < strlen(digits); j++) {
                if(hex[1] == digits[j])
                {
                    ch = ch + j;
                    break;
                }
            }
            temp[t++] = ch;
        }
        else
            temp[t++] = query[i];
    }
    temp[t] = '\0';
    query = (char*)temp;
    cout<<"\n<br>Decoded URL: "<<query;
}
char* GET(char *name)
{
char *value;
int eq = -1;
strcpy(str, "\0");
prevo = -1;
for(int i = 0; i < strlen(query); i++)
{
strcpy(str, "\0");
if(query[i] == '&')
{
for(int j = (prevo + 1), t = 0; j < i; j++, t++)
{
str[t] = query[j];
str[t+1] = '\0';
}
if(strncmp(str, name, strlen(name)) == 0 && str[strlen(name)] == '=') // match the whole field name, not just a prefix
{
value = separate(str);
return value;
}
prevo = i;
}
}
strcpy(str, "\0");
for(int i = (prevo + 1), t = 0; i < strlen(query); i++, t++)
{
str[t] = query[i];
str[t+1] = '\0';
}
if(strncmp(str, name, strlen(name)) == 0 && str[strlen(name)] == '=') // match the whole field name, not just a prefix
{
value = separate(str);
return value;
}
sepr[0] = '\0'; // field not found: return an empty string
return sepr;
}
char* separate(char field[])
{
int u = 0;
char *ret;
strcpy(sepr,"");
for(int t = 0; t < strlen(field); t++)
{
if(field[t] == '=')
{
for(int i = (t + 1); i < strlen(field); i++)
sepr[u++] = field[i];
sepr[u] = '\0';
break;
}
}
ret = sepr;
return ret;
}
int main()
{
    cout<<"Content-type: text/html\r\n\r\n";
    // Check the request method before touching the query.
    if (getenv("REQUEST_METHOD") == NULL)
    {
        cout<<"<p>No posting method identified.</p>";
        return 0;
    }
    getQuery();
    changeSpecialCharacters();
    cout<<"\n<br> First name: "<<GET("first_name");
    cout<<"\n<br> Last name: "<<GET("last_name");
    cout<<"\n<br> Password: "<<GET("password");
    return 0;
}
HTML form for above CGI:
<html>
<head>
<title>CGI Test</title>
<script language="JavaScript">
function validate() {
if(inp.first_name.value=="") {
alert("First name is empty");
return false;
}
if(inp.last_name.value=="") {
alert("Last name is empty");
return false;
}
if(inp.password.value=="") {
alert("Password is empty");
return false;
}
return true;
}
</script>
</head>
<body>
<form name="inp" method="get" action="/cgi-bin/Test.cgi" onSubmit="return validate();">
<span class="style1"><strong>Registration Form</strong></span> <br /> <br />
First name: <input type="text" name="first_name" /> <br /> <br />
Last name: <input type="text" name="last_name" /> <br /> <br />
Password: <input type="password" name="password" /> <br /> <br />
<input type="submit" value="Submit">
</form>
</body>
</html>
Test the CGI
 To test the above CGI, first compile the C++ file.
 This will create an executable file.
 Rename the executable file to Test.cgi and put it in the CGI directory.
 Now you can open the form, fill it in and then submit it to run the CGI
program.
Security
 Since a CGI program is executable, it is basically the equivalent of
letting the world run a program on your system.
 This isn't safe at all; it creates a security risk.
 Therefore, there are some security precautions that need to be taken when
it comes to using CGI programs.
 The first one is that CGI programs need to reside in a special directory,
so that the Web server knows to execute the program rather than just
display it to the browser.
 This directory is usually under direct control of the webmaster,
prohibiting the average user from creating CGI programs.
Security…
 The other is that, when dealing with forms, it is extremely critical to
check the data.
 A malicious user can embed shell metacharacters (characters that have
special meaning to the shell) in the form data.
 This could cause big problems for your system.
 For example, here is a form that asks for user name:
<FORM action="/cgi-bin/finger.cgi"
method="POST">
<input type="text" name="user" size=40>
<input type="submit" value="Get Information">
</form>
Security…
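#include <cstdlib>
#include <string>
using namespace std;
char* GET(char *name); // form-field lookup, such as GET() from the CGI program above

int main()
{
    // UNSAFE: the form value is pasted straight into a shell command.
    string command = string("mkdir ") + GET((char*)"user");
    system(command.c_str());
    return 0;
}
 What would happen if the user enters "John; del *"?
 If this is passed to the command line, it could cause catastrophic damage.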
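One common defence is to accept only values made of known-safe characters before they get anywhere near a shell. The C++ sketch below shows the idea; the function name isSafeName, the allowed character set and the 32-character limit are all assumptions chosen for this example.

```cpp
#include <cctype>
#include <string>

// Returns true only if the name is non-empty, reasonably short, does not start
// with '-' (which a command could read as an option), and consists solely of
// letters, digits, '_' or '-', so it cannot carry shell metacharacters.
bool isSafeName(const std::string& name)
{
    if (name.empty() || name.size() > 32) return false;
    if (name[0] == '-') return false;
    for (unsigned char c : name) {
        if (!std::isalnum(c) && c != '_' && c != '-') return false;
    }
    return true;
}
```

A program would call isSafeName on the form value and refuse to continue when it returns false. Listing what is allowed is generally safer than trying to list every dangerous character.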
Security…
 The false security of HTML forms: hidden input, limited options, and the
POST method
 One way to input constant data from a form, or to allow several sequential
inputs from the same user, is to use the <input type="hidden"> tag.
 You should be aware that anyone can see this information using "View
Source". So, don't hide your secrets there.
 Related to this is the issue of limiting user choices to the options in a
SELECT box.
 This will stop random data from being entered, but unfortunately it is quite
easy to construct a URL that contains a query string with whatever the bad
guy wants.
 For example, say you have a select box that limits the user to a "male" or
"female" parameter.
 http://biolinx.bios.niu.edu/cgi-bin/z012345/your_program.cgi?sex=male
 A modestly clever user could change this to:
 http://biolinx.bios.niu.edu/cgi-bin/z012345/your_program.cgi?sex=monday
 Your carefully chosen options would be subverted.
 Most CGI scripts are designed to work with both POST and GET, so using
POST in the form does not by itself prevent a hand-built GET request.
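Because the browser cannot be trusted to enforce the SELECT options, the CGI program has to check the value again on the server. A minimal C++ sketch, assuming the two options from the example above and a hypothetical function name isAllowedOption:

```cpp
#include <string>

// Returns true only if the submitted value is one of the options the form offered.
bool isAllowedOption(const std::string& value)
{
    const char* allowed[] = { "male", "female" };
    for (const char* option : allowed) {
        if (value == option) return true;
    }
    return false;
}
```

With this check, a request carrying sex=monday is rejected no matter whether it arrived by GET or POST.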
Security…
Scripts that read or write files
 A script that writes a file can be a problem.
 In the simplest case, the contents of that file are completely trashed by
a malicious user.
 A more dangerous case is that a bad guy might write a file that contains
executable code that would cause you problems if you inadvertently
executed it.
 As a good example, if "rm -rf *" gets executed by the shell, all files in
that directory and below will be deleted.
 Be sure that permissions for files to be written are set at 666 (readable
and writable, but never executable)!
 World-readable files (files whose last permission digit is 4, e.g. 744)
might give away information to the bad guys.
 Don't keep important information here.
 One particular source of problems here can be "encrypted" passwords.
 Encryption is a great thing, but we can't prevent an expert from cracking
the encryption.
