Tracking End-Users in Web Databases

Boris Rozenberg, Yaron Gonen, Ehud Gudes, and Nurit Gal-Oz
Deutsche Telekom Laboratories at BGU and Department of Computer Science, Ben-Gurion University, Beer-Sheva 84105, Israel

Erez Shmueli
Deutsche Telekom Laboratories at BGU and Department of Information System Engineering, Ben-Gurion University, Beer-Sheva 84105, Israel

Abstract—When a database is accessed via a web application, users usually receive a pooled connection to the database. From a database point of view, such a connection is always established by the same user (i.e., the web application), and specific data on the end user is not available. As a consequence, users' individual transactions cannot be audited and fine-grained access control cannot be enforced at the database level. In this paper we propose a method and a system which provide the ability to track the end users in web databases. The new method can be applied to legacy web applications without requiring any changes in their existing infrastructure. Furthermore, the new user-tracking ability provides a basis for native database protection mechanisms and intrusion detection systems.

Keywords—DB users tracking; web users tracking; auditing

978-1-4577-0460-4/11/$26.00 ©2011 IEEE

I. INTRODUCTION

In today's web application world, the HTTP protocol is used to connect users to application servers rather than directly to the database. This has important implications for application security. Unfortunately, HTTP, while useful for its original purpose of serving static content, is stateless and in many cases not optimal for today's demands. Stateless means that a new connection is established for every request; the state of individual, recurring connections cannot be supported by the protocol itself. The stateless aspect of HTTP was solved by application servers for the purpose of client-to-application-server communication. The solution utilizes browser cookies and the ability of the application server to create and maintain transparent sessions and state for each client. However, this represents only part of the architecture: the connection from the application server to the database still remains a challenge.

The application server connects to the database on the user's behalf. The way this is done makes a large difference for two critical reasons: security and performance. The ideal solution is to connect the users to the database using a 1:1 mapping model, i.e., every end user has a distinct database account or connection. This is secure because the DBMS knows who is connected and can employ its security mechanisms, specifically access control and auditing. However, web applications typically support many end users, and connecting each user to a private database account or connection could exhaust memory and computing resources on the application server, the database, or both. The common solution is a connection pool: a pool of pre-established database connections that links the application to a database schema. The application opens, and keeps open, several physical database connections. When an incoming request is made, the application grabs a connection from the pool, issues a query, and then returns the connection to the pool. For many applications, the criterion of performance is so important that connection pooling is the only viable solution. Additionally, web applications usually connect all end users to the same database account, where this account has the union of the privileges of all users. In this model, the DBMS cannot distinguish between SQL statements of different users. This implies the following security limitations:

Auditing: The user's identity is not known, and their individual actions are untraceable as well as unregulated.

Access Control: The principle of minimal privilege is violated. It is impossible to authorize the web application user with appropriate privileges at the database level, since all application users have access to the same data. This means that roles cannot be defined, and therefore fine-grained access control to the database cannot be enforced. Ensuring that only the right privileges are available to the user is left entirely to the application.

Intrusion Detection: Not only does the database not identify the user who accesses it, it is also impossible to follow the SQL statements of a single user. Attackers can exploit this flaw to access sensitive data or use unauthorized functions. There exist attacks, such as SQL injection [26] or business logic violations [2, 3], which can be seen only at the session level (composed of multiple transactions) and thus cannot be detected at the database level.

User tracking, which means associating each user with their corresponding database SQL statements, therefore becomes an essential task for solving the above problems. However, in the web environment the user is almost never directly connected to the database: the user is connected to the application, and the application is connected to the database.

Not much academic literature exists on the user-tracking problem. A proposal to solve the problem of users' identification was presented in [2], but the proposed approach, based on parameterized views, requires changes in both the web application and the database, thus making it impractical for legacy systems. In terms of commercial products, a few methods for passing users' identity from the web application to the database exist in the latest Oracle versions, but again, they require changes in legacy systems. Also, several third-party vendors, such as Imperva [24], suggest a solution to this problem; however, there is no published technical information on how they do it, nor an evaluation of their performance.

In this paper, we propose a new method and system that provide a way to link each SQL statement in the database to the end user that performed it (via the web application), thus making the web application's access to databases more traceable. Furthermore, the new user-tracking ability provides a basis for fine-grained access control.

The rest of the paper is structured as follows: in Section 2 we outline the related work; in Section 3 we present the overall system architecture as well as a detailed description of the developed algorithms; in Section 4 we describe our experimental evaluation process; and we conclude in Section 5 with directions for future work.

II. RELATED WORK

Most DBMS authorization mechanisms (usually implemented using Roles, Grant/Revoke operations and Views [10, 11]) rely heavily on the identity of the user accessing the database. When lacking this information, these authorization mechanisms cannot be applied, and intrusions may become easier. Several papers suggest methods to detect attacks on web databases, such as SQL injection [1, 3, 26]. However, all these papers assume that the problem of user tracking is solved, which is the major goal of the present paper. Much effort has been invested in developing methods for detecting intrusions into databases, and various IDSs for databases that also rely on user identity have been proposed in the literature [3-9].

As was explained in [3], intrusions are usually identified within a logical session. A few methods for session-boundary identification that rely on a known user identity have been proposed [14-17]. The session identification task involves identifying session boundaries from the log of database events. The importance of session identification is derived from the following motivations: it enables grouping all user activities within one session and can be useful for performance optimization by utilizing the prediction of future queries; such a grouping basically defines a role that can be used to derive the actual access rights of a user; and a deviation from such a grouping may indicate an attempt at intrusion, so it can be used for intrusion detection.

A prerequisite of identifying logical sessions is identifying users. However, as explained previously, the user identity is unknown at the database level in the web environment. In spite of the importance of this problem, to the best of our knowledge, only a small number of published works have tried to address it. The most relevant work to ours, by Roichman and Gudes [2], proposed a parameterized-views method that allows transferring the web user identity to the database, thus enabling the use of the database's built-in access control mechanisms. The drawback of that approach is that it requires changes in both the web application and the database, thus making it impractical for legacy systems.

The latest versions of Oracle support several ways to pass the user identity through the web application to the database. For users known to the database, proxy authentication in OCI, for example, can be used. OCI enables a web application to set up, within a single database connection, a number of "lightweight" user sessions, each of which uniquely identifies a connected user. The application can switch between these sessions as required to process transactions on behalf of users. But in order to support fine-grained authorization, Oracle must maintain all the application users. Since these users are created and dropped dynamically by the web application, and the number of such users may be tremendous, this solution is not practical for a typical web application. Furthermore, in order to use this method, a legacy application must be modified.

Several third-party products that address the problem of user identification at the database level have been proposed. Imperva [24], for example, has developed multiple methods for identifying end users regardless of how these users access the database. According to the vendor, Imperva's SecureSphere product supports architectures that use connection pooling to aggregate multiple user sessions into a single database connection. SecureSphere tracks end users without requiring any changes in existing databases and web applications. However, the algorithms used by SecureSphere were not published, and no information is supplied regarding their quality or their efficiency.

III. OUR SYSTEM

We propose a system that provides the ability to monitor and track queries from the end user through the Web application to the database. The general conceptual view is depicted in Fig. 1.

Figure 1. The UT system conceptual view.

As can be seen in Fig. 1, the UT (Users Tracking) system receives as input the users-to-WebServer and WebServer-to-DB network traffic, or the WebServer and DB log files (if available). The output of the system is a mapping of users to the database SQL statements that they performed.

A. The System Architecture

Fig. 2 provides a detailed view of the system. The UT system contains two layers, namely the Network Management layer and the Users Tracking layer. Following is a brief description of each layer.

Figure 2. The UT system architecture.

Figure 3. Users Tracking – main stages.

The Network Management layer includes modules which handle the acquisition and management of network packets. It handles the reconstruction of TCP sessions and the parsing of the HTTP and DB protocols. The output of this layer is a full session between the Web Server and a user (HTTP requests) and a full session between the Web Server and the database (SQL statements). The implementation of this layer is a technical issue and is not discussed further in this paper.
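To make the output of the Network Management layer concrete, the following is a minimal sketch of how one reconstructed HTTP record could be parsed into the fields the later stages need. The record layout (timestamp, session id, request URL) and all field names are assumptions for illustration only; the paper does not specify a concrete format.

```python
# Sketch of parsing one reconstructed HTTP log record, assuming a
# hypothetical "timestamp session-id URL" layout (not specified in the paper).
from urllib.parse import urlparse, parse_qsl

def parse_http_record(line):
    """Split one reconstructed HTTP record into timestamp, session, script, params."""
    timestamp, session_id, url = line.split(" ", 2)
    parsed = urlparse(url)
    script = parsed.path.rsplit("/", 1)[-1]     # e.g. 'act.php'
    params = dict(parse_qsl(parsed.query))      # e.g. {'lname': 'Smit', ...}
    return {"ts": timestamp, "sid": session_id,
            "script": script, "params": params}

record = parse_http_record(
    "2011-03-01T10:00:00 S17 http://my.dns/act.php?lname=Smit&fname=Jon")
print(record["script"], record["params"])
```

Records of this shape (script name plus parameter dictionary, tied to a session id) are exactly what the action-discovery and user-tracking stages below consume.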
The Users Tracking layer includes modules which handle the identification of the Web Application's users and the mapping of users' actions to SQL statements in the database. This layer is responsible for discovering the actions available in the Web Application, discovering the SQL statements possible in the system, and matching the discovered actions with SQL statements. The layer can be operated in two modes: Learning and Real-Time. In the Learning mode, the output of this layer is the set of actions existing in a certain Web Application, the set of SQL statements usable in the database, and, for each action, its mapping to a set of SQL statements in the database. In the Real-Time mode, the output of this layer is two log files: the first contains a log of users' actions in terms of Web Application actions; the second contains a log of SQL statements with a username attached.

Two more modules are identified in Fig. 2: the Logical Sessions Identification module and the Role Mining module. Once user identification is complete, one can divide the physical sessions into logical sessions, where each logical session corresponds to a set of SQL statements that are all performed by a single role. The identification of logical sessions has two important uses: intrusion detection (when one logical session contains statements performed by different roles), and role mining (which reflects the actual privileges and actual roles of users in the system). The design of these two modules, which rely heavily on the Users Tracking module, will be reported in future papers.

As mentioned above, the system can be operated in one of two modes: Learning and Real-Time. In the Learning mode the system automatically learns the login request/response of the protected Web Application, the Web Application functionality, and the mapping function from users' requests to the actual transactions in the database. In the Real-Time mode the system creates the log of transactions with the actual user name attached to each transaction.

Fig. 3 describes the main stages of the Users Tracking task, and the following subsections give a detailed description of the algorithms developed for each stage. We assume that the only information available is from log files, and that no additional semantic information is provided.

B. Web Application Actions Discovery

Each Web application accomplishes its functionality through an ordered sequence of actions that users can perform. For example, in a banking Web Application, in order to transfer money to another account, the user should request a transfer form (the first action), fill in the form (the second action), and submit the form (the third action). Since users can only perform the operations available at the application interface, and no other operation is available to the end users, the main question we tackle in this section is which actions are supported by an arbitrary Web Application. Typically, these actions are applied via HTTP GET or POST requests from the user's browser. Let us look at the following HTTP request:

GET http://my.dns/act.php?lname=Smit&fname=Jon

It contains the PHP script name (other script engines such as ASP may be considered) followed by the list of parameters for the script. So, if we assume that there is a different script for each of the actions in the Web application, we can harvest all the scripts and compile a list of all the actions. However, let us look at the following example:

GET http://my.dns/act.php?lname=Smit&fname=Jon&op=add
GET http://my.dns/act.php?lname=Smit&fname=Jon&op=remove

We face a problem: the script name is the same for both requests, but the action itself resides in a parameter, whose name is of course arbitrary. As can be guessed, the script act.php adds the input name to a list if the parameter op equals "add", and removes the input name from the list if the parameter op equals "remove".

Our approach is therefore to define an action as the concatenation of the script name and some relevant parameters of the script. The key question is how to identify the relevant parameters. We propose a method comprising two phases: a training phase and a pruning phase. We assume that we have a Web Server log file with all HTTP requests submitted by users to a certain Web Application during a specified period of time (it is the task of the Network Management layer of our system to create such a log file). First we split this log into two parts, one for training and one for pruning. Next we run the following algorithm (Fig. 4):

1. For each PHP script rk in the Training Data (1 <= k <= NumOfPHPScripts)
2.   Let Srk = {p1, p2, ..., pn} be the set of candidate parameters to comprise the action for rk
3.   For each parameter pi in Srk (1 <= i <= n)
4.     Obtain the set Spi of different values of pi
5. For each PHP script rk in the Pruning Data
6.   For each parameter pi in rk
7.     if pi.value is not in Spi
8.       Srk = Srk - {pi}

Figure 4. Web application actions discovery - Algorithm 1.

In line 2 of the algorithm, for each PHP script in the log file we generate the set Srk of the script's parameters. Initially, each parameter in Srk is a candidate to comprise the action for this script (together with the script name and other parameters). In lines 3-4, for each parameter in Srk we create the set of different values that this parameter receives over the training part of the log file. Next, in lines 5-7, we test on the pruning part of the log whether some parameter receives a previously unseen value. If it does, this parameter cannot be considered a candidate to comprise the action, and it is removed from the set of candidates in line 8.

The assumption behind this approach is that in a typical Web application, all the actions should appear during a certain time interval (a domain-dependent parameter). For example, if we have a bank Web application log covering a one-month period, it is reasonable to assume that all possible actions will appear during the first week of the log. Thus, in our approach, we could split the one-month log into two parts: one week for training and three weeks for pruning. In the training phase we consider every parameter of every script to be a candidate to comprise the action. If a parameter indeed comprises the action, it should not receive unseen values in the rest of the log; if it does, we remove it from the list of candidates. Finally, for each PHP script, all remaining parameters (in the set Srk) together with the script name comprise the action.

Following is an example of the algorithm's operation. Suppose we have the following log file (only one script, Trade.php, is considered):

1. GET http://my.dns/Trade.php?act=request&type=buy&tradeId=21&cnt=10
2. GET http://my.dns/Trade.php?tradeId=22&type=buy&act=request&cnt=5
3. GET http://my.dns/Trade.php?act=request&tradeId=23&cnt=7&type=buy
4. GET http://my.dns/Trade.php?cnt=8&act=request&type=sell&tradeId=24
5. GET http://my.dns/Trade.php?act=request&type=buy&tradeId=28&cnt=1
6. GET http://my.dns/Trade.php?tradeId=29&type=buy&act=request&cnt=2

We use the first four lines for training and the last two lines for pruning (for the purpose of this example only; in general, the percentage of data used for training is application dependent and should be provided by the system administrator). In line 2 of the algorithm we create the set STrade.php of all the parameters of the script:

STrade.php = {act, type, tradeId, cnt}

In lines 3-4 we create the set of different values for each parameter in the set STrade.php as follows:

Sact = {request}
Stype = {buy, sell}
StradeId = {21, 22, 23, 24}
Scnt = {10, 5, 7, 8}

Next, in lines 5-8, for each parameter we test whether it receives new values over the pruning part of the log, by comparing the actual values with the set of previously obtained values. We find that 28 is not in the StradeId set and 1 is not in the Scnt set.
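The training and pruning phases of Algorithm 1 can be sketched as follows. This is a minimal illustration, assuming each log entry has already been parsed into a (script, parameters) pair; the data below reproduces the Trade.php example.

```python
# Minimal sketch of Algorithm 1 (web application actions discovery).
from collections import defaultdict

def discover_action_params(training, pruning):
    """Return, per script, the parameters whose values stay within the value
    set seen during training; these remaining parameters comprise the action."""
    seen = defaultdict(lambda: defaultdict(set))    # script -> param -> values
    for script, params in training:                 # training phase (lines 1-4)
        for p, v in params.items():
            seen[script][p].add(v)
    candidates = {s: set(ps) for s, ps in seen.items()}
    for script, params in pruning:                  # pruning phase (lines 5-8)
        for p, v in params.items():
            if v not in seen[script].get(p, set()):
                # unseen value: this parameter cannot comprise the action
                candidates.setdefault(script, set()).discard(p)
    return candidates

training = [
    ("Trade.php", {"act": "request", "type": "buy",  "tradeId": "21", "cnt": "10"}),
    ("Trade.php", {"act": "request", "type": "buy",  "tradeId": "22", "cnt": "5"}),
    ("Trade.php", {"act": "request", "type": "buy",  "tradeId": "23", "cnt": "7"}),
    ("Trade.php", {"act": "request", "type": "sell", "tradeId": "24", "cnt": "8"}),
]
pruning = [
    ("Trade.php", {"act": "request", "type": "buy", "tradeId": "28", "cnt": "1"}),
    ("Trade.php", {"act": "request", "type": "buy", "tradeId": "29", "cnt": "2"}),
]
print(discover_action_params(training, pruning)["Trade.php"])  # the set {act, type}
```

Running the sketch on the example log prunes tradeId and cnt (both receive unseen values, 28 and 1) and leaves {act, type} as the action-defining parameters.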
At this point we reduce the set STrade.php to {act, type}. This means that the action is defined as the concatenation of the script name Trade.php, the value of the parameter act, and the value of the parameter type. Thus we have two actions in this example:

1. Trade.php + act=request + type=buy
2. Trade.php + act=request + type=sell

C. User Name Identification

The goal of this task is to detect when a user logs in to a Web Application and what his/her ID (username) is. Since the login parameter's name varies and is application dependent, we can choose one of the following methods to identify the username:

Keywords - maintain a list of words that might indicate a user logging in.

Manual - every Web application to be analyzed must describe its login process, so that the system can identify the username field in HTTP requests.

D. Web Users Tracking

After a user login event has been detected, the goal of the Users Tracking task is to attach the user ID to all the actions the user has performed with the Web application. This can be achieved by associating the user ID with the HTTP session ID, which is typically generated and sent from a web server to a client to identify the current interaction session. The client usually stores and sends the session ID as an HTTP cookie and/or sends it as a parameter in GET or POST queries [25]. This approach allows us to track all user activity in terms of Web application actions. Note that after this stage we have a log of web actions, with each action associated with the user ID that performed it.

E. SQL Statements Identification

Our goal in this step is to find (and fingerprint) the set of SQL statements which are possible in the specific Web application, given a log of SQL statements for some time period. We employ an approach presented in [1] for the purpose of SQL injection detection: we learn the structure of each SQL statement possible in the system and fingerprint that structure. There is a large number of such possible statements, but they usually differ only in their parameter values. If we replace the parameters in each statement with variables, we get a high-level representation of the SQL statement called the fingerprint. In order to do so, we first use an SQL lexical analyzer [19] to create the parse tree. Since the SQL statements in a real database log do not contain "?" as parameters, but the values themselves, we need a schema description in order to differentiate schema objects from parameters. Using such a schema description, given by the user or extracted from the database, we automatically extract the parameters from each statement and create its fingerprint. Fig. 5 gives an example of an SQL statement and its corresponding fingerprint.

Figure 5. Select statement fingerprint – example.

F. Mapping from Web Actions to SQL Statements

This is a major step in the methodology. The main goal is to attach the username to each SQL query performed by the Web application on the database. So far we are able to attach the username to each action in terms of the Web application. What we need in order to complete the task is the mapping function from each Web application action to the actual SQL statements. Let us first define the problem formally. Given a set of actions supported by the Web application A = {a1, a2, ..., ak}, a set of SQL statements supported by the database Q = {q1, q2, ..., ql}, a sequence of actions performed by the Web application during some specified period of time SA = <a1', a2', ..., an'>, ai' in A, and a sequence of corresponding SQL statements Sq = <q1, q2, ..., qm>, qi in Q, we should produce a function f : A -> Q* that maps every action to a sequence of SQL statements. For example, f(a5) = <q1, q10, q2> means that the action a5 is mapped to the sequence <q1, q10, q2>.

The main technique we use is based on frequent sequence mining. First, using the timestamps present in the logs, we identify the SQL statements which can be mapped to each action. Note that this set is not unique, since several actions may be performed concurrently by the Web application. Then we find the subset of mapped SQL statements by using a frequent sequence mining technique. The intuition is that a frequent sequence corresponds to the actual set of mapped statements. Algorithm 2 (Fig. 6) is used to construct this mapping:

1. For each ai in A
2.   DSQai = {}
3. For each action ai' in SA
4.   Tstart = start time of ai'; Tstop = stop time of ai'
5.   Let SQai be the sequence of all queries from the DB log with Tstart <= QtimeStamp <= Tstop
6.   Add SQai to DSQai
7. For each DSQai
8.   f(ai) = maxSeq[CAMLS(DSQai)]

Figure 6. Map Function Creation - Algorithm 2.

For each action ai' in SA we know its start and stop time (from 3.2). (Note that it can be executed in parallel with other actions.) We can separate out all the queries executed on the database during this period of time; denote this sequence of queries as SQai (line 5 of the algorithm). The sequence SQai contains the queries executed by ai' (and maybe some other queries). In line 6, for each action ai' we add SQai to a dataset DSQai of all such sequences (one sequence for each instance of the action ai in SA), and for each such dataset we find the longest frequent sequence (not necessarily consecutive) using the CAMLS algorithm [20] (line 8). Each such sequence represents the set of SQL queries that the corresponding action ai is mapped to.

We chose CAMLS among the existing algorithms for frequent sequence mining. Given a database and a minimal support value, CAMLS efficiently generates all sequences that repeat (i.e., are frequent) in the database with a support equal to or greater than the minimal support value.
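The core of the mapping step can be illustrated with a toy stand-in for the mining algorithm: with minSupport = 100%, the longest frequent (not necessarily consecutive) sequence is a longest common subsequence of all observed query sequences for the action. CAMLS [20] computes this efficiently; the brute-force sketch below, with invented query identifiers, is only to show the idea.

```python
# Toy stand-in for the frequent-sequence step at 100% support: find the
# longest subsequence common to every observed instance of an action.
from itertools import combinations

def is_subsequence(sub, seq):
    """True if sub occurs in seq in order (not necessarily consecutively)."""
    it = iter(seq)
    return all(q in it for q in sub)

def longest_common_query_sequence(instances):
    shortest = min(instances, key=len)
    for k in range(len(shortest), 0, -1):        # try longest candidates first
        for cand in combinations(shortest, k):   # order-preserving subsequences
            if all(is_subsequence(cand, inst) for inst in instances):
                return list(cand)
    return []

# Two observed instances of the same action; q7 and q9 are noise queries
# interleaved by concurrently executing actions:
dsq = [["q1", "q7", "q10", "q2"],
       ["q1", "q10", "q9", "q2"]]
print(longest_common_query_sequence(dsq))  # ['q1', 'q10', 'q2']
```

The noise queries drop out because they do not recur across instances, leaving exactly the query sequence the action maps to.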
In our implementation we enumerate frequent sequences (with very high values of support, such as 100%) for each DSQai separately and take the longest sequence as the output. This approach works well when each web action has exactly one mapping to a sequence of transactions, but some web actions have more than one mapping. For example, in our domain (see the evaluation section), if a user buys his/her first security, the system performs an insert (tID = 14) into the Holdings table; in the case where the user already holds securities, the system performs an update (tID = 13) on the Holdings table. In both cases the user performs the same action (aID = 6) on the Web application, with the same parameters. This is depicted by the log shown in Fig. 7.

Figure 7. Multiple Mapping - Example.

The mapping problem in this case is much more complex, and we cannot use high values of support anymore. We tried one of the bipartite graph matching algorithms [27], but the results were disappointing. Therefore we propose Algorithm 3 (Fig. 8) for solving the mapping problem in this case:

1. Set minSupport to 100%
2. Run Algorithm 2 with the current minSupport value
3. Tag the (still untagged) queries in the DBLog with the action ID, using the map function created by Algorithm 2 /* if there are multiple choices, compare the parameters of the action with the parameters of the SQL queries; if the parameters are the same, or there are no parameters, don't tag */
4. Select all untagged queries in the DBLog and create the set Ps of all possible web actions that the untagged queries can belong to (using the timestamps)
5. minSupport = minSupport - delta
6. while (Ps is not empty && minSupport >= minSupportThreshold)
7.   execute steps 2-5 with the new minSupport value and the action set A = Ps (only the problematic actions)
8. for each action a in the WebLog
9.   select its start and stop time
10.  if there is no other action in the interval between Tstart and Tstop
11.    add all transactions tagged as belonging to a to f(a)

Figure 8. Map Function Creation - Algorithm 3.

We run Algorithm 2 iteratively, each time reducing the minSupport value (starting from 100%) by a predefined small fraction (delta), and we try to tag the DBLog with the action ID using the map function obtained so far (line 3). If all the transactions have been tagged, or minSupportThreshold has been reached, we stop and create the final map function from the tagged DBLog in lines 8-11 of the algorithm.

G. DB Users Tracking

This is the last step of the algorithm, and it is mainly needed in the run-time phase. Given the mapping function f : A -> Q* from the previous section, a tagged sequence of actions S'A = <a1^u1, a2^u2, ..., an^uo>, ai^uj in A (every action is tagged with a user ID), and a sequence of SQL queries Sq = <q1, q2, ..., qm>, qi in Q, we should produce a tagged sequence of SQL queries S'q = <q1^u1, q2^u2, ..., qm^uo>, qi^uj in Q. Note that the first query in the sequence Sq belongs to the first action in the sequence S'A. We can therefore eliminate (or mark) all the queries in Sq that belong to the first action in S'A (using f). We are left with S'A minus its first action and Sq minus all the queries belonging to that action, and we can continue in the same way recursively. The marking of the queries might be tricky: for example, what happens when two or more actions contain the same queries on the database? In this case, in order to know which query to eliminate, we need to examine the parameters of the query and compare them to the action's parameters in the Web application (see Fig. 7).

In Fig. 9 two examples are shown. In example (a) a single user (u1) executes the action a1, with f(a1) = <t1, t10, t2>; in example (b), two users (u1, u2) execute the same action a1 in parallel, with f(a1) = <t1, t10, t2>. The two actions a1 can match the two transactions t1 in two different ways; the exact match is determined by checking the equality of the action parameters and the SQL statement parameters.

Figure 9. DB Users Tracking - Example.

IV. EVALUATION

In order to evaluate our system we need a comprehensive environment with thousands of users working in it. Since such an environment was not available to us, we decided to simulate it in a real-life scenario based on a benchmark which is commonly used to evaluate DBMS performance. Our simulation was divided into two phases. During the first phase we evaluated the Network Management layer only, and during the second phase we evaluated the Users Tracking layer separately, on simulated input. Recall that the goal of the Network Management layer is to reconstruct a full session between the Web Server and a user (HTTP requests) and a full session between the Web Server and the database (SQL statements) from the network traffic. We ran this module on several Web applications and recorded its output to log files. These logs were tested manually for correctness. From this point on we assume that this layer works correctly, and we move to the second phase.

The second phase simulates a comprehensive environment with thousands of users and outputs both the web server log and the database log. Our log generator simulates the operation of a web application and of a database server, and therefore outputs two logs accordingly. The log generator is built of two parts: the standard TPC-E emulator, and a Java emulator developed by us. TPC Benchmark™ E (TPC-E) [18] is an On-Line Transaction Processing (OLTP) workload which is commonly used to evaluate the performance of various database systems. It is a mixture of intensive transactions that simulate the activities found in complex OLTP application environments.
The database schema, data population, transactions, and implementation rules have been designed to be broadly representative of modern OLTP systems. The TPC-E emulator, which the log generator is based upon, models the activity of a brokerage firm that manages customer accounts, executes customer trade orders, and is responsible for the interactions of customers with financial markets. It is composed of four different roles: customers, brokers, newsmen and market. We omit here the database design and the transaction flow of the business model. TPC-E generates all the events in the system and passes them to our emulator, which creates a web log record and a series of SQL statements associated with each web action according to Table 1. These statements are logged in DB log records.

TABLE I. MAPPING FROM WEB ACTIONS TO SQL QUERIES.

Role      Web action                 SQL queries
customer  trade request              select on security table (also join with code tables);
                                     insert to trade table with status 'request'
broker    submit trade               update trade status to 'submitted'
broker    show trade requests        select on accounts only this broker handles;
                                     select on trade table for requests (also join with code tables)
market    show submitted requests    select for submitted or pending trans'
market    handle trade               select if holding exists;
                                     insert/update/delete holdings;
                                     check for pending condition (select on security table);
                                     insert/update/delete holdings or nothing
newsman   insert news                select on company table;
                                     insert to news_items table
all       read news                  select on news_items for titles;
                                     select on news_items and company tables
all       read single news item      select on news_items for title and item body;
                                     select on news_items and company tables
all       query security details     select on security table

We have generated 10 triples of logs: <Web application log, DB log, tagged DB log> (the tagged DB log is identical to the DB log, but also contains the user name attached to each query) with the following parameters:

MaxNumOfUsers: 500-1000 (from the set of 4 roles)
MaxNumOfWebActions: 16500-17500 (from the set of 16 different actions)
MaxNumberOfSQLQueries: 19000-21600 (from the set of 14 different transactions)
MaxNumOfPhysicalSessions: 2000-2500
MaxNumOfConcurrentUsers: 1-10

Note that in our evaluation the web actions and map functions are known in advance. So, in order to evaluate the web action identification and map function creation algorithms, we ran the corresponding modules of the system and compared the results with the expected results. In order to evaluate the users tracking algorithm, we ran our User Tracking module on the pairs <Web application log, DB log> and compared its output with the tagged DB log. The results of this evaluation were very encouraging. In all the runs where each web action appeared at least twice in the training set, the output was identical to the expected one. In the cases where some web action appeared only once, or did not appear at all in the training set, the corresponding SQL statements were not tagged with a user name (because a map function for these actions was not learned).

V. CONCLUSION AND FUTURE WORK

In this paper we presented algorithms and a prototype system that provide users tracking functionality for web databases. The system enables tracking of all activities from the end user, through the Web application, to the database, without requiring any changes in existing databases and web applications.

Users' tracking is only a first step towards two important functionalities we plan to add to our system. The first is logical sessions' identification [14-17] and the second is actual users' privileges/roles mining [21-23]. For both of them, users identification is a prerequisite.
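The comparison against the tagged DB log used in the evaluation above amounts to measuring how many DB-log statements receive the correct user name. A minimal sketch, under the simplifying assumption that the tracker's output and the ground-truth log are line-aligned (class and method names are hypothetical):

```java
import java.util.List;

// Illustrative sketch (not the paper's code) of the evaluation step:
// compare the User Tracking module's output against the tagged DB log.
public class TrackingEvaluator {

    // Returns the fraction of DB-log records whose assigned user name
    // matches the ground truth; records the tracker left untagged
    // (e.g. for web actions unseen during training) count as misses.
    public static double accuracy(List<String> trackerOutput,
                                  List<String> taggedDbLog) {
        if (taggedDbLog.isEmpty()) return 1.0;
        int correct = 0;
        for (int i = 0; i < taggedDbLog.size(); i++) {
            String predicted =
                i < trackerOutput.size() ? trackerOutput.get(i) : null;
            if (taggedDbLog.get(i).equals(predicted)) correct++;
        }
        return (double) correct / taggedDbLog.size();
    }
}
```

Under this measure, the runs reported above correspond to an accuracy of 1.0 whenever every web action appeared at least twice in the training set.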
REFERENCES

[1] Buehrer, T., Weide, B.W., Sivilotti, P.A.G., 2005. Using Parse Tree Validation to Prevent SQL Injection Attacks. In Proceedings of the 5th International Workshop on Software Engineering and Middleware, Portugal.
[2] Roichman, A., Gudes, E., 2007. Fine-grained Access Control to Web Databases. In Proceedings of the 12th SACMAT Symposium.
[3] Roichman, A., Gudes, E., 2008. DIWeDa - Detecting Intrusions in Web Databases. In Proceedings of the 22nd Annual IFIP Conference, U.K.
[4] Halfond, W., Orso, A., 2006. Preventing SQL Injection Attacks Using AMNESIA. In Proceedings of the 28th International Conference on Software Engineering, China.
[5] Low, W.L., Lee, S.Y., Teoh, P., 2002. DIDAFIT: Detecting Intrusions in Databases Through Fingerprinting. In Proceedings of the 4th International Conference on Enterprise Information Systems.
[6] Bertino, E., Terzi, E., Kamra, A., Vakali, A., 2005. Intrusion Detection in RBAC-administered Databases. In Proceedings of the 21st Annual Computer Security Applications Conference, USA.
[7] Chung, C., Gertz, M., Levitt, K., 1999. A misuse detection system for database systems. In Proceedings of the IFIP TC11 WG11.5 Third Working Conference.
[8] Hu, Y., Panda, B., 2004. A Data Mining Approach for Database Intrusion Detection. In Proceedings of the ACM Symposium on Applied Computing, Cyprus.
[9] Srivastava, A., Reddy, S.R., 2005. Intertransaction Data Dependency for Intrusion Detection in Database Systems. Part of the Information and System Security course, School of Information Technology, IIT Kharagpur, India.
[10] Bertino, E., Samarati, P., Jajodia, S., 1997. An Extended Authorization Model for Relational Databases. IEEE Transactions on Knowledge and Data Engineering, Volume 9, Issue 1, Pages 85-101.
[11] Griffiths, P.P., Wade, B.W., 1976. An Authorization Mechanism for a Relational Database System. ACM Transactions on Database Systems.
[12] Gulutzan, P., Pelzer, T., 1999. SQL-99 Complete, Really: An Example-Based Reference Manual of the New Standard. R&D Books, Miller Freeman, Inc.
[13] Ramakrishnan, R., Gehrke, J., 2001. Database Management Systems, Second Edition, Chapter 17.1, Introduction to Database Security.
[14] Gupta, K.K., Nath, B., Ramamohanarao, K., 2008. User Session Modeling for Effective Application Intrusion Detection. SEC 2008: 269-284.
[15] He, D., Goker, A., 2000. Detecting session boundaries from Web user logs. In 22nd Annual Colloquium on IR Research, IRSG 2000, Cambridge, UK, 57-66.
[16] Yao, Q., An, A., Huang, X., 2005. Finding and Analyzing Database User Sessions. DASFAA 2005: 851-862.
[17] Yao, Q., An, A., Huang, X., 2006. Applying language modeling to session identification from database trace logs. Knowl. Inf. Syst. 10(4): 473-504.
[18] TPC-E Benchmark Web Site. http://www.tpc.org/tpce/
[19] http://zql.sourceforge.net/
[20] Gonen, Y., Gal-Oz, N., Yahalom, R., Gudes, E., 2010. CAMLS: A Constraint-based Apriori Algorithm for Mining Long Sequences. DASFAA 2010: 63-77.
[21] Molloy, I., Chen, H., Li, T., Wang, Q., Li, N., Bertino, E., Calo, S., Lobo, J., 2008. Mining roles with semantic meanings. In Proc. ACM Symposium on Access Control Models and Technologies (SACMAT).
[22] Molloy, I., Ninghui, L., Tiancheng, L., Ziqing, M., Wang, Q., Lobo, J., 2009. Evaluating role mining algorithms. In Proc. ACM Symposium on Access Control Models and Technologies (SACMAT).
[23] Ni, Q., Lobo, J., Calo, S.B., Rohatgi, P., Bertino, E., 2009. In Proc. ACM Symposium on Access Control Models and Technologies (SACMAT).
[24] Imperva Web Site. http://www.imperva.com.
[25] http://en.wikipedia.org/wiki/Session_(computer_science)
[26] Kamra, A., Terzi, E., Bertino, E., 2008. Detecting anomalous access patterns in relational databases. VLDB J. 17(5): 1063-1077.
[27] Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C., 2009. Introduction to Algorithms. The MIT Press.