REGULATION MECHANISM FOR CACHING IN PORTAL APPLICATIONS

Mehregan Mahdavi
Department of Computer Engineering, Guilan University, Rasht, Iran

John Shepherd, Boualem Benatallah
School of Computer Science and Engineering, Sydney, Australia

Keywords: Web Caching, Collaborative Caching, Portal, Regulation.

Abstract: Web portals are an emerging class of Web-based applications that provide a single interface to access different data or service providers. Caching data from different providers at the portal can increase the performance of the system in terms of throughput and user-perceived delay. The portal and its providers can collaborate in order to determine the candidate caching objects. The providers allocate a caching score to each object sent to the portal. The decision to cache an object is made at the portal, mainly based on these scores. However, the fact that it is up to the providers to calculate such caching scores may lead to inconsistencies between them. The portal should detect these inconsistencies and regulate them in order to achieve a fair and effective caching strategy.

1 INTRODUCTION

The World Wide Web has already changed many aspects of life such as communication, education, business, shopping, and entertainment. It provides a convenient and inexpensive infrastructure for communicating and exchanging data between users and data sources. Users can search for information, products, and services, to use or buy. Web sites of universities, people's home pages, yellow and white pages, on-line stores, flight reservation, hotel booking, and electronic banking are just some examples. These Web sites are referred to as providers. There is a large number of such providers offering the same sort of information, products, and services, so it can be time consuming for users to navigate through them in order to find what they need. Web portals are an emerging class of Web applications that provide a single interface for accessing different providers. Users only need to visit the portal instead of navigating through individual providers; in other words, portals save time and effort for users. Expedia (www.expedia.com) and Amazon (www.amazon.com) are examples of such applications.

Performance, and in particular providing a fast response time, is one of the critical issues that today's Web applications must deal with. Previous research has shown that abandonment of Web sites increases dramatically as response time grows (Zona Research Inc., 2001), resulting in loss of revenue for businesses. Nowadays, many Web sites serve dynamic Web pages by accessing a back-end database and formatting the result into HTML or XML pages. Accessing the database and assembling the final result on the fly is an expensive process and a significant factor in the overall performance of such systems. Server workload or failure and network traffic are other contributing factors to slow response times. With the increasing use of Web-enabled applications there is a need for better performance.

Caching is one of the key techniques that addresses some of the performance issues of such applications. Caching can improve response time; as a result, customer satisfaction is increased and better revenue for the portal and the providers is generated. In addition, network traffic and the workload on the providers' servers are reduced. This in turn improves throughput and scalability and reduces hardware and software costs.
Caching a particular object at the portal depends on the available storage space, response time (QoS) requirements, and the access and update frequency of objects (Kossmann and Franklin, 2000). Available caching systems enable system administrators to specify caching policies. This is done mainly by including or excluding objects or object groups (e.g., objects with a common prefix in the URI) to be cached, determining expiry dates for cached objects or object groups, and so on. Server logs (i.e., access logs and database update logs) are also used to identify objects to be cached (Oracle Corporation, 2001b; IBM Corporation, 2006; Dynamai, 2006; Florescu et al., 2000; Yagoub et al., 2000).

Existing caching approaches have examined caching in a general setting and can provide some benefit to portals. However, portals have distinctive properties to which existing techniques cannot be easily adapted. Most importantly, the portal and the providers are managed by different organizations and administrators. Therefore, the administrator of the portal does not normally have enough information to determine caching policies for individual providers. Moreover, since the portal may be dealing with a (large) number of providers, determining the best objects for caching manually or by processing logs is impractical. On one hand, an administrator cannot identify candidate objects in a dynamic environment where providers may join and leave the portal frequently. On the other hand, keeping and processing access logs at the portal is impractical due to high storage space and processing time requirements. Also, database update logs are not normally accessible to the portal.

The portal and its providers can collaborate in order to provide a more effective caching strategy. A caching score (called cache-worthiness) is associated with each object, determined by the provider of that object. It represents the usefulness of caching this object at the portal; larger values represent more useful objects for caching. The cache-worthiness score is sent by the provider to the portal in response to a request from the portal (Mahdavi et al., 2003; Mahdavi et al., 2004; Mahdavi and Shepherd, 2004). The fact that it is up to the providers to calculate cache-worthiness scores may lead to inconsistencies between them. Although all providers may use the same overall strategy to score their objects, the scores may not be consistent. In the absence of any regulation of cache-worthiness scores, objects from providers who give higher scores have a better chance of being cached, and such providers get more cache space than others. This leads to unfair treatment of providers: those who give lower scores get comparatively less cache space, and their performance improvements are expected to be smaller than those of the providers who score higher. It may also result in less effective cache performance as a whole. To achieve an effective caching strategy, the portal should detect these inconsistencies and regulate the scores given by different providers.
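To make the collaboration concrete, the following is a minimal portal-side sketch, in Python, of score-driven caching. It only illustrates the idea described above, not the system's actual implementation: the PortalCache class, its capacity model, and the eviction rule are assumptions introduced here.

    # Hypothetical sketch: the portal keeps the most cache-worthy objects,
    # using the score that each provider attaches to its response.
    class PortalCache:
        def __init__(self, capacity):
            self.capacity = capacity        # maximum number of cached objects
            self.entries = {}               # (provider, object_id) -> (score, content)

        def consider(self, provider, object_id, content, cache_worthiness):
            """Cache the object if its score beats the weakest cached entry."""
            key = (provider, object_id)
            if len(self.entries) < self.capacity:
                self.entries[key] = (cache_worthiness, content)
                return True
            weakest = min(self.entries, key=lambda k: self.entries[k][0])
            if cache_worthiness > self.entries[weakest][0]:
                del self.entries[weakest]   # evict the least cache-worthy object
                self.entries[key] = (cache_worthiness, content)
                return True
            return False                    # served to the user but not cached

    # Usage: a provider response arrives with its cache-worthiness score attached.
    cache = PortalCache(capacity=1000)
    cache.consider("hotels.example.com", "/rooms?city=Sydney", "<html>...</html>", 0.82)

In the full strategy described in Section 3, the provider-supplied score is additionally adjusted by a per-provider regulation factor before being used.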
The remainder of this paper is organized as follows: Section 2 provides an overview of Web caching. In Section 3 we explain the regulation mechanism used in a collaborative caching environment. Experimental results are presented in Section 4. Finally, some conclusions are presented in Section 5.

2 WEB CACHING BACKGROUND

Web caching has been studied extensively. Browser and proxy caches are the most common caching strategies for (static) Web pages. Caching dynamic Web pages has been studied in (Aberdeen Group, 2001; Chutney Technologies, 2001; Chutney Technologies, 2006; Akamai Technologies Corporate, 2006; Chidlovskii and Borghoff, 2000; Candan et al., 2001; Oracle Corporation, 2006; Oracle Corporation, 2001a; Anton et al., 2002; Challenger et al., 1999; Dynamai, 2006; TimesTen Inc., 2002). Caching Web objects has already created a multi-million dollar business: Content Delivery/Distribution Networks (CDN) (Oracle Corporation, 2006; Vakali and Pallis, 2003). Companies such as Akamai (Akamai Technologies Corporate, 2006) have been providing CDN services for several years. CDN services are designed to deploy edge servers at different geographical locations. Examples of edge servers include Akamai EdgeSuite (Akamai Technologies Corporate, 2006) and IBM WebSphere Edge Server (IBM Corporation, 2006).

Some applications may need a customized caching technique. Application-level caching is normally enabled by providing a cache API, allowing application writers to explicitly manage the cache to add, delete, and modify cached objects (Degenaro et al., 2001; Bortvedt, 2004; Sun Microsystems, 2005; Apache Software Foundation, 2004).

When considering caching techniques, a caching policy is required to determine which objects should be cached (Podlipnig and Boszormenyi, 2003; Balamash and Krunz, 2004; Cao and Irani, 1997; Datta et al., 2002; Aggrawal et al., 1999; Cheng and Kambayashi, 2000a; Young, 1991; Wong and Yeung, 2001). For rapidly changing data we might prefer not to cache at all, because of the space, communication, or computation costs. Products such as Oracle Web Cache (http://www.oracle.com), IBM WebSphere Edge Server (http://www.ibm.com), and Dynamai (http://www.persistence.com) enable system administrators to specify caching policies. Weave (Florescu et al., 2000; Yagoub et al., 2000) is a Web site management system which provides a language for specifying a customized cache management strategy.

The performance of individual cache servers increases when they collaborate with each other by serving each other's misses. The Internet Cache Protocol (ICP) was developed to let a proxy query other proxies in order to find requested Web objects (Li et al., 2001; Paul and Fei, 2000; Cheng and Kambayashi, 2000b; Fan et al., 2000; Rohm et al., 2001; Chandhok, 2000). In Summary Cache, each cache server keeps a summary table of the content of the cache at other servers. When a cache miss occurs, the server probes the table to find the missing object in other servers and then sends a request only to those servers expected to contain it (Fan et al., 2000). In the Cache Array Routing Protocol (CARP), all proxy servers are included in an array membership list and the objects are distributed over the servers using a hash function (Microsoft Corporation, 1997).
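As a rough illustration of such hash-based distribution (a simplification for this paper, not the actual CARP algorithm), a request can be routed to the array member with the highest combined hash of member name and URL:

    import hashlib

    # Simplified sketch of CARP-style routing: each URL is deterministically
    # mapped to one proxy in the array, so the cached object space is
    # partitioned across the members without a shared directory.
    def route(url, proxies):
        def combined_hash(proxy):
            return int(hashlib.md5((proxy + url).encode()).hexdigest(), 16)
        return max(proxies, key=combined_hash)

    members = ["proxy1.example.com", "proxy2.example.com", "proxy3.example.com"]
    print(route("http://provider.example.com/catalog?item=42", members))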
3 REGULATION MECHANISM

In current systems, caching policies are defined and tuned through parameters set by a system administrator, based on the previous history of available resources and of access and update patterns. A more useful infrastructure should provide more powerful means to define and deploy caching policies, preferably with minimal manual intervention. As the owners of the objects, providers are better placed to decide which objects are worth caching. A caching score (called cache-worthiness) can be associated with each object, determined by the provider of that object. The cache-worthiness score is sent by the provider to the portal in response to a request from the portal. Cache-worthiness scores are determined by providers via an off-line process which examines the provider's server logs, calculates the scores, and stores them in a local table. In calculating cache-worthiness, the providers consider parameters such as access frequency, update frequency, computation/construction cost, and delivery cost. More details on the caching strategy are presented in (Mahdavi et al., 2003; Mahdavi et al., 2004; Mahdavi and Shepherd, 2004). A typical cache-worthiness calculation would assign higher scores to objects with higher access frequency, lower update frequency, higher computation/construction cost, and higher delivery cost.
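The exact calculation is not reproduced here; purely as an illustration of the kind of scoring just described, a provider-side function might look as follows, where the weights, the normalization, and the reference cost are assumptions rather than values from this work:

    # Hypothetical provider-side scoring sketch: frequently accessed, rarely
    # updated, and expensive-to-build objects get scores close to 1.
    def cache_worthiness(access_freq, update_freq, construction_cost, delivery_cost,
                         reference_cost=1.0):
        # Fraction of accesses that could be served from a cached copy
        # between two consecutive updates of the object.
        stability = access_freq / (access_freq + update_freq) if access_freq else 0.0
        # Relative cost of building and shipping the object, clipped to [0, 1].
        cost = min((construction_cost + delivery_cost) / reference_cost, 1.0)
        score = 0.5 * stability + 0.5 * cost
        return max(0.0, min(1.0, score))    # cache-worthiness is bounded to [0, 1]

    # Example: an object read 120 times and updated twice in the log window,
    # costing 0.3 s to construct and deliver against a 1 s reference cost.
    print(cache_worthiness(access_freq=120, update_freq=2,
                           construction_cost=0.2, delivery_cost=0.1))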
However, each provider can have its own definition of these scores, based on its own policies and priorities. For example, a provider might choose not to process server logs at all and instead let its system administrator assign zero or non-zero values to objects. Relying on the providers to calculate and assign caching scores may therefore lead to inconsistencies between them. The following factors contribute to such inconsistencies:

• Each provider uses a limited number of log entries to extract the required information, and the available log entries may vary from one provider to another.
• Providers may use other mechanisms to score their objects (they are not required to use the above approach).
• Malicious providers may claim that all of their own objects should be cached, in the hope of getting more cache space.

To achieve a fair and effective caching strategy, the portal should detect these inconsistencies and regulate the scores given by different providers. For this purpose, the portal uses a regulating factor λ(m) for each provider m; it applies this factor to the cache-worthiness scores received from provider m and uses the result in the calculation of the overall caching scores. This factor starts with a neutral value and is adapted dynamically by monitoring the cache behavior, namely by tracing false hits and real hits. A false hit is a cache hit occurring at the portal when the object has already been invalidated. False hits degrade performance and increase the overheads at both the portal and the provider sites without any benefit; these overheads include probing the cache validation table, generating validation request messages, wasting cache space, and probing the cache look-up table. A real hit is a cache hit occurring at the portal when the object is still fresh and can be served from the cache. The performance of the cache can only be judged by real hits. The portal monitors the performance of the cache by tracing real and false hits and dynamically adapts λ(m) for each provider.

For those providers with a higher ratio of real hits, the portal upgrades λ(m) by a small amount δ1, i.e., the new value for λ(m) is obtained by adding a small value to it. Therefore, all the cache-worthiness scores from that provider are treated as being higher than before. Choosing a small increment (close to 0) ensures that the increase is gradual. Recall that we impose an upper bound of 1 on cache-worthiness scores. The new value of λ(m) will be:

λ(m) ← λ(m) + δ1    (1)

For those providers with a higher ratio of false hits, the portal downgrades λ(m) by a small amount δ2, i.e., the new value for λ(m) is obtained by subtracting a small value from it. Therefore, all the cache-worthiness scores from that provider are treated as being lower than before. Choosing a small decrement (close to 0) ensures that the decrease is gradual. Recall that we impose a lower bound of 0 on cache-worthiness scores. The new value of λ(m) will be:

λ(m) ← λ(m) − δ2    (2)

A high false hit ratio for a provider m indicates that the cache space for that particular provider is not well utilized, because the cached objects for that provider are not as worthy as claimed; in other words, the provider has given its objects cache-worthiness scores that are too high. This is resolved by downgrading the scores from that provider and treating them as if they were lower. Conversely, a high real hit ratio for a provider m indicates that the cache performance for this provider is good and that provider m is making good use of the cache space. Upgrading the cache-worthiness scores of provider m results in more cache space being assigned to this provider. This ensures fairness in cache usage based on how the cache is actually utilized by each provider. The fair distribution of cache space among providers also results in better cache performance as a whole; the experimental results confirm this claim.
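A minimal sketch of this adaptation loop follows. The multiplication of scores by λ(m) is as described in Section 4; the per-interval hit counters and the 0.5 threshold used to decide between upgrading and downgrading are assumptions made for illustration.

    # Sketch of the per-provider regulation step, run periodically at the portal.
    def regulated_score(cache_worthiness, lam):
        # The provider's score is multiplied by its regulating factor lambda(m).
        return min(1.0, max(0.0, lam * cache_worthiness))

    def adapt_lambda(lam, real_hits, false_hits, delta1, delta2, threshold=0.5):
        total = real_hits + false_hits
        if total == 0:
            return lam                      # no evidence in this interval
        if real_hits / total > threshold:
            return lam + delta1             # eq. (1): mostly real hits, upgrade
        return lam - delta2                 # eq. (2): mostly false hits, downgrade

    # Each provider starts with a neutral factor and is adapted every interval.
    lam = 1.0
    lam = adapt_lambda(lam, real_hits=37, false_hits=5, delta1=0.01, delta2=0.01)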
4 EXPERIMENTS

In order to evaluate the performance of the collaborative caching strategy, a test-bed has been built. This test-bed enables us to simulate the behavior of a business portal with different numbers of providers, message sizes, response times, update rates, cache sizes, etc., and to examine the behavior of the system under a range of different scenarios. The performance results show that the collaborative caching strategy (i.e., CacheCW) outperforms the other examined caching strategies by at least 22% for throughput, 24% for network bandwidth usage, and 18% for average access time. The examined caching strategies include Least Recently Used (LRU), First In First Out (FIFO), Least Frequently Used (LFU), Size (SIZE), Size-Adjusted LRU (LRU-S), and Size-Adjusted and Popularity-Aware LRU (LRU-SP).

To address the issue of inconsistencies, a regulation factor is assigned to each provider, and every cache-worthiness score from a provider is multiplied by the corresponding factor. Therefore, providers whose scores are low should end up with a high regulation factor and vice versa. The regulation factor changes over time by monitoring the false and real hit ratios. For this purpose, the providers in the simulation are set up in such a way that the first one deliberately overestimates, the second one underestimates, and the third one produces the standard cache-worthiness score; the same pattern applies to subsequent providers. In other words, one third of the providers overestimate, one third underestimate, and the remaining one third use the normal cache-worthiness score. Each provider was initially given a regulation factor of 1, so that its cache-worthiness scores were at first not modified. The false and real hit ratios were then used to downgrade or upgrade the regulation factor. Using the real hit ratio by itself did not produce the desired results, as the performance of the cache for a provider depends both on the real hit ratio and on the cache space occupied by that provider; using the real hit ratio relative to the cache space occupied by each provider for upgrading the regulation factor was the most successful among all the variations used.

Providers were monitored to see whether the regulation factor was moving in such a way as to separate the three groups of providers, so that the underestimating providers consistently had the highest factor, followed by the accurately estimating providers, with the overestimating providers having the lowest regulation factor. Figure 1 shows the changes in the regulation factor for the different groups of providers. One provider from each group is shown in the figure; however, all providers in each group show similar results. The results demonstrate that the regulation factor does separate the providers accordingly.

Figure 1: Regulation Factor. Regulation factor λ(m) versus time (min) for one UnderEst, one NormalEst, and one OverEst provider.

When upgrading or downgrading the regulation factor, we use small values δ1 and δ2 by which λ(m) is increased or decreased. Choosing very small values for δ1 and δ2 means it takes a long time for the system to adjust itself; on the other hand, choosing large values makes the regulation factor fluctuate unnecessarily. Choosing an appropriate value makes the system adjust itself in a reasonable amount of time. For this purpose we have examined different values for δ1 and δ2, defined as follows:

∆i(CW) = CWi+1 − CWi    (3)
δ1 = f1 × ∆(CW), 0 < f1 ≤ 1    (4)
δ2 = f2 × ∆(CW), 0 < f2 ≤ 1    (5)

where CWi,j is the cache-worthiness score of object Oj at provider Pi, and CWi is the average value of the cache-worthiness scores at provider Pi.

Smaller values for f1 and f2 make the adjustment smoother, but the adjustment then takes longer. The experimental results show that any value of f1 and f2 in the range 0 < f1, f2 ≤ 1 generates the expected results; the results reported here were generated with f1 = 0.1 and f2 = 0.1. When ∆(CW) = 0, although this is very unlikely, δ1 and δ2 are zero; in other words, in this special case the regulation factor stays unchanged. However, in the next interval, when ∆(CW) is calculated again, the regulation process resumes. The value of ∆(CW) is calculated using the objects available in the cache. Our experiments show that even using a subset of the cached objects to generate ∆(CW) gives the same results, and an estimate of its value, in case the overhead is an issue, also generates the desired results.
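The following sketch shows one possible reading of equations (3)-(5), in which ∆(CW) is taken to be the average gap between consecutive (sorted) cache-worthiness scores of the cached objects; this interpretation, like the sample scores, is an assumption made only for illustration.

    # Sketch (assumed interpretation of eqs. 3-5): estimate Delta(CW) as the
    # average gap between consecutive cache-worthiness scores of cached
    # objects, then scale it by f1 and f2 to obtain delta1 and delta2.
    def average_gap(scores):
        ordered = sorted(scores)
        gaps = [b - a for a, b in zip(ordered, ordered[1:])]
        return sum(gaps) / len(gaps) if gaps else 0.0

    def deltas(cached_scores, f1=0.1, f2=0.1):
        delta_cw = average_gap(cached_scores)   # 0 means regulation pauses
        return f1 * delta_cw, f2 * delta_cw

    # A subset of the cached objects' scores is enough to estimate Delta(CW).
    delta1, delta2 = deltas([0.2, 0.35, 0.5, 0.8, 0.9])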
According to the experiments, regulation results in an improvement in throughput. The results are shown in Table 1: the throughput after regulation is 305, compared to 278 without regulation, a 10% improvement.

Table 1: Throughput.

              NoReg   Reg
Throughput     278    305

Average access time for individual providers is also improved as a result of better utilization of the cache space, as shown in Table 2. However, those providers that take better advantage of the cache (i.e., those with a higher real hit ratio) show better improvement; in our example, these are the providers that underestimate, shown as UnderEst. Providers that overestimate (i.e., OverEst) see less improvement. In other words, the improvement in average access time for individual providers is in proportion to their utilization of the cache space. Total average access time is also improved as a result of the regulation process and the better utilization of cache space (6.59 compared to 7.26), as shown in Table 3.

Table 2: Average access times for individual providers.

         UnderEst   NormalEst   OverEst
NoReg      7.54       7.21       7.04
Reg        6.63       6.58       6.55

Table 3: Total average access time.

         Average Access Time
NoReg           7.26
Reg             6.59

5 CONCLUSION

In this work, we discussed how a collaborative caching strategy can overcome the limitations of current systems in providing an effective caching strategy for portal applications. We addressed the issue of heterogeneous caching policies across different providers and introduced a mechanism to deal with it. This is done by tracing the performance of the cache and regulating the scores received from different providers. As a result, better performance for individual providers is achieved and the performance of the cache as a whole is improved.

REFERENCES

Aberdeen Group (2001). Cutting the Costs of Personalization With Dynamic Content Caching. An Executive White Paper. http://www.chutneytech.com/tech/aberdeen.cfm.

Aggrawal, C., Wolf, J. L., and Yu, P. S. (1999). Caching on the World Wide Web. IEEE Transactions on Knowledge and Data Engineering (TKDE), 11(1):94–107.

Akamai Technologies Corporate (2006). http://www.akamai.com.

Anton, J., Jacobs, L., Liu, X., Parker, J., Zeng, Z., and Zhong, T. (2002). Web Caching for Database Applications with Oracle Web Cache. In ACM SIGMOD Conference, pages 594–599. Oracle Corporation.

Apache Software Foundation (2004). JCS and JCACHE (JSR-107). http://jakarta.apache.org/turbine/jcs/index.html.

Balamash, A. and Krunz, M. (2004). An Overview of Web Caching Replacement Algorithms. In IEEE Communications Surveys and Tutorials, volume 6, pages 44–56.

Bortvedt, J. (2004). Functional Specification for Object Caching Service for Java (OCS4J), 2.0. http://jcp.org/aboutJava/communityprocess/jsr/cacheFS.pdf.

Candan, K. S., Li, W.-S., Luo, Q., Hsiung, W.-P., and Agrawal, D. (2001). Enabling Dynamic Content Caching for Database-Driven Web Sites. In ACM SIGMOD Conference, pages 532–543.

Cao, P. and Irani, S. (1997). Cost-Aware WWW Proxy Caching Algorithms. In Proceedings of the USENIX Symposium on Internet Technologies and Systems, pages 193–206.

Challenger, J., Iyengar, A., and Dantzig, P. (1999). A Scalable System for Consistently Caching Dynamic Web Data. In IEEE INFOCOM, pages 294–303.

Chandhok, N. (2000). Web Distribution Systems: Caching and Replication. http://www.cis.ohio-state.edu/~jain/cis788-99/web-caching/index.html.

Cheng, K. and Kambayashi, Y. (2000a). LRU-SP: A Size-Adjusted and Popularity-Aware LRU Replacement Algorithm for Web Caching. In IEEE Compsac, pages 48–53.
Cheng, K. and Kambayashi, Y. (2000b). Multicache-based Content Management for Web Caching. In WISE.

Chidlovskii, B. and Borghoff, U. (2000). Semantic caching of Web queries. VLDB Journal, 9(1):2–17.

Chutney Technologies (2001). Dynamic Content Acceleration: A Caching Solution to Enable Scalable Dynamic Web Page Generation. In SIGMOD Conference.

Chutney Technologies (2006). http://www.chutneytech.com.

Datta, A., Dutta, K., Thomas, H. M., VanderMeer, D. E., and Ramamritham, K. (2002). Accelerating Dynamic Web Content Generation. IEEE Internet Computing, 6(5):27–36.

Degenaro, L., Iyengar, A., and Ruvellou, I. (2001). Improving Performance with Application-Level Caching. In International Conference on Advances in Infrastructure for Electronic Business, Science, and Education on the Internet (SSGRR).

Dynamai (2006). http://www.persistence.com.

Fan, L., Cao, P., and Broder, A. (2000). Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol. In IEEE/ACM Transactions on Networking, volume 8.

Florescu, D., Yagoub, K., Valduriez, P., and Issarny, V. (2000). WEAVE: A Data-Intensive Web Site Management System. In The Conference on Extending Database Technology (EDBT).

IBM Corporation (2006). http://www.ibm.com.

Kossmann, D. and Franklin, M. J. (2000). Cache Investment: Integrating Query Optimization and Distributed Data Placement. In ACM TODS.

Li, D., Cao, P., and Dahlin, M. (2001). WCIP: Web Cache Invalidation Protocol. http://www.ietf.org/internet-drafts/draft-danli-wrec-wcip-01.txt.

Mahdavi, M., Benatallah, B., and Rabhi, F. (2003). Caching Dynamic Data for E-Business Applications. In International Conference on Intelligent Information Systems (IIS'03): New Trends in Intelligent Information Processing and Web Mining (IIPWM), pages 459–466.

Mahdavi, M. and Shepherd, J. (2004). Enabling Dynamic Content Caching in Web Portals. In 14th International Workshop on Research Issues on Data Engineering (RIDE'04), pages 129–136.

Mahdavi, M., Shepherd, J., and Benatallah, B. (2004). A Collaborative Approach for Caching Dynamic Data in Portal Applications. In The Fifteenth Australasian Database Conference (ADC'04), pages 181–188.

Microsoft Corporation (1997). Cache Array Routing Protocol and Microsoft Proxy Server 2.0. White Paper. http://www.mcoecn.org/WhitePapers/Mscarp.pdf.

Oracle Corporation (2001a). Oracle9i Application Server: Database Cache. Technical report, Oracle Corporation, http://www.oracle.com.

Oracle Corporation (2001b). Oracle9iAS Web Cache. Technical report, Oracle Corporation, http://www.oracle.com.

Oracle Corporation (2006). http://www.oracle.com.

Paul, S. and Fei, Z. (2000). Distributed Caching with Centralized Control. In 5th International Web Caching and Content Delivery Workshop.

Podlipnig, S. and Boszormenyi, L. (2003). A Survey of Web Cache Replacement Strategies. In ACM Computing Surveys, volume 35, pages 374–398.

Rohm, U., Bohm, K., and Schek, H.-J. (2001). Cache-Aware Query Routing in a Cluster of Databases. In ICDE.

Sun Microsystems (2005). JSRs: Java Specification Requests. http://www.jcp.org/en/jsr/overview.

TimesTen Inc. (2002). Mid-Tier Caching. Technical report, TimesTen Inc., http://www.timesten.com.

Vakali, A. and Pallis, G. (2003). Content Delivery Networks: Status and Trends. IEEE Internet Computing, pages 68–74.

Wong, K. Y. and Yeung, K. H. (2001). Site-Based Approach to Web Cache Design. IEEE Internet Computing, 5(5):28–34.

Yagoub, K., Florescu, D., Valduriez, P., and Issarny, V. (2000). Caching Strategies for Data-Intensive Web Sites. In Proceedings of 26th International Conference on Very Large Data Bases (VLDB), pages 188–199, Cairo, Egypt.

Young, N. E. (1991). The k-Server Dual and Loose Competitiveness for Paging. Algorithmica, 11(6):525–541.

Zona Research Inc. (2001). Zona Research Releases Need for Speed II. http://www.zonaresearch.com/info/press/01may03.htm.