Abstract
Performance gains from ever-increasing transistor counts are stalling because software cannot always exploit hardware resources, such as multiple cores, logically and efficiently. Dynamic binary translation (DBT) systems face the same problem. Special-purpose hardware, exposed to a DBT system through dedicated interfaces, can raise its performance, but this approach has a significant limitation: it ties the system to one platform and cannot be widely reused on others. To overcome this drawback, we focus on building a compatible software architecture that achieves higher performance without platform dependence. In this paper, we propose a novel multithreaded architecture for DBT systems that partitions the system into distinct functional modules in order to make full use of multiprocessor resources. The new architecture decouples the working routine of a common DBT system into dynamic translation, optimization, and translated-code execution phases, and assigns them to different threads so that they execute concurrently. Within this architecture, we present several efficient methods for the difficult problems that arise, such as the communication mechanism, the cache layout, and mutual exclusion between threads. Experimental results on SPECint 2000 indicate that the new architecture speeds up a traditional DBT system by about 10.75% on average, with better CPU utilization.
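The three-phase split described above can be illustrated with a toy sketch. It is not the paper's implementation: the abstract does not specify the communication mechanism, cache layout, or locking scheme, so this sketch assumes plain queues between threads, a dictionary as the code cache, and one condition variable for mutual exclusion. The guest program, `make_block`, and all other names are hypothetical stand-ins.

```python
import queue
import threading

# Toy guest program: block address -> (value added to the accumulator, next address).
# None as the next address ends the run. All names here are hypothetical.
GUEST = {0: (1, 1), 1: (2, 2), 2: (3, None)}

code_cache = {}                      # block address -> translated callable
cache_ready = threading.Condition()  # guards code_cache and signals new entries
translate_q = queue.Queue()          # execution thread -> translation thread
optimize_q = queue.Queue()           # translation thread -> optimization thread
STOP = object()                      # sentinel that shuts a worker down


def make_block(delta, nxt, tag):
    """Stand-in for emitted host code: returns (new accumulator, next address, tag)."""
    return lambda acc: (acc + delta, nxt, tag)


def translator():
    """Translate requested blocks, publish them, and hand them to the optimizer."""
    while (addr := translate_q.get()) is not STOP:
        delta, nxt = GUEST[addr]
        with cache_ready:
            code_cache.setdefault(addr, make_block(delta, nxt, "plain"))
            cache_ready.notify_all()
        optimize_q.put(addr)
    optimize_q.put(STOP)


def optimizer():
    """Swap in an 'optimized' version of each block under the same lock."""
    while (addr := optimize_q.get()) is not STOP:
        delta, nxt = GUEST[addr]
        with cache_ready:
            code_cache[addr] = make_block(delta, nxt, "optimized")


def execute(entry):
    """Run translated blocks, blocking on a cache miss until the translator publishes."""
    acc, addr = 0, entry
    while addr is not None:
        with cache_ready:
            if addr not in code_cache:
                translate_q.put(addr)
                cache_ready.wait_for(lambda: addr in code_cache)
            block = code_cache[addr]
        acc, addr, _tag = block(acc)
    return acc


def run(entry=0):
    """Start the translation and optimization threads, execute, then shut down."""
    workers = [threading.Thread(target=translator), threading.Thread(target=optimizer)]
    for w in workers:
        w.start()
    result = execute(entry)
    translate_q.put(STOP)
    for w in workers:
        w.join()
    return result
```

Calling `run()` executes the three guest blocks on the calling thread while translation and optimization proceed on their own threads. A single lock around the whole cache is the simplest choice for a sketch; a real system would likely need finer-grained synchronization to avoid stalling execution.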
Additional information
This work was supported by the National Natural Science Foundation of China under Grant Nos. 60970108 and 60970107, the Science and Technology Commission of Shanghai Municipality under Grant Nos. 09510701600, 10DZ1500200, and 10511500102, IBM SUR Funding, and IBM Research-China JP Funding.
Cite this article
Ma, RH., Guan, HB., Zhu, EZ. et al. Partitioning the Conventional DBT System for Multiprocessors. J. Comput. Sci. Technol. 26, 474–490 (2011). https://doi.org/10.1007/s11390-011-1148-1
DOI: https://doi.org/10.1007/s11390-011-1148-1