research-article
Open access

A Bimodal Scheduler for Coarse-Grained Reconfigurable Arrays

Published: 06 June 2016

Abstract

Compilers for Coarse-Grained Reconfigurable Array (CGRA) architectures suffer from long compilation times and code quality levels far below the theoretical upper bounds. This article presents a new scheduler, called the Bimodal Modulo Scheduler (BMS), to map inner loops onto (heterogeneous) CGRAs of the Architecture for Dynamically Reconfigurable Embedded Systems (ADRES) family. BMS significantly outperforms existing schedulers for similar architectures in terms of generated code quality and compilation time. This is achieved by combining new schemes for backtracking with extended and adapted forms of priority functions and cost functions, as described in the article. BMS is evaluated by mapping multimedia and software-defined radio benchmarks onto tuned ADRES instances.

    Published In

    ACM Transactions on Architecture and Code Optimization, Volume 13, Issue 2
    June 2016
    200 pages
    ISSN:1544-3566
    EISSN:1544-3973
    DOI:10.1145/2952301
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 06 June 2016
    Accepted: 01 February 2016
    Revised: 01 December 2015
    Received: 01 June 2015
    Published in TACO Volume 13, Issue 2

    Author Tags

    1. Modulo scheduling
    2. cost functions
    3. placement and routing

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Funding Sources

    • Agency for Innovation by Science and Technology in Flanders, Belgium

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months): 54
    • Downloads (Last 6 weeks): 7
    Reflects downloads up to 06 Oct 2024

    Cited By

    • (2022) PathSeeker: A Fast Mapping Algorithm for CGRAs. 2022 Design, Automation & Test in Europe Conference & Exhibition (DATE), 268-273. DOI: 10.23919/DATE54114.2022.9774520. Online publication date: 14-Mar-2022
    • (2022) CaSMap. Proceedings of the 49th Annual International Symposium on Computer Architecture, 259-273. DOI: 10.1145/3470496.3527426. Online publication date: 18-Jun-2022
    • (2021) Similarity-Aware Architecture/Compiler Co-Designed Context-Reduction Framework for Modulo-Scheduled CGRA. Electronics 10:18 (2210). DOI: 10.3390/electronics10182210. Online publication date: 9-Sep-2021
    • (2020) Towards Higher Performance and Robust Compilation for CGRA Modulo Scheduling. IEEE Transactions on Parallel and Distributed Systems 31:9, 2201-2219. DOI: 10.1109/TPDS.2020.2989149. Online publication date: 1-Sep-2020
    • (2018) URECA: Unified register file for CGRAs. 2018 Design, Automation & Test in Europe Conference & Exhibition (DATE), 1081-1086. DOI: 10.23919/DATE.2018.8342172. Online publication date: Mar-2018
    • (2018) RAMP. Proceedings of the 55th Annual Design Automation Conference, 1-6. DOI: 10.1145/3195970.3196101. Online publication date: 24-Jun-2018
    • (2018) Improving Energy Efficiency of Coarse-Grain Reconfigurable Arrays Through Modulo Schedule Compression/Decompression. ACM Transactions on Architecture and Code Optimization 15:1, 1-26. DOI: 10.1145/3162018. Online publication date: 22-Mar-2018
    • (2018) RAMP: Resource-Aware Mapping for CGRAs. 2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC), 1-6. DOI: 10.1109/DAC.2018.8465892. Online publication date: Jun-2018
    • (2018) Coarse-Grained Reconfigurable Array Architectures. Handbook of Signal Processing Systems, 427-472. DOI: 10.1007/978-3-319-91734-4_12. Online publication date: 14-Oct-2018
    • (2017) HEAL-WEAR: An Ultra-Low Power Heterogeneous System for Bio-Signal Analysis. IEEE Transactions on Circuits and Systems I: Regular Papers 64:9, 2448-2461. DOI: 10.1109/TCSI.2017.2701499. Online publication date: Sep-2017
