Cortex-A7 Configuration and Signoff Guide
Revision: r0p5
Change history
Proprietary Notice
Words and logos marked with ® or ™ are registered trademarks or trademarks of ARM® in the EU and other countries,
except as otherwise stated below in this proprietary notice. Other brands and names mentioned herein may be the
trademarks of their respective owners.
Neither the whole nor any part of the information contained in, or the product described in, this document may be
adapted or reproduced in any material form except with the prior written permission of the copyright holder.
The product described in this document is subject to continuous developments and improvements. All particulars of the
product and its use contained in this document are given by ARM in good faith. However, all warranties implied or
expressed, including but not limited to implied warranties of merchantability, or fitness for purpose, are excluded.
This document is intended only to assist the reader in the use of the product. ARM shall not be liable for any loss or
damage arising from the use of any information in this document, or any error or omission in such information, or any
incorrect use of the product.
Where the term ARM is used it means “ARM or any of its subsidiaries as appropriate”.
Confidentiality Status
This document is Confidential. This document may only be used and distributed in accordance with the terms of the
agreement entered into by ARM and the party that ARM delivered this document to.
Product Status
Web Address
http://www.arm.com
Preface
About this book ................................................................................................................ vi
Feedback .......................................................................................................................... x
Chapter 1 Introduction
1.1 About implementation ................................................................................................... 1-2
1.2 Implementation resources ............................................................................................ 1-3
1.3 Implementation controls and constraints ...................................................................... 1-4
1.4 Implementation inputs ................................................................................................. 1-16
1.5 Implementation flow .................................................................................................... 1-17
1.6 Implementation outputs .............................................................................................. 1-19
1.7 Implementation reference data ................................................................................... 1-20
ARM DII 0256F Copyright © 2011-2013 ARM. All rights reserved. iii
ID041213 Confidential
Contents
Chapter 9 Sign-off
9.1 About sign-off ................................................................................................................ 9-2
9.2 Obligations for sign-off .................................................................................................. 9-3
9.3 Requirements for sign-off ............................................................................................. 9-4
9.4 Steps for sign-off ........................................................................................................... 9-5
9.5 Completion of sign-off ................................................................................................... 9-6
Appendix A Revisions
This preface introduces the Cortex-A7 MPCore Configuration and Sign-off Guide. It contains
the following sections:
• About this book on page vi.
• Feedback on page x.
Note
Throughout this document, implementation_<technology> is a directory name that identifies the
process you are using to implement the Cortex-A7 MPCore processor. For
example, implementation_<technology> might be implementation_tsmc_cln32lp.
Implementation obligations
This book is designed to help you implement an ARM product. The extent to which the
deliverables can be modified or disclosed is governed by the contract between ARM and the
Licensee. There might be validation requirements which, if applicable, are detailed in the
contract between ARM and the Licensee and which, if present, must be complied with prior to
the distribution of any devices incorporating the technology described in this document.
Reproduction of this document is only permitted in accordance with the licenses granted to the
Licensee.
ARM assumes no liability for your overall system design and performance. Verification
procedures defined by ARM are only intended to verify the correct implementation of the
technology licensed by ARM, and are not intended to test the functionality or performance of
the overall system. You or the Licensee are responsible for performing system level tests.
You are responsible for applications that are used in conjunction with the ARM technology
described in this document, and to minimize risks, adequate design and operating safeguards
must be provided for by you. Publishing information by ARM in this book of information
regarding third party products or services is not an express or implied approval or endorsement
of the use thereof.
The rnpn identifier indicates the revision status of the product described in this book, where:
rn Identifies the major revision of the product.
pn Identifies the minor revision or modification status of the product.
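As an illustration of the rnpn scheme, the following minimal sketch splits an identifier such as r0p5 into its major and minor parts. The function name parse_revision is hypothetical and is not part of any ARM deliverable.

```python
import re


def parse_revision(identifier):
    """Split an rnpn identifier into (major, minor) integers."""
    match = re.fullmatch(r"r(\d+)p(\d+)", identifier)
    if match is None:
        raise ValueError(f"not an rnpn identifier: {identifier!r}")
    # rn is the major revision, pn is the minor revision or modification status.
    return int(match.group(1)), int(match.group(2))


print(parse_revision("r0p5"))  # (0, 5)
```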
Intended audience
This manual is written for experienced hardware engineers who might or might not have
experience of ARM products, but who have experience of writing Verilog and of performing
synthesis, and who want to implement a Cortex-A7 MPCore processor in a System-on-Chip
(SoC) design.
Chapter 1 Introduction
Read this for a description of the Cortex-A7 MPCore processor design platforms
and tools, including the supported design flow, directory structure, and design
hierarchy.
Chapter 9 Sign-off
Read this for a description of the ARM verification criteria, and how to sign off
your design.
Appendix A Revisions
Read this for a description of technical changes in this document.
Glossary
The ARM glossary is a list of terms used in ARM documentation, together with definitions for
those terms. The ARM glossary does not contain terms that are industry standard unless the
ARM meaning differs from the generally accepted meaning.
Conventions
Typographical conventions
Style                    Purpose
bold                     Highlights interface elements, such as menu names. Denotes signal names. Also used for
                         terms in descriptive lists, where appropriate.
monospace                Denotes text that you can enter at the keyboard, such as commands, file and program
                         names, and source code.
monospace (underlined)   Denotes a permitted abbreviation for a command or option. You can enter the underlined
                         text instead of the full command or option name.
monospace italic         Denotes arguments to monospace text where the argument is to be replaced by a specific
                         value.
monospace bold           Denotes language keywords when used outside example code.
< and >                  Encloses replaceable terms for assembler syntax where they appear in code or code
                         fragments. For example:
                         MRC p15, 0 <Rd>, <CRn>, <CRm>, <Opcode_2>
SMALL CAPITALS           Used in body text for a few terms that have specific technical meanings, that are
                         defined in the ARM glossary. For example, IMPLEMENTATION DEFINED, IMPLEMENTATION
                         SPECIFIC, UNKNOWN, and UNPREDICTABLE.
Signals
Signal level The level of an asserted signal depends on whether the signal is
active-HIGH or active-LOW. Asserted means:
• HIGH for active-HIGH signals.
• LOW for active-LOW signals.
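The signal-level convention above can be expressed as a small check. This is only a sketch; the function name is_asserted and the string values "HIGH" and "LOW" are illustrative assumptions.

```python
def is_asserted(level, active_low=False):
    """Return True if a signal observed at `level` is asserted.

    level is "HIGH" or "LOW". An active-LOW signal is asserted when LOW,
    an active-HIGH signal is asserted when HIGH.
    """
    if level not in ("HIGH", "LOW"):
        raise ValueError(f"unknown level: {level!r}")
    asserted_level = "LOW" if active_low else "HIGH"
    return level == asserted_level


print(is_asserted("LOW", active_low=True))   # True
print(is_asserted("LOW", active_low=False))  # False
```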
Additional reading
ARM publications
This book contains information that is specific to this product. See the following documents for
other relevant information:
• CoreSight Embedded Trace Macrocell™ v3.5 Architecture Specification (ARM IHI 0014).
• AMBA AXI™ and ACE® Protocol Specification, AXI3™, AXI4™, and AXI4-Lite™, ACE and
ACE-Lite™ (ARM IHI 0022).
Feedback
ARM welcomes feedback on this product and its documentation.
If you have any comments or suggestions about this product, contact your supplier and give:
• An explanation with as much information as you can provide. Include symptoms and
diagnostic procedures if appropriate.
Feedback on content
This chapter introduces the supported design flow and structure of the deliverables for the
Cortex-A7 MPCore processor. It contains the following sections:
• About implementation on page 1-2.
• Implementation resources on page 1-3.
• Implementation controls and constraints on page 1-4.
• Implementation inputs on page 1-16.
• Implementation flow on page 1-17.
• Implementation outputs on page 1-19.
• Implementation reference data on page 1-20.
[Figure: implementation inputs, outputs, and resources. Inputs: RTL, Models. Outputs: Verified design, GDSII, Models, Reports and logs. Resources: EDA tools, Testbenches, Test vectors, Scripts, Documentation.]
Purpose    Vendor    Tool    <simulator> tool specifier [a]
[a] <simulator> is used as part of a command to run your chosen simulator throughout this guide.
Note
• The Cortex-A7 MPCore Release Note describes any special requirements that might affect
the flow, such as details of any special tool requirements that enable optional flows within
the implementation.
This section describes the processor signals you must connect and the timing constraints on the
signals. It contains the following sections:
• Clock signals.
• Reset signals on page 1-5.
• AMBA4 master interface on page 1-5.
• Debug interfaces on page 1-7.
• Trace interfaces on page 1-9.
• Interrupts on page 1-11.
• Scan test and MBIST signals on page 1-12.
• Standby signals on page 1-12.
• Performance monitoring signals on page 1-13.
• Configuration pins on page 1-13.
The timing constraints for signals are classified according to the percentage of the clock period
that is available for external logic:
• For inputs this is the delay between the last register and the input port.
• For outputs this is the delay between the output port and the first register.
Note
Actual clock frequencies and input and output timing constraints vary according to application
requirements and the silicon process technologies used. The maximum operating clock
frequencies change according to the constraints and the process technology you use.
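As a worked illustration of the percentage classification described above, the following sketch converts a constraint percentage into the time available for external logic. The 2.0 ns clock period is a hypothetical value chosen only for the example, and external_delay_budget is not a name used by the deliverables.

```python
def external_delay_budget(clock_period_ns, percent):
    """Return the time, in ns, available for logic outside the processor.

    percent is the share of the clock period that the constraint leaves
    for external logic on an input or output path.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return clock_period_ns * percent / 100.0


# Hypothetical 2.0 ns clock period with a constraint value of 40.
print(external_delay_budget(2.0, 40))  # 0.8
```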
Clock signals
The Cortex-A7 MPCore processor includes a system clock, CLKIN, which drives the logic in
the processor. Table 1-2 shows the clock signals.
Signal     Direction    Timing constraint (%)
CLKIN      Input        -
ACLKENM    Input        40
Reset signals
nCORERESET[3:0] Input 40
nDBGRESET[3:0] Input 40
nL2RESET Input 40
nMBISTRESET Input 40
L1RSTDISABLE[3:0] Input 40
L2RSTDISABLE Input 40
nSOCDBGRESETa Input 40
nETMRESET[3:0]a Input 40
ARREADYM Input 40
ARVALIDM Output 50
ARADDRM[39:0] Output 60
ARLENM[7:0] Output 60
ARSIZEM[2:0] Output 60
ARBURSTM[1:0] Output 60
ARLOCKM Output 60
ARCACHEM[3:0] Output 60
ARPROTM[2:0] Output 60
ARIDM[5:0] Output 60
ARSNOOPM[3:0] Output 60
ARDOMAINM[1:0] Output 60
ARBARM[1:0] Output 60
RVALIDM Input 50
RLASTM Input 60
RDATAM[127:0] Input 60
RRESPM[3:0] Input 60
RIDM[5:0] Input 60
RREADYM Output 60
AWREADYM Input 40
AWVALIDM Output 60
AWADDRM[39:0] Output 60
AWLENM[7:0] Output 60
AWSIZEM[2:0] Output 60
AWBURSTM[1:0] Output 60
AWLOCKM Output 60
AWCACHEM[3:0] Output 60
AWPROTM[2:0] Output 60
AWIDM[4:0] Output 60
AWSNOOPM[2:0] Output 60
AWDOMAINM[1:0] Output 60
AWBARM[1:0] Output 60
WREADYM Input 50
WVALIDM Output 60
WLASTM Output 60
WDATAM[127:0] Output 60
WSTRBM[15:0] Output 60
WIDM[4:0] Output 60
BVALIDM Input 50
BRESPM[1:0] Input 60
BIDM[4:0] Input 60
BREADYM Output 60
ACREADYM Output 60
ACVALIDM Input 40
ACADDRM[39:0] Input 60
ACPROTM[2:0] Input 60
ACSNOOPM[3:0] Input 60
CRREADYM Input 40
CRVALIDM Output 50
CRRESPM[4:0] Output 60
CDREADYM Input 40
CDVALIDM Output 50
CDDATAM[127:0] Output 60
CDLASTM Output 60
RACKM Output 60
WACKM Output 60
BROADCASTINNER Input 40
BROADCASTOUTER Input 40
BROADCASTCACHEMAINT Input 40
SYSBARDISABLE Input 40
nAXIERRIRQ Output 60
Debug interfaces
Authentication interface
DBGEN[3:0] Input 60
SPIDEN[3:0] Input 50
NIDEN[3:0] Input 50
SPNIDEN[3:0] Input 60
APB interface
PCLKENDBG Input 40
PSELDBG Input 60
PADDRDBG[16:2]a Input 60
PADDRDBG31 Input 60
PWRITEDBG Input 60
PRDATADBG[31:0] Output 60
PWDATADBG[31:0] Input 60
PENABLEDBG Input 60
PREADYDBG Output 60
PSLVERRDBG Output 60
COMMRX[3:0] Output 40
COMMTX[3:0] Output 40
DBGACK[3:0] Output 40
DBGNOPWRDWN[3:0] Output 40
DBGRESTART[3:0] Input 50
DBGRESTARTED[3:0] Output 40
DBGROMADDR[39:12] Input 50
DBGROMADDRV Input 50
DBGSELFADDR[39:17]a Input 50
DBGSELFADDRV Input 50
DBGOSUNLOCKCATCH[3:0]b Input 50
DBGHALTREQ[3:0]b Input 50
DBGLOCKSET[3:0]b Input 50
DBGHOLDRST[3:0]b Input 50
DBGSWENABLE[3:0] Input 50
DBGTRIGGER[3:0] Output 40
EDBGRQ[3:0] Input 50
APBACTIVEb Output 50
DBGPWRUPREQ[3:0]c Output 40
DBGPNOPWRDWN[3:0]c Output 40
DBGPWRDUP[3:0]c Input 50
Trace interfaces
Your design implements a set of trace interface signals for each of the cores included in the
Cortex-A7 MPCore processor. This section describes both the CORTEXA7INTEGRATION level and
CORTEXA7 level trace interface signals.
ATCLKEN Input 60
ATIDMx[6:0] Output 60
AFREADYMx Output 60
AFVALIDMx Input 60
ATBYTESMx[2:0] Output 60
ATDATAMx[63:0] Output 60
ATREADYMx Input 60
ATVALIDMx Output 60
CTIASICCTLx[7:0] Output 60
CTICHINACK[3:0] Output 60
CTIEXTTRIG[3:0] Output 60
CTICHOUT[3:0] Output 60
ETMASICCTLx[7:0] Output 50
ETMEN[3:0] Output 50
ETMEXTOUTx[1:0] Output 50
ETMFIFOPEEKx[7:0] Output 60
ETMPWRUP[3:0] Output 50
ETMPWRUPREQ[3:0] Output 50
ETMSTANDBYWFX[3:0] Output 60
nCTIIRQ[3:0] Output 60
MAXEXTIN[2:0] Input 60
MAXEXTOUT[1:0] Input 60
PMUEVENTx[29:0] Output 30
CISBYPASS Input 50
CIHSBYPASS[3:0] Input 60
CTICHIN[3:0] Input 50
CTICHOUTACK[3:0] Input 60
CTIEXTTRIGACK[3:0] Input 60
TSCLKCHANGE Input 60
SYNCREQ Input 60
Table 1-10 on page 1-11 shows the CORTEXA7 level trace interface signals.
ETMICTLx[19:0] Output 60
ETMIAx[31:0] Output 60
ETMDCTLx[10:0] Output 60
ETMDAx[31:0] Output 60
ETMDDx[63:0] Output 60
ETMCIDx[31:0] Output 60
ETMWFXPENDINGx Output 60
ETMPWRUPx Input 60
ETMEXTOUTx[1:0] Input 60
ETMVMIDx[7:0] Output 60
Interrupts
Table 1-11 shows the interrupt signals. There are nVFIQ[3:0], nVIRQ[3:0], nIRQ[3:0],
nFIQ[3:0], nIRQOUT[3:0], and nFIQOUT[3:0] signals for each processor in the
multiprocessor device.
nFIQ[3:0] Input 50
nIRQ[3:0] Input 50
nVFIQ[3:0] Input 50
nVIRQ[3:0] Input 50
IRQS[n:0]a Input 60
nFIQOUT[3:0]b Output 60
nIRQOUT[3:0]b Output 60
nCNTPNSIRQ[3:0] Output 40
nCNTPSIRQ[3:0] Output 40
nCNTVIRQ[3:0] Output 40
nCNTHPIRQ[3:0] Output 40
CNTVALUEB[63:0] Input 40
TSVALUEB[63:0]a Input 40
DFTRSTDISABLE Input 60
DFTSE Input 20
DFTRAMHOLD Input 60
MBISTACK Output 60
MBISTADDR[13:0] Input 50
MBISTARRAY[8:0] Input 50
MBISTBE[7:0] Input 60
MBISTCFG Input 50
MBISTINDATA[85:0] Input 50
MBISTOUTDATA[85:0] Output 50
MBISTREADEN Input 60
MBISTREQ Input 60
MBISTWRITEEN Input 60
Standby signals
STANDBYWFI[3:0] Output 40
STANDBYWFE[3:0] Output 40
STANDBYWFIL2 Output
EVENTI Input 60
EVENTO Output 50
nPMUIRQ[3:0] Output 40
Configuration pins
The majority of the signals on the configuration pins are static. That is, their values are sampled
only on startup and on restarting after a reset. Two configuration pins, CP15SDISABLE[3:0]
and CFGSDISABLE, are dynamic. Any change on these pins takes effect immediately when
the processor is active. Table 1-16 shows the configuration pins.
ACINACTM Input 60
CFGSDISABLEa Input 60
CFGEND[3:0] Input 40
CFGTE[3:0] Input 40
CP15SDISABLE[3:0] Input 50
VINITHI[3:0] Input 40
CLUSTERID[3:0] Input 40
PERIPHBASE[39:15] Input 40
SMPnAMP[3:0] Output 40
Table 1-17 shows the clock enable signals included in the processor, the associated processor
signals and signal groups, and the corresponding synchronous system clock domain.
Certain paths require multi-cycle timing constraints. Table 1-18 shows the multi-cycle paths
according to the direction point.
To
    DFTSO*
Through
    *u_ca7caches_tlb_rams*/SO[*]
    *u_ca7caches_tlb_rams*/SI[*]
    *u_ca7_scu_l1d_tagrams/g_l1d_cpu*_rams*u_l1d_tagram_cpu*_way*/SO[*]
    *u_ca7_scu_l1d_tagrams/g_l1d_cpu*_rams*u_l1d_tagram_cpu*_way*/SI[*]
    *g_l2_rams*u_ca7_l2_*rams*/SO[*] [a]
    *g_l2_rams*u_ca7_l2_*rams*/SI[*] [a]
From
    DFTSE
    DFTRAMHOLD
    DFTRSTDISABLE
    DFTSI*
    DFTRAMBYP
    PERIPHBASE[*]
    CLUSTERID[*]
    CFGTE[*]
    VINITHI[*]
    DBGROMADDR[*]
    DBGROMADDRV
    DBGSELFADDR[*]
    DBGSELFADDRV
    BROADCASTINNER
    BROADCASTOUTER
    BROADCASTCACHEMAINT
    u_cortexa7/u_cortexa7l2/g_l2_rams.u_ca7_l2_datarams/u_l2_dataram_*_low/CLK [a]
    u_cortexa7/u_cortexa7l2/g_l2_rams.u_ca7_l2_datarams/u_l2_dataram_*_high/CLK [a]
a. These paths relate to the L2 data RAM read when the L2 cache is present. When L2_LATENCY is set to:
0 A multicycle setup path of 2 cycles must be specified.
1 A multicycle setup path of 3 cycles must be specified.
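The relationship in the footnote between L2_LATENCY and the multicycle setup value can be captured in a lookup, for example when generating constraints from a script. This is a sketch only; the function name is hypothetical and the actual constraint syntax depends on your EDA tool.

```python
# L2_LATENCY value -> multicycle setup cycles for the L2 data RAM read paths,
# as stated in the table footnote.
MULTICYCLE_SETUP_CYCLES = {0: 2, 1: 3}


def l2_dataram_multicycle_setup(l2_latency):
    """Return the multicycle setup value required for a given L2_LATENCY."""
    if l2_latency not in MULTICYCLE_SETUP_CYCLES:
        raise ValueError(f"L2_LATENCY must be 0 or 1, got {l2_latency!r}")
    return MULTICYCLE_SETUP_CYCLES[l2_latency]


print(l2_dataram_multicycle_setup(0))  # 2
print(l2_dataram_multicycle_setup(1))  # 3
```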
[Figure: implementation flow. Stages shown: Start, Configure RTL, Build RTL, Integrate memories, Validate RTL, Perform synthesis, Create layout, Perform characterization, Complete.]
Key implementation tasks on page 2-3 gives details of the steps in the implementation flow.
Note
Your contract requires you to complete sign-off as part of the completed flow. See
Implementation obligations on page vi.
Because the Cortex-A7 MPCore processor RTL is highly configurable, you must validate your
design at a number of points during implementation. Figure 1-3 on page 1-18 shows a simplified
view of the implementation process, indicating where you must test or validate your design, and
the additional validation recommended by ARM.
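[Figure 1-3: simplified view of the implementation process with validation points. Recoverable labels: Start; 3 Configure RTL; Modify vector capture template; 5 E Perform synthesis and place and route; 9 Sign-off; "Correct?" decision points with Yes and No branches. Legend: Optional, but recommended by ARM.]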
• Logs and reports showing logical equivalence of post-layout netlist with configured RTL.
• Components:
— post-layout netlist.
— Synthesis timing model.
— Graphic Data System II (GDS II) data.
— Standard Delay Format (SDF) data.
• Test:
— Automatic Test Pattern Generation (ATPG) vectors.
Table 1-19 shows the top-level directories for each stage of implementation.
The logical/ directory contains the RTL hierarchy. Figure 1-4 shows the Cortex-A7 MPCore
RTL hierarchy.
<release_directory>/
    logical/
        ca7biu/
        ca7dcu/
        ca7dpu/
        ca7icu/
        ca7pfu/
        ca7scu/
        ca7stb/
        ca7tlb/
        cortexa7/
        cortexa7integration/
        cscti/
        gic400/
        models/
            cells/
            rams/
Chapter 2
Key Implementation Points
This chapter describes the key implementation points you must consider when you implement
the Cortex-A7 MPCore processor. It contains the following sections:
• About key implementation points on page 2-2.
• Key implementation tasks on page 2-3.
Note
Some of the implementation steps listed in this chapter are EDA tool specific and are not
described in this document. See the supplied implementation reference methodology
documents.
You can use this chapter to check that you have covered the implementation steps described in
the other chapters.
1.  Validate delivered RTL using source code.
    How to validate the Cortex-A7 MPCore processor. See Chapter 5 RTL Validation.
3.  Perform RAM integration and run the testbench.
    How to integrate your RAM blocks into the Cortex-A7 MPCore processor. See Chapter 4 Memory
    Integration.
4.  Confirm RTL configuration using source code.
    How to validate the Cortex-A7 MPCore processor using test vectors. See Chapter 5 RTL Validation.
5.  Determine optimum floorplan.
    What to consider when placing RAM blocks, and other recommendations for optimizing performance.
    See Chapter 6 Floorplan Guidelines.
11. Use the standard implementation flow to implement a DFT solution.
    Reference data for production testing for the processor. See Chapter 7 Design for Test and the
    supplied implementation reference methodology documents.
12. Perform dynamic verification and Logical Equivalence Checking (LEC).
    Reference data for the dynamic verification process. See Chapter 8 Dynamic Verification and the
    supplied implementation reference methodology documents.
13. Perform sign-off in accordance with the required criteria.
    The verification criteria that you must meet before you sign off the macrocell design, in
    addition to your normal SoC flow sign-off checks. See Chapter 9 Sign-off.
Note
You must complete the implementation process to produce complete and verified deliverables,
see Requirements for sign-off on page 9-4.
Chapter 3
Configuration Guidelines
This chapter describes the guidelines for RTL configuration. These enable you to configure the
implementation to the specific requirements of the target application. It contains the following
sections:
• About configuration guidelines on page 3-2.
• Configuration options on page 3-3.
Caution
For successful configuration of the RTL you must:
• Set the configurable options, see Configuration options on page 3-3.
• Integrate the memory, see Chapter 4 Memory Integration.
• Validate your configured RTL, see Validating RTL configuration on page 5-8.
If you do not complete and validate your configuration correctly, your synthesized design might
malfunction.
The Cortex-A7 MPCore is a multiprocessor device that can be configured with between one and
four individual processors, and is implemented using either the top-level CORTEXA7INTEGRATION
Verilog module or the top-level CORTEXA7 Verilog module. Figure 3-1 shows a block diagram of
a Cortex-A7 MPCore processor configured with:
• Four processors at the CORTEXA7 level.
• Trace, CTI, APBROM, and APB decoder at the CORTEXA7INTEGRATION level.
[Figure 3-1: block diagram of the Cortex-A7 MPCore processor. CORTEXA7 level: Processor 0 to Processor 3, each with an L1 instruction cache† and an L1 data cache†, the Snoop Control Unit (SCU), the optional Interrupt Controller, the optional L2 cache controller††, processor power down, and the master interface. CORTEXA7INTEGRATION level: Trace and CTI for each processor, APBROM, and APB decoder.]
† Configurable L1 cache size 8KB, 16KB, 32KB, or 64KB
†† Configurable L2 cache size None, 128KB, 256KB, 512KB, 1024KB
Note
If required, you can implement the Cortex-A7 MPCore processor without the
CORTEXA7INTEGRATION level.
Verilog parameters control the configuration of your Cortex-A7 MPCore implementation. There
are two levels of parameter:
• Global parameters control your implementation of the Cortex-A7 MPCore processor as
a whole.
• Processor-level parameters control your implementation of each individual processor in
the multiprocessor device. For example, they control whether each processor in the
multiprocessor device is configured with a Floating Point Unit (FPU) or NEON Media
Processing Engine (MPE).
You must define a complete set of processor-level configuration options for each
processor in your multiprocessor implementation.
Table 3-1 shows the global configuration options for the Cortex-A7 MPCore processor. See
Global configuration options on page 3-7 for descriptions of each of the configuration options.
Feature                       Configuration parameter   Permitted values                                  Comment
L1 instruction cache size     L1_ICACHE_SIZE            000 = 8KB, 001 = 16KB, 011 = 32KB, 111 = 64KB     You must select a permitted value
L1 data cache size            L1_DCACHE_SIZE            000 = 8KB, 001 = 16KB, 011 = 32KB, 111 = 64KB     You must select a permitted value
Trace for each processor[b]   ETM_PRESENT               0 = ETM and CTI for each processor not present    -
                                                        1 = ETM and CTI for each processor present
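The L1 cache size encodings are not a simple binary count, so a lookup is a convenient way to check a chosen value. The following sketch assumes the four encodings listed for L1_ICACHE_SIZE and L1_DCACHE_SIZE; the helper name l1_cache_size_kb is hypothetical.

```python
# Permitted 3-bit encodings for L1_ICACHE_SIZE and L1_DCACHE_SIZE, in KB.
L1_CACHE_SIZE_KB = {0b000: 8, 0b001: 16, 0b011: 32, 0b111: 64}


def l1_cache_size_kb(encoding):
    """Return the L1 cache size in KB for a permitted encoding."""
    if encoding not in L1_CACHE_SIZE_KB:
        raise ValueError(f"{encoding:03b} is not a permitted L1 cache size encoding")
    return L1_CACHE_SIZE_KB[encoding]


print(l1_cache_size_kb(0b011))  # 32
```

Values such as 010 or 100 are rejected because they do not appear among the permitted values.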
The configuration options for each implemented processor in the multiprocessor device include
which floating-point support you require, chosen from the following options:
• Implement the NEON MPE and the FPU.
• Implement the FPU only.
• Do not implement any floating-point support.
Table 3-2 shows the processor-level configuration options and the permitted parameter settings
for those options. <n> denotes processor number (0-3).
FPU FPU_<n> 0, 1 You can select either permitted value. 1 indicates the
feature is included. Must be 1 if NEON_<n> is 1.
NEON NEON_<n> 0, 1 You can select either permitted value. 1 indicates the
feature is included.
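Because every processor needs its own set of processor-level options, it can help to enumerate the parameter names you must define. The sketch below covers only the FPU_<n> and NEON_<n> parameters listed here; the function name is hypothetical.

```python
def processor_level_parameters(num_cpus):
    """List the FPU_<n> and NEON_<n> parameter names for num_cpus processors."""
    if not 1 <= num_cpus <= 4:
        raise ValueError("the device supports between one and four processors")
    return [
        f"{feature}_{n}"
        for n in range(num_cpus)          # <n> is the processor number, 0-3
        for feature in ("FPU", "NEON")
    ]


print(processor_level_parameters(2))
# ['FPU_0', 'NEON_0', 'FPU_1', 'NEON_1']
```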
Global configuration on page 3-4 introduces the global configuration options and lists the
global configuration parameters. Individual processor configuration gives the same information
for the processor level configuration. Using the information in those sections, you can set your
chosen configuration options for the CORTEXA7INTEGRATION top-level Verilog module as follows:
If you are implementing at the CORTEXA7 top-level Verilog module or you are performing RAM
integration and using the testbench:
logical/cortexa7/verilog/CORTEXA7_CONFIG.v
Example 3-1 shows the configuration file for a Cortex-A7 MPCore implementation with:
• Trace.
• L2 cache.
• 32 interrupts.
• Two uniform processors, each without FPU and NEON.
// ------------------------------------------------------
// Integration layer parameters
// ------------------------------------------------------
parameter ETM_PRESENT = 1, // Include CTI and ETM for each core in the cluster
parameter [16:0] APBADDR_A7ROM = 17'h0_0000, // ROM Table configuration: ROM APB debug base address
parameter [16:0] APBADDR_CPU = 17'h1_0000, // ROM Table configuration: CPU APB debug base address
parameter [16:0] APBADDR_CTI = 17'h1_8000, // ROM Table configuration: CTI APB debug base address
parameter [16:0] APBADDR_ETM = 17'h1_C000, // ROM Table configuration: ETM APB debug base address
// ------------------------------------------------------
// Cluster Parameters
// ------------------------------------------------------
// 000 : 128kB $
// 001 : 256kB $
// 011 : 512kB $
// 111 : 1024kB $
// L2 Latency Encoding
//
// 0 : 2 cycles
// 1 : 3 cycles
// ------------------------------------------------------
// Core 0
// ------------------------------------------------------
// ------------------------------------------------------
// Core 1 (if present)
// ------------------------------------------------------
// ------------------------------------------------------
// Core 2 (if present)
// ------------------------------------------------------
// ------------------------------------------------------
// Core 3 (if present)
// ------------------------------------------------------
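The L2 cache size and latency encodings given in the comments of the configuration file can be decoded with a lookup such as the following. This is a sketch for checking values outside the RTL; the function name decode_l2_config is hypothetical.

```python
# Encodings taken from the comments in the configuration file.
L2_CACHE_SIZE_KB = {0b000: 128, 0b001: 256, 0b011: 512, 0b111: 1024}
L2_LATENCY_CYCLES = {0: 2, 1: 3}


def decode_l2_config(size_encoding, latency_encoding):
    """Return (cache size in KB, latency in cycles) for the given encodings."""
    if size_encoding not in L2_CACHE_SIZE_KB:
        raise ValueError(f"{size_encoding:03b} is not a listed L2 cache size encoding")
    if latency_encoding not in L2_LATENCY_CYCLES:
        raise ValueError(f"{latency_encoding!r} is not a listed L2 latency encoding")
    return L2_CACHE_SIZE_KB[size_encoding], L2_LATENCY_CYCLES[latency_encoding]


print(decode_l2_config(0b001, 1))  # (256, 3)
```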
Global configuration on page 3-4 introduces the global configuration options. The following
sections describe each of these options:
• L2 cache.
• Number of processors in the multiprocessor device.
• Number of interrupts on page 3-8.
• CORTEXA7INTEGRATION level component base addresses on page 3-8.
L2 cache
Number of processors in the multiprocessor device
You must configure the number of processors in the multiprocessor device to a value from one
to four by setting the NUM_CPUS Verilog parameter.
Number of interrupts
Set the Verilog parameter NUM_SPIS to define the number of interrupts in your design. You can
set NUM_SPIS from 0-480 in steps of 32. If the integrated GIC is not present, NUM_SPIS should
be 0.
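The NUM_SPIS rule can be checked before you build the RTL. The following sketch assumes the rule exactly as stated above; the function name and the gic_present argument are hypothetical.

```python
def validate_num_spis(num_spis, gic_present=True):
    """Check NUM_SPIS: 0-480 in steps of 32, or 0 when no integrated GIC."""
    if not gic_present:
        if num_spis != 0:
            raise ValueError("NUM_SPIS should be 0 when the integrated GIC is not present")
        return num_spis
    if num_spis not in range(0, 481, 32):
        raise ValueError("NUM_SPIS must be a multiple of 32 from 0 to 480")
    return num_spis


print(validate_num_spis(32))                    # 32
print(validate_num_spis(0, gic_present=False))  # 0
```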
Integrated GIC
CORTEXA7INTEGRATION level component base addresses
You must set the CORTEXA7INTEGRATION level ROM and components base addresses. Table 3-3
shows the default values.
You must set the processor-level options for each processor included in the multiprocessor
device. This section describes the options:
• Implementing the Floating Point Unit (FPU).
• Implementing the NEON Media Processing Engine (MPE).
To configure the FPU in a processor included in the multiprocessor device, set the Verilog
parameter FPU_<n> to 1, where <n> is the processor number (0-3). If a processor in the
multiprocessor device does not require the FPU, set the Verilog parameter FPU_<n> to 0.
Note
For each processor included in the multiprocessor device, set the value of the FPU_<n> parameter
to either 1 or 0.
To configure the NEON MPE in a processor included in the multiprocessor device, set the
Verilog parameter NEON_<n> to 1, where <n> is the processor number (0-3). If a processor in the
multiprocessor device does not require the NEON Media Processing Engine, set the Verilog
parameter NEON_<n> to 0.
Note
• For each processor in the multiprocessor device, set the value of the NEON_<n> parameter
to either 1 or 0.
• For any processor in the multiprocessor device, if you set the value of the NEON_<n>
parameter to 1 you must also set the value of the FPU_<n> parameter to 1.
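The dependency between NEON_<n> and FPU_<n> leaves three permitted combinations for each processor. A minimal check, with a hypothetical function name, might look like this:

```python
def check_fpu_neon(fpu, neon):
    """Validate one processor's FPU_<n> and NEON_<n> settings."""
    for name, value in (("FPU", fpu), ("NEON", neon)):
        if value not in (0, 1):
            raise ValueError(f"{name}_<n> must be 0 or 1, got {value!r}")
    # NEON_<n> = 1 requires FPU_<n> = 1.
    if neon == 1 and fpu != 1:
        raise ValueError("NEON_<n> = 1 requires FPU_<n> = 1")
    return True


# Permitted: no floating-point support, FPU only, NEON MPE and FPU.
for fpu, neon in ((0, 0), (1, 0), (1, 1)):
    check_fpu_neon(fpu, neon)
```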
The following section describes additional configuration requirements that might apply to your
implementation:
• Implementation defined cells.
1. Install your implementation defined cells at the same level in the hierarchy, for example
in the directory logical/models/cells/my_cells/.
2. Configure your synthesis tools to point to this location for these cells.
Chapter 4
Memory Integration
This chapter describes the RAM organization and how to integrate your RAM blocks into the
processor. It contains the following sections:
• About memory integration.
• Resource requirements for memory integration.
• Controls and constraints for memory integration.
• Blocks for memory integration.
• Flow for memory integration.
• Confirmation of memory integration.
• Outputs from memory integration.
A Cortex-A7 MPCore processor includes the following memory modules that you must
implement:
• Instruction cache data RAM.
• Instruction cache tag RAM.
• Data cache data RAM.
• Data cache tag RAM.
• Data cache dirty RAM.
• Translation Lookaside Buffer (TLB) RAM.
• Snoop Control Unit (SCU) duplicate tag RAM.
You must implement these modules for each processor in the multiprocessor device. Your
implementation can be either:
• Uniform, meaning it has the same RAM configuration for each processor in the
multiprocessor device.
• Non-uniform, meaning the RAM configurations are not the same for all processors in the
multiprocessor device.
If L2_CACHE_PRESENT is set, a Cortex-A7 MPCore processor also includes the following memory
modules that you must implement:
• L2 cache tag RAM.
• L2 cache data RAM.
You must ensure you have set the L1 instruction and data cache size parameters, L1_ICACHE_SIZE
and L1_DCACHE_SIZE, and, if the L2 cache is present, L2_CACHE_SIZE.
[Figure: memory integration process. Labels: Inputs, Outputs, Configured RTL, RTL, Reports and logs.
Resources: RAM models, standard cell libraries, HDL simulators, memory integration testbench.]
• Byte-write, or bit-write, control signals must be used for some RAM blocks. If byte-write
or bit-write RAM blocks are not available, you must use blocks of narrower RAM. This
increases the overall RAM area.
Table 4-1 shows the access timing requirements for the L1 and SCU duplicate tag RAMs.
RAM type   Setup time as a percentage of clock cycle   Access time as a percentage of clock cycle
Data       40                                          50
Tag        40                                          35
Dirty      40                                          50
TLB        40                                          35
SCU tag    40                                          35
Figure 4-2 shows the L1 and SCU duplicate tag RAMs access timings.
[Timing diagram. Signals shown: Clock, Write enable.]
Table 4-2 shows the timing requirements for the L2 tag and data RAMs.
RAM type   Setup time as a percentage of clock cycle   Access time as a percentage of clock cycle
L2 tag     20                                          50
L2 data    20                                          50
[Timing diagram. Signals shown: Clock, Write enable.]
Figure 4-4 shows the L2 data RAMs access timings when latency is 2 cycles.
[Timing diagram showing a read access and a write access. Signals shown: Clock, CLKEN, Clk_bank,
Chip select, Write enable.]
Location Contents
Table 4-4 shows example RAM block instantiations in the logical/models/rams directory for
each processor in the Cortex-A7 MPCore processor and the SCU.
Note
Items prefixed with u_ indicate an instance name of a module.
The size of the instruction cache and data cache RAM instances and the SCU duplicate tag RAM
instances depend on your configured cache sizes. You can configure different cache sizes for
each processor in the multiprocessor device. For more information, see RAM instance sizes.
Note
• The Cortex-A7 MPCore design provides a write enable signal to each RAM, and byte or
bit enables for RAM blocks that require byte or bit enables. You can ignore the write
enable signal if your RAM only requires the byte-write or bit-write enable inputs.
• If you instantiate a larger RAM than required, for example if your RAM generator cannot
produce a RAM of the required size, you must tie the redundant upper address bit or bits
LOW.
• In the ideal case, you can produce a single block of compiled RAM for each block of
RAM. This might not be possible if:
— Your RAM does not have the required byte-write control. In this case you must
construct the RAM out of multiple blocks of byte-wide RAM. See Producing
byte-write memory from word-write RAM on page 4-20.
— Your compiler cannot produce a single RAM block that is the required size, or a
single RAM block might not meet the timing requirements. In this case, you must
produce the RAM out of two or more blocks of smaller RAM. See Producing a
large memory from smaller RAM blocks on page 4-20.
• Table 4-5 shows the L1 cache and SCU RAMs. For these RAMs, the size of the RAM
instances depends on the cache size you have configured.
• Table 4-7 on page 4-9 shows the L2 cache data and tag RAMs.
Snoop control unit tag RAM - - Yes 32x32 64x31 128x30 256x29 4
The multiprocessor device must be configured to specify what cache sizes have been
implemented. To do this, you must set the following parameters to the correct value:
• L1_ICACHE_SIZE for the instruction cache.
• L1_DCACHE_SIZE for the data cache.
Note
Regardless of which level of integration you use, you must always update the CORTEXA7_CONFIG.v
configuration file.
Table 4-8 shows the encoding values. The instruction cache can be a different size from the data
cache. By default, all processors have the same sizes of L1 cache and TLB RAMs. It is possible
to create a non-uniform configuration, where the processors in a multiprocessor device have
different cache sizes.
Cache size   Encoding
8KB          0b000
16KB         0b001
32KB         0b011
64KB         0b111
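The encodings in Table 4-8 are easy to mistype because they are not a plain binary count. The following Python sketch looks up the value for a given L1 cache size and rejects unsupported sizes. The names L1_SIZE_ENCODING, l1_size_encoding, and as_binary are hypothetical helpers for illustration, not part of the deliverables.

```python
# Encodings copied from Table 4-8 (L1 cache size in KB -> parameter value).
L1_SIZE_ENCODING = {8: 0b000, 16: 0b001, 32: 0b011, 64: 0b111}


def l1_size_encoding(size_kb):
    """Return the Table 4-8 encoding for an L1 cache size given in KB."""
    if size_kb not in L1_SIZE_ENCODING:
        raise ValueError(f"unsupported L1 cache size: {size_kb}KB")
    return L1_SIZE_ENCODING[size_kb]


def as_binary(value):
    """Format an encoding as a 3-bit binary string, in the style of the table."""
    return f"0b{value:03b}"


for size in (8, 16, 32, 64):
    print(f"{size}KB -> {as_binary(l1_size_encoding(size))}")
```

The same value can be used for L1_ICACHE_SIZE or L1_DCACHE_SIZE, because both parameters share the encoding. In this table each encoding equals the size in units of 8KB, minus one.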
If L2_CACHE_PRESENT is set the L2 cache must be configured to specify what cache sizes have been
implemented. To do this set the L2_CACHE_SIZE parameter to the correct value. See Table 4-9 on
page 4-10.
L2 cache size   Encoding
128KB           0b000
256KB           0b001
512KB           0b011
1024KB          0b111
The following sections describe the implementation of each of the RAM blocks:
• Instruction cache data RAM.
• Instruction cache tag RAMs.
• Data cache data RAM.
• Data cache tag RAM.
• Data cache dirty RAM.
• TLB RAM.
• SCU duplicate tag RAMs.
• L2 tag.
• L2 data.
Note
These sections describe the implementation of each RAM block. Before implementing your
RAM blocks, ARM strongly recommends that you look at the files:
• logical/models/rams/generic/ca7caches_tlb_rams.v.
• logical/models/rams/generic/ca7_scu_l1d_tagrams.v.
• logical/models/rams/generic/ca7_l2_datarams.v.
• logical/models/rams/generic/ca7_l2_tagrams.v.
All the instruction cache data RAM signals are active HIGH. If any of the signals in your
instruction cache data RAMs are active LOW, you must reverse the polarity.
The instruction cache data RAMs are two 72-bit wide RAM blocks with 18-bit word enables.
If you use RAMs with wider data input or output buses, you must:
• Tie the unused inputs LOW.
• Leave the unused outputs unconnected.
The RAM is byte enabled, divided into two ways. The following describes how to connect the
Instruction cache data RAM enable, address, write, and data signals:
Write enable Connect ic_dataram_wr_i to the write enable pin of each RAM used. The
write enable pin is also known as the global write enable.
Byte write enable Connect ic_dataram_strb_i[3:0] to the byte write enable pin of each
RAM. Each strobe bit represents 18 bits of data. If a bit writable RAM is
used you must replicate each bit 18 times.
8KB ic_dataram_addr_i[8:0]
16KB ic_dataram_addr_i[9:0]
32KB ic_dataram_addr_i[10:0]
64KB ic_dataram_addr_i[11:0]
Write data Connect ic_dataram_wdata_i[71:0] to the input data pins of each RAM.
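The replication rule for a bit writable instruction cache data RAM can be sketched in Python. This is an illustrative model only: the function name expand_strobes is hypothetical, and the mapping of strobe bit i to data bits [18i+17:18i] is an assumption that you should confirm against the generic RAM models.

```python
def expand_strobes(strb, num_strobes=4, bits_per_strobe=18):
    """Expand a per-group write strobe into a per-bit write mask.

    Each strobe bit is replicated bits_per_strobe times. Strobe bit i is
    assumed to cover data bits [i*bits_per_strobe + bits_per_strobe - 1 :
    i*bits_per_strobe] (assumption, not stated in this guide).
    """
    group = (1 << bits_per_strobe) - 1
    mask = 0
    for i in range(num_strobes):
        if (strb >> i) & 1:
            mask |= group << (i * bits_per_strobe)
    return mask


# All four strobes set enables all 72 data bits.
print(hex(expand_strobes(0b1111)))
# Only strobe 1 set enables data bits [35:18].
print(hex(expand_strobes(0b0010)))
```

The same helper covers other RAMs by changing the arguments, for example four strobes of 8 bits each for a 32-bit byte-enabled bank.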
All the instruction cache tag RAMs signals are active HIGH. If any of these signals are active
LOW you must reverse the polarity.
If you use RAMs with wider data input or output buses, you must:
• Tie the unused inputs LOW.
• Leave the unused outputs unconnected.
The RAM is word enabled, divided into two ways. The following describes how to connect the
Instruction cache tag RAM enable, address, write, and data signals:
Enable Connect ic_tagram_en_i[<n>] to the chip enable pin of way <n> of your
RAM. Where <n> is 0-1. If more than one RAM block is used per way use
this for each block.
Write enable Connect ic_tagram_wr_i to the write enable pin of each RAM used. The
write enable pin is also known as the global write enable.
If a bit writable RAM is used replicate each bit 31 times.
Address Connect ic_tagram_addr_i to the address pins of each RAM. The number
of bits to be used is dependent on the cache size, see Table 4-11.
Table 4-11 shows which bits of the address bus, ic_tagram_addr_i,
connect to the RAM blocks for each cache size.
8KB ic_tagram_addr_i[6:0]
16KB ic_tagram_addr_i[7:0]
32KB ic_tagram_addr_i[8:0]
64KB ic_tagram_addr_i[9:0]
Write data Connect ic_tagram_wdata_i[30:0] to the input data pins of each RAM.
All the data cache data RAM signals are active HIGH. If any of these signals are active LOW
you must reverse the polarity.
If you use RAMs with wider data input or output buses, you must:
• Tie the unused inputs LOW.
• Leave the unused outputs unconnected.
The RAM is byte enabled, divided into four ways each containing two banks. The following
describes how to connect the data cache data RAM enable, write, and data signals:
Write enable Connect dc_dataram_wr_i to the write enable pin of each RAM used.
The write enable pin is also known as the global write enable.
Byte write enable Connect dc_dataram_strb<n>_i[3:0] to the byte write enable pin of each
bank, where <n> is 0-7. Each bit represents a byte of data, so if a bit
writable RAM is used, replicate each bit 8 times.
8KB dc_dataram_addr<n>_i[7:0]
16KB dc_dataram_addr<n>_i[8:0]
32KB dc_dataram_addr<n>_i[9:0]
64KB dc_dataram_addr<n>_i[10:0]
All the data cache tag RAM signals are active HIGH. If any of these signals are active LOW
you must reverse the polarity.
If you use RAMs with wider data input or output buses, you must:
• Tie the unused inputs LOW.
• Leave the unused outputs unconnected.
The RAM is word enabled, divided into 4 ways. The following describes how to connect the
data cache tag RAM enable, write, address, and data signals:
Enable Connect dc_tagram_en_i[<n>] to the chip enable pin of way <n> of your
RAM. Where <n> is 0-3. If more than one RAM block is used per way use
this for each block.
Write enable Connect dc_tagram_wr_i to the write enable pin of each RAM used.
Write enable is also known as global write enable. If bit writable RAMs
are used, replicate each bit N times, where N depends on cache size, see
Table 4-13. Table 4-13 shows the number of bits for different bit writable
data cache tag RAM sizes.
Cache size   Number of bits, N
8KB          32
16KB         31
32KB         30
64KB         29
8KB dc_tagram_addr_i[4:0]
16KB dc_tagram_addr_i[5:0]
32KB dc_tagram_addr_i[6:0]
64KB dc_tagram_addr_i[7:0]
8KB dc_tagram_rdata<n>_i[31:0]
16KB dc_tagram_rdata<n>_i[31:1]
32KB dc_tagram_rdata<n>_i[31:2]
64KB dc_tagram_rdata<n>_i[31:3]
Write data Connect dc_tagram_wdata_i to the input data pins of each RAM. The
size of the RAM depends on the cache size. The bigger the cache size the
fewer bits stored in physical memory, see Table 4-16.
8KB dc_tagram_wdata_i[31:0]
16KB dc_tagram_wdata_i[31:1]
32KB dc_tagram_wdata_i[31:2]
64KB dc_tagram_wdata_i[31:3]
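The data cache tag RAM tables fit together: as the cache size doubles, the address bus gains one bit and the stored tag loses its lowest bit. The Python sketch below tabulates this from the values in Table 4-13 and the address and write data tables above. The names DC_TAG_TABLE and dc_tag_geometry are hypothetical, and the derived number of entries per way is an inference from the address width rather than a figure quoted from this guide.

```python
# Per cache size in KB: (address bits used, stored tag bits N from Table 4-13).
DC_TAG_TABLE = {8: (5, 32), 16: (6, 31), 32: (7, 30), 64: (8, 29)}


def dc_tag_geometry(size_kb):
    """Return the data cache tag RAM connections implied by the tables."""
    addr_bits, tag_bits = DC_TAG_TABLE[size_kb]
    low = 32 - tag_bits  # lowest write data bit that is physically stored
    return {
        "entries": 1 << addr_bits,  # inferred: 2**addr_bits locations per way
        "width": tag_bits,
        "addr": f"dc_tagram_addr_i[{addr_bits - 1}:0]",
        "wdata": f"dc_tagram_wdata_i[31:{low}]",
    }


for size in (8, 16, 32, 64):
    print(size, dc_tag_geometry(size))
```

For a 32KB data cache this gives a 7-bit address, 30 stored bits, and write data bits [31:2].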
All the data cache dirty RAM signals are active HIGH. If any of these signals are active LOW
you must reverse the polarity.
If you use RAMs with wider data input or output buses, you must:
• Tie the unused inputs LOW.
• Leave the unused outputs unconnected.
The RAM is bit enabled. To connect the data cache dirty RAM enable, address, write, and data
signals:
Write enable Connect dc_dirtyram_wr_i to the write enable pin of each RAM used.
The write enable pin is also known as the global write enable.
Bit write enable Connect dc_dirtyram_strb_i[19:0] to the bit write enable pin of each
bank.
8KB dc_dirtyram_addr_i[4:0]
16KB dc_dirtyram_addr_i[5:0]
32KB dc_dirtyram_addr_i[6:0]
64KB dc_dirtyram_addr_i[7:0]
TLB RAM
All the TLB RAM signals are active HIGH. If any of these signals are active LOW you must
reverse the polarity.
If you use RAMs with wider data input or output buses, you must:
• Tie the unused inputs LOW.
• Leave the unused outputs unconnected.
The RAM is word enabled, divided into two ways. The following describes how to connect the
TLB RAM enable, write, address, and data signals:
Enable Connect tlb_ram_en_i[<n>] to the chip enable pin of way <n> of your
RAM. Where <n> is 0-1. If more than one RAM block is used per way use
this for each block.
Write enable Connect tlb_ram_wr_i to the write enable pin of each RAM used. The
Write enable is also known as the global write enable. If a bit writable
RAM is used replicate each bit 86 times.
Write data Connect tlb_ram_wdata_i[85:0] to the input data pins of each RAM.
All the SCU duplicate tag RAM signals are active HIGH. If any of these signals are active LOW
you must reverse the polarity.
If you use RAMs with wider data input or output buses, you must:
• Tie the unused inputs LOW.
• Leave the unused outputs unconnected.
The SCU requires four SCU duplicate tag RAM arrays for each processor included in the
Cortex-A7 MPCore processor.
Note
SCU duplicate tag RAMs must always be instantiated. This includes implementations where
NUM_CPUS is set to 1 and ACVALID is tied LOW.
SCU duplicate tag RAMs must have the same number of indexes as the related Data tag RAM
which is defined by the size of the data cache configured for the processor.
The RAM is word enabled, divided into 4 ways. The following describes how to connect the
SCU duplicate tag RAM enable, write, address, and data signals:
Write enable Connect l1d_tagram_cpu<m>_wr_i to the write enable pin of each RAM used.
The write enable is also known as the global write enable. If bit
writable RAMs are used, replicate each bit N times, where N depends on cache
size, see Table 4-13 on page 4-13.
8KB l1d_tagram_cpu<m>_addr_i[4:0]
16KB l1d_tagram_cpu<m>_addr_i[5:0]
32KB l1d_tagram_cpu<m>_addr_i[6:0]
64KB l1d_tagram_cpu<m>_addr_i[7:0]
Write data Connect l1d_tagram_cpu<m>_wdata_i to the input data pins of each RAM. The
size of the RAM depends on the cache size. The bigger the size the fewer bits
stored in physical memory, see Table 4-16 on page 4-14.
L2 tag
All the L2 tag RAM signals are active HIGH. If any of these signals are active LOW you must
reverse the polarity.
If you use RAMs with wider data input or output buses, you must:
• Tie the unused inputs LOW.
• Leave the unused outputs unconnected.
The RAM is word enabled, divided into 8 ways. The following describes how to connect the L2
tag RAM enable, write, address, and data signals:
Enable Connect l2_tagram_en_i[<n>] to the chip enable pin of way <n> of your RAM,
where <n> is 0-7. If more than one RAM block is used per way use this for each
block.
Write enable Connect l2_tagram_wr_i to the write enable pin of each RAM used.
Write enable is also known as global write enable.
128KB l2_tagram_addr_i[7:0]
256KB l2_tagram_addr_i[8:0]
512KB l2_tagram_addr_i[9:0]
1024KB l2_tagram_addr_i[10:0]
Read data Connect l2_tagram_rdata_way<n>_o to the output data pins of way <n> of
your RAM, where <n> is 0-7. The size of the RAM depends on the cache size.
The bigger the size the fewer bits stored in physical memory, see Table 4-20.
Table 4-20 shows the L2 tag RAM read data connections.
128KB l2_tagram_rdata_way<n>_i[32:0]
256KB l2_tagram_rdata_way<n>_i[32:1]
512KB l2_tagram_rdata_way<n>_i[32:2]
1024KB l2_tagram_rdata_way<n>_i[32:3]
Write data Connect l2_tagram_wdata_i to the input data pins of each RAM. The size of the
RAM depends on the cache size. The bigger the size the fewer bits stored in
physical memory, see Table 4-21 on page 4-18.
128KB l2_tagram_wdata_i[32:0]
256KB l2_tagram_wdata_i[32:1]
512KB l2_tagram_wdata_i[32:2]
1024KB l2_tagram_wdata_i[32:3]
L2 data
All the L2 data RAM signals are active HIGH. If any of these signals are active LOW you must
reverse the polarity.
If you use RAMs with wider data input or output buses, you must:
• Tie the unused inputs LOW.
• Leave the unused outputs unconnected.
The RAM is word enabled, divided into 8 ways. The L2 data RAMs are enabled at most once
every two cycles, to support longer-latency RAMs. If the RAM modules you are using do not
support being clocked at the full processor frequency, you must instantiate a clock gate to
produce a gated clock for the RAMs. The following describes how to connect
the L2 data RAM enable, write, address, and data signals:
Clock enable Connect l2_dataram_clken_i to the enable pin of all of the clock gates
driving the RAM modules.
Write enable Connect l2_dataram_wr_i to the write enable pin of each RAM used.
Write enable is also known as global write enable.
128KB l2_dataram_addr_i[10:0]
256KB l2_dataram_addr_i[11:0]
512KB l2_dataram_addr_i[12:0]
1024KB l2_dataram_addr_i[13:0]
Note
• Before you perform memory integration, you must configure your RTL, as described in
Chapter 3 Configuration Guidelines.
• When you have integrated your RAM blocks you must validate your memory integration.
See Confirmation of memory integration on page 4-22.
If you do not have memories with byte-write control, you must construct these blocks using, for
example, four byte-wide RAM blocks to achieve a RAM word size of 32 bits. The rules for
connecting the four RAM blocks are:
• Each byte-wide RAM has the same address and chip select controls as the word-wide
RAM.
• One bit of the byte-write control signal connects to the write-enable pin of each of the
byte-wide RAM blocks. For example, bit 0 connects to the RAM representing byte 0.
• Data input and output signals [7:0] connect to the data input and output pins of the RAM
representing byte 0, and data input and output signals [15:8] connect to the data input and
output pins of the RAM representing byte 1, for example.
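The three connection rules can be illustrated with a small behavioural model in Python. This is a sketch under stated assumptions, not an implementation of any delivered module: the classes ByteRam and WordRamFromBytes, and their method names, are hypothetical.

```python
class ByteRam:
    """Byte-wide RAM with chip select and a single write enable (toy model)."""

    def __init__(self, depth):
        self.mem = [0] * depth

    def access(self, addr, chip_select, write_enable, wdata):
        if not chip_select:
            return None
        if write_enable:
            self.mem[addr] = wdata & 0xFF
        return self.mem[addr]


class WordRamFromBytes:
    """32-bit RAM with byte-write control, built from four byte-wide RAMs."""

    def __init__(self, depth):
        self.lanes = [ByteRam(depth) for _ in range(4)]

    def write(self, addr, wdata, byte_write, chip_select=True):
        for i, lane in enumerate(self.lanes):
            # Same address and chip select for every lane; bit i of the
            # byte-write control drives the write enable of the byte i RAM,
            # and data bits [8i+7:8i] drive its data input.
            lane.access(addr, chip_select, (byte_write >> i) & 1,
                        (wdata >> (8 * i)) & 0xFF)

    def read(self, addr):
        return sum(lane.access(addr, True, 0, 0) << (8 * i)
                   for i, lane in enumerate(self.lanes))


ram = WordRamFromBytes(16)
ram.write(3, 0xAABBCCDD, 0b1111)   # full word write
ram.write(3, 0x11223344, 0b0101)   # update bytes 0 and 2 only
print(hex(ram.read(3)))            # bytes 1 and 3 keep their old values
```

The second write changes only the lanes whose byte-write bit is set, which is the behaviour a single RAM with byte-write control would provide.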
You might have to create a large memory out of smaller RAM blocks, for one or more of the
following reasons:
• Your RAM compiler cannot produce a RAM of the required size.
• A single large RAM is too slow for your performance requirements.
• A single large RAM does not fit into your floorplan.
The rules for producing a memory out of smaller RAM blocks are:
For example, if you create a RAM out of two smaller RAM blocks, b = 2, and the required
address width for that memory size is 10 bits, n = 10, then the address width, m, of the two
smaller RAM blocks is 9 bits. Address bits [(m-1):0] apply to all the RAM blocks.
• You must ensure that only the addressed RAM is enabled, by performing a b bit decode
of the [(n-1):m] address bits and ANDing these with the RAM enable control signal. In
the above example, you must apply address bits [8:0] to the two RAM blocks, and AND
a 2-bit decode of address bit [9] with the RAM enable to create two RAM enable signals,
that is:
assign RAMEnable_0 = ~Addr[9] & RAMEnable;
assign RAMEnable_1 = Addr[9] & RAMEnable;
• You must connect RAMEnable_0 to the RAM enable port of the first RAM block, and you
must connect RAMEnable_1 to the RAM enable port of the second RAM block.
The approach is exactly the same for any memory that you construct from smaller RAM blocks.
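To show how the decode generalizes beyond two blocks, the following Python sketch computes the local address and the per-block enables for b blocks. It assumes b is a power of two, so that m = n - log2(b). The function name bank_enables is hypothetical and the model is behavioural only.

```python
def bank_enables(addr, ram_enable, n, b):
    """Split an n-bit address across b smaller RAM blocks.

    Returns (local_addr, enables): local_addr is address bits [(m-1):0],
    applied to every block, and enables[i] is the RAM enable for block i,
    formed by ANDing the decode of address bits [(n-1):m] with ram_enable.
    """
    if b < 1 or b & (b - 1):
        raise ValueError("b must be a power of two")
    if not 0 <= addr < (1 << n):
        raise ValueError("address does not fit in n bits")
    m = n - (b - 1).bit_length()        # address width of each smaller block
    local_addr = addr & ((1 << m) - 1)  # bits [(m-1):0]
    selected = addr >> m                # bits [(n-1):m]
    enables = [bool(ram_enable) and i == selected for i in range(b)]
    return local_addr, enables


# The example above: n = 10, b = 2, so m = 9 and bit [9] selects the block.
print(bank_enables(0x1FF, True, 10, 2))
print(bank_enables(0x200, True, 10, 2))
```

With n = 10 and b = 2 this reproduces the RAMEnable_0 and RAMEnable_1 equations: addresses with bit [9] clear enable only the first block, and addresses with bit [9] set enable only the second.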
2. Copy the generic RAM interface module and the RAM arrays into a new directory where
you will integrate your RAMs, and go to this directory:
cp -r generic <my_ram_dir>
cd <my_ram_dir>
3. Blocks for memory integration on page 4-7 describes how you connect the RAM for each
cache size. Use this to identify the RAM blocks that you require and then generate them
using your library RAM generator.
4. Integrate your RAM blocks into each of the modules, using the organization described in
Blocks for memory integration on page 4-7. All RAM control signals are driven active
HIGH. If your RAM blocks have active LOW control inputs you must invert the RAM
write enable pin, WE, and all the other control signals.
5. Check that the memory integration is correct by running the RAM integration testbench
described in RAM integration testbench on page 4-22.
Note
You must run the RAM validation testbench described in RAM integration testbench as part of
the sign-off criteria.
ARM provides a testbench that checks that your RAM blocks are correctly integrated. Before
running the testbench you must have successfully written all your RAM models and configured
your RTL.
1. Go to implementation_<technology>/CORTEXA7_RAMtestbench.
Note
The default directory settings for steps 2 and 3 simulate the example ARM RAMs. If you
want to debug any errors with your RAM integration, you can simulate the ARM RAMs
as a golden reference, otherwise change these paths to point to your modified source.
5. Running the RAM integration testbench depends on the simulator and the version used.
The testbench can be run using a common Makefile, available in the current directory:
• make mti
• make vcs
• make ius
The Makefile provides a clean command to remove any unwanted directories:
make clean
Two RAM integration tests are performed for each RAM. Either or both of these tests will fail
if any integration errors occur. The status of each of the tests is reported in the testbench
summary. To assist you in debugging, the functionality of these tests is as follows:
Word test This test writes unique data to each RAM location, then reads back each location
and checks that the read data matches against that expected. The purpose of this
test is to check that the instantiated RAM is the correct size, and to check that the
address connections are correct. All RAM enables are driven HIGH throughout
this test, and all write enables and byte write enables are driven HIGH during
writes and LOW during reads, so the test does not detect shorts between these
signals, only opens.
If the read data is x, this indicates that one or more of the RAM signals is either
open or miswired.
Bit test This test walks a '1' across the RAM width, checking that all data in and out
connections are correct. This is done in two passes:
• The first pass drives all enables HIGH so that data in and out connectivity
can be tested.
• The second pass only asserts the enable for the portion that is being written
to or read from. This detects shorts between enables. A failure pattern is
driven to the RAM portions not being written to.
The read address is driven to '0' during this test. The write address is driven to a
value of 2 greater than the RAM size, which aliases to '0' if the RAM is the correct
size. If the RAM is too large, the read and write addresses differ and the test
correctly fails.
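The idea behind the two tests can be sketched with a toy RAM model in Python. This is a simplified illustration, not the ARM testbench: it omits the enables, the second pass of the bit test, and the aliasing write address, and it models an open signal as reading 0 rather than x. The names ModelRam, word_test, and bit_test are hypothetical.

```python
class ModelRam:
    """Toy RAM with optional wiring faults (hypothetical, for illustration)."""

    def __init__(self, depth, width, open_addr_bit=None, open_data_bit=None):
        self.depth = depth
        self.mask = (1 << width) - 1
        self.open_addr_bit = open_addr_bit  # address line that never reaches the RAM
        self.open_data_bit = open_data_bit  # data-in line that never reaches the RAM
        self.mem = [0] * depth

    def _index(self, addr):
        if self.open_addr_bit is not None:
            addr &= ~(1 << self.open_addr_bit)
        return addr % self.depth

    def write(self, addr, data):
        if self.open_data_bit is not None:
            data &= ~(1 << self.open_data_bit)
        self.mem[self._index(addr)] = data & self.mask

    def read(self, addr):
        return self.mem[self._index(addr)]


def word_test(ram, depth):
    """Write unique data to every location, then read it all back."""
    for addr in range(depth):
        ram.write(addr, addr + 1)
    return all(ram.read(addr) == addr + 1 for addr in range(depth))


def bit_test(ram, width):
    """Walk a single '1' across the data width at address 0."""
    for bit in range(width):
        ram.write(0, 1 << bit)
        if ram.read(0) != 1 << bit:
            return False
    return True


print(word_test(ModelRam(32, 32), 32), bit_test(ModelRam(32, 32), 32))
print(word_test(ModelRam(32, 32, open_addr_bit=2), 32))
print(bit_test(ModelRam(32, 32, open_data_bit=5), 32))
```

In this model an unconnected address line makes two addresses share one location, so the unique data of the word test is overwritten, and an unconnected data line is caught when the walking '1' reaches that bit.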
The testbench is self-checking. If your RAM integration is successful, the simulation completes
with the following message. This is a snapshot running the golden reference.
Summary
=======
CortexA7 Configuration
----------------------
Num CPU = 2
L2 Present = YES
If your RAM integration is unsuccessful, the simulation completes and reports which RAMs
failed integration. For example, if the DDataRAM fails, the summary shows:
Summary
=======
CortexA7 Configuration
----------------------
Num CPU = 2
L2 Present = YES
!!! Processor 0 Cache RAM integration FAILED. Test completed with errors !!!
The simulation also reports expected and actual RAM read data for the failing RAM to assist
you in debugging any errors.
Note
For D-Cache Data types of error, the expected and actual results are those from the testbench,
not necessarily those written to the RAM.
• The reports from the memory integration testbench. See Confirmation of memory
integration on page 4-22.
Chapter 5
RTL Validation
This chapter describes how to validate the Cortex-A7 MPCore processor using test vectors. It
contains the following sections:
• About RTL validation.
• Resource requirements for RTL validation.
• Controls and constraints for RTL validation.
• Inputs for RTL validation.
• Flow for RTL validation.
• Outputs from RTL validation.
• Reference data for RTL validation.
[Figure: RTL validation process.
Inputs: Delivered RTL, or configured RTL.
Outputs: Validated delivered RTL or configured RTL, logs and reports.
Resources: Simulation environment, simulation testbench, compute resources, test suites, scripts,
test vectors.]
Caution
The validation described in this chapter only checks that you have successfully unpacked the
RTL delivered by ARM, and enables you to check your configuration of the RTL. It does not
validate your synthesized RTL, which must pass:
• Logical verification.
• Timing verification.
• Characterization.
See the reference methodology documents supplied by your EDA tool vendor for information
on these processes.
• Compile and run code on your configured RTL, while capturing the vectors. ARM
supplies tests you can run in the simulation environment, and you can also write your own
tests.
vectors.cfg Edit this file before running the testbench to set the configuration options.
crf/ Contains the test vector files in CRF format, compressed using the gzip command-line
utility. The CRF vectors contain stimulus and expected response values
for the input and output ports of the macrocell respectively.
Capture Creates test vectors in accordance with the settings displayed in the configuration
list.
Replay Runs the vectors on the RTL using the same configuration list.
2. Edit the configuration file, vectors.cfg, to include the correct configuration settings. See
vectors.cfg on page 5-5.
3. Edit dotcshrc to set the environment variables for the Cortex-A7 MPCore integration kit
correctly, and then source the file. See dotcshrc on page 5-6.
source dotcshrc
5.4.1 vectors.cfg
Use the options in vectors.cfg to set the correct configuration. The options are described in:
• Generic configuration.
• Replay configuration on page 5-6.
Generic configuration
Table 5-1 shows the options used by the capture stage. The replay stage uses these options when
MODEL is set to RTL. When MODEL is set to NETLIST, make sure the configuration variables match
those used for the netlist.
Option           Description
ETM_PRESENT      Include CTI and ETM for each processor in the multiprocessor device (see footnote a).
NUM_SPIS         Number of interrupts. Distributor interrupt lines, 0 <= NUM_SPIS <= 480 in steps of 32.
L1_ICACHE_SIZE   Selects I-Cache size for any processor in the multiprocessor device.
L1_DCACHE_SIZE   Selects D-Cache size for any processor in the multiprocessor device.
a. The Cortex-A7 MPCore processor vector capture and replay facility does not test any ETM functionality. See the
CoreSight SOC User Guide for more information.
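The interrupt count constraint in Table 5-1 can be checked before you run the capture stage. The following Python sketch is an illustration only. The function name valid_num_spis is hypothetical and is not an option of the capture or replay scripts.

```python
def valid_num_spis(value):
    """Check the Table 5-1 constraint: 0 to 480 inclusive, in steps of 32."""
    return 0 <= value <= 480 and value % 32 == 0


# Every value the constraint permits.
allowed = list(range(0, 481, 32))
print(allowed)
print(valid_num_spis(128), valid_num_spis(48))
```

The constraint permits 16 values, from 0 to 480.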
For more information on each option except TESTS, see the Cortex-A7 MPCore Technical
Reference Manual. For more information on the tests in the TESTS variable, see the Cortex-A7
MPCore Integration Manual.
Replay configuration
Option           Value                         Description
MODEL            "RTL"                         Specifies replay. You must set this to RTL for RTL validation.
STD_CELLS        "<add Standard cells path>"   Specifies the standard cell library location.
DUMPVCD_START    "<Start time>"                Specifies the VCD file dump start time during replay. If DUMPVCD is
                                               set to TRUE, set this variable to the cycle you want the VCD file to
                                               start to populate during replay.
DUMPVCD_CYCLES   "<Number of cycles>"          Specifies the number of cycles the VCD file is populated for. If
                                               DUMPVCD is set to TRUE, set this variable to the number of cycles the
                                               VCD file is populated for.
5.4.2 dotcshrc
The dotcshrc file is primarily used during the capture stage. You must source this file as
described in Controls and constraints for RTL validation on page 5-4 before execution to set
various environment variables. Table 5-3 shows the options.
Option Description
IK_LOCATION Only used by the capture stage to set the correct location of the integration kit. It must be an absolute path.
IK_OS Used by the capture stage to set ARMBST libraries correctly. Only change this option if the OS of the machine used
to build the integration kit is different from the OS of the machine used to run the integration kit.
5.5.1 Capture
To capture the test vectors, use the Cortex-A7 MPCore integration kit.
The perl script ./tools/capture reads vectors.cfg. It runs all the checks necessary to ensure the
environment is correct. The script also edits the files in the integration kit to prepare the
environment for ikvalidate. After the environment is prepared, ikvalidate is run from the
integration kit directory.
All the commands available to ikvalidate are also available here. For more information on
ikvalidate, see the Cortex-A7 MPCore Integration Manual.
At the end of the ikvalidate run, the script moves the crf files from the integration kit
location to the crf directory. It also reinstates the original files in the integration kit.
For information on how to debug failures that result from running ikvalidate, see the Cortex-A7
MPCore Integration Manual.
5.5.2 Replay
The perl script ./tools/replay reads the vectors.cfg file to set all the necessary
variables. It replays the crf files for all the tests in the test list in vectors.cfg.
To add verbose information, you can edit the Makefile to include a -v option to the perl script,
for example:
perl -w ./tools/replay -v
5.5.3 Makefile
The Makefile enables you to run capture or replay separately, or both together.
To run capture:
make capture
To run replay:
make replay
To remove working files and directories, generated crf files, and logs:
make cleanall
During the replay stage, various debug options are available in the Makefile.
Note
Only use these debug options if the replay stage is not simulating correctly.
By default, the replay stage automatically creates a vc file called replay.vc. If this file is not
suitable and you require a customized version, you can run replay_novc:
make replay_novc
Note
A file called replay.vc must exist when this option is used.
By default, the replay stage compiles and simulates. If you only require a compile stage, you
can run compile instead:
make compile
By default, the compile option automatically creates a vc file called replay.vc. If this file is not
suitable and you require a customized version, you can run compile_novc:
make compile_novc
Note
A file called replay.vc must exist when this option is used.
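The choice between the four replay-stage targets can be summarized in a short sketch. This is an illustration under stated assumptions, not part of the supplied flow: the function name replay_target and its parameters are hypothetical, and only the target names and the replay.vc precondition come from the text above.

```python
import os


def replay_target(compile_only=False, custom_vc=False, workdir="."):
    """Pick the Makefile target for the replay stage (hypothetical helper).

    compile_only: stop after compilation instead of compiling and simulating.
    custom_vc:    reuse an existing, customized replay.vc instead of the
                  automatically generated one.
    """
    target = "compile" if compile_only else "replay"
    if custom_vc:
        # The _novc targets require that a file called replay.vc already exists.
        if not os.path.isfile(os.path.join(workdir, "replay.vc")):
            raise FileNotFoundError("replay.vc must exist before using a _novc target")
        target += "_novc"
    return target


# Default behavior: compile and simulate with an automatically generated replay.vc.
print("make " + replay_target())
```

The sketch only builds the command name. It deliberately does not invoke make, so it is safe to run outside the validation directory.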
The validation testbench automatically detects your RTL configuration. Therefore, using the
testbench to validate your configuration ensures you have a safe database before you start your
implementation.
Table 5-4 shows the classes of test vectors supported by the simulation environment.
ca7_dbg_functional      Checks the basic functionality of integer operations on the multiprocessor device
ca7_vfp_functional      Checks the basic functionality of floating point operations on the multiprocessor device
ca7_advsimd_functional  Checks the basic functionality of NEON Advanced SIMD operations on the multiprocessor device
ca7_max_power           Vector designed to draw maximum power from the multiprocessor device under test
ca7_power_indicative    Runs the Dhrystone 2.1 open source benchmark program on a single processor in a multiprocessor device
The tests are compatible with all valid configurations of the Cortex-A7 MPCore processor apart
from:
• ca7_vfp_functional which requires the processor to be configured with either the Floating
Point Unit or NEON Media Processing Engine.
When these tests are run on a processor without the appropriate functionality, the tests are
skipped, but the run still reports a pass.
See the Cortex-A7 MPCore Integration Manual for a detailed description of these tests.
The testbench prints simulation progress information to the terminal display, with a summary at
the end of the test run, for example:
Test name                    | Results | Vectors          | Comments
                             |         | Applied | Errors |
ca7_dbg_functional           | PASSED  | 7301    | 0      |
ca7_vfp_functional           | PASSED  | 2168    | 0      |
ca7_advsimd_functional       | PASSED  | 2153    | 0      |
ca7_cross_trigger_functional | PASSED  | 41586   | 0      |
ca7_max_power                | PASSED  | 4958    | 0      |
ca7_power_indicative         | PASSED  | 124248  | 0      |
Note
The number of vectors applied can vary according to the configuration of the Cortex-A7
MPCore processor.
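If you save the terminal output, the summary can be checked automatically. The following is a minimal sketch that assumes the summary keeps the pipe-separated layout shown in the example above. The function names are hypothetical, and treating any result other than PASSED as a failure is an assumption, because the example only shows passing tests.

```python
def parse_summary(text):
    """Parse the summary into {test name: (result, vectors applied, errors)}."""
    results = {}
    for line in text.splitlines():
        fields = [field.strip() for field in line.split("|")]
        # Data rows carry a test name, a result, and two integer counts.
        # Header rows fail this check and are skipped.
        if len(fields) < 4 or not fields[0]:
            continue
        if not (fields[2].isdigit() and fields[3].isdigit()):
            continue
        results[fields[0]] = (fields[1], int(fields[2]), int(fields[3]))
    return results


def failing_tests(results):
    """Return the names of tests that did not pass or that reported errors."""
    return sorted(
        name
        for name, (result, _applied, errors) in results.items()
        if result != "PASSED" or errors != 0
    )


SAMPLE = """\
Test name | Results | Vectors | Comments
| | Applied | Errors |
ca7_dbg_functional | PASSED | 7301 | 0 |
ca7_max_power | PASSED | 4958 | 0 |
"""

print(failing_tests(parse_summary(SAMPLE)))  # prints []
```

Because the number of vectors applied varies with the configuration, the sketch checks only the result and error columns, not the vector counts.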
All test vector files produced by the capture phase of the flow are compressed using the gzip
command-line utility. The supplied scripts automatically decompress these files during the
replay phase of the flow.
Chapter 6
Floorplan Guidelines
This chapter describes the floorplan used as a starting point for your design. It contains the
following sections:
• About floorplanning.
• Resource requirements for floorplans.
• Controls and constraints for floorplans.
• Inputs for floorplans.
• Considerations for floorplans.
• Outputs from floorplans.
• Reference data for floorplans.
Floorplanning
Inputs: Example floorplans, Block placement guidelines
Outputs: Floorplan, Reports and logs
Resources: Floorplanning tool
[Figure: example processor floorplan. Recoverable labels: Processor; Data, Tag, Dirty, and Instruction RAMs; L1 TLB.]
Figure 6-3 on page 6-7 shows an example Cortex-A7 MPCore floorplan including two
processors in the multiprocessor device.
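[Figure 6-3: example Cortex-A7 MPCore floorplan with two processors. Recoverable labels: Processor 0, Processor 1, SCU and processor integration layer (PIL), SCU tag RAM, L2 tag RAM, L2 data RAM, Pins.]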
Note
• The aspect ratios shown for memories are arbitrary and you might not be able to achieve
these using your memory compiler.
• You might have to adjust the floorplan if your technology prevents you from routing over
the RAM blocks.
• Clock modules must have placement constraints that prevent these modules being spread
around the design.
Floorplanning does not place the standard cells. ARM expects the synthesis tool to place the
standard cells using a flat, physically unconstrained, placement methodology.
The SCU communicates with the L2 cache, so ARM recommends you place these modules
close together. See Figure 6-3 on page 6-7.
You must place the data cache tag RAM and data cache data RAM close to the data cache logic.
In particular, data cache tag RAM must be close to the ca7dcu module.
ARM recommends you put some bounds on clock modules to avoid too much dispersion of the
clock module logic.
RAM inputs
Because the timing paths from these blocks are critical, these blocks must be placed as close as
possible to the standard cell area.
Although the address set-up and data out times on the tag and dirty RAM blocks are small
compared to the times for the main memories, the output is required earlier.
If necessary, you can place the dirty RAM blocks further from the standard cell area than the tag
RAM blocks because:
• Data returning from dirty RAM blocks does not go through comparators, unlike the tag
RAM blocks.
The TLB blocks interface with the instruction cache side, data cache side and the Bus Interface
Unit (BIU). Place these as Figure 6-3 on page 6-7 shows. This keeps the associated logic close
to the BIU and does not affect other critical paths in the design.
Note
All other signals in the design are not timing-critical. You can place them at locations optimal
for the SoC floorplan.
You must place the clock, resets, and interrupts in the center of the floorplan. For details of the
clock, reset and interrupt signals see:
• Clock signals on page 1-4.
• Reset signals on page 1-5.
• Trace interfaces on page 1-9.
Ports relating to the ETM are not timing critical. However, the placement of the ports influences
the placement of the standard cells in the processor. See Trace interfaces on page 1-9.
Note
When placing the ETM ports, you must not compromise the location of the timing-critical ports.
Wide pins
For purposes of rotating the macrocell when the processor is included as a black box in a design,
ARM recommends that you either:
• Make the pins wider, so the router can drop vias on top of them or next to them.
• Create multi-layer pins.
ARM recommends that you incorporate the power grid in the floorplan passed to your synthesis
tool so that the tool has a more accurate representation of the available routing resource.
You must design the grid to meet the requirements of your library. However, ARM recommends
a grid that satisfies an IR drop of 2%, VDD and VSS combined.
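As a worked example of the IR drop recommendation, the following sketch checks a measured grid against a 2% budget. It assumes that the 2% figure is an upper bound on the sum of the VDD drop and the VSS rise, expressed as a fraction of the nominal supply voltage. That interpretation, the function names, and the example voltages are assumptions for illustration. Your library requirements take precedence.

```python
def ir_drop_fraction(vdd_drop_v, vss_rise_v, nominal_supply_v):
    """Combined VDD drop and VSS rise as a fraction of the nominal supply."""
    if nominal_supply_v <= 0:
        raise ValueError("nominal supply voltage must be positive")
    return (vdd_drop_v + vss_rise_v) / nominal_supply_v


def meets_ir_budget(vdd_drop_v, vss_rise_v, nominal_supply_v, budget=0.02):
    """True when the combined IR drop is within the budget (2% by default)."""
    return ir_drop_fraction(vdd_drop_v, vss_rise_v, nominal_supply_v) <= budget


# Hypothetical numbers: 12 mV VDD drop and 6 mV VSS rise on a 1.0 V supply
# give a combined drop of 1.8%, which is inside the 2% budget.
print(meets_ir_budget(0.012, 0.006, 1.0))
```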
Power-gating a design creates additional requirements for the power grid and standard cell
placement. A power-gated floorplan has physical regions for each of the power domains. The
floorplanning implementation stage inserts power switch cells to supply the power rails of each
power-gated domain. Where the implementation reference methodology includes power-gating,
a UPF file and consistent floorplanning scripts are provided. Any change in the specification for
power intent requires you to update the floorplan and power grid.
Chapter 7
Design for Test
This chapter describes production testing for the processor. It contains the following sections:
• About design for test features.
• Reference data for DFT.
Use your standard implementation reference methodology to generate test patterns. See the
supplied implementation reference methodology documents for more information.
See the documentation from your EDA tool vendor for information about:
• The requirements for DFT.
• The DFT controls and constraints.
• The DFT features.
• Confirmation of DFT feature operation, and test coverage.
• Solving DFT feature problems.
Table 7-1 shows the scan test ports that access the internal scan chains in the macrocell.
DFTSE          Input  Enables scan chains. The high fan-out of this signal means that it must be a false path, and it
                      cannot be switched at-speed. This signal must be tied LOW during functional mode.
DFTRSTDISABLE  Input  Enables test tools that might not understand a pipelined reset to bypass reset repeaters.
                      This makes the internal resets single cycle signals, although only at low frequencies.
                      The DFTRSTDISABLE signal blocks internally generated resets when set to 1. When set to 0,
                      this signal enables generated resets to propagate. During test, resets only have to be blocked
                      during scan shift. This prevents resets propagating while scan chains are shifted, and permits
                      ATPG to test the related logic when not shifting.
DFTRAMHOLD     Input  Enables the RAMs to hold data by disabling the chip select to the RAMs when the signal is
                      asserted. This permits:
                      • RAMs to maintain values during tests like IddQ.
                      • Testing shadow logic of the RAMs by testing through the RAMs.
                      ATPG tools prefer the data in the RAMs to be static during shift. In this scenario,
                      DFTRAMHOLD must be enabled during shift and disabled during capture.
Figure 7-1 shows how to use DFTRAMHOLD to disable chip selects to the RAMs.
[Figure 7-1: logic diagram with inputs DFTRAMHOLD and Functional chip select, and output RAM chip select.]
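The effect of DFTRAMHOLD on the RAM chip select can be expressed as a small behavioral model. This is a sketch only. It assumes active-high signals and does not describe the gate-level implementation or the signal polarities in your RAM compiler. The function name ram_chip_select is hypothetical.

```python
def ram_chip_select(functional_chip_select, dftramhold):
    """Behavioral model: asserting DFTRAMHOLD disables the chip select to the RAM.

    Assumption: both signals are active-high.
    """
    return functional_chip_select and not dftramhold


# Truth table: the RAM is only selected when the functional logic requests an
# access and DFTRAMHOLD is deasserted, so RAM contents stay static during shift.
for hold in (False, True):
    for select in (False, True):
        print(int(hold), int(select), int(ram_chip_select(select, hold)))
```

In the ATPG scenario described above, DFTRAMHOLD would be asserted during shift, which corresponds to the rows where the output is always 0.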
Note
See the supplied implementation reference methodology documentation for details of the scan
ports.
The synthesis tools supplied by your EDA tool vendor might enable you to insert a test wrapper
that you can use to gain access to the inputs and outputs of the processor in production test. If
your tools support test wrapper insertion you can choose whether or not to implement this
wrapper.
A test wrapper gives increased test coverage when access to the primary inputs and outputs is
impossible because the macrocell is deeply embedded in your SoC design.
Note
See the supplied implementation reference methodology documentation for details of the test
wrapper ports.
Chapter 8
Dynamic Verification
This chapter describes the dynamic verification process. It contains the following sections:
• About dynamic verification.
• Resource requirements for dynamic verification.
• Controls and constraints for dynamic verification.
• Inputs for dynamic verification.
• Flow for dynamic verification.
• Outputs from dynamic verification.
• Confirmation of dynamic verification.
• Measuring power consumption.
Note
Typically, your contract with ARM requires you to use vector replay and a Logical Equivalence
Checking (LEC) tool to validate your implementation. See Chapter 9 Sign-off. Equivalence
checking tools use formal mathematical techniques to verify logic functions between two
implementations of a design. This shows whether the design functionality is consistent between
the two implementations. You must maintain the functionality of the configured macrocell at
each stage of the design process. You can use equivalence checkers to verify that the RTL
functionality is maintained through successive iterations of the netlist, by a process of building,
mapping, and comparing the design. See the reference methodology documents supplied by
your EDA tool vendor for details of the LEC tools.
Verification of the netlist by vector replay requires running CRF test vectors, captured from an
RTL reference, on your netlist. This provides a quick method of checking the netlist, and can
be used in addition to formal equivalence checking. The supplied vectors are captured using the
RTL as a reference, so replaying the vectors checks that your netlist matches the cycle-by-cycle
behavior of the RTL reference. Dynamic verification of the netlist uses the same vectors, flow, and
tools as the RTL validation process.
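The idea of a cycle-by-cycle check can be illustrated with a short sketch. It is not the CRF format or the supplied testbench, both of which are outside the scope of this example. The function name first_mismatch and the sample values are hypothetical, and each list element stands for the sampled output value in one cycle.

```python
def first_mismatch(expected, observed):
    """Return (cycle, expected value, observed value) for the first differing
    cycle, or None when the two traces match in every cycle."""
    if len(expected) != len(observed):
        raise ValueError("traces must cover the same number of cycles")
    for cycle, (exp, obs) in enumerate(zip(expected, observed)):
        if exp != obs:
            return cycle, exp, obs
    return None


# Hypothetical traces: values captured from the RTL reference and values
# observed from the netlist. They differ in cycle 2.
rtl_reference = [0x0, 0x3, 0x7, 0x7]
netlist = [0x0, 0x3, 0x5, 0x7]
print(first_mismatch(rtl_reference, netlist))  # prints (2, 7, 5)
```

A replay that reports no mismatch shows agreement only for the cycles that the vectors exercise, which is why vector replay complements formal equivalence checking rather than replacing it.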
Figure 8-1 shows the top-level inputs, resources, outputs, and controls and constraints for
dynamic verification of your netlist.
Dynamic Verification of netlist
Inputs: Netlist
Outputs: Reports and logs
Resources: CRF test vectors, HDL Simulator, Scripts, Testbench, Gate-level library
The test vectors do not completely cover the functionality of the processor. Therefore, dynamic
verification of the netlist is not an adequate check for design sign-off.
Note
• In addition to passing dynamic verification, your netlist must pass:
— LEC.
— Timing verification.
See the reference methodology documents supplied by your EDA tool vendor for
information about these processes.
• Successful validation of your RTL is a sign-off requirement, see Chapter 9 Sign-off.
Capture Creates test vectors in accordance with the settings displayed in the configuration
list using the RTL as a reference.
Replay Runs the vectors on the netlist using the same configuration list.
Note
If you have run the capture stage as part of the RTL validation described in Chapter 5 RTL
Validation using the same configuration as the one used for the netlist under test, you can skip
the capture stage.
2. Edit the configuration file, vectors.cfg, to correct the configuration settings. The
configuration settings must be the same as those used to generate the netlist under test.
See vectors.cfg on page 8-5.
3. Edit dotcshrc to set the environment variables for the integration kit correctly, and then
source the file. See dotcshrc on page 8-6.
source dotcshrc
8.4.1 vectors.cfg
Use the options in vectors.cfg to set the correct configuration. The options are described in:
• Generic configuration.
• Replay configuration on page 8-6.
Generic configuration
Table 8-1 shows the options used by capture. Replay uses these options when MODEL is set to RTL.
When MODEL is set to NETLIST, make sure the configuration variables match those used for the
netlist.
Option          Description
ETM_PRESENT     Include CTI and ETM for each processor in the multiprocessor device. [a]
NUM_SPIS        Number of interrupts. Distributor interrupt lines, 0 <= NUM_SPIS <= 480 in steps of 32.
L1_ICACHE_SIZE  Selects L1 instruction cache size for any processor in the multiprocessor device.
L1_DCACHE_SIZE  Selects L1 data cache size for any processor in the multiprocessor device.
a. The Cortex-A7 MPCore processor vector capture and replay facility does not test any ETM functionality. See
the CoreSight SOC User Guide for more information.
For more information on each option except TESTS, see the Cortex-A7 MPCore Technical
Reference Manual. For more information on the tests in the TESTS variable, see the Cortex-A7
MPCore Integration Manual.
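The range constraint on the number of interrupts can be checked before you edit vectors.cfg. The following is a minimal sketch of the stated rule, 0 to 480 in steps of 32. The function name valid_num_spis is hypothetical, and the sketch does not parse vectors.cfg itself.

```python
def valid_num_spis(value):
    """True when value is a legal interrupt count: 0 to 480 in steps of 32."""
    return isinstance(value, int) and 0 <= value <= 480 and value % 32 == 0


# The rule allows 16 distinct settings: 0, 32, 64, and so on up to 480.
LEGAL_VALUES = list(range(0, 481, 32))

print(valid_num_spis(128))  # prints True
print(valid_num_spis(100))  # prints False
```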
Replay configuration
MODEL           "RTL"                        Specifies replay. You must set this to NETLIST for dynamic verification.
STD_CELLS       "<add Standard cells path>"  Specifies standard cell library location.
DUMPVCD_START   "<Start time>"               Specifies the VCD file dump start time during replay. If DUMPVCD is set to
                                             TRUE, set this variable to the cycle you want the VCD file to start to
                                             populate during replay.
DUMPVCD_CYCLES  "<Number of cycles>"         Specifies the number of cycles the VCD file is populated for. If DUMPVCD is
                                             set to TRUE, set this variable to the number of cycles the VCD file is
                                             populated for.
8.4.2 dotcshrc
The dotcshrc file is primarily used during the capture stage. You must source this file as
described in Controls and constraints for dynamic verification on page 8-4 before execution to
set various environment variables. Table 8-3 shows the options.
Option       Description
IK_LOCATION  Only used by the capture stage to set the correct location of the integration kit. It must be an absolute path.
IK_OS        Used by the capture stage to set ARMBST libraries correctly. Only change this option if the OS of the machine
             used to build the integration kit is different from the OS of the machine used to run the integration kit.
8.5.1 Capture
Note
You only have to run the capture stage if you have not run RTL validation as described in
Chapter 5 RTL Validation, or if you ran validation with a different configuration.
If you have run RTL validation and the configuration has not changed, you can replay the
available crf files directly.
To capture the test vectors, use the Cortex-A7 MPCore integration kit.
The perl script ./tools/capture reads vectors.cfg. It runs all the checks necessary to ensure the
environment is correct. The script also edits the files in the integration kit to prepare the
environment for ikvalidate. After the environment is prepared, ikvalidate is run from the
integration kit directory.
All the commands available to ikvalidate are also available here. For more information on
ikvalidate, see the Cortex-A7 MPCore Integration Manual.
At the end of the ikvalidate run, the capture script moves the crf files from the integration kit
location to the crf directory. It also reinstates the original files in the integration kit.
The Makefile enables you to run capture or replay separately, or both together.
8.5.2 Replay
The supplied Perl script ./tools/replay reads the vectors.cfg file to set all the necessary
variables. It replays the crf files for every test in the test list in vectors.cfg.
The replay testbench contains a define, ARM_NETLIST, that is used to replay a netlist. The define
is automatically set in the replay.vc file when the MODEL variable is set to NETLIST. By default,
16 scan chain pins are defined when you instantiate the unit under test (uut). If you have a netlist
with a different number of scan chains, you can edit the netlist as required. The define is also
used to recognize an RTL simulation and to add the correct set of parameters to the uut.
The Makefile enables you to run capture or replay separately, or both together.
To add verbose information, you can edit the Makefile to include a -v option to the perl script,
for example:
perl -w ./tools/replay -v
Note
If you get x-propagation problems when simulating your netlist you might have to:
• Use simulator commands to initialize all sequential elements in your netlist to a non-x
value.
8.5.3 Makefile
The Makefile enables you to run capture or replay separately, or both together.
To run capture:
make capture
To run replay:
make replay
To remove working files and directories, generated crf files, and logs:
make cleanall
During the replay stage, various debug options are available in the Makefile.
Note
Only use these debug options if the replay stage is not simulating correctly.
By default, the replay stage automatically creates a vc file called replay.vc. If this file is not
suitable and you require a customized version, you can run replay_novc:
make replay_novc
Note
A file called replay.vc must exist when this option is used.
By default, the replay stage compiles and simulates. If you only require a compile stage, you
can run compile instead:
make compile
By default, the compile option automatically creates a vc file called replay.vc. If this file is not
suitable and you require a customized version, you can run compile_novc:
make compile_novc
Note
A file called replay.vc must exist when this option is used.
The testbench prints simulation progress information to the terminal display, with a summary at
the end of the test run, for example:
Test name                    | Results | Vectors          | Comments
                             |         | Applied | Errors |
ca7_dbg_functional           | PASSED  | 7301    | 0      |
ca7_vfp_functional           | PASSED  | 2168    | 0      |
ca7_advsimd_functional       | PASSED  | 2153    | 0      |
ca7_cross_trigger_functional | PASSED  | 41586   | 0      |
ca7_max_power                | PASSED  | 4958    | 0      |
ca7_power_indicative         | PASSED  | 124248  | 0      |
Note
The number of vectors applied can vary according to the configuration of the Cortex-A7
MPCore processor.
8.8.1 ca7_power_indicative.s
To measure the power consumption you have to identify the repeated Dhrystone loops:
2. Use the make capture command to execute the power_indicative test. This generates the
logical/cortexa7_intkit/validation/logs/ca7_power_indicative/tarmac_cluster0_cpu0.log
file.
3. In the tarmac_cluster0_cpu0.log file, search for 0xc9 to find an
instruction of the form MOV r0,#0xc9.
4. After the MOV r0,#0xc9 instruction, search for the next BL instruction, for example
BL {pc}-0xc4. This branch indicates the start and endpoint for each Dhrystone loop.
Note
ARM recommends you use the fourth iteration for the measurement. Do not use the first
one or two loop iterations for the measurements, since the caches are still loading.
5. Place the starting timestamp and loop length in the vectors.cfg file. Use the parameters:
DUMPVCD = "TRUE" ;
DUMPVCD_START = "" ; This time is in nanoseconds
DUMPVCD_CYCLES = "" ; This is the number of cycles for the power loop
6. For the power measurement, you can use vector replay to generate a .vcd file.
The current drawn during the power_indicative vector is constant with any number of
processors because only one processor is active.
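The start time and loop length for vectors.cfg can be derived from the timestamps of successive loop branches. The following sketch assumes that you have already read the timestamp, in nanoseconds, of each BL instruction that marks a loop boundary from the tarmac log, and that you know the simulation clock period. The function name vcd_window, the clock period, and the sample timestamps are hypothetical. The sketch does not parse the tarmac log.

```python
def vcd_window(loop_start_times_ns, clock_period_ns, iteration=4):
    """Return (start time in ns, loop length in cycles) for one loop iteration.

    iteration is 1-based. The default selects the fourth iteration, because
    the caches are still loading during the first iterations.
    """
    if len(loop_start_times_ns) < iteration + 1:
        raise ValueError("need the start of the following iteration as well")
    start = loop_start_times_ns[iteration - 1]
    end = loop_start_times_ns[iteration]
    cycles = round((end - start) / clock_period_ns)
    return start, cycles


# Hypothetical timestamps of six successive loop branches, with a 10 ns clock.
times_ns = [1000, 5000, 8200, 11400, 14600, 17800]
start, cycles = vcd_window(times_ns, clock_period_ns=10)

print('DUMPVCD = "TRUE" ;')
print(f'DUMPVCD_START = "{start}" ;')    # prints DUMPVCD_START = "11400" ;
print(f'DUMPVCD_CYCLES = "{cycles}" ;')  # prints DUMPVCD_CYCLES = "320" ;
```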
8.8.2 ca7_max_power.s
To measure the power consumption you have to identify the repeated max_power loops:
2. Use the make capture command to execute the max_power test. This generates the
logical/cortexa7_intkit/validation/logs/ca7_max_power/tarmac_cluster0_cpu0.log
file. Log files are also generated for other processors in the multiprocessor device.
3. In the tarmac_cluster0_cpu0.log file, search for the SUBS r10, r10, #1 instruction. The
loop length should be 18 cycles after the fifth iteration.
4. Place the starting timestamp and loop length in the vectors.cfg file. Use the parameters:
DUMPVCD = "TRUE" ;
DUMPVCD_START = "" ; This time is in nanoseconds
DUMPVCD_CYCLES = "" ; This is the number of cycles for the power loop
5. For the power measurement, you can use vector replay to generate a .vcd file.
The expected max_power consumption is 1.9 times the power_indicative figure when
there is a single processor.
The current drawn during the max_power vector scales up with an increase in the number of
processors.
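As a sanity check on single-processor measurements, you can compare the measured ratio against the expected figure of 1.9. The following sketch is illustrative only. The function names, the 10% tolerance, and the example power values are assumptions, and the check does not apply to configurations with more than one processor, where max_power scales with the processor count.

```python
EXPECTED_RATIO = 1.9  # max_power relative to power_indicative, single processor


def max_power_ratio(max_power_mw, power_indicative_mw):
    """Ratio of the measured max_power figure to the power_indicative figure."""
    if power_indicative_mw <= 0:
        raise ValueError("power_indicative must be positive")
    return max_power_mw / power_indicative_mw


def ratio_is_plausible(max_power_mw, power_indicative_mw, tolerance=0.10):
    """True when the ratio is within the given fraction of the expected ratio.

    The default tolerance of 10% is an arbitrary assumption for this sketch.
    """
    ratio = max_power_ratio(max_power_mw, power_indicative_mw)
    return abs(ratio - EXPECTED_RATIO) <= tolerance * EXPECTED_RATIO


# Hypothetical measurements in milliwatts.
print(ratio_is_plausible(95.0, 50.0))
```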
Chapter 9
Sign-off
In addition to your normal SoC flow sign-off checks, you must satisfy additional verification
criteria before you sign off the macrocell design. This chapter describes the sign-off criteria. It
contains the following sections:
• About sign-off.
• Obligations for sign-off.
• Requirements for sign-off.
• Steps for sign-off.
• Completion of sign-off.
Sign-off
Inputs: Validation reports and logs, Logical Equivalence Check reports and logs, Timing Verification reports and logs, Dynamic Verification reports and logs
Outputs: Signed off macrocell
Resources: Signatories
You must complete the following implementation stages successfully for sign-off:
• Timing verification, by Static Timing Analysis (STA) of the post-layout netlist. See the
supplied implementation reference methodology documents.
Reports and logs from each of these stages are required for sign-off.
A certain minimum set of deliverable outputs is required at the end of the implementation. See
Completion of sign-off on page 9-6.
All ARM partners must fulfill the terms of their contract with ARM to complete sign-off.
Note
You can change the timing constraints to suit your design provided it still meets all the
mandatory requirements for sign-off.
• Design Rule Check (DRC). See the documents supplied by your EDA tool vendor.
• Layout Versus Schematic (LVS). See the documents supplied by your EDA tool vendor.
• Back-annotated netlist simulation of the ATPG vectors. See the supplied implementation
reference methodology documents.
You must run the supplied test vectors on the configured RTL to verify the Cortex-A7 MPCore
RTL deliverables before you begin the synthesis stage. See Chapter 5 RTL Validation. This
confirms that you have successfully installed the Cortex-A7 MPCore processor RTL.
9.4.2 Pre-layout
You must verify the functionality of the compiled netlist before you sign off the macrocell. This
verification consists of proving logical equivalence between the validated RTL and the
compiled netlist using formal verification tools. See the supplied implementation reference
methodology documents.
9.4.3 Post-layout
You must verify the functionality of the final placed-and-routed netlist before you sign off the
macrocell. This verification consists of proving logical equivalence between the validated RTL
and the final placed-and-routed netlist using formal verification tools. See the supplied
implementation reference methodology documents.
Optionally, you can also run vector capture and replay on the compiled netlist. For more
information, see:
• Capture on page 8-7.
• Replay on page 8-7.
You must use Static Timing Analysis (STA) to verify the timing of the final placed-and-routed
netlist before you sign off your netlist. You must also run some or all of the supplied test vectors,
and run the supplied validation tests on a netlist with back-annotated timing as a final check.
Appendix A
Revisions
This appendix describes the technical changes between released issues of this book.
Updated the use of the nVFIQ[3:0] and nVIRQ[3:0] interrupt signals | Table 1-11 on page 1-11 | All
Clarified multicycle setup path cycles that relate to the L2 data RAM read when the L2 cache is present | Table 1-18 on page 1-15 | All
Updated instruction cache tag RAMs address connection information | Table 4-11 on page 4-12 | All
No changes | - | -
Added instructions on how to measure power consumption | Measuring power consumption on page 8-12 | All
Clarified that you use EDA tool vendor documentation for the DRC and LVS recommended sign-off stages | Recommended for sign-off on page 9-4 | All
Clarified the clock signals description | Clock signals on page 1-4 | All
Clarified the note about which configuration file you use to specify the L1 cache sizes | Configuring the L1 cache sizes on page 4-9 | All
Added a note that states SCU duplicate tag RAMs must always be instantiated | SCU duplicate tag RAMs on page 4-16 | All
Added CTI functional test | The test vector classes on page 5-10 | r0p5 onwards
Removed inverter on the output of the AND gate | Figure 7-1 on page 7-3 | All
Updated the test vector summary report to include ca7_cross_trigger_functional | Confirmation of dynamic verification on page 8-11 | r0p5 onwards