Cortex™-A7 MPCore™

Revision: r0p5

Configuration and Sign-off Guide

Confidential

Copyright © 2011-2013 ARM. All rights reserved.

ARM DII 0256F (ID041213)

Release Information

The following changes have been made to this book.

Change history

Date Issue Confidentiality Change

28 September 2011 A Confidential First release for r0p0

09 November 2011 B Confidential First release for r0p1

10 January 2012 C Confidential First release for r0p2

15 May 2012 D Confidential First release for r0p3

19 November 2012 E Confidential First release for r0p4

10 April 2013 F Confidential First release for r0p5

Proprietary Notice

Words and logos marked with ® or ™ are registered trademarks or trademarks of ARM® in the EU and other countries,
except as otherwise stated below in this proprietary notice. Other brands and names mentioned herein may be the
trademarks of their respective owners.

Neither the whole nor any part of the information contained in, or the product described in, this document may be
adapted or reproduced in any material form except with the prior written permission of the copyright holder.

The product described in this document is subject to continuous developments and improvements. All particulars of the
product and its use contained in this document are given by ARM in good faith. However, all warranties implied or
expressed, including but not limited to implied warranties of merchantability, or fitness for purpose, are excluded.

This document is intended only to assist the reader in the use of the product. ARM shall not be liable for any loss or
damage arising from the use of any information in this document, or any error or omission in such information, or any
incorrect use of the product.

Where the term ARM is used it means “ARM or any of its subsidiaries as appropriate”.

Confidentiality Status

This document is Confidential. This document may only be used and distributed in accordance with the terms of the
agreement entered into by ARM and the party that ARM delivered this document to.

Product Status

The information in this document is final, that is for a developed product.

Web Address

http://www.arm.com

Contents
Cortex-A7 MPCore Configuration and Sign-off Guide

Preface
About this book
Feedback

Chapter 1 Introduction
1.1 About implementation
1.2 Implementation resources
1.3 Implementation controls and constraints
1.4 Implementation inputs
1.5 Implementation flow
1.6 Implementation outputs
1.7 Implementation reference data

Chapter 2 Key Implementation Points
2.1 About key implementation points
2.2 Key implementation tasks

Chapter 3 Configuration Guidelines
3.1 About configuration guidelines
3.2 Configuration options

Chapter 4 Memory Integration
4.1 About memory integration
4.2 Resource requirements for memory integration
4.3 Controls and constraints for memory integration
4.4 Blocks for memory integration
4.5 Flow for memory integration
4.6 Confirmation of memory integration
4.7 Outputs from memory integration

Chapter 5 RTL Validation
5.1 About RTL validation
5.2 Resource requirements for RTL validation
5.3 Controls and constraints for RTL validation
5.4 Inputs for RTL validation
5.5 Flow for RTL validation
5.6 Outputs from RTL validation
5.7 Reference data for RTL validation

Chapter 6 Floorplan Guidelines
6.1 About floorplanning
6.2 Resource requirements for floorplans
6.3 Controls and constraints for floorplans
6.4 Inputs for floorplans
6.5 Considerations for floorplans
6.6 Outputs from floorplans
6.7 Reference data for floorplans

Chapter 7 Design for Test
7.1 About design for test features
7.2 Reference data for DFT

Chapter 8 Dynamic Verification
8.1 About dynamic verification
8.2 Resource requirements for dynamic verification
8.3 Controls and constraints for dynamic verification
8.4 Inputs for dynamic verification
8.5 Flow for dynamic verification
8.6 Outputs from dynamic verification
8.7 Confirmation of dynamic verification
8.8 Measuring power consumption

Chapter 9 Sign-off
9.1 About sign-off
9.2 Obligations for sign-off
9.3 Requirements for sign-off
9.4 Steps for sign-off
9.5 Completion of sign-off

Appendix A Revisions

Preface

This preface introduces the Cortex-A7 MPCore Configuration and Sign-off Guide. It contains
the following sections:
• About this book.
• Feedback.

Note
Throughout this document, implementation_<technology> is a directory name that identifies the
process technology you are using to implement the Cortex-A7 MPCore processor. For
example, implementation_<technology> might be implementation_tsmc_cln32lp.

About this book

This book is for the Cortex-A7 MPCore processor, which is a multiprocessor device. The book
describes the configuration options that designers must choose before starting
implementation or simulation of the CORTEXA7INTEGRATION layer. It also provides
information for those who want to implement the CORTEXA7 layer only.

Implementation obligations

This book is designed to help you implement an ARM product. The extent to which the
deliverables can be modified or disclosed is governed by the contract between ARM and the
Licensee. There might be validation requirements which, if applicable, are detailed in the
contract between ARM and the Licensee and which, if present, must be complied with prior to
the distribution of any devices incorporating the technology described in this document.
Reproduction of this document is only permitted in accordance with the licenses granted to the
Licensee.

ARM assumes no liability for your overall system design and performance. Verification
procedures defined by ARM are only intended to verify the correct implementation of the
technology licensed by ARM, and are not intended to test the functionality or performance of
the overall system. You or the Licensee are responsible for performing system level tests.

You are responsible for applications that are used in conjunction with the ARM technology
described in this document, and to minimize risks, adequate design and operating safeguards
must be provided for by you. Publication by ARM in this book of information
regarding third party products or services is not an express or implied approval or endorsement
of the use thereof.

Product revision status

The rnpn identifier indicates the revision status of the product described in this book, where:
rn Identifies the major revision of the product.
pn Identifies the minor revision or modification status of the product.
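
As a minimal illustration of this convention, the following Python sketch splits an rnpn identifier into its major and minor parts. The function name parse_revision is hypothetical and is not part of any ARM deliverable.

```python
import re
from typing import Tuple


def parse_revision(identifier: str) -> Tuple[int, int]:
    """Split an rnpn identifier such as 'r0p5' into (major, minor)."""
    match = re.fullmatch(r"r(\d+)p(\d+)", identifier)
    if match is None:
        raise ValueError(f"not an rnpn identifier: {identifier!r}")
    # rn is the major revision, pn is the minor revision or modification status.
    return int(match.group(1)), int(match.group(2))


major, minor = parse_revision("r0p5")
print(f"major revision {major}, minor revision {minor}")
```

For the revision that this book describes, r0p5, the sketch reports major revision 0 and minor revision 5.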

Intended audience

This manual is written for experienced hardware engineers who might or might not have
experience of ARM products, but who have experience of writing Verilog and of performing
synthesis, and who want to implement a Cortex-A7 MPCore processor in a System-on-Chip
(SoC) design.

Using this book

This book is organized into the following chapters:

Chapter 1 Introduction
Read this for a description of the Cortex-A7 MPCore processor design platforms
and tools, including the supported design flow, directory structure, and design
hierarchy.

Chapter 2 Key Implementation Points

Read this for an outline of the Cortex-A7 MPCore processor implementation.

Chapter 3 Configuration Guidelines

Read this for a description of the configuration options available for your
Cortex-A7 MPCore processor, how to configure them, and how to check the
results.

Chapter 4 Memory Integration

Read this for a description of memory integration, associated scripts, and how to
check the results.

Chapter 5 RTL Validation

Read this for a description of how to check your RTL configuration.

Chapter 6 Floorplan Guidelines

Read this for a description of the layout recommendations when floorplanning the
Cortex-A7 MPCore processor.

Chapter 7 Design for Test

Read this for a description of the Design for Test (DFT) features of the Cortex-A7
MPCore processor.

Chapter 8 Dynamic Verification

Read this for a description of functional verification of your netlist.

Chapter 9 Sign-off
Read this for a description of the ARM verification criteria, and how to sign off
your design.

Appendix A Revisions
Read this for a description of technical changes in this document.

Glossary

The ARM glossary is a list of terms used in ARM documentation, together with definitions for
those terms. The ARM glossary does not contain terms that are industry standard unless the
ARM meaning differs from the generally accepted meaning.

See ARM Glossary, http://infocenter.arm.com/help/topic/com.arm.doc.aeg0014-/index.html.

Conventions

Conventions that this book can use are described in:

• Typographical conventions.
• Signals.

Typographical conventions

The following table describes the typographical conventions:

Style Purpose

italic Introduces special terminology, denotes cross-references, and citations.

bold Highlights interface elements, such as menu names. Denotes signal names. Also used for terms in descriptive
lists, where appropriate.

monospace Denotes text that you can enter at the keyboard, such as commands, file and program names, and source code.

monospace (underlined) Denotes a permitted abbreviation for a command or option. You can enter the underlined text instead of the full
command or option name.

monospace italic Denotes arguments to monospace text where the argument is to be replaced by a specific value.

monospace bold Denotes language keywords when used outside example code.

< and > Encloses replaceable terms for assembler syntax where they appear in code or code fragments. For example:
MRC p15, 0, <Rd>, <CRn>, <CRm>, <Opcode_2>

SMALL CAPITALS Used in body text for a few terms that have specific technical meanings, that are defined in the ARM glossary.
For example, IMPLEMENTATION DEFINED, IMPLEMENTATION SPECIFIC, UNKNOWN, and UNPREDICTABLE.

Signals

The signal conventions are:

Signal level The level of an asserted signal depends on whether the signal is
active-HIGH or active-LOW. Asserted means:
• HIGH for active-HIGH signals.
• LOW for active-LOW signals.

Lower-case n At the start or end of a signal name denotes an active-LOW signal.
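
The naming convention above can be sketched in Python as follows. This is an illustration only: the helper names is_active_low and asserted_level are hypothetical, and the sketch assumes that any bus range such as [3:0] is removed before the name is inspected.

```python
import re


def is_active_low(signal: str) -> bool:
    """Return True if the signal name starts or ends with a lower-case n."""
    # Drop a trailing bus range such as [3:0] before looking at the name.
    base = re.sub(r"\[[^\]]*\]$", "", signal)
    return base.startswith("n") or base.endswith("n")


def asserted_level(signal: str) -> str:
    """Return the level that means 'asserted' for this signal."""
    return "LOW" if is_active_low(signal) else "HIGH"


for name in ("nCORERESET[3:0]", "CLKIN", "SMPnAMP[3:0]"):
    print(name, "is asserted", asserted_level(name))
```

SMPnAMP[3:0] is treated as active-HIGH here because its lower-case n is in the middle of the name, not at the start or end.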

Additional reading

This section lists publications by ARM and by third parties.

See Infocenter, http://infocenter.arm.com, for access to ARM documentation.

ARM publications

This book contains information that is specific to this product. See the following documents for
other relevant information:

• Cortex-A7 MPCore Technical Reference Manual (ARM DDI 0464).

• Cortex-A7 Floating-Point Unit Technical Reference Manual (ARM DDI 0449).

• Cortex-A7 NEON™ Media Processing Engine Technical Reference Manual (ARM DDI 0450).

• Cortex-A7 MPCore Integration Manual (ARM DIT 0015).

• ARM Architecture Reference Manual, ARMv7-A and ARMv7-R edition (ARM DDI 0406).

• CoreSight ETM-A7 Technical Reference Manual (ARM DDI 0435).

• CoreSight Embedded Trace Macrocell™ v3.5 Architecture Specification (ARM IHI 0014).

• AMBA AXI™ and ACE® Protocol Specification, AXI3™, AXI4™, and AXI4-Lite™, ACE and
ACE-Lite™ (ARM IHI 0022).

• ARM Generic Interrupt Controller Architecture Specification (ARM IHI 0048).

• RealView ICE User Guide (ARM DUI 0155).

• CoreSight Architecture Specification (ARM IHI 0029).

• CoreSight Technology System Design Guide (ARM DGI 0012).

Feedback
ARM welcomes feedback on this product and its documentation.

Feedback on this product

If you have any comments or suggestions about this product, contact your supplier and give:

• The product name.

• The product revision or version.

• An explanation with as much information as you can provide. Include symptoms and
diagnostic procedures if appropriate.

Feedback on content

If you have comments on content then send an e-mail to errata@arm.com. Give:

• The title.
• The number, ARM DII 0256F.
• The page numbers to which your comments apply.
• A concise explanation of your comments.

ARM also welcomes general suggestions for additions and improvements.

Chapter 1
Introduction

This chapter introduces the supported design flow and structure of the deliverables for the
Cortex-A7 MPCore processor. It contains the following sections:
• About implementation.
• Implementation resources.
• Implementation controls and constraints.
• Implementation inputs.
• Implementation flow.
• Implementation outputs.
• Implementation reference data.

1.1 About implementation

Figure 1-1 shows the top-level inputs, resources, outputs, and controls and constraints for
implementation.

Controls and constraints:
Contractual requirements
Memory size
Process technology
Performance requirements
Power requirements
Area requirements
Test requirements
EDA model requirements

Inputs:
RTL
Models

Outputs:
Verified design, GDSII
Models
Reports and logs

Resources:
EDA tools
Testbenches
Test vectors
Scripts
Documentation

Figure 1-1 Implementation process

1.2 Implementation resources

This guide assumes that you have suitable EDA tools and compute resources for
implementation. See the Cortex-A7 MPCore Release Note for a list of deliverables and any
specific tool revisions necessary for implementation.

Table 1-1 shows the tools used to develop the macrocell.

Table 1-1 Development tools

Purpose               Vendor           Tool                                            <simulator> tool specifierᵃ

HDL simulator         Synopsys         VCS                                             VCS
                      Cadence          NC-Verilog                                      IUS
                      Mentor Graphics  ModelSim                                        MTI
Synthesis             Synopsys         Design Compiler, IC Compiler, Power Compiler    -
Place and route       Synopsys         IC Compiler                                     -
ATPG                  Synopsys         TetraMAX                                        -
Equivalence checking  Synopsys         Formality                                       -
                      Cadence          Conformal LEC                                   -
STA                   Synopsys         PrimeTime-SI                                    -
Software development  ARM              ARM RealView Development Suite (RVDS) armasm,   -
                                       armlink, fromelf

a. <simulator> is used as part of a command to run your chosen simulator throughout this guide.

Note
• The Cortex-A7 MPCore Release Note describes any special requirements that might affect
the flow, such as details of any special tool requirements that enable optional flows within
the implementation.

• See the supplied implementation reference methodology documentation for details of
other tools used. For EDA tool support, contact your EDA vendor.

1.3 Implementation controls and constraints

This section provides information about the controls and constraints of your processor
implementation. Chapter 3 Configuration Guidelines describes other features which require
configuration of the build scripts.

This section describes:

• Signal timing constraints.
• Clocking scheme.
• Multi-cycle constraints.

1.3.1 Signal timing constraints

This section describes the processor signals you must connect and the timing constraints on the
signals. It contains the following sections:
• Clock signals.
• Reset signals.
• AMBA4 master interface.
• Debug interfaces.
• Trace interfaces.
• Interrupts.
• Scan test and MBIST signals.
• Standby signals.
• Performance monitoring signals.
• Configuration pins.

The timing constraints for signals are classified according to the percentage of the clock period
that is available for external logic:
• For inputs this is the delay between the last register and the input port.
• For outputs this is the delay between the output port and the first register.

Note
Actual clock frequencies and input and output timing constraints vary according to application
requirements and the silicon process technologies used. The maximum operating clock
frequencies change according to the constraints and the process technology you use.
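
As a worked illustration of how the percentages in the following tables can be read, the Python sketch below converts a timing constraint into a time budget for external logic. The 2.0ns clock period is a hypothetical value chosen for the example, not a figure from this guide, and the function name external_budget_ns is also hypothetical. The sketch assumes that the remainder of the clock period is left for the path inside the processor.

```python
def external_budget_ns(clock_period_ns: float, constraint_percent: float) -> float:
    """Time available to logic outside the processor for one port."""
    if not 0 <= constraint_percent <= 100:
        raise ValueError("constraint must be a percentage between 0 and 100")
    return clock_period_ns * constraint_percent / 100.0


# Hypothetical clock period in ns. Actual periods depend on your
# application requirements and process technology.
period_ns = 2.0

# Constraint values taken from Table 1-2 and Table 1-4.
for port, percent in (("ACLKENM", 40), ("ARADDRM[39:0]", 60)):
    external = external_budget_ns(period_ns, percent)
    internal = period_ns - external  # assumed to be left for the processor
    print(f"{port}: {external:.2f} ns external, {internal:.2f} ns internal")
```

With these assumptions, an input constrained to 40 allows 0.80ns between the last external register and the input port, and an output constrained to 60 allows 1.20ns between the output port and the first external register.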

Clock signals

The Cortex-A7 MPCore processor includes a system clock, CLKIN, which drives the logic in
the processor. Table 1-2 shows the clock signals.

Table 1-2 Clock signals

Name Type Timing constraint

CLKIN Input -

ACLKENM Input 40

Reset signals

Table 1-3 shows the reset signals.

Table 1-3 Reset signals

Name Type Timing constraint

nCORERESET[3:0] Input 40

nDBGRESET[3:0] Input 40

nL2RESET Input 40

nMBISTRESET Input 40

L1RSTDISABLE[3:0] Input 40

L2RSTDISABLE Input 40

nSOCDBGRESETᵃ Input 40

nETMRESET[3:0]ᵃ Input 40

a. Not required if you are implementing at the CORTEXA7 level only.

AMBA4 master interface

The following sections describe the AMBA4 master interface signals:

• ACE master interface signals.
• ACE configuration signals.

ACE master interface signals

Table 1-4 shows the ACE master interface signals.

Table 1-4 ACE master interface signals

Name Type Timing constraint

ARREADYM Input 40

ARVALIDM Output 50

ARADDRM[39:0] Output 60

ARLENM[7:0] Output 60

ARSIZEM[2:0] Output 60

ARBURSTM[1:0] Output 60

ARLOCKM Output 60

ARCACHEM[3:0] Output 60

ARPROTM[2:0] Output 60

ARIDM[5:0] Output 60

ARSNOOPM[3:0] Output 60

ARDOMAINM[1:0] Output 60

ARBARM[1:0] Output 60

RVALIDM Input 50

RLASTM Input 60

RDATAM[127:0] Input 60

RRESPM[3:0] Input 60

RIDM[5:0] Input 60

RREADYM Output 60

AWREADYM Input 40

AWVALIDM Output 60

AWADDRM[39:0] Output 60

AWLENM[7:0] Output 60

AWSIZEM[2:0] Output 60

AWBURSTM[1:0] Output 60

AWLOCKM Output 60

AWCACHEM[3:0] Output 60

AWPROTM[2:0] Output 60

AWIDM[4:0] Output 60

AWSNOOPM[2:0] Output 60

AWDOMAINM[1:0] Output 60

AWBARM[1:0] Output 60

WREADYM Input 50

WVALIDM Output 60

WLASTM Output 60

WDATAM[127:0] Output 60

WSTRBM[15:0] Output 60

WIDM[4:0] Output 60

BVALIDM Input 50

BRESPM[1:0] Input 60

BIDM[4:0] Input 60

BREADYM Output 60

ACREADYM Output 60

ACVALIDM Input 40

ACADDRM[39:0] Input 60

ACPROTM[2:0] Input 60

ACSNOOPM[3:0] Input 60

CRREADYM Input 40

CRVALIDM Output 50

CRRESPM[4:0] Output 60

CDREADYM Input 40

CDVALIDM Output 50

CDDATAM[127:0] Output 60

CDLASTM Output 60

RACKM Output 60

WACKM Output 60

ACE configuration signals

Table 1-5 shows the ACE configuration signals.

Table 1-5 ACE configuration signals

Name Type Timing constraint

BROADCASTINNER Input 40

BROADCASTOUTER Input 40

BROADCASTCACHEMAINT Input 40

SYSBARDISABLE Input 40

nAXIERRIRQ Output 60

Debug interfaces

The following sections describe the debug interfaces:

• Authentication interface.
• APB interface.
• Miscellaneous debug signals.

Authentication interface

Table 1-6 shows the authentication interface signals.

Table 1-6 Authentication interface signals

Name Type Timing constraint

DBGEN[3:0] Input 60

SPIDEN[3:0] Input 50

NIDEN[3:0] Input 50

SPNIDEN[3:0] Input 60

APB interface

Table 1-7 shows the APB interface signals.

Table 1-7 APB interface signals

Name Type Timing constraint

PCLKENDBG Input 40

PSELDBG Input 60

PADDRDBG[16:2]ᵃ Input 60

PADDRDBG31 Input 60

PWRITEDBG Input 60

PRDATADBG[31:0] Output 60

PWDATADBG[31:0] Input 60

PENABLEDBG Input 60

PREADYDBG Output 60

PSLVERRDBG Output 60

a. PADDRDBG[14:2] if you are implementing at the CORTEXA7 level only.

Miscellaneous debug signals

Table 1-8 shows the miscellaneous debug signals.

Table 1-8 Miscellaneous debug signals

Name Type Timing constraint

COMMRX[3:0] Output 40

COMMTX[3:0] Output 40

DBGACK[3:0] Output 40

DBGNOPWRDWN[3:0] Output 40

DBGRESTART[3:0] Input 50

DBGRESTARTED[3:0] Output 40

DBGROMADDR[39:12] Input 50

DBGROMADDRV Input 50

DBGSELFADDR[39:17]ᵃ Input 50

DBGSELFADDRV Input 50

DBGOSUNLOCKCATCH[3:0]ᵇ Input 50

DBGHALTREQ[3:0]ᵇ Input 50

DBGLOCKSET[3:0]ᵇ Input 50

DBGHOLDRST[3:0]ᵇ Input 50

DBGSWENABLE[3:0] Input 50

DBGTRIGGER[3:0] Output 40

EDBGRQ[3:0] Input 50

APBACTIVEᵇ Output 50

DBGPWRUPREQ[3:0]ᶜ Output 40

DBGPNOPWRDWN[3:0]ᶜ Output 40

DBGPWRDUP[3:0]ᶜ Input 50

a. DBGSELFADDR[39:15] if you are implementing at the CORTEXA7 level only.
b. CORTEXA7 level only.
c. For implementing at the CORTEXA7INTEGRATION level only.

Trace interfaces

Your design implements a set of trace interface signals for each of the cores included in the
Cortex-A7 MPCore processor. This section describes both the CORTEXA7INTEGRATION level and
CORTEXA7 level trace interface signals.

Table 1-9 shows the CORTEXA7INTEGRATION level trace interface signals.

Table 1-9 CORTEXA7INTEGRATION level trace interface signals

Nameᵃ Type Timing constraint

ATCLKEN Input 60

ATIDMx[6:0] Output 60

AFREADYMx Output 60

AFVALIDMx Input 60

ATBYTESMx[2:0] Output 60

ATDATAMx[63:0] Output 60

ATREADYMx Input 60

ATVALIDMx Output 60

CTIASICCTLx[7:0] Output 60

CTICHINACK[3:0] Output 60

CTIEXTTRIG[3:0] Output 60

CTICHOUT[3:0] Output 60

ETMASICCTLx[7:0] Output 50

ETMEN[3:0] Output 50

ETMEXTOUTx[1:0] Output 50

ETMFIFOPEEKx[7:0] Output 60

ETMPWRUP[3:0] Output 50

ETMPWRUPREQ[3:0] Output 50

ETMSTANDBYWFX[3:0] Output 60

nCTIIRQ[3:0] Output 60

MAXEXTIN[2:0] Input 60

MAXEXTOUT[1:0] Input 60

PMUEVENTx[29:0] Output 30

CISBYPASS Input 50

CIHSBYPASS[3:0] Input 60

CTICHIN[3:0] Input 50

CTICHOUTACK[3:0] Input 60

CTIEXTTRIGACK[3:0] Input 60

TSCLKCHANGE Input 60

SYNCREQ Input 60

a. x in the signal name represents processor 0, 1, 2, or 3.
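
The x placeholder can be expanded mechanically when you build a connection list. The Python sketch below is an illustration under stated assumptions: the function name expand_per_core is hypothetical, and the sketch assumes that the lower-case x is the last character of the name before any bus range.

```python
import re
from typing import List

# A name stem, a lower-case x placeholder, then an optional bus range.
PLACEHOLDER = re.compile(r"^([A-Za-z0-9_]+)x(\[[^\]]*\])?$")


def expand_per_core(name: str, num_cores: int = 4) -> List[str]:
    """Expand a per-processor signal name into one name for each processor."""
    match = PLACEHOLDER.match(name)
    if match is None:
        # No placeholder, for example ATCLKEN, so the name is used as it is.
        return [name]
    stem, bus = match.group(1), match.group(2) or ""
    return [f"{stem}{core}{bus}" for core in range(num_cores)]


print(expand_per_core("ATDATAMx[63:0]"))
print(expand_per_core("ATCLKEN"))
```

Names such as ETMSTANDBYWFX[3:0] are left unchanged because the match is case-sensitive and the upper-case X is part of the signal name.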

Table 1-10 shows the CORTEXA7 level trace interface signals.

Table 1-10 CORTEXA7 level trace interface signals

Nameᵃ Type Timing constraint

ETMICTLx[19:0] Output 60

ETMIAx[31:0] Output 60

ETMDCTLx[10:0] Output 60

ETMDAx[31:0] Output 60

ETMDDx[63:0] Output 60

ETMCIDx[31:0] Output 60

ETMWFXPENDINGx Output 60

ETMPWRUPx Input 60

ETMEXTOUTx[1:0] Input 60

ETMVMIDx[7:0] Output 60

a. x in the signal name represents processor 0, 1, 2, or 3.

Interrupts

Table 1-11 shows the interrupt signals. Each of the nVFIQ[3:0], nVIRQ[3:0], nIRQ[3:0],
nFIQ[3:0], nIRQOUT[3:0], and nFIQOUT[3:0] signals has one bit for each processor in the
multiprocessor device.

Table 1-11 Interrupt signals

Name Type Timing constraint

nFIQ[3:0] Input 50

nIRQ[3:0] Input 50

nVFIQ[3:0] Input 50

nVIRQ[3:0] Input 50

IRQS[n:0]ᵃ Input 60

nFIQOUT[3:0]ᵇ Output 60

nIRQOUT[3:0]ᵇ Output 60

a. Where n is configurable 0-479 set by NUM_SPIS.

b. Not used if a GIC is not present.

Generic timer signals

Table 1-12 shows the generic timer signals.

Table 1-12 Generic timer signals

Signal Type Timing constraint

nCNTPNSIRQ[3:0] Output 40

nCNTPSIRQ[3:0] Output 40

nCNTVIRQ[3:0] Output 40

nCNTHPIRQ[3:0] Output 40

CNTVALUEB[63:0] Input 40

TSVALUEB[63:0]ᵃ Input 40

a. Not required if you are implementing at the CORTEXA7 level only.

Scan test and MBIST signals

Table 1-13 shows the scan test and MBIST signals.

Table 1-13 Scan test and MBIST signals

Signal Type Timing constraint

DFTRSTDISABLE Input 60

DFTSE Input 20

DFTRAMHOLD Input 60

MBISTACK Output 60

MBISTADDR[13:0] Input 50

MBISTARRAY[8:0] Input 50

MBISTBE[7:0] Input 60

MBISTCFG Input 50

MBISTINDATA[85:0] Input 50

MBISTOUTDATA[85:0] Output 50

MBISTREADEN Input 60

MBISTREQ Input 60

MBISTWRITEEN Input 60

Standby signals

Table 1-14 shows the standby signals.

Table 1-14 Standby signals

Name Type Timing constraint

STANDBYWFI[3:0] Output 40

STANDBYWFE[3:0] Output 40

STANDBYWFIL2 Output

EVENTI Input 60

EVENTO Output 50

Performance monitoring signals

Table 1-15 shows the performance monitoring signals.

Table 1-15 Performance monitoring signals

Name Type Timing constraint

nPMUIRQ[3:0] Output 40

Configuration pins

The majority of the signals on the configuration pins are static. That is, their values are sampled
only on startup and on restarting after a reset. Two configuration pins, CP15SDISABLE[3:0]
and CFGSDISABLE, are dynamic. Any change on these pins takes effect immediately when
the processor is active. Table 1-16 shows the configuration pins.

Table 1-16 Configuration pins

Name Type Timing constraint

ACINACTM Input 60

CFGSDISABLEᵃ Input 60

CFGEND[3:0] Input 40

CFGTE[3:0] Input 40

CP15SDISABLE[3:0] Input 50

VINITHI[3:0] Input 40

CLUSTERID[3:0] Input 40

PERIPHBASE[39:15] Input 40

SMPnAMP[3:0] Output 40

a. Not used if a GIC is not present.

1.3.2 Clocking scheme

Table 1-17 shows the clock enable signals included in the processor, the associated processor
signals and signal groups, and the corresponding synchronous system clock domain.

Table 1-17 Clock enable signals

Clock enable signal   Processor signals/group   System clock domain

ACLKENM (a)           AXI Master interface      AMBA AXI clock

ATCLKEN (b)           ATB interface             AMBA ATB clock

PCLKENDBG (a)         APB interface             AMBA APB clock

a. Tie HIGH if not used in the system.

b. Not required if you are implementing at the CORTEXA7 level only.
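
The tie-off described in footnote a can be sketched in Verilog as follows. The module name ca7_clken_tieoff is hypothetical, and only the signal names come from Table 1-17.

```verilog
// Hypothetical tie-off for clock enables that the system does not use.
module ca7_clken_tieoff (
  output wire ACLKENM,    // AXI master interface clock enable
  output wire PCLKENDBG   // APB interface clock enable
);
  // Footnote a: tie HIGH if not used in the system.
  assign ACLKENM   = 1'b1;
  assign PCLKENDBG = 1'b1;
endmodule
```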


1.3.3 Multi-cycle constraints

Certain paths require multi-cycle timing constraints. Table 1-18 shows the multi-cycle paths
according to the direction point.

Table 1-18 Multi-cycle path timing constraints

Direction point   Multi-cycle path

To                DFTSO*

Through           *u_ca7caches_tlb_rams*/SO[*]
                  *u_ca7caches_tlb_rams*/SI[*]
                  *u_ca7_scu_l1d_tagrams/g_l1d_cpu*_rams*u_l1d_tagram_cpu*_way*/SO[*]
                  *u_ca7_scu_l1d_tagrams/g_l1d_cpu*_rams*u_l1d_tagram_cpu*_way*/SI[*]
                  *g_l2_rams*u_ca7_l2_*rams*/SO[*] (a)
                  *g_l2_rams*u_ca7_l2_*rams*/SI[*] (a)

From              DFTSE
                  DFTRAMHOLD
                  DFTRSTDISABLE
                  DFTSI*
                  DFTRAMBYP
                  PERIPHBASE[*]
                  CLUSTERID[*]
                  CFGTE[*]
                  VINITHI[*]
                  DBGROMADDR[*]
                  DBGROMADDRV
                  DBGSELFADDR[*]
                  DBGSELFADDRV
                  BROADCASTINNER
                  BROADCASTOUTER
                  BROADCASTCACHEMAINT
                  u_cortexa7/u_cortexa7l2/g_l2_rams.u_ca7_l2_datarams/u_l2_dataram_*_low/CLK (a)
                  u_cortexa7/u_cortexa7l2/g_l2_rams.u_ca7_l2_datarams/u_l2_dataram_*_high/CLK (a)

a. These paths relate to the L2 data RAM read when the L2 cache is present. When L2_LATENCY is set to:
   0    A multi-cycle setup path of 2 cycles must be specified.
   1    A multi-cycle setup path of 3 cycles must be specified.


1.4 Implementation inputs

The Cortex-A7 MPCore Release Note describes deliverables that are inputs to the
implementation flow. These deliverables include:
• Design source code
• Models
• Documentation.

In addition to the deliverables, you require the following for implementation:

• EDA tools.
• Standard cell libraries.
• Libraries for any required custom cells.


1.5 Implementation flow

Figure 1-2 shows a generalized implementation flow.

Start

1. Validate unpacked RTL deliverable.
2. Configure RTL.
3. Integrate memories.
4. Validate RTL.
5. Determine optimum floorplan.
6. Perform synthesis.
7. Create layout.
8. Perform DRC and LVS checks.
9. Perform logical verification.
10. Perform timing verification.
11. Perform characterization.
12. Generate production test patterns.
13. Perform dynamic verification.
14. Sign-off.

Complete

The figure groups these steps under the labels Build RTL, Reference Methodology flow, and Final
qualification.

Figure 1-2 Implementation flow

Key implementation tasks on page 2-3 gives details of the steps in the implementation flow.

Note
Your contract requires you to complete sign-off as part of the completed flow. See
Implementation obligations on page vi.

Because the Cortex-A7 MPCore processor RTL is highly configurable, you must validate your
design at a number of points during implementation. Figure 1-3 on page 1-18 shows a simplified
view of the implementation process, indicating where you must test or validate your design, and
the additional validation recommended by ARM.


Start

1. Unpack delivered RTL, run vectors (5). Correct?
2. Configure RTL (3).
3. Integrate required RAM blocks (4).
4. Run RAM integration testbench (4). Correct?
5. Run check of RTL configuration (5). Correct?
6. Perform synthesis and place and route (E).
7. Perform LEC and STA (E). Correct?
8. Replay captured vectors on post-layout netlist (5). Correct?
9. Sign-off (9).

Optional, but recommended by ARM:
• Modify vector capture template (5).
• Capture vectors in system environment using vector source code (5).
• Modify vector replay testbench to match configuration (5).

For more information, see the reference in parentheses after each step. A number is a chapter
number in this guide. E indicates the EDA tools documentation.

Figure 1-3 Implementation process showing testing and validation


1.6 Implementation outputs

The outputs from the implementation flow are:

• Logs and reports:

— Implemented RTL vector replay logs and reports.
— Synthesis and place and route logs and reports.
— Post-layout Static Timing Analysis (STA) logs and reports.

• Logs and reports showing logical equivalence of post-layout netlist with configured RTL.

• Components:
— Post-layout netlist.
— Synthesis timing model.
— Graphic Data System II (GDS II) data.
— Standard Delay Format (SDF) data.

• Test:
— Automatic Test Pattern Generation (ATPG) vectors.


1.7 Implementation reference data

The following sections give reference data for your implementation:
• Release directory structure.
• RTL hierarchy.

1.7.1 Release directory structure

Table 1-19 shows the top-level directories for each stage of implementation.

Table 1-19 Implementation stages and directories required

Implementation stage Directories required

Floor planning logical, implementation_<technology>/CORTEXA7INTEGRATION_<vendor>/

Synthesis logical, implementation_<technology>/CORTEXA7INTEGRATION_<vendor>/

Clock Tree Synthesis (CTS) and routing implementation_<technology>/CORTEXA7INTEGRATION_<vendor>/

Static Timing Analysis (STA) implementation_<technology>/CORTEXA7INTEGRATION_<vendor>/

Functional verification of netlist and RTL logical, implementation_<technology>/CORTEXA7INTEGRATION_vectors/

Production coverage test logical, implementation_<technology>/CORTEXA7INTEGRATION_<vendor>/


1.7.2 RTL hierarchy

The logical/ directory contains the RTL hierarchy. Figure 1-4 shows the Cortex-A7 MPCore
RTL hierarchy.

<release_directory>/
    logical/
        ca7biu/
        ca7dcu/
        ca7dpu/
        ca7icu/
        ca7pfu/
        ca7scu/
        ca7stb/
        ca7tlb/
        cortexa7/
        cortexa7integration/
        cscti/
        gic400/
        models/
            cells/
            rams/

Figure 1-4 Cortex-A7 MPCore RTL hierarchy

Chapter 2
Key Implementation Points

This chapter describes the key implementation points you must consider when you implement
the Cortex-A7 MPCore processor. It contains the following sections:
• About key implementation points.
• Key implementation tasks.

Note
Some of the implementation steps listed in this chapter are EDA tool specific and are not
described in this document. See the supplied implementation reference methodology
documents.


2.1 About key implementation points

This chapter contains a list of the main points to consider when you implement the macrocell.
You must read this chapter in conjunction with the rest of the information in this guide, and the
information in Additional reading on page viii.

You can use this chapter to check that you have covered the implementation steps described in
the other chapters.


2.2 Key implementation tasks

Table 2-1 shows the key tasks for implementation.

Table 2-1 Implementation tasks

Key task
    Description

1. Validate delivered RTL using source code.
    How to validate the Cortex-A7 MPCore processor. See Chapter 5 RTL Validation.

2. Configure design parameters.
    How to configure the implementation to the specific requirements of the target application.
    See Chapter 3 Configuration Guidelines.

3. Perform RAM integration and run the testbench.
    How to integrate your RAM blocks into the Cortex-A7 MPCore processor. See Chapter 4
    Memory Integration.

4. Confirm RTL configuration using source code.
    How to validate the Cortex-A7 MPCore processor using test vectors. See Chapter 5 RTL
    Validation.

5. Determine optimum floorplan.
    What to consider when placing RAM blocks, and other recommendations for optimizing
    performance. See Chapter 6 Floorplan Guidelines.

6. Perform synthesis.
7. Create layout, based on your floorplan.
8. Perform LVS and DRC checks.
9. Perform timing verification.
10. Perform characterization.
    For tasks 6 to 10, see the supplied implementation reference methodology documents.

11. Use the standard implementation flow to implement a DFT solution.
    Reference data for production testing for the processor. See Chapter 7 Design for Test and
    the supplied implementation reference methodology documents.

12. Perform dynamic verification and Logical Equivalence Checking (LEC).
    Reference data for the dynamic verification process. See Chapter 8 Dynamic Verification and
    the supplied implementation reference methodology documents.

13. Perform sign-off in accordance with the required criteria.
    The verification criteria that you must meet before you sign off the macrocell design, in
    addition to your normal SoC flow sign-off checks. See Chapter 9 Sign-off.

Note
You must complete the implementation process to produce complete and verified deliverables.
See Requirements for sign-off on page 9-4.

Chapter 3
Configuration Guidelines

This chapter describes the guidelines for RTL configuration. These enable you to configure the
implementation to the specific requirements of the target application. It contains the following
sections:
• About configuration guidelines.
• Configuration options.


3.1 About configuration guidelines

Before synthesizing the Cortex-A7 MPCore processor you must ensure that the configuration
options are set to your specific requirements.

Caution
For successful configuration of the RTL you must:
• Set the configurable options, see Configuration options on page 3-3.
• Integrate the memory, see Chapter 4 Memory Integration.
• Validate your configured RTL, see Validating RTL configuration on page 5-8.

If you do not complete and validate your configuration correctly, your synthesized design might
malfunction.


3.2 Configuration options

The following sections describe the configuration options:
• About the Cortex-A7 MPCore.
• Global configuration.
• Individual processor configuration.
• How to configure your Cortex-A7 MPCore RTL.
• Global configuration options.
• Processor-level configuration options.
• Additional configuration requirements.

3.2.1 About the Cortex-A7 MPCore

The Cortex-A7 MPCore is a multiprocessor device that can be configured with between one and
four individual processors, and is implemented using either the top-level CORTEXA7INTEGRATION
Verilog module or the top-level CORTEXA7 Verilog module. Figure 3-1 shows a block diagram of
a Cortex-A7 MPCore processor configured with:
• Four processors at the CORTEXA7 level.
• Trace, CTI, APBROM, and APB decoder at the CORTEXA7INTEGRATION level.

The block diagram shows the following:

• At the CORTEXA7INTEGRATION level: the CORTEXA7 block, Trace, a CTI for each processor, the
  APBROM, and the APB decoder.
• At the CORTEXA7 level: Processor 0 to Processor 3, each with an L1 instruction cache† and an
  L1 data cache†, an optional Interrupt Controller, the Snoop Control Unit (SCU), an optional
  L2 cache controller††, processor power down, and the master interface.

† Configurable L1 cache size 8KB, 16KB, 32KB, or 64KB
†† Configurable L2 cache size None, 128KB, 256KB, 512KB, 1024KB

Figure 3-1 Example Cortex-A7 MPCore configuration

Note
If required, you can implement the Cortex-A7 MPCore processor without the
CORTEXA7INTEGRATION level.


Verilog parameters control the configuration of your Cortex-A7 MPCore implementation. There
are two levels of parameter:
• Global parameters control your implementation of the Cortex-A7 MPCore processor as
a whole.
• Processor-level parameters control your implementation of each individual processor in
the multiprocessor device. For example, they control whether each processor in the
multiprocessor device is configured with a Floating Point Unit (FPU) or NEON Media
Processing Engine (MPE).
You must define a complete set of processor-level configuration options for each
processor in your multiprocessor implementation.

3.2.2 Global configuration

Table 3-1 shows the global configuration options for the Cortex-A7 MPCore processor. See
Global configuration options on page 3-7 for descriptions of each of the configuration options.

Table 3-1 Global configuration settings

Feature                          Configuration parameter  Permitted values                                 Comment

Number of processors in the      NUM_CPUS                 1, 2, 3, 4                                       You must select a permitted value
multiprocessor device

Number of interrupts             NUM_SPIS (a)             0-480 in steps of 32                             You must select a permitted value

Integrated Generic Interrupt     GIC_PRESENT              0 = GIC not present                              -
Controller (GIC)                                          1 = GIC present

L2 cache controller              L2_CACHE_PRESENT         0 = L2 cache not present                         -
                                                          1 = L2 cache present

L1 instruction cache size        L1_ICACHE_SIZE           000 = 8KB                                        You must select a permitted value
L1 data cache size               L1_DCACHE_SIZE           001 = 16KB
                                                          011 = 32KB
                                                          111 = 64KB

L2 cache size                    L2_CACHE_SIZE            000 = 128KB                                      You must select a permitted value
                                                          001 = 256KB
                                                          011 = 512KB
                                                          111 = 1024KB

L2 data RAM cycle latency        L2_LATENCY               0 = 2 cycles                                     -
                                                          1 = 3 cycles

Trace for each processor (b)     ETM_PRESENT              0 = ETM and CTI for each processor not present   -
                                                          1 = ETM and CTI for each processor present

ROM APB base address (b)         APBADDR_A7ROM            -                                                The default value is 17'h0_0000

CPU APB debug base address (b)   APBADDR_CPU              -                                                The default value is 17'h1_0000

CTI APB debug base address (b)   APBADDR_CTI              -                                                The default value is 17'h1_8000

ETM APB debug base address (b)   APBADDR_ETM              -                                                The default value is 17'h1_C000

a. If the integrated GIC is not present NUM_SPIS should be 0.
b. This feature is not available in the CORTEXA7 top-level Verilog module.

3.2.3 Individual processor configuration

The configuration options for each implemented processor in the multiprocessor device include
which floating-point support you require, chosen from the following options:
• Implement the NEON MPE and the FPU.
• Implement the FPU only.
• Do not implement any floating-point support.

Processor-level configuration options on page 3-9 describes each of these options.

Table 3-2 shows the processor-level configuration options and the permitted parameter settings
for those options. <n> denotes processor number (0-3).

Table 3-2 Individual processor configuration options

Feature Configuration parameter Permitted values Comment

FPU FPU_<n> 0, 1 You can select either permitted value. 1 indicates the
feature is included. Must be 1 if NEON_<n> is 1.

NEON NEON_<n> 0, 1 You can select either permitted value. 1 indicates the
feature is included.

3.2.4 How to configure your Cortex-A7 MPCore RTL

Global configuration on page 3-4 introduces the global configuration options and lists the
global configuration parameters. Individual processor configuration gives the same information
for the processor level configuration. Using the information in those sections, you can set your
chosen configuration options for the CORTEXA7INTEGRATION top-level Verilog module as follows:

1. Select the configuration file:

logical/cortexa7integration/verilog/CORTEXA7INTEGRATION_CONFIG.v

2. Edit the file to match the required implementation configuration.

3. Build the Verilog using the top-level module:

logical/cortexa7integration/verilog/CORTEXA7INTEGRATION.v

If you are implementing at the CORTEXA7 top-level Verilog module or you are performing RAM
integration and using the testbench:

1. Select the configuration file:


logical/cortexa7/verilog/CORTEXA7_CONFIG.v

2. Edit the file to match the required implementation configuration.

3. Build the Verilog using the top-level module:

logical/cortexa7/verilog/CORTEXA7.v

If you are implementing at the cortexa7core top-level Verilog module:

1. Select the configuration file:

logical/cortexa7/verilog/cortexa7core_config.v

2. Edit the file to match the required implementation configuration.

3. Build the Verilog using the top-level module:

logical/cortexa7/verilog/cortexa7core.v

Example 3-1 shows the configuration file for a Cortex-A7 MPCore implementation with:
• No trace.
• L2 cache.
• No integrated GIC and no interrupts.
• Two uniform processors, each with FPU and NEON.

Example 3-1 Cortex-A7 MPCore example configuration file

// ------------------------------------------------------
// Integration layer parameters
// ------------------------------------------------------

parameter ETM_PRESENT = 0, // Include CTI and ETM for each core in the cluster

parameter [16:0] APBADDR_A7ROM = 17'h0_0000, // ROM Table configuration: ROM APB debug base address
parameter [16:0] APBADDR_CPU = 17'h1_0000, // ROM Table configuration: CPU APB debug base address
parameter [16:0] APBADDR_CTI = 17'h1_8000, // ROM Table configuration: CTI APB debug base address
parameter [16:0] APBADDR_ETM = 17'h1_C000, // ROM Table configuration: ETM APB debug base address

// ------------------------------------------------------
// Cluster Parameters
// ------------------------------------------------------

parameter L2_CACHE_PRESENT = 1, // Include L2 Cache

parameter NUM_CPUS = 2, // The number of cores in the cluster

parameter GIC_PRESENT = 0, // Include the GIC v2

parameter NUM_SPIS = 0, // The Number of Interrupt Distributor Interrupt Lines 0 <= x <= 480 in steps of
// 32

// L1 Instruction and Data Cache Size Encoding

//
// 000 : 8kB $
// 001 : 16kB $
// 011 : 32kB $
// 111 : 64kB $

parameter [2:0] L1_ICACHE_SIZE = 3'b011,

parameter [2:0] L1_DCACHE_SIZE = 3'b011,

// L2 Cache Size Encoding


// 000 : 128kB $
// 001 : 256kB $
// 011 : 512kB $
// 111 : 1024kB $

parameter [2:0] L2_CACHE_SIZE = 3'b001,

// L2 Latency Encoding
//
// 0 : 2 cycles
// 1 : 3 cycles

parameter [0:0] L2_LATENCY = 1'b0,

// ------------------------------------------------------
// Core 0
// ------------------------------------------------------

parameter FPU_0 = 1, // Include support for VFPv4 operations

parameter NEON_0 = 1, // Include support for SIMDv2 operations

// ------------------------------------------------------
// Core 1 (if present)
// ------------------------------------------------------

parameter FPU_1 = 1, // Include support for VFPv4 operations

parameter NEON_1 = 1, // Include support for SIMDv2 operations

// ------------------------------------------------------
// Core 2 (if present)
// ------------------------------------------------------

parameter FPU_2 = 1, // Include support for VFPv4 operations

parameter NEON_2 = 1, // Include support for SIMDv2 operations

// ------------------------------------------------------
// Core 3 (if present)
// ------------------------------------------------------

parameter FPU_3 = 1, // Include support for VFPv4 operations

parameter NEON_3 = 1 // Include support for SIMDv2 operations

3.2.5 Global configuration options

Global configuration on page 3-4 introduces the global configuration options. The following
sections describe each of these options:
• L2 cache.
• Number of processors in the multiprocessor device.
• Number of interrupts.
• Integrated GIC.
• CORTEXA7INTEGRATION level component base addresses.

L2 cache

To include L2 cache, set the Verilog parameter L2_CACHE_PRESENT to 1.

Number of processors in the multiprocessor device

You must configure the number of processors in the multiprocessor device to a value from one
to four by setting the NUM_CPUS Verilog parameter.


Number of interrupts

Set the Verilog parameter NUM_SPIS to define the number of interrupts in your design. You can
set NUM_SPIS to any value from 0 to 480 in steps of 32. If the integrated GIC is not present,
NUM_SPIS should be 0.

Integrated GIC

To include an integrated GIC, set the Verilog parameter GIC_PRESENT to 1.
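
The permitted values of NUM_CPUS, NUM_SPIS, and GIC_PRESENT can be checked in simulation before you build the RTL. The following is a minimal sketch, assuming a stand-alone checker module named ca7_global_cfg_check that is not part of the ARM deliverables. It only reports violations and does not change the configuration.

```verilog
// Hypothetical sanity check for some of the global configuration parameters.
// The parameter names follow Table 3-1. The module itself is an illustration.
module ca7_global_cfg_check #(
  parameter NUM_CPUS    = 2,
  parameter GIC_PRESENT = 0,
  parameter NUM_SPIS    = 0
);
  initial begin
    // NUM_CPUS: one to four processors.
    if (NUM_CPUS < 1 || NUM_CPUS > 4)
      $display("ERROR: NUM_CPUS = %0d, permitted values are 1, 2, 3, 4", NUM_CPUS);

    // NUM_SPIS: 0-480 in steps of 32.
    if (NUM_SPIS < 0 || NUM_SPIS > 480 || (NUM_SPIS % 32) != 0)
      $display("ERROR: NUM_SPIS = %0d, permitted values are 0-480 in steps of 32", NUM_SPIS);

    // GIC_PRESENT: 0 or 1.
    if (GIC_PRESENT != 0 && GIC_PRESENT != 1)
      $display("ERROR: GIC_PRESENT = %0d, permitted values are 0 and 1", GIC_PRESENT);

    // NUM_SPIS should be 0 when the integrated GIC is not present.
    if (GIC_PRESENT == 0 && NUM_SPIS != 0)
      $display("ERROR: NUM_SPIS = %0d, but the integrated GIC is not present", NUM_SPIS);

    $display("Global configuration check finished");
  end
endmodule
```

You might instantiate the checker in a testbench with the same parameter values that you set in the configuration file, for example ca7_global_cfg_check #(.NUM_CPUS(2), .GIC_PRESENT(1), .NUM_SPIS(32)) u_cfg_check ();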

CORTEXA7INTEGRATION level component base addresses

You must set the CORTEXA7INTEGRATION level ROM and components base addresses. Table 3-3
shows the default values.

Table 3-3 Component default base address

Component Verilog parameter Default base address

APB ROM APBADDR_A7ROM 17'h0_0000

CPU APBADDR_CPU 17'h1_0000

CTI APBADDR_CTI 17'h1_8000

ETM APBADDR_ETM 17'h1_C000


3.2.6 Processor-level configuration options

You must set the processor-level options for each processor included in the multiprocessor
device. This section describes the options:
• Implementing the Floating Point Unit (FPU).
• Implementing the NEON Media Processing Engine (MPE).

Implementing the Floating Point Unit (FPU)

To configure the FPU in a processor included in the multiprocessor device, set the Verilog
parameter FPU_<n> to 1, where <n> is the processor number (0-3). If a processor in the
multiprocessor device does not require the FPU, set the Verilog parameter FPU_<n> to 0.

Note
For each processor included in the multiprocessor device, set the value of the FPU_<n> parameter
to either 1 or 0.

Implementing the NEON Media Processing Engine (MPE)

To configure the NEON MPE in a processor included in the multiprocessor device, set the
Verilog parameter NEON_<n> to 1, where <n> is the processor number (0-3). If a processor in the
multiprocessor device does not require the NEON Media Processing Engine, set the Verilog
parameter NEON_<n> to 0.

Note
• For each processor in the multiprocessor device, set the value of the NEON_<n> parameter
to either 1 or 0.

• For any processor in the multiprocessor device, if you set the value of the NEON_<n>
parameter to 1 you must also set the value of the FPU_<n> parameter to 1.
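
The dependency between NEON_<n> and FPU_<n> can be checked in the same way. This is a sketch only. The checker module ca7_fp_cfg_check is hypothetical, and it treats any non-zero parameter value as 1.

```verilog
// Hypothetical check that NEON is only enabled together with the FPU.
// The parameter names follow Table 3-2. The module itself is an illustration.
module ca7_fp_cfg_check #(
  parameter NUM_CPUS = 2,
  parameter FPU_0 = 1, parameter NEON_0 = 1,
  parameter FPU_1 = 1, parameter NEON_1 = 1,
  parameter FPU_2 = 1, parameter NEON_2 = 1,
  parameter FPU_3 = 1, parameter NEON_3 = 1
);
  reg [3:0] fpu;   // bit n is set when FPU_<n> is non-zero
  reg [3:0] neon;  // bit n is set when NEON_<n> is non-zero
  integer   n;

  initial begin
    fpu  = {(FPU_3  != 0), (FPU_2  != 0), (FPU_1  != 0), (FPU_0  != 0)};
    neon = {(NEON_3 != 0), (NEON_2 != 0), (NEON_1 != 0), (NEON_0 != 0)};

    // Only the processors that are implemented are checked.
    for (n = 0; n < NUM_CPUS; n = n + 1)
      if (neon[n] && !fpu[n])
        $display("ERROR: NEON_%0d is 1 but FPU_%0d is 0", n, n);
  end
endmodule
```

For example, instantiating the checker with .FPU_0(0) and .NEON_0(1) reports an error for processor 0, while the default values shown report nothing.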

3.2.7 Additional configuration requirements

The following section describes additional configuration requirements that might apply to your
implementation:
• Implementation defined cells.

Implementation defined cells

In the deliverables supplied by ARM, the directory logical/models/cells/generic/ includes
generic cells for clock gating and synchronizers. You can replace any of these generic cells with
implementation defined cells, as follows:

1. Install your implementation defined cells at the same level in the hierarchy, for example
in the directory logical/models/cells/my_cells/.

2. Configure your synthesis tools to point to this location for these cells.
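
As an illustration of what an implementation defined cell in logical/models/cells/my_cells/ might contain, the following sketch shows a behavioral clock gate. The module name my_clk_gate and its port names are assumptions made for this example. The names and ports of the generic cells supplied by ARM are not listed in this section, so check them in the deliverables before you write a replacement.

```verilog
// Hypothetical implementation defined clock gating cell.
// The module and port names are illustrative and do not come from the
// ARM deliverables. In a real flow the body would typically instantiate
// the integrated clock gating cell from your standard cell library.
module my_clk_gate (
  input  wire clk_in,       // ungated clock
  input  wire enable,       // functional clock enable
  input  wire scan_enable,  // forces the clock on during scan shift
  output wire clk_out       // gated clock
);
  reg en_latch;

  // Transparent-low latch: the enable is captured while the clock is LOW,
  // so that it cannot change while the clock is HIGH.
  always @(clk_in or enable or scan_enable)
    if (!clk_in)
      en_latch <= enable | scan_enable;

  assign clk_out = clk_in & en_latch;
endmodule
```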

Chapter 4
Memory Integration

This chapter describes the RAM organization and how to integrate your RAM blocks into the
processor. It contains the following sections:
• About memory integration.
• Resource requirements for memory integration.
• Controls and constraints for memory integration.
• Blocks for memory integration.
• Flow for memory integration.
• Confirmation of memory integration.
• Outputs from memory integration.


4.1 About memory integration

Caution
For successful configuration of the RTL you must:
• Set any configurable options, see Configuration options on page 3-3.
• Integrate the memory.
• Validate your configured RTL, see Validating RTL configuration on page 5-8.

Failure to complete all the necessary configuration can result in malfunction.

A Cortex-A7 MPCore processor includes the following memory modules that you must
implement:
• Instruction cache data RAM.
• Instruction cache tag RAM.
• Data cache data RAM.
• Data cache tag RAM.
• Data cache dirty RAM.
• Translation Lookaside Buffer (TLB) RAM.
• Snoop Control Unit (SCU) duplicate tag RAM.

You must implement these modules for each processor in the multiprocessor device. Your
implementation can be either:
• Uniform, meaning it has the same RAM configuration for each processor in the
multiprocessor device.
• Non-uniform, meaning the RAM configurations are not the same for all processors in the
multiprocessor device.

If L2_CACHE_PRESENT is set, a Cortex-A7 MPCore processor also includes the following memory
modules that you must implement:
• L2 cache tag RAM.
• L2 cache data RAM.

You must ensure you have set the L1 instruction and data cache size parameters, L1_ICACHE_SIZE
and L1_DCACHE_SIZE, and if L2 cache is present, L2_CACHE_SIZE.

See Blocks for memory integration on page 4-7.

Figure 4-1 on page 4-3 shows the memory integration process.


Process:
Memory integration

Inputs:
Configured RTL

Controls and constraints:
Access timing
Setup and hold times
Total memory size
Memory block size and width
Control and addressing
Organization
Clocking
Clock gating
Power

Resources:
RAM models, standard cell libraries
HDL simulators
Memory integration testbench

Outputs:
RTL
Reports and logs

Figure 4-1 Memory integration process


4.2 Resource requirements for memory integration

To perform memory integration, you require a RAM generator that can generate RAMs that
meet the constraints listed in Controls and constraints for memory integration on page 4-5. You
also require the supplied memory integration testbench to validate your memory integration, see
Confirmation of memory integration on page 4-22.


4.3 Controls and constraints for memory integration

You must ensure your library RAM satisfies the following requirements:

• All timings must be synchronous to the rising clock edge.

• All accesses must be single-cycle.

• Chip select, RAM enable, control signals must be used.

• RAM outputs must always be valid.

• RAM outputs must not be tristate.

• Byte-write, or bit-write, control signals must be used for some RAM blocks. If byte-write
or bit-write RAM blocks are not available, you must use blocks of narrower RAM. This
increases the overall RAM area.
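
A behavioral model that follows these requirements might look like the following sketch. The module name, port names, and parameters are assumptions for illustration. A compiled RAM from your library has its own interface and timing.

```verilog
// Hypothetical behavioral RAM: synchronous to the rising clock edge,
// single-cycle access, chip select, byte-write enables, no tristate output.
module example_sync_ram #(
  parameter ADDR_WIDTH = 8,  // 256 entries
  parameter BYTES      = 4   // 32-bit data
) (
  input  wire                  clk,
  input  wire                  cs,     // chip select (RAM enable)
  input  wire                  we,     // global write enable
  input  wire [BYTES-1:0]      be,     // byte-write enables
  input  wire [ADDR_WIDTH-1:0] addr,
  input  wire [8*BYTES-1:0]    wdata,
  output reg  [8*BYTES-1:0]    rdata   // always driven, never tristate
);
  reg [8*BYTES-1:0] mem [0:(1<<ADDR_WIDTH)-1];
  integer i;

  always @(posedge clk) begin
    if (cs) begin
      if (we) begin
        // Write only the bytes whose enable is set.
        for (i = 0; i < BYTES; i = i + 1)
          if (be[i])
            mem[addr][8*i +: 8] <= wdata[8*i +: 8];
      end else begin
        // Read: data is available after the rising edge.
        rdata <= mem[addr];
      end
    end
    // When cs is LOW, rdata holds its previous value.
  end
endmodule
```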

4.3.1 L1 and SCU duplicate tag RAMs

Table 4-1 shows the access timing requirements for the L1 and SCU duplicate tag RAMs.

Table 4-1 L1 and SCU duplicate tag RAM timing requirements

RAM type Setup time as a percentage of clock cycle Access time as a percentage of clock cycle

Data 40 50

Tag 40 35

Dirty 40 50

TLB 40 35

SCU tag 40 35

Figure 4-2 shows the L1 and SCU duplicate tag RAMs access timings.

The timing diagram shows a write access followed by a read access. The signals shown are Clock,
External chip select, Write enable, Index (write address, then read address), Write data, and
Read data. The phases marked are write setup, write accept/complete, read setup, and read
accept/complete.

Figure 4-2 L1 and SCU duplicate tag RAMs access timings


4.3.2 L2 tag and data RAM

Table 4-2 shows the timing requirements for the L2 tag and data RAMs.

Table 4-2 L2 tag and data RAMs timing requirements

RAM type Setup time as a percentage of clock cycle Access time as a percentage of clock cycle

L2 tag 20 50

L2 data 20 50

Figure 4-3 shows the L2 tag RAMs access timings.

The timing diagram shows a write access followed by a read access. The signals shown are Clock,
External chip select, Write enable, Index (write address, then read address), Write data, and
Read data. The phases marked are write setup, write accept/complete, read setup, and read
access/complete.

Figure 4-3 L2 tag RAMs access timings

Figure 4-4 shows the L2 data RAMs access timings when latency is 2 cycles.

The timing diagram shows a write access and a read access. The signals shown are Clock, CLKEN,
Clk_bank, Chip select, Write enable, Index (write address, then read address), Write data, and
Read data. The phases marked are write setup, write accept, write complete/read setup, read
accept, and read complete.

Figure 4-4 L2 data RAMs access timings


4.4 Blocks for memory integration

The following sections describe the blocks for memory integration:
• Location of memory arrays.
• RAM block instantiation.
• RAM block implementations.

4.4.1 Location of memory arrays

Table 4-3 shows the locations of RAM arrays.

Table 4-3 RAM arrays

Location Contents

logical/models/rams/generic/ca7caches_tlb_rams.v RAM arrays for all the processors in the multiprocessor device

logical/models/rams/generic/ca7_scu_l1d_tagrams.v All RAM arrays for the SCU duplicate tags

logical/models/rams/generic/ca7_l2_datarams.v All RAM arrays for L2 data

logical/models/rams/generic/ca7_l2_tagrams.v All RAM arrays for L2 tags

4.4.2 RAM block instantiation

Table 4-4 shows example RAM block instantiations in the logical/models/rams directory for
each processor in the Cortex-A7 MPCore processor and the SCU.

Note
Items prefixed with u_ indicate an instance name of a module.

Table 4-4 Instantiated RAM blocks

RAM array                  Instance names                                           Description

ca7caches_tlb_rams.v (a)   u_idata_bank0 to u_idata_bank1                           Instruction cache data RAMs
                           u_itag_ram0 to u_itag_ram1                               Instruction cache tag RAMs
                           u_ddata_bank0 to u_ddata_bank7                           Data cache data RAMs
                           u_dtag_bank0 to u_dtag_bank3                             Data cache tag RAMs
                           u_ddirty_ram                                             Data cache dirty RAM
                           u_tlb_bank0 to u_tlb_bank1                               TLB RAMs

ca7_scu_l1d_tagrams.v      u_l1d_tagram_cpu0_way0 to u_l1d_tagram_cpu<m>_way3 (b)   SCU duplicate tag RAMs

ca7_l2_datarams.v          u_l2_dataram_0 to u_l2_dataram_7                         L2 data RAMs

ca7_l2_tagrams.v           u_l2_tagram_way0 to u_l2_tagram_way7                     L2 tag RAMs

a. For each processor in the multiprocessor device.

b. The value of <m> is (p-1), where p is the number of processors in the multiprocessor device.


The size of the instruction cache and data cache RAM instances and the SCU duplicate tag RAM
instances depend on your configured cache sizes. You can configure different cache sizes for
each processor in the multiprocessor device. For more information, see RAM instance sizes.

Note
• The Cortex-A7 MPCore design provides a write enable signal to each RAM, and byte or
bit enables for RAM blocks that require byte or bit enables. You can ignore the write
enable signal if your RAM only requires the byte-write or bit-write enable inputs.

• If you instantiate a larger RAM than required, for example if your RAM generator cannot
produce a RAM of the required size, you must tie the redundant upper address bit or bits
LOW.

• In the ideal case, you can produce a single block of compiled RAM for each block of
RAM. This might not be possible if:
— Your RAM does not have the required byte-write control. In this case you must
construct the RAM out of multiple blocks of byte-wide RAM. See Producing
byte-write memory from word-write RAM on page 4-20.
— Your compiler cannot produce a single RAM block that is the required size, or a
single RAM block might not meet the timing requirements. In this case, you must
produce the RAM out of two or more blocks of smaller RAM. See Producing a
large memory from smaller RAM blocks on page 4-20.
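
The second point in the note above can be sketched as follows. This is a minimal illustration, not ARM-supplied code. It assumes that a 128x31 RAM is required and that the RAM generator can only produce a 256x32 macro. The macro name my_ram_256x32, its port names, and its behavioural model are hypothetical stand-ins for your library RAM.

```verilog
// Hypothetical behavioural stand-in for a compiled 256x32 single-port RAM.
// The port names (CLK, CEN, WEN, A, D, Q) are assumptions, not a real library.
module my_ram_256x32 (
  input  wire        CLK,
  input  wire        CEN,   // chip enable, active HIGH in this sketch
  input  wire        WEN,   // global write enable, active HIGH
  input  wire [7:0]  A,
  input  wire [31:0] D,
  output reg  [31:0] Q
);
  reg [31:0] mem [0:255];

  always @(posedge CLK) begin
    if (CEN) begin
      if (WEN) mem[A] <= D;
      else     Q      <= mem[A];
    end
  end
endmodule

// Wrapper that presents the required 128x31 interface.
module ram_128x31_wrapper (
  input  wire        clk,
  input  wire        en_i,
  input  wire        wr_i,
  input  wire [6:0]  addr_i,
  input  wire [30:0] wdata_i,
  output wire [30:0] rdata_o
);
  wire [31:0] q;

  my_ram_256x32 u_ram (
    .CLK (clk),
    .CEN (en_i),
    .WEN (wr_i),
    .A   ({1'b0, addr_i}),   // redundant upper address bit tied LOW
    .D   ({1'b0, wdata_i}),  // unused data input tied LOW
    .Q   (q)
  );

  assign rdata_o = q[30:0];  // unused output q[31] is left unconnected
endmodule
```

Because the upper address bit is constant, only the lower half of the macro is ever accessed, so the wrapper behaves as a RAM of the required size.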

RAM instance sizes

This section describes the RAM instances in a Cortex-A7 MPCore design:

• Table 4-5 shows the L1 cache and SCU RAMs. For these RAMs, the size of the RAM
instances depends on the cache size you have configured.

• Table 4-6 on page 4-9 shows the TLB RAMs.

• Table 4-7 on page 4-9 shows the L2 cache data and tag RAMs.

Table 4-5 L1 cache and SCU RAM types and sizes

                            Write enable type     Instance size, for cache size       Number of instances per processor
RAM block name              Bit  Subword  Global  8KB      16KB     32KB     64KB     in the multiprocessor device

Instruction cache data RAM  -    18       -       512x72   1024x72  2048x72  4096x72  2
Instruction cache tag RAM   -    -        Yes     128x31   256x31   512x31   1024x31  2
Data cache data RAM         -    8        -       256x32   512x32   1024x32  2048x32  8
Data cache tag RAM          -    -        Yes     32x32    64x31    128x30   256x29   4
Data cache dirty RAM        Yes  -        -       32x20    64x20    128x20   256x20   1
Snoop control unit tag RAM  -    -        Yes     32x32    64x31    128x30   256x29   4

Table 4-6 TLB RAM sizes

                Write enable type                    Number of instances per processor
RAM block name  Bit  Subword  Global  Instance size  in the multiprocessor device

TLB RAM         -    -        Yes     192x86         2

Table 4-7 L2 RAM types and sizes

                Write enable type     Instance size, for cache size         Number of instances in the
RAM block name  Bit  Subword  Global  128KB    256KB    512KB    1024KB     multiprocessor device

Data RAM        -    -        Yes     2048x64  4096x64  8192x64  16384x64   8
Tag RAM         -    -        Yes     256x33   512x32   1024x31  2048x30    8

4.4.3 Configuring the L1 cache sizes

The multiprocessor device must be configured to specify what cache sizes have been
implemented. To do this, you must set the following parameters to the correct value:
• L1_ICACHE_SIZE for the instruction cache.
• L1_DCACHE_SIZE for the data cache.

Note
Regardless of which level of integration you use, you must always update the CORTEXA7_CONFIG.v
configuration file.

Table 4-8 shows the encoding values. The instruction cache can be a different size from the data
cache. By default, all processors have the same sizes of L1 cache and TLB RAMs. It is possible
to create a non-uniform configuration, where the processors in a multiprocessor device have
different cache sizes.

Table 4-8 L1 cache sizes encoding values

Cache size Signal encoding

8KB 0b000

16KB 0b001

32KB 0b011

64KB 0b111
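
As a hedged illustration of how the encoding relates to RAM geometry, the following sketch maps an L1_DCACHE_SIZE encoding to the depth and address width of one data cache data RAM bank. The module l1d_dataram_geometry is a hypothetical helper and is not part of the delivered RTL. The depths are the data cache data RAM sizes in Table 4-5 (256x32 for 8KB up to 2048x32 for 64KB).

```verilog
// Hypothetical helper, not part of the delivered RTL.
module l1d_dataram_geometry;
  // Encoding from Table 4-8. 3'b011 selects a 32KB data cache.
  parameter [2:0] L1_DCACHE_SIZE = 3'b011;

  // Each step in the encoding doubles the cache size, starting from 8KB.
  localparam integer SCALE =
      (L1_DCACHE_SIZE == 3'b000) ? 0 :
      (L1_DCACHE_SIZE == 3'b001) ? 1 :
      (L1_DCACHE_SIZE == 3'b011) ? 2 :
      (L1_DCACHE_SIZE == 3'b111) ? 3 : -1;   // -1 marks an illegal encoding

  // One data cache data RAM bank is 256 words deep for an 8KB cache.
  localparam integer DEPTH  = (SCALE < 0) ? 0 : (256 << SCALE);
  localparam integer ADDR_W = 8 + SCALE;

  initial begin
    if (SCALE < 0)
      $display("ERROR: %b is not a legal L1 cache size encoding", L1_DCACHE_SIZE);
    else
      $display("Data cache data RAM bank: %0d x 32, address bits [%0d:0]",
               DEPTH, ADDR_W - 1);
  end
endmodule
```

With the default value shown, the helper reports a 1024x32 bank, which agrees with the 32KB column of Table 4-5.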

4.4.4 Configuring the L2 cache sizes

If L2_CACHE_PRESENT is set, the L2 cache must be configured to specify what cache size has been
implemented. To do this, set the L2_CACHE_SIZE parameter to the correct value. Table 4-9 shows
the L2 cache size encoding values.

Table 4-9 L2 cache sizes encoding values

Cache size Signal encoding

128KB 0b000

256KB 0b001

512KB 0b011

1024KB 0b111

4.4.5 RAM block implementations

The following sections describe the implementation of each of the RAM blocks:
• Instruction cache data RAM.
• Instruction cache tag RAMs.
• Data cache data RAM.
• Data cache tag RAM.
• Data cache dirty RAM.
• TLB RAM.
• SCU duplicate tag RAMs.
• L2 tag.
• L2 data.

Note
These sections describe the implementation of each RAM block. Before implementing your
RAM blocks, ARM strongly recommends that you look at the files:
• logical/models/rams/generic/ca7caches_tlb_rams.v.
• logical/models/rams/generic/ca7_scu_l1d_tagrams.v.
• logical/models/rams/generic/ca7_l2_datarams.v.
• logical/models/rams/generic/ca7_l2_tagrams.v.

Instruction cache data RAM

All the instruction cache data RAM signals are active HIGH. If any of the signals in your
instruction cache data RAMs are active LOW, you must reverse the polarity.

The instruction cache data RAMs are two 72-bit wide RAM blocks with 18-bit word enables.

If you use RAMs with wider data input or output buses, you must:
• Tie the unused inputs LOW.
• Leave the unused outputs unconnected.

The RAM is byte enabled, divided into two ways. The following describes how to connect the
instruction cache data RAM enable, address, write, and data signals:

Enable Connect ic_dataram_en_i[<n>] to the chip enable pin of way <n> of your
RAM, where <n> is 0-1. If more than one RAM block is used per way,
use this signal for each block.

Write enable Connect ic_dataram_wr_i to the write enable pin of each RAM used. The
write enable pin is also known as the global write enable.

Byte write enable Connect ic_dataram_strb_i[3:0] to the byte write enable pin of each
RAM. Each strobe bit controls 18 bits of data. If a bit-writable RAM is
used, you must replicate each strobe bit 18 times.

Address Connect ic_dataram_addr_i to the address pins of each RAM. The
number of address bits used depends on the cache size.
Table 4-10 shows which bits of the address bus, ic_dataram_addr_i,
connect to the RAM blocks for each cache size.

Table 4-10 Instruction cache RAM address bus connections

Cache size Address connections

8KB ic_dataram_addr_i[8:0]

16KB ic_dataram_addr_i[9:0]

32KB ic_dataram_addr_i[10:0]

64KB ic_dataram_addr_i[11:0]

Read data Connect ic_dataram_rdata<n>_o[71:0] to the output data pins of way <n>
of your RAM, where <n> is 0-1.

Write data Connect ic_dataram_wdata_i[71:0] to the input data pins of each RAM.
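
The strobe replication for a bit-writable RAM can be sketched as follows. This is an illustration under stated assumptions, not ARM-supplied code. The module name and the bit_wen output are hypothetical, and the sketch assumes that strobe bit k controls data bits [18k+17:18k], which this guide does not state explicitly.

```verilog
// Sketch: expand the four 18-bit subword strobes into 72 per-bit write
// enables for a bit-writable instruction cache data RAM.
// Assumption: strobe bit k controls data bits [18*k+17 : 18*k].
module ic_dataram_bit_enables (
  input  wire [3:0]  ic_dataram_strb_i,
  output wire [71:0] bit_wen            // hypothetical per-bit enable bus
);
  assign bit_wen = { {18{ic_dataram_strb_i[3]}},
                     {18{ic_dataram_strb_i[2]}},
                     {18{ic_dataram_strb_i[1]}},
                     {18{ic_dataram_strb_i[0]}} };
endmodule
```

The global write enable is not used in this sketch, on the basis that a RAM with only bit-write enable inputs can ignore it.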

Instruction cache tag RAMs

All the instruction cache tag RAMs signals are active HIGH. If any of these signals are active
LOW you must reverse the polarity.

If you use RAMs with wider data input or output buses, you must:
• Tie the unused inputs LOW.
• Leave the unused outputs unconnected.

The RAM is word enabled, divided into two ways. The following describes how to connect the
instruction cache tag RAM enable, address, write, and data signals:

Enable Connect ic_tagram_en_i[<n>] to the chip enable pin of way <n> of your
RAM, where <n> is 0-1. If more than one RAM block is used per way, use
this signal for each block.

Write enable Connect ic_tagram_wr_i to the write enable pin of each RAM used. The
write enable pin is also known as the global write enable.
If a bit-writable RAM is used, replicate the write enable 31 times, once for each data bit.

Address Connect ic_tagram_addr_i to the address pins of each RAM. The number
of bits used depends on the cache size. Table 4-11 shows which bits of the
address bus, ic_tagram_addr_i, connect to the RAM blocks for each cache size.

Table 4-11 Instruction cache tag RAMs address connection

Cache size Address connection

8KB ic_tagram_addr_i[6:0]

16KB ic_tagram_addr_i[7:0]

32KB ic_tagram_addr_i[8:0]

64KB ic_tagram_addr_i[9:0]

Read data Connect ic_tagram_rdata<n>_o[30:0] to the output data pins of way <n>
of your RAM, where <n> is 0-1.

Write data Connect ic_tagram_wdata_i[30:0] to the input data pins of each RAM.

Data cache data RAM

All the data cache data RAM signals are active HIGH. If any of these signals are active LOW
you must reverse the polarity.

If you use RAMs with wider data input or output buses, you must:
• Tie the unused inputs LOW.
• Leave the unused outputs unconnected.

The RAM is byte enabled, divided into four ways each containing two banks. The following
describes how to connect the data cache data RAM enable, write, address, and data signals:

Enable Connect dc_dataram_en_i[<n>] to the chip enable pin of bank <n> of
your RAM, where <n> is 0-7. If more than one RAM block is used per
bank, use this signal for each block.

Write enable Connect dc_dataram_wr_i to the write enable pin of each RAM used.
The write enable pin is also known as the global write enable.

Byte write enable Connect dc_dataram_strb<n>_i[3:0] to the byte write enable pin of each
bank, where <n> is 0-7. Each strobe bit controls one byte of data, so if a
bit-writable RAM is used, replicate each strobe bit 8 times.

Address Connect dc_dataram_addr<n>_i[<msb>:0] to the address pins of each
bank, where <n> is 0-7 and <msb> is 7, 8, 9, or 10. The number of bits
used depends on the cache size. Table 4-12 shows which bits of the address
bus, dc_dataram_addr<n>_i, connect to the RAM blocks for each cache size.

Table 4-12 Data cache data RAM address connection

Cache size Address connection

8KB dc_dataram_addr<n>_i[7:0]

16KB dc_dataram_addr<n>_i[8:0]

32KB dc_dataram_addr<n>_i[9:0]

64KB dc_dataram_addr<n>_i[10:0]

Read data Connect dc_dataram_rdata<n>_o[31:0] to the output data pins of
bank <n> of your RAM, where <n> is 0-7.

Write data Connect dc_dataram_wdata<n>_o[31:0] to the input data pins of
bank <n> of your RAM, where <n> is 0-7.

Data cache tag RAM

All the data cache tag RAM signals are active HIGH. If any of these signals are active LOW
you must reverse the polarity.

If you use RAMs with wider data input or output buses, you must:
• Tie the unused inputs LOW.
• Leave the unused outputs unconnected.

The RAM is word enabled, divided into 4 ways. The following describes how to connect the
data cache tag RAM enable, write, address, and data signals:

Enable Connect dc_tagram_en_i[<n>] to the chip enable pin of way <n> of your
RAM, where <n> is 0-3. If more than one RAM block is used per way, use
this signal for each block.

Write enable Connect dc_tagram_wr_i to the write enable pin of each RAM used.
Write enable is also known as global write enable. If bit-writable RAMs
are used, replicate the write enable N times, where N depends on the cache size.
Table 4-13 shows the number of bits for the different bit-writable
data cache tag RAM sizes.

Table 4-13 Bit writable data cache tag RAM sizes

Cache size Number of bits, N

8KB 32

16KB 31

32KB 30

64KB 29

Address Connect dc_tagram_addr_i to the address pins of each RAM. The
number of bits used depends on the cache size. Table 4-14 shows which bits
of the address bus, dc_tagram_addr_i, connect to the RAM blocks for each
cache size.

Table 4-14 Data cache tag RAM address connection

Cache size Address connection

8KB dc_tagram_addr_i[4:0]

16KB dc_tagram_addr_i[5:0]

32KB dc_tagram_addr_i[6:0]

64KB dc_tagram_addr_i[7:0]

Read data Connect dc_tagram_rdata<n>_o to the output data pins of way <n> of
your RAM, where <n> is 0-3. The size of the RAM depends on the cache
size. The bigger the cache size, the fewer bits are stored in physical memory,
see Table 4-15.

Table 4-15 Data cache tag RAM read data connection

Cache size Read data connection

8KB dc_tagram_rdata<n>_o[31:0]

16KB dc_tagram_rdata<n>_o[31:1]

32KB dc_tagram_rdata<n>_o[31:2]

64KB dc_tagram_rdata<n>_o[31:3]

Write data Connect dc_tagram_wdata_i to the input data pins of each RAM. The
size of the RAM depends on the cache size. The bigger the cache size, the
fewer bits are stored in physical memory, see Table 4-16.

Table 4-16 Data cache tag RAM write data connection

Cache size Write data connection

8KB dc_tagram_wdata_i[31:0]

16KB dc_tagram_wdata_i[31:1]

32KB dc_tagram_wdata_i[31:2]

64KB dc_tagram_wdata_i[31:3]

Data cache dirty RAM

All the data cache dirty RAM signals are active HIGH. If any of these signals are active LOW
you must reverse the polarity.

If you use RAMs with wider data input or output buses, you must:
• Tie the unused inputs LOW.
• Leave the unused outputs unconnected.

The RAM is bit enabled. The following describes how to connect the data cache dirty RAM
enable, address, write, and data signals:

Enable Connect dc_dirtyram_en_i to the chip enable pin of your RAM. If
more than one RAM block is used per bank, use this signal for each block.

Write enable Connect dc_dirtyram_wr_i to the write enable pin of each RAM used.
The write enable pin is also known as the global write enable.

Bit write enable Connect dc_dirtyram_strb_i[19:0] to the bit write enable pins of each
bank.

Address Connect dc_dirtyram_addr_i to the address pins of each bank. The
number of bits used depends on the cache size. Table 4-17 shows the data
cache dirty RAM address connections.

Table 4-17 Data cache dirty RAM address connection

Cache size Address connection

8KB dc_dirtyram_addr_i[4:0]

16KB dc_dirtyram_addr_i[5:0]

32KB dc_dirtyram_addr_i[6:0]

64KB dc_dirtyram_addr_i[7:0]

Read data Connect dc_dirtyram_rdata_o[19:0] to the output data pins of your
RAM.

Write data Connect dc_dirtyram_wdata_o[19:0] to the input data pins of your
RAM.

TLB RAM

All the TLB RAM signals are active HIGH. If any of these signals are active LOW you must
reverse the polarity.

If you use RAMs with wider data input or output buses, you must:
• Tie the unused inputs LOW.
• Leave the unused outputs unconnected.

The RAM is word enabled, divided into two ways. The following describes how to connect the
TLB RAM enable, write, address, and data signals:

Enable Connect tlb_ram_en_i[<n>] to the chip enable pin of way <n> of your
RAM, where <n> is 0-1. If more than one RAM block is used per way, use
this signal for each block.

Write enable Connect tlb_ram_wr_i to the write enable pin of each RAM used. The
write enable is also known as the global write enable. If a bit-writable
RAM is used, replicate the write enable 86 times, once for each data bit.

Address Connect tlb_ram_addr_i[7:0] to the address pins of each RAM.

Read data Connect tlb_ram_rdata<n>_o[85:0] to the output data pins of way <n> of
your RAM, where <n> is 0-1.

Write data Connect tlb_ram_wdata_i[85:0] to the input data pins of each RAM.

SCU duplicate tag RAMs

All the SCU duplicate tag RAM signals are active HIGH. If any of these signals are active LOW
you must reverse the polarity.

If you use RAMs with wider data input or output buses, you must:
• Tie the unused inputs LOW.
• Leave the unused outputs unconnected.

The SCU requires four SCU duplicate tag RAM arrays for each processor included in the
Cortex-A7 MPCore processor.

Note
SCU duplicate tag RAMs must always be instantiated. This includes implementations where
NUM_CPUS is set to 1 and ACVALID is tied LOW.

SCU duplicate tag RAMs must have the same number of indexes as the related Data tag RAM
which is defined by the size of the data cache configured for the processor.

The RAM is word enabled, divided into 4 ways. The following describes how to connect the
SCU duplicate tag RAM enable, write, address, and data signals:

Enable Connect l1d_tagram_cpu<m>_en_i[<n>] to the chip enable pin of way <n> of
your RAM, where <n> is 0-3 and <m> is between 0 and one less than the number of
processors in the cluster. If more than one RAM block is used per way, use this
signal for each block.

Write enable Connect l1d_tagram_cpu<m>_wr_i to the write enable pin of each RAM used.
The write enable is also known as the global write enable. If bit-writable
RAMs are used, replicate the write enable N times, where N depends on the cache
size, see Table 4-13 on page 4-13.

Address Connect l1d_tagram_cpu<m>_addr_i to the address pins of each RAM. The
number of bits used depends on the cache size. Table 4-18 shows the SCU
duplicate tag RAM address connections.

Table 4-18 SCU duplicate tag RAM address connections

Cache size Address connections

8KB l1d_tagram_cpu<m>_addr_i[4:0]

16KB l1d_tagram_cpu<m>_addr_i[5:0]

32KB l1d_tagram_cpu<m>_addr_i[6:0]

64KB l1d_tagram_cpu<m>_addr_i[7:0]

Read data Connect l1d_tagram_cpu<m>_rdata_way<n>_o to the output data pins of way
<n> of your RAM, where <m> and <n> are each 0-3. The size of the RAM depends on
the cache size. The bigger the cache size, the fewer bits are stored in physical
memory, see Table 4-15 on page 4-14.

Write data Connect l1d_tagram_cpu<m>_wdata_i to the input data pins of each RAM. The
size of the RAM depends on the cache size. The bigger the cache size, the fewer
bits are stored in physical memory, see Table 4-16 on page 4-14.

L2 tag

All the L2 tag RAM signals are active HIGH. If any of these signals are active LOW you must
reverse the polarity.

If you use RAMs with wider data input or output buses, you must:
• Tie the unused inputs LOW.
• Leave the unused outputs unconnected.

The RAM is word enabled, divided into 8 ways. The following describes how to connect the L2
tag RAM enable, write, address, and data signals:

Enable Connect l2_tagram_en_i[<n>] to the chip enable pin of way <n> of your RAM,
where <n> is 0-7. If more than one RAM block is used per way, use this signal for
each block.

Write enable Connect l2_tagram_wr_i to the write enable pin of each RAM used.
Write enable is also known as global write enable.

Address Connect l2_tagram_addr_i to the address pins of each RAM.
The number of bits used depends on the cache size. Table 4-19 shows the L2 tag
RAM address connections.

Table 4-19 L2 tag RAM address connections

Cache size Address connections

128KB l2_tagram_addr_i[7:0]

256KB l2_tagram_addr_i[8:0]

512KB l2_tagram_addr_i[9:0]

1024KB l2_tagram_addr_i[10:0]

Read data Connect l2_tagram_rdata_way<n>_o to the output data pins of way <n> of
your RAM, where <n> is 0-7. The size of the RAM depends on the cache size.
The bigger the cache size, the fewer bits are stored in physical memory.
Table 4-20 shows the L2 tag RAM read data connections.

Table 4-20 L2 tag RAM read data connections

Cache size Read data connection

128KB l2_tagram_rdata_way<n>_o[32:0]

256KB l2_tagram_rdata_way<n>_o[32:1]

512KB l2_tagram_rdata_way<n>_o[32:2]

1024KB l2_tagram_rdata_way<n>_o[32:3]

Write data Connect l2_tagram_wdata_i to the input data pins of each RAM. The size of the
RAM depends on the cache size. The bigger the cache size, the fewer bits are stored in
physical memory. Table 4-21 shows the L2 tag write data connections.

Table 4-21 L2 tag write data connection

Cache size Write data connection

128KB l2_tagram_wdata_i[32:0]

256KB l2_tagram_wdata_i[32:1]

512KB l2_tagram_wdata_i[32:2]

1024KB l2_tagram_wdata_i[32:3]

L2 data

All the L2 data RAM signals are active HIGH. If any of these signals are active LOW you must
reverse the polarity.

If you use RAMs with wider data input or output buses, you must:
• Tie the unused inputs LOW.
• Leave the unused outputs unconnected.

The RAM is word enabled, divided into 8 ways. The L2 data RAMs are enabled at most once
every two cycles, to support longer-latency RAMs. If the RAM modules you are using do not
support being clocked at the full processor frequency, then a clock gate must be instantiated to
produce a gated clock that can be used with the RAMs. The following describes how to connect
the L2 data RAM enable, write, address, and data signals:

Clock enable Connect l2_dataram_clken_i to the enable pin of all of the clock gates
driving the RAM modules.

Enable Connect l2_dataram_en_i[<n>] to the chip enable pin of way <n> of
your RAM, where <n> is 0-7. If more than one RAM block is used per way,
use this signal for each block.

Write enable Connect l2_dataram_wr_i to the write enable pin of each RAM used.
Write enable is also known as global write enable.

Address Connect l2_dataram_addr_i to the address pins of each RAM.
The number of bits used depends on the cache size. Table 4-22 shows the L2 data
RAM address connections.

Table 4-22 L2 data RAM address connections

Cache size Address connections

128KB l2_dataram_addr_i[10:0]

256KB l2_dataram_addr_i[11:0]

512KB l2_dataram_addr_i[12:0]

1024KB l2_dataram_addr_i[13:0]

Read data Connect l2_dataram_rdata_o[((n+1)*64)-1:n*64] to the output data pins
of way n of your RAM, where n is 0-7.

Write data Connect l2_dataram_wdata_i[((n+1)*64)-1:n*64] to the input data pins
of way n of your RAM, where n is 0-7.
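
The per-way data slices can be sketched with a generate loop. This is an illustration only, for the 128KB configuration. It assumes that way n occupies bits [((n+1)*64)-1:n*64] of the 512-bit read and write data buses. The RAM macro my_ram_2048x64, its port names, and the clk port of the wrapper are hypothetical, and the clock gating described above is not shown.

```verilog
// Hypothetical behavioural stand-in for a compiled 2048x64 RAM.
module my_ram_2048x64 (
  input  wire        CLK,
  input  wire        CEN,   // chip enable, active HIGH in this sketch
  input  wire        WEN,   // global write enable, active HIGH
  input  wire [10:0] A,
  input  wire [63:0] D,
  output reg  [63:0] Q
);
  reg [63:0] mem [0:2047];

  always @(posedge CLK) begin
    if (CEN) begin
      if (WEN) mem[A] <= D;
      else     Q      <= mem[A];
    end
  end
endmodule

// Sketch of the eight L2 data RAM ways for a 128KB L2 cache.
module l2_dataram_ways_sketch (
  input  wire         clk,                  // assumed (gated) RAM clock
  input  wire [7:0]   l2_dataram_en_i,
  input  wire         l2_dataram_wr_i,
  input  wire [10:0]  l2_dataram_addr_i,    // [10:0] for 128KB, Table 4-22
  input  wire [511:0] l2_dataram_wdata_i,
  output wire [511:0] l2_dataram_rdata_o
);
  genvar n;
  generate
    for (n = 0; n < 8; n = n + 1) begin : g_way
      my_ram_2048x64 u_l2_dataram (
        .CLK (clk),
        .CEN (l2_dataram_en_i[n]),              // one enable per way
        .WEN (l2_dataram_wr_i),                 // shared global write enable
        .A   (l2_dataram_addr_i),               // shared address
        .D   (l2_dataram_wdata_i[n*64 +: 64]),  // 64-bit slice for way n
        .Q   (l2_dataram_rdata_o[n*64 +: 64])
      );
    end
  endgenerate
endmodule
```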

4.5 Flow for memory integration

You perform memory integration at the Cortex-A7 MPCore top level. The following sections
describe memory integration:
• Producing byte-write memory from word-write RAM.
• Producing a large memory from smaller RAM blocks.
• Memory integration.

Note
• Before you perform memory integration, you must configure your RTL, as described in
Chapter 3 Configuration Guidelines.

• When you have integrated your RAM blocks you must validate your memory integration.
See Confirmation of memory integration on page 4-22.

4.5.1 Producing byte-write memory from word-write RAM

If you do not have memories with byte-write control, you must construct these blocks using, for
example, four byte-wide RAM blocks to achieve a RAM word size of 32 bits. The rules for
connecting the four RAM blocks are:

• Each byte-wide RAM has the same address and chip select controls as the word-wide
RAM.

• The word write enable signal is left unconnected.

• One bit of the byte-write control signal connects to the write-enable pin of each of the
byte-wide RAM blocks. For example, bit 0 connects to the RAM representing byte 0.

• Data input and output signals [7:0] connect to the data input and output pins of the RAM
representing byte 0, and data input and output signals [15:8] connect to the data input and
output pins of the RAM representing byte 1, for example.
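
The rules above can be sketched as follows for a 256x32 memory built from four byte-wide RAMs. This is a minimal illustration, not ARM-supplied code. The macro my_ram_256x8, its port names, and the wrapper port names are hypothetical.

```verilog
// Hypothetical behavioural stand-in for a compiled 256x8 RAM.
module my_ram_256x8 (
  input  wire       CLK,
  input  wire       CEN,   // chip enable, active HIGH in this sketch
  input  wire       WEN,   // write enable, active HIGH
  input  wire [7:0] A,
  input  wire [7:0] D,
  output reg  [7:0] Q
);
  reg [7:0] mem [0:255];

  always @(posedge CLK) begin
    if (CEN) begin
      if (WEN) mem[A] <= D;
      else     Q      <= mem[A];
    end
  end
endmodule

// 256x32 memory with byte-write control, built from four byte-wide RAMs.
module ram_256x32_bytewrite (
  input  wire        clk,
  input  wire        en_i,     // chip enable, shared by all four RAMs
  input  wire [3:0]  strb_i,   // byte-write control, one bit per byte
  input  wire [7:0]  addr_i,   // address, shared by all four RAMs
  input  wire [31:0] wdata_i,
  output wire [31:0] rdata_o
);
  // The word write enable is deliberately not a port of this module,
  // because it is left unconnected.
  genvar b;
  generate
    for (b = 0; b < 4; b = b + 1) begin : g_byte
      my_ram_256x8 u_byte (
        .CLK (clk),
        .CEN (en_i),
        .WEN (strb_i[b]),            // strobe bit b drives the RAM for byte b
        .A   (addr_i),
        .D   (wdata_i[b*8 +: 8]),    // byte b of the write data
        .Q   (rdata_o[b*8 +: 8])     // byte b of the read data
      );
    end
  endgenerate
endmodule
```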

4.5.2 Producing a large memory from smaller RAM blocks

You might have to create a large memory out of smaller RAM blocks, for one or more of the
following reasons:
• Your RAM compiler cannot produce a RAM of the required size.
• A single large RAM is too slow for your performance requirements.
• A single large RAM does not fit into your floorplan.

The rules for producing a memory out of smaller RAM blocks are:

• The number of RAM blocks, b, must be a power of 2, for example b = 2, 4, or 8. If the
address width of the required memory size is n bits, the following equation gives the width
of the address port of the smaller RAM blocks, m:

\[ m = n - \frac{\log b}{\log 2} \]

For example, if you create a RAM out of two smaller RAM blocks, b = 2, and the required
address width for that memory size is 10 bits, n = 10, then the address width, m, of the two
smaller RAM blocks is 9 bits. Address bits [(m-1):0] apply to all the RAM blocks.

• You must ensure that only the addressed RAM is enabled, by performing a b bit decode
of the [(n-1):m] address bits and ANDing these with the RAM enable control signal. In
the above example, you must apply address bits [8:0] to the two RAM blocks, and AND
a 2-bit decode of address bit [9] with the RAM enable to create two RAM enable signals,
that is:
assign RAMEnable_0 = ~Addr[9] & RAMEnable;
assign RAMEnable_1 = Addr[9] & RAMEnable;

• You must connect RAMEnable_0 to the RAM enable port of the first RAM block, and you
must connect RAMEnable_1 to the RAM enable port of the second RAM block.

The approach is exactly the same for any memory that you construct from smaller RAM blocks.
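
As a sketch of the same approach with four blocks, consider b = 4 and n = 10, which gives m = 8. The code below is an illustration under stated assumptions, not ARM-supplied code. The macro my_sram_256x32 and all port names are hypothetical. The read data multiplexer is not described in this guide. It is included on the assumption that the RAM blocks have synchronous read outputs, so the block select must be registered to line up with the read data.

```verilog
// Hypothetical behavioural stand-in for a compiled 256x32 RAM.
module my_sram_256x32 (
  input  wire        CLK,
  input  wire        CEN,   // chip enable, active HIGH in this sketch
  input  wire        WEN,   // global write enable, active HIGH
  input  wire [7:0]  A,
  input  wire [31:0] D,
  output reg  [31:0] Q
);
  reg [31:0] mem [0:255];

  always @(posedge CLK) begin
    if (CEN) begin
      if (WEN) mem[A] <= D;
      else     Q      <= mem[A];
    end
  end
endmodule

// 1024x32 memory built from four 256x32 blocks (b = 4, n = 10, m = 8).
module ram_1024x32_from_4 (
  input  wire        clk,
  input  wire        en_i,
  input  wire        wr_i,
  input  wire [9:0]  addr_i,
  input  wire [31:0] wdata_i,
  output wire [31:0] rdata_o
);
  wire [3:0]   blk_en;   // one enable per block
  wire [127:0] blk_q;    // read data of all four blocks, flattened
  reg  [1:0]   sel_q;    // block selected by the most recent read

  genvar k;
  generate
    for (k = 0; k < 4; k = k + 1) begin : g_blk
      // 4-bit decode of address bits [9:8], ANDed with the RAM enable.
      assign blk_en[k] = en_i & (addr_i[9:8] == k);

      my_sram_256x32 u_blk (
        .CLK (clk),
        .CEN (blk_en[k]),
        .WEN (wr_i),
        .A   (addr_i[7:0]),          // address bits [(m-1):0] go to every block
        .D   (wdata_i),
        .Q   (blk_q[k*32 +: 32])
      );
    end
  endgenerate

  // Assumption: remember which block was read so that its output is selected
  // in the following cycle.
  always @(posedge clk)
    if (en_i & ~wr_i) sel_q <= addr_i[9:8];

  assign rdata_o = blk_q[sel_q*32 +: 32];
endmodule
```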

4.5.3 Memory integration

To perform memory integration:

1. Go to the logical/models/rams/ directory:

cd logical/models/rams

2. Copy the generic RAM interface module and the RAM arrays into a new directory where
you will integrate your RAMs, and go to this directory:
cp -r generic <my_ram_dir>
cd <my_ram_dir>

3. Blocks for memory integration on page 4-7 describes how you connect the RAM for each
cache size. Use this to identify the RAM blocks that you require and then generate them
using your library RAM generator.

4. Integrate your RAM blocks into each of the modules, using the organization described in
Blocks for memory integration on page 4-7. All RAM control signals are driven active
HIGH. If your RAM blocks have active LOW control inputs you must invert the RAM
write enable pin, WE, and all the other control signals.

5. Check that the memory integration is correct by running the RAM integration testbench
described in RAM integration testbench on page 4-22.

4.6 Confirmation of memory integration

This section describes how to validate your memory integration. A specific RAM integration
testbench is located in the implementation_<technology>/CORTEXA7_RAMtestbench/ directory.

Note
You must run the RAM validation testbench described in RAM integration testbench as part of
the sign-off criteria.

4.6.1 RAM integration testbench

ARM provides a testbench that checks that your RAM blocks are correctly integrated. Before
running the testbench you must have successfully written all your RAM models and configured
your RTL.

A Cortex-A7 MPCore design requires:

• One ca7caches_tlb_rams module. Edit the file to set the required size as described in
Configuring the L1 cache sizes on page 4-9.
• One ca7_scu_l1d_tagrams module.

If L2_CACHE_PRESENT is set, a Cortex-A7 MPCore design also requires:

• One ca7_l2_tagrams module.
• One ca7_l2_datarams module.

4.6.2 Running the RAM integration testbench

To run the RAM integration testbench:

1. Go to implementation_<technology>/CORTEXA7_RAMtestbench.

2. Edit CORTEXA7_cache_ram_testbench.vc to point to the directory in which you performed
RAM integration. Change generic in the following -y line to that directory:
-y ../../../logical/models/rams/generic

3. Edit CORTEXA7_cache_ram_testbench.vc to point to the directory containing the Verilog
models for your library RAMs if this is different to the directory in step 2. Change
<ram_views> in the following line to that directory:
-y <ram_views>

Note
The default directory settings for steps 2 and 3 simulate the example ARM RAMs. If you
want to debug any errors with your RAM integration, you can simulate the ARM RAMs
as a golden reference, otherwise change these paths to point to your modified source.

4. Ensure that you set the path to your simulator.

5. Running the RAM integration testbench depends on the simulator and the version used.
The testbench can be run using a common Makefile, available in the current directory:
• make mti
• make vcs
• make ius
The Makefile provides a clean command to remove any unwanted directories:
make clean

6. Check the simulation report.

Make sure that the configuration was correctly imported by checking the first lines of the
report.

Two RAM integration tests are performed for each RAM. Either or both of these tests fail
if any integration errors occur. The status of each test is reported in the testbench
summary. To assist you in debugging, the functionality of these tests is as follows:

Word test This test writes unique data to each RAM location, then reads back each location
and checks that the read data matches the expected data. The purpose of this
test is to check that the instantiated RAM is the correct size, and to check that the
address connections are correct. All RAM enables are driven HIGH throughout
this test, and all write enables and byte write enables are driven HIGH during
writes and LOW during reads, so the test does not detect shorts between these
signals, only opens.
If the read data is x, this indicates that one or more of the RAM signals is either
open or miswired.

Bit test This test walks a '1' across the RAM width, checking that all data in and out
connections are correct. This is done in two passes:
• The first pass drives all enables HIGH so that data in and out connectivity
can be tested.
• The second pass only asserts the enable for the portion that is being written
to or read from. This detects shorts between enables. A failure pattern is
driven to the RAM portions not being written to.
The read address is driven to '0' during this test. The write address is driven to a
value of 2 greater than the RAM size, which aliases to '0' if the RAM is the correct
size. If the RAM is too large, the read and write addresses differ and the test
correctly fails.
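
The idea behind the word test can be sketched as a small self-checking simulation. This is not the ARM-supplied testbench, and every name in it is hypothetical. It exercises a behavioural 32x20 RAM (the size of the data cache dirty RAM for an 8KB cache in Table 4-5), and it models only the write-then-read-back sequence, not the enable handling or the bit test.

```verilog
// Sketch of a word test. This is not the ARM-supplied testbench.
module word_test_sketch;
  localparam ADDR_W = 5;             // 32 locations
  localparam DATA_W = 20;
  localparam DEPTH  = 1 << ADDR_W;

  reg               clk;
  reg               en, wr;
  reg  [ADDR_W-1:0] addr;
  reg  [DATA_W-1:0] wdata;
  reg  [DATA_W-1:0] rdata;
  integer           i, errors;

  // Behavioural stand-in for the RAM under test.
  reg [DATA_W-1:0] mem [0:DEPTH-1];
  always @(posedge clk)
    if (en) begin
      if (wr) mem[addr] <= wdata;
      else    rdata     <= mem[addr];
    end

  initial clk = 1'b0;
  always #5 clk = ~clk;

  initial begin
    errors = 0;
    en = 1'b1; wr = 1'b0; addr = 0; wdata = 0;

    // Write phase: write unique data (address + 1) to every location.
    for (i = 0; i < DEPTH; i = i + 1) begin
      @(negedge clk);
      wr = 1'b1; addr = i; wdata = i + 1;
    end

    // Read phase: read every location back and compare with the expected data.
    for (i = 0; i < DEPTH; i = i + 1) begin
      @(negedge clk);
      wr = 1'b0; addr = i;
      @(posedge clk); #1;
      // !== also flags x values, which indicate an open or miswired signal.
      if (rdata !== i + 1) errors = errors + 1;
    end

    if (errors == 0) $display("word test passed");
    else             $display("word test FAILED, %0d mismatches", errors);
    $finish;
  end
endmodule
```

If an address bit were left open or two address bits were swapped in the RAM under test, some locations would return data written to a different address and the comparison would fail.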

Testbench report examples

The testbench is self-checking. If your RAM integration is successful, the simulation completes
with the following message. This is a snapshot running the golden reference.
Summary
=======
CortexA7 Configuration
----------------------
Num CPU = 2
L2 Present = YES

SCU RAM Integration Tests

-------------------------
L1D SCU Processor 0 = 32kB
L1D SCU Processor 1 = 32kB
L2 SCU = 256K

L1D Processor 0 word test passed

L1D Processor 0 bit test passed
L1D Processor 1 word test passed
L1D Processor 1 bit test passed
L2 TAG word test passed
L2 TAG bit test passed
L2 DATA word test passed
L2 DATA bit test passed

SCU RAM integration passed.

Processor 0 Cache Integration Tests

------------------------------
I-Cache size = 32kB
D-Cache size = 32kB

D-Cache Data word test passed

Processor 0 Cache RAM integration passed.

Processor 1 Cache Integration Tests

------------------------------
I-Cache size = 32kB
D-Cache size = 32kB

D-Cache Data word test passed

D-Cache Data bit test passed
D-Cache Tag word test passed
D-Cache Tag bit test passed
D-Cache Dirty word test passed
D-Cache Dirty bit test passed
I-Cache Data word test passed
I-Cache Data bit test passed
I-Cache Tag word test passed
I-Cache Tag bit test passed
TLB word test passed
TLB bit test passed
Processor 1 Cache RAM integration passed.
========================================
Test completed with no errors
========================================

If your RAM integration is unsuccessful, the simulation completes and reports which RAMs
failed integration. For example, if the DDataRAM fails, the summary shows:
Summary
=======

CortexA7 Configuration
----------------------
Num CPU = 2
L2 Present = YES

SCU RAM Integration Tests

-------------------------
L1D SCU Processor 0 = 32kB
L1D SCU Processor 1 = 32kB
L2 SCU = 512K

L1D Processor 0 word test passed

L1D Processor 0 bit test passed
L1D Processor 1 word test passed
L1D Processor 1 bit test passed

L2 TAG word test passed

L2 TAG bit test passed
L2 DATA word test passed
L2 DATA bit test passed

SCU RAM integration passed.

Processor 0 Cache Integration Tests

------------------------------
I-Cache size = 32kB
D-Cache size = 32kB

D-Cache Data word test passed

D-Cache Data bit test FAILED
D-Cache Tag word test passed
D-Cache Tag bit test passed
D-Cache Dirty word test passed
D-Cache Dirty bit test passed
I-Cache Data word test passed
I-Cache Data bit test passed
I-Cache Tag word test passed
I-Cache Tag bit test passed
TLB word test passed
TLB bit test passed

!!! Processor 0 Cache RAM integration FAILED. Test completed with errors !!!

The simulation also reports expected and actual RAM read data for the failing RAM to assist
you in debugging any errors.

Note
For D-Cache Data types of error, the expected and actual results are those from the testbench,
not necessarily those written to the RAM.

4.7 Outputs from memory integration

The outputs from memory integration are:

• Your configured RTL.

• The reports from the memory integration testbench. See Confirmation of memory
integration on page 4-22.

This chapter describes how to validate the Cortex-A7 MPCore processor using test vectors. It
contains the following sections:
• About RTL validation on page 5-2.
• Resource requirements for RTL validation on page 5-3.
• Controls and constraints for RTL validation on page 5-4.
• Inputs for RTL validation on page 5-5.
• Flow for RTL validation on page 5-7.
• Outputs from RTL validation on page 5-9.
• Reference data for RTL validation on page 5-10.

5.1 About RTL validation

This chapter describes using test vectors in the supplied simulation environment to provide
limited validation of your RTL. You must validate the RTL supplied by ARM, before you start
your RTL configuration, to check that you have unpacked it correctly. ARM recommends that
you also use the simulation environment to validate your configured RTL. Figure 5-1 shows the
top-level inputs, resources, outputs, and controls and constraints for this RTL validation.

Controls and constraints:
• Configuration options and files.
• Simulation options.

Inputs:
• Delivered RTL, or configured RTL.

Process:
• RTL validation.

Outputs:
• Validated delivered RTL or configured RTL.
• Logs and reports.

Resources:
• Simulation environment.
• Simulation testbench.
• Compute resources.
• Test suites.
• Scripts.
• Test vectors.

Figure 5-1 RTL validation process

Caution
The validation described in this chapter only checks that you have successfully unpacked the
RTL delivered by ARM, and enables you to check your configuration of the RTL. It does not
validate your synthesized RTL, which must pass:
• Logical verification.
• Timing verification.
• Characterization.

See the reference methodology documents supplied by your EDA tool vendor for information
on these processes.

Using the supplied simulation environment, you can:

• Compile and run code on your configured RTL, while capturing the vectors. ARM
supplies tests you can run in the simulation environment, and you can also write your own
tests.

• Replay captured vectors.

5.2 Resource requirements for RTL validation

To run the RTL validation, the Cortex-A7 MPCore integration kit testbench must be present.
The integration kit is located in the cortexa7_intkit directory. See the Cortex-A7 MPCore
Integration Manual for more information about the integration kit. The files and directories
used for vector capture and replay are in
implementation_<technology>/CORTEXA7INTEGRATION_vectors:

vectors.cfg Configuration file. Edit this file before you run the testbench.

Makefile Standard makefile to run the replay script.

tools/ Contains the script to replay the crf files.

verilog/ Contains the vector replay testbench.

crf/ Contains the test vector files in CRF format, compressed using the gzip command
line utility tool. The CRF vectors contain stimulus and expected response values
for the input and output ports of the macrocell respectively.

The RTL validation methodology consists of:

Capture Creates test vectors in accordance with the settings displayed in the configuration
list.

Replay Runs the vectors on the RTL using the same configuration list.

5.3 Controls and constraints for RTL validation

You must edit two files, vectors.cfg and dotcshrc, and source dotcshrc to create various
environment variables. To do this:

1. Go to the correct location:

cd implementation_<technology>/CORTEXA7INTEGRATION_vectors

2. Edit the configuration file, vectors.cfg, to include the correct configuration settings. See
vectors.cfg on page 5-5.

3. Edit dotcshrc to set the environment variables for the Cortex-A7 MPCore integration kit
correctly, and then source the file. See dotcshrc on page 5-6.
source dotcshrc

4. Ensure that you set the path to your simulator.

5.4 Inputs for RTL validation

There are two inputs for RTL validation:
• vectors.cfg.
• dotcshrc on page 5-6.

5.4.1 vectors.cfg

Use the options in vectors.cfg to set the correct configuration. The options are described in:
• Generic configuration.
• Replay configuration on page 5-6.

Generic configuration

Table 5-1 shows the options used by the capture stage. The replay stage uses these options when
MODEL is set to RTL. When MODEL is set to NETLIST, make sure the configuration variables match
those used for the netlist.

Table 5-1 Generic configuration options

Option            Description
L2_CACHE_PRESENT  Include L2 cache.
L2_CACHE_SIZE     Selects L2 cache size.
L2_LATENCY        Selects L2 data RAM latency.
ETM_PRESENT       Include CTI and ETM for each processor in the multiprocessor device. (a)
NUM_CPUS          Number of processors included in the multiprocessor device.
NUM_SPIS          Number of interrupts. Distributor interrupt lines, 0 <= NUM_SPIS <= 480, in steps of 32.
GIC_PRESENT       Include the integrated GIC.
L1_ICACHE_SIZE    Selects I-Cache size for any processor in the multiprocessor device.
L1_DCACHE_SIZE    Selects D-Cache size for any processor in the multiprocessor device.
FPU               FPU included for all processors present.
NEON              NEON included for all processors present.
TESTS             List of tests in array format with comma separator.

a. The Cortex-A7 MPCore processor vector capture and replay facility does not test any ETM functionality. See the
CoreSight SOC User Guide for more information.

For more information on each option except TESTS, see the Cortex-A7 MPCore Technical
Reference Manual. For more information on the tests in the TESTS variable, see the Cortex-A7
MPCore Integration Manual.
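
The range constraint on NUM_SPIS in Table 5-1 is easy to get wrong when you edit vectors.cfg
by hand. The following Python sketch shows one way to check a candidate value before you run
capture. It is an illustration only. The function name is hypothetical, and the sketch does not
read vectors.cfg itself.

```python
def is_valid_num_spis(value):
    """Check the Table 5-1 constraint: 0 to 480 inclusive, in steps of 32."""
    return isinstance(value, int) and 0 <= value <= 480 and value % 32 == 0
```

For example, 0, 32, and 480 are accepted, but 16 and 512 are rejected.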

Replay configuration

Table 5-2 shows the options used by replay.

Table 5-2 Variables in vectors.cfg

Variable name    Default value                             Comment
MODEL            "RTL"                                     Specifies the model to replay. You must set this to RTL for RTL validation.
RTL_LOCATION     "../../logical"                           Specifies location of the etma7 logical directory.
NETLIST (a)      "<add netlist path>/<add netlist name>"   Specifies netlist location.
STD_CELLS (a)    "<add Standard cells path>"               Specifies Standard cell library location.
RAM (a)          "<add RAM path>"                          Specifies RAM library location.
SDF_PRESENT (a)  "FALSE"                                   Specifies if a back-annotated timing file is used.
SDF (a)          "<sdf file name>"                         Specifies the .sdf file name.
DUMPVCD          "FALSE"                                   Specifies if a VCD file is dumped during replay.
DUMPVCD_START    "<Start time>"                            Specifies the VCD file dump start time during replay. If DUMPVCD is
                                                           set to TRUE, set this variable to the cycle at which you want the VCD
                                                           file to start to populate.
DUMPVCD_CYCLES   "<Number of cycles>"                      Specifies the number of cycles for which the VCD file is populated.
                                                           Set this variable when DUMPVCD is set to TRUE.

a. Not used during RTL validation.
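
The VCD variables only matter when DUMPVCD is set to TRUE, and their defaults are placeholders.
The following Python sketch checks that relationship. It assumes that the variables have already
been read into a dictionary of strings by some other means, because the syntax of vectors.cfg is
not described here. It also assumes that the start and length are whole numbers of cycles. The
function name is hypothetical.

```python
def check_vcd_settings(cfg):
    """Return a list of problems with the VCD dump variables in cfg.

    cfg maps variable names to string values, assumed to have been
    read from vectors.cfg by some other means.
    """
    problems = []
    if cfg.get("DUMPVCD", "FALSE").upper() != "TRUE":
        # The start and cycle settings are only needed when dumping.
        return problems
    for name in ("DUMPVCD_START", "DUMPVCD_CYCLES"):
        value = cfg.get(name, "")
        # Placeholder defaults such as "<Start time>" are not digits.
        if not value.isdigit():
            problems.append(name + " must be a whole number of cycles, got " + repr(value))
    return problems
```

An empty list means that the VCD settings are consistent.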

5.4.2 dotcshrc

The dotcshrc file is primarily used during the capture stage. You must source this file as
described in Controls and constraints for RTL validation on page 5-4 before execution to set
various environment variables. Table 5-3 shows the options.

Table 5-3 dotcshrc options

Option Description

IK_LOCATION Only used by the capture stage to set the correct location of the integration kit. It must be an absolute path.

IK_SIMULATOR Three simulators are supported:

MTI Mentor Modelsim.
VCS Synopsys VCS.
IUS Cadence IUS.
This option is used by both the capture stage and the replay stage.

IK_PLATFORM_64 Used by the capture stage to set ARMBST libraries correctly.

IK_OS Used by the capture stage to set ARMBST libraries correctly. Only change this option if the OS of the machine used
to build the integration kit is different from the OS of the machine used to run the integration kit.
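
Before you run capture or replay, you might want to confirm that sourcing dotcshrc has set the
variables in Table 5-3 as expected. The following Python sketch checks the two constraints that
the table states: IK_LOCATION must be an absolute path, and IK_SIMULATOR must be one of the
three supported values. It assumes that the variables are exported to the environment of the
shell from which you start Python. The function name is hypothetical.

```python
import os

SUPPORTED_SIMULATORS = ("MTI", "VCS", "IUS")  # Values listed in Table 5-3.


def check_ik_environment(env=None):
    """Return a list of problems with the integration kit variables."""
    env = os.environ if env is None else env
    problems = []
    location = env.get("IK_LOCATION", "")
    if not os.path.isabs(location):
        problems.append("IK_LOCATION must be an absolute path")
    simulator = env.get("IK_SIMULATOR", "")
    if simulator not in SUPPORTED_SIMULATORS:
        problems.append("IK_SIMULATOR must be one of " + ", ".join(SUPPORTED_SIMULATORS))
    return problems
```

Called with no argument, the function inspects the current environment. Passing a dictionary
makes it possible to check proposed settings without changing the environment.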

5.5 Flow for RTL validation

This section describes how to build the simulation environment and capture or replay vectors
on your Cortex-A7 MPCore RTL. It contains the following sections:
• Capture.
• Replay.
• Makefile.
• Debug options in the Makefile on page 5-8.
• Validating RTL configuration on page 5-8.

5.5.1 Capture

To capture the test vectors, use the Cortex-A7 MPCore integration kit.

The perl script ./tools/capture reads vectors.cfg. It runs all the checks necessary to ensure the
environment is correct. The script also edits the files in the integration kit to prepare the
environment for ikvalidate. After the environment is prepared, ikvalidate is run from the
integration kit directory.

All the commands available to ikvalidate are also available here. For more information on
ikvalidate, see the Cortex-A7 MPCore Integration Manual.

At the end of the ikvalidate run, the capture script moves the crf files from the integration
kit location to the crf directory. It also reinstates the original files in the integration kit.

Running capture depends on the simulator and the version used.

The Makefile enables you to run capture or replay separately, or both together.

You can add verbosity with a -v option.

perl -w ./tools/capture -v

For information on how to debug failures that result from running ikvalidate, see the Cortex-A7
MPCore Integration Manual.

5.5.2 Replay

The supplied Perl script, ./tools/replay, reads the vectors.cfg file to set all the necessary
variables. It replays the crf file for each test in the test list in vectors.cfg.

Running replay depends on the simulator and the version used.

The Makefile enables you to run capture or replay separately, or both together.

To add verbose information, you can edit the Makefile to include a -v option to the perl script,
for example:
perl -w ./tools/replay -v

5.5.3 Makefile

The Makefile enables you to run capture or replay separately, or both together.

To run capture:
make capture

To run replay:

make replay

To run both capture and replay:

make all

To remove working files and directories:

make clean

To remove working files and directories, generated crf files, and logs:
make cleanall

5.5.4 Debug options in the Makefile

Note
Only use these debug options if the replay stage is not simulating correctly.

During the replay stage, various debug options are available in the Makefile.

By default, the replay stage automatically creates a vc file called replay.vc. If this file is
not suitable and you require a customized version, you can run replay_novc:
make replay_novc

Note
A file called replay.vc must exist when this option is used.

By default, the replay stage compiles and simulates. If you only require a compile stage, you
can run compile instead:
make compile

By default, the compile option automatically creates a vc file called replay.vc. If this file is
not suitable and you require a customized version, you can run compile_novc:
make compile_novc

Note
A file called replay.vc must exist when this option is used.

5.5.5 Validating RTL configuration

The validation testbench automatically detects your RTL configuration. Therefore, using the
testbench to validate your configuration ensures you have a safe database before you start your
implementation.

5.6 Outputs from RTL validation

The log files output are specific to your verification tool. By default, the results of test vector
replay are output to the terminal display. You can redirect this output into a log file if required.
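
One way to keep the terminal output and a log file at the same time is to wrap the make command
in a small script. The following Python sketch is an illustration only. The function name and the
log file name are hypothetical, and the sketch assumes that make is on your path and that you run
it from the directory that contains the supplied Makefile.

```python
import subprocess


def run_and_log(command, log_path):
    """Run command, copying its output to the terminal and to log_path.

    Returns the exit status of the command.
    """
    with open(log_path, "w") as log:
        process = subprocess.Popen(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # Keep error messages in the same log.
            text=True,
        )
        for line in process.stdout:
            print(line, end="")
            log.write(line)
        return process.wait()


# Example use, not executed here:
#   status = run_and_log(["make", "replay"], "replay.log")
```

A non-zero return value indicates that the command reported an error.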

5.7 Reference data for RTL validation

This section describes the test vector classes and test vector file compression.

5.7.1 The test vector classes

Table 5-4 shows the classes of test vectors supported by the simulation environment.

Table 5-4 Test vector classes

Vector class Description

ca7_dbg_functional Checks the basic functionality of integer operations on the multiprocessor device

ca7_vfp_functional Checks the basic functionality of floating point operations on the multiprocessor device

ca7_advsimd_functional Checks the basic functionality of NEON Advanced SIMD operations on the multiprocessor device

ca7_cross_trigger_functional Checks the basic functionality of the CTI interface

ca7_max_power Vector designed to draw maximum power from the multiprocessor device under test

ca7_power_indicative Runs the Dhrystone 2.1 open source benchmark program on a single processor in a multiprocessor
device

The tests are compatible with all valid configurations of the Cortex-A7 MPCore processor apart
from:

• ca7_vfp_functional which requires the processor to be configured with either the Floating
Point Unit or NEON Media Processing Engine.

• ca7_advsimd_functional which requires the processor to be configured with the NEON
Media Processing Engine.

• ca7_cross_trigger_functional which requires the processor to be configured with the
ETM.

When these tests are run on a processor without the appropriate functionality, the tests are
skipped although the test still reports a pass.
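
Because a skipped test still reports a pass, it can be useful to work out in advance which tests
in your list do not exercise your configuration. The following Python sketch encodes the three
requirements listed above. It is an illustration only, and the function and parameter names are
hypothetical.

```python
def skipped_tests(tests, fpu, neon, etm):
    """Return the tests that are skipped for the given configuration.

    fpu, neon, and etm are True when that option is configured.
    A test with no listed requirement runs on every valid configuration.
    """
    requirement_met = {
        "ca7_vfp_functional": fpu or neon,
        "ca7_advsimd_functional": neon,
        "ca7_cross_trigger_functional": etm,
    }
    return [test for test in tests if not requirement_met.get(test, True)]
```

For a configuration with the FPU but without NEON or the ETM, the function returns
ca7_advsimd_functional and ca7_cross_trigger_functional if they are in the list.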

See the Cortex-A7 MPCore Integration Manual for a detailed description of these tests.

Note
The number of vectors applied can vary according to the configuration of the Cortex-A7
MPCore processor.

5.7.2 Test vector files compression

All test vector files produced by the capture phase of the flow are compressed using the gzip
command line utility tool. The scripts provided automatically decompress these files during the
replay phase of the flow.
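
The supplied scripts handle decompression for you. If you want to inspect a vector file by hand,
the following Python sketch decompresses one gzip file using the standard gzip module. It is an
illustration only. The function name is hypothetical, and the sketch makes no assumption about
the contents of the CRF file.

```python
import gzip
import shutil


def decompress_crf(gz_path, out_path):
    """Decompress one gzip-compressed vector file to out_path."""
    with gzip.open(gz_path, "rb") as source, open(out_path, "wb") as target:
        # Copy in chunks so that large vector files are not held in memory.
        shutil.copyfileobj(source, target)
```

The original compressed file is left unchanged, so the replay scripts can still use it.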

This chapter describes the floorplan used as a starting point for your design. It contains the
following sections:
• About floorplanning on page 6-2.
• Resource requirements for floorplans on page 6-3.
• Controls and constraints for floorplans on page 6-4.
• Inputs for floorplans on page 6-5.
• Considerations for floorplans on page 6-6.
• Outputs from floorplans on page 6-8.
• Reference data for floorplans on page 6-9.

6.1 About floorplanning

Figure 6-1 shows the top-level inputs, resources, outputs, and controls and constraints for
floorplanning.

Controls and constraints:
• Pin placement.
• Area.
• Aspect ratio.
• Power distribution.
• Process requirements.

Inputs:
• Example floorplans.
• Block placement guidelines.

Process:
• Floorplanning.

Outputs:
• Floorplan.
• Reports and logs.

Resources:
• Floorplanning tool.

Figure 6-1 Floorplanning process

6.2 Resource requirements for floorplans

This guide assumes that you have suitable EDA tools and compute resources for floorplanning.

6.3 Controls and constraints for floorplans

The following are controls and constraints that can influence floorplanning:
• Pin placement
• Area
• Aspect ratio
• Power distribution
• Process requirements.

6.4 Inputs for floorplans

Inputs are specific to your floorplanning tool, and can include the following:
• Example floorplans.
• Block placement guidelines.
• Pin placement.
• Power distribution.
• Placement blockages.

6.5 Considerations for floorplans

Your processor floorplan is strongly influenced by the size of the RAM blocks, and by the
position of the pins on these RAM blocks. You must take care in the placement of the RAM
blocks because they occupy a large area of the processor floorplan and require significant
routing resource. The placement of the RAM blocks also affects the optimum standard cell
placement. ARM recommends you use a hierarchical approach to Cortex-A7 MPCore
floorplanning and implementation.

Figure 6-2 shows an example Cortex-A7 MPCore floorplan of a single processor.

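[Figure: floorplan of a single processor, showing the processor logic, the data cache tag and
dirty RAMs, the instruction cache tag RAM, the L1 TLB RAM, and the data RAMs of the L1 data
cache and the L1 instruction cache.]

Figure 6-2 Cortex-A7 MPCore floorplan of a single processor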

Figure 6-3 on page 6-7 shows an example Cortex-A7 MPCore floorplan including two
processors in the multiprocessor device.

[Figure: floorplan showing Processor 0 and Processor 1 on either side of the SCU and processor
integration layer (PIL), with the L2 data RAM, L2 tag RAM, SCU tag, and pins.]

Figure 6-3 Example Cortex-A7 MPCore floorplan

Note
• The aspect ratios shown for memories are arbitrary and you might not be able to achieve
these using your memory compiler.

• You might have to adjust the floorplan if your technology prevents you from routing over
the RAM blocks.

• Clock modules must have placement constraints that prevent these modules being spread
around the design.

6.6 Outputs from floorplans

Although output files are specific to your floorplanning tool, they can include:
• Floorplan for replay.
• Fixed pin locations.
• Route guides or blockages.
• Placement blockages.
• Initial cell placement.
• Proprietary format for your synthesis tool.

6.7 Reference data for floorplans

The following sections provide reference information for floorplanning:
• Standard cell floorplanning.
• RAM inputs and memory block floorplanning.
• Interface floorplanning on page 6-10.
• Power grid design on page 6-10.
• Power-gated design on page 6-11.

6.7.1 Standard cell floorplanning

Floorplanning does not place the standard cells. ARM expects the synthesis tool to place the
standard cells using a flat, physically unconstrained, placement methodology.

The SCU communicates with the L2 cache, so ARM recommends you place these modules
close together. See Figure 6-3 on page 6-7.

You must place the data cache tag RAM and data cache data RAM close to the data cache logic.
In particular, data cache tag RAM must be close to the ca7dcu module.

ARM recommends you put some bounds on clock modules to avoid too much dispersion of the
clock module logic.

6.7.2 RAM inputs and memory block floorplanning

The following subsections provide floorplanning guidelines for:

• RAM inputs.
• Instruction and data cache RAM blocks.
• Tag and dirty cache RAM blocks.
• TLB RAM blocks on page 6-10.

RAM inputs

The RAM blocks that are most critical to floorplan are:

• Larger RAM blocks with a larger address set-up.
• RAM blocks that are further away from the standard cell area.

Instruction and data cache RAM blocks

Because the timing paths from these blocks are critical, these blocks must be placed as close as
possible to the standard cell area.

Tag and dirty cache RAM blocks

ARM recommends you:

• Place the tag RAM blocks close to corresponding cache data RAM blocks.
• Place the data cache dirty RAM block close to the data cache data RAM block.

Although the address set-up and data out times on the tag and dirty RAM blocks are small
compared to the times for the main memories, the output is required earlier.

If necessary, you can place the dirty RAM blocks further from the standard cell area than the tag
RAM blocks because:

• They are logically smaller so the address set-up time is smaller.

• Data returning from dirty RAM blocks does not go through comparators, unlike the tag
RAM blocks.

TLB RAM blocks

The TLB blocks interface with the instruction cache side, data cache side, and the Bus Interface
Unit (BIU). Place these as Figure 6-3 on page 6-7 shows. This keeps the associated logic close
to the BIU and does not affect other critical paths in the design.

6.7.3 Interface floorplanning

The following subsections provide floorplanning guidelines and pin information:

• Placement of clocks, resets, and interrupts.
• Trace interface signals.
• Wide pins.

Note
All other signals in the design are not timing-critical. You can place them at locations optimal
for the SoC floorplan.

Placement of clocks, resets, and interrupts

You must place the clock, resets, and interrupts in the center of the floorplan. For details of the
clock, reset and interrupt signals see:
• Clock signals on page 1-4.
• Reset signals on page 1-5.
• Trace interfaces on page 1-9.

Trace interface signals

Ports relating to the ETM are not timing critical. However, the placement of the ports influences
the placement of the standard cells in the processor. See Trace interfaces on page 1-9.

Note
When placing the ETM ports, you must not compromise the location of the timing-critical ports.

Wide pins

For purposes of rotating the macrocell when the processor is included as a black box in a design,
ARM recommends that you either:
• Make the pins wider, so the router can drop vias on top of them or next to them.
• Create multi-layer pins.

6.7.4 Power grid design

ARM recommends that you incorporate the power grid in the floorplan passed to your synthesis
tool so that the tool has a more accurate representation of the available routing resource.

You must design the grid to meet the requirements of your library. However, ARM recommends
a grid that keeps the total IR drop within 2%, VDD and VSS combined.
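
As a worked example of the recommendation, the following Python sketch converts the 2% figure
into a voltage budget. It assumes that the percentage is taken relative to the nominal supply
voltage, which this guide does not state explicitly. The function name is hypothetical.

```python
def ir_drop_budget_mv(supply_voltage, fraction=0.02):
    """Return the combined VDD and VSS IR drop budget in millivolts.

    supply_voltage is the nominal supply in volts. The default
    fraction of 0.02 corresponds to the 2% recommendation.
    """
    return supply_voltage * fraction * 1000.0
```

Under this assumption, a 1.0V supply gives a combined budget of about 20mV to share between the
VDD and VSS grids.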

6.7.5 Power-gated design

Power-gating a design creates additional requirements for the power grid and standard cell
placement. A power-gated floorplan has physical regions for each of the power domains. The
floorplanning implementation stage inserts power switch cells to supply the power rails of each
power-gated domain. Where the implementation reference methodology includes power-gating,
a UPF file and consistent floorplanning scripts are provided. Any change in the specification for
power intent requires you to update the floorplan and power grid.

This chapter describes production testing for the processor. It contains the following sections:
• About design for test features on page 7-2.
• Reference data for DFT on page 7-3.

7.1 About design for test features

The Cortex-A7 MPCore processor is a fully synchronous design with all registers clocked off
the rising clock edge. Therefore, you can achieve very high test coverage.

Use your standard implementation reference methodology to generate test patterns. See the
supplied implementation reference methodology documents for more information.

See the documentation from your EDA tool vendor for information about:
• The requirements for DFT.
• The DFT controls and constraints.
• The DFT features.
• Confirmation of DFT feature operation, and test coverage.
• Solving DFT feature problems.

7.2 Reference data for DFT

The following sections give reference data for DFT:
• Scan chain insertion.
• Test wrapper insertion.

7.2.1 Scan chain insertion

Table 7-1 shows the scan test ports that access the internal scan chains in the macrocell.

Table 7-1 Scan test ports

Name Direction Description

DFTSE Input Enables Scan chains. High fan out of this signal means this must be a false path, and cannot be
switched at-speed. This signal must be tied LOW during functional mode.

DFTRSTDISABLE Input Enables test tools that might not understand a pipelined reset to bypass reset repeaters.
This makes the internal resets single cycle signals, although only at low frequencies.
The DFTRSTDISABLE signal blocks internally generated resets when set to 1. When set to 0
this signal enables generated resets to propagate. During test, this signal only has to be blocked
during scan shift. This prevents resets propagating while scan chains are shifted, and permits
ATPG to test the related logic when not shifting.

DFTRAMHOLD Input Enables the RAMs to hold data by disabling the chip select to the RAMs when the signal is
asserted. This permits:
• RAMs to maintain values during tests like IddQ.
• Testing shadow logic of the RAMs by testing through the RAMs.
ATPG tools prefer the data in the RAMs to be static during shift. In this scenario,
DFTRAMHOLD must be enabled during shift and disabled during capture.

Figure 7-1 shows how to use DFTRAMHOLD to disable chip selects to the RAMs.

[Figure: DFTRAMHOLD gates the functional chip select to produce the RAM chip select.]

Figure 7-1 RAM chip select
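
The gating that Figure 7-1 describes can be modeled as a simple logic function. The following
Python sketch is a behavioral illustration only, not the RTL. It assumes an active-HIGH chip
select, which this guide does not state, and the function name is hypothetical.

```python
def ram_chip_select(functional_chip_select, dftramhold):
    """Model the chip select gating, assuming an active-HIGH chip select.

    When dftramhold is asserted, the RAM is deselected regardless of
    the functional chip select, so the RAM contents are held.
    """
    return functional_chip_select and not dftramhold
```

With DFTRAMHOLD asserted during shift, the function always returns False, which matches the
requirement that RAM data is static during shift.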

Note
See the supplied implementation reference methodology documentation for details of the scan
ports.

7.2.2 Test wrapper insertion

The synthesis tools supplied by your EDA tool vendor might enable you to insert a test wrapper
that you can use to gain access to the inputs and outputs of the processor in production test. If
your tools support test wrapper insertion you can choose whether or not to implement this
wrapper.

A test wrapper gives increased test coverage when access to the primary inputs and outputs is
impossible because the macrocell is deeply embedded in your SoC design.

Most of the macrocell inputs and outputs are registered.

Note
See the supplied implementation reference methodology documentation for details of the test
wrapper ports.

This chapter describes the dynamic verification process. It contains the following sections:
• About dynamic verification on page 8-2.
• Resource requirements for dynamic verification on page 8-3.
• Controls and constraints for dynamic verification on page 8-4.
• Inputs for dynamic verification on page 8-5.
• Flow for dynamic verification on page 8-7.
• Outputs from dynamic verification on page 8-10.
• Confirmation of dynamic verification on page 8-11.
• Measuring power consumption on page 8-12.

8.1 About dynamic verification

This chapter describes how you can verify your netlist using vector replay.

Note
Typically, your contract with ARM requires you to use vector replay and a Logical Equivalence
Checking (LEC) tool to validate your implementation, see Chapter 9 Sign-off. Equivalence
checking tools use formal mathematical techniques to verify logic functions between two
implementations of a design. This shows whether the design functionality is consistent between
the two implementations. You must maintain the functionality of the configured macrocell at
each stage of the design process. You can use equivalence checkers to verify that the RTL
functionality is maintained through successive iterations of the netlist, by a process of building,
mapping, and comparing the design. See the reference methodology documents supplied by
your EDA tool vendor for details of the LEC tools.

Verification of the netlist by vector replay requires running CRF test vectors, captured from an
RTL reference, on your netlist. This provides a quick method of checking the netlist, and can
be used in addition to formal equivalence checking. The supplied vectors are captured using the
RTL as a reference, so replaying the vectors checks that your netlist matches the cycle-by-cycle
behavior of the RTL reference. Dynamic verification of the netlist uses the same vectors, flow,
and tools as the RTL validation process.
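
The principle of a cycle-by-cycle comparison can be sketched in a few lines of Python. This is a
conceptual illustration only. It does not read CRF files or drive a simulator, and the names
replay, vectors, and device are hypothetical. Each vector is modeled as a pair of a stimulus and
the response that the reference produced for it.

```python
def replay(vectors, device):
    """Apply each stimulus to device and compare with the expected response.

    vectors is a list of (stimulus, expected) pairs, one per cycle.
    device is a callable that stands in for the simulated netlist.
    Returns the index of the first mismatching cycle, or None if every
    cycle matches.
    """
    for cycle, (stimulus, expected) in enumerate(vectors):
        if device(stimulus) != expected:
            return cycle
    return None
```

A result of None corresponds to a passing replay. Any other result identifies the first cycle at
which the device under test diverges from the reference.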

Figure 8-1 shows the top-level inputs, resources, outputs, and controls and constraints for
dynamic verification of your netlist.

Controls and constraints:
• Testbench options.

Inputs:
• Netlist.

Process:
• Dynamic verification of netlist.

Outputs:
• Reports and logs.

Resources:
• CRF test vectors.
• HDL Simulator.
• Scripts.
• Testbench.
• Gate-level library.

Figure 8-1 Dynamic verification of netlist process

The test vectors do not completely cover the functionality of the processor. Therefore, dynamic
verification of the netlist is not an adequate check for design sign-off.

Note
• In addition to passing dynamic verification, your netlist must pass:
— LEC.
— Timing verification.
See the reference methodology documents supplied by your EDA tool vendor for
information about these processes.
• Successful validation of your RTL is a sign-off requirement, see Chapter 9 Sign-off.

8.2 Resource requirements for dynamic verification

To run the dynamic verification of the netlist, the Cortex-A7 MPCore integration kit testbench
must be present. The capture methodology relies on the integration kit to create all the necessary
test vectors. The IK is located in the cortexa7_intkit directory. See the Cortex-A7 MPCore
Integration Manual for more information about the integration kit. The dynamic verification
methodology consists of:

Capture Creates test vectors in accordance with the settings displayed in the configuration
list using the RTL as a reference.

Replay Runs the vectors on the netlist using the same configuration list.

Note
If you have run the capture stage as part of the RTL validation described in Chapter 5 RTL
Validation, using the same configuration as the one used for the netlist under test, you can skip
the capture stage.

8.3 Controls and constraints for dynamic verification

You must edit two files, vectors.cfg and dotcshrc, and source dotcshrc to create various
environment variables. To do this:

1. Go to the correct location:

cd implementation_<technology>/CORTEXA7INTEGRATION_vectors

2. Edit the configuration file, vectors.cfg, to include the correct configuration settings. The
configuration settings must be the same as those used to generate the netlist under test.
See vectors.cfg on page 8-5.

3. Edit dotcshrc to set the environment variables for the integration kit correctly, and then
source the file. See dotcshrc on page 8-6.
source dotcshrc

4. Ensure that you set the path to your simulator.

8.4 Inputs for dynamic verification

There are two inputs for dynamic verification:
• vectors.cfg.
• dotcshrc on page 8-6.

8.4.1 vectors.cfg

Use the options in vectors.cfg to set the correct configuration. The options are described in:
• Generic configuration.
• Replay configuration on page 8-6.

Generic configuration

Table 8-1 shows the options used by capture. Replay uses these options when MODEL is set to RTL.
When MODEL is set to NETLIST, make sure the configuration variables match those used for the
netlist.

Table 8-1 Generic configuration options

Option            Description
L2_CACHE_PRESENT  Include L2 cache.
L2_CACHE_SIZE     Selects L2 cache size.
L2_LATENCY        Selects L2 data RAM latency.
ETM_PRESENT       Include CTI and ETM for each processor in the multiprocessor device. (a)
NUM_CPUS          Number of processors included in the multiprocessor device.
NUM_SPIS          Number of interrupts. Distributor interrupt lines, 0 <= NUM_SPIS <= 480, in steps of 32.
GIC_PRESENT       Include the integrated GIC.
L1_ICACHE_SIZE    Selects L1 instruction cache size for any processor in the multiprocessor device.
L1_DCACHE_SIZE    Selects L1 data cache size for any processor in the multiprocessor device.
FPU_PRESENT       FPU included for Processor <n> where <n> is 0-3.
NEON_PRESENT      NEON included for Processor <n> where <n> is 0-3.
TESTS             List of tests in array format with comma separator.

a. The Cortex-A7 MPCore processor vector capture and replay facility does not test any ETM functionality. See
the CoreSight SOC User Guide for more information.

Replay configuration

Table 8-2 shows the options used by replay.

Table 8-2 Replay configuration options

Variable name     Default value                             Comment
MODEL             "RTL"                                     Specifies the model to replay. You must set this to NETLIST for
                                                            dynamic verification.
RTL_LOCATION (a)  "../../logical"                           Specifies location of the etma7 logical directory.
NETLIST           "<add netlist path>/<add netlist name>"   Specifies netlist location.
STD_CELLS         "<add Standard cells path>"               Specifies Standard cell library location.
RAM               "<add RAM path>"                          Specifies RAM library location.
SDF_PRESENT       "FALSE"                                   Specifies if a back-annotated timing file is used.
SDF               "<sdf file name>"                         Specifies the .sdf file name.
DUMPVCD           "FALSE"                                   Specifies if a VCD file is dumped during replay.

a. Not used during dynamic verification.
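
Several defaults in Table 8-2 are placeholders that you must replace before a netlist replay.
The following Python sketch checks for the most likely omissions. It assumes that the variables
have already been read into a dictionary of strings by some other means, and that a placeholder
can be recognized by its angle brackets. The function name is hypothetical.

```python
def check_netlist_settings(cfg):
    """Return a list of problems with the replay variables for a netlist run."""
    problems = []
    if cfg.get("MODEL") != "NETLIST":
        problems.append("MODEL must be NETLIST for dynamic verification")
    required = ["NETLIST", "STD_CELLS", "RAM"]
    # The SDF file name is only needed when back-annotated timing is used.
    if cfg.get("SDF_PRESENT", "FALSE").upper() == "TRUE":
        required.append("SDF")
    for name in required:
        value = cfg.get(name, "")
        # The defaults are placeholders in angle brackets.
        if not value or "<" in value:
            problems.append(name + " still has its placeholder value")
    return problems
```

An empty list means that the settings checked here are ready for replay. The sketch does not
check that the paths exist.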

8.4.2 dotcshrc

The dotcshrc file is primarily used during the capture stage. You must source this file as
described in Controls and constraints for dynamic verification on page 8-4 before execution to
set various environment variables. Table 8-3 shows the options.

Table 8-3 dotcshrc options

Option Description

IK_LOCATION Only used by the capture stage to set the correct location of the integration kit. It must be an absolute path.

IK_SIMULATOR Three simulators are supported:

MTI Mentor Modelsim.
VCS Synopsys VCS.
IUS Cadence IUS.
This option is used by both the capture stage and the replay stage.

IK_PLATFORM_64 Used by the capture stage to set ARMBST libraries correctly.

8.5 Flow for dynamic verification

This section describes how to build the simulation environment and capture or replay vectors
on your Cortex-A7 MPCore RTL and netlist. It contains the following sections:
• Capture.
• Replay.
• Makefile on page 8-8.
• Debug options in the Makefile on page 8-8.

8.5.1 Capture

Note
You only have to run the capture stage if you have not run RTL validation as described in
Chapter 5 RTL Validation, or if RTL validation was run with a different configuration.

If you have run RTL validation and the configuration has not changed, you can replay the
available crf files directly.

To capture the test vectors, use the Cortex-A7 MPCore integration kit.

All the commands available to ikvalidate are also available here. For more information on
ikvalidate, see the Cortex-A7 MPCore Integration Manual.

At the end of the ikvalidate run, the capture script moves the crf files from the integration
kit location to the crf directory. It also reinstates the original files in the integration kit.

Running capture depends on the simulator and the version used.

The Makefile enables you to run capture or replay separately, or both together.

You can add verbosity with a -v option.

perl -w ./tools/capture -v

8.5.2 Replay

The supplied Perl script, ./tools/replay, reads the vectors.cfg file to set all the necessary
variables. It replays the crf file for each test in the test list in vectors.cfg.

The replay testbench contains a define ARM_NETLIST which is used to replay a netlist. The define
is automatically set in the replay.vc file when the MODEL variable is set to NETLIST. As a default,
16 scan chain pins are defined when you instantiate the unit under test (uut). If you have a netlist
with a different number of scan chains, you can edit the netlist as required. The define is also
used to recognize an RTL simulation, and adds the correct set of parameters in the uut.

Running replay depends on the simulator and the version used.

The Makefile enables you to run capture or replay separately, or both together.

To add verbose information, you can edit the Makefile to include a -v option to the perl script,
for example:
perl -w ./tools/replay -v

Note
If you get x-propagation problems when simulating your netlist you might have to:

• Run a two-state simulation.

• Modify your gate-level simulation library models to initialize sequential elements to a
non-x value.

• Use simulator commands to initialize all sequential elements in your netlist to a non-x
value.

8.5.3 Makefile

The Makefile enables you to run capture or replay separately, or both together.

To run capture:
make capture

To run replay:
make replay

To run both capture and replay:

make all

To remove working files and directories:

make clean

To remove working files and directories, generated crf files, and logs:
make cleanall

8.5.4 Debug options in the Makefile

Note
Only use these debug options if the replay stage is not simulating correctly.

During the replay stage, various debug options are available in the Makefile.

By default, the replay stage creates an automated vc file called replay.vc. If this file is not
suitable and you require a customized version, you can run replay_novc:
make replay_novc

Note
A file called replay.vc must exist when this option is used.

By default, the replay stage compiles and simulates. If you only require a compile stage, you
can run compile instead:
make compile

By default, the compile option creates an automated vc file called replay.vc. If this file is not
suitable and you require a customized version, you can run compile_novc:
make compile_novc

Note
A file called replay.vc must exist when this option is used.

8.6 Outputs from dynamic verification

8.7 Confirmation of dynamic verification

To confirm that all the supplied crf test vectors ran successfully on your netlist, check that each
simulation report ends with a 0 errors found message.

Note
The number of vectors applied can vary according to the configuration of the Cortex-A7
MPCore processor.
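
The check described above can be scripted. The following Python sketch is illustrative only. It assumes that the last non-blank line of each simulation report contains the text 0 errors found, and the function names are hypothetical. The pattern rejects counts such as 10 errors found, which also contain the text 0 errors found.

```python
import re

# Match "0 errors found" only when the 0 is not part of a larger number.
ZERO_ERRORS = re.compile(r"(?<!\d)0 errors found")


def report_passed(report_text):
    """True if the last non-blank line of a report says 0 errors found."""
    lines = [line.strip() for line in report_text.splitlines() if line.strip()]
    return bool(lines) and ZERO_ERRORS.search(lines[-1]) is not None


def failing_reports(reports):
    """Given a mapping of report name to report text, list the failures."""
    return sorted(name for name, text in reports.items()
                  if not report_passed(text))
```

An empty list from failing_reports indicates that every report you passed in ended with the expected message. It does not confirm that every vector for your configuration was run.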

8.8 Measuring power consumption

This section describes how to measure the power consumption using ca7_power_indicative and
ca7_max_power tests.

8.8.1 ca7_power_indicative.s

To measure the power consumption you have to identify the repeated Dhrystone loops:

1. In the logical/cortexa7_intkit/validation/ikvalidate.cfg file, enable Tarmac
disassembly by setting the $Tarmac_disass parameter to TRUE:
$Tarmac_disass = "TRUE";

2. Use the make capture command to execute the power_indicative test. This generates the
logical/cortexa7_intkit/validation/logs/ca7_power_indicative/tarmac_cluster0_cpu0.log
file.

3. In the tarmac_cluster0_cpu0.log file, search for 0xc9 to find an
instruction of the form MOV r0,#0xc9.

4. After the MOV r0,#0xc9 instruction, search for the next BL instruction, for example
BL {pc}-0xc4. This branch indicates the start and endpoint for each Dhrystone loop.

Note
ARM recommends you use the fourth iteration for the measurement. Do not use the first
one or two loop iterations for the measurements, since the caches are still loading.

5. Place the starting timestamp and loop length in the vectors.cfg file. Use the parameters:
DUMPVCD = "TRUE" ;
DUMPVCD_START = "" ; This time is in nanoseconds
DUMPVCD_CYCLES = "" ; This is the number of cycles for the power loop

6. For the power measurement, you can use vector replay to generate a .vcd file.

The current drawn during the power_indicative vector is constant with any number of
processors because only one processor is active.
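
Steps 3 and 4 above can be automated with a short script. The following Python sketch is illustrative only. It assumes that the disassembly text in the log matches the instruction spellings given in the steps, and it makes no assumption about where the timestamp appears on each line. The function name loop_boundary_lines is hypothetical.

```python
import re

# Assumed spellings, taken from the instruction examples in steps 3 and 4.
MARKER = re.compile(r"\bMOV\s+r0,\s*#0xc9\b")
BRANCH = re.compile(r"\bBL\s+(\S+)")


def loop_boundary_lines(log_lines):
    """Return the log lines that hold the BL delimiting each loop iteration.

    The first BL found after the MOV r0,#0xc9 marker is taken as the loop
    branch. Every later BL with the same operand text is then reported.
    """
    marker_index = next(
        (i for i, line in enumerate(log_lines) if MARKER.search(line)), None)
    if marker_index is None:
        return []

    target = None
    boundaries = []
    for line in log_lines[marker_index + 1:]:
        match = BRANCH.search(line)
        if match is None:
            continue
        if target is None:
            target = match.group(1)  # operand of the first BL after the marker
        if match.group(1) == target:
            boundaries.append(line)
    return boundaries
```

If the first returned line marks the start of the first iteration, the fourth iteration runs from the fourth returned line to the fifth. Read the timestamps from those two lines to fill in DUMPVCD_START and DUMPVCD_CYCLES.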

8.8.2 ca7_max_power.s

To measure the power consumption you have to identify the repeated max_power loops:

1. In the logical/cortexa7_intkit/validation/ikvalidate.cfg file, enable Tarmac
disassembly by setting the $Tarmac_disass parameter to TRUE:
$Tarmac_disass = "TRUE";

2. Use the make capture command to execute the max_power test. This generates the
logical/cortexa7_intkit/validation/logs/ca7_max_power/tarmac_cluster0_cpu0.log
file. Log files are also generated for other processors in the multiprocessor device.

3. In the tarmac_cluster0_cpu0.log file, search for the SUBS r10, r10, #1 instruction. The
loop length should be 18 cycles after the fifth iteration.

4. Place the starting timestamp and loop length in the vectors.cfg file. Use the parameters:
DUMPVCD = "TRUE" ;
DUMPVCD_START = "" ; This time is in nanoseconds
DUMPVCD_CYCLES = "" ; This is the number of cycles for the power loop

5. For the power measurement, you can use vector replay to generate a .vcd file.

The expected max_power consumption is 1.9 times the power_indicative figure when
there is a single processor.

The current drawn during the max_power vector scales up with an increase in the number of
processors.
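
The arithmetic behind the loop length and the expected power ratio can be sketched as follows. This Python example is illustrative only. The function names, the 10ns clock period, the timestamps, and the power figures are hypothetical values chosen to show the calculation, and are not measured results.

```python
def loop_cycles(start_ns, end_ns, clock_period_ns):
    """Convert a loop duration in nanoseconds to a whole number of cycles."""
    return round((end_ns - start_ns) / clock_period_ns)


def power_ratio(max_power, power_indicative):
    """Ratio of the max_power figure to the power_indicative figure."""
    return max_power / power_indicative


# Hypothetical example: a loop spanning 5000ns to 5180ns with a 10ns clock
# period is 18 cycles long, the loop length quoted in step 3.
cycles = loop_cycles(5000, 5180, 10)

# Hypothetical example: figures of 19.0 and 10.0 in the same unit give the
# single-processor ratio of 1.9 quoted above.
ratio = power_ratio(19.0, 10.0)
```

A ratio far from 1.9 on a single-processor configuration might indicate that the .vcd window does not cover the intended loop iteration.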

In addition to your normal SoC flow sign-off checks, you must satisfy additional verification
criteria before you sign off the macrocell design. This chapter describes the sign-off criteria. It
contains the following sections:
• About sign-off on page 9-2.
• Obligations for sign-off on page 9-3.
• Requirements for sign-off on page 9-4.
• Steps for sign-off on page 9-5.
• Completion of sign-off on page 9-6.

9.1 About sign-off

Figure 9-1 shows the top-level inputs, resources, outputs, and controls and constraints for
sign-off.

Controls and constraints:
Contractual requirements
Partner sign-off requirements

Inputs:
Validation reports and logs
Logical Equivalence Check reports and logs
Timing Verification reports and logs
Dynamic Verification reports and logs

Process:
Sign-off

Outputs:
Signed off macrocell

Resources:
Signatories

Figure 9-1 Sign-off process

9.2 Obligations for sign-off

The appropriate authority must approve the sign-off of the design in accordance with:
• The terms of the contract with ARM.
• Any other partner sign-off requirements.

See Implementation obligations on page vi for more information.

9.3 Requirements for sign-off

Caution
When you sign off your design you must fulfill the terms of your contract with ARM. Typically,
your contract stipulates the mandatory sign-off requirements given in this section. However, you
must check your contract for any variation to the mandatory requirements.

The following sections describe the requirements for sign-off:

• Mandatory for sign-off.
• Recommended for sign-off.

9.3.1 Mandatory for sign-off

You must complete the following implementation stages successfully for sign-off:

• Logical Equivalence Check (LEC). See the supplied implementation reference
methodology documents.

• Timing verification, by Static Timing Analysis (STA) of the post-layout netlist. See the
supplied implementation reference methodology documents.

Reports and logs from each of these stages are required for sign-off.

A certain minimum set of deliverable outputs is required at the end of the implementation. See
Completion of sign-off on page 9-6.

All ARM partners must fulfill the terms of their contract with ARM to complete sign-off.

Note
You can change the timing constraints to suit your design provided it still meets all the
mandatory requirements for sign-off.

9.3.2 Recommended for sign-off

The following stages are recommended for sign-off:

• Design Rule Check (DRC). See the documents supplied by your EDA tool vendor.

• Layout Versus Schematic (LVS). See the documents supplied by your EDA tool vendor.

• Power Analysis. See the supplied implementation reference methodology documents.

• Back-annotated netlist simulation of the ATPG vectors. See the supplied implementation
reference methodology documents.

9.4 Steps for sign-off

To sign off the Cortex-A7 MPCore processor you must meet the criteria in each of the following
stages in the design flow:
1. RTL integration.
2. Pre-layout.
3. Post-layout.
4. Post-place-and-route timing.

9.4.1 RTL integration

You must run the supplied test vectors on the configured RTL to verify the Cortex-A7 MPCore
RTL deliverables before you begin the synthesis stage. See Chapter 5 RTL Validation. This
confirms that you have successfully installed the Cortex-A7 MPCore processor RTL.

9.4.2 Pre-layout

You must verify the functionality of the compiled netlist before you sign off the macrocell. This
verification consists of proving logical equivalence between the validated RTL and the
compiled netlist using formal verification tools. See the supplied implementation reference
methodology documents.

9.4.3 Post-layout

You must verify the functionality of the final placed-and-routed netlist before you sign off the
macrocell. This verification consists of proving logical equivalence between the validated RTL
and the final placed-and-routed netlist using formal verification tools. See the supplied
implementation reference methodology documents.

Optionally, you can also run vector capture and replay on the compiled netlist. For more
information, see:
• Capture on page 8-7.
• Replay on page 8-7.

9.4.4 Post-place-and-route timing

You must use Static Timing Analysis (STA) to verify the timing of the final placed-and-routed
netlist before you sign off your netlist. You must also run some or all of the supplied test vectors,
and run the supplied validation tests on a netlist with back-annotated timing as a final check.

9.5 Completion of sign-off

For successful completion of sign-off, you must have a number of completed and verified
ARM-related deliverables from the implementation process. These include:
• GDS II output.
• ATPG test vectors.
• Extracted timing model.
• All required reports and logs.
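
The list above can be tracked as a simple checklist. The following Python sketch is illustrative only. The function name missing_deliverables is hypothetical, and the list covers only the items named in this section. Your contract with ARM might require more.

```python
# Deliverables named in this section. The list might not be exhaustive.
DELIVERABLES = (
    "GDS II output",
    "ATPG test vectors",
    "Extracted timing model",
    "All required reports and logs",
)


def missing_deliverables(completed):
    """Return the deliverables from this section not yet marked complete."""
    done = set(completed)
    return [item for item in DELIVERABLES if item not in done]
```

For example, passing a list that holds only GDS II output and ATPG test vectors returns the two remaining items in the order shown above.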

This appendix describes the technical changes between released issues of this book.

Table A-1 Issue A

Change Location Affects

No changes, first release - -

Table A-2 Differences between Issue A and Issue B

Change Location Affects

Updated the use of the nVFIQ[3:0] and nVIRQ[3:0] interrupt signals Table 1-11 on page 1-11 All

Clarified multicycle setup path cycles that relate to the L2 data RAM read when the L2 cache is present Table 1-18 on page 1-15 All

Updated instruction cache tag RAMs address connection information Table 4-11 on page 4-12 All

Table A-3 Differences between Issue B and Issue C

Change Location Affects

No changes - -

Table A-4 Differences between Issue C and Issue D

Change Location Affects

Added instructions on how to measure power consumption. Measuring power consumption on page 8-12 All

Clarified that you use EDA tool vendor documentation for the DRC and LVS recommended sign-off stages Recommended for sign-off on page 9-4 All

Table A-5 Differences between Issue D and Issue E

Change Location Affects

Clarified the clock signals description Clock signals on page 1-4 All

Clarified the note about which configuration file you use to specify the L1 cache sizes Configuring the L1 cache sizes on page 4-9 All

Table A-6 Differences between Issue E and Issue F

Change Location Affects

Added a note that states SCU duplicate tag RAMs must always be instantiated SCU duplicate tag RAMs on page 4-16 All

Added CTI functional test The test vector classes on page 5-10 r0p5 onwards

Updated DFTRAMHOLD description Table 7-1 on page 7-3 All

Removed inverter on the output of the AND gate Figure 7-1 on page 7-3 All

Updated the test vector summary report to include ca7_cross_trigger_functional Confirmation of dynamic verification on page 8-11 r0p5 onwards
