Arm Cortex-X1 Core Software Optimization Guide
Revision: r1p2
Release information
Document history
Issue  Date               Confidentiality   Change
1.0    25 March 2019      Confidential      First release for r0p0
2.0    27 September 2019  Confidential      First release for r1p0
3.0    29 May 2020        Non-Confidential  First release for r1p1
4.0    28 April 2021      Non-Confidential  First release for r1p2
Your access to the information in this document is conditional upon your acceptance that you will not use or
permit others to use the information for the purposes of determining whether implementations infringe any
third party patents.
THIS DOCUMENT IS PROVIDED “AS IS”. ARM PROVIDES NO REPRESENTATIONS AND NO WARRANTIES,
EXPRESS, IMPLIED OR STATUTORY, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES
OF MERCHANTABILITY, SATISFACTORY QUALITY, NON-INFRINGEMENT OR FITNESS FOR A
PARTICULAR PURPOSE WITH RESPECT TO THE DOCUMENT. For the avoidance of doubt, Arm makes no
representation with respect to, has undertaken no analysis to identify or understand the scope and content of,
patents, copyrights, trade secrets, or other rights.
TO THE EXTENT NOT PROHIBITED BY LAW, IN NO EVENT WILL ARM BE LIABLE FOR ANY DAMAGES,
INCLUDING WITHOUT LIMITATION ANY DIRECT, INDIRECT, SPECIAL, INCIDENTAL, PUNITIVE, OR
CONSEQUENTIAL DAMAGES, HOWEVER CAUSED AND REGARDLESS OF THE THEORY OF LIABILITY,
ARISING OUT OF ANY USE OF THIS DOCUMENT, EVEN IF ARM HAS BEEN ADVISED OF THE
POSSIBILITY OF SUCH DAMAGES.
This document consists solely of commercial items. You shall be responsible for ensuring that any use,
duplication or disclosure of this document complies fully with any relevant export laws and regulations to assure
that this document or any portion thereof is not exported, directly or indirectly, in violation of such export laws.
Use of the word “partner” in reference to Arm's customers is not intended to create or refer to any partnership
relationship with any other company. Arm may make changes to this document at any time and without notice.
This document may be translated into other languages for convenience, and you agree that if there is any
conflict between the English version of this document and any translation, the terms of the English version of
the Agreement shall prevail.
The Arm corporate logo and words marked with ® or ™ are registered trademarks or trademarks of Arm Limited
(or its affiliates) in the US and/or elsewhere. All rights reserved. Other brands and names mentioned in this
document may be the trademarks of their respective owners. Please follow Arm's trademark usage guidelines at
https://www.arm.com/company/policies/trademarks.
Arm® Cortex®-X1 Core Software Optimization Guide PJDOC-466751330-12804
Issue 4.0
Copyright © [2019-2021] Arm Limited (or its affiliates). All rights reserved.
(LES-PRE-20349)
Confidentiality Status
This document is Non-Confidential. The right to use, copy and disclose this document may be subject to license
restrictions in accordance with the terms of the agreement entered into by Arm and the party that Arm
delivered this document to.
Product Status
The information in this document is Final, that is for a developed product.
Web Address
developer.arm.com
This document includes terms that can be offensive. We will replace these terms in a future issue of this
document. If you find offensive terms in this document, please email terms@arm.com.
Contents
1 Introduction
1.1 Product revision status
1.2 Intended audience
1.3 Scope
1.4 Conventions
1.4.1 Glossary
1.4.2 Typographical conventions
1.5 Additional reading
1.6 Feedback
1.6.1 Feedback on this product
1.6.2 Feedback on content
2 Overview
2.1 Pipeline overview
1 Introduction
1.1 Product revision status
The rxpy identifier indicates the revision status of the product described in this book, for example,
r1p2, where:
rx
Identifies the major revision of the product, for example, r1.
py
Identifies the minor revision or modification status of the product, for example, p2.
1.3 Scope
This document describes aspects of the Cortex-X1 core micro-architecture that influence software
performance. Micro-architectural detail is limited to that which is useful for software optimization.
Documentation extends only to software-visible behavior of the Cortex-X1 core and not to the
hardware rationale behind the behavior.
1.4 Conventions
The following subsections describe conventions used in Arm documents.
1.4.1 Glossary
The Arm Glossary is a list of terms used in Arm documentation, together with definitions for those
terms. The Arm Glossary does not contain terms that are industry standard unless the Arm meaning
differs from the generally accepted meaning.
1.4.2 Typographical conventions
SMALL CAPITALS  Used in body text for a few terms that have specific technical meanings, that are defined in
the Arm® Glossary. For example, IMPLEMENTATION DEFINED, IMPLEMENTATION SPECIFIC,
UNKNOWN, and UNPREDICTABLE.
This represents a recommendation which, if not followed, might lead to system failure or
damage.
This represents a requirement for the system that, if not followed, might result in system
failure or damage.
This represents a requirement for the system that, if not followed, will result in system
failure or damage.
This represents a useful tip that might make it easier, better or faster to perform a task.
This is a reminder of something important that relates to the information you are reading.
1.6 Feedback
Arm welcomes feedback on this product and its documentation.
Arm tests the PDF only in Adobe Acrobat and Acrobat Reader and cannot guarantee the quality of
the represented document when used with any other PDF reader.
2 Overview
The Cortex-X1 core is a high-performance, low-power core that implements the Armv8-A
architecture with support for the Armv8.1-A extension, the Armv8.2-A extension (including the RAS
extension), the Load-Acquire (LDAPR) instructions introduced in the Armv8.3-A extension, and the
Dot Product instructions introduced in the Armv8.4-A extension.
This document describes elements of the Cortex-X1 core micro-architecture that influence software
performance so that software and compilers can be optimized accordingly.
[Figure: Cortex-X1 pipeline. Labels recovered from the diagram: Issue; Branch 0, Branch 1; Integer Single-Cycle 0; FP/ASIMD 0, 1, 2, 3; Load/Store 0, Load/Store 1; Load 2; Store data 0, Store data 1.]
3 Instruction characteristics
3.1 Instruction tables
This chapter describes high-level performance characteristics for most Armv8.2-A A32, T32, and A64
instructions. A series of tables summarizes the effective execution latency and throughput (instruction
bandwidth per cycle), pipelines utilized, and special behaviors associated with each group of
instructions. Utilized pipelines correspond to the execution pipelines described in Chapter 2.
In the tables below, Execution Latency is defined as the minimum latency seen by an operation
dependent on an instruction in the described group.
In the tables below, Execution Throughput is defined as the maximum throughput (in instructions per
cycle) of the specified instruction group that can be achieved in the entirety of the Cortex-X1
microarchitecture.
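As a rough illustration of how the two definitions above combine, the following Python sketch computes idealized lower bounds on cycle count for a run of instructions from a single group. The function names are hypothetical and the model is an assumption rather than part of this guide: it ignores dispatch limits, memory effects, and contention from other instruction groups.

```python
import math


def dependent_chain_cycles(count, latency):
    """Lower bound when each instruction consumes the previous result.

    Every instruction must wait `latency` cycles for its producer,
    so the chain cannot finish sooner than count * latency cycles.
    """
    return count * latency


def independent_issue_cycles(count, throughput):
    """Lower bound on issue cycles when no instruction depends on another.

    `throughput` is in instructions per cycle, as in the tables.
    """
    return math.ceil(count / throughput)
```

With the values listed for CRC checksum ops in Table 3-36 (latency 2, throughput 1), eight chained operations need at least `dependent_chain_cycles(8, 2)` cycles, while eight independent operations can issue in `independent_issue_cycles(8, 1)` cycles under this model.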
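Instruction Group          AArch32 Instructions  Execution Latency  Execution Throughput  Utilized Pipelines  Notes
Branch, immed              B                     1                  2                     B                   -
Branch, register           BX                    1                  2                     B                   -
Branch and link, immed     BL, BLX               1                  2                     B, S                -
Branch and link, register  BLX                   1                  2                     B, S                -
Compare and branch         CBZ, CBNZ             1                  2                     B                   -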
Notes:
1. The one-register form is when Rn==Rm or imm==0; all other forms are considered two-register forms.
Notes:
1. The address update op goes down pipeline 'I' if the store is unconditional.
2. The address update op goes down pipeline 'M' if the store is conditional.
3. For store multiple instructions, N=floor((num_regs+3)/4).
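The formula in note 3 can be evaluated directly with integer arithmetic. The short Python sketch below does so; the function name is hypothetical, and what N stands for is defined by the table the note belongs to, not by this example.

```python
def store_multiple_n(num_regs):
    """Evaluate N = floor((num_regs + 3) / 4) for a store-multiple instruction.

    Integer floor division implements the floor() in the note; the +3
    means that every started group of four registers counts once.
    """
    if num_regs < 1:
        raise ValueError("a store multiple transfers at least one register")
    return (num_regs + 3) // 4
```

For example, one to four registers give N = 1, five to eight registers give N = 2, and sixteen registers give N = 4.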
Notes:
1. Conditional loads have an extra µOP which goes down pipeline 'V' and have 2 cycles of extra latency compared to their
unconditional counterparts.
2. N is (num_reg)/6 + 5.
3. N* is (num_reg)/4 + 5.
4. R is num_reg/2.
5. Writeback forms of load instructions require an extra µOP to update the base address. This update is typically
performed in parallel with or prior to the load µOP (update latency shown in parentheses).
6. The numbers in parentheses represent the latency and throughput of conditional loads.
7. Conditional loads go down L01 pipe.
3.21 CRC
Table 3-36 AArch64 CRC
Instruction Group  AArch64 Instructions  Execution Latency  Execution Throughput  Utilized Pipelines  Notes
CRC checksum ops   CRC32, CRC32C         2                  1                     M0                  1
4 Special considerations
4.1 Dispatch constraints
Dispatch of µOPs from the in-order portion to the out-of-order portion of the microarchitecture
includes several constraints. It is important to consider these constraints during code generation to
maximize the effective dispatch bandwidth and subsequent execution bandwidth of Cortex-X1.
The dispatch stage can process up to 8 MOPs per cycle and dispatch up to 16 µOPs per cycle, with the
following limitations on the number of µOPs of each type that may be simultaneously dispatched.
• Up to 4 µOPs utilizing the S or B pipelines
• Up to 4 µOPs utilizing the M pipelines
• Up to 2 µOPs utilizing the M0 pipelines
• Up to 2 µOPs utilizing the V0 pipeline
• Up to 2 µOPs utilizing the V1 pipeline
• Up to 6 µOPs utilizing the L pipelines
In the event there are more µOPs available to be dispatched in a given cycle than can be supported by
the constraints above, µOPs will be dispatched in oldest to youngest age-order to the extent allowed
by the above.
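The limits above can be sketched as a small Python model that counts how many µOPs dispatch in one cycle. This is an illustrative sketch under stated assumptions, not a description of the hardware: the class tags, function names, and the rule that dispatch stops at the first µOP that would exceed a limit are all assumptions, each µOP is assumed to count against exactly one class, and the 8 MOPs per cycle limit is not modelled.

```python
TOTAL_UOPS_PER_CYCLE = 16

# Per-cycle limits taken from the bullet list; "SB" stands for "S or B pipelines".
CLASS_LIMITS = {"SB": 4, "M": 4, "M0": 2, "V0": 2, "V1": 2, "L": 6}


def dispatch_one_cycle(uops):
    """Return how many µOPs at the front of `uops` (oldest first) dispatch.

    Each entry is a class tag. Tags missing from CLASS_LIMITS are bounded
    only by the total. Assumption: dispatch is in order, so it stops at the
    first µOP whose class limit is already used up.
    """
    used = {}
    dispatched = 0
    for tag in uops:
        if dispatched == TOTAL_UOPS_PER_CYCLE:
            break
        limit = CLASS_LIMITS.get(tag)
        if limit is not None and used.get(tag, 0) == limit:
            break
        used[tag] = used.get(tag, 0) + 1
        dispatched += 1
    return dispatched


def cycles_to_dispatch(uops):
    """Count the cycles needed to dispatch the whole sequence in this model."""
    cycles = 0
    pos = 0
    while pos < len(uops):
        pos += dispatch_one_cycle(uops[pos:])
        cycles += 1
    return cycles
```

In this model, three consecutive V0 µOPs take two cycles to dispatch because only two fit in one cycle, while a mix of six L, four S/B, four M, and two V0 µOPs fills all 16 slots in a single cycle.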
• Use non-writeback forms of LDP and STP instructions, interleaving them as shown in the
example below:
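Loop_start:
    SUBS x2,x2,#96          // decrement remaining byte count by 96 and set flags
    LDP q3,q4,[x1,#0]       // load 32 bytes from the source
    STP q3,q4,[x0,#0]       // store 32 bytes to the destination
    LDP q3,q4,[x1,#32]
    STP q3,q4,[x0,#32]
    LDP q3,q4,[x1,#64]
    STP q3,q4,[x0,#64]
    ADD x1,x1,#96           // advance the source pointer
    ADD x0,x0,#96           // advance the destination pointer
    B.GT Loop_start         // loop while the count from SUBS is still positive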
A recommended copy routine for AArch32 would look like the sequence above but would use
LDRD/STRD instructions. Avoid load-/store-multiple instruction encodings (such as LDM and STM).
To achieve maximum performance on memset to zero, use DC ZVA instead of STP. An optimal
routine might look something like the following:
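Loop_start:
    SUBS x2,x2,#0x80        // decrement remaining byte count by 128 and set flags
    DC ZVA,x0               // zero the block at x0
    ADD x0,x0,#0x40         // advance by 64 bytes, the block size this loop assumes
    DC ZVA,x0
    ADD x0,x0,#0x40
    B.GT Loop_start         // loop while the count from SUBS is still positive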
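The loop above advances x0 by 0x40 after each DC ZVA, which matches a 64-byte zeroing block. In the Armv8-A architecture the block size is reported by the DCZID_EL0 register: bits [3:0] (BS) hold log2 of the block size in 4-byte words, and bit 4 (DZP) indicates that DC ZVA is prohibited. The Python sketch below decodes a raw register value so that a routine can confirm the stride it assumes. The function name is hypothetical, and reading the register itself (with MRS) is outside this sketch.

```python
def zva_block_bytes(dczid_el0):
    """Decode a raw DCZID_EL0 value into the DC ZVA block size in bytes.

    Returns None when the DZP bit (bit 4) is set, meaning DC ZVA must not
    be used. Otherwise BS (bits [3:0]) is log2 of the size in 4-byte words.
    """
    if dczid_el0 & 0x10:
        return None
    block_size_log2_words = dczid_el0 & 0xF
    return 4 << block_size_log2_words
```

A value of 4 in the BS field decodes to 64 bytes, the stride used in the loop above.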
Pairs of dependent AESE/AESMC and AESD/AESIMC instructions exhibit higher performance when
they are adjacent in the program code and both instructions use the same destination register.
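The condition above (adjacent, dependent, same destination register) can be checked mechanically on an instruction listing. The following Python sketch is one hypothetical way to do so: the tuple representation `(mnemonic, destination, source)` and the function name are assumptions made for illustration, and the check says nothing about how the core detects such pairs.

```python
# First instruction of each pair mapped to the second one it pairs with.
AES_PAIRS = {"AESE": "AESMC", "AESD": "AESIMC"}


def find_adjacent_aes_pairs(instructions):
    """Return the indices at which a favourable AES pair starts.

    `instructions` is a list of (mnemonic, destination, source) tuples.
    A pair qualifies when the second instruction immediately follows the
    first, reads the first one's result, and writes the same destination.
    """
    starts = []
    for i in range(len(instructions) - 1):
        op1, dst1, _src1 = instructions[i]
        op2, dst2, src2 = instructions[i + 1]
        if AES_PAIRS.get(op1) == op2 and dst1 == dst2 and src2 == dst1:
            starts.append(i)
    return starts
```

A listing in which another instruction sits between AESE and AESMC, or in which AESMC writes a different register, yields no match and is a candidate for rescheduling.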
Notes:
1. Reciprocal step and estimate instructions are excluded from this region.
2. ASIMD extract narrow, saturating instructions are excluded from this region.
3. ASIMD miscellaneous instructions can only be consumers of this region.
In addition to the regions mentioned in the table above, all instructions in regions 1 and 2 can fast
forward to FP/ASIMD stores, FP/ASIMD vector to integer register transfers and ASIMD converts
that write to general purpose registers.
• Absolute difference accumulate and pairwise add and accumulate instructions cannot be
producers (see section 3.15) in region 1.
• For floating-point producer-consumer pairs, the precision of the instructions should match
(single, double or half) in region 2.
• Pair-wise floating-point instructions cannot be producers or consumers in region 2.
It is not advisable to interleave instructions belonging to different regions. Also, certain instructions
can only be producers or consumers in a particular region but not both (see footnote 3 for Table 4-1).
For example, the code below interleaves producers and consumers from regions 1 and 2. This will
result in an additional latency of 1 cycle as seen by FMUL.
For best case performance, avoid placing more than four branch instructions within an aligned 32-
byte instruction memory region.
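The branch-density guideline above can be checked on a list of branch instruction addresses. The Python sketch below groups addresses by aligned 32-byte region and reports the regions holding more than four branches. The function name and the input format (a plain list of byte addresses) are assumptions for illustration only.

```python
from collections import Counter

REGION_BYTES = 32   # size of the aligned instruction memory region
MAX_BRANCHES = 4    # guideline: no more than four branches per region


def crowded_regions(branch_addresses):
    """Return base addresses of aligned 32-byte regions with too many branches."""
    counts = Counter(
        (address // REGION_BYTES) * REGION_BYTES for address in branch_addresses
    )
    return sorted(base for base, n in counts.items() if n > MAX_BRANCHES)
```

Five branches at 0x1000 through 0x1010 all fall in the region starting at 0x1000, so that region is reported. Moving the fifth branch to 0x1020 places it in the next region and clears the report.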
The table below summarizes various special-purpose register read accesses and the associated
execution constraints or side-effects.
The table below summarizes various special-purpose register write accesses and the associated
execution constraints or side-effects.
The first instruction writes S0, which corresponds to the lowest part of Q0. The second instruction
then requires Q0 as an input operand. In this scenario, there is a RAW dependency between the first
and the second instructions. In most cases, Cortex-X1 performs slightly worse in such situations.
Cortex-X1 is able to avoid this register-hazard condition for certain cases. The following rules
describe the conditions under which a register-hazard can occur:
• The producer writes an S-register (not a D[x] scalar)
• The consumer reads an overlapping Q-register (not as a D[x] scalar)
• The consumer is a FP/ASIMD µOP (not a store or MOV µOP)
To avoid unnecessary hazards, it is recommended that the programmer use D[x] scalar writes when
populating registers prior to ASIMD operations. For example, either of the following instruction
forms would safely prevent a subsequent hazard.
4.12 IT blocks
The Armv8-A architecture performance deprecates some uses of the IT instruction in such a way that
software may be written using multiple naïve single-instruction IT blocks. It is preferred that software
generate multi-instruction IT blocks rather than single-instruction blocks.
The following instruction pairs are fused in both AArch32 and AArch64 modes:
1. AESE + AESMC (see Section 4.6 on AES Encryption/Decryption)
2. AESD + AESIMC (see Section 4.6 on AES Encryption/Decryption)
MOV Xd, #0
MOV Wd, #0
MOV Wd, Wn
MOV Xd, Xn
The last 3 instructions may not be executed with zero latency under certain conditions.
However, it is preferable to remove the indirect branch by using only Thumb-2 or Arm code for each
veneer.