10000 [AArch64AsmParser] Allow branch target symbol to have a shift/extend modifier name by MaskRay · Pull Request #80571 · llvm/llvm-project · GitHub
[go: up one dir, main page]

Skip to content

[AArch64AsmParser] Allow branch target symbol to have a shift/extend modifier name #80571

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of servic 10000 e and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

MaskRay
Copy link
Member
@MaskRay MaskRay commented Feb 4, 2024

Shift and extend modifiers are parsed as separate operands. When a
symbol operand of a branch instruction has such a "bad" name,
AArch64AsmParser will report an error.

% cat a.c
void lsl(); void lsr(); void asr(); void ror(); void uxtb(); void sxtx();
void foo() { lsl(); asr(); asr(); ror(); uxtb(); sxtx(); }
% clang --target=aarch64 -c -save-temps a.c
a.s:15:8: error: expected #imm after shift specifier
        bl      lsl
                   ^
a.s:16:8: error: expected #imm after shift specifier
        bl      asr
                   ^
a.s:17:8: error: expected #imm after shift specifier
        bl      asr
                   ^
a.s:18:8: error: expected #imm after shift specifier
        bl      ror
                   ^
a.s:19:5: error: expected label or encodable integer pc offset
        bl      uxtb
                ^
a.s:20:5: error: expected label or encodable integer pc offset
        bl      sxtx
                ^

In contrast, gas correctly parses these instructions.

Fix #79729 by parsing shift/extend modifier after an immediate
value/register

Created using spr 1.3.4
@llvmbot llvmbot added backend:AArch64 mc Machine (object) code labels Feb 4, 2024
@llvmbot
Copy link
Member
llvmbot commented Feb 4, 2024

@llvm/pr-subscribers-backend-aarch64

@llvm/pr-subscribers-mc

Author: Fangrui Song (MaskRay)

Changes

Shift and extend modifiers are parsed as separate operands. When a
symbol operand of a branch instruction has such a "bad" name,
AArch64AsmParser will report an error.

% cat a.c
void lsl(); void lsr(); void asr(); void ror(); void uxtb(); void sxtx();
void foo() { lsl(); asr(); asr(); ror(); uxtb(); sxtx(); }
% clang --target=aarch64 -c -save-temps a.c
a.s:15:8: error: expected #imm after shift specifier
        bl      lsl
                   ^
a.s:16:8: error: expected #imm after shift specifier
        bl      asr
                   ^
a.s:17:8: error: expected #imm after shift specifier
        bl      asr
                   ^
a.s:18:8: error: expected #imm after shift specifier
        bl      ror
                   ^
a.s:19:5: error: expected label or encodable integer pc offset
        bl      uxtb
                ^
a.s:20:5: error: expected label or encodable integer pc offset
        bl      sxtx
                ^

In contrast, gas correctly parses these instructions.

Fix #79729 by parsing shift/extend modifier after an immediate
value/register


Full diff: https://github.com/llvm/llvm-project/pull/80571.diff

4 Files Affected:

  • (modified) llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp (+26-7)
  • (modified) llvm/test/MC/AArch64/arm64-adr.s (+10)
  • (modified) llvm/test/MC/AArch64/arm64-branch-encoding.s (+10)
  • (modified) llvm/test/MC/AArch64/basic-a64-diagnostics.s (+8)
diff --git a/llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp b/llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp
index e9d96f3b838d4..4e7c8f66d9d54 100644
--- a/llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp
+++ b/llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp
@@ -4809,20 +4809,30 @@ bool AArch64AsmParser::parseOperand(OperandVector &Operands, bool isCondCode,
       return parseCondCode(Operands, invertCondCode);
 
     // If it's a register name, parse it.
-    if (!parseRegister(Operands))
+    if (!parseRegister(Operands)) {
+      // Parse an optional shift/extend modifier.
+      AsmToken SavedTok = getTok();
+      if (parseOptionalToken(AsmToken::Comma)) {
+        // The operand after the register may be a label (e.g. ADR/ADRP).  Check
+        // such cases and don't report an error when <label> happens to match a
+        // shift/extend modifier.
+        ParseStatus Res = MatchOperandParserImpl(Operands, Mnemonic,
+                                                 /*ParseForAllFeatures=*/true);
+        if (!Res.isNoMatch())
+          return Res.isFailure();
+        Res = tryParseOptionalShiftExtend(Operands);
+        if (!Res.isNoMatch())
+          return Res.isFailure();
+        getLexer().UnLex(SavedTok);
+      }
       return false;
+    }
 
     // See if this is a "mul vl" decoration or "mul #<int>" operand used
     // by SVE instructions.
     if (!parseOptionalMulOperand(Operands))
       return false;
 
-    // This could be an optional "shift" or "extend" operand.
-    ParseStatus GotShift = tryParseOptionalShiftExtend(Operands);
-    // We can only continue if no tokens were eaten.
-    if (!GotShift.isNoMatch())
-      return GotShift.isFailure();
-
     // If this is a two-word mnemonic, parse its special keyword
     // operand as an identifier.
     if (Mnemonic == "brb" || Mnemonic == "smstart" || Mnemonic == "smstop" ||
@@ -4883,6 +4893,15 @@ bool AArch64AsmParser::parseOperand(OperandVector &Operands, bool isCondCode,
 
     E = SMLoc::getFromPointer(getLoc().getPointer() - 1);
     Operands.push_back(AArch64Operand::CreateImm(ImmVal, S, E, getContext()));
+
+    // Parse an optional shift/extend modifier.
+    AsmToken SavedTok = Tok;
+    if (parseOptionalToken(AsmToken::Comma)) {
+      ParseStatus Res = tryParseOptionalShiftExtend(Operands);
+      if (!Res.isNoMatch())
+        return Res.isFailure();
+      getLexer().UnLex(SavedTok);
+    }
     return false;
   }
   case AsmToken::Equal: {
diff --git a/llvm/test/MC/AArch64/arm64-adr.s b/llvm/test/MC/AArch64/arm64-adr.s
index 131e545d3bb5c..d1a91b6acf39a 100644
--- a/llvm/test/MC/AArch64/arm64-adr.s
+++ b/llvm/test/MC/AArch64/arm64-adr.s
@@ -23,6 +23,16 @@ adrp x0, foo
 // CHECK: adrp    x0, foo     // encoding: [A,A,A,0x90'A']
 // CHECK-NEXT:                //   fixup A - offset: 0, value: foo, kind: fixup_aarch64_pcrel_adrp_imm21
 
+// CHECK:      adrp    x0, lsl     // encoding: [A,A,A,0x90'A']
+// CHECK-NEXT:                //   fixup A - offset: 0, value: lsl, kind: fixup_aarch64_pcrel_adrp_imm21
+// CHECK-NEXT: adrp    x0, ror     // encoding: [A,A,A,0x90'A']
+// CHECK-NEXT:                //   fixup A - offset: 0, value: ror, kind: fixup_aarch64_pcrel_adrp_imm21
+// CHECK-NEXT: adr    x0, uxtb    // encoding: [A,A,A,0x10'A']
+// CHECK-NEXT:                //   fixup A - offset: 0, value: uxtb, kind: fixup_aarch64_pcrel_adr_imm21
+adrp x0, lsl
+adrp x0, ror
+adr x0, uxtb
+
 adr x0, #0xffffffff
 adrp x0, #0xffffffff
 adrp x0, #1
diff --git a/llvm/test/MC/AArch64/arm64-branch-encoding.s b/llvm/test/MC/AArch64/arm64-branch-encoding.s
index 48c2099012f68..17fdbd9e43fbc 100644
--- a/llvm/test/MC/AArch64/arm64-branch-encoding.s
+++ b/llvm/test/MC/AArch64/arm64-branch-encoding.s
@@ -157,3 +157,13 @@ L1:
 ; CHECK: dcps2                     ; encoding: [0x02,0x00,0xa0,0xd4]
 ; CHECK: dcps3                     ; encoding: [0x03,0x00,0xa0,0xd4]
 
+;; Test "bad" names
+  bl lsl
+  b.eq lsr
+  b.ne uxth
+; CHECK:      bl lsl     ; encoding: [A,A,A,0b100101AA]
+; CHECK-NEXT:   fixup A - offset: 0, value: lsl, kind: fixup_aarch64_pcrel_call26
+; CHECK-NEXT: b.eq lsr   ; encoding: [0bAAA00000,A,A,0x54]
+; CHECK-NEXT:   fixup A - offset: 0, value: lsr, kind: fixup_aarch64_pcrel_branch19
+; CHECK-NEXT: b.ne uxth  ; encoding: [0bAAA00001,A,A,0x54]
+; CHECK-NEXT:   fixup A - offset: 0, value: uxth, kind: fixup_aarch64_pcrel_branch19
diff --git a/llvm/test/MC/AArch64/basic-a64-diagnostics.s b/llvm/test/MC/AArch64/basic-a64-diagnostics.s
index a59861e13472a..028700f4d11df 100644
--- a/llvm/test/MC/AArch64/basic-a64-diagnostics.s
+++ b/llvm/test/MC/AArch64/basic-a64-diagnostics.s
@@ -1149,6 +1149,10 @@
 // CHECK-ERROR-NEXT:           cbz x29, #1
 // CHECK-ERROR-NEXT:                    ^
 
+/// Test "bad" names
+cbz w1, lsl
+// CHECK-ERROR: [[#@LINE-1]]:12: error: expected #imm after shift specifier
+
 //------------------------------------------------------------------------------
 // Conditional branch (immediate)
 //------------------------------------------------------------------------------
@@ -1343,6 +1347,7 @@
         csel sp, x2, x3, ne
         csel x10, x11, sp, ge
         csel x1, x2, x3, #3
+        csel x1, x2, x3, lsl #1, eq
 // CHECK-ERROR: error: invalid operand for instruction
 // CHECK-ERROR-NEXT:        csel w4, wsp, w9, eq
 // CHECK-ERROR-NEXT:                 ^
@@ -1366,6 +1371,9 @@
 // CHECK-ERROR-NEXT:                       ^
 // CHECK-ERROR-NEXT: error: expected AArch64 condition code
 // CHECK-ERROR-NEXT:        csel x1, x2, x3, #3
+// CHECK-ERROR-NEXT:                         ^
+// CHECK-ERROR-NEXT: error: expected AArch64 condition code
+// CHECK-ERROR-NEXT:        csel x1, x2, x3, lsl #1, eq
 // CHECK-ERROR-NEXT:                         ^
 
         csinc w20, w21, wsp, mi

@MaskRay
Copy link
Member Author
MaskRay commented Feb 9, 2024

Ping:)

Copy link
Collaborator
@davemgreen davemgreen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not really an assembler expert, but this looks sensible from what I can see. LGTM

@MaskRay MaskRay merged commit 5da8013 into main Feb 12, 2024
@MaskRay MaskRay deleted the users/MaskRay/spr/aarch64asmparser-allow-branch-target-symbol-to-have-a-shiftextend-modifier-name branch February 12, 2024 04:21
MaskRay added a commit that referenced this pull request Feb 19, 2024
An immediate integer operand not prefixed with # can also be followed
by "lsl". Parse "lsl".
In the wild, edk2 ArmPkg/Include/AsmMacroIoLibV8.h has
`movz      Reg, (Val) >> 16, lsl #16`.

Note: our support for paren expression not prefixed with # is not very
good. For example, `adds x3, x4, (1024>>0), lsl 12` fails to be parsed.
@jroelofs
Copy link
Contributor

After this patch, the following no longer builds:

.macro MOVXI DST, SRC
    movk    \DST, (((\SRC) >> 16) & 0xffff), LSL #16
.endm

rdar123019110:
    MOVXI x1, 0x12345678
$ ./bin/clang movxi.S -o - -c
<instantiation>:1:47: error: unexpected token in argument list
movk x1, (((0x12345678) >> 16) & 0xffff), LSL #16
                                              ^
movxi.S:6:5: note: while in macro instantiation
    MOVXI x1, 0x12345678
    ^
movxi.S:7:1: error: assembler local symbol 'LSL' not defined

^

Works fine in clang-17 and binutils 2.38: https://clang.godbolt.org/z/ezn59Tdcx

@jroelofs
Copy link
Contributor

oh, maybe that was what was fixed in 442f066 ?

@jroelofs
Copy link
Contributor

Yes, it was. Carry on ;)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:AArch64 mc Machine (object) code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[AArch64] Error when assembling call to function named ror
4 participants
0