Compiler Design Notes
UNIT I: Lexical Analysis & Syntax Analysis
**Language Processors:** Systems that process programs to make them executable.
Examples: compilers, interpreters, assemblers.
**Structure of a Compiler:** Phases include lexical analysis, syntax analysis, semantic
analysis, intermediate code generation, code optimization, and code generation.
**Lexical Analysis:** Converts characters to tokens. Removes whitespace/comments.
**Role of Lexical Analyzer:**
- Tokenizes input
- Removes whitespace/comments
- Passes tokens to parser
**Bootstrapping:** Writing a compiler in the source programming language it intends to
compile.
**Input Buffering:** Technique for efficient scanning using buffers with sentinel characters.
**Specification of Tokens:** Defined using regular expressions, e.g., identifier: `[a-zA-Z_][a-
zA-Z0-9_]*`
**Recognition of Tokens:** Finite Automata used to recognize token patterns.
**Lexical Analyzer Generator (LEX):** Tool that generates lexical analyzers. Example:
DIGIT [0-9]
%%
{DIGIT}+ { printf("Number"); }
%%
**Finite Automata:** DFA/NFA used to implement lexical analyzers.
**Regular Expressions and Finite Automata:** REs define languages recognized by FA.
**Design of Lexical Analyzer Generator:** Converts REs to NFA -> DFA -> minimized DFA ->
code.
**Syntax Analysis:** Checks token sequence against grammar rules.
**Role of the Parser:** Detects syntax errors, builds parse trees.
**Context-Free Grammars (CFG):** Consist of terminals, non-terminals, start symbol, and
productions.
**Derivations and Parse Trees:** Show how strings derive from grammar. Leftmost and
rightmost derivations.
**Ambiguity:** A grammar with multiple parse trees for the same string.
**Left Recursion:** Grammar with productions like A -> Aα. Must be removed for top-down
parsing.
**Left Factoring:** Removes common prefixes to aid predictive parsing.
---
UNIT II: Parsing Techniques
**Top Down Parsing:** Builds parse tree from top using CFG.
**Preprocessing Steps:** Remove left recursion, perform left factoring.
**Backtracking:** Tries multiple production rules. Inefficient.
**Recursive Descent Parsing:** Uses mutually recursive functions for grammar rules.
**LL(1) Grammars:** Can be parsed without backtracking. Use single lookahead.
**Non-recursive Predictive Parsing:** Uses parsing table and stack.
**Error Recovery in Predictive Parsing:** Techniques include panic mode and phrase-level
recovery.
**Bottom Up Parsing:** Builds tree from leaves up.
**Difference between LR and LL Parsers:** LR parsers are more powerful and can handle
left recursion.
**Types of LR Parsers:** SLR, CLR, LALR.
**Shift-Reduce Parsing:** Uses stack and input buffer. Shift moves input to stack; reduce
applies grammar.
**SLR Parsers:** Simplified LR parsers using FOLLOW sets.
**SLR Table Construction:** Compute FIRST, FOLLOW, item sets, ACTION/GOTO tables.
**CLR and LALR Parsers:** More powerful, use lookahead. LALR combines similar CLR
states.
**Dangling Else Ambiguity:** "else" may match multiple "if"s. Resolved via grammar.
**Error Recovery in LR Parsing:** Same as in LL but adapted for stack.
**Handling Ambiguous Grammar:** Use precedence and associativity rules.
---
UNIT III: Syntax Directed Translation & Intermediate Code
**Syntax Directed Definitions (SDD):** CFG + semantic rules.
**Evaluation Orders for SDDs:** Post-order traversal for bottom-up; pre-order for top-
down.
**Applications of Syntax Directed Translation:** Type checking, intermediate code
generation.
**Syntax Directed Translation Schemes (SDTS):** Grammar with semantic actions
embedded.
**Implementing L-Attributed SDDs:** Evaluate attributes during parsing.
**Intermediate Code Generation:** Converts source to intermediate representation (IR).
**Variants of Syntax Trees:** Abstract syntax trees, DAGs.
**Three Address Code (TAC):** IR using temporary variables. Example:
t1 = a + b
t2 = t1 * c
**Types and Declarations:** Managed with symbol table.
**Translation of Expressions:** Convert infix to postfix/TAC.
**Type Checking:** Ensures operands are type-compatible.
**Control Flow & Backpatching:** Used for jumps and branches.
**Intermediate Code for Procedures:** Includes prologue/epilogue, parameter passing.
---
UNIT IV: Code Optimization
**Sources of Optimization:** Redundant operations, dead code, loop inefficiencies.
**Basic Blocks:** Sequences of instructions with single entry/exit.
**Optimization of Basic Blocks:** Remove common sub-expressions, dead code elimination.
**Structure Preserving Transformations:** Maintain program structure while optimizing.
**Flow Graphs:** Represent control flow with nodes and edges.
**Loop Optimization:** Includes loop unrolling, invariant code motion.
**Data-Flow Analysis:** Gathers info on variable usage to optimize.
**Peephole Optimization:** Localized improvements like replacing instructions.
---
UNIT V: Run Time Environments & Code Generation
**Storage Organization:** Stack, heap, static, and code segments.
**Run Time Storage Allocation:** Memory assigned to variables/structures during
execution.
**Activation Records:** Store return address, parameters, local variables.
**Procedure Calls:** Manage control transfer and data passing.
**Displays:** Used for accessing non-local variables.
**Code Generation Issues:** Instruction selection, register allocation.
**Object Code Forms:** Final machine code forms.
**Code Generation Algorithm:** Converts IR to assembly.
**Register Allocation and Assignment:** Efficient use of CPU registers using graph coloring.
---
**Note:** Each unit's examples and key diagrams (like DFA for token recognition, parse
trees, TAC examples) should be practiced separately.
These notes aim to summarize core compiler design concepts with clarity.