# Parser Component Documentation

## Overview

The parser component is responsible for transforming source code text into an Abstract Syntax Tree (AST). It is implemented using the `nom` parser combinator library and follows a modular design pattern, breaking down the parsing logic into several specialized modules.

## Architecture

The parser is organized into the following modules:

- `parser.rs`: The main entry point that coordinates the parsing process
- `parser_common.rs`: Common parsing utilities and shared functions
- `parser_expr.rs`: Expression parsing functionality
- `parser_type.rs`: Type system parsing
- `parser_stmt.rs`: Statement and control flow parsing

### Module Responsibilities and Public Interface

#### 1. parser.rs
The main parser module that provides the entry point for parsing complete programs:
```rust
pub fn parse(input: &str) -> IResult<&str, Vec<Statement>>
```
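
Because `parse` returns an `IResult`, a successful result still carries whatever input was not consumed. Whether `parse` itself rejects trailing input is not stated here, so a caller that needs the whole program parsed may want to check the remainder. The dependency-free sketch below illustrates that check with a hypothetical `parse_all` helper, a hypothetical `PResult` alias that imitates the `(remaining, output)` shape of `IResult`, and a toy `digits` parser standing in for `parse`. nom's own `all_consuming` combinator (in `nom::combinator`) serves the same purpose, although unlike this sketch it does not skip trailing whitespace.

```rust
/// Imitates nom's `IResult`: Ok((remaining_input, output)) or an error message.
type PResult<'a, O> = Result<(&'a str, O), String>;

/// Hypothetical helper: runs `parser` and fails unless everything except
/// trailing whitespace was consumed.
fn parse_all<'a, O>(
    parser: impl Fn(&'a str) -> PResult<'a, O>,
    input: &'a str,
) -> Result<O, String> {
    let (rest, output) = parser(input)?;
    if rest.trim().is_empty() {
        Ok(output)
    } else {
        Err(format!("unparsed input: {:?}", rest))
    }
}

/// Toy stand-in for `parse`: consumes leading ASCII digits.
fn digits(input: &str) -> PResult<'_, &str> {
    let end = input
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(input.len());
    if end == 0 {
        return Err("expected a digit".to_string());
    }
    Ok((&input[end..], &input[..end]))
}
```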

#### 2. parser_common.rs
Common parsing utilities used across other modules:
```rust
pub fn is_string_char(c: char) -> bool
pub fn separator<'a>(sep: &'static str) -> impl FnMut(&'a str) -> IResult<&'a str, &'a str>
pub fn keyword<'a>(kw: &'static str) -> impl FnMut(&'a str) -> IResult<&'a str, &'a str>
pub fn identifier(input: &str) -> IResult<&str, &str>
```
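
The bodies of these functions are not shown. One detail a `keyword` parser typically has to handle is the word boundary: matching `if` should not succeed on `iffy`. The sketch below illustrates this in plain Rust, without nom. The identifier rule (an ASCII letter or `_`, then letters, digits, or `_`), the `PResult` alias, and the `String` error type are assumptions, not taken from `parser_common.rs`. As in the documented signatures, `keyword` returns a closure and `identifier` is a plain function.

```rust
/// Imitates nom's `IResult`: Ok((remaining_input, output)) or an error message.
type PResult<'a, O> = Result<(&'a str, O), String>;

/// Assumed identifier rule: characters allowed after the first one.
fn is_ident_char(c: char) -> bool {
    c.is_ascii_alphanumeric() || c == '_'
}

/// Matches an identifier starting with an ASCII letter or `_`.
fn identifier(input: &str) -> PResult<'_, &str> {
    let mut chars = input.char_indices();
    match chars.next() {
        Some((_, c)) if c.is_ascii_alphabetic() || c == '_' => {}
        _ => return Err(format!("expected identifier at {:?}", input)),
    }
    // End of the identifier = first character that cannot continue it.
    let end = chars
        .find(|&(_, c)| !is_ident_char(c))
        .map(|(i, _)| i)
        .unwrap_or(input.len());
    Ok((&input[end..], &input[..end]))
}

/// Matches `kw` only when it is not immediately followed by an identifier
/// character, so `keyword("if")` rejects `iffy`.
fn keyword<'a>(kw: &'static str) -> impl Fn(&'a str) -> PResult<'a, &'a str> {
    move |input: &'a str| {
        let rest = input
            .strip_prefix(kw)
            .ok_or_else(|| format!("expected keyword {:?}", kw))?;
        if rest.starts_with(is_ident_char) {
            return Err(format!("{:?} is a prefix of a longer identifier", kw));
        }
        // Return a slice of the input (lifetime 'a), not the 'static keyword.
        Ok((rest, &input[..kw.len()]))
    }
}
```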

#### 3. parser_expr.rs
Expression parsing functionality:
```rust
pub fn parse_expression(input: &str) -> IResult<&str, Expression>
pub fn parse_actual_arguments(input: &str) -> IResult<&str, Vec<Expression>>
```
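
`parse_actual_arguments` presumably parses a parenthesized, comma-separated argument list such as the `(1, 2, -3)` in a call; the exact syntax is not given here. A hand-written sketch of that shape follows. It uses integer literals as a stand-in for full expressions, so `parse_int` (like the `PResult` alias) is hypothetical. In nom, the same structure could plausibly be written as `delimited(char('('), separated_list0(char(','), parse_expression), char(')'))`, plus whitespace handling.

```rust
/// Imitates nom's `IResult`: Ok((remaining_input, output)) or an error message.
type PResult<'a, O> = Result<(&'a str, O), String>;

/// Hypothetical argument parser: an optionally negative integer literal.
/// The real parser would call `parse_expression` here instead.
fn parse_int(input: &str) -> PResult<'_, i64> {
    let s = input.trim_start();
    let digits_start = if s.starts_with('-') { 1 } else { 0 };
    let end = s[digits_start..]
        .find(|c: char| !c.is_ascii_digit())
        .map_or(s.len(), |i| i + digits_start);
    let n = s[..end].parse::<i64>().map_err(|e| e.to_string())?;
    Ok((&s[end..], n))
}

/// `( arg, arg, ... )` with optional whitespace; `()` yields an empty list.
fn parse_actual_arguments(input: &str) -> PResult<'_, Vec<i64>> {
    let mut rest = input
        .trim_start()
        .strip_prefix('(')
        .ok_or("expected '('")?;
    let mut args = Vec::new();
    if let Some(after) = rest.trim_start().strip_prefix(')') {
        return Ok((after, args));
    }
    loop {
        let (after_arg, arg) = parse_int(rest)?;
        args.push(arg);
        let after_arg = after_arg.trim_start();
        if let Some(next) = after_arg.strip_prefix(',') {
            rest = next; // another argument must follow the comma
        } else if let Some(after) = after_arg.strip_prefix(')') {
            return Ok((after, args));
        } else {
            return Err(format!("expected ',' or ')' at {:?}", after_arg));
        }
    }
}
```

This sketch rejects a trailing comma such as `(1,)`; whether the real parser accepts one is not documented.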

#### 4. parser_type.rs
Type system parsing:
```rust
pub fn parse_type(input: &str) -> IResult<&str, Type>
```
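
The type syntax accepted by `parse_type` is not documented in this section. Assuming the basic type names listed under Type System (`Int`, `Real`, `Boolean`, `String`) and a hypothetical `[T]` notation for lists, a recursive-descent sketch without nom could look like this. The reduced `Type` enum reuses variant names from the AST section; the list syntax and the `PResult` alias are assumptions.

```rust
/// Reduced copy of the AST `Type` enum (the real one has more variants).
#[derive(Debug, PartialEq)]
enum Type {
    TInteger,
    TReal,
    TBool,
    TString,
    TList(Box<Type>),
}

/// Imitates nom's `IResult`: Ok((remaining_input, output)) or an error message.
type PResult<'a, O> = Result<(&'a str, O), String>;

/// Assumed surface syntax: `Int`, `Real`, `Boolean`, `String`, and `[T]` for lists.
fn parse_type(input: &str) -> PResult<'_, Type> {
    let input = input.trim_start();
    if let Some(rest) = input.strip_prefix('[') {
        // List type: recurse for the element type, then expect ']'.
        let (rest, elem) = parse_type(rest)?;
        let rest = rest
            .trim_start()
            .strip_prefix(']')
            .ok_or("expected ']'")?;
        return Ok((rest, Type::TList(Box::new(elem))));
    }
    // Basic type: read the whole alphanumeric word, then look it up.
    let end = input
        .find(|c: char| !c.is_ascii_alphanumeric())
        .unwrap_or(input.len());
    let ty = match &input[..end] {
        "Int" => Type::TInteger,
        "Real" => Type::TReal,
        "Boolean" => Type::TBool,
        "String" => Type::TString,
        other => return Err(format!("unknown type {:?}", other)),
    };
    Ok((&input[end..], ty))
}
```

Reading the whole word before the lookup means `Integer` is rejected instead of being read as `Int` followed by `eger`.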

#### 5. parser_stmt.rs
Statement and control flow parsing:
```rust
pub fn parse_statement(input: &str) -> IResult<&str, Statement>
```

## Parser Features

### Statement Parsing
The parser supports the following kinds of statements:
- Variable declarations and assignments
- Control flow (if-else, while, for)
- Blocks and return statements
- Function definitions
- Assert statements
- ADT (Algebraic Data Type) declarations

### Expression Parsing
The expression parser handles:
- Arithmetic expressions
- Boolean expressions
- Function calls
- Variables
- Literals (numbers, strings, booleans)
- ADT constructors and pattern matching
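
The section does not say how operator precedence is handled in arithmetic expressions. A common approach in combinator and recursive-descent parsers is one function per precedence level, where each level parses its operands with the next, tighter level and folds the operators to the left. The sketch below shows that layering for `+ - * /` over integers in plain Rust; `Expr`, `binary_level`, and the precedence table are illustrative assumptions, not the project's `parse_expression`. With nom, the left fold is often written with `fold_many0`.

```rust
/// Toy expression AST; the real `Expression` enum is richer.
#[derive(Debug, PartialEq)]
enum Expr {
    Num(i64),
    Bin(char, Box<Expr>, Box<Expr>),
}

/// Imitates nom's `IResult`: Ok((remaining_input, output)) or an error message.
type PResult<'a, O> = Result<(&'a str, O), String>;

/// Lowest precedence: `+` and `-`.
fn expr(input: &str) -> PResult<'_, Expr> {
    binary_level(input, &['+', '-'], term)
}

/// Higher precedence: `*` and `/`.
fn term(input: &str) -> PResult<'_, Expr> {
    binary_level(input, &['*', '/'], factor)
}

/// Highest precedence: integer literals and parenthesized expressions.
fn factor(input: &str) -> PResult<'_, Expr> {
    let input = input.trim_start();
    if let Some(rest) = input.strip_prefix('(') {
        let (rest, e) = expr(rest)?;
        let rest = rest.trim_start().strip_prefix(')').ok_or("expected ')'")?;
        return Ok((rest, e));
    }
    let end = input.find(|c: char| !c.is_ascii_digit()).unwrap_or(input.len());
    let n = input[..end]
        .parse()
        .map_err(|_| format!("expected number at {:?}", input))?;
    Ok((&input[end..], Expr::Num(n)))
}

/// Parses `operand (op operand)*` and folds to the left,
/// so `8 - 3 - 2` becomes `(8 - 3) - 2`.
fn binary_level<'a>(
    input: &'a str,
    ops: &[char],
    operand: fn(&'a str) -> PResult<'a, Expr>,
) -> PResult<'a, Expr> {
    let (mut rest, mut lhs) = operand(input)?;
    loop {
        let trimmed = rest.trim_start();
        match trimmed.chars().next() {
            Some(op) if ops.contains(&op) => {
                let (after, rhs) = operand(&trimmed[op.len_utf8()..])?;
                lhs = Expr::Bin(op, Box::new(lhs), Box::new(rhs));
                rest = after;
            }
            // Not an operator of this level: leave it for the caller.
            _ => return Ok((rest, lhs)),
        }
    }
}
```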

### Type System
The type parser supports:
- Basic types (Int, Real, Boolean, String, Unit, Any)
- Complex types (List, Tuple, Maybe, Result)
- ADT declarations
- Function types

## nom Parser Combinators

The parser makes extensive use of the `nom` parser combinator library. The key combinators are:

### Basic Combinators
- `tag`: Matches an exact string
- `char`: Matches a single, specific character
- `digit1`: Matches one or more ASCII digits
- `alpha1`: Matches one or more ASCII letters
- `space0`/`space1`: Match zero or more / one or more spaces or tabs (not newlines)
- `multispace0`/`multispace1`: Like `space0`/`space1`, but also match newlines
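
nom's `tag` and `digit1` are parsers of the form `input -> IResult<input, output>` (`tag` builds one from the string to match). To show what that means without pulling in nom, here is a plain-Rust imitation of both. The `PResult` alias, which uses the failing input as its error, is a simplification: nom's real error type also carries an `ErrorKind`.

```rust
/// Simplified stand-in for nom's `IResult`:
/// Ok((remaining_input, output)) or Err(input_where_matching_failed).
type PResult<'a, O> = Result<(&'a str, O), &'a str>;

/// Imitation of nom's `tag`: returns a parser that matches exactly `expected`.
fn tag<'a>(expected: &'static str) -> impl Fn(&'a str) -> PResult<'a, &'a str> {
    move |input: &'a str| {
        if input.starts_with(expected) {
            let (matched, rest) = input.split_at(expected.len());
            Ok((rest, matched))
        } else {
            Err(input)
        }
    }
}

/// Imitation of nom's `digit1`: consumes one or more ASCII digits.
fn digit1(input: &str) -> PResult<'_, &str> {
    let end = input
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(input.len());
    if end == 0 {
        Err(input)
    } else {
        Ok((&input[end..], &input[..end]))
    }
}
```

Both return slices of the original input rather than new `String`s, which is also what nom's `&str` parsers do.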

### Sequence Combinators
- `tuple`: Runs several parsers in sequence and returns all of their outputs as a tuple
- `preceded`: Matches a prefix, then a value, and returns only the value
- `terminated`: Matches a value, then a suffix, and returns only the value
- `delimited`: Matches a value between two delimiters and returns only the value
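
Sequence combinators are higher-order functions: they take parsers and return a new parser. A plain-Rust imitation of `preceded` and `delimited` follows, using a simplified `PResult` shape (an assumption, not nom's type) to show how the outputs of the outer parsers are discarded. `ch` and `space0` are minimal stand-ins for nom's `char` and `space0`.

```rust
/// Simplified stand-in for nom's `IResult`.
type PResult<'a, O> = Result<(&'a str, O), &'a str>;

/// Imitation of `preceded`: run `first`, drop its output, return `second`'s.
fn preceded<'a, A, B>(
    first: impl Fn(&'a str) -> PResult<'a, A>,
    second: impl Fn(&'a str) -> PResult<'a, B>,
) -> impl Fn(&'a str) -> PResult<'a, B> {
    move |input: &'a str| {
        let (rest, _) = first(input)?;
        second(rest)
    }
}

/// Imitation of `delimited`: keep only the middle parser's output.
fn delimited<'a, A, B, C>(
    open: impl Fn(&'a str) -> PResult<'a, A>,
    inner: impl Fn(&'a str) -> PResult<'a, B>,
    close: impl Fn(&'a str) -> PResult<'a, C>,
) -> impl Fn(&'a str) -> PResult<'a, B> {
    move |input: &'a str| {
        let (rest, _) = open(input)?;
        let (rest, value) = inner(rest)?;
        let (rest, _) = close(rest)?;
        Ok((rest, value))
    }
}

/// Minimal stand-in for nom's `char`.
fn ch<'a>(expected: char) -> impl Fn(&'a str) -> PResult<'a, char> {
    move |input: &'a str| match input.strip_prefix(expected) {
        Some(rest) => Ok((rest, expected)),
        None => Err(input),
    }
}

/// Minimal stand-in for nom's `space0`: spaces and tabs, never fails.
fn space0(input: &str) -> PResult<'_, &str> {
    let rest = input.trim_start_matches(|c: char| c == ' ' || c == '\t');
    Ok((rest, &input[..input.len() - rest.len()]))
}
```

`terminated` is the mirror image of `preceded`, and `tuple` differs only in keeping every output.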

### Branch Combinators
- `alt`: Tries several parsers in order and returns the result of the first one that succeeds

### General Combinators
- `map`: Transforms the output of a parser
- `opt`: Makes a parser optional, returning `None` instead of an error when it does not match
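
The sketch below imitates `alt`, `map`, and `opt` in plain Rust with a simplified `PResult` shape (an assumption, not nom's type). `alt2` is a hypothetical two-way version; nom's `alt` takes a tuple of alternatives.

```rust
/// Simplified stand-in for nom's `IResult`.
type PResult<'a, O> = Result<(&'a str, O), &'a str>;

/// Two-way imitation of `alt`: try `first`; on failure, try `second`
/// on the same input.
fn alt2<'a, O>(
    first: impl Fn(&'a str) -> PResult<'a, O>,
    second: impl Fn(&'a str) -> PResult<'a, O>,
) -> impl Fn(&'a str) -> PResult<'a, O> {
    move |input: &'a str| first(input).or_else(|_| second(input))
}

/// Imitation of `map`: apply `f` to a successful parser's output.
fn map<'a, A, B>(
    parser: impl Fn(&'a str) -> PResult<'a, A>,
    f: impl Fn(A) -> B,
) -> impl Fn(&'a str) -> PResult<'a, B> {
    move |input: &'a str| parser(input).map(|(rest, a)| (rest, f(a)))
}

/// Imitation of `opt`: on failure, succeed with `None` and consume nothing.
fn opt<'a, A>(
    parser: impl Fn(&'a str) -> PResult<'a, A>,
) -> impl Fn(&'a str) -> PResult<'a, Option<A>> {
    move |input: &'a str| match parser(input) {
        Ok((rest, a)) => Ok((rest, Some(a))),
        Err(_) => Ok((input, None)),
    }
}

/// Minimal stand-in for nom's `tag`.
fn tag<'a>(expected: &'static str) -> impl Fn(&'a str) -> PResult<'a, &'a str> {
    move |input: &'a str| match input.strip_prefix(expected) {
        Some(rest) => Ok((rest, &input[..expected.len()])),
        None => Err(input),
    }
}
```

The order of alternatives matters: if `tag("<")` were tried before `tag("<=")`, the input `<= y` would match `<` and leave `= y` unconsumed. nom's `alt` behaves the same way, so longer tokens should be listed first.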

### Multi Combinators
- `many0`/`many1`: Match zero or more / one or more occurrences of a parser
- `separated_list0`: Matches zero or more items separated by a delimiter
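
Here is a plain-Rust imitation of `many0` and `separated_list0`, again with a simplified `PResult` shape (not nom's type). Two details mirror nom's behavior: `many0` fails rather than loop forever when its inner parser succeeds without consuming input, and `separated_list0` leaves a trailing separator unconsumed when no item follows it.

```rust
/// Simplified stand-in for nom's `IResult`.
type PResult<'a, O> = Result<(&'a str, O), &'a str>;

/// Imitation of `many0`: apply `parser` until it fails, collecting outputs.
fn many0<'a, O>(
    parser: impl Fn(&'a str) -> PResult<'a, O>,
) -> impl Fn(&'a str) -> PResult<'a, Vec<O>> {
    move |mut input: &'a str| {
        let mut items = Vec::new();
        while let Ok((rest, item)) = parser(input) {
            if rest.len() == input.len() {
                // No progress: error out instead of looping forever.
                return Err(input);
            }
            items.push(item);
            input = rest;
        }
        Ok((input, items))
    }
}

/// Imitation of `separated_list0(sep, item)`.
fn separated_list0<'a, S, O>(
    sep: impl Fn(&'a str) -> PResult<'a, S>,
    item: impl Fn(&'a str) -> PResult<'a, O>,
) -> impl Fn(&'a str) -> PResult<'a, Vec<O>> {
    move |input: &'a str| {
        let mut items = Vec::new();
        // No first item: succeed with an empty list.
        let Ok((mut rest, first)) = item(input) else {
            return Ok((input, items));
        };
        items.push(first);
        // Each further item needs a separator before it. If the item is
        // missing (or nothing was consumed), stop before the separator.
        while let Ok((after_sep, _)) = sep(rest) {
            match item(after_sep) {
                Ok((after_item, next)) if after_item.len() < rest.len() => {
                    items.push(next);
                    rest = after_item;
                }
                _ => break,
            }
        }
        Ok((rest, items))
    }
}
```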

## Example Usage

Here's an example of how the parser handles a simple assignment statement:

```python
x = 42
```

The statement is parsed with the following combinators:
```rust
use nom::bytes::complete::tag;
use nom::character::complete::multispace0;
use nom::{combinator::map, sequence::{preceded, tuple}, IResult};
// `identifier` (parser_common.rs), `parse_expression` (parser_expr.rs) and
// the `Statement` AST type are assumed to be in scope as well.

fn parse_assignment_statement(input: &str) -> IResult<&str, Statement> {
    map(
        tuple((
            preceded(multispace0, identifier),       // variable name
            preceded(multispace0, tag("=")),         // `=`, discarded below
            preceded(multispace0, parse_expression), // right-hand side
        )),
        |(var, _, expr)| Statement::Assignment(var.to_string(), Box::new(expr)),
    )(input)
}
```

## AST Structure

The parser produces an Abstract Syntax Tree (AST) with the following main types:

### Statements
```rust
pub enum Statement {
    VarDeclaration(Name),
    ValDeclaration(Name),
    Assignment(Name, Box<Expression>),
    IfThenElse(Box<Expression>, Box<Statement>, Option<Box<Statement>>),
    While(Box<Expression>, Box<Statement>),
    For(Name, Box<Expression>, Box<Statement>),
    Block(Vec<Statement>),
    Assert(Box<Expression>, Box<Expression>),
    FuncDef(Function),
    Return(Box<Expression>),
    ADTDeclaration(Name, Vec<ValueConstructor>),
    // ... other variants
}
```
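
`Box` gives the recursive variants a fixed size, and the `Option<Box<Statement>>` in `IfThenElse` is what lets an `if` have no `else`. The sketch below works on a reduced copy of the enum, with a stand-in `Expression` (its `Var` and `CInt` variants are hypothetical) and `Name` assumed to be `String`, and shows a typical recursive traversal. `assigned_names` and `example_if_else` are hypothetical helpers.

```rust
type Name = String; // assumption: `Name` is a plain `String`

/// Stand-in for the real `Expression` type.
#[derive(Debug, PartialEq)]
enum Expression {
    Var(Name),
    CInt(i64),
}

/// A subset of the `Statement` variants listed above.
#[derive(Debug, PartialEq)]
enum Statement {
    Assignment(Name, Box<Expression>),
    IfThenElse(Box<Expression>, Box<Statement>, Option<Box<Statement>>),
    While(Box<Expression>, Box<Statement>),
    Block(Vec<Statement>),
}

/// Collects every assigned name in `stmt`, visiting the `else`
/// branch only when it is present.
fn assigned_names(stmt: &Statement, out: &mut Vec<Name>) {
    match stmt {
        Statement::Assignment(name, _) => out.push(name.clone()),
        Statement::IfThenElse(_, then_branch, else_branch) => {
            assigned_names(then_branch, out);
            if let Some(else_branch) = else_branch {
                assigned_names(else_branch, out);
            }
        }
        Statement::While(_, body) => assigned_names(body, out),
        Statement::Block(stmts) => {
            for s in stmts {
                assigned_names(s, out);
            }
        }
    }
}

/// AST for: if `x` then { y = 1 } else z = 2
fn example_if_else() -> Statement {
    Statement::IfThenElse(
        Box::new(Expression::Var("x".to_string())),
        Box::new(Statement::Block(vec![Statement::Assignment(
            "y".to_string(),
            Box::new(Expression::CInt(1)),
        )])),
        Some(Box::new(Statement::Assignment(
            "z".to_string(),
            Box::new(Expression::CInt(2)),
        ))),
    )
}
```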

### Types
```rust
pub enum Type {
    TInteger,
    TReal,
    TBool,
    TString,
    TList(Box<Type>),
    TTuple(Vec<Type>),
    TMaybe(Box<Type>),
    TResult(Box<Type>, Box<Type>),
    TFunction(Box<Option<Type>>, Vec<Type>),
    // ... other variants
}
```
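
To see how the nested variants compose, here is a small pretty-printer over a copy of the listed `Type` variants. Two things are assumptions: that the first field of `TFunction` is the optional return type and the second the parameter types, and the printed surface syntax (`[T]`, `Maybe[T]`, and `Unit` for a missing return type). `show` and `join` are hypothetical helpers.

```rust
/// Copy of the listed `Type` variants.
#[derive(Debug, PartialEq)]
enum Type {
    TInteger,
    TReal,
    TBool,
    TString,
    TList(Box<Type>),
    TTuple(Vec<Type>),
    TMaybe(Box<Type>),
    TResult(Box<Type>, Box<Type>),
    // Assumed field order: (optional return type, parameter types).
    TFunction(Box<Option<Type>>, Vec<Type>),
}

/// Renders a type in a hypothetical surface syntax.
fn show(t: &Type) -> String {
    match t {
        Type::TInteger => "Int".to_string(),
        Type::TReal => "Real".to_string(),
        Type::TBool => "Boolean".to_string(),
        Type::TString => "String".to_string(),
        Type::TList(elem) => format!("[{}]", show(elem)),
        Type::TTuple(items) => format!("({})", join(items)),
        Type::TMaybe(inner) => format!("Maybe[{}]", show(inner)),
        Type::TResult(ok, err) => format!("Result[{}, {}]", show(ok), show(err)),
        Type::TFunction(ret, params) => {
            // `ret` is a `Box<Option<Type>>`; `None` is printed as `Unit`.
            let ret = match &**ret {
                Some(r) => show(r),
                None => "Unit".to_string(),
            };
            format!("({}) -> {}", join(params), ret)
        }
    }
}

/// Comma-separated rendering of a list of types.
fn join(types: &[Type]) -> String {
    types.iter().map(show).collect::<Vec<_>>().join(", ")
}
```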

## Error Handling

Parse failures surface through nom's `IResult`, whose error case is a `nom::Err` value. The parser also defines its own error categories:
```rust
pub enum ParseError {
    IndentationError(usize),
    UnexpectedToken(String),
    InvalidExpression(String),
}
```
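
The meaning of the `usize` in `IndentationError` is not spelled out here. Assuming it is a 1-based line number and that blocks are indented in steps of four spaces, a pre-pass that could produce this error might look like the sketch below. `indentation_levels` and `INDENT_WIDTH` are hypothetical, and tabs are not handled.

```rust
/// Copy of the documented error enum; only one variant is used here.
#[allow(dead_code)]
#[derive(Debug, PartialEq)]
enum ParseError {
    IndentationError(usize),
    UnexpectedToken(String),
    InvalidExpression(String),
}

/// Assumed indentation unit; the real parser's rule is not documented.
const INDENT_WIDTH: usize = 4;

/// Returns the indentation level of each non-blank line, or
/// `IndentationError(line)` (1-based) for the first line whose leading
/// spaces are not a multiple of INDENT_WIDTH or that is indented more
/// than one level deeper than the previous non-blank line.
fn indentation_levels(source: &str) -> Result<Vec<usize>, ParseError> {
    let mut levels = Vec::new();
    let mut previous = 0;
    for (i, line) in source.lines().enumerate() {
        if line.trim().is_empty() {
            continue; // blank lines do not affect indentation
        }
        let spaces = line.len() - line.trim_start_matches(' ').len();
        if spaces % INDENT_WIDTH != 0 || spaces / INDENT_WIDTH > previous + 1 {
            return Err(ParseError::IndentationError(i + 1));
        }
        previous = spaces / INDENT_WIDTH;
        levels.push(previous);
    }
    Ok(levels)
}
```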

## Testing

The parser has a test suite in `tests/parser_tests.rs` that covers:
- Simple assignments
- Complex expressions
- Control flow structures
- Type annotations
- Complete programs
- Error handling
- Whitespace handling

> **Documentation Generation Note**
>
> This documentation was automatically generated by Claude (Anthropic), an AI assistant, through analysis of the codebase. While the content accurately reflects the implementation, it should be reviewed and maintained by the development team. Last generated: June 2025.