
CS 480

Translators (Compilers)

weeks 2-3: SDT (cont'd), Cython, lexers, and lex/yacc

Instructor: Liang Huang


(some slides courtesy of David Beazley)
HW1 caveats & coding grades

• scope: either declare all vars upfront, or right before the for-loop
• an auxiliary loop variable takes care of lots of corner cases
  • OK to change i inside the loop
  • no need for i-- after the loop
  • OK with range(0)
• also need to cache the range limit
• generous partial credit for mistakes in printing multiple items
HW1 overall grade

(grade distribution chart)
Switching to ast from compiler

• the compiler package is deprecated; replaced by ast
• similar structure, but easier to use, and faster

>>> import compiler
>>> compiler.parse("if 3==4: print 5")
Module(None, Stmt([If([(Compare(Const(3), [('==', Const(4))]),
Stmt([Printnl([Const(5)], None)]))], None)]))

>>> compiler.parse("if 3==4: print 5").node.nodes[0].__dict__
{'tests': [(Compare(Const(3), [('==', Const(4))]),
Stmt([Printnl([Const(5)], None)]))], 'else_': None, 'lineno': 1}

>>> import ast
>>> ast.dump(ast.parse("if 3==4: print 5"))
'Module(body=[If(test=Compare(left=Num(n=3), ops=[Eq()],
comparators=[Num(n=4)]), body=[Print(dest=None, values=[Num(n=5)],
nl=True)], orelse=[])])'
http://greentreesnakes.readthedocs.org/en/latest/
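To do anything with the parse beyond printing it, you walk the tree. Here is a minimal sketch (Python 2; the class name and the counting task are made up for illustration) using the standard ast.NodeVisitor:

import ast

class PrintCounter(ast.NodeVisitor):
    """Count Print nodes in a Python 2 AST (illustrative only)."""
    def __init__(self):
        self.count = 0
    def visit_Print(self, node):   # called once per Print node
        self.count += 1
        self.generic_visit(node)   # keep walking the children

tree = ast.parse("if 3==4: print 5")
counter = PrintCounter()
counter.visit(tree)
print counter.count   # => 1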
Switching to ast from compiler

• the compiler package is deprecated; replaced by ast
• similar structure, but easier to use, and faster

https://docs.python.org/2/library/ast.html

CFG for AST (abstract syntax tree; nonterminals only):

mod  = Module(stmt* body)

stmt = Assign(expr* targets, expr value)
     | Print(..., expr* values, ...)
     | For(expr target, expr iter, stmt* body, ...)
     | If(expr test, stmt* body, ...)
     ...

expr = BoolOp(boolop op, expr* values)
     | BinOp(expr left, operator op, expr right)
     | UnaryOp(unaryop op, expr operand)
     ...

CFG for Python (nonterminals and terminals):

module: (NEWLINE | stmt)* ENDMARKER
stmt: simple_stmt | compound_stmt
simple_stmt: small_stmt (';' small_stmt)* [';']
small_stmt: (expr_stmt | print_stmt ...)
expr_stmt: testlist (augassign (yield_expr|testlist) |
                     ('=' (yield_expr|testlist))*)
augassign: ('+=' | '-=' | '*=' | '/=' | '%=' | '&=' | '|=' | '^=' |
            '<<=' | '>>=' | '**=' | '//=')
print_stmt: 'print' ( [ test (',' test)* [','] ] ... )
compound_stmt: if_stmt | while_stmt | for_stmt | ...
if_stmt: 'if' test ':' suite ('elif' test ':' suite)* ['else' ':' suite]
while_stmt: 'while' test ':' suite ['else' ':' suite]
for_stmt: 'for' exprlist 'in' testlist ':' suite ['else' ':' suite]
suite: simple_stmt | NEWLINE INDENT stmt+ DEDENT
...
HW2 Grammar (P_2 subset)

program : module
module : stmt+
stmt : (simple_stmt | if_stmt | for_stmt) NEWLINE

simple_stmt : "print" expr ("," expr)*
            | int_name "=" int_expr
            | bool_name "=" bool_expr

expr : int_expr | bool_expr

if_stmt : "if" bool_expr ":" (simple_stmts | suite)

for_stmt : "for" name "in" "range" "(" int_expr ")" ":" (simple_stmts | suite)

simple_stmts : simple_stmt (";" simple_stmt)+

suite : NEWLINE INDENT stmt+ DEDENT

int_expr : int_name
         | decint
         | "-" int_expr
         | int_expr "+" int_expr
         | "(" int_expr ")"
         | int_expr "if" bool_expr "else" int_expr

bool_expr : bool_name
          | bool_expr "and" bool_expr
          | bool_expr "or" bool_expr
          | "not" bool_expr
          | "(" bool_expr ")"
          | int_expr (comp_op int_expr)+
          | "True"
          | "False"
          | "(" bool_expr "if" bool_expr "else" bool_expr ")"

comp_op : '<' | '>' | '==' | '>=' | '<=' | '<>' | '!='

• HW2: translate P_2 into C (a small flavor of this is sketched below)
  • expand hw1.py
  • no recursive type analysis
  • caution on nested loops
• HW3: use PLY (lex/yacc) to build a lexer/parser to replace compiler.parse()
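For a flavor of what HW2 asks for, here is a tiny hand-written sketch (helper names are hypothetical, not the course's reference hw1.py/hw2.py) that translates a P_2-style print of integer expressions into C via the ast module:

import ast

def gen_int_expr(node):
    """Return a C expression string for a small int-only AST subset."""
    if isinstance(node, ast.Num):
        return str(node.n)
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.BinOp) and isinstance(node.op, ast.Add):
        return "(%s + %s)" % (gen_int_expr(node.left), gen_int_expr(node.right))
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return "(-%s)" % gen_int_expr(node.operand)
    raise NotImplementedError(node)

def gen_print(node):   # a Python 2 ast.Print node
    exprs = ['printf("%%d ", %s);' % gen_int_expr(v) for v in node.values]
    return " ".join(exprs) + ' printf("\\n");'

tree = ast.parse("print 1 + -2, x")
print gen_print(tree.body[0])
# printf("%d ", (1 + (-2))); printf("%d ", x); printf("\n");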
HW1/2 Motivations: Cython

• Cython (2007~): compiles a large Python subset into C
• gradually being replaced by PyPy (just-in-time compilation)

cython --embed $1.py
clang -Os -I /usr/include/python2.7 -lpython2.7 -lpthread -lm -lutil -ldl $1.c -o $1.out
Python int vs C int

         C/C++ int    long    long long
32-bit      32         32        64
64-bit      32         64        64

• Python has arbitrary-precision integer arithmetic (<type 'long'>)
  • when a Python int exceeds the 'int' range, it becomes 'long' and stays 'long'
• you can turn this off in Cython with "cdef int ..."
  • much faster, but overflows

.py version (compiled as-is; ~15% faster than Python -- this is more like your HW1/2):

import sys
n, m = map(int, sys.argv[1:3])
for _ in xrange(m):
    a, b = 0, 1
    for _ in xrange(n):
        a, b = b, a+b
print "fib[%d] = %d" % (n, b)

.pyx version (with cdef int; ~95% faster than Python, but overflows!):

import sys
n, m = map(int, sys.argv[1:3])
cdef int a, b
for _ in xrange(m):
    a, b = 0, 1
    for _ in xrange(n):
        a, b = b, a+b
print "fib[%d] = %d" % (n, b)
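You can see both behaviors from plain Python (a quick check using the standard ctypes module):

import ctypes

print type(2 ** 100)                # <type 'long'>: auto-promoted, never overflows
fib47 = 2971215073                  # fib(47), just past the signed 32-bit range
print ctypes.c_int(fib47).value     # -1323752223: what a C 'int' would hold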
Compiled Cython Program (.cpp)

The .pyx source:

def fib(int n):
    l = []
    cdef int a=0, b=1
    for i in range(n):
        l.append(b)
        a, b = b, a+b
    return l

Generated C (excerpt):

/* Generated by Cython 0.14 on Wed Jun 1 15:38:37 2011 */
#include "Python.h"
...
/* "/Users/lhuang/install/svector/tutorial/fib_cy.pyx":2
 * def fib(int n):
 *     l = []                  # <<<<<<<<<<<<<<
 *     cdef int a=0, b=1
 *     for i in range(n):
 */
__pyx_t_1 = PyList_New(0); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[0];
__pyx_lineno = 2; __pyx_clineno = __LINE__; goto __pyx_L1_error;}
__Pyx_GOTREF(((PyObject *)__pyx_t_1));
__Pyx_DECREF(((PyObject *)__pyx_v_l));
__pyx_v_l = __pyx_t_1;
__pyx_t_1 = 0;

/* "/Users/lhuang/install/svector/tutorial/fib_cy.pyx":3
 * def fib(int n):
 *     l = []
 *     cdef int a=0, b=1       # <<<<<<<<<<<<<<
 *     for i in xrange(n):
 *         l.append(b)
 */
__pyx_v_a = 0;
__pyx_v_b = 1;
Compiled Cython Program (.cpp)

very clever: the for loop is detected and compiled into a C loop! (but you should always use xrange in your .pyx or .py)

/* "/Users/lhuang/install/svector/tutorial/fib_cy.pyx":4
 *     l = []
 *     cdef int a=0, b=1
 *     for i in range(n):      # <<<<<<<<<<<<<<
 *         l.append(b)
 *         a, b = b, a+b
 */
__pyx_t_2 = __pyx_v_n;
for (__pyx_t_3 = 0; __pyx_t_3 < __pyx_t_2; __pyx_t_3+=1) {
  __pyx_v_i = __pyx_t_3;

/* "/Users/lhuang/install/svector/tutorial/fib_cy.pyx":5
 *     cdef int a=0, b=1
 *     for i in xrange(n):
 *         l.append(b)         # <<<<<<<<<<<<<<
 *         a, b = b, a+b
 *     return l
 */
  if (unlikely(__pyx_v_l == Py_None)) {
    PyErr_SetString(PyExc_AttributeError, "'NoneType' object has no attribute 'append'");
    {__pyx_filename = __pyx_f[0]; __pyx_lineno = 5; __pyx_clineno = __LINE__; goto __pyx_L1_error;} }
  __pyx_t_1 = PyInt_FromLong(__pyx_v_b); if (unlikely(!__pyx_t_1)) {__pyx_filename =
  __pyx_f[0]; __pyx_lineno = 5; __pyx_clineno = __LINE__; goto __pyx_L1_error;}
  __Pyx_GOTREF(__pyx_t_1);
  __pyx_t_4 = PyList_Append(__pyx_v_l, __pyx_t_1); if (unlikely(__pyx_t_4 == -1))
  {__pyx_filename = __pyx_f[0]; __pyx_lineno = 5; __pyx_clineno = __LINE__; goto __pyx_L1_error;}
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
Compiled Cython Program (.cpp)

correctly handles simultaneous assignment:

/* "/Users/lhuang/install/svector/tutorial/fib_cy.pyx":6
 *     for i in xrange(n):
 *         l.append(b)
 *         a, b = b, a+b       # <<<<<<<<<<<<<<
 *     return l
 */
  __pyx_t_4 = __pyx_v_b;
  __pyx_t_5 = (__pyx_v_a + __pyx_v_b);
  __pyx_v_a = __pyx_t_4;
  __pyx_v_b = __pyx_t_5;
}

The whole function, condensed:

static PyObject *__pyx_pf_6fib_cy_0fib(PyObject *__pyx_self, PyObject *__pyx_arg_n) {
  ...
  __pyx_v_n = __Pyx_PyInt_AsInt(__pyx_arg_n);
  __pyx_t_1 = PyList_New(0);
  __pyx_v_a = 0;
  __pyx_v_b = 1;

  for (__pyx_t_3 = 0; __pyx_t_3 < __pyx_t_2; __pyx_t_3+=1) {
    __pyx_t_1 = PyInt_FromLong(__pyx_v_b);
    __pyx_t_4 = PyList_Append(__pyx_v_l, __pyx_t_1);
    __Pyx_DECREF(__pyx_t_1);

    __pyx_t_4 = __pyx_v_b;
    __pyx_t_5 = (__pyx_v_a + __pyx_v_b);
    __pyx_v_a = __pyx_t_4;
    __pyx_v_b = __pyx_t_5;
  }
  ...
}
By Comparison... using python int

import sys
n, m = map(int, sys.argv[1:3])

for _ in xrange(m):
    a, b = 0, 1
    for _ in xrange(n):
        a, b = b, a+b
print "fib[%d] = %d" % (n, b)

Generated C for the single line a, b = b, a+b (every value is a boxed PyObject):

__pyx_t_2 = __Pyx_GetModuleGlobalName(__pyx_n_s_b); if (unlikely(!__pyx_t_2)) {__pyx_filename = __pyx_f[0];
__pyx_lineno = 12; __pyx_clineno = __LINE__; goto __pyx_L1_error;}
__Pyx_GOTREF(__pyx_t_2);
__pyx_t_4 = __Pyx_GetModuleGlobalName(__pyx_n_s_a); if (unlikely(!__pyx_t_4)) {__pyx_filename = __pyx_f[0];
__pyx_lineno = 12; __pyx_clineno = __LINE__; goto __pyx_L1_error;}
__Pyx_GOTREF(__pyx_t_4);
__pyx_t_10 = __Pyx_GetModuleGlobalName(__pyx_n_s_b); if (unlikely(!__pyx_t_10)) {__pyx_filename =
__pyx_f[0]; __pyx_lineno = 12; __pyx_clineno = __LINE__; goto __pyx_L1_error;}
__Pyx_GOTREF(__pyx_t_10);
__pyx_t_11 = PyNumber_Add(__pyx_t_4, __pyx_t_10); if (unlikely(!__pyx_t_11)) {__pyx_filename = __pyx_f[0];
__pyx_lineno = 12; __pyx_clineno = __LINE__; goto __pyx_L1_error;}
__Pyx_GOTREF(__pyx_t_11);
__Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0;
__Pyx_DECREF(__pyx_t_10); __pyx_t_10 = 0;
if (PyDict_SetItem(__pyx_d, __pyx_n_s_a, __pyx_t_2) < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 12;
__pyx_clineno = __LINE__; goto __pyx_L1_error;}
__Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
if (PyDict_SetItem(__pyx_d, __pyx_n_s_b, __pyx_t_11) < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 12;
__pyx_clineno = __LINE__; goto __pyx_L1_error;}
__Pyx_DECREF(__pyx_t_11); __pyx_t_11 = 0;
Lexer within Compiler Pipeline (as used in HW1)

token specification -- regexps:
INT = \d+
CLASSNAME = [A-Z][a-zA-Z\d]*

grammar specification -- CFG in [E]BNF:
simple_stmt : "print" expr ("," expr)*
            | name "=" int_expr
Regular Expression Examples

• integers and floats
  (-?)\d+        (-?)((\d+\.\d*)|(\d*\.\d+))
• Python/Java/C variable names (start with a letter or _)
  [a-zA-Z_][a-zA-Z\d_]*
• Python variable/function naming convention: joined_lower
  [a-z_]+        (_?)([a-z]+)(_[a-z]+)*(_?)
• Java variable/function naming convention: camelCase
  [a-z]+([A-Z][a-z]+)+
• Python/Java class naming convention: StudlyCaps
  ([A-Z][a-z]+)+
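These can be checked directly with Python's re module (a quick sanity check, anchoring the end with $):

import re

tests = [(r'[a-z]+([A-Z][a-z]+)+$', 'camelCase'),    # Java style: matches
         (r'([A-Z][a-z]+)+$',       'StudlyCaps'),   # class style: matches
         (r'([A-Z][a-z]+)+$',       'camelCase')]    # wrong style: no match
for pat, s in tests:
    print pat, s, bool(re.match(pat, s))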
How RegExps are implemented
• regexp => NFA => DFA

Two Examples: (a|b)*aba and (ab|ba)*|bb
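One way to make the NFA idea concrete: simulate the machine for (a|b)*aba directly by tracking the set of states that are simultaneously alive (a small sketch; the state numbering here is made up). Determinizing into a DFA is then just precomputing these state sets:

# NFA for (a|b)*aba: states 0..3, accepting state 3.
# From state 0, 'a' and 'b' loop back to 0, and 'a' also advances to 1.
nfa = {(0, 'a'): {0, 1}, (0, 'b'): {0},
       (1, 'b'): {2},
       (2, 'a'): {3}}

def accepts(s):
    states = {0}                     # start state
    for ch in s:
        states = set().union(*[nfa.get((q, ch), set()) for q in states])
    return 3 in states               # accept iff an accepting state is live

print accepts("ababa")   # True
print accepts("abab")    # False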
Writing a Lexer is Boring...

• we often use a lexer-generator (lex, ~1974) paired with a parser-generator (yacc, ~1973)
• the parser-generator only allows BNF, not EBNF!

token specification (regexps):
INT = \d+
CLASSNAME = [A-Z][a-zA-Z\d]*
...

grammar specification (BNF):
module : stmtlist
stmtlist : stmt | stmt stmtlist
...

Writing Parsers and Compilers with PLY (David Beazley)
Lex & Yacc

• Programming tools for writing parsers
  • Lex - Lexical analysis (tokenizing)
  • Yacc - Yet Another Compiler Compiler (parsing)
• History:
  • Yacc: ~1973, Stephen Johnson (AT&T)
  • Lex: ~1974, Eric Schmidt and Mike Lesk (AT&T)
• Variations of both tools are widely known
• Covered in compilers classes and textbooks
Lex/Yacc Big Picture

lexer.l (token specification):

/* lexer.l */
%{
#include "header.h"
int lineno = 1;
%}
%%
[ \t]*  ;              /* Ignore whitespace */
\n      { lineno++; }
[0-9]+  { yylval.val = atoi(yytext);
          return NUMBER; }
[a-zA-Z_][a-zA-Z0-9_]* { yylval.name = strdup(yytext);
          return ID; }
\+      { return PLUS; }
-       { return MINUS; }
\*      { return TIMES; }
\/      { return DIVIDE; }
=       { return EQUALS; }
%%
Lex/Yacc Big Picture

lex compiles lexer.l into lexer.c; the grammar goes into parser.y:

/* parser.y */
%{
#include "header.h"
%}
%union {
    char *name;
    int val;
}
%token PLUS MINUS TIMES DIVIDE EQUALS
%token<name> ID;
%token<val> NUMBER;
%%
start : ID EQUALS expr;
expr : expr PLUS term
     | expr MINUS term
     | term
     ;
...
Lex/Yacc Big Picture

lexer.l (token specification)   --lex-->   lexer.c
parser.y (grammar specification) --yacc--> parser.c

lexer.c + parser.c + typecheck.c + codegen.c + otherstuff.c  -->  mycompiler
What is PLY?

• PLY = Python Lex-Yacc


• A Python version of the lex/yacc toolset
• Same functionality as lex/yacc
• But a different interface
• Influences : Unix yacc, SPARK (John Aycock)
ply.lex example

import ply.lex as lex
tokens = [ 'NAME','NUMBER','PLUS','MINUS','TIMES',
           'DIVIDE','EQUALS' ]
t_ignore = ' \t'
t_PLUS   = r'\+'
t_MINUS  = r'-'
t_TIMES  = r'\*'
t_DIVIDE = r'/'
t_EQUALS = r'='
t_NAME   = r'[a-zA-Z_][a-zA-Z0-9_]*'

def t_NUMBER(t):
    r'\d+'
    t.value = int(t.value)
    return t

lex.lex()  # Build the lexer

(REALLY AWFUL INTERFACE!! SHOULD BE MODERNIZED!!)
ply.lex example (annotated)

• the tokens list specifies all of the possible tokens
• each token has a matching declaration of the form t_TOKNAME; the names must match the tokens list
• tokens are defined by regular expressions; for simple tokens, raw strings are used
• functions are used when special action code must execute; the docstring holds the regular expression
• t_ignore specifies ignored characters between tokens (usually whitespace)
• lex.lex() builds the lexer by creating a master regular expression, using introspection to examine the contents of the calling module:

__dict__ = {
    'tokens'   : [ 'NAME' ...],
    't_ignore' : ' \t',
    't_PLUS'   : '\\+',
    ...
    't_NUMBER' : <function ...>
}
ply.lex use

• Two functions: input() and token()

...
lex.lex()   # Build the lexer
...
lex.input("x = 3 * 4 + 5 * 6")
while True:
    tok = lex.token()
    if not tok: break
    # Use token
...
• input() feeds a string into the lexer
• token() returns the next token, or None at end of input
• each returned token has four attributes:
  • tok.type -- the token type, e.g. NAME (from t_NAME = r'[a-zA-Z_][a-zA-Z0-9_]*')
  • tok.value -- the matching text, e.g. 'x'
  • tok.lineno -- the line number
  • tok.lexpos -- the position in the input text
Actually this doesn't work!

• keywords get mixed up with identifiers (variable names, ...): PLY sorts string rules by decreasing regular-expression length, so the longer t_NAME pattern is tried first and 'print' comes out as a NAME

tokens = ('PRINT','INT','PLUS','NAME')
t_ignore = ' \t'
t_PRINT = r'print'
t_PLUS = r'\+'
t_NAME = r'[a-zA-Z_][a-zA-Z\d_]*'

$ echo -e "print a" | ./hw2a.py
(NAME, 'print') (NAME, 'a')

• fix: match everything as a NAME, then check for reserved words

keywords = {'print' : 'PRINT', ...}
tokens = ['NAME','INT','PLUS'] + keywords.values()
t_ignore = ' \t'
t_PLUS = r'\+'

def t_NAME(t):
    r'[a-zA-Z_][a-zA-Z\d_]*'
    t.type = keywords.get(t.value, 'NAME')   # Check for reserved words
    return t

$ echo -e "print a" | ./hw2b.py
(PRINT, 'print') (NAME, 'a')
Python's INDENT/DEDENT

• this is beyond context-free, and thus way beyond regular!
• use a stack of indentation levels; an INDENT/DEDENT token is produced whenever the indentation level changes

if_stmt : "if" bool_expr ":" suite
suite : NEWLINE INDENT stmt+ DEDENT

# t_ignore = ' \t'   # can't ignore spaces
indents = [0]

def t_indent(t):
    r'\n[ ]*'   # tab is not allowed here
    t.lexer.lineno += 1
    if t.lexer.lexpos >= len(t.lexer.lexdata) \
       or t.lexer.lexdata[t.lexer.lexpos] == "\n":   # empty line
        return None   # ignore empty line
    width = len(t.value) - 1   # exclude \n
    last_pos = t.lexer.lexpos - width
    if width == indents[-1]:
        return None   # same indentation level
    elif width > indents[-1]:   # one more level
        t.type = 'INDENT'
        t.value = 1
        indents.append(width)
        return t
    else:   # try one or more DEDENTs
        ded = 0
        while len(indents) > 1:
            indents.pop()
            ded += 1
            if width == indents[-1]:
                t.type = 'DEDENT'
                t.value = ded   # remember how many dedents
                return t
        raise Exception("bad indent level at line %d: %s" % (t.lexer.lineno - 1,
                        t.lexer.lines[t.lexer.lineno - 1]))

def t_space(t):
    r'[ ]+'
    return None   # ignore white space

Example input and the resulting token stream (indent stack shown in brackets):

for i in range(1):
    for j in range(2):
        print i, j, 6
print 5

(FOR, 'for') [0]  (NAME, 'i') [0]  (IN, 'in') [0]  (RANGE, 'range') [0]
(LPAREN, '(') [0]  (INT, 1) [0]  (RPAREN, ')') [0]  (COLON, ':') [0]
(INDENT, 1) [0, 2]
(FOR, 'for') [0, 2]  (NAME, 'j') [0, 2]  (IN, 'in') [0, 2]  (RANGE, 'range') [0, 2]
(LPAREN, '(') [0, 2]  (INT, 2) [0, 2]  (RPAREN, ')') [0, 2]  (COLON, ':') [0, 2]
(INDENT, 1) [0, 2, 4]
(PRINT, 'print') [0, 2, 4]  (NAME, 'i') [0, 2, 4]  (COMMA, ',') [0, 2, 4]
(NAME, 'j') [0, 2, 4]  (INT, 6) [0, 2, 4]
(DEDENT, 2) [0]  (DEDENT, 2) [0]
(PRINT, 'print') [0]  (INT, 5) [0]
Python's INDENT/DEDENT (cont'd)

• a DEDENT token may carry a value > 1 (several levels close at once); a wrapper around the PLY lexer replays it as that many separate DEDENT tokens for the parser

class MyLexer(object):
    def __init__(self):
        lex.lex()   # build regexps
        self.lexer = lex.lexer
        self.dedent_balance = 0

    def input(self, stream):
        # the initial \n is to simplify indent
        # the final \n is to simplify dedent
        stream = "\n" + stream + "\n"
        self.lexer.input(stream)
        self.lexer.lines = stream.split("\n")   # internal
        print >> logs, "now lexing..."

    def tokenstr(self, tok):
        return "(%s, %s)" % (tok.type,
               ("'%s'" if type(tok.value) is str else '%s') % tok.value)

    def token(self):
        if self.dedent_balance != 0:
            self.dedent_balance -= 1
            tok = self.last_tok   # e.g. (DEDENT, 1)
            print >> logs, self.tokenstr(tok),
        else:
            tok = self.lexer.token()   # lexer.token
            if not tok:
                print >> logs
                return None
            print >> logs, self.tokenstr(tok),
            if tok.type == 'DEDENT':
                self.dedent_balance = tok.value - 1
        self.last_tok = tok
        return tok
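Driving the wrapper (a hypothetical main; logs in the class above is assumed to be a global file object such as sys.stderr):

import sys
logs = sys.stderr

lexer = MyLexer()
lexer.input(sys.stdin.read())
while True:
    tok = lexer.token()       # yields INDENT/DEDENT like any other token
    if tok is None:
        break                 # this stream is what the parser will consume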
ply.lex Commentary

• Normally you don't use the tokenizer directly
• Instead, it's used by the parser module
ply.yacc preliminaries

• ply.yacc is a module for creating a parser
• Assumes you have defined a BNF grammar

assign : NAME EQUALS expr
expr : expr PLUS term
     | expr MINUS term
     | term
term : term TIMES factor
     | term DIVIDE factor
     | factor
factor : NUMBER

compare with (ambiguity):

expr : expr PLUS expr
     | expr TIMES expr
     | NUMBER

---- or ----

expr : expr PLUS expr
     | term
term : term TIMES term
     | NUMBER
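Feeding the ambiguous version to ply.yacc makes the problem concrete: the table construction reports conflicts (a sketch; mylexer is the hypothetical lexer module used on the next slide):

import ply.yacc as yacc
import mylexer
tokens = mylexer.tokens

def p_expr_plus(p):
    'expr : expr PLUS expr'

def p_expr_times(p):
    'expr : expr TIMES expr'

def p_expr_num(p):
    'expr : NUMBER'

yacc.yacc()   # reports shift/reduce conflicts: the grammar is ambiguous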
ply.yacc example

import ply.yacc as yacc
import mylexer            # Import lexer information
tokens = mylexer.tokens   # Need token list

def p_assign(p):
    '''assign : NAME EQUALS expr'''

def p_expr(p):
    '''expr : expr PLUS term
            | expr MINUS term
            | term'''

def p_term(p):
    '''term : term TIMES factor
            | term DIVIDE factor
            | factor'''

def p_factor(p):
    '''factor : NUMBER'''

yacc.yacc()  # Build the parser
ply.yacc example (annotated)

• token information is imported from the lexer; the tokens list is required
• grammar rules are encoded as functions with names p_rulename (the name doesn't matter as long as it starts with p_)
• docstrings contain the grammar rules from the BNF
• yacc.yacc() builds the parser using introspection
ply.yacc parsing

• yacc.parse() function

yacc.yacc()   # Build the parser
...
data = "x = 3*4+5*6"
yacc.parse(data)   # Parse some text

• This feeds the data into the lexer
• Parses the text and invokes the grammar rules
A peek inside

• PLY uses LR parsing: LALR(1)
• AKA shift-reduce parsing
• Widely used parsing technique
• Table driven
General Idea

• Input tokens are shifted onto a parsing stack

Stack              Input
                   x = 3 * 4 + 5
NAME               = 3 * 4 + 5
NAME =             3 * 4 + 5
NAME = NUM         * 4 + 5

• This continues until a complete grammar rule appears on the top of the stack

General Idea

• If a rule is found, a "reduction" occurs

Stack              Input
NAME = NUM         * 4 + 5
                       reduce factor : NUM
NAME = factor      * 4 + 5

• The RHS of the grammar rule is replaced with the LHS
Rule Functions

• During reduction, rule functions are invoked

def p_factor(p):
    'factor : NUMBER'

• Parameter p contains the grammar symbol values

def p_factor(p):
    'factor : NUMBER'
    # p[0]      p[1]
Using an LR Parser

• Rule functions generally process values on the right hand side of the grammar rule
• The result is then stored in the left hand side
• Results propagate up through the grammar
• Bottom-up parsing
Example: Calculator

vars = {}   # symbol table

def p_assign(p):
    '''assign : NAME EQUALS expr'''
    vars[p[1]] = p[3]

def p_expr_plus(p):
    '''expr : expr PLUS term'''
    p[0] = p[1] + p[3]

def p_term_mul(p):
    '''term : term TIMES factor'''
    p[0] = p[1] * p[3]

def p_term_factor(p):
    '''term : factor'''
    p[0] = p[1]

def p_factor(p):
    '''factor : NUMBER'''
    p[0] = p[1]
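Assembled end to end, the calculator runs as below (a self-contained sketch, not verbatim from the slides: the elided expr : term, minus, and divide rules are filled in, and error handlers added):

import ply.lex as lex
import ply.yacc as yacc

# ---- lexer ----
tokens = ('NAME','NUMBER','PLUS','MINUS','TIMES','DIVIDE','EQUALS')
t_ignore  = ' \t'
t_PLUS    = r'\+'
t_MINUS   = r'-'
t_TIMES   = r'\*'
t_DIVIDE  = r'/'
t_EQUALS  = r'='
t_NAME    = r'[a-zA-Z_][a-zA-Z0-9_]*'

def t_NUMBER(t):
    r'\d+'
    t.value = int(t.value)
    return t

def t_error(t):
    print "illegal character %r" % t.value[0]
    t.lexer.skip(1)

# ---- parser ----
vars = {}   # symbol table

def p_assign(p):
    '''assign : NAME EQUALS expr'''
    vars[p[1]] = p[3]

def p_expr_plus(p):
    '''expr : expr PLUS term'''
    p[0] = p[1] + p[3]

def p_expr_minus(p):
    '''expr : expr MINUS term'''
    p[0] = p[1] - p[3]

def p_expr_term(p):
    '''expr : term'''
    p[0] = p[1]

def p_term_mul(p):
    '''term : term TIMES factor'''
    p[0] = p[1] * p[3]

def p_term_div(p):
    '''term : term DIVIDE factor'''
    p[0] = p[1] / p[3]

def p_term_factor(p):
    '''term : factor'''
    p[0] = p[1]

def p_factor(p):
    '''factor : NUMBER'''
    p[0] = p[1]

def p_error(p):
    print "syntax error at", p

lex.lex()
yacc.yacc()
yacc.parse("x = 3*4 + 5*6")
print vars['x']   # => 42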
Example: Parse Tree

def p_assign(p):
    '''assign : NAME EQUALS expr'''
    p[0] = ('ASSIGN', p[1], p[3])

def p_expr_plus(p):
    '''expr : expr PLUS term'''
    p[0] = ('+', p[1], p[3])

def p_term_mul(p):
    '''term : term TIMES factor'''
    p[0] = ('*', p[1], p[3])

def p_term_factor(p):
    '''term : factor'''
    p[0] = p[1]

def p_factor(p):
    '''factor : NUMBER'''
    p[0] = ('NUM', p[1])
Example: Parse Tree

>>> t = yacc.parse("x = 3*4 + 5*6")
>>> t
('ASSIGN', 'x', ('+',
                 ('*', ('NUM', 3), ('NUM', 4)),
                 ('*', ('NUM', 5), ('NUM', 6))
                )
)
>>>

        ASSIGN
        /    \
      'x'    '+'
            /   \
          '*'   '*'
          / \   / \
         3   4 5   6
Why use PLY?

• There are many Python parsing tools


• Some use more powerful parsing algorithms
• Isn't parsing a "solved" problem anyways?
PLY is Informative
• Compiler writing is hard
• Tools should not make it even harder
• PLY provides extensive diagnostics
• Major emphasis on error reporting
• Provides the same information as yacc
PLY Diagnostics
• PLY produces the same diagnostics as yacc
• Yacc
% yacc grammar.y
4 shift/reduce conflicts
2 reduce/reduce conflicts

• PLY
% python mycompiler.py
yacc: Generating LALR parsing table...
4 shift/reduce conflicts
2 reduce/reduce conflicts

• PLY also produces the same debugging output


Debugging Output (parser.out)

Grammar

Rule 1  statement -> NAME = expression
Rule 2  statement -> expression
Rule 3  expression -> expression + expression
Rule 4  expression -> expression - expression
Rule 5  expression -> expression * expression
Rule 6  expression -> expression / expression
Rule 7  expression -> NUMBER

Terminals, with rules where they appear

*      : 5
+      : 3
-      : 4
/      : 6
=      : 1
NAME   : 1
NUMBER : 7
error  :

Nonterminals, with rules where they appear

expression : 1 2 3 3 4 4 5 5 6 6
statement  : 0

Parsing method: LALR

state 0

    (0) S' -> . statement
    (1) statement -> . NAME = expression
    (2) statement -> . expression
    (3) expression -> . expression + expression
    (4) expression -> . expression - expression
    (5) expression -> . expression * expression
    (6) expression -> . expression / expression
    (7) expression -> . NUMBER

    NAME        shift and go to state 1
    NUMBER      shift and go to state 2
    expression  shift and go to state 4
    statement   shift and go to state 3

state 1

    (1) statement -> NAME . = expression

    =           shift and go to state 5

state 10

    (1) statement -> NAME = expression .
    (3) expression -> expression . + expression
    (4) expression -> expression . - expression
    (5) expression -> expression . * expression
    (6) expression -> expression . / expression

    $end        reduce using rule 1 (statement -> NAME = expression .)
    +           shift and go to state 7
    -           shift and go to state 6
    *           shift and go to state 8
    /           shift and go to state 9

state 11

    (4) expression -> expression - expression .
    (3) expression -> expression . + expression
    (4) expression -> expression . - expression
    (5) expression -> expression . * expression
    (6) expression -> expression . / expression

  ! shift/reduce conflict for + resolved as shift.
  ! shift/reduce conflict for - resolved as shift.
  ! shift/reduce conflict for * resolved as shift.
  ! shift/reduce conflict for / resolved as shift.

    $end        reduce using rule 4 (expression -> expression - expression .)
    +           shift and go to state 7
    -           shift and go to state 6
    *           shift and go to state 8
    /           shift and go to state 9

  ! +  [ reduce using rule 4 (expression -> expression - expression .) ]
  ! -  [ reduce using rule 4 (expression -> expression - expression .) ]
  ! *  [ reduce using rule 4 (expression -> expression - expression .) ]
  ! /  [ reduce using rule 4 (expression -> expression - expression .) ]
PLY Validation
• PLY validates all token/grammar specs
• Duplicate rules
• Malformed regexs and grammars
• Missing rules and tokens
• Unused tokens and rules
• Improper function declarations
• Infinite recursion
Error Example

• an intentionally buggy lexer spec:

import ply.lex as lex
tokens = [ 'NAME','NUMBER','PLUS','MINUS','TIMES',
           'DIVIDE','EQUALS' ]
t_ignore = ' \t'
t_PLUS   = r'\+'
t_MINUS  = r'-'
t_TIMES  = r'\*'
t_DIVIDE = r'/'
t_EQUALS = r'='
t_NAME   = r'[a-zA-Z_][a-zA-Z0-9_]*'
t_MINUS  = r'-'       # duplicate rule
t_POWER  = r'\^'      # token not in the tokens list

def t_NUMBER():       # missing the t argument
    r'\d+'
    t.value = int(t.value)
    return t

lex.lex()  # Build the lexer

• PLY catches each mistake:

example.py:12: Rule t_MINUS redefined. Previously defined on line 6
lex: Rule 't_POWER' defined for an unspecified token POWER
example.py:15: Rule 't_NUMBER' requires an argument.
PLY is Yacc
• PLY supports all of the major features of
Unix lex/yacc
• Syntax error handling and synchronization
• Precedence specifiers
• Character literals
• Start conditions
• Inherited attributes
Precedence Specifiers

• Yacc

%left PLUS MINUS
%left TIMES DIVIDE
%nonassoc UMINUS
...
expr : MINUS expr %prec UMINUS {
           $$ = -$2;
       }

• PLY

precedence = (
    ('left','PLUS','MINUS'),
    ('left','TIMES','DIVIDE'),
    ('nonassoc','UMINUS'),
)

def p_expr_uminus(p):
    'expr : MINUS expr %prec UMINUS'
    p[0] = -p[2]   # p[1] is the MINUS token; p[2] is the expr value
Character Literals
• Yacc
expr : expr '+' expr { $$ = $1 + $3; }
| expr '-' expr { $$ = $1 - $3; }
| expr '*' expr { $$ = $1 * $3; }
| expr '/' expr { $$ = $1 / $3; }
;

• PLY
def p_expr(p):
'''expr : expr '+' expr
| expr '-' expr
| expr '*' expr
| expr '/' expr'''
...
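One caveat from PLY's documentation: single-character literals must also be declared in the lexer module via the literals list, e.g.:

# in the lexer: characters listed here may be returned as themselves
literals = ['+', '-', '*', '/']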
Error Productions

• Yacc

funcall_err : ID LPAREN error RPAREN {
        printf("Syntax error in arguments\n");
    }
    ;

• PLY

def p_funcall_err(p):
    '''funcall_err : ID LPAREN error RPAREN'''
    print "Syntax error in arguments\n"
PLY is Simple
• Two pure-Python modules. That's it.
• Not part of a "parser framework"
• Use doesn't involve exotic design patterns
• Doesn't rely upon C extension modules
• Doesn't rely on third party tools
PLY is Fast
• For a parser written entirely in Python
• Underlying parser is table driven
• Parsing tables are saved and only regenerated if the grammar changes
• Considerable work went into optimization from the start (developed on a 200 MHz PC)
PLY Performance

• Parse a file with 1000 random expressions (805KB) and build an abstract syntax tree
  • PLY-2.3  : 2.95 sec, 10.2 MB (Python)
  • DParser  : 0.71 sec, 72 MB (Python/C)
  • BisonGen : 0.25 sec, 13 MB (Python/C)
  • Bison    : 0.063 sec, 7.9 MB (C)
• 12x slower than BisonGen (mostly C)
• 47x slower than pure C
• System: Mac Pro, 2.66 GHz Xeon, Python-2.5
Class Example

import ply.yacc as yacc

class MyParser:
    def p_assign(self, p):
        '''assign : NAME EQUALS expr'''
    def p_expr(self, p):
        '''expr : expr PLUS term
                | expr MINUS term
                | term'''
    def p_term(self, p):
        '''term : term TIMES factor
                | term DIVIDE factor
                | factor'''
    def p_factor(self, p):
        '''factor : NUMBER'''
    def build(self):
        self.parser = yacc.yacc(object=self)
Limitations

• LALR(1) parsing
• Not easy to work with very complex grammars (e.g., C++ parsing)
• Retains all of yacc's black magic
• Not as powerful as more general parsing algorithms (ANTLR, SPARK, etc.)
• Tradeoff: Speed vs. Generality
