PLI Tutorial 3 Lexical analysis and context-free grammars (Lm.n refers to Louden, Exercise m.n.) Complete Tutorial 2. Part A. Scheme programming. 1. Given a list L, write a function that returns the sublist of L from position m of length n, where 0 <= m, n and m+n <= length(L). For example, (sublist '(2 3 5 7 11 13 17 19) 2 4) ==> (5 7 11 13). 2. Write a function to sort a list of integers using the mergesort algorithm. (Hint. Function sublist might be useful.) 3. Run-length encoding of a list. Implement the so-called run-length encoding data compression method. Consecutive copies of an element in a list are encoded as lists ( x), where is the number of copies of the element x. For example, (encode '(a a a a b c c a a d e e e e)) ==> ((4 a) (1 b) (2 c) (2 a) (1 d) (4 e)) 4. Modified run-length encoding. Modify the solution to the previous question so that if an element has no following copies it is transferred unchanged into the result list. Only elements with copies are transferred as ( x) lists. For example, (encode-modified '(a a a a b c c a a d e e e e)) ==> ((4 a) b (2 c) (2 a) d (4 e)) 5. Modified run-length decoding. Given a run-length code list generated as specified in the previous question, construct its uncompressed version. (Solutions to questions 3 and 5 are provided on the Resources page but please try to solve the questions yourself before reading the solutions.) Part B. Lexical analysis. 1, Modify the lex specification lex/tiny.l (and related files) in the Tiny compiler so that the function getToken() returns a structure containing a token type and a token string instead of returning a token type and assigning the token string to a global variable. Check that flex appears to accept your changes by executing flex on your modified specification. 2. Modify the Lex specification lex/tiny.l (and related files) in the Tiny compiler as follows: a. Add new reserved words "while" and "do". b. Allow digits and underscores in identifiers, but not at the start. c. Change the comment convention to the C convention /* ... */. Check that flex appears to accept your changes by executing flex on your modified specification. 3. Test that your specification is correct by writing and testing a driver program that repeatedly calls your generated scanner, reading from standard input, and writing the type and value of each successive token on a separate line. Part C. Context-free grammars. 1. Give a CFG for the language { s in (a|b)* | s = reverse(s) } of palindromes over terminal symbols a and b. 2. Give a CFG for the language {a^i b^j c^k | i=j or j=k}, where a, b and c are terminal symbols, and a^i denotes a sequence of i a's. Give two different leftmost derivations for the string a^2 b^2 c^2. 3. Argue informally that the CFG S -> e | a S b S | b S a S (where e denotes the empty sequence) generates all strings of a's and b's with an equal number of a's and b's. 4. Given the second CFG for expressions in the Lecture Notes (the one below that defines precedences and associativities of operators), exp -> exp addop term | term addop -> + | - term -> term mulop factor | factor mulop -> * | / factor -> ( exp ) | number give a leftmost derivation, a derivation tree and an abstract syntax tree for the expression 3-(4+5*6). 5. (L3.4) The following grammar generates all regular expressions over the alphabet of letters ("|" and "*" are terminal symbols in the language of regular expressions): R -> R "|" R | R R | R "*" | "(" R ")" | letter a. Give a leftmost derivation for the regular expression (aa | b)* using this grammar. b. Show that the grammar is unambiguous. c. Rewrite the grammar to establish the correct precedences between "*" (highest), concatenation (next highest), and "|" (lowest). 6. (L3.6 and L3.10a) Consider the following grammar for simplified Scheme expressions: lexp -> atom | list atom -> number | identifier list -> "(" lexp-seq ")" lexp-seq -> lexp | lexp-seq a. Write a leftmost and a rightmost derivation for the string (a 23 (m x y). Draw a derivation tree for the same string. b. Rewrite the grammar in extended Backus-Naur form (EBNF). 7. (L3.13) Consider the following simplification of the above grammar, which describes simple arithmetic expressions in Scheme-like prefix notation: lexp -> number | "(" op lexp-seq ")" op -> "+" | "-" | "*" | "/" lexp-seq -> lexp | lexp-seq lexp a. What meanings should be assigned to the expressions (+ 2), (- 2), (+ 2 3 4), (- 2 3 4)? b. Does this grammar capture the normal precedence and associativity of the arithmetic operators? c. Is the grammar ambiguous or unambiguous? 8. Consider the following grammar for balanced parenthesis strings: S -> e | ( S ) S a. What are first(S) and follow(S) in ths grammar? Is first(S) disjoint from follow(S)? Is the grammar LL(1)? b. Write a recursive descent parser for this grammar. Your parser need only recognise whether an input string is a balanced parenthesis string or not. 9. Consider the second CFG for expressions (the one above). What are first(A) and follow(A) for each nonterminal symbol A in the grammar? Show that the grammar is not LL(1). 10. Write an EBNF grammar for this language that is LL(1). (An EBNF grammar is LL(1) if its equivalent BNF grammar is LL(1).) Show that the grammar is LL(1).