Lexical Analysis
The first step is the lexical analysis. This phase will group characters into words. Those words are called tokens. After this step we get a more structured information where
the keywords of MiniJava, the identifiers, the integer constants and the boolean constants have been recognized. This is in this phase that commentaries and whitespaces are suppressed.
For the program given above, the scanner
, the program responsible of the lexical analysis phase, will produce the following stream of tokens. For example,
we can see that the keyword CLASS
has been recognized, as the integer constant INT_CONST 35
.
CLASS
IDENT ‘Print42‘
LBRACE
PUBLIC
STATIC
VOID
MAIN
LPAREN
STRING
LBRACKET
RBRACKET
IDENT ‘a‘
RPAREN
LBRACE
SYSO
LPAREN
INT_CONST ‘35‘
PLUS
INT_CONST ‘2‘
TIMES
INT_CONST ‘3‘
PLUS
INT_CONST ‘1‘
RPAREN
SEMICOLON
RBRACE
RBRACE
EOF
In the following video, we give an overview of the transpiler and we present the lexical analyzer.
Syntactic Analysis
The second step, the syntactic analysis, takes as input the stream of tokens and produces an abstract syntax tree which represents the structure of the program with a tree
data structure.
For the program given above, the parser
, the program responsible of the syntactic analysis phase, will produce the following tree. We can see
the arithmetic expression with the operators precedence explicitly stated (expressions deeper in the tree have more priority).
program
├name Print42
├main_args a
└main
└ISyso
└EBinOp OpAdd
├EBinOp OpAdd
│ ├EConst (ConstInt 35)
│ └EBinOp OpMul
│ ├EConst (ConstInt 2)
│ └EConst (ConstInt 3)
└EConst (ConstInt 1)
The syntactic analyser needs to know the structure of a MiniJava program. This structure is given in the form of a grammar.
The MiniJava grammar, in its EBNF form is given below.
Another version, certainly easier to read for us, using syntax diagrams is given here.
Program = MainClass { ClassDeclaration } 'eof' ;
MainClass = 'class' Identifier '{' 'public' 'static' 'void' 'main' '(' 'String' '[' ']' Identifier ')' '{' Statement '}' '}' ;
ClassDeclaration = 'class' Identifier [ 'extends' Identifier ] '{' { VarDeclaration } { MethodDeclaration } '}' ;
VarDeclaration = Type Identifier ';' ;
MethodDeclaration = 'public' Type Identifier '(' [ Type Identifier { ',' Type Identifier } ] ')' '{' { VarDeclaration } { Statement } 'return' Expression ';' '}' ;
Type = 'int' '[' ']'
| 'boolean'
| 'int'
| Identifier ;
Statement = '{' { Statement } '}'
| 'if' '(' Expression ')' Statement 'else' Statement
| 'while' '(' Expression ')' Statement
| 'System.out.println' '(' Expression ')' ';'
| Identifier '=' Expression ';'
| Identifier '[' Expression ']' '=' Expression ';' ;
Expression = Expression ( '&&' | '<' | '+' | '-' | '*' ) Expression
| Expression '[' Expression ']'
| Expression '.' 'length'
| Expression '.' Identifier '(' [Expression {',' Expression}] ')'
| Integer
| 'true'
| 'false'
| Identifier
| 'this'
| 'new' 'int' '[' Expression ']'
| 'new' Identifier '(' ')'
| '!' Expression
| '(' Expression ')' ;
Letter = 'A' | 'B' | 'C' | 'D' | 'E' | 'F' | 'G'
| 'H' | 'I' | 'J' | 'K' | 'L' | 'M' | 'N'
| 'O' | 'P' | 'Q' | 'R' | 'S' | 'T' | 'U'
| 'V' | 'W' | 'X' | 'Y' | 'Z' | 'a' | 'b'
| 'c' | 'd' | 'e' | 'f' | 'g' | 'h' | 'i'
| 'j' | 'k' | 'l' | 'm' | 'n' | 'o' | 'p'
| 'q' | 'r' | 's' | 't' | 'u' | 'v' | 'w'
| 'x' | 'y' | 'z' ;
Digit = '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' ;
Integer = Digit { Digit } ;
Character = Letter | Digit | '_' ;
Identifier = Letter { Letter | Digit | '_' } ;
The following video presents the syntactic analyzer of MiniJava.
Typechecker
The third step, the typechecker, takes as input the abstract syntax tree and will check if the types are correct. For example, the typechecker will check if we call
a method with the correct number of arguments, that the operators +
and *
are used with integer operands, that a class is compatible with another one using
the class hierarchy, …
The following video presents an overview of the MiniJava typechecker.
Code Generation
The last step is the code generator phase. We will again visit the abstract syntax tree to generate C code. For our running example,
we get the following C source file.
#include <stdio.h>
int main(int argc, char *argv[]) {
printf("%d\n", ((35 + (2 * 3)) + 1));
return 0;
}
The following video briefly describes the C code generation in MiniJava.
Semantic of a MiniJava Program
The semantic of a MiniJava program is given by its semantic as a Java program.
The main restrictions are
- No class has
Object
as a superclass.
- No
super
keyword.
- Only a default constructor.
- The only types available are
int
.
boolean
.
int[]
.
- User defined classes.
- No overloading.
- All methods return a value. The
return
statement is at the end.
System.out.println()
only prints integers.
- No interfaces, no exceptions, no generic types, no lambdas.
We explain in the following video the main differences between Java and MiniJava.
We described briefly in the video the default initialization mechanism in Java (and MiniJava).
We give further details below. You can look here for all the details.
Let’s consider the following Java program Init.java
.
1class Init {
2 public static void main(String[] args) {
3 int i;
4 boolean b;
5 int[] a;
6 System.out.println(i + " " + b + " " + a);
7 }
8}
When we compile it, we get the following error messages.
$ javac Init.java
Init.java:6: error: variable i might not have been initialized
System.out.println(i + " " + b + " " + a);
^
Init.java:6: error: variable b might not have been initialized
System.out.println(i + " " + b + " " + a);
^
Init.java:6: error: variable a might not have been initialized
System.out.println(i + " " + b + " " + a);
^
3 errors
Indeed, the three variables i
, b
and a
are put on the stack when main
is called. Java does not initialize
local variables, and their values are whatever was currently in those locations on the stack.
The compiler rejects this program because of the arbitrary values that go into those variables, by telling
us that those variables have not been initialized.
On the other hand, the following program is correct, because attributes have a default value.
1class Init {
2 public static void main(String[] args) {
3 new Default().print();
4 }
5}
6
7class Default {
8 private int i;
9 private boolean b;
10 private int[] a;
11
12 public void print() {
13 System.out.println(i + " " + b + " " + a);
14 }
15}
And we get the following output.
$ javac Init.java
$ java Init
0 false null
Review of Java’s Dynamic Binding
We will review dynamic binding in Java in the following video, because the management of dynamic binding will
be a difficult point to handle in our transpiler.
Questions
Let’s have a look again at the MiniJava grammar.
We would like to add to the language
- The
==
operator.
- Constructors.
private
methods and constructors.
How to update the grammar to add those new elements?
In the previous question, what terminals did you add to the grammar (a terminal in the grammar will become a token for the scanner)?
Consider the following Java program.
1class A {
2 public int m1(int n) {
3 System.out.println("int A:m1(int n)");
4 return 0;
5 }
6}
7
8class B extends A {
9 public boolean m1(int n) {
10 System.out.println("boolean B:m1(int n)");
11 return false;
12 }
13}
Does this code compile?
This code doesn’t compile. Indeed, on line 9, the m1
method overrides the m1
method of line 2:
It has the same name and the same parameters. But to correctly overrides the m1
method of line 2,
the return type must be compatible, and boolean
is not compatible with int
.
Consider the following Java program.
1class A {
2 public int m1(int n) {
3 System.out.println("int A:m1(int n)");
4 return 0;
5 }
6 public boolean m1(int n) {
7 System.out.println("boolean A:m1(int n)");
8 return false;
9 }
10}
Does this code compile?
This code doesn’t compile. Indeed, the return type is not used to differentiate methods.
Therefore, even if the methods on lines 2 and 6 have different return types, because they have the same
name and the same parameters, this is not overloading and so we have an error.
Consider the following Java program.
1class A {
2 public int m1(A a) {
3 System.out.println("int A:m1(A a)");
4 return 0;
5 }
6 public boolean m1(B b) {
7 System.out.println("boolean A:m1(B b)");
8 return false;
9 }
10}
11
12class B extends A {
13}
Does this code compile?
This code compiles. This time, the method on line 6 overloads the method on line 2 because the parameter
is of a different type.
Ressources