C-Minus词法分析和语法分析设计编译器编译原理课程设计

编译原理课程设计报告

课题名称: C- Minus词法分析和语法分析设计

1. 课程设计目标

实验建立C-编译器。只含有扫描程序(scanner)和语法分析(parser)部分。

2. 分析与设计

C-编译器设计的整体框架,本实验实现扫描处理和语法分析程序(图中粗黑部分)。

打开源代码文件source.txt扫描处理(词法分析)记号语法分析程序语法树语义分析程序错误处理器注释树记号表源代码优化程序文字表中间代码代码生成器目标代码目标代码优化程序目标代码

2.1 、扫描程序scanner部分 2.1.1系统设计思想

设计思想:根据DFA图用switch-case结构实现状态转换。

惯用词法:

① 语言的关键字:else if int return void while ② 专用符号:+ - * / < <= > >= == != = ; , ( ) [ ] { } /* */

③ 其他标记是ID和NUM,通过下列正则表达式定义: ID = letter letter* NUM = digit digit*

letter = a|..|z|A|..|Z digit = 0|..|9

大写和小写字母是有区别的

④ 空格由空白、换行符和制表符组成。空格通常被忽略,除了它必须分开ID、NUM关键字。

⑤ 注释用通常的C语言符号/ * . . . * /围起来。注释可以放在任何空白出现的位置(即注释不能放在标记内)上,且可以超过一行。注释不能嵌套

scanner的DFA+ - * <= >= == != ; , ( ) [] { }INASSIGNWhite space \\t \\n>,<,=,!digitNUM[other]=digitSTARTletter/INIDletter[other]DONE[other]ZHU*/INCOMMENT[other][other][other]*ZZHU*

说明:当输入的字符使DFA到达接受状态的时候,则可以确定一个单词了。初始状态设置为START,当需要得到下一个token时,取得次token的第一个字符,并且按照DFA与对此字符的类型分析,转换状态。重复此步骤,直到DONE为止,输出token类型。当字符为“/”时,状态转换为SLAH再判断下一个字符,如果为“*”则继续转到INCOMMENT,最后以“*”时转到ENDCOMMENT状态,表明是注释,如果其他的则是字符停滞于当前字符,并且输出“/”。 2.1.2程序流程图

源文件读取源文件一行输出读取一个字符否是否到达终态是赋值token输出token

2.1.3 各文件或函数的设计说明

扫描程序用到:scanner.h,scanner.cpp ? scanner.h:声明词法状态,词法分析 //DFA中的状态 typedef enum {

START = 1, INNUM, INID, INDBSYM, DONE

} DFAState;

//定义的Token的类型(31种),分别对应于else、if、int、return、void、while、+、-、*、/、<、<=、>、>=、==、!=、=、;、,、(、)、[、]、{、}、/*、*/、num、id、错误、结束 typedef enum {

ELSE = 1,IF,INT,RETURN,VOID,WHILE,

PLUS,MINUS,TIMES,OVER,LT,LEQ,GT,GEQ,EQ,NEQ,ASSIGN,SEMI,COMMA,LPAREN,RPAREN,LMBRACKET,RMBRACKET,LBBRACKET,RBBRACKET,LCOMMENT,RCOMMENT, NUM,ID,ERROR,ENDFILE } TokenType;

//定义的Token结构体,包括类型、对应的串、所在代码的行号 struct Token {

TokenType tokenType; string tokenString; int lineNo; };

//每种TokenType对应的串,如tokenTypeString[ELSE]==\

const string tokenTypeString[32] = {\\\\\\\\\\\\\\\\\\\\\\\\\\\\

class Scanner:定义scanner.cpp中函数 ? scanner.cpp文件函数说明

void Scanner :: scan():设置输出结果界面以及设置各种输出状态。

if(scanSuccess==false)

cout<<\词法分析出错!\cout<<\词法分析成功了!\else

printToken();/*输出Token到文件Token.txt中*/

//正在删除注释

void Scanner :: deleteComments()

TokenType Scanner :: returnTokenType(string s)//返回Token的类型 DFAState Scanner :: charType(char c)//返回字符的类型 typedef enum { ENDFILE,ERROR,

IF,ELSE,INT,RETURN,VOID,WHILE, //关键字

ID,NUM,

ASSIGN,PLUS,MINUS,TIMES,OVER,EQ,UEQ,LT,LPAREN,RPAREN,SEMI,BT,LQ,BQ, DOU,LZGH,RZGH,LDGH,RDGH,//特殊字符:= + - * / == != < 等 } TokenType;

2.1.4 测试程序说明

根据附录A后面的例子,程序输入两个整数,计算并打印出它们的最大公因子,保存为a.txt。

/* A program to perform Eucild's Algorithm to compute gcd. */ int gcd (int u, int v) {

if (v==0) return u; else return

gcd(v,u-u/v*v); /* u-u/v*v== u mod v */ }

void main(void) { int x; int y; x=input(); y=input(); output(gcd(x,y)); }

2.2、语法分析parse部分

2.2.1系统设计思想

设计思想:parser用递归下降分析方法实现,通过调用词法分析函数getToken实现语法分析。

根据C-语言的规则,得出BNF语法如下:

1.program->declaration-list

2.declaration-list->declaration-list declaration | declaration 3.declaration->var-declaration|fun-declaration

4.var-declaration->type-specifier ID;|type-specfier ID[NUM] 5.type-specifier->int|void

6.fun-specifier ID(parans) compound-stmt 7.params->params-list|void

8.param-list->param-list,param|param

9.param->type-specifier ID|type-specifier ID [] 10.compound-stmt->{local-declarations statement-list}

11.local-declarations->local-declarations var-declaration|empty 12.statement-list->statement-list statement|empty

13.statement->expression-stmt|compound-stmt|selection-stmt|iteration-stmt|return-stmt

14.expression-stmt->expression;|;

15.selection-stmt->if(expression)statement|if(expression)statement else

statement

16.iteration-stmt->while(expression)statement 17.return-stmt->return ;|return expression; 18.expression->var=expression|simple-expression 19.var->ID|ID[expression]

20.simple-expression->additive-expression

additive-expression|additive-expression

21.relop-><=|<|>|>=|==|!=

22.additive-expression->additive-expression addop term|term 23.addop->+|-

24.term->term mulop factor|factor 25.mulop->*|/

26.factor->(expression)|var|call|NUM 27.call->ID(args) 28.args->arg-list|empty

29.arg-list->arg-list,expression|expression 2.1.2语法分析程序流程图

relop

符号表文件获取终结符和非终结符构造文法分析表分析句型句子输出结果 2.1.3 各文件或函数的设计说明

语法分析程序包括:parser.cpp,parser.h ? parser.cpp:

Parser :: Parser()//界面设计

Token Parser :: getToken()//获取scanner中保存在TokenList数组中的Token,并且每次获取完之后数组下标指向下一个

void Parser :: syntaxError(string s)//出错处理 void Parser :: match(TokenType ex)//匹配出错 TreeNode * Parser :: declaration(void)//类型匹配错误

TreeNode * Parser :: param_list(TreeNode * k)//k可能是已经被取出来的VoidK,但又不是(void)类型的参数列表,所以一直传到param中去,作为其一个子节点 ? parse.h:对parse.c的函数声明

//19种节点类型,分别表示int、id、void、数值、变量声明、数组声明、函数声明、函数声明参数列表、函数声明参数、复合语句体、if、while、return、赋值、运算、数组元素、函数调用、函数调用参数列表、未知节点

typedef enum {IntK, IdK, VoidK, ConstK, Var_DeclK, Arry_DeclK, FunK, ParamsK, ParamK, CompK, Selection_StmtK, Iteration_StmtK, Return_StmtK, AssignK, OpK, Arry_ElemK, CallK, ArgsK, UnkownK} Nodekind; typedef enum {Void,Integer} ExpType;

ofstream fout_Tree(\输出语法树到文件

//treeNode定义 包括子节点、兄弟节点、所处行号、节点类型、属性、表达式返回类型 typedef struct treeNode

TreeNode * newNode(Nodekind k);//根据节点类型新建节点

TreeNode * declaration_list(void); TreeNode * declaration(void); TreeNode * params(void);

TreeNode * param_list(TreeNode * k); TreeNode * param(TreeNode * k); TreeNode * compound_stmt(void); TreeNode * local_declaration(void); TreeNode * statement_list(void); TreeNode * statement(void);

TreeNode * expression_stmt(void); TreeNode * selection_stmt(void); TreeNode * iteration_stmt(void); TreeNode * return_stmt(void); TreeNode * expression(void); TreeNode * var(void);

TreeNode * simple_expression(TreeNode * k); TreeNode * additive_expression(TreeNode * k); TreeNode * term(TreeNode * k); TreeNode * factor(TreeNode * k); TreeNode * call(TreeNode * k); TreeNode * args(void); 2.1.4 测试程序说明

根据附录A后面的例子,程序输入两个整数,计算并打印出它们的最大公因子,保存为a.txt。

/* A program to perform Eucild's

Algorithm to compute gcd. */ int gcd (int u, int v) {

if (v==0) return u; else return

gcd(v,u-u/v*v); /* u-u/v*v== u mod v */ }

void main(void) { int x; int y; x=input(); y=input(); output(gcd(x,y)); }

3. 程序代码实现

按文件列出主要程序代码, 添加必要的注释.

Scanner.cpp:

#include #include #include #include #include \#include using namespace std; /*

Name: 词法分析器 Copyright: Author: XXX Date: 19-05-14 12:00 Description: 提取出token */

Scanner :: Scanner() {

scanSuccess = true; charIndex = 0; str = \

commentFlag = true; sourseString = \ lineCount = 0; }

void Scanner :: scan() {

cout<<\开始词法分析...\ bool doubleSym = false; int state = START; lineCount = 0; char ch; while(state<6) {

if(START==state)//初始状态和空格 {

state = charType(ch); if(state!=START) ch = getNextChar(); if('\\0'==ch) { }

Token t;

t.lineNo = lineCount; t.tokenString = \t.tokenType = ENDFILE; tokenList.push_back(t); break;

getSourseStringFromFile(\

}

str += ch;

else if(INNUM==state)//digit { } { } { }

if(DONE==state)//接收状态 {

int tp = 0; if('\\n'==ch)

tp = 1; if('='==ch) { } else

doubleSym = false; state = DONE;

str += ch; doubleSym = true; state = charType(ch); if(state!=INID) else

str += ch;

state = DONE; state = charType(ch); if(state!=INNUM) else

str += ch;

state = DONE;

else if(INID==state)//letter

else if(INDBSYM==state)//除了<>=!之外的各种符号

Token t;

t.lineNo = lineCount-tp; t.tokenString = str;

t.tokenType = returnTokenType(str); tokenList.push_back(t); if(ERROR==t.tokenType)

scanSuccess = false;

int lastState = charType(str[str.length()-1]); if(lastState==INNUM || lastState==INID

doubleSym==false))

backToLastChar();

str = \ state = START; if(doubleSym==true)

doubleSym = false;

}

}

if(scanSuccess==false) cout<<\词法分析出错!\

else cout<<\词法分析成功了!\ printToken();//输出Token到文件Token.txt中

}

Token Scanner :: getTokenAt(int tokenIndex) {

Token token;

token.lineNo = lineCount; token.tokenString = \ token.tokenType = ENDFILE; if(tokenIndex

||

(lastState==INDBSYM

&&

return token; }

void Scanner :: getSourseStringFromFile(string path) {

ifstream fin(path.c_str()); string temp; sourseString = \

while(getline(fin,temp)) { } fin.close(); }

void Scanner :: deleteComments() {

cout<<\正在删除注释...\ ofstream fout_Sourse(\ int state = 1; char ch; while(state<6) {

ch = getNextChar(); if('\\0'==ch)//文件结束 {

if('/'==ch) else {

state = 1; state = 2; break; if(1==state)

charIndex = 0;

sourseString += temp; sourseString += '\\n';

} { } { } {

}

fout_Sourse<

else if(2==state)

if('*'==ch) { } else { }

state = 1;

fout_Sourse<<\state = 3;

commentFlag = false;

else if(3==state)

if('*'==ch) else { }

state = 3; state = 4;

else if(4==state)

if('*'==ch) else {

state = 3; state = 4; state = 5; else if('/'==ch)

}

} { }

}

if(5==state)//结束状态,处理

commentFlag = true; state = 1;

if(!commentFlag) { } else }

TokenType Scanner :: returnTokenType(string s)//返回Token的类型 {

TokenType t; if(s==\ { }

else if(s==\ { }

else if(s==\ { }

else if(s==\ {

t = RETURN; t = INT; t = IF; t = ELSE;

cout<<\注释已经成功删除!\cout<<\注释错误,没有结束符!\scanSuccess = false;

}

else if(s==\ { }

else if(s==\ { }

else if(s==\ { }

else if(s==\ { }

else if(s==\ { }

else if(s==\ { }

else if(s==\ { }

else if(s==\ { }

t = LEQ; t = LT; t = OVER; t = TIMES; t = MINUS; t = PLUS; t = WHILE; t = VOID;

else if(s==\ { }

else if(s==\ { }

else if(s==\ { { }

else if(s==\ { }

else if(s==\ { }

else if(s==\ { }

else if(s==\ { }

else if(s==\ {

t = LPAREN; t = COMMA; t = SEMI; t = ASSIGN; t = NEQ; t = EQ; } t = GEQ; t = GT;

else if(s==\

}

t = RPAREN;

else if(s==\ { }

else if(s==\ { }

else if(s==\ { }

else if(s==\ { }

else if(s==\ { }

else if(s==\ { { }

else if(3==charType(s[s.length()-1])) { }

t = ID; t = NUM; t = RCOMMENT; }

t = LCOMMENT; t = RBBRACKET; t = LBBRACKET; t = RMBRACKET; t = LMBRACKET;

else if(2==charType(s[s.length()-1]))

else { } return t; }

DFAState Scanner :: charType(char c)//返回字符的类型 {

if(' '==c || '\\n'==c || '\\t'==c ||'\\r'==c) }

char Scanner :: getNextChar() {

if(charIndex

void Scanner :: backToLastChar()

return '\\0';

char ch = sourseString[charIndex]; charIndex++; if('\\n'==ch)

lineCount++; return ch; return START; return INNUM; return INID; return INDBSYM; return DONE; else if(c>='0'&&c<='9')

else if((c>='A'&&c<='Z')||(c>='a'&&c<='z')) else if(c=='<' || c=='>' || c=='=' || c=='!') else

t = ERROR;

{

if(charIndex>0) { } }

void Scanner :: printToken() {

ofstream fout_Token(\ ifstream fin(\ string temp; int lineCount = 0; int index = 0; while(getline(fin,temp)) {

fout_Token<

Token t = tokenList.at(index); if(lineCount==t.lineNo) {

fout_Token<<\ \ [\index++; int width = 10; string headS = \

if(t.tokenType>=1&&t.tokenType<=6)//关键字 {

string tp = \

for(int i = 0; i

tp += \

char ch = sourseString[charIndex-1]; charIndex--; if('\\n'==ch)

lineCount--;

}

fout_Token<<\:\

\

else if(t.tokenType>=7&&t.tokenType<=27)//符号 { }

else if(t.tokenType==28)//NUM { }

else if(t.tokenType==29)//ID { }

else if(t.tokenType==30)//错误 {

string tp = \

for(int i = 0; i

tp += \

error]

\

fout_Token<<\string tp = \

for(int i = 0; i

tp += \

ID]:\

fout_Token<<\string tp = \

for(int i = 0; i

tp += \

NUM]

:\

fout_Token<<\string tp = \

for(int i = 0; i

tp += \

\

fout_Token<<\

\

\

\

\

}

}

}

}

else if(t.tokenType==ENDFILE)//结束 { }

fout_Token<<\ \fout_Token<

\

if(lineCount

lineCount++;

fin.close(); fout_Token.close(); }

scanner.h:

#include #include using namespace std;

//定义的Token的类型(31种),分别对应于else、if、int、return、void、while、+、-、*、/、<、<=、>、>=、==、!=、=、;、,、(、)、[、]、{、}、/*、*/、num、id、错误、结束 typedef enum {

ELSE = 1,IF,INT,RETURN,VOID,WHILE,

PLUS,MINUS,TIMES,OVER,LT,LEQ,GT,GEQ,EQ,NEQ,ASSIGN,SEMI,COMMA,LPAREN,RPAREN,LMBRACKET,RMBRACKET,LBBRACKET,RBBRACKET,LCOMMENT,RCOMMENT,

NUM,ID,ERROR,ENDFILE } TokenType; typedef enum {

START = 1, INNUM, INID, INDBSYM, DONE } DFAState;

//定义的Token结构体,包括类型、对应的串、所在代码的行号

struct Token {

TokenType tokenType; string tokenString; int lineNo; };

//每种TokenType对应的串,如tokenTypeString[ELSE]==\

const string tokenTypeString[32] = {\\\\\\\T\\\\\\\\\\\AREN\\\\\\\\

class Scanner { public:

bool scanSuccess;//词法分析是否成功的标志

void getSourseStringFromFile(string s);//通过提供的文件名获取源代码 void deleteComments();//删除注释

void scan();//词法分析,将分析的Token放在tokenList数组中 Scanner();

Token getTokenAt(int);//根据下标从tokenList数组中获取Token private:

DFAState charType(char);//返回字符的类型,如:空格2:数字3:字母等 char getNextChar();//获取到下一个字符 void backToLastChar();

TokenType returnTokenType(string s);//根据字符串返回Token类型 void printToken();//将词法分析好的Token输出到文件Token.txt中 string sourseString;//获取源代码的字符串

int charIndex;//配合getNextChar(),指定要取的字符位置 string str;//在分析过程中保存Token对应的串 bool commentFlag;//标注注释开始的标志

int lineCount;//对行号计数,每次获取到'/n'就自增 vector tokenList;//保存的Token序列 };

Parser.cpp:

#include \#include \#include #include

using namespace std;

Parser :: Parser() { step = 0; tokenIndex = 0; Error = false; string path = \ cout<<\请输入文件名:\ cin>>path;

scanner.getSourseStringFromFile(path); scanner.deleteComments(); if(scanner.scanSuccess) {

scanner.scan(); if(scanner.scanSuccess) {

cout<<\开始语法分析...\syntaxTree = parse(); printTree(syntaxTree); if(Error) else

cout<<\语法分析成功!\cout<<\语法分析过程出错!\

} }

}

Token Parser :: getToken()//获取scanner中保存在TokenList数组中的Token,并且每次获取完之后数组下标指向下一个 {

lastToken = currentToken;

currentToken = scanner.getTokenAt(tokenIndex++); return currentToken; }

void Parser :: syntaxError(string s) {

fout_Tree<

\

Error = true; }

void Parser :: match(TokenType ex) {

if(currentToken.tokenType==ex) { getToken();

} else { syntaxError(\匹配\出错\ } }

void Parser :: printSpace(int n)

出错附近型

{

for(int i = 0; i

void Parser :: printTree(TreeNode * t) { int i;

while(t!=NULL) {

printSpace(step); switch (t->nodekind) { case VoidK:

fout_Tree<<\oidK\break;

fout_Tree<<\break;

fout_Tree<<\break;

fout_Tree<<\break;

fout_Tree<<\ar_DeclK\break;

fout_Tree<<\break;

fout_Tree<<\fout_Tree<<\ \

case IntK:

case IdK:

case ConstK:

case Var_DeclK:

case Arry_DeclK:

case FunK:

break;

fout_Tree<<\break;

fout_Tree<<\break;

fout_Tree<<\break;

fout_Tree<<\break;

fout_Tree<<\break;

fout_Tree<<\break;

fout_Tree<<\break;

fout_Tree<<\

fout_Tree<attr.op]<

fout_Tree<<\break;

fout_Tree<<\break;

fout_Tree<<\

case ParamsK:

case ParamK:

case CompK:

case Selection_StmtK:

case Iteration_StmtK:

case Return_StmtK:

case AssignK:

case OpK:

case Arry_ElemK:

case CallK:

case ArgsK:

} }

}

break;

fout_Tree<<\:Unknown exp kind\break;

default:

step++;//进入子节点多输出空格 for(i = 0;i

step--;//进入兄弟节点时,由于进入子节点时n++了,所以要n--回来,从而输出一样t = t->sibling;

if(t->child[i]!=NULL)

printTree(t->child[i]);

的空格空格

TreeNode * Parser :: newNode(Nodekind kind) {

TreeNode * p = (TreeNode *)malloc(sizeof(TreeNode)); int k; if(p==NULL) { } else {

for(k = 0;k

p->sibling = NULL; p->nodekind = kind;

p->lineno = currentToken.lineNo;

p->child[k] = NULL; cout<<\内存分配出错!\

}

if(kind==OpK || kind==IntK || kind==IdK)

p->type = Integer; p->attr.name = \p->attr.val = 0; if(kind==IdK) if(kind==ConstK)

return p; }

TreeNode * Parser :: parse(void) {

TreeNode * t;

currentToken = getToken(); lastToken = currentToken; t = declaration_list();

if(currentToken.tokenType!=ENDFILE) { } return t; }

TreeNode * Parser :: declaration_list() {

TreeNode * t = declaration(); TreeNode * p = t;

//在开始语法分析出错的情况下找到int和void型,过滤掉int和void之前的所有Token,防止在开始时出错后面一错百错

while((currentToken.tokenType!=INT)&&(currentToken.tokenType!=VOID)&&(currentToken.tokenType!=ENDFILE)) {

syntaxError(\getToken();

syntaxError(\结束错误\

}

if(currentToken.tokenType==ENDFILE)

break;

//寻找语法分析的入口,即找到int和void

while((currentToken.tokenType==INT)||(currentToken.tokenType==VOID)) { }

match(ENDFILE); return t; }

TreeNode * Parser :: declaration(void) {

TreeNode * t = NULL; TreeNode * p = NULL; TreeNode * q = NULL; TreeNode * s = NULL;

if(currentToken.tokenType==INT) {

p = newNode(IntK); match(INT); TreeNode * q; q = declaration(); if(q!=NULL) { }

if(t==NULL) { } else { }

p->sibling = q; p = q; t = p = q;

}

else if(currentToken.tokenType==VOID) { } else { }

if((p!=NULL)&&(currentToken.tokenType==ID)) {

q = newNode(IdK);

q->attr.name = currentToken.tokenString.c_str(); match(ID);

if(currentToken.tokenType==LPAREN)//'(':函数情况 { }

else if(currentToken.tokenType==LMBRACKET)//'[':数组声明 {

t = newNode(Var_DeclK);

TreeNode * m = newNode(Arry_DeclK);

match(LMBRACKET); match(NUM); t = newNode(FunK); t->child[0] = p; t->child[1] = q; match(LPAREN); t->child[2] = params(); match(RPAREN);

t->child[3] = compound_stmt(); syntaxError(\类型匹配错误\p = newNode(VoidK); match(VOID);

} else { } }

}

s = newNode(ConstK);

s->attr.val = atoi(lastToken.tokenString.c_str()); m->child[0] = q; m->child[1] = s; t->child[0] = p; t->child[1] = m; match(RMBRACKET); match(SEMI);

else if(currentToken.tokenType==SEMI)//';'结尾:普通变量声明 { } else { }

syntaxError(\t = newNode(Var_DeclK); t->child[0] = p; t->child[1] = q; match(SEMI);

syntaxError(\

return t;

TreeNode * Parser :: params(void) {

TreeNode * t = newNode(ParamsK); TreeNode * p = NULL;

if(currentToken.tokenType==VOID)//开头为void,参数列表可能是(void)和(void id,[……])两种情况

{ }

else if(currentToken.tokenType==INT)//参数列表为(int id,[……]) { } else { } }

TreeNode * Parser :: param_list(TreeNode * k)//k可能是已经被取出来的VoidK,但又不是(void)类型的参数列表,所以一直传到param中去,作为其一个子节点 {

TreeNode * t = param(k); TreeNode * p = t;

k = NULL;//没有要传给param的VoidK,所以将k设为NULL while(currentToken.tokenType==COMMA) {

TreeNode * q = NULL; match(COMMA); syntaxError(\

t->child[0] = param_list(p); p = newNode(VoidK); match(VOID);

if(currentToken.tokenType==RPAREN)//参数列表为(void) { }

else//参数列表为(void id,[……]) ->void类型的变量 { }

t->child[0] = param_list(p); if(t!=NULL)

t->child[0] = p;

return t;

}

q = param(k); if(q!=NULL) { }

if(t==NULL) { } else { }

p->sibling = q; p = q; t = p = q;

return t; }

TreeNode * Parser :: param(TreeNode * k) {

TreeNode * t = newNode(ParamK);

TreeNode * p = NULL;//ParamK的第一个子节点 TreeNode * q = NULL;//ParamK的第二个子节点

if(k==NULL&¤tToken.tokenType==INT) { }

else if(k!=NULL) { }

if(p!=NULL) {

p = k;

p = newNode(IntK); match(INT);

} else { }

t->child[0] = p; { } else { }

if(currentToken.tokenType==ID)

q = newNode(IdK);

q->attr.name = currentToken.tokenString.c_str(); t->child[1] = q; match(ID);

syntaxError(\

if((currentToken.tokenType==LMBRACKET)&&(t->child[1]!=NULL)) { } else { }

return t;

match(LMBRACKET); t->child[2] = newNode(IdK); match(RMBRACKET);

syntaxError(\

return t; }

TreeNode * Parser :: compound_stmt(void) {

TreeNode * t = newNode(CompK);

match(LBBRACKET); t->child[0] = local_declaration(); t->child[1] = statement_list(); match(RBBRACKET); return t; }

TreeNode * Parser :: local_declaration(void) {

TreeNode * t = NULL; TreeNode * q = NULL; TreeNode * p = NULL;

while(currentToken.tokenType==INT || currentToken.tokenType==VOID) {

p = newNode(Var_DeclK); if(currentToken.tokenType==INT) { }

else if(currentToken.tokenType==VOID) { }

if((p!=NULL)&&(currentToken.tokenType==ID)) {

TreeNode * q2 = newNode(IdK); q2->attr.name = currentToken.tokenString.c_str(); p->child[1] = q2; match(ID);

TreeNode * q1 = newNode(VoidK); p->child[0] = q1; match(INT);

TreeNode * q1 = newNode(IntK); p->child[0] = q1; match(INT);

}

} else { }

if(currentToken.tokenType==LMBRACKET) { }

else if(currentToken.tokenType==SEMI) { } else { }

match(SEMI); match(SEMI);

TreeNode * q3 = newNode(Var_DeclK); p->child[3] = q3; match(LMBRACKET); match(RMBRACKET); match(SEMI);

syntaxError(\

if(p!=NULL) { }

if(t==NULL) else { }

q->sibling = p; q = p; t = q = p;

return t; }

TreeNode * Parser :: statement_list(void) {

TreeNode * t = statement(); TreeNode * p = t; while

(IF==currentToken.tokenType

|| ||

LBBRACKET==currentToken.tokenType

||

|| ID==currentToken.tokenType WHILE==currentToken.tokenType

RETURN

==currentToken.tokenType ||

SEMI==currentToken.tokenType

LPAREN==currentToken.tokenType || NUM==currentToken.tokenType)

{ TreeNode * q; q = statement(); if(q!=NULL) { if(t==NULL) { t = p = q;

} else { p->sibling = q; p = q; }

}

} return t; }

TreeNode * Parser :: statement(void) {

TreeNode * t = NULL; switch(currentToken.tokenType) { case IF:

||

}

t = selection_stmt(); break;

t = iteration_stmt(); break;

t = return_stmt(); break;

t = compound_stmt(); break;

t = expression_stmt(); break; syntaxError(\

currentToken = getToken(); break;

case WHILE:

case RETURN:

case LBBRACKET:

case ID: case SEMI: case LPAREN: case NUM:

default:

return t; }

TreeNode * Parser :: selection_stmt(void) {

TreeNode * t = newNode(Selection_StmtK); match(IF); match(LPAREN); if(t!=NULL) { }

match(RPAREN); t->child[1] = statement();

if(currentToken.tokenType==ELSE)

t->child[0] = expression();

{ } return t; }

TreeNode * Parser :: expression_stmt(void) {

TreeNode * t = NULL;

if(currentToken.tokenType==SEMI) { } else { } return t; }

TreeNode * Parser :: iteration_stmt(void) {

TreeNode * t = newNode(Iteration_StmtK); match(WHILE); match(LPAREN); if(t!=NULL) {

t->child[0] = expression(); t = expression(); match(SEMI); match(SEMI); return t; match(ELSE); if(t!=NULL) { }

t->child[2] = statement();

}

match(RPAREN); if(t!=NULL) { } return t; }

TreeNode * Parser :: return_stmt(void) {

TreeNode * t = newNode(Return_StmtK); match(RETURN);

if (currentToken.tokenType==SEMI) { } else { }

match(SEMI); return t; }

TreeNode * Parser :: expression(void) {

TreeNode * t = var();

if(t==NULL)//不是以ID开头,只能是simple_expression情况

if(t!=NULL) { }

t->child[0] = expression(); match(SEMI); return t;

t->child[1] = statement();

{ t = simple_expression(t);

}

else//以ID开头,可能是赋值语句,或simple_expression中的var和call类型的情况 { TreeNode * p = NULL;

if(currentToken.tokenType==ASSIGN)//赋值语句 { p = newNode(AssignK);

p->attr.name = lastToken.tokenString.c_str(); match(ASSIGN); p->child[0] = t;

p->child[1] = expression(); return p;

}

else //simple_expression中的var和call类型的情况 { t = simple_expression(t); } }

return t;

}

TreeNode * Parser :: simple_expression(TreeNode * k) {

TreeNode * t = additive_expression(k); k = NULL;

if(EQ==currentToken.tokenType || GT==currentToken.tokenType GEQ==currentToken.tokenType ||

LT==currentToken.tokenType

LEQ==currentToken.tokenType || NEQ==currentToken.tokenType)

{ TreeNode * q = newNode(OpK); q->attr.op = currentToken.tokenType;

q->child[0] = t;

|| ||

}

t = q;

match(currentToken.tokenType); t->child[1] = additive_expression(k); return t;

return t; }

TreeNode * Parser :: additive_expression(TreeNode * k) {

TreeNode * t = term(k); k = NULL;

while((currentToken.tokenType==PLUS)||(currentToken.tokenType==MINUS)) { } return t; }

TreeNode * Parser :: term(TreeNode * k) {

TreeNode * t = factor(k); k = NULL;

while((currentToken.tokenType==TIMES)||(currentToken.tokenType==OVER)) {

TreeNode * q = newNode(OpK); q->attr.op = currentToken.tokenType; q->child[0] = t; t = q;

TreeNode * q = newNode(OpK); q->attr.op = currentToken.tokenType; q->child[0] = t;

match(currentToken.tokenType); q->child[1] = term(k); t = q;

}

match(currentToken.tokenType); q->child[1] = factor(k);

return t; }

TreeNode * Parser :: factor(TreeNode * k) {

TreeNode * t = NULL;

if(k!=NULL)//k为上面传下来的已经解析出来的以ID开头的var,可能为call或var { }

else//没有从上面传下来的var {

switch(currentToken.tokenType) {

case LPAREN:

match(LPAREN); t = expression(); match(RPAREN); break; k = var();

if(LPAREN==currentToken.tokenType && k->nodekind!=Arry_ElemK) {

t = call(k);

if(currentToken.tokenType==LPAREN && k->nodekind!=Arry_ElemK) //call { } else { }

t = k; t = call(k);

case ID:

}

}

} break;

t = newNode(ConstK);

if((t!=NULL)&&(currentToken.tokenType==NUM)) { }

match(NUM); break; syntaxError(\

currentToken = getToken(); break;

t->attr.val = atoi(currentToken.tokenString.c_str());

case NUM:

default:

return t; }

TreeNode * Parser :: var(void) {

TreeNode * t = NULL; TreeNode * p = NULL; TreeNode * q = NULL; if(currentToken.tokenType==ID) {

p = newNode(IdK);

p->attr.name = currentToken.tokenString.c_str(); match(ID);

if(currentToken.tokenType==LMBRACKET) {

match(LMBRACKET); q = expression(); match(RMBRACKET);

} return t; }

TreeNode * Parser :: call(TreeNode * k) {

TreeNode * t = newNode(CallK); if(k!=NULL)

t->child[0] = k; match(LPAREN);

if(currentToken.tokenType==RPAREN) { }

else if(k!=NULL) { } return t; }

TreeNode * Parser :: args(void) {

TreeNode * t = newNode(ArgsK);

t->child[1] = args(); match(RPAREN); match(RPAREN); return t; } else { }

t = p;

t = newNode(Arry_ElemK); t->child[0] = p; t->child[1] = q;

TreeNode * s = NULL; TreeNode * p = NULL;

if(currentToken.tokenType!=RPAREN) { } { } return t; }

int main(int argc, char * argv[]) { Parser p;

t->child[0] = s; s = expression(); p = s;

while(currentToken.tokenType==COMMA) { }

TreeNode * q; match(COMMA); q = expression(); if(q!=NULL) { }

if(s==NULL) { } else { }

p->sibling = q; p = q; s = p = q;

if(s!=NULL)

system(\AUSE\ return EXIT_SUCCESS; }

parse.h:

#include #include using namespace std;

//19种节点类型,分别表示int、id、void、数值、变量声明、数组声明、函数声明、函数声明参数列表、函数声明参数、复合语句体、if、while、return、赋值、运算、数组元素、函数调用、函数调用参数列表、未知节点

typedef enum {IntK, IdK, VoidK, ConstK, Var_DeclK, Arry_DeclK, FunK, ParamsK, ParamK, CompK, Selection_StmtK, Iteration_StmtK, Return_StmtK, AssignK, OpK, Arry_ElemK, CallK, ArgsK, UnkownK} Nodekind; typedef enum {Void,Integer} ExpType;

ofstream fout_Tree(\输出语法树到文件 const int MAXCHILDREN = 4;

//treeNode定义 包括子节点、兄弟节点、所处行号、节点类型、属性、表达式返回类型 typedef struct treeNode {

struct treeNode * child[MAXCHILDREN]; struct treeNode * sibling; int lineno;

Nodekind nodekind;

union { TokenType op; int val; const char * name;} attr; ExpType type; } TreeNode;

class Parser { public: Parser();

TreeNode * parse(void);

TreeNode * syntaxTree;//语法树根节点

private: Scanner scanner;

Token currentToken;//当前获取的Token Token lastToken;//前一个Token

int tokenIndex;//配合getToken使用,每获取一次,tokenIndex自增 bool Error;//语法分析是否出错

int step;//用于节点输出时表征节点的先行空格

Token getToken();//获取保存在scanner中TokenList数组中的Token,每次获取完之后数组下标指向下一个

void printSpace(int n);//打印n个空格

void syntaxError(string s);//报错的函数,报告出错位置(行号)、出错位置附近的Token void match(TokenType ex);//与目标Token类型ex匹配,如果匹配成功则获取下一个Token(为currentToken赋值),否则报错

void printTree(TreeNode * t);//打印生成的语法树

TreeNode * newNode(Nodekind k);//根据节点类型新建节点

TreeNode * declaration_list(void); TreeNode * declaration(void); TreeNode * params(void);

TreeNode * param_list(TreeNode * k); TreeNode * param(TreeNode * k); TreeNode * compound_stmt(void); TreeNode * local_declaration(void); TreeNode * statement_list(void); TreeNode * statement(void);

TreeNode * expression_stmt(void); TreeNode * selection_stmt(void); TreeNode * iteration_stmt(void); TreeNode * return_stmt(void); TreeNode * expression(void);

TreeNode * var(void);

TreeNode * simple_expression(TreeNode * k); TreeNode * additive_expression(TreeNode * k); TreeNode * term(TreeNode * k); TreeNode * factor(TreeNode * k); TreeNode * call(TreeNode * k); TreeNode * args(void); };

4. 测试结果

4.1测试数据选择,保存a.txt文档中:

/* A program to perform Eucild's Algorithm to compute gcd. */ int gcd (int u, int v) {

if (v==0) return u; else return gcd(v,u-u/v*v); /* u-u/v*v== u mod v */ }

void main(void) { int x; int y;

x=input(); y=input(); output(gcd(x,y)); }

4.2 测试结果分析

4.2.1正确用例结果 运行程序结果:

去掉注释之后自动生成sourceFile.txt中:

正确词法分析结果输出自动生成在Token.txt中:

正确语法分析的结果输出,自动生成在tokenTree.txt中:

4.2.2错误用例结果

错误处理:(将a.txt中“return ”改为“return 0”之后再运行):

错误词法分析结果输出自动生成在Token.txt中:

错误语法分析结果输出自动生成在token.Tree.txt中:

5. 总结

?

收获

熟悉C语言的结构、指针、文件方面的使用;学会用startUml构造状态图,及用 switch-case结构实现状态转换实现词法分析;学会用递归下降方法实现EBNF文法规则,进而实现语法分析;熟悉构造编译器的步骤及程序实现的框架及部分方法。

?

特色

有良好的输出界面,通过tiny程序修改成CMinus程序,其思路和方法和tiny差不多,

方便理解。

?

不足

① 程序语法实现部分只用了递归下降分析一种方法。

② 此程序框架与教材《编译原理及实践》附录B Tiny编译器的程序框架相似。 ③ 递归下降分析方法手工实现应经验少程序逻辑错误多,调试费时久,不能保证程序的高健壮性。

联系客服:779662525#qq.com(#替换为@)