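Part 3: The Parser

1. Overview
2. How the program uses the parser
3. Demo
- 3.1 Input
- 3.2 Output
4. Source code walkthrough

1. Overview

The TypeScript parser lives in parser.ts. Internally, the parser drives the scanner to turn source code into an AST.

As explained in the previous chapter on the scanner, the scanner's entry point is controlled by the parser, and parser.ts creates a single scanner instance that every parse reuses (a singleton).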
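The singleton idea can be sketched in a few lines. This is only an illustration under assumptions: createToyScanner, sharedScanner and parseWords are hypothetical names, and the toy scanner just splits on whitespace; it is not how the real scanner tokenizes.

```typescript
// Toy scanner that splits text on whitespace. Hypothetical, for illustration only.
interface ToyScanner {
  setText(text: string): void;
  scan(): string | undefined; // next word, or undefined at the end of the input
}

let scannersCreated = 0;

function createToyScanner(): ToyScanner {
  scannersCreated++;
  let words: string[] = [];
  let index = 0;
  return {
    setText(text: string): void {
      words = text.split(/\s+/).filter(w => w.length > 0);
      index = 0;
    },
    scan(): string | undefined {
      return index < words.length ? words[index++] : undefined;
    },
  };
}

// Created once when the module loads; every parse call reuses it.
const sharedScanner = createToyScanner();

function parseWords(text: string): string[] {
  sharedScanner.setText(text); // reset the state instead of allocating a new scanner
  const result: string[] = [];
  for (let word = sharedScanner.scan(); word !== undefined; word = sharedScanner.scan()) {
    result.push(word);
  }
  return result;
}

const firstParse = parseWords("const foo = 123;");
const secondParse = parseWords("let bar");
```

Both calls go through the same scanner object, so scannersCreated stays at 1 no matter how many texts are parsed.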

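2. How the program uses the parser

The parser is driven indirectly by the program. A simplified call stack looks like this:

Program ->
  CompilerHost.getSourceFile ->
        (global function in parser.ts).createSourceFile ->
            Parser.parseSourceFile

parseSourceFile sets up the parser's state and, by calling initializeState, the scanner's state as well. It then continues parsing the code with parseSourceFileWorker.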

3. Demo

Reading code on its own gets dry, so let's write a demo first and use its input and output to work out how the parser builds the AST internally.

3.1 Input

import * as ts from 'ntypescript';
function printAllChildren(node: ts.Node, depth = 0) {
  console.log(new Array(depth + 1).join('----'), ts.formatSyntaxKind(node.kind), node.pos, node.end);
  depth++;
  node.getChildren().forEach(c => printAllChildren(c, depth));
}
var sourceCode = `const foo = 123;`;
var sourceFile = ts.createSourceFile('foo.ts', sourceCode, ts.ScriptTarget.ES5, true)
console.log(sourceFile)
printAllChildren(sourceFile);

3.2 Output

// SourceFile 0 16
// ---- SyntaxList 0 16
// -------- VariableStatement 0 16
// ------------ VariableDeclarationList 0 15
// ---------------- ConstKeyword 0 5
// ---------------- SyntaxList 5 15
// -------------------- VariableDeclaration 5 15
// ------------------------ Identifier 5 9
// ------------------------ EqualsToken 9 11
// ------------------------ NumericLiteral 11 15
// ------------ SemicolonToken 15 16
// ---- EndOfFileToken 16 16

This is an AST. ASTs are a deep topic in their own right, so for now let's look at how the TypeScript source code defines and builds this tree.
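One detail in the output is easy to miss: the Identifier starts at 5, not 6. A node's pos is its full start, which includes the leading whitespace (trivia) in front of the token. The small sketch below makes this visible by slicing the source text with the pos and end values copied from the output above. The printed table is hard-coded for illustration and does not call the compiler.

```typescript
const demoSource = "const foo = 123;";

// [kind, pos, end] triples copied from the demo output.
const printed: Array<[string, number, number]> = [
  ["ConstKeyword", 0, 5],
  ["Identifier", 5, 9],
  ["EqualsToken", 9, 11],
  ["NumericLiteral", 11, 15],
  ["SemicolonToken", 15, 16],
];

// Pair each kind with the raw text in the range [pos, end).
const slices = printed.map(
  ([kind, pos, end]) => `${kind}: ${JSON.stringify(demoSource.slice(pos, end))}`
);
// slices[1] is 'Identifier: " foo"', with the leading space included
```

In the compiler API, node.getStart() returns the position after that leading trivia, which is usually what you want when reporting locations to a user.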

4. Source code walkthrough

4.1 createSourceFile

  // Entry point of the parser
  // sourceText: const foo = 123;
  export function createSourceFile(fileName: string, sourceText: string, languageVersion: ScriptTarget, setParentNodes = false, scriptKind?: ScriptKind): SourceFile {
      performance.mark("beforeParse");
      const result = Parser.parseSourceFile(fileName, sourceText, languageVersion, /*syntaxCursor*/ undefined, setParentNodes, scriptKind);
      performance.mark("afterParse");
      performance.measure("Parse", "beforeParse", "afterParse");
      return result;
  }

The performance calls are profiling code and are not part of the main flow. The only call we care about is Parser.parseSourceFile.

4.2 Parser.parseSourceFile

// sourceText: const foo = 123;
export function parseSourceFile(fileName: string, sourceText: string, languageVersion: ScriptTarget, syntaxCursor: IncrementalParser.SyntaxCursor, setParentNodes?: boolean, scriptKind?: ScriptKind): SourceFile {
    scriptKind = ensureScriptKind(fileName, scriptKind);
    // Initialize the state,
    // including the scanner,
    // e.g. scanner.setText(text);
    initializeState(sourceText, languageVersion, syntaxCursor, scriptKind);
    // Parse the target file
    const result = parseSourceFileWorker(fileName, languageVersion, setParentNodes, scriptKind);
    // Clear the state
    clearState();
    return result;
}

initializeState is the function that initializes the scanner. Here is a brief, abridged look at its code:

function initializeState(_sourceText: string, languageVersion: ScriptTarget, _syntaxCursor: IncrementalParser.SyntaxCursor, scriptKind: ScriptKind) {
    // (abridged) Copy the arguments into the parser's module-level state.
    sourceText = _sourceText;
    syntaxCursor = _syntaxCursor;
    // Initialize and prime the scanner before parsing the source elements.
    scanner.setText(sourceText);
    scanner.setOnError(scanError);
    scanner.setScriptTarget(languageVersion);
    scanner.setLanguageVariant(getLanguageVariant(scriptKind));
}
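The initialize / work / clear shape of parseSourceFile can be sketched on its own. The version below is a simplified illustration with hypothetical names (currentText, countStatements, isStateCleared) and a trivial "worker" that only counts semicolon-terminated statements. The try/finally is an addition of this sketch; the excerpt above calls clearState directly.

```typescript
// Module-level state shared by the helper functions, mirroring how the parser
// keeps sourceText and friends outside the individual parse functions.
let currentText: string | undefined;
let currentDiagnostics: string[] = [];

function initializeState(text: string): void {
  currentText = text;
  currentDiagnostics = [];
}

function clearState(): void {
  // Drop references so one parse cannot leak into the next.
  currentText = undefined;
  currentDiagnostics = [];
}

function isStateCleared(): boolean {
  return currentText === undefined && currentDiagnostics.length === 0;
}

function countStatementsWorker(): number {
  if (currentText === undefined) {
    throw new Error("state not initialized");
  }
  return currentText.split(";").filter(s => s.trim().length > 0).length;
}

function countStatements(text: string): number {
  initializeState(text);
  try {
    return countStatementsWorker();
  } finally {
    clearState(); // assumption of this sketch: always clear, even if the worker throws
  }
}

const statementCount = countStatements("const foo = 123; let bar = 1;");
```

Because the state lives at module level, the worker functions need no extra parameters, and the price is that the state has to be reset around every parse.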

4.3 parseSourceFileWorker

Moving on, we arrive at parseSourceFileWorker.

function parseSourceFileWorker(fileName: string, languageVersion: ScriptTarget, setParentNodes: boolean, scriptKind: ScriptKind): SourceFile {
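    // Create the SourceFile node that parsing will fill in
    sourceFile = createSourceFile(fileName, languageVersion, scriptKind);
    sourceFile.flags = contextFlags;
    // Prime the scanner.
    // Replace currentToken with the newly scanned token
    // by calling scanner.scan();
    nextToken();
    // Collect the leading /// directives (referencedFiles, amdDependencies, ...)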
    processReferenceComments(sourceFile);
    sourceFile.statements = parseList(ParsingContext.SourceElements, parseStatement);
    Debug.assert(token() === SyntaxKind.EndOfFileToken);
    sourceFile.endOfFileToken = addJSDocComment(parseTokenNode() as EndOfFileToken);
    setExternalModuleIndicator(sourceFile);
    sourceFile.nodeCount = nodeCount;
    sourceFile.identifierCount = identifierCount;
    sourceFile.identifiers = identifiers;
    sourceFile.parseDiagnostics = parseDiagnostics;
    if (setParentNodes) {
        fixupParentReferences(sourceFile);
    }
    return sourceFile;
}

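Looking at the code above, parseSourceFileWorker does four main things:

- createSourceFile creates the SourceFile node that parsing will fill in
- nextToken replaces currentToken with the newly scanned token
- processReferenceComments collects the leading /// directives (reference paths, AMD dependencies and so on) into the sourceFile
- parseList parses the statements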

4.3.1 createSourceFile

function createSourceFile(fileName: string, languageVersion: ScriptTarget, scriptKind: ScriptKind): SourceFile {
    // code from createNode is inlined here so createNode won't have to deal with special case of creating source files
    // this is quite rare comparing to other nodes and createNode should be as fast as possible
    const sourceFile = <SourceFile>new SourceFileConstructor(SyntaxKind.SourceFile, /*pos*/ 0, /* end */ sourceText.length);
    nodeCount++;
    sourceFile.text = sourceText;
    sourceFile.bindDiagnostics = [];
    sourceFile.languageVersion = languageVersion;
    sourceFile.fileName = normalizePath(fileName);
    sourceFile.languageVariant = getLanguageVariant(scriptKind);
    sourceFile.isDeclarationFile = fileExtensionIs(sourceFile.fileName, Extension.Dts);
    sourceFile.scriptKind = scriptKind;
    return sourceFile;
}

This is also a good place to look at the TypeScript declaration of SourceFileConstructor, which is used to build sourceFile:

 let SourceFileConstructor: new (kind: SyntaxKind, pos: number, end: number) => Node;

4.3.2 nextToken

function nextToken(): SyntaxKind {
  return currentToken = scanner.scan();
}
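The point of this one-liner is that the parser always works with one token of lookahead: it "primes" the scanner once, then alternates between inspecting token() and calling nextToken(). Here is a self-contained sketch of that loop, assuming a toy scanner (createToyScanner and TokenKind are hypothetical and far simpler than the real scanner).

```typescript
type TokenKind = "Identifier" | "Number" | "Punctuation" | "EndOfFile";

// Toy scanner: identifiers, integers and single-character punctuation only.
function createToyScanner(text: string) {
  let pos = 0;
  return {
    scan(): TokenKind {
      while (pos < text.length && text.charAt(pos) === " ") pos++; // skip trivia
      if (pos >= text.length) return "EndOfFile";
      if (/[A-Za-z_]/.test(text.charAt(pos))) {
        while (pos < text.length && /\w/.test(text.charAt(pos))) pos++;
        return "Identifier";
      }
      if (/\d/.test(text.charAt(pos))) {
        while (pos < text.length && /\d/.test(text.charAt(pos))) pos++;
        return "Number";
      }
      pos++;
      return "Punctuation";
    },
  };
}

const toyScanner = createToyScanner("const foo = 123;");
let currentToken: TokenKind = "EndOfFile";

function token(): TokenKind {
  return currentToken;
}

function nextToken(): TokenKind {
  return (currentToken = toyScanner.scan());
}

const seenKinds: TokenKind[] = [];
nextToken(); // prime the scanner so token() is valid before the loop starts
while (token() !== "EndOfFile") {
  seenKinds.push(token());
  nextToken();
}
```

The toy scanner has no keyword table, so const is reported as an Identifier here; the real scanner returns ConstKeyword for it.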

4.3.3 processReferenceComments

For processReferenceComments, looking at its input and output first makes it clearer what happens inside.

sourceFile before processReferenceComments runs:

bindDiagnostics:(0) []
end:16
fileName:'foo.ts'
flags:0
isDeclarationFile:false
kind:265
languageVariant:0
languageVersion:1
parent:undefined
pos:0
scriptKind:3
text:'const foo = 123;'
transformFlags:undefined

sourceFile after processReferenceComments runs:

amdDependencies:(0) []
bindDiagnostics:(0) []
checkJsDirective:undefined
end:16
fileName:'foo.ts'
flags:0
isDeclarationFile:false
kind:265
languageVariant:0
languageVersion:1
moduleName:undefined
parent:undefined
pos:0
referencedFiles:(0) []
scriptKind:3
text:'const foo = 123;'
transformFlags:undefined
typeReferenceDirectives:(0) []

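The two dumps look almost the same. The only change is the added fields amdDependencies, checkJsDirective, moduleName, referencedFiles and typeReferenceDirectives, all empty or undefined here because the demo source contains no /// directives. In other words, processReferenceComments is not what builds the AST.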
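To see those fields filled in, the source needs directives at the top of the file. The sketch below is a rough approximation of the idea, not the compiler's algorithm: collectReferencePaths is a hypothetical helper that only recognizes /// <reference path="..." /> lines and stops at the first line that is not a triple-slash comment.

```typescript
// Collect the paths of leading `/// <reference path="..." />` directives.
// Simplification: any non-blank line that does not start with `///` ends the search.
function collectReferencePaths(text: string): string[] {
  const paths: string[] = [];
  const directive = /^\/\/\/\s*<reference\s+path\s*=\s*["']([^"']+)["']\s*\/>/;
  for (const line of text.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue;            // blank lines may sit between directives
    if (trimmed.slice(0, 3) !== "///") break; // first real code line ends the directive area
    const match = directive.exec(trimmed);
    if (match && match[1] !== undefined) {
      paths.push(match[1]);
    }
  }
  return paths;
}

const withDirectives = collectReferencePaths(
  '/// <reference path="a.d.ts" />\n/// <reference path="b.d.ts" />\nconst foo = 123;'
);
const withoutDirectives = collectReferencePaths("const foo = 123;");
```

For the demo source the result is an empty list, which matches the empty referencedFiles array in the dump above.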

So let's keep going.

4.3.4 parseList

function parseList<T extends Node>(kind: ParsingContext, parseElement: () => T): NodeArray<T> {
    const saveParsingContext = parsingContext;
    parsingContext |= 1 << kind;
    const result = createNodeArray<T>();
    while (!isListTerminator(kind)) {
        if (isListElement(kind, /*inErrorRecovery*/ false)) {
            // Handled by parseListElement
            const element = parseListElement(kind, parseElement);
            result.push(element);
            continue;
        }
        if (abortParsingListOrMoveToNextToken(kind)) {
            break;
        }
    }
    result.end = getNodeEnd();
    parsingContext = saveParsingContext;
    return result;
}
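Two things in parseList are worth isolating: the loop runs until a terminator for this kind of list is reached, and parsingContext is a bit set that records which lists are currently open (one bit per ParsingContext value), saved on entry and restored on exit. The following is a reduced sketch with hypothetical names (ToyParsingContext, toyTokens, cursor) that leaves out isListElement and the error recovery branch.

```typescript
enum ToyParsingContext { SourceElements, BlockStatements }

let toyTokens: string[] = [];
let cursor = 0;
let parsingContext = 0; // bit set of the lists currently being parsed

function isListTerminator(kind: ToyParsingContext): boolean {
  if (cursor >= toyTokens.length) return true; // end of input ends every list
  return kind === ToyParsingContext.BlockStatements && toyTokens[cursor] === "}";
}

function parseList<T>(kind: ToyParsingContext, parseElement: () => T): T[] {
  const saveParsingContext = parsingContext;
  parsingContext |= 1 << kind; // mark this list as active
  const result: T[] = [];
  while (!isListTerminator(kind)) {
    result.push(parseElement());
  }
  parsingContext = saveParsingContext; // restore on the way out
  return result;
}

const contextsSeen: number[] = [];

function parseWord(): string {
  contextsSeen.push(parsingContext); // record the active context bits
  const word = toyTokens[cursor] ?? "";
  cursor++;
  return word;
}

toyTokens = ["x", "y", "}", "z"];
cursor = 0;
const blockItems = parseList(ToyParsingContext.BlockStatements, parseWord);
```

While the two elements are parsed the context is 2 (that is, 1 << BlockStatements). Parsing stops in front of the closing brace and the context is back to 0 afterwards.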

As we can see, each element is still produced by parseListElement, so let's keep reading:

function parseListElement<T extends Node>(parsingContext: ParsingContext, parseElement: () => T): T {
    const node = currentNode(parsingContext);
    if (node) {
        return <T>consumeNode(node);
    }
    return parseElement();
}

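Unless an existing node can be reused, parseListElement ultimately gets its result from the parseElement callback that the caller passed in at the start.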
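The reuse branch exists for incremental parsing: if a node from an earlier parse is still valid at the current position, it is consumed as-is instead of being parsed again. A toy version of that decision, with hypothetical names (oldNodes, inputText, readOffset) and whitespace-separated words standing in for statements, might look like this:

```typescript
interface ToyNode { pos: number; end: number; text: string; }

const inputText = "aa bb cc";
let readOffset = 0;
let freshlyParsed = 0;

// Nodes kept from a previous parse, keyed by their start position.
const oldNodes: { [pos: number]: ToyNode } = {};

function currentNode(): ToyNode | undefined {
  return oldNodes[readOffset];
}

function consumeNode(node: ToyNode): ToyNode {
  readOffset = node.end; // skip over the text the reused node already covers
  return node;
}

function parseWord(): ToyNode {
  freshlyParsed++;
  const pos = readOffset; // full start, including leading spaces
  let start = pos;
  while (start < inputText.length && inputText.charAt(start) === " ") start++;
  let end = start;
  while (end < inputText.length && inputText.charAt(end) !== " ") end++;
  readOffset = end;
  return { pos, end, text: inputText.slice(start, end) };
}

function parseListElement(parseElement: () => ToyNode): ToyNode {
  const node = currentNode();
  if (node) {
    return consumeNode(node);
  }
  return parseElement();
}

// Pretend the middle word survived from an earlier parse.
const reusable: ToyNode = { pos: 2, end: 5, text: "bb" };
oldNodes[2] = reusable;

const elements: ToyNode[] = [];
while (readOffset < inputText.length) {
  elements.push(parseListElement(parseWord));
}
```

Only the first and last words are parsed from scratch; the middle element is the very same object that was stored in oldNodes.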

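Let's continue.

Here parseElement is the parseStatement that was passed in at the start.

It switches on the token returned by the scanner and calls the matching parseXXX function. For example, if the current token is a SemicolonToken, it calls parseEmptyStatement to create an AST node for the empty statement.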

4.3.5 parseStatement

// parseStatement switches on the current token and produces a different Node for each kind. Take the trailing ; as a simple example.
// The token for ; is SyntaxKind.SemicolonToken, which happens to be the first case in the switch.
function parseStatement(): Statement {
    switch (token()) {
        case SyntaxKind.SemicolonToken:
            return parseEmptyStatement();
        case SyntaxKind.OpenBraceToken:
            return parseBlock(/*ignoreMissingOpenBrace*/ false);
        case SyntaxKind.VarKeyword:
            return parseVariableStatement(scanner.getStartPos(), /*decorators*/ undefined, /*modifiers*/ undefined);
        case SyntaxKind.LetKeyword:
            if (isLetDeclaration()) {
                return parseVariableStatement(scanner.getStartPos(), /*decorators*/ undefined, /*modifiers*/ undefined);
            }
            break;
        case SyntaxKind.FunctionKeyword:
            return parseFunctionDeclaration(scanner.getStartPos(), /*decorators*/ undefined, /*modifiers*/ undefined);
        case SyntaxKind.ClassKeyword:
            return parseClassDeclaration(scanner.getStartPos(), /*decorators*/ undefined, /*modifiers*/ undefined);
        case SyntaxKind.IfKeyword:
            return parseIfStatement();
        case SyntaxKind.DoKeyword:
            return parseDoStatement();
        case SyntaxKind.WhileKeyword:
            return parseWhileStatement();
        case SyntaxKind.ForKeyword:
            return parseForOrForInOrForOfStatement();
        case SyntaxKind.ContinueKeyword:
            return parseBreakOrContinueStatement(SyntaxKind.ContinueStatement);
        case SyntaxKind.BreakKeyword:
            return parseBreakOrContinueStatement(SyntaxKind.BreakStatement);
        case SyntaxKind.ReturnKeyword:
            return parseReturnStatement();
        case SyntaxKind.WithKeyword:
            return parseWithStatement();
        case SyntaxKind.SwitchKeyword:
            return parseSwitchStatement();
        case SyntaxKind.ThrowKeyword:
            return parseThrowStatement();
        case SyntaxKind.TryKeyword:
        // Include 'catch' and 'finally' for error recovery.
        case SyntaxKind.CatchKeyword:
        case SyntaxKind.FinallyKeyword:
            return parseTryStatement();
        case SyntaxKind.DebuggerKeyword:
            return parseDebuggerStatement();
        case SyntaxKind.AtToken:
            return parseDeclaration();
        case SyntaxKind.AsyncKeyword:
        case SyntaxKind.InterfaceKeyword:
        case SyntaxKind.TypeKeyword:
        case SyntaxKind.ModuleKeyword:
        case SyntaxKind.NamespaceKeyword:
        case SyntaxKind.DeclareKeyword:
        case SyntaxKind.ConstKeyword:
        case SyntaxKind.EnumKeyword:
        case SyntaxKind.ExportKeyword:
        case SyntaxKind.ImportKeyword:
        case SyntaxKind.PrivateKeyword:
        case SyntaxKind.ProtectedKeyword:
        case SyntaxKind.PublicKeyword:
        case SyntaxKind.AbstractKeyword:
        case SyntaxKind.StaticKeyword:
        case SyntaxKind.ReadonlyKeyword:
        case SyntaxKind.GlobalKeyword:
            if (isStartOfDeclaration()) {
                return parseDeclaration();
            }
            break;
    }
    return parseExpressionOrLabeledStatement();
}
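The shape of this switch is worth noting: unambiguous tokens return immediately, ambiguous ones (such as const, which is only a declaration if what follows looks like one) are checked with a lookahead predicate, and everything that falls through is treated as an expression or labeled statement. A stripped-down sketch of that control flow, using hypothetical string tokens and a boolean in place of isStartOfDeclaration():

```typescript
type ToyToken = ";" | "{" | "var" | "if" | "const" | "identifier";

// Returns the name of the statement kind the dispatcher would choose.
// `startsDeclaration` stands in for the lookahead done by isStartOfDeclaration().
function statementKindFor(token: ToyToken, startsDeclaration: boolean): string {
  switch (token) {
    case ";":
      return "EmptyStatement";
    case "{":
      return "Block";
    case "var":
      return "VariableStatement";
    case "if":
      return "IfStatement";
    case "const":
      if (startsDeclaration) {
        return "Declaration";
      }
      break; // not a declaration after all: fall through to the default below
  }
  return "ExpressionOrLabeledStatement";
}

const forSemicolon = statementKindFor(";", false);
const forConstDeclaration = statementKindFor("const", true);
const forPlainIdentifier = statementKindFor("identifier", false);
```

For the demo input const foo = 123; the first token is ConstKeyword, so the real parser takes the parseDeclaration branch, which is how the VariableStatement in the output comes about.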

4.3.6 createNode

We can keep following any of these branches. Take parseEmptyStatement as an example:

 function parseEmptyStatement() {
    var node = createNode(209 /* EmptyStatement */);
    parseExpected(25 /* SemicolonToken */);
    return finishNode(node);
}

As we can see, nodes are created through createNode:

// note: this function creates only node
function createNode(kind, pos) {
    nodeCount++;
    if (!(pos >= 0)) {
        pos = scanner.getStartPos();
    }
    return ts.isNodeKind(kind) ? new NodeConstructor(kind, pos, pos) :
        kind === 71 /* Identifier */ ? new IdentifierConstructor(kind, pos, pos) :
            new TokenConstructor(kind, pos, pos);
}

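From the code above we can see that createNode creates the node, setting the SyntaxKind passed in and the initial position (by default, the position supplied by the current scanner state). parseExpected checks whether the current token in the parser state matches the given SyntaxKind. If it does not, an error is reported.

The last step, finishNode, sets the node's end position and adds the context flags (contextFlags), along with a flag for any error that occurred before the node was finished. A node with such an error cannot be reused in incremental parsing.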
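The createNode / parseExpected / finishNode trio can be imitated in a few lines. The sketch below is a toy model under stated assumptions: every character is one token, ToyNode has a hasError boolean instead of real node flags, and parseEmptyStatement takes the text as a parameter for convenience.

```typescript
interface ToyNode { kind: string; pos: number; end: number; hasError: boolean; }

let toyInput = "";
let toyPos = 0; // index of the current one-character token
let parseErrorBeforeNextFinishedNode = false;
const toyDiagnostics: string[] = [];

function token(): string {
  return toyPos < toyInput.length ? toyInput.charAt(toyPos) : "EOF";
}

function createNode(kind: string): ToyNode {
  // end is provisional until finishNode runs
  return { kind, pos: toyPos, end: toyPos, hasError: false };
}

function parseExpected(expected: string): boolean {
  if (token() === expected) {
    toyPos++; // consume the token
    return true;
  }
  toyDiagnostics.push(`'${expected}' expected.`);
  parseErrorBeforeNextFinishedNode = true;
  return false;
}

function finishNode(node: ToyNode): ToyNode {
  node.end = toyPos;
  if (parseErrorBeforeNextFinishedNode) {
    parseErrorBeforeNextFinishedNode = false;
    node.hasError = true; // such a node must not be reused incrementally
  }
  return node;
}

function parseEmptyStatement(text: string): ToyNode {
  toyInput = text;
  toyPos = 0;
  const node = createNode("EmptyStatement");
  parseExpected(";");
  return finishNode(node);
}

const goodNode = parseEmptyStatement(";");
const badNode = parseEmptyStatement("x");
```

The well-formed input yields a node spanning 0 to 1 with no error. The malformed one yields a zero-width node marked with hasError and leaves one diagnostic behind, which mirrors how the parser keeps going after an error instead of stopping.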
