LLVM


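LLVM: a collection of modular, reusable compiler and toolchain technologies. Founder: Chris Lattner.

"LLVM" is not an abbreviation of Low Level Virtual Machine; LLVM is simply the full name of the project.

### Traditional compilers: GCC, Clang

#### Traditional compiler architecture

LLVM - Figure 1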

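  • Frontend

    Lexical analysis, syntax analysis, semantic analysis, and intermediate code generation

  • Optimizer

    Optimization of the intermediate code

  • Backend

    Machine code generation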

#### LLVM architecture

LLVM - Figure 2

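  • Different frontends and backends share one unified intermediate representation, the LLVM Intermediate Representation (LLVM IR).
  • To support a new programming language, you only need to implement a new frontend.
  • To support a new hardware target, you only need to add a new backend.
  • The optimization stage is shared: it operates on the unified LLVM IR, so neither a new language nor a new hardware target requires any change to it.
  • By contrast, GCC's frontend and backend are not cleanly separated but coupled together, which makes it hard for GCC to support a new language or new hardware.
  • LLVM is now used as a common infrastructure for implementing a broad variety of statically and runtime-compiled languages (e.g. the GCC family of languages, Java, .NET, Python).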
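The toy C sketch below makes this M + N arrangement concrete. Everything in it is hypothetical and unrelated to LLVM's real API; `IR`, `frontend_rpn`, `frontend_prefix`, `optimize`, `backend_eval` and `backend_emit` are made-up names. Two frontends for two tiny expression languages lower their input into one shared stack-based IR. A single constant-folding pass then works on that IR, and two backends consume the result. Supporting another language would mean writing one more frontend; supporting another target would mean writing one more backend.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical shared IR: a tiny stack machine that every frontend targets. */
typedef enum { OP_PUSH, OP_ADD, OP_MUL } Op;
typedef struct { Op op; long value; } Instr;   /* value is used only by OP_PUSH */
typedef struct { Instr code[64]; int len; } IR;

void ir_append(IR *ir, Op op, long value) {
    assert(ir->len < 64);
    ir->code[ir->len++] = (Instr){ op, value };
}

long apply(Op op, long a, long b) { return op == OP_ADD ? a + b : a * b; }

/* Frontend 1: a postfix language, e.g. "1 2 + 3 *". */
void frontend_rpn(const char *src, IR *ir) {
    ir->len = 0;
    while (*src) {
        if (isdigit((unsigned char)*src)) {
            long v = 0;
            while (isdigit((unsigned char)*src)) v = v * 10 + (*src++ - '0');
            ir_append(ir, OP_PUSH, v);
            continue;
        }
        if (*src == '+') ir_append(ir, OP_ADD, 0);
        if (*src == '*') ir_append(ir, OP_MUL, 0);
        src++;                                   /* operators and spaces */
    }
}

/* Frontend 2: a prefix language, e.g. "* + 1 2 3"; lowers to the same IR.
   The caller must set ir->len = 0 before the first call. */
const char *frontend_prefix(const char *p, IR *ir) {
    while (*p == ' ') p++;
    if (*p == '+' || *p == '*') {
        Op op = (*p == '+') ? OP_ADD : OP_MUL;
        p = frontend_prefix(p + 1, ir);          /* left operand */
        p = frontend_prefix(p, ir);              /* right operand */
        ir_append(ir, op, 0);                    /* operator after its operands */
        return p;
    }
    long v = 0;
    while (isdigit((unsigned char)*p)) v = v * 10 + (*p++ - '0');
    ir_append(ir, OP_PUSH, v);
    return p;
}

/* The single shared optimizer: fold "push a; push b; op" into "push (a op b)". */
void optimize(IR *ir) {
    IR out = { .len = 0 };
    for (int i = 0; i < ir->len; i++) {
        Instr in = ir->code[i];
        int n = out.len;
        if (in.op != OP_PUSH && n >= 2 &&
            out.code[n - 1].op == OP_PUSH && out.code[n - 2].op == OP_PUSH) {
            long folded = apply(in.op, out.code[n - 2].value, out.code[n - 1].value);
            out.len -= 2;                        /* drop both pushes ... */
            ir_append(&out, OP_PUSH, folded);    /* ... and push their result */
        } else {
            ir_append(&out, in.op, in.value);
        }
    }
    *ir = out;
}

/* Backend 1: evaluate the IR directly. */
long backend_eval(const IR *ir) {
    long stack[64];
    int sp = 0;
    for (int i = 0; i < ir->len; i++) {
        Instr in = ir->code[i];
        if (in.op == OP_PUSH) { stack[sp++] = in.value; continue; }
        assert(sp >= 2);
        sp--;                                    /* pop b, combine it into a */
        stack[sp - 1] = apply(in.op, stack[sp - 1], stack[sp]);
    }
    assert(sp == 1);
    return stack[0];
}

/* Backend 2: emit text for an imaginary stack CPU. */
void backend_emit(const IR *ir, char *buf, size_t size) {
    size_t used = 0;
    assert(size > 0);
    buf[0] = '\0';
    for (int i = 0; i < ir->len && used < size; i++) {
        Instr in = ir->code[i];
        int w = in.op == OP_PUSH
            ? snprintf(buf + used, size - used, "push %ld\n", in.value)
            : snprintf(buf + used, size - used, "%s\n", in.op == OP_ADD ? "add" : "mul");
        if (w < 0) break;
        used += (size_t)w;                       /* on truncation the loop stops */
    }
}
```

With this sketch, `"1 2 + 3 *"` and `"* + 1 2 3"` lower to identical IR (`push 1, push 2, add, push 3, mul`). `optimize` folds that IR to a single `push 9`, regardless of which backend runs next.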

### Clang

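  • A subproject of LLVM
  • A C/C++/Objective-C compiler frontend built on the LLVM architecture

  Advantages:

  • Fast compilation: on some platforms Clang compiles significantly faster than GCC
  • Low memory usage: the AST generated by Clang takes up roughly one fifth of the memory of GCC's
  • Modular, library-based design, which makes it easy to integrate into IDEs and to reuse for other purposes
  • Readable diagnostics: during compilation Clang creates and keeps a large amount of detailed metadata, which helps with debugging and with understanding errors
  • A clean, simple design that is easy to understand, extend and enhance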

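#### Clang and LLVM

LLVM - Figure 3

  • LLVM in the broad sense

    The entire LLVM architecture

  • LLVM in the narrow sense

    The LLVM backend (code optimization, target code generation, etc.)

LLVM - Figure 4

## Objective-C source file compilation process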

#### Viewing the compilation phases from the command line

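    clang -ccc-print-phases main.m

    ➜ TestSwift clang -ccc-print-phases main.swift
    0: input, "main.swift", object
    1: linker, {0}, image
    2: bind-arch, "x86_64", {1}, image

    ➜ TestOC clang -ccc-print-phases main.m
    0: input, "main.m", objective-c
    1: preprocessor, {0}, objective-c-cpp-output
    2: compiler, {1}, ir
    3: backend, {2}, assembler
    4: assembler, {3}, object
    5: linker, {4}, image
    6: bind-arch, "x86_64", {5}, image

> Swift shows 4 fewer phases than OC here, but only because clang does not compile Swift. It treats `main.swift` as an object file (note the `object` type in phase 0) and hands it straight to the linker. Swift sources are compiled by `swiftc`.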
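The numbered lines form a small dependency chain. Each action names its kind, what it consumes, and the type of what it produces. The `{n}` field is the index of an earlier action, and `input` and `bind-arch` also carry a file or architecture name. As an illustration only, the C sketch below stores the Objective-C pipeline from the output above in a hypothetical `Phase` array. These are not clang's real driver data structures. The sketch then formats entries back into the same text.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical model of one driver action (not clang's real data structures). */
typedef struct {
    const char *kind;   /* e.g. "preprocessor" */
    const char *arg;    /* file or architecture name, or NULL */
    int input;          /* index of the action it consumes, or -1 */
    const char *type;   /* type of the output it produces */
} Phase;

/* The Objective-C pipeline printed above, one entry per numbered line. */
const Phase objc_phases[] = {
    {"input",        "main.m", -1, "objective-c"},
    {"preprocessor", NULL,      0, "objective-c-cpp-output"},
    {"compiler",     NULL,      1, "ir"},
    {"backend",      NULL,      2, "assembler"},
    {"assembler",    NULL,      3, "object"},
    {"linker",       NULL,      4, "image"},
    {"bind-arch",    "x86_64",  5, "image"},
};

/* Format one action in the same shape as the -ccc-print-phases output. */
void format_phase(const Phase *p, int index, char *buf, size_t size) {
    char arg[64] = "", input[16] = "";
    if (p->arg) snprintf(arg, sizeof arg, ", \"%s\"", p->arg);
    if (p->input >= 0) snprintf(input, sizeof input, ", {%d}", p->input);
    snprintf(buf, size, "%d: %s%s%s, %s", index, p->kind, arg, input, p->type);
}

/* Count how many {n} links separate action i from the source file. */
int distance_to_input(const Phase *phases, int i) {
    int steps = 0;
    while (phases[i].input >= 0) {
        assert(phases[i].input < i);   /* inputs always refer to earlier actions */
        i = phases[i].input;
        steps++;
    }
    return steps;
}
```

Formatting indices 0 through 6 reproduces the seven Objective-C lines above. `distance_to_input(objc_phases, 6)` follows the `{n}` links back from `bind-arch` to the source file in 6 steps.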

#### Viewing the preprocessor output
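In the Objective-C output below, the preprocessor has replaced `AGE` with `10` before the compiler proper sees the code. The core of object-like macro substitution can be sketched in C as follows. `expand_macro` is a hypothetical helper. A real preprocessor also handles `#include`, function-like macros, conditional compilation, and line markers such as `# 1 "main.m"`.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

int is_ident_char(char c) { return isalnum((unsigned char)c) || c == '_'; }

/* Copy `src` to `out`, replacing every whole identifier equal to `name` with
   `value`. Words that merely contain `name` (such as AGE_MAX) are left alone.
   Sketch only: string literals, comments and nested macros are not handled. */
void expand_macro(const char *src, const char *name, const char *value,
                  char *out, size_t size) {
    size_t used = 0, name_len = strlen(name);
    assert(name_len > 0 && size > 0);
    while (*src) {
        size_t word = 0;
        while (is_ident_char(src[word])) word++;   /* identifier length here, 0 if none */
        size_t step = word ? word : 1;             /* whole word, or one other character */
        const char *piece = src;
        size_t len = step;
        if (word == name_len && strncmp(src, name, word) == 0) {
            piece = value;                         /* macro name: substitute its body */
            len = strlen(value);
        }
        assert(used + len < size);                 /* sketch: caller supplies enough room */
        memcpy(out + used, piece, len);
        used += len;
        src += step;
    }
    out[used] = '\0';
}
```

Calling `expand_macro("int c = a + b + AGE;", "AGE", "10", out, sizeof out)` leaves `int c = a + b + 10;` in `out`, matching the preprocessed line. An identifier such as `AGE_MAX` stays untouched because only whole identifiers are compared.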

    clang -E main.m

    // Source file
    print("Hello World")

    // Preprocessor output
    ➜ TestSwift clang -E main.swift
    clang: warning: main.swift: 'linker' input unused [-Wunused-command-line-argument]

    // Source file
    #define AGE 10

    int main(int argc, const char * argv[]) {
        int a = 10;
        int b = 20;
        int c = a + b + AGE;
        return 0;
    }

    // Preprocessor output
    ➜ TestOC clang -E main.m
    # 1 "main.m"
    # 1 "<built-in>" 1
    # 1 "<built-in>" 3
    # 373 "<built-in>" 3
    # 1 "<command line>" 1
    # 1 "<built-in>" 2
    # 1 "main.m" 2
    # 11 "main.m"
    int main(int argc, const char * argv[]) {
        int a = 10;
        int b = 20;
        int c = a + b + 10;
        return 0;
    }

## Lexical analysis

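  • Lexical analysis: split the source into tokens, the smallest meaningful units (somewhat like splitting an English sentence into words and punctuation marks)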
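A minimal hand-written lexer shows the idea. It skips whitespace, reads the longest identifier or number (or a single punctuation character), and labels the result with a kind. The C sketch below covers only the token kinds that appear in `main.m`. `Token` and `next_token` are hypothetical names. The kind strings (`identifier`, `numeric_constant`, `l_paren`, `semi` and so on) are borrowed from the `-dump-tokens` output.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

typedef struct {
    const char *kind;    /* kind name, spelled as in clang's -dump-tokens output */
    const char *start;   /* first character of the token in the source */
    size_t len;          /* spelling length; 0 for eof */
} Token;

static const char *const keywords[] = { "int", "char", "const", "return" };
static const struct { char ch; const char *kind; } punctuators[] = {
    {'(', "l_paren"}, {')', "r_paren"}, {'{', "l_brace"}, {'}', "r_brace"},
    {'[', "l_square"}, {']', "r_square"}, {',', "comma"}, {';', "semi"},
    {'=', "equal"}, {'+', "plus"}, {'*', "star"},
};

/* Return the token starting at *p (after whitespace) and advance *p past it. */
Token next_token(const char **p) {
    while (isspace((unsigned char)**p)) (*p)++;
    const char *s = *p;
    Token t = { "eof", s, 0 };
    if (*s == '\0') return t;
    if (isalpha((unsigned char)*s) || *s == '_') {
        while (isalnum((unsigned char)s[t.len]) || s[t.len] == '_') t.len++;
        t.kind = "identifier";
        for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; i++)
            if (strlen(keywords[i]) == t.len && strncmp(s, keywords[i], t.len) == 0)
                t.kind = keywords[i];          /* keyword tokens are named after the keyword */
    } else if (isdigit((unsigned char)*s)) {
        while (isdigit((unsigned char)s[t.len])) t.len++;
        t.kind = "numeric_constant";
    } else {
        t.len = 1;
        t.kind = "unknown";                    /* anything outside this toy subset */
        for (size_t i = 0; i < sizeof punctuators / sizeof punctuators[0]; i++)
            if (punctuators[i].ch == *s) t.kind = punctuators[i].kind;
    }
    *p = s + t.len;
    return t;
}
```

Calling `next_token` repeatedly on `int c = a + b + 10;` yields `int`, `identifier`, `equal`, `identifier`, `plus`, `identifier`, `plus`, `numeric_constant` and `semi`. These are the same kinds clang prints for that line. The sequence ends with `eof` at the end of the input.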

    clang -fmodules -E -Xclang -dump-tokens main.m

    ➜ TestSwift clang -fmodules -E -Xclang -dump-tokens main.swift
    clang: warning: main.swift: 'linker' input unused [-Wunused-command-line-argument]
    clang: warning: argument unused during compilation: '-fmodules' [-Wunused-command-line-argument]
    clang: warning: argument unused during compilation: '-Xclang -dump-tokens' [-Wunused-command-line-argument]

    ➜ TestOC clang -fmodules -E -Xclang -dump-tokens main.m
    int 'int'	 [StartOfLine]	Loc=
    identifier 'main'	 [LeadingSpace]	Loc=
    l_paren '('		Loc=
    int 'int'		Loc=
    identifier 'argc'	 [LeadingSpace]	Loc=
    comma ','		Loc=
    const 'const'	 [LeadingSpace]	Loc=
    char 'char'	 [LeadingSpace]	Loc=
    star '*'	 [LeadingSpace]	Loc=
    identifier 'argv'	 [LeadingSpace]	Loc=
    l_square '['		Loc=
    r_square ']'		Loc=
    r_paren ')'		Loc=
    l_brace '{'	 [LeadingSpace]	Loc=
    int 'int'	 [StartOfLine] [LeadingSpace]	Loc=
    identifier 'a'	 [LeadingSpace]	Loc=
    equal '='	 [LeadingSpace]	Loc=
    numeric_constant '10'	 [LeadingSpace]	Loc=
    semi ';'		Loc=
    int 'int'	 [StartOfLine] [LeadingSpace]	Loc=
    identifier 'b'	 [LeadingSpace]	Loc=
    equal '='	 [LeadingSpace]	Loc=
    numeric_constant '20'	 [LeadingSpace]	Loc=
    semi ';'		Loc=
    int 'int'	 [StartOfLine] [LeadingSpace]	Loc=
    identifier 'c'	 [LeadingSpace]	Loc=
    equal '='	 [LeadingSpace]	Loc=
    identifier 'a'	 [LeadingSpace]	Loc=
    plus '+'	 [LeadingSpace]	Loc=
    identifier 'b'	 [LeadingSpace]	Loc=
    plus '+'	 [LeadingSpace]	Loc=
    numeric_constant '10'	 [LeadingSpace]	Loc=
    semi ';'		Loc=
    return 'return'	 [StartOfLine] [LeadingSpace]	Loc=
    numeric_constant '0'	 [LeadingSpace]	Loc=
    semi ';'		Loc=
    r_brace '}'	 [StartOfLine]	Loc=
    eof ''		Loc=

## Syntax analysis

  • Syntax analysis (parsing): build an abstract syntax tree (AST) from the tokens
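Recursive descent is one common way to build such a tree. The C sketch below parses only identifiers, integer literals and `+`. Node kinds reuse clang's names, but `Node`, `parse_sum`, `dump` and `free_tree` are hypothetical. To stay short, the sketch reads characters directly instead of tokens. It shows why the outer `BinaryOperator` in the dump below has `a + b` as its left child and the literal `10` as its right child.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct Node {
    const char *kind;         /* "BinaryOperator", "DeclRefExpr" or "IntegerLiteral" */
    char text[16];            /* operator, variable name or literal spelling */
    struct Node *lhs, *rhs;   /* children; only BinaryOperator has them */
} Node;

static Node *new_node(const char *kind, const char *text, size_t len) {
    Node *n = calloc(1, sizeof *n);
    assert(n != NULL && len < sizeof n->text);
    n->kind = kind;
    memcpy(n->text, text, len);   /* calloc zeroed the buffer, so it stays terminated */
    return n;
}

/* primary := identifier | integer literal */
static Node *parse_primary(const char **p) {
    while (**p == ' ') (*p)++;
    const char *s = *p;
    size_t len = 0;
    while (isalnum((unsigned char)s[len]) || s[len] == '_') len++;
    assert(len > 0);              /* sketch: no error recovery */
    *p = s + len;
    return new_node(isdigit((unsigned char)*s) ? "IntegerLiteral" : "DeclRefExpr", s, len);
}

/* sum := primary ('+' primary)*
   Each new '+' takes the tree built so far as its left child, which is what
   makes '+' left-associative: a + b + 10 parses as (a + b) + 10. */
Node *parse_sum(const char **p) {
    Node *lhs = parse_primary(p);
    for (;;) {
        while (**p == ' ') (*p)++;
        if (**p != '+') return lhs;
        (*p)++;
        Node *op = new_node("BinaryOperator", "+", 1);
        op->lhs = lhs;
        op->rhs = parse_primary(p);
        lhs = op;
    }
}

/* Print one node per line, indented by depth, similar in spirit to -ast-dump. */
void dump(const Node *n, int depth) {
    printf("%*s%s '%s'\n", depth * 2, "", n->kind, n->text);
    if (n->lhs) dump(n->lhs, depth + 1);
    if (n->rhs) dump(n->rhs, depth + 1);
}

void free_tree(Node *n) {
    if (!n) return;
    free_tree(n->lhs);
    free_tree(n->rhs);
    free(n);
}
```

For `a + b + 10`, `dump(root, 0)` prints two nested `BinaryOperator '+'` lines. `DeclRefExpr 'a'` and `DeclRefExpr 'b'` appear under the inner one, and `IntegerLiteral '10'` appears under the outer one. This matches the shape of clang's tree, minus the `ImplicitCastExpr` wrappers clang adds for the `LValueToRValue` loads.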

    clang -fmodules -fsyntax-only -Xclang -ast-dump main.m

    ➜ Test clang -fmodules -fsyntax-only -Xclang -ast-dump main.swift
    clang: warning: main.swift: 'linker' input unused [-Wunused-command-line-argument]
    clang: warning: argument unused during compilation: '-fmodules' [-Wunused-command-line-argument]
    clang: warning: argument unused during compilation: '-Xclang -ast-dump' [-Wunused-command-line-argument]

    ➜ TestOC clang -fmodules -fsyntax-only -Xclang -ast-dump main.m
    TranslationUnitDecl 0x7ff3730298e8 <<invalid sloc>> <invalid sloc>
    |-TypedefDecl 0x7ff373029e60 <<invalid sloc>> <invalid sloc> implicit __int128_t '__int128'
    | `-BuiltinType 0x7ff373029b80 '__int128'
    |-TypedefDecl 0x7ff373029ed0 <<invalid sloc>> <invalid sloc> implicit __uint128_t 'unsigned __int128'
    | `-BuiltinType 0x7ff373029ba0 'unsigned __int128'
    |-TypedefDecl 0x7ff373029f70 <<invalid sloc>> <invalid sloc> implicit SEL 'SEL *'
    | `-PointerType 0x7ff373029f30 'SEL *'
    |   `-BuiltinType 0x7ff373029dc0 'SEL'
    |-TypedefDecl 0x7ff37302a058 <<invalid sloc>> <invalid sloc> implicit id 'id'
    | `-ObjCObjectPointerType 0x7ff37302a000 'id'
    |   `-ObjCObjectType 0x7ff373029fd0 'id'
    |-TypedefDecl 0x7ff37302a138 <<invalid sloc>> <invalid sloc> implicit Class 'Class'
    | `-ObjCObjectPointerType 0x7ff37302a0e0 'Class'
    |   `-ObjCObjectType 0x7ff37302a0b0 'Class'
    |-ObjCInterfaceDecl 0x7ff37302a190 <<invalid sloc>> <invalid sloc> implicit Protocol
    |-TypedefDecl 0x7ff37302a4f8 <<invalid sloc>> <invalid sloc> implicit __NSConstantString 'struct __NSConstantString_tag'
    | `-RecordType 0x7ff37302a300 'struct __NSConstantString_tag'
    |   `-Record 0x7ff37302a260 '__NSConstantString_tag'
    |-TypedefDecl 0x7ff37302a590 <<invalid sloc>> <invalid sloc> implicit __builtin_ms_va_list 'char *'
    | `-PointerType 0x7ff37302a550 'char *'
    |   `-BuiltinType 0x7ff373029980 'char'
    |-TypedefDecl 0x7ff373062488 <<invalid sloc>> <invalid sloc> implicit __builtin_va_list 'struct __va_list_tag [1]'
    | `-ConstantArrayType 0x7ff373062430 'struct __va_list_tag [1]' 1
    |   `-RecordType 0x7ff3730622a0 'struct __va_list_tag'
    |     `-Record 0x7ff373062200 '__va_list_tag'
    `-FunctionDecl 0x7ff373062758 line:11:5 main 'int (int, const char **)'
      |-ParmVarDecl 0x7ff3730624f8 col:14 argc 'int'
      |-ParmVarDecl 0x7ff373062610 col:33 argv 'const char **':'const char **'
      `-CompoundStmt 0x7ff373062bd8 <col:41, line:18:1>
        |-DeclStmt 0x7ff373062928 <line:13:5, col:15>
        | `-VarDecl 0x7ff3730628a8 col:9 used a 'int' cinit
        |   `-IntegerLiteral 0x7ff373062908 <col:13> 'int' 10
        |-DeclStmt 0x7ff3730629d8 <line:14:5, col:15>
        | `-VarDecl 0x7ff373062958 col:9 used b 'int' cinit
        |   `-IntegerLiteral 0x7ff3730629b8 <col:13> 'int' 20
        |-DeclStmt 0x7ff373062b88 <line:15:5, col:24>
        | `-VarDecl 0x7ff373062a08 line:15:9 c 'int' cinit
        |   `-BinaryOperator 0x7ff373062b60 <col:13, line:9:13> 'int' '+'
        |     |-BinaryOperator 0x7ff373062b18 <line:15:13, col:17> 'int' '+'
        |     | |-ImplicitCastExpr 0x7ff373062ae8 <col:13> 'int' <LValueToRValue>
        |     | | `-DeclRefExpr 0x7ff373062a68 'int' lvalue Var 0x7ff3730628a8 'a' 'int'
        |     | `-ImplicitCastExpr 0x7ff373062b00 <col:17> 'int' <LValueToRValue>
        |     |   `-DeclRefExpr 0x7ff373062aa8 'int' lvalue Var 0x7ff373062958 'b' 'int'
        |     `-IntegerLiteral 0x7ff373062b40 <line:9:13> 'int' 10
        `-ReturnStmt 0x7ff373062bc0
          `-IntegerLiteral 0x7ff373062ba0 'int' 0

## LLVM IR

LLVM IR has three forms, which are essentially equivalent (like water in its gaseous, liquid and solid states):

1. text: a human-readable text format similar to assembly language, with the file extension .ll

> clang -S -emit-llvm main.m

    ; ModuleID = 'main.m'
    source_filename = "main.m"
    target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
    target triple = "x86_64-apple-macosx10.14.0"

    ; Function Attrs: noinline nounwind optnone ssp uwtable
    define i32 @main(i32, i8**) #0 {
      %3 = alloca i32, align 4
      %4 = alloca i32, align 4
      %5 = alloca i8**, align 8
      %6 = alloca i32, align 4
      %7 = alloca i32, align 4
      %8 = alloca i32, align 4
      store i32 0, i32* %3, align 4
      store i32 %0, i32* %4, align 4
      store i8** %1, i8*** %5, align 8
      store i32 10, i32* %6, align 4
      store i32 20, i32* %7, align 4
      %9 = load i32, i32* %6, align 4
      %10 = load i32, i32* %7, align 4
      %11 = add nsw i32 %9, %10
      %12 = add nsw i32 %11, 10
      store i32 %12, i32* %8, align 4
      ret i32 0
    }

    attributes #0 = { noinline nounwind optnone ssp uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="penryn" "target-features"="+cx16,+fxsr,+mmx,+sahf,+sse,+sse2,+sse3,+sse4.1,+ssse3,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }

    !llvm.module.flags = !{!0, !1, !2, !3, !4, !5, !6}
    !llvm.ident = !{!7}

    !0 = !{i32 1, !"Objective-C Version", i32 2}
    !1 = !{i32 1, !"Objective-C Image Info Version", i32 0}
    !2 = !{i32 1, !"Objective-C Image Info Section", !"__DATA,__objc_imageinfo,regular,no_dead_strip"}
    !3 = !{i32 4, !"Objective-C Garbage Collection", i32 0}
    !4 = !{i32 1, !"Objective-C Class Properties", i32 64}
    !5 = !{i32 1, !"wchar_size", i32 4}
    !6 = !{i32 7, !"PIC Level", i32 2}
    !7 = !{!"Apple LLVM version 10.0.0 (clang-1000.11.45.2)"}

What on earth is all this stuff?

2. memory: the in-memory format
3. bitcode: a binary format, with the file extension .bc

> clang -c -emit-llvm main.m

#### Basic IR syntax

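  • Comments start with a semicolon `;`

  • Global identifiers start with `@`, local identifiers with `%`
  • `alloca` allocates memory in the current function's stack frame
  • `i32` means a 32-bit (4-byte) integer
  • `align` specifies memory alignment
  • `store` writes data
  • `load` reads data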
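To see how these instructions cooperate, the C sketch below interprets `@main` from the `.ll` listing above. It is a mental model under stated assumptions, not LLVM's execution engine. `Inst`, `run` and the `REG`/`IMM` macros are hypothetical. Only `i32` values are modeled, so the `argc`/`argv` stores through `%0`, `%1`, `%4` and `%5` are left out. An `alloca` result is represented as an index into an array that stands in for the stack frame.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mini-IR covering only the instructions @main uses (i32 only).
   An operand is either a virtual register %n or an immediate constant. */
typedef enum { ALLOCA, STORE, LOAD, ADD, RET } Opcode;
typedef struct { int is_reg; int32_t v; } Operand;   /* register number or constant */
typedef struct { Opcode op; int dst; Operand a, b; } Inst;

#define REG(n) { 1, (n) }
#define IMM(k) { 0, (k) }
#define MAX_REGS 16
#define MAX_SLOTS 16

/* @main from the .ll output, minus the argc/argv bookkeeping (%0, %1, %4, %5). */
const Inst main_ir[] = {
    { ALLOCA, 3,  IMM(0),  IMM(0)  },  /* %3 = alloca i32  (return value) */
    { ALLOCA, 6,  IMM(0),  IMM(0)  },  /* %6 = alloca i32  (a) */
    { ALLOCA, 7,  IMM(0),  IMM(0)  },  /* %7 = alloca i32  (b) */
    { ALLOCA, 8,  IMM(0),  IMM(0)  },  /* %8 = alloca i32  (c) */
    { STORE,  0,  IMM(0),  REG(3)  },  /* store i32 0, i32* %3 */
    { STORE,  0,  IMM(10), REG(6)  },  /* store i32 10, i32* %6 */
    { STORE,  0,  IMM(20), REG(7)  },  /* store i32 20, i32* %7 */
    { LOAD,   9,  REG(6),  IMM(0)  },  /* %9 = load i32, i32* %6 */
    { LOAD,   10, REG(7),  IMM(0)  },  /* %10 = load i32, i32* %7 */
    { ADD,    11, REG(9),  REG(10) },  /* %11 = add nsw i32 %9, %10 */
    { ADD,    12, REG(11), IMM(10) },  /* %12 = add nsw i32 %11, 10 */
    { STORE,  0,  REG(12), REG(8)  },  /* store i32 %12, i32* %8 */
    { RET,    0,  IMM(0),  IMM(0)  },  /* ret i32 0 */
};

/* Execute the code. `frame` plays the role of the stack frame, and a register
   produced by `alloca` holds the index of its slot in `frame`. */
int32_t run(const Inst *code, int n, int32_t *frame, int *slots_used) {
    int32_t reg[MAX_REGS] = {0};
    *slots_used = 0;
    for (int i = 0; i < n; i++) {
        const Inst *in = &code[i];
        int32_t a = in->a.is_reg ? reg[in->a.v] : in->a.v;
        int32_t b = in->b.is_reg ? reg[in->b.v] : in->b.v;
        switch (in->op) {
        case ALLOCA: assert(*slots_used < MAX_SLOTS);
                     reg[in->dst] = (*slots_used)++; break;   /* "address" of a new slot */
        case STORE:  frame[b] = a; break;                      /* write value a into slot b */
        case LOAD:   reg[in->dst] = frame[a]; break;           /* read the value in slot a */
        case ADD:    reg[in->dst] = a + b; break;              /* nsw: no signed overflow expected */
        case RET:    return a;
        }
    }
    return 0;
}
```

After `run` returns 0, the four slots hold 0, 10, 20 and 40, so `c = a + b + 10 = 40` ends up in the slot allocated by `%8`. The listing was produced without optimization (`optnone`), so every local lives in memory. An optimizing build would normally keep such values in registers or fold them away entirely.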