Eval of Julia code
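One of the hardest parts about learning how the Julia language runs code is learning how all of the pieces work together to execute a block of code.
Each chunk of code typically passes through many steps before turning into the desired result (hopefully), and the names of those steps may be unfamiliar, such as (in no particular order):
flisp, AST, C++, LLVM, `eval`, `typeinf`, `macroexpand`, sysimg (or system image), bootstrapping, compile, parse, execute, JIT, interpret, box, unbox, intrinsic function, and primitive function.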
!!! sidebar “Definitions”
* REPL
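REPL stands for Read-Eval-Print Loop. It is simply the short name we use for the command line environment.
* AST
Abstract Syntax Tree. The AST is the data representation of the code structure. In this form the code has been tokenized, which makes it more convenient to manipulate and execute.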
Julia Execution
The 10,000 foot view of the whole process is as follows:
- The user starts `julia`.
- The C function `main()` from `cli/loader_exe.c` gets called. This function processes the command line arguments, filling in the `jl_options` struct and setting the variable `ARGS`. It then initializes Julia (by calling `julia_init` in `task.c`, which may load a previously compiled sysimg). Finally, it passes off control to Julia by calling `Base._start()`.
- When `_start()` takes over control, the subsequent sequence of commands depends on the command line arguments given. For example, if a filename was supplied, it will proceed to execute that file. Otherwise, it will start an interactive REPL.
- Skipping the details about how the REPL interacts with the user, let's just say the program ends up with a block of code that it wants to run.
- If the block of code to run is in a file, `jl_load(char *filename)` gets invoked to load the file and parse it. Each fragment of code is then passed to `eval` to execute.
- Each fragment of code (or AST) is handed off to `eval()` to turn into results.
- `eval()` takes each code fragment and tries to run it in `jl_toplevel_eval_flex()`.
- `jl_toplevel_eval_flex()` decides whether the code is a "toplevel" action (such as `using` or `module`), which would be invalid inside a function. If so, it passes off the code to the toplevel interpreter.
- `jl_toplevel_eval_flex()` then expands the code to eliminate any macros and to "lower" the AST to make it simpler to execute.
- `jl_toplevel_eval_flex()` then uses some simple heuristics to decide whether to JIT compile the AST or to interpret it directly.
- The bulk of the work to interpret code is handled by `eval` in `interpreter.c`.
- If instead the code is compiled, the bulk of the work is handled by `codegen.cpp`. Whenever a Julia function is called for the first time with a given set of argument types, type inference will be run on that function. This information is used by the codegen step to generate faster code.
- Eventually, the user quits the REPL, or the end of the program is reached, and the `_start()` method returns.
- Just before exiting, `main()` calls `jl_atexit_hook(exit_code)`. This calls `Base._atexit()` (which calls any functions registered to `atexit()` inside Julia). Then it calls `jl_gc_run_all_finalizers()`. Finally, it gracefully cleans up all `libuv` handles and waits for them to flush and close.
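The "parse each fragment, then hand it to `eval`" part of the steps above can be imitated from within Julia. The following is only a rough sketch of that loop, not the C implementation: `eval_fragments` and the `Sandbox` module are hypothetical names introduced for illustration, and the sketch relies on `Meta.parse(str, pos)` returning one top-level expression plus the position after it.

```julia
# Hypothetical helper (not part of Julia): parse and evaluate a source
# string one top-level expression at a time, returning the last value.
function eval_fragments(mod::Module, src::AbstractString)
    pos = firstindex(src)
    result = nothing
    while pos <= ncodeunits(src)
        # Meta.parse returns the next expression and the index just past it.
        ex, pos = Meta.parse(src, pos)
        ex === nothing && continue   # only whitespace or comments were left
        result = Core.eval(mod, ex)  # evaluate this fragment at top level
    end
    return result
end

sandbox = Module(:Sandbox)  # scratch module so that Main stays untouched
value = eval_fragments(sandbox, "x = 2\ny = x * 21\ny + 0")
println(value)
```

Each fragment is evaluated before the next one is parsed, which is why the second line can already refer to `x`.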
Parsing
The Julia parser is a small lisp program written in femtolisp, the source-code for which is distributed inside Julia in src/flisp.
The interface functions for this are primarily defined in `jlfrontend.scm`.
The code in `ast.c` handles this handoff on the Julia side.
The other relevant files at this stage are `julia-parser.scm`, which handles tokenizing Julia code and turning it into an AST, and `julia-syntax.scm`, which handles transforming complex AST representations into simpler, "lowered" AST representations which are more suitable for analysis and execution.
If you want to test the parser without re-building Julia in its entirety, you can run the frontend on its own as follows:
    $ cd src
    $ flisp/flisp
    > (load "jlfrontend.scm")
    > (jl-parse-file "<filename>")
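The result of parsing can also be inspected from within Julia itself. The sketch below uses `Meta.parse` to turn a string into an AST and then walks the resulting `Expr`. Which parser implementation backs `Meta.parse` depends on the Julia version, so treat this as a view of the parser's output rather than of the flisp code described here.

```julia
# Parse a string into an AST (an Expr) without evaluating it.
ex = Meta.parse("f(x, 2y + 1)")

println(ex.head)   # the kind of node, here a function call
println(ex.args)   # callee first, then the argument expressions

# Nested expressions are themselves Expr objects.
inner = ex.args[3]
println(inner.head, " ", inner.args[1])

# dump prints the whole tree, one node per line.
dump(ex)
```

The quoted form `:(f(x, 2y + 1))` builds the same tree directly, which is a convenient way to check what a piece of syntax parses to.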
Macro Expansion
When `eval()` encounters a macro, it expands that AST node before attempting to evaluate the expression.
Macro expansion involves a handoff from `eval()` (in Julia), to the parser function `jl_macroexpand()` (written in `flisp`) to the Julia macro itself (written in - what else - Julia) via `fl_invoke_julia_macro()`, and back.
Typically, macro expansion is invoked as a first step during a call to `Meta.lower()`/`jl_expand()`, although it can also be invoked directly by a call to `macroexpand()`/`jl_macroexpand()`.
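To make the two entry points concrete, here is a small sketch that expands a macro call directly with `macroexpand()` and then passes the same call through `Meta.lower()`. The `@square` macro is a hypothetical example defined only for this illustration.

```julia
# Hypothetical macro for illustration: multiplies an expression by itself.
macro square(ex)
    # esc() keeps the user's expression in the caller's scope.
    return :($(esc(ex)) * $(esc(ex)))
end

# Quoting the call keeps it unexpanded, so we can look at it first.
call = :(@square(3 + 1))
println(call.head)

# Direct expansion: the macro call node is replaced by an ordinary call.
expanded = macroexpand(@__MODULE__, call)
println(expanded.head)

# Evaluating the expanded expression runs the generated code.
result = Core.eval(@__MODULE__, expanded)
println(result)

# Lowering also expands macros as part of simplifying the AST.
lowered = Meta.lower(@__MODULE__, call)
```

Note that this macro inserts the argument twice, so the argument expression is evaluated twice; that is fine for a literal but usually undesirable in real macros.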
Type Inference
Type inference is implemented in Julia by `typeinf()` in `compiler/typeinfer.jl`.
Type inference is the process of examining a Julia function and determining bounds for the types
of each of its variables, as well as bounds on the type of the return value from the function.
This enables many future optimizations, such as unboxing of known immutable values, and compile-time
hoisting of various run-time operations such as computing field offsets and function pointers.
Type inference may also include other steps such as constant propagation and inlining.
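The outcome of type inference can be observed through reflection. The sketch below asks for the inferred return type of two small functions (`halve` and `unstable` are hypothetical examples) using `Base.return_types` and `code_typed`. It only shows the bounds that inference reports, not how `typeinf()` computes them.

```julia
# Hypothetical example functions.
halve(x) = x / 2
unstable(x) = x > 0 ? x : "negative"

# For an Int argument, inference determines a single concrete return type.
rt_halve = Base.return_types(halve, (Int,))
println(rt_halve)

# When branches return different types, inference reports a bound (a Union).
rt_unstable = Base.return_types(unstable, (Int,))
println(rt_unstable)

# code_typed returns the type-annotated code paired with the return type.
typed = first(code_typed(halve, (Int,)))
println(typed.second)
```

A concrete return type such as the first one is what makes optimizations like unboxing possible; the `Union` in the second case forces the caller to handle both possibilities.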
!!! sidebar “More Definitions”
* JIT
Just-In-Time Compilation. The process of generating native machine code into memory right when
it is needed.
* LLVM
Low-Level Virtual Machine (a compiler). The Julia JIT compiler is a program/library called libLLVM.
Codegen in Julia refers both to the process of taking a Julia AST and turning it into LLVM instructions,
and the process of LLVM optimizing that and turning it into native assembly instructions.
* C++
The programming language that LLVM is implemented in, which means that codegen is also implemented
in this language. The rest of Julia's library is implemented in C, in part because its smaller
feature set makes it more usable as a cross-language interface layer.
* box
This term is used to describe the process of taking a value and allocating a wrapper around the
data that is tracked by the garbage collector (gc) and is tagged with the object's type.
* unbox
The reverse of boxing a value. This operation enables more efficient manipulation of data when
the type of that data is fully known at compile-time (through type inference).
* generic function
A Julia function composed of multiple "methods" that are selected for dynamic dispatch based on
the argument type-signature.
* anonymous function or "method"
A Julia function without a name and without type-dispatch capabilities.
* primitive function
A function implemented in C but exposed in Julia as a named function "method" (albeit without
generic function dispatch capabilities, similar to an anonymous function).
* intrinsic function
A low-level operation exposed as a function in Julia. These pseudo-functions implement operations
on raw bits such as add and sign extend that cannot be expressed directly in any other way. Since
they operate on bits directly, they must be compiled into a function and surrounded by a call
to `Core.Intrinsics.box(T, ...)` to reassign type information to the value.
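A few of these terms can be poked at from the REPL. The sketch below is illustrative only: `describe` and `double` are hypothetical names, and it assumes that the builtins in `Core` (such as `getfield`) correspond to what this sidebar calls primitive functions.

```julia
# A generic function with two methods, selected by argument type.
describe(x::Integer) = "integer"
describe(x::AbstractString) = "string"
n_methods = length(methods(describe))
println(n_methods, " ", describe(1), " ", describe("a"))

# An anonymous function bound to a variable.
double = x -> 2x
println(double(21))

# Assumption: builtins implemented in C show up as Core.Builtin.
is_builtin = getfield isa Core.Builtin

# Intrinsics operate on raw bits and have their own function type.
is_intrinsic = Core.Intrinsics.add_int isa Core.IntrinsicFunction
sum_bits = Core.Intrinsics.add_int(1, 2)
println(is_builtin, " ", is_intrinsic, " ", sum_bits)

# Plain-bits types such as Int can be stored without a box when the
# type is known; a value held as Any needs its type tag alongside it.
println(isbitstype(Int), " ", isbitstype(Any))
```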
JIT Code Generation
Codegen is the process of turning a Julia AST into native machine code.
The JIT environment is initialized by an early call to `jl_init_codegen` in `codegen.cpp`.
On demand, a Julia method is converted into a native function by the function `emit_function(jl_method_instance_t*)`.
(Note: when using MCJIT (in LLVM v3.4+), each function must be JIT-compiled into a new module.)
`emit_function` recursively calls `emit_expr()` until the entire function has been emitted.
Much of the remaining bulk of this file is devoted to various manual optimizations of specific code patterns.
For example, `emit_known_call()` knows how to inline many of the primitive functions (defined in `builtins.c`) for various combinations of argument types.
Other parts of codegen are handled by various helper files:
- Handles backtraces for JIT functions
- Handles the `ccall` and `llvmcall` FFI, along with various `abi_*.cpp` files
- Handles the emission of various low-level intrinsic functions
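The output of codegen can be inspected with the reflection tools in the `InteractiveUtils` standard library. The following sketch captures the LLVM IR generated for a small function (`add_one` is a hypothetical example) for two different argument types. The exact IR text varies between Julia and LLVM versions, so the example prints it instead of relying on a particular form.

```julia
using InteractiveUtils  # provides code_llvm and code_native

# Hypothetical example function.
add_one(x) = x + 1

# Each argument type signature gets its own specialized native function.
ir_int = sprint(io -> code_llvm(io, add_one, (Int64,)))
ir_float = sprint(io -> code_llvm(io, add_one, (Float64,)))

println(ir_int)
println(ir_float)
```

Replacing `code_llvm` with `code_native` shows the machine code that LLVM produces from that IR.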
!!! sidebar “Bootstrapping”
The process of creating a new system image is called “bootstrapping”.
The etymology of this word comes from the phrase "pulling oneself up by the bootstraps", and
refers to the idea of starting from a very limited set of available functions and definitions
and ending with the creation of a full-featured environment.
System Image
The system image is a precompiled archive of a set of Julia files. The `sys.ji` file distributed
with Julia is one such system image, generated by executing the file `sysimg.jl`,
and serializing the resulting environment (including Types, Functions, Modules, and all other
defined values) into a file. Therefore, it contains a frozen version of the `Main`, `Core`, and
`Base` modules (and whatever else was in the environment at the end of bootstrapping). This serializer/deserializer
is implemented by `jl_save_system_image`/`jl_restore_system_image` in `staticdata.c`.
If there is no sysimg file (`jl_options.image_file == NULL`), this also implies that `--build`
was given on the command line, so the final result should be a new sysimg file. During Julia initialization,
minimal `Core` and `Main` modules are created. Then a file named `boot.jl` is evaluated from the
current directory. Julia then evaluates any file given as a command line argument until it reaches
the end. Finally, it saves the resulting environment to a "sysimg" file for use as a starting
point for a future Julia run.
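The `jl_options` struct mentioned above is mirrored on the Julia side by `Base.JLOptions()`, so the system image in use by the running session can be looked up. This is a minimal sketch; the path it prints depends entirely on the local installation.

```julia
# Read the options struct that main() filled in at startup.
opts = Base.JLOptions()

# image_file is a C string pointer; it is NULL when no sysimg was loaded.
image_path = opts.image_file == C_NULL ? nothing : unsafe_string(opts.image_file)
println("system image: ", image_path)

# Modules restored from the image are available without loading any file.
println(isdefined(Main, :Base) && isdefined(Main, :Core))
```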