This CVE affects the kernel of Apple’s macOS. It allows an attacker to read sensitive data from the kernel’s address space. Reported by Kevin Backhouse. 🚴🏼

    The XNU kernel is part of the Darwin operating system, which underlies macOS and iOS.

    Apple’s XNU Kernel: Finding a memory exposure vulnerability with CodeQL (CVE-2017-13782)
    Preparation
    Know a little bit about C++, CodeQL, and BSD subsystems code.

    CS106L: All lecture notes
    CSE333: All lecture notes
    CodeQL for C and C++
    Starting Point
    Github - XNU kernel
    Kevin ran the LGTM.com analysis locally and found many alerts for dtrace.c. DTrace also runs an interpreter inside the kernel, so, based on his experience in software development and security research, he judged that a bug was likely to be there.

    DTrace uses its own custom bytecode format. The main interpreter loop for the bytecode is in the function dtrace_dif_emulate. Validation is done by dtrace_difo_validate, which ensures that the bytecode does not perform any malicious actions.
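To make the split between validation and emulation concrete for myself, here is a minimal sketch in C of a register-based bytecode interpreter. It is a toy and not DTrace's real DIF format: the opcodes and the names `toy_insn`, `toy_validate`, and `toy_emulate` are all made up for illustration. The only detail taken from the text is the array of 8 registers named `regs`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NREGS 8 /* the text says DTrace bytecode uses 8 virtual registers */

/* Hypothetical opcodes, invented for this sketch. */
enum toy_op { TOY_SETI, TOY_ADD, TOY_RET };

struct toy_insn {
    enum toy_op op;
    uint8_t rd, r1, r2; /* register numbers */
    uint64_t imm;       /* immediate value, used by TOY_SETI */
};

/* Validation pass: runs once, before the program is ever executed.
 * Rejects unknown opcodes, register numbers outside regs[], and
 * programs that do not end in TOY_RET. Returns 1 if acceptable. */
int toy_validate(const struct toy_insn *prog, size_t len)
{
    if (len == 0 || prog[len - 1].op != TOY_RET)
        return 0;
    for (size_t i = 0; i < len; i++) {
        const struct toy_insn *in = &prog[i];
        if (in->op != TOY_SETI && in->op != TOY_ADD && in->op != TOY_RET)
            return 0;
        if (in->rd >= TOY_NREGS || in->r1 >= TOY_NREGS || in->r2 >= TOY_NREGS)
            return 0;
    }
    return 1;
}

/* Interpreter loop: assumes the program already passed toy_validate. */
uint64_t toy_emulate(const struct toy_insn *prog, size_t len)
{
    uint64_t regs[TOY_NREGS] = {0};
    for (size_t pc = 0; pc < len; pc++) {
        const struct toy_insn *in = &prog[pc];
        switch (in->op) {
        case TOY_SETI:
            regs[in->rd] = in->imm;
            break;
        case TOY_ADD:
            regs[in->rd] = regs[in->r1] + regs[in->r2];
            break;
        case TOY_RET:
            return regs[in->rd];
        }
    }
    return 0;
}
```

Note what the validator can and cannot do: it checks the register numbers, but the values held in `regs` are chosen by whoever wrote the bytecode. Any code that later uses such a value as an index or a pointer needs its own runtime check.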

    Key Ideas
    root -> /dev/dtrace + ioctl
    any user -> /dev/dtracehelper -> register DTrace helpers. Helpers exist so that JIT compilers can produce better stack traces through DTrace's ustack feature. That feature does not work on macOS, but the interface still lets an attacker plant a malicious DTrace helper.
    CodeQL Query
    Kevin’s CodeQL Query: Github - DTraceUnsafeIndex.ql

    A single global data flow CodeQL query is enough to find the vulnerable code location.

    Part 1
    /**
     * @name DTrace unsafe index
     * @description DTrace registers are user-controllable, so they must not be
     *              used to index an array without a bounds check.
     * @kind path-problem
     * @problem.severity warning
     * @id apple-xnu/cpp/dtrace-unsafe-index
     */

    import cpp // Imports the standard CodeQL libraries for C/C++.
    import semmle.code.cpp.dataflow.DataFlow // the global data flow library
    import DataFlow::PathGraph

    class RegisterAccess extends ArrayExpr { // a QL class that finds all the register accesses
      RegisterAccess() {
        exists (LocalScopeVariable regs, Function emulate | // there exist a variable and a function such that:
          regs.getName() = "regs" and // the variable is named regs
          emulate.getName() = "dtrace_dif_emulate" and // the function is named dtrace_dif_emulate
          regs.getFunction() = emulate and // regs is declared inside dtrace_dif_emulate
          this.getArrayBase() = regs.getAnAccess()) // the array being subscripted is an access to regs
      }
    }
    A typical RegisterAccess is the regs[rd] in: rval = regs[rd];

    Notes:

    ArrayExpr is the CodeQL class for a C/C++ array access expression, Expr[Expr]. Commonly used library classes can be found here.

    Find every local variable named regs (a LocalScopeVariable). DTrace bytecode uses 8 virtual registers, which are stored in an array named regs.

    Find every function named dtrace_dif_emulate, because it is DTrace’s main interpreter loop, as mentioned above.

    Restrict RegisterAccess to accesses of a regs variable declared in dtrace_dif_emulate, combining the previous 2 steps.

    LocalScopeVariable: a C/C++ variable with block scope. Here it means that regs is declared inside the block scope of the function dtrace_dif_emulate.

    getArrayBase: Gets the array or pointer expression being subscripted. This is arr in both arr[0] and 0[arr].

    getAnAccess: Gets an access to this variable.

    If you are not familiar with a CodeQL definition, you can look it up in the CodeQL library search.
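The remark that the base is arr in both arr[0] and 0[arr] puzzled me at first, so here is a tiny C check. The function name `same_element` is my own, purely for illustration. It relies only on the C rule that `a[b]` means `*(a + b)`.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 when both spellings of the subscript name the same element.
 * In C, a[b] is defined as *(a + b), so regs[rd] and rd[regs] are the
 * same access: regs is the base and rd is the offset in both. */
int same_element(const uint64_t *regs, int rd)
{
    return &regs[rd] == &rd[regs] && regs[rd] == rd[regs];
}
```

This is why the query matches on getArrayBase and getArrayOffset rather than on "the expression written before the bracket".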

    Part 2
    Define the PointerUse class for potentially dangerous uses, such as indexing an array or dereferencing a pointer.

    class PointerUse extends Expr {
      PointerUse() {
        exists (ArrayExpr ae | this = ae.getArrayOffset()) or
        exists (PointerDereferenceExpr deref | this = deref.getOperand()) or
        exists (PointerAddExpr add | this = add.getAnOperand())
      }
    }
    Notes:

    getArrayOffset: Gets the expression giving the index into the array. This is 0 in both arr[0] and 0[arr].
    PointerDereferenceExpr: An instance of the built-in unary operator * applied to a type.
    getOperand: Gets the operand of this unary operation.
    getAnOperand: Gets an operand of this operation. A pointer addition has two operands, and either one matches.
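As a rough picture of the three shapes PointerUse is after, here is a small C sketch. Everything in it (`toy_table`, `read_by_index`, `read_by_pointer_add`, `read_by_address`) is hypothetical and not taken from XNU. Each function contains the bounds check that makes the use safe. As written, the query has no barrier, so I would expect it to report uses like these whether or not the check is present.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_TABLE_LEN 4
static const uint8_t toy_table[TOY_TABLE_LEN] = {11, 22, 33, 44};

/* Shape 1: the value is the array offset (compare ae.getArrayOffset()). */
int read_by_index(uint64_t v, uint8_t *out)
{
    if (v >= TOY_TABLE_LEN) /* without this check, v reads out of bounds */
        return -1;
    *out = toy_table[v];
    return 0;
}

/* Shape 2: the value is an operand of a pointer addition
 * (compare add.getAnOperand()). */
int read_by_pointer_add(uint64_t v, uint8_t *out)
{
    if (v >= TOY_TABLE_LEN)
        return -1;
    *out = *(toy_table + v);
    return 0;
}

/* Shape 3: the value itself is turned into a pointer and dereferenced
 * (compare deref.getOperand()). */
int read_by_address(uintptr_t addr, uint8_t *out)
{
    uintptr_t lo = (uintptr_t)toy_table;
    if (addr < lo || addr >= lo + TOY_TABLE_LEN)
        return -1;
    *out = *(const uint8_t *)addr;
    return 0;
}
```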
    Part 3
    Check whether any data flow path leads from a RegisterAccess (source) to a PointerUse (sink).

    class DTraceUnsafeIndexConfig extends DataFlow::Configuration {
      DTraceUnsafeIndexConfig() {
        this = "DTraceUnsafeIndexConfig"
      }

      override predicate isSource(DataFlow::Node node) { // Source is the RegisterAccess
        node.asExpr() instanceof RegisterAccess
      }

      override predicate isSink(DataFlow::Node node) { // Sink is the dangerous PointerUse
        node.asExpr() instanceof PointerUse
      }
    }
    Notes:

    asExpr: Gets the non-conversion expression corresponding to this node, if any. If this node strictly (in the sense of asConvertedExpr) corresponds to a Conversion, then the result is that Conversion’s non-Conversion base expression.

    Data Flow Tracking

    Local data flow tracks data flow within a single function.
    Global data flow tracks data flow throughout the entire program, and is therefore more powerful than local data flow. However, global data flow is less precise than local data flow, and the analysis typically requires significantly more time and memory to perform.
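Here is a small C sketch of why this query needs global rather than local data flow. The names (`saved_args`, `get_arg_unchecked`, `get_arg_checked`, `emulate_load_arg`) are invented for the example and are not the XNU functions. The point is only that the register read and the array index live in different functions, so an analysis confined to one function never sees both ends.

```c
#include <assert.h>
#include <stdint.h>

#define NARGS 4
static const uint64_t saved_args[NARGS] = {100, 200, 300, 400};

/* Sink in a separate function: indexes with whatever it is given.
 * Looking at this function alone, ndx seems harmless. */
uint64_t get_arg_unchecked(uint64_t ndx)
{
    return saved_args[ndx];
}

/* Same sink, with the bounds check that should be there. */
uint64_t get_arg_checked(uint64_t ndx, int *fault)
{
    if (ndx >= NARGS) {
        *fault = 1;
        return 0;
    }
    *fault = 0;
    return saved_args[ndx];
}

/* Source: a register value, which the bytecode author controls.
 * The value only becomes dangerous after it crosses the call. */
uint64_t emulate_load_arg(const uint64_t regs[8], unsigned rd, int *fault)
{
    uint64_t ndx = regs[rd];
    return get_arg_checked(ndx, fault);
}
```

If `emulate_load_arg` called `get_arg_unchecked` instead, a register value of 1000 would read far past the end of `saved_args`. That source-to-sink path across a call is the shape the global query is looking for.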
    Part 4
    The final query:

    from DTraceUnsafeIndexConfig config, DataFlow::PathNode source, DataFlow::PathNode sink
    where config.hasFlowPath(source, sink) // Defines a condition on the variables.
    select sink, source, sink, "DTrace unsafe index" // Defines what to report for each match with a string that explains the problem.

    /*
     * This query has 16 results. The 16th result is the vulnerability: dtrace_isa.c:817
     */

    The following predicates are defined in the Global Data Flow configuration:

    isSource: defines where data may flow from
    isSink: defines where data may flow to
    isBarrier: optional, restricts the data flow
    isBarrierGuard: optional, restricts the data flow
    isAdditionalFlowStep: optional, adds additional flow steps
    The data flow analysis is performed using the predicate hasFlow(DataFlow::Node source, DataFlow::Node sink):

    from MyDataFlowConfiguration dataflow, DataFlow::Node source, DataFlow::Node sink
    where dataflow.hasFlow(source, sink)
    select source, "Data flow to $@.", sink, sink.toString()

    hasFlowPath: holds if data may flow from source to sink for this configuration. The corresponding paths are generated from the end-points and the graph included in the module PathGraph.
    CodeQL Results
    (Image from Kevin’s demo)

    This query produces 16 results, one of which is this pointer dereference which does not have a bounds check. The other 15 results are uninteresting. If we wanted to, we could further refine the query to reduce the number of false positives. For example, this result is a false positive, because the call to dtrace_canstore on line 5699 is a bounds check.

    The bounds check at line 5704:

    case DIFOP_STB:
        if (!dtrace_canstore(regs[rd], 1, mstate, vstate)) { // the bounds check
            *flags |= CPU_DTRACE_BADADDR;
            *illval = regs[rd];
            break;
        }
        *((uint8_t *)(uintptr_t)regs[rd]) = (uint8_t)regs[r1]; // false positive
        break;
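To see why a guard like this makes the store safe, here is a minimal sketch of a range check in the same spirit. It is not the real dtrace_canstore: `toy_canstore` and `toy_store_byte` are hypothetical names, and I assume a single scratch buffer as the only writable region, which the real kernel code does not.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if the size bytes starting at addr lie entirely inside
 * [base, base + len). Works on offsets so that addr + size can never
 * wrap around and slip past the check. */
int toy_canstore(uintptr_t addr, size_t size, uintptr_t base, size_t len)
{
    if (addr < base)
        return 0;
    uintptr_t off = addr - base;
    return off <= len && size <= len - off;
}

/* Store one byte at a caller-supplied address, but only after the
 * guard has approved it. Mirrors the shape of the DIFOP_STB case. */
int toy_store_byte(uint8_t *scratch, size_t len, uintptr_t addr, uint8_t val)
{
    if (!toy_canstore(addr, 1, (uintptr_t)scratch, len))
        return -1; /* the real code sets a fault flag and breaks */
    *((uint8_t *)addr) = val;
    return 0;
}
```

A refined query could treat a call to such a guard as a barrier, which is the kind of refinement the paragraph above has in mind for removing the false positives.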
    This is amazing: This query has 16 results. The 16th result is the vulnerability: dtrace_isa.c:817

    PoC
    GitHub link: CVE-2017-13782 PoC

    Even though I have learnt a little C, I find it hard to fully understand the code. I hope I can understand it some day, haha…🐰

    According to his talk, after locating the vulnerable point, Kevin spent time crafting a DTrace object to trigger the bug. He did not know how to debug the macOS kernel, so he spent a few days on the following:

    Extracting the parsing code from the source code
    Testing the generated DTrace files to check whether they were valid
    But when he tried it, nothing happened… He then spent a few more days studying the kernel to find out what went wrong…

    He thinks that showing privilege escalation adds more value to a CVE, and that it is part of being a true security expert.

    Random Thoughts
    I feel that CodeQL is an effective tool in the same way as Burp Suite, nmap, or any other. The real power of a security researcher lies in the experience to sense that vulnerabilities may hide in an interpreter inside the kernel, to filter out the false positives the tool generates, and finally to write the PoC, as the reporter Mr. Backhouse did. 🥾

    Always embrace new challenges. 🧐

    One thing he said is really interesting: A security researcher is as good as the last CVE he found. 😆

    Some resources mentioned in Kevin’s live sharing
    Techniques:

    Static Analysis -> scans large codebases to find interesting places to look (e.g. with CodeQL). It works backwards, from the places where a bug could trigger to the input that triggers it. Sometimes it needs to be tested with unusual patterns.
    Manual Audit -> understand why the bug happens
    Fuzzing -> developers usually test with valid input, so fuzzing covers the inputs they miss. It is effective on dense file formats. It works in the opposite direction to static analysis, starting from the input and exploring the reachable paths.
    On Ubuntu 18.04, you can try:

    How to escape from the fuzz
    Fuzzing software: common challenges and potential solutions (Part 1)
    Other CPP Codeql Examples:

    More Cpp CodeQL Queries
    Reference
    Apple’s XNU Kernel: Finding a memory exposure vulnerability with CodeQL (CVE-2017-13782)
    Getting started in security research - Kevin Backhouse
    C++: Using global data flow
    macOS High Sierra 10.13.1, Security Update