Reference

  • ARM Datasheet

Armv8-A_memory_model_guide.pdf
armv8_a_address_translation.pdf
ARM Architecture Reference Manual Armv8.pdf
No Barrier in the Road - liuppopp20.pdf

Background (Why)

Masters (processors, DMA engines) and the software that runs on them access memory on slave devices (RAM, MMIO). These accesses must be (1) correct and (2) as fast as possible.

Memory Model Overview (What)

A memory model is a way of organizing and defining how memory behaves.
It provides a structure and a set of rules for you to follow when you configure how addresses, or regions of addresses, are accessed and used in your system.
The memory model provides attributes that you can apply to an address and it defines the rules associated with memory access ordering.
address_map.PNG

  • MMU & IOMMU

Translates a VA into a PA plus memory attributes on behalf of a master (e.g. a processor).
This relates to: TTBRn_ELx & SCR_* registers

  • Page Table (Translation Table)

Describes memory (address and attributes) in Armv8-A: cacheability and shareability attributes, permission attributes, the Access Flag, etc.
This relates to: TCR_ELx registers

  • Memory Ordering

The order in which memory accesses appear in the memory system.
Because of mechanisms such as write buffers and caches, the memory accesses may complete out of order even when the instructions themselves are executed in order.
This relates to: (1) Cache & TLB, (2) Coherency, dependencies, and barrier instructions

  • Memory Type (Normal memory and Device memory)

Accesses to locations marked as Normal can be reordered and cached.
Accesses to locations marked as Device should stay in program order and are never cached.
The memory type is not directly encoded in the translation table entry. Instead, the AttrIndx field in the translation table entry selects one of the eight attribute fields in MAIR_ELx (Memory Attribute Indirection Register).
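
The indirection through MAIR_ELx can be sketched in C as follows. This is a minimal illustration, not code from the referenced documents: the helper names and the slot assignment in `example_mair` are hypothetical, and it assumes the stage 1 descriptor layout in which AttrIndx occupies bits [4:2]. The three attribute encodings are the commonly used values from the Arm Architecture Reference Manual.

```c
#include <assert.h>
#include <stdint.h>

/* Encodings for one 8-bit MAIR_ELx attribute field. */
#define MAIR_DEVICE_nGnRnE 0x00u /* Device, no Gathering/Reordering/Early ack */
#define MAIR_NORMAL_NC     0x44u /* Normal, Inner/Outer Non-cacheable */
#define MAIR_NORMAL_WB     0xFFu /* Normal, Inner/Outer Write-Back, R/W-allocate */

/* Place an 8-bit attribute encoding into slot `index` (0..7) of a MAIR value. */
static uint64_t mair_set(uint64_t mair, unsigned index, uint8_t attr)
{
    assert(index < 8);
    mair &= ~(0xFFull << (index * 8));
    return mair | ((uint64_t)attr << (index * 8));
}

/* AttrIndx lives in bits [4:2] of a stage 1 block or page descriptor. */
static unsigned desc_attr_index(uint64_t desc)
{
    return (unsigned)((desc >> 2) & 0x7u);
}

/* Resolve the attribute encoding that a descriptor selects from MAIR_ELx. */
static uint8_t mair_lookup(uint64_t mair, uint64_t desc)
{
    return (uint8_t)(mair >> (desc_attr_index(desc) * 8));
}

/* Hypothetical slot assignment: 0 = Device, 1 = Normal WB, 2 = Normal NC. */
static uint64_t example_mair(void)
{
    uint64_t mair = 0;
    mair = mair_set(mair, 0, MAIR_DEVICE_nGnRnE);
    mair = mair_set(mair, 1, MAIR_NORMAL_WB);
    mair = mair_set(mair, 2, MAIR_NORMAL_NC);
    return mair;
}
```

With this assignment, a descriptor whose AttrIndx is 1 resolves to the Normal Write-Back encoding, and changing the memory type of every page that uses slot 1 only requires rewriting one MAIR field.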

Page Table (How)

  • The mapping between virtual and physical address spaces is defined in a set of translation tables, also sometimes called page tables.
  • For each block or page of virtual addresses, the translation tables provide the corresponding physical address and the attributes for accessing that page. Each translation table entry is called a block or page descriptor.
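
To make the table walk concrete, the sketch below splits a virtual address into the per-level table indices. It assumes one specific configuration that the notes above do not fix: a 4KB granule, a 48-bit virtual address, and four lookup levels (0 to 3). The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4KB granule, 48-bit VA, four lookup levels (0..3). */
struct va_indices {
    unsigned l0, l1, l2, l3; /* 9-bit index into the table at each level */
    unsigned offset;         /* 12-bit byte offset within the 4KB page */
};

static struct va_indices split_va(uint64_t va)
{
    struct va_indices ix;

    /* Only the lower VA range (upper 16 bits zero) is handled here. */
    assert((va >> 48) == 0);

    ix.l0     = (unsigned)((va >> 39) & 0x1FF); /* VA[47:39] */
    ix.l1     = (unsigned)((va >> 30) & 0x1FF); /* VA[38:30] */
    ix.l2     = (unsigned)((va >> 21) & 0x1FF); /* VA[29:21] */
    ix.l3     = (unsigned)((va >> 12) & 0x1FF); /* VA[20:12] */
    ix.offset = (unsigned)(va & 0xFFF);         /* VA[11:0]  */
    return ix;
}
```

Each index selects one of 512 eight-byte descriptors in a 4KB table. A block descriptor found at level 1 or level 2 ends the walk early, in which case the remaining lower VA bits become part of the offset.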

    Descriptor Format

    page_table_descriptor_format.png

    Hierarchical Attributes

    Some memory attributes can be specified in the Table descriptors in higher-level tables. These are hierarchical attributes. This applies to Access Permissions, Execution Permissions, and the Physical Address space. If these bits are set, they override the lower-level entries; if they are clear, the lower-level entries are used unmodified.
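
The override rule can be sketched as below. This is a simplified model under stated assumptions: it covers only the stage 1 EL1&0 permission bits (APTable, UXNTable, PXNTable), it ignores the Physical Address space control, and all type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified permissions taken from a leaf (block or page) descriptor. */
struct perms {
    bool el0_access; /* AP[1]: 1 = accessible from EL0 */
    bool read_only;  /* AP[2]: 1 = writes not permitted */
    bool uxn;        /* unprivileged execute-never */
    bool pxn;        /* privileged execute-never */
};

/* Hierarchical controls collected from the Table descriptors on the walk. */
struct table_limits {
    uint8_t aptable; /* APTable[1:0] */
    bool uxntable;
    bool pxntable;
};

/* A set table bit can only take permissions away, never grant them. */
static struct perms apply_table_limits(struct perms leaf, struct table_limits t)
{
    struct perms eff = leaf;

    assert(t.aptable < 4);
    if (t.aptable & 0x1)
        eff.el0_access = false; /* APTable[0]: no access from EL0 */
    if (t.aptable & 0x2)
        eff.read_only = true;   /* APTable[1]: no writes at any level */
    eff.uxn = leaf.uxn || t.uxntable;
    eff.pxn = leaf.pxn || t.pxntable;
    return eff;
}
```

When every table bit is clear, the function returns the leaf permissions unchanged, which matches the "used unmodified" case described above.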

    Attribute Fields

    image.png
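
As a companion to the figure, the attribute fields can be pulled out of a descriptor with a few shifts and masks. The sketch assumes a stage 1, level 3 page descriptor with a 4KB granule and a 48-bit output address; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct page_attrs {
    unsigned attr_index; /* AttrIndx[2:0], bits [4:2]: index into MAIR_ELx */
    bool     ns;         /* NS, bit 5: Non-secure output address */
    unsigned ap;         /* AP[2:1], bits [7:6]: access permissions */
    unsigned sh;         /* SH[1:0], bits [9:8]: shareability */
    bool     af;         /* AF, bit 10: Access Flag */
    bool     ng;         /* nG, bit 11: not Global */
    bool     pxn;        /* PXN, bit 53: privileged execute-never */
    bool     uxn;        /* UXN, bit 54: unprivileged execute-never */
};

/* Bit 0 marks the descriptor as valid. */
static bool desc_is_valid(uint64_t desc)
{
    return (desc & 1u) != 0;
}

static struct page_attrs decode_attrs(uint64_t desc)
{
    struct page_attrs a;

    assert(desc_is_valid(desc)); /* fields are meaningless otherwise */
    a.attr_index = (unsigned)((desc >> 2) & 0x7);
    a.ns  = ((desc >> 5) & 1) != 0;
    a.ap  = (unsigned)((desc >> 6) & 0x3);
    a.sh  = (unsigned)((desc >> 8) & 0x3);
    a.af  = ((desc >> 10) & 1) != 0;
    a.ng  = ((desc >> 11) & 1) != 0;
    a.pxn = ((desc >> 53) & 1) != 0;
    a.uxn = ((desc >> 54) & 1) != 0;
    return a;
}

/* Output address bits [47:12] of a 4KB page descriptor. */
static uint64_t desc_output_address(uint64_t desc)
{
    return desc & 0x0000FFFFFFFFF000ull;
}
```

For the SH field, 0b00 means Non-shareable, 0b10 Outer Shareable, and 0b11 Inner Shareable.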

    Combining Stage 1 & Stage 2

    Both Stage 1 and Stage 2 tables include attributes. How are these combined?
    In the Arm architecture, the default is to use the most restrictive type.
    image.png
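
The "most restrictive type wins" rule can be modeled by ordering the memory types from most to least restrictive and taking the minimum. This is a deliberately simplified sketch: it treats each Normal type as a single cacheability value, ignores shareability and any optional override features, and uses hypothetical names throughout.

```c
#include <assert.h>

/* Ordered from most restrictive (0) to least restrictive. */
enum mem_type {
    DEVICE_nGnRnE = 0,
    DEVICE_nGnRE,
    DEVICE_nGRE,
    DEVICE_GRE,
    NORMAL_NC, /* Normal, Non-cacheable */
    NORMAL_WT, /* Normal, Write-Through */
    NORMAL_WB  /* Normal, Write-Back */
};

/* Resulting type when stage 1 says `s1` and stage 2 says `s2`. */
static enum mem_type combine_stages(enum mem_type s1, enum mem_type s2)
{
    assert(s1 <= NORMAL_WB && s2 <= NORMAL_WB);
    return (s1 < s2) ? s1 : s2;
}
```

For example, a guest OS that maps a region as Normal Write-Back at stage 1 still gets Device behavior if the hypervisor marks the same region as Device at stage 2.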

    Memory Aliasing

    When a given location in the physical address space has multiple virtual addresses, this is called aliasing.
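    If the attributes are not compatible, the memory accesses might not behave as you expect, which can impact performance. Compatible means the same memory type (and, for Device, the same sub-type) and, for Normal locations, the same cacheability and shareability.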
    image.png

MMU (How)

    What & Where

    The MMU enables the system to run multiple tasks as independent programs, each in its own private virtual memory space. The tasks do not need any knowledge of the physical memory map of the system.
    image.png

    What if MMU Disabled

    image.png

Memory Ordering (How)

    Weakly-ordered Memory Model

    x86 uses total store order (TSO): the only reordering permitted is that a load may be performed ahead of an earlier store to a different location. Arm uses a weakly-ordered memory model (WMM): any memory operations that do not depend on each other may be reordered.
    image.png
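
A classic case where weak ordering matters is message passing: one core writes a payload and then sets a flag, and another core reads the flag and then the payload. Under WMM the two stores (or the two loads) may be reordered, so the reader can see the flag set and still read a stale payload. The sketch below uses C11 atomics with release/acquire ordering to forbid that. The variable and function names are hypothetical, and the statement that AArch64 compilers typically lower these operations to STLR and LDAR is an observation about common toolchains, not a guarantee.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

static int payload;       /* plain data, written before the flag */
static atomic_bool ready; /* flag that publishes the payload (starts false) */

/* Producer: the release store keeps the payload write ahead of the flag. */
static void publish(int value)
{
    payload = value;
    atomic_store_explicit(&ready, true, memory_order_release);
}

/* Consumer: the acquire load keeps the payload read behind the flag read.
 * Returns true and fills *out once the flag has been observed. */
static bool try_consume(int *out)
{
    assert(out != NULL);
    if (!atomic_load_explicit(&ready, memory_order_acquire))
        return false;
    *out = payload;
    return true;
}
```

In real use `publish` and `try_consume` run on different cores. On x86 the same source needs no extra barrier instructions because TSO already preserves store-store and load-load order, although the atomics still stop the compiler from reordering.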

    Order-preserving Options (Barriers)

    image.png

  • Data Memory Barrier (DMB) prevents reordering of memory accesses across the barrier. All data accesses (but not instruction fetches) performed by this processor before the DMB are visible to all other masters within the specified shareability domain before any of the data accesses after the DMB.

  • Data Synchronization Barrier (DSB) prevents any instruction from being reordered across the barrier. It ensures that all masters in the specified domain can observe the preceding operations before any subsequent instruction is issued.
  • Load-Acquire (LDAR) and Store-Release (STLR) are a pair of one-way barriers introduced in Armv8: accesses after an LDAR cannot be reordered before it, and accesses before an STLR cannot be reordered after it.
  • Data Dependency (DATA Dep) exists when the value to be stored depends on the value loaded previously.
  • Address Dependency (ADDR Dep) exists when the target addresses of the following memory operations depend on the value loaded previously.
  • Control Dependency (CTRL) exists when a loaded value is used to compute the condition of a conditional branch. It preserves the order between that load and the program-order-later stores inside the branch.
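
A typical use of DMB is the "fill descriptor, then ring doorbell" pattern used when handing work to another master. The sketch below wraps the barrier instructions in small helpers. Everything here is illustrative: `struct fake_device` is a hypothetical stand-in for real shared memory and an MMIO register, the choice of the `oshst` and `sy` options is an assumption about the target system, and on non-AArch64 hosts the helpers fall back to C11 fences, which only approximate the Arm instructions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Store-store barrier over the outer shareable domain. */
static inline void dmb_oshst(void)
{
#if defined(__aarch64__)
    __asm__ volatile("dmb oshst" ::: "memory");
#else
    atomic_thread_fence(memory_order_release); /* portable stand-in */
#endif
}

/* Full-system synchronization barrier: waits for completion, not just order. */
static inline void dsb_sy(void)
{
#if defined(__aarch64__)
    __asm__ volatile("dsb sy" ::: "memory");
#else
    atomic_thread_fence(memory_order_seq_cst); /* no exact portable equivalent */
#endif
}

/* Hypothetical device: a work descriptor plus a doorbell register. */
struct fake_device {
    volatile uint32_t descriptor;
    volatile uint32_t doorbell;
};

static void ring_doorbell(struct fake_device *dev, uint32_t desc)
{
    assert(dev != NULL);
    dev->descriptor = desc; /* 1. fill in the work descriptor */
    dmb_oshst();            /* 2. order the descriptor write before ... */
    dev->doorbell = 1;      /* 3. ... the doorbell write */
}
```

A DMB is enough here because only the relative order of the two stores matters. `dsb_sy` is shown for contrast: it would be the choice when later instructions must not start until the earlier accesses have completed, for example before a WFI or after TLB maintenance.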

    Barriers from a Hardware Perspective

  • When a barrier instruction (DMB or DSB or LDAR/STLR) reaches the issue queue of the ARM processor, it blocks different types of subsequent instructions according to the type of the barrier. Then the barrier instruction is sent to the load-store unit.

  • Barriers that require assistance from the bus issue an ACE barrier transaction. Until the bus responds to that transaction, the barrier instruction cannot retire and the subsequent instructions cannot be issued.
  • However, weaker barriers like DMB ld and LDAR are likely to be implemented without sending anything to the bus as the processors can identify whether loads have finished without involving the bus.

    Cache & TLB & Write-buffer (TBD)

    Coherency (TBD)