Best Practices

Orleans was built to greatly simplify the building of distributed, scalable applications, especially for the cloud. Orleans invented the Virtual Actor Model as an evolution of the Actor Model optimized for cloud scenarios.

Grains (virtual actors) are the base building blocks of an Orleans-based application. They encapsulate state and behavior of application entities and maintain their lifecycle. The programming model of Orleans and the characteristics of its runtime fit some types of applications better than others. This document is intended to capture some of the tried and proven application patterns that work well in Orleans.

Orleans should be considered when:

  • There is a significant number (hundreds, millions, billions, and even trillions) of loosely coupled entities. To put the number in perspective, Orleans can easily create a grain for every person on Earth in a small cluster, so long as only a subset of that total number is active at any point in time.

    • Examples: user profiles, purchase orders, application/game sessions, stocks
  • Entities are small enough to be single-threaded

    • Example: Determine if stock should be purchased based on current price
  • Workload is interactive

    • Example: request-response, start/monitor/complete
  • More than one server is expected or may be required

    • Orleans runs on a cluster, which is expanded by adding servers
  • Global coordination is not needed, or is needed only on a small scale between a few entities at a time

    • Scalability and performance are achieved by parallelizing and distributing a large number of mostly independent tasks with no single point of synchronization.

Orleans is not the best fit when:

  • Memory must be shared between entities

    • Each grain maintains its own state, which should not be shared.
  • A small number of large entities that may be multithreaded

    • A microservice may be a better option when supporting complex logic in a single service
  • Global coordination and/or consistency is needed

    • Such global coordination would severely limit the performance of an Orleans-based application. Orleans was built to scale out easily without the need for in-depth manual coordination.
  • Operations that run for a long time

    • Batch jobs, Single Instruction Multiple Data (SIMD) tasks

    • This depends on the needs of the application; such workloads may still be a fit for Orleans

Grains

Overview:

  • Grains resemble objects. However, they are distributed, virtual, and asynchronous.

  • They are loosely coupled, isolated, and primarily independent

    • Each grain is encapsulated and maintains its own state independently of other grains

    • Grains fail independently

  • Avoid chatty communication between grains

    • Direct memory use is significantly less expensive than message passing

    • Highly chatty grains may be better combined as a single grain

    • Complexity/Size of arguments and serialization need to be considered

      • Deserializing twice may be more expensive than resending a binary message
  • Avoid bottleneck grains

    • Single coordinator/Registry/Monitor

    • Do staged aggregation if required
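
The staged aggregation suggested above can be pictured with a minimal sketch that uses plain .NET tasks in place of grains. The names `StagedAggregation` and `SumInStagesAsync` are hypothetical and are not part of Orleans. Each slice stands in for an intermediate aggregator, so the final step only sees a handful of partial results instead of every individual value.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical illustration only: plain tasks stand in for grains.
public static class StagedAggregation
{
    // Stage 1: each slice of values is summed by its own "intermediate aggregator".
    // Stage 2: only the partial sums reach the final aggregation step.
    public static async Task<long> SumInStagesAsync(IReadOnlyList<int> values, int sliceSize)
    {
        if (sliceSize <= 0) throw new ArgumentOutOfRangeException(nameof(sliceSize));

        var partials = new List<Task<long>>();
        for (int start = 0; start < values.Count; start += sliceSize)
        {
            // Copy the slice so each aggregator owns its own data (no shared memory).
            int[] slice = values.Skip(start).Take(sliceSize).ToArray();
            partials.Add(Task.Run(() => slice.Sum(v => (long)v)));
        }

        // Combine the partial results; no single task ever handled every value.
        long[] partialSums = await Task.WhenAll(partials);
        return partialSums.Sum();
    }
}
```

In an Orleans application the intermediate step would more likely be a grain per silo or per partition; the sketch only shows the shape of the data flow.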

Asynchronicity:

  • No thread blocking: all operations must be asynchronous (Task-based Asynchronous Pattern (TAP))

  • await is the best syntax to use when composing async operations

  • Common Scenarios:

    • Return a concrete value:

      • return Task.FromResult(value);
    • Return a Task of the same type:

      • return foo.Bar();
    • Await a Task and continue execution:

      • var x = await bar.Foo(); var y = DoSomething(x); return y;
    • Fan-out:

      • var tasks = new List<Task>(); foreach(var grain in grains) { tasks.Add(grain.Foo()); } await Task.WhenAll(tasks); DoMoreWork();
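
The four scenarios above can be put together in one compilable sketch. It uses only the .NET base class library; `IWorker`, `ConstantWorker`, and `Caller` are hypothetical stand-ins for a grain interface, a grain, and its caller, and are not Orleans types.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical stand-in for a grain reference; not an Orleans type.
public interface IWorker
{
    Task<int> Foo();
}

public sealed class ConstantWorker : IWorker
{
    private readonly int _value;
    public ConstantWorker(int value) => _value = value;

    // Return a concrete value: wrap it in an already-completed Task.
    public Task<int> Foo() => Task.FromResult(_value);
}

public sealed class Caller
{
    // Return a Task of the same type: pass the callee's Task through without awaiting.
    public Task<int> Forward(IWorker worker) => worker.Foo();

    // Await a Task and continue execution with its result.
    public async Task<int> DoubleOf(IWorker worker)
    {
        int x = await worker.Foo();
        return x * 2;
    }

    // Fan-out: start every call first, then await them all together.
    public async Task<int> SumOf(IEnumerable<IWorker> workers)
    {
        var tasks = new List<Task<int>>();
        foreach (var worker in workers)
        {
            tasks.Add(worker.Foo());
        }

        int[] results = await Task.WhenAll(tasks);

        int sum = 0;
        foreach (int r in results) sum += r;
        return sum;
    }
}
```

Starting all calls before awaiting any of them is what makes the fan-out run concurrently; awaiting inside the loop would serialize the calls.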

Implementation of Grains:

  • Never perform a thread-blocking operation within a grain. All operations other than local computations must be explicitly asynchronous.

    • Examples: Synchronously waiting for an IO operation or a web service call, locking, running an excessive loop that is waiting for a condition, etc.
  • When to use a [StatelessWorker]

    • Functional operations such as decryption or decompression, performed before forwarding a request for processing

    • When only local grains are required in multiple activations

    • Example: Performs well with staged aggregation within local silo first

  • Grains are non-reentrant by default

    • Deadlock can occur due to call cycles

      • Examples:

        • The grain calls itself

        • Grain A calls Grain B, which calls Grain C, which in turn calls Grain A (A->B->C->A)

        • Grain A calls Grain B as Grain B is calling Grain A (A->B->A)

    • Timeouts are used to automatically break deadlocks

    • The [Reentrant] attribute can be used to make the grain class reentrant

    • A reentrant grain is still single-threaded; however, its calls may interleave (processing and memory are divided between tasks)

    • Handling interleaving is error-prone and therefore increases risk

  • Inheritance

    • Grain classes inherit from the Grain base class. Grain interfaces (one or more) can be added to each grain.

    • Disambiguation may be needed to implement the same interface in multiple grain classes

  • Generics are supported
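
The interleaving risk noted above for reentrant grains can be reproduced without Orleans. The sketch below is a hypothetical illustration using only .NET tasks: `Counter` is not a grain, and the two `TaskCompletionSource` instances simulate pending I/O. It shows how a value read before an `await` can be stale by the time it is written back.

```csharp
using System.Threading.Tasks;

// Hypothetical illustration only: this class is not an Orleans grain. It mimics
// what can happen when two calls interleave at an await inside a reentrant grain.
public sealed class Counter
{
    public int Value { get; private set; }

    // Reads state, awaits (the point where another call may interleave),
    // then writes back a result based on the earlier read.
    public async Task AddAsync(int amount, Task pendingIo)
    {
        int snapshot = Value;      // read
        await pendingIo;           // another call may run here
        Value = snapshot + amount; // write based on a possibly stale snapshot
    }
}

public static class InterleavingDemo
{
    public static async Task<int> RunAsync()
    {
        var counter = new Counter();
        var io1 = new TaskCompletionSource<bool>();
        var io2 = new TaskCompletionSource<bool>();

        // Both calls start and read Value == 0 before either one writes.
        Task first = counter.AddAsync(10, io1.Task);
        Task second = counter.AddAsync(5, io2.Task);

        io1.SetResult(true);
        await first;   // Value is now 10
        io2.SetResult(true);
        await second;  // Value is overwritten with 0 + 5

        return counter.Value; // 5, not 15: the first update was lost
    }
}
```

One way to avoid the lost update is to not carry a read across an `await`, for example by applying `Value += amount` after the awaited operation completes. Keeping the grain non-reentrant sidesteps the problem entirely.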

Grain State Persistence

Orleans’ grain state persistence APIs are designed to be easy-to-use and provide extensible storage functionality.

  • Tutorial: Needs to be created

Overview:

  • Orleans.IGrainState is extended by a .NET interface which contains fields that should be included in the grain’s persisted state.

  • Grains are persisted by using IPersistentState<TState>, which exposes a strongly typed State property to the grain class.

  • The initial State.ReadStateAsync() automatically occurs before OnActivateAsync() is called for a grain.

  • When the data in the grain’s state object changes, the grain should call State.WriteStateAsync()

    • Typically, grains call State.WriteStateAsync() at the end of a grain method and return the resulting write promise (Task).

    • The storage provider could try to batch writes, which may increase efficiency, but the behavioral contract and configuration are orthogonal (independent) to the storage API used by the grain.

    • A timer is an alternative method to write updates periodically.

      • The timer allows the application to determine the amount of “eventual consistency”/staleness allowed.

      • Timing (immediate/none/minutes) can also be controlled as to when to update.

    • PersistentState classes, like other grain classes, can only be associated with one storage provider.

      • The [StorageProvider(ProviderName="name")] attribute associates the grain class with a particular provider

      • A matching provider entry will need to be added to the silo config file, using the same "name" as in [StorageProvider(ProviderName="name")]

      • A composite storage provider can be used with ShardedStorageProvider

Storage Providers

Built-in Storage Providers

  • The Orleans.Storage namespace houses all of the built-in storage providers. The assembly is OrleansProviders.dll

  • MemoryStorage (Data stored in memory without durable persistence) is used only for debugging and unit testing.

  • AzureTableStorage

    • Configure the Azure storage account information with an optional DeleteStateOnClear (hard or soft deletions)

    • Orleans serializer efficiently stores JSON data in one Azure table cell

    • Data size limit == max size of an Azure table column, which is 64 KB of binary data

    • Community-contributed code extends storage across multiple table columns, which increases the overall maximum size to 1 MB.

Storage Provider Debugging Tips

  • TraceOverride Verbose3 will log much more information about storage operations.

    • Update silo config file

      • LogPrefix="Storage" for all providers, or a specific type using "Storage.Memory" / "Storage.Azure" / "Storage.Shard"

How to deal with Storage Operation Failures

  • Grains and storage providers can await storage operations and retry failures as needed

  • Unhandled failures will propagate back to the caller and will be seen by the client as a broken promise

  • Other than for the initial read, there is no mechanism that automatically destroys activations if a storage operation fails

  • Retrying a failed storage operation is not a default feature of the built-in storage providers

Grain Persistence Tips

Grain Size

  • Optimal throughput is achieved by using multiple smaller grains rather than a few larger grains. However, the best practice for choosing a grain size and type is to base it on the application domain model.

    • Example: Users, Orders, etc.

External Changing Data

  • Grains are able to re-read the current state data from storage by using State.ReadStateAsync()

  • A timer can also be used to re-read data from storage periodically

    • The functional requirements could be based on a suitable “staleness” of the information

      • Example: Content Cache Grain

Adding and Removing Fields

  • The storage provider determines the effects of adding fields to, or removing fields from, its persisted state.

  • Azure Table storage does not enforce a schema and should automatically adjust to the additional fields.

Writing Custom Providers

  • Storage providers are simple to write and are a significant extension point for Orleans

    • Tutorial: need tutorial
  • The GrainState API contract drives the storage API contract (WriteStateAsync(), ClearStateAsync(), ReadStateAsync())

  • The storage behavior is typically configurable (Batch writing, Hard or Soft Deletions, etc.) and defined by the storage provider

Cluster Management

  • Orleans automatically manages clusters

    • Nodes can fail and join at any moment, and Orleans handles this automatically

    • The same silo instance table that is created for the clustering protocol can also be used for diagnostics. The table keeps a history of all of the silos in the cluster.

    • There are also configuration options for more aggressive or more lenient failure detection

  • Failures can happen at any time and are a normal occurrence

    • In the event a silo fails, the grains that were activated on the failed silo will automatically be reactivated later on other silos within the cluster.

    • Grain calls can time out. A retry library such as Polly can assist with retries.

    • Orleans provides a message delivery guarantee where each message is delivered at most once.

    • It is the responsibility of the caller to retry any failed calls if needed.

      • Common practice is to retry end-to-end from the client/front end
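
Because retries are left to the caller, a small helper is often enough when a full library such as Polly is not wanted. The following is a minimal sketch using only the .NET base class library; `Retry` and `WithBackoffAsync` are hypothetical names and are not part of Orleans or Polly. The attempt count and delay are assumptions to be tuned per application.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper, not part of Orleans or Polly.
public static class Retry
{
    // Runs the operation up to maxAttempts times, doubling the delay after each failure.
    // A timed-out call may still have executed, so the operation should be safe to repeat.
    public static async Task<T> WithBackoffAsync<T>(
        Func<Task<T>> operation, int maxAttempts, TimeSpan initialDelay)
    {
        if (maxAttempts < 1) throw new ArgumentOutOfRangeException(nameof(maxAttempts));

        TimeSpan delay = initialDelay;
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return await operation();
            }
            catch (Exception) when (attempt < maxAttempts)
            {
                // Swallow the failure only while attempts remain; the last one propagates.
                await Task.Delay(delay);
                delay = TimeSpan.FromTicks(delay.Ticks * 2);
            }
        }
    }
}
```

A front end might wrap an entire request in a call such as `Retry.WithBackoffAsync(() => HandleRequestAsync(), 3, TimeSpan.FromMilliseconds(200))`, where `HandleRequestAsync` is a placeholder for the application's own end-to-end operation.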

Deployment and Production Management

Scaling out and in

  • Monitor the Service-Level Agreement (SLA)

  • Add or Remove instances

  • Orleans automatically rebalances and takes advantage of the new hardware. However, activated grains are not rebalanced when a new silo is added to the cluster.

Logging and Testing

  • Logging, Tracing, and Monitoring

    • Inject logging using dependency injection

      • public HelloGrain(ILogger<HelloGrain> logger) {this.logger = logger;}
    • Microsoft.Extensions.Logging is utilized for functional and flexible logging

Testing

  • The Microsoft.Orleans.TestingHost NuGet package contains TestCluster, which creates an in-memory cluster (two silos by default) that can be used to test grains.

  • Additional information can be found here

Troubleshooting

  • Use Azure table-based membership for development and testing

    • Works with Azure Storage Emulator for local troubleshooting

    • OrleansSiloInstances table displays the state of the cluster

    • Use unique deployment Ids (partition keys) in order to keep it simple

  • Silo isn’t starting

    • Check OrleansSiloInstances to determine if the silo registered there.

    • Make sure that the firewall is open for TCP ports 11111 and 30000

    • Check the logs, including the extra log that contains startup errors

  • Frontend (Client) cannot connect to the silo cluster

    • The client must be hosted in the same service as the silos

    • Check OrleansSiloInstances to make sure the silos (gateways) are registered

    • Check the client log to make sure that the gateways match the ones listed in the OrleansSiloInstances table

    • Check the client log to validate that the client was able to connect to one or more of the gateways