ObjectRepository - .NET in-memory repository pattern for your home projects

Why keep all the data in memory?

To store site or backend data, most sane people's first choice is a SQL database. 

But sometimes the thought arises that the data model is not a good fit for SQL: for example, when building a search engine or a social graph, you need to query complex relationships between objects. 

The worst situation is when you work in a team and a colleague does not know how to write fast queries. How much time have you spent solving N+1 problems and building extra indexes so that the SELECT on the main page completes in a reasonable amount of time?

Another popular approach is NoSQL. A few years ago there was a lot of hype around it: at every opportunity people deployed MongoDB and rejoiced at responses in the form of JSON documents (by the way, how many workarounds did you have to add because of circular references in documents?).

I suggest trying an alternative: why not store all the data in the application's memory and periodically save it to an arbitrary store (a file, a remote database)? 

Memory has become cheap, and the data of most small and medium-sized projects will fit into 1 GB of it. (For example, my favorite home project, a financial tracker that keeps daily statistics and a year and a half of my spending, balance, and transaction history, consumes only 45 MB of memory.)

Pros:

  • Data access becomes simpler: no need to worry about queries, lazy loading, or ORM quirks; you work with plain C# objects;
  • There are no problems with access from different threads;
  • It is very fast: no network requests, no translating code into a query language, no need to (de)serialize objects;
  • Data can be persisted in any form: XML on disk, SQL Server, Azure Table Storage, and so on.

Cons:

  • Horizontal scaling is lost, and as a result zero-downtime deployment is impossible;
  • If the application crashes, you may partially lose data. (But our application never crashes, right?)

How does it work?

The algorithm is as follows (a minimal code sketch follows the list):

  • On startup, a connection to the data store is established and the data is loaded;
  • An object model is built, along with primary indexes and relationship indexes (1:1, 1:Many);
  • Subscriptions are created for object property changes (INotifyPropertyChanged) and for adding or removing collection elements (INotifyCollectionChanged);
  • When the subscription is triggered, the changed object is added to the queue for writing to the data store;
  • Periodically (on a timer) in a background thread, changes are saved to the storage;
  • When you exit the application, changes are also saved to storage.
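
To make this concrete, here is a minimal sketch of that loop with hypothetical names (ISimpleStorage, ChangeTracker); only the INotifyPropertyChanged part is shown, and the real library wires all of this up for you, so treat it as an illustration rather than its actual code:

using System;
using System.Collections.Concurrent;
using System.ComponentModel;
using System.Threading;

public interface ISimpleStorage
{
    // Persist a single object (to a file, LiteDB, Azure Table Storage, ...).
    void Save(object changedObject);
}

public class ChangeTracker : IDisposable
{
    private readonly ConcurrentQueue<object> _pending = new ConcurrentQueue<object>();
    private readonly ISimpleStorage _storage;
    private readonly Timer _timer;

    public ChangeTracker(ISimpleStorage storage)
    {
        _storage = storage;
        // Periodically flush accumulated changes in a background thread.
        _timer = new Timer(_ => Flush(), null, TimeSpan.FromSeconds(5), TimeSpan.FromSeconds(5));
    }

    public void Track(INotifyPropertyChanged item)
    {
        // Any property change puts the whole object into the write queue.
        item.PropertyChanged += (sender, args) => _pending.Enqueue(sender);
    }

    public void Flush()
    {
        while (_pending.TryDequeue(out var item))
            _storage.Save(item);
    }

    public void Dispose()
    {
        _timer.Dispose();
        Flush(); // also save when the application shuts down
    }
}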

Sample code

Add the necessary dependencies:

// Core library
Install-Package OutCode.EscapeTeams.ObjectRepository
    
// Data storage where changes will be persisted.
// Install the one you are actually going to use.
Install-Package OutCode.EscapeTeams.ObjectRepository.File
Install-Package OutCode.EscapeTeams.ObjectRepository.LiteDb
Install-Package OutCode.EscapeTeams.ObjectRepository.AzureTableStorage
    
// Optional - if you need to store the Hangfire data model
// Install-Package OutCode.EscapeTeams.ObjectRepository.Hangfire

Describe the data model that will be persisted to storage:

public class ParentEntity : BaseEntity
{
    public ParentEntity(Guid id) => Id = id;
}
    
public class ChildEntity : BaseEntity
{
    public ChildEntity(Guid id) => Id = id;
    public Guid ParentId { get; set; }
    public string Value { get; set; }
}

Then the object model:

public class ParentModel : ModelBase
{
    public ParentModel(ParentEntity entity)
    {
        Entity = entity;
    }
    
    public ParentModel()
    {
        Entity = new ParentEntity(Guid.NewGuid());
    }
    
    public Guid? NullableId => null;
    
// Example of a 1:Many relationship
    public IEnumerable<ChildModel> Children => Multiple<ChildModel>(x => x.ParentId);
    
    protected override BaseEntity Entity { get; }
}
    
public class ChildModel : ModelBase
{
    private ChildEntity _childEntity;
    
    public ChildModel(ChildEntity entity)
    {
        _childEntity = entity;
    }
    
    public ChildModel() 
    {
        _childEntity = new ChildEntity(Guid.NewGuid());
    }
    
    public Guid ParentId
    {
        get => _childEntity.ParentId;
        set => UpdateProperty(() => _childEntity.ParentId, value);
    }
    
    public string Value
    {
        get => _childEntity.Value;
    set => UpdateProperty(() => _childEntity.Value, value);
    }
    
// Access via an index lookup
    public ParentModel Parent => Single<ParentModel>(ParentId);
    
    protected override BaseEntity Entity => _childEntity;
}

And finally, the repository class itself for accessing data:

public class MyObjectRepository : ObjectRepositoryBase
{
    public MyObjectRepository(IStorage storage) : base(storage, NullLogger.Instance)
    {
    IsReadOnly = true; // For tests: changes are not saved to the database
    
        AddType((ParentEntity x) => new ParentModel(x));
        AddType((ChildEntity x) => new ChildModel(x));
    
    // If Hangfire is used and its data model should be stored in the ObjectRepository
        // this.RegisterHangfireScheme(); 
    
        Initialize();
    }
}

Create an ObjectRepository instance:

var memory = new MemoryStream();
var db = new LiteDatabase(memory);
var dbStorage = new LiteDbStorage(db);
    
var repository = new MyObjectRepository(dbStorage);
await repository.WaitForInitialize();

If the project uses Hangfire:

public void ConfigureServices(IServiceCollection services, ObjectRepository objectRepository)
{
    services.AddHangfire(s => s.UseHangfireStorage(objectRepository));
}

Inserting a new object:

var newParent = new ParentModel();
repository.Add(newParent);

With this call, the ParentModel object is added both to the local cache and to the queue for writing to the database. The operation therefore takes O(1), and you can work with the object right away.

For example, to find this object in the repository and make sure the returned object is the same instance:

var parents = repository.Set<ParentModel>();
var myParent = parents.Find(newParent.Id);
Assert.IsTrue(ReferenceEquals(myParent, newParent));

What happens here? Set() returns a TableDictionary, which wraps a ConcurrentDictionary and provides additional functionality for primary and secondary indexes. This gives you methods to search by Id (or by other arbitrary custom indexes) without enumerating all objects.
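
As an illustration only (this is not the library's actual implementation), a table with a primary index and a unique secondary index boils down to a couple of dictionaries, which is why lookups are dictionary accesses rather than a full scan:

using System;
using System.Collections.Concurrent;

// Hypothetical type, shown only to illustrate the idea of primary/secondary indexes.
public class IndexedTable<T>
{
    private readonly ConcurrentDictionary<Guid, T> _byId = new ConcurrentDictionary<Guid, T>();
    private readonly ConcurrentDictionary<object, T> _byIndexedValue = new ConcurrentDictionary<object, T>();
    private readonly Func<T, Guid> _idSelector;
    private readonly Func<T, object> _indexSelector;

    public IndexedTable(Func<T, Guid> idSelector, Func<T, object> indexSelector)
    {
        _idSelector = idSelector;
        _indexSelector = indexSelector;
    }

    public void Add(T item)
    {
        _byId[_idSelector(item)] = item;                // primary index
        _byIndexedValue[_indexSelector(item)] = item;   // secondary (unique) index
    }

    // Dictionary lookups instead of enumerating every object.
    public T FindById(Guid id) => _byId.TryGetValue(id, out var item) ? item : default(T);
    public T FindByIndex(object value) => _byIndexedValue.TryGetValue(value, out var item) ? item : default(T);
}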

When objects are added to the ObjectRepository, a subscription to their property changes is created, so any property change also puts the object into the write queue. 
From the outside, updating a property looks the same as working with a POCO object:

myParent.Children.First().Value = "Updated value";

You can delete an object in the following ways:

repository.Remove(myParent);
repository.RemoveRange(otherParents);
repository.Remove<ParentModel>(x => !x.Children.Any());

This also adds the object to the deletion queue.

How does saving work?

When tracked objects change (whether they are added, removed, or have a property modified), ObjectRepository raises the ModelChanged event, to which IStorage is subscribed. When ModelChanged fires, IStorage implementations place the changes into three queues: one for additions, one for updates, and one for deletions.

Also, during initialization IStorage implementations create a timer that saves the accumulated changes every 5 seconds. 

In addition, there is an API to force a save call: ObjectRepository.Save().

Before each save, meaningless operations are first removed from the queues (for example, duplicate events when an object changed twice, or a quick add immediately followed by a removal), and only then does the save itself happen. 

In all cases, the whole current object is saved, so objects may be persisted in a different order than they were changed, and possibly as newer versions than at the moment they were added to the queue.
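
A simplified sketch of what that compaction could look like (hypothetical types, not the library's internals): later changes to an object supersede earlier ones, and an add that is followed by a remove within the same batch cancels out entirely:

using System;
using System.Collections.Generic;
using System.Linq;

enum ChangeKind { Add, Update, Remove }

sealed class Change
{
    public Guid Id;          // identity of the changed object
    public ChangeKind Kind;
    public object Snapshot;  // the whole current object is written, not a diff
}

static class ChangeQueue
{
    // Collapses a raw stream of change events into the minimal set worth saving.
    public static IReadOnlyList<Change> Compact(IEnumerable<Change> raw)
    {
        var result = new Dictionary<Guid, Change>();
        var addedInThisBatch = new HashSet<Guid>();

        foreach (var change in raw)
        {
            if (change.Kind == ChangeKind.Add)
                addedInThisBatch.Add(change.Id);

            if (change.Kind == ChangeKind.Remove && addedInThisBatch.Remove(change.Id))
            {
                // Added and removed within the same batch: nothing to persist at all.
                result.Remove(change.Id);
                continue;
            }

            // An object added in this batch must still be persisted as an Add,
            // even if it was updated afterwards.
            if (change.Kind == ChangeKind.Update && addedInThisBatch.Contains(change.Id))
                change.Kind = ChangeKind.Add;

            // A later change supersedes earlier ones for the same object,
            // because the latest full snapshot is what ends up in storage.
            result[change.Id] = change;
        }

        return result.Values.ToList();
    }
}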

What else is there?

  • All libraries are based on .NET Standard 2.0. Can be used in any modern .NET project.
  • The API is thread-safe. The internal collections are based on ConcurrentDictionary, and the event handlers either take locks or do not need them. 
    The only thing to remember is to call ObjectRepository.Save() when the application exits (see the example after this list);
  • Custom indexes (require uniqueness):

repository.Set<ChildModel>().AddIndex(x => x.Value);
repository.Set<ChildModel>().Find(x => x.Value, "myValue");
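
One possible way (not something the library prescribes) to make sure pending changes are flushed when the process ends is to hook process exit and call Save() there; repository is the MyObjectRepository instance created earlier:

// On normal shutdown, flush whatever is still queued.
AppDomain.CurrentDomain.ProcessExit += (sender, args) => repository.Save();

// In an ASP.NET Core application the same can be done via the host lifetime,
// where `lifetime` is an injected IHostApplicationLifetime:
// lifetime.ApplicationStopping.Register(() => repository.Save());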

Who is using it?

Personally, I started using this approach in all of my hobby projects because it is convenient and does not require much effort to write a data access layer or deploy heavy infrastructure. As a rule, storing the data in LiteDB or in a file is enough for me. 

But in the past, when our team built the now-defunct startup EscapeTeams (I thought, here comes the money, but no, experience again), we used Azure Table Storage to store the data.

Plans for the future

I would like to fix one of the main disadvantages of this approach: the lack of horizontal scaling. That requires either distributed transactions (sic!), or a deliberate decision that the same data is never modified from different instances, or letting the data change on a “last writer wins” basis.

From a technical point of view, I see the following scheme as possible (a rough sketch follows the list):

  • Store an EventLog and a Snapshot instead of the object model;
  • Discover other instances (list all instance endpoints in settings? UDP discovery? master/slave?);
  • Replicate the EventLog between instances via any consensus algorithm, such as RAFT.
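
Purely as a speculative illustration of that scheme (nothing like this exists in the library today), a replicated event-log record might carry data along these lines:

using System;

public enum ReplicatedChangeKind { Add, Update, Remove }

// Hypothetical record: one entry of the EventLog that instances would replicate.
public sealed class EventLogEntry
{
    public long Sequence { get; set; }              // monotonically increasing index agreed on via consensus (e.g. RAFT)
    public Guid ObjectId { get; set; }
    public string ObjectType { get; set; }
    public ReplicatedChangeKind Kind { get; set; }
    public byte[] Payload { get; set; }             // serialized snapshot of the object
    public DateTimeOffset Timestamp { get; set; }   // usable for "last writer wins" conflict resolution
}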

There is another problem that worries me: cascade deletion, or at least detecting cases where an object that is still referenced by other objects gets deleted. 

Source

If you have read this far, all that remains is to read the code; it can be found on GitHub:
https://github.com/DiverOfDark/ObjectRepository

Source: habr.com
