Using ImmutableSortedSet in C# for memory sharing

Stylized crystalline tree symbolizing structural sharing in immutable sorted sets.

Intro

Immutable data structures in C# have always seemed very intriguing to me because while they come with many interesting properties, I very rarely find a situation where they would be appropriate to use. Finding one of those situations is exciting enough to motivate me to write an article about it!

What are immutable data structures?

Immutable data structures contain data that can only be set upon creation and cannot be mutated afterwards. In the C# standard library, they are implemented by creating objects which can be created, but which have no mutating properties, members or methods. If a change needs to be made on an immutable data structure, a new object is created with the change applied and the previous objects still remains. Let’s look at the ImmutableSortedSet collection to understand how it works in depth and how we can use it to optimize memory sharing.

The Immutable Sorted Set

On the surface, the immutable sorted set is a data structure that is very close in capabilities to the sorted set collection. It represents a set where members can be:

  1. Added, removed and queried for membership with a logarithmic time complexity
  2. Retrieved in a sorted order

The immutability comes with some extra constraints:

  1. The elements can only be added on the creation of the set
  2. Adding/removing elements requires the creation of a new set
  3. Reviewers of your PR will likely need to lookup the documentation to figure out what this structure is and then ask you why you are not just using a hashset

How is an immutable sorted set created?

Creating a set is mostly done via using the builder pattern. This gives you a mutable, not thread-safe object where you can do modifications to create the data inside your structure. You get the actual structure by invoking the ToImmutable method on the builder.

public static ImmutableSortedSet<string> CreateFruitsSet()
{
    // Create a builder
    var builder = ImmutableSortedSet.CreateBuilder<string>();

    // Add elements while mutable
    builder.Add("apple");
    builder.Add("banana");
    builder.Add("cherry");

    // Finalize the set - now it's immutable
    var fruitsSet = builder.ToImmutable();

    return fruitsSet;
}

How do I make changes to it?

When we add an element, the old fruit set is not modified but instead we get a new set which also contains the new element. It works the same for any mutating operation.

ImmutableSortedSet<string> newFruitSet = fruitSet.Add("new fruit");

Structural Sharing

The pattern above looks scary if you care about performance. If I have a set with 1M elements, do I need to re-create the entirety of it on every single modification? For n elements, that would explode the time complexity of insertion to O(n * logn) and the memory complexity to O(n*n). Fortunately, unlike the string type which is also immutable, the answer here is no. The immutable sorted set uses a pattern called structural sharing in order to avoid this.

Structural sharing refers to designing the data structure in such a way so that when a new instance is created by modifying an existing one, the new instance shares as much memory as possible with the original instance. To understand that properly, let’s visit the details of the immutable sorted set and see how immutability makes this possible.

ImmutableSortedSet - Behind the scenes

If we look at the source code for the implementation of the ImmutableSortedSet type, we quickly find that it is a wrapper around an AVL tree. An AVL tree is a self-balancing binary search tree, which explains the time complexity we saw above. By traversing its nodes, we can also efficiently get them in sorted order. So this explains the SortedSet part of the name. The Immutable part is the magic that allows for structural sharing.

Let’s take an insertion to an AVL tree of integers as an example. The sapling below has an ambition to grow:

        20
       /   \
     10     30
    /  \    / \
   5   15  25 35

Let’s suppose we add the element ‘12’ to it. We will get the new tree:

        20'
       /    \
     10'     30
    /   \    / \
   5    15' 25 35
        /
      12'

Let’s compare the two trees

Since we know that the nodes from the original tree are immutable, all the unchanged sub-trees can be re-used! So in the end, the asymptotic complexity of modifications is the same as for the mutable sorted set.

Uses

Batch Operations

When a mutable collection needs to be shared between multiple threads, there are usually two options to maintain developer sanity.

  1. Put it under a lock to serialize all operations. But a coarse lock can become a bottleneck if it is in the hot path and there’s contention.
  2. Use a thread-safe collection. But then you spend a long time pondering about all the different ways in which its state may become unpredictable

One of the more challenging issues that arises in concurrent code is doing batch operations atomically. For example, in C# we can use ConcurrentDictionary collection as a thread-safe set to avoid using a lock. However, it only allows us to add/remove/update one element at a time. If we need to batch these operations together, we either need to use a single lock for modifies and reads (which defeats the purpose of the concurrent dictionary) or we need to create a new dictionary, copy all the elements and atomically swap it. The immutable sorted set is great for the latter scenario because it is optimized for creating copies with just the new elements.

Shared global state

When designing systems with multiple concurrent consumers, it is often the case that the state in the system is comprised of two parts, a global shared state and isolated state per consumer. As an example, let’s consider a search engine serving different users. The application maintains a global set of indexed URLs that all users can query. There are two kinds of URLs: standard URLs, which are part of the global index available to everyone, and user-submitted URLs, which users can choose to add to their personal search experience. Let’s also assume that the system represents this by keeping a sorted set of URLs that every user can access, with the option for users to supplement their own view separately.

The most naive implementation of a user view would be to create a new set containing all the URLs they want to have visible. One way to optimize this scenario is by using the ImmutableSortedSet type. If we store the global urls in it, we can create views on it with less or more elements which will share memory with the original set.

ImmutableSortedSet<string> MakeUrlView()
{
    //------------------------------------------------------------
    // Imagine a search engine that has indexed millions of URLs.
    // We store them in an immutable sorted set.
    //------------------------------------------------------------

    const int totalUrls = 1_000_000;

    var urls = Enumerable.Range(0, totalUrls)
                          .Select(i => $"https://site{i}.example");

    var globalIndex = ImmutableSortedSet.CreateRange(
        StringComparer.OrdinalIgnoreCase, urls);

    Console.WriteLine($"Global index contains {globalIndex.Count:N0} URLs.\n");

    //------------------------------------------------------------
    // Now a user wants a personalized view:
    // - Add some of their favorite sites.
    // - Hide one of the global sites they don't want to see.
    //
    // Instead of copying a million URLs, we'll use a Builder.
    //------------------------------------------------------------

    var userViewBuilder = globalIndex.ToBuilder();

    userViewBuilder.Add("https://personal.dev/portfolio");
    userViewBuilder.Add("https://blog.user123.net");

    userViewBuilder.Remove("https://site500000.example"); // Hide one site

    // Re-uses memory from the original
    return userViewBuilder.ToImmutable();
}

Benchmarking immutable set vs copying the entire set

Let’s have O3 write us a benchmark in BenchmarkDotNet so that can check how much we gain by using the immutable set compared to copying the entire set. The benchmark will try out different sizes of the initial shared set and a different amount of addition operations.

// Requires:
//   - BenchmarkDotNet (NuGet package)
//   - System.Collections.Immutable (NuGet package)

using System;
using System.Collections.Generic;
using System.Collections.Immutable;
using System.Linq;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

[MemoryDiagnoser]
public class ImmutableSortedSetBenchmark
{
    // Allow tuning the size of the global set
    [Params(100_000, 500_000, 1_000_000)]
    public int GlobalSetSize { get; set; }

    // Allow tuning how many personal URLs to add
    [Params(2, 100, 1_000)]
    public int PersonalUrlsCount { get; set; }

    private ImmutableSortedSet<string> _globalIndex;
    private List<string> _personalUrls;

    [GlobalSetup]
    public void Setup()
    {
        // Build the big global index
        var allUrls = Enumerable.Range(0, GlobalSetSize)
                                .Select(i => $"https://site{i}.example");
        _globalIndex = ImmutableSortedSet.CreateRange(
            StringComparer.OrdinalIgnoreCase,
            allUrls);

        // Build personal URL list of the requested length
        _personalUrls = Enumerable.Range(0, PersonalUrlsCount)
                                  .Select(i => $"https://personal.site{i}.example")
                                  .ToList();
    }

    [Benchmark(Baseline = true)]
    public HashSet<string> NewHashSetCopy()
    {
        // Inefficient: copy all into a new HashSet
        var copy = new HashSet<string>(_globalIndex, StringComparer.OrdinalIgnoreCase);
        foreach (var url in _personalUrls)
            copy.Add(url);
        // Remove a middle element to simulate deletion
        copy.Remove($"https://site{GlobalSetSize / 2}.example");
        return copy;
    }

    [Benchmark]
    public ImmutableSortedSet<string> UsingBuilder()
    {
        // Efficient: use the builder
        var builder = _globalIndex.ToBuilder();
        foreach (var url in _personalUrls)
            builder.Add(url);
        builder.Remove($"https://site{GlobalSetSize / 2}.example");
        return builder.ToImmutable();
    }
}

public class Program
{
    public static void Main(string[] args)
        => BenchmarkRunner.Run<ImmutableSortedSetBenchmark>();
}

And the results:

| Method         | GlobalSetSize | PersonalUrlsCount | Mean            | Error           | StdDev          | Ratio | RatioSD | Gen0     | Gen1     | Gen2     | Allocated   | Alloc Ratio |
|--------------- |-------------- |------------------ |----------------:|----------------:|----------------:|------:|--------:|---------:|---------:|---------:|------------:|------------:|
| NewHashSetCopy | 100000        | 2                 |  4,216,144.3 ns |    62,031.32 ns |    58,024.13 ns | 1.000 |    0.02 | 164.0625 | 164.0625 | 164.0625 |  2123.24 KB |       1.000 |
| UsingBuilder   | 100000        | 2                 |        959.0 ns |         8.15 ns |         7.62 ns | 0.000 |    0.00 |   0.1068 |        - |        - |     1.65 KB |       0.001 |
|                |               |                   |                 |                 |                 |       |         |          |          |          |             |             |
| NewHashSetCopy | 100000        | 100               |  4,287,246.0 ns |    20,877.69 ns |    17,433.81 ns | 1.000 |    0.01 | 164.0625 | 164.0625 | 164.0625 |   2123.2 KB |       1.000 |
| UsingBuilder   | 100000        | 100               |     27,072.2 ns |       410.66 ns |       384.13 ns | 0.006 |    0.00 |   0.3967 |        - |        - |     6.24 KB |       0.003 |
|                |               |                   |                 |                 |                 |       |         |          |          |          |             |             |
| NewHashSetCopy | 100000        | 1000              |  4,347,954.4 ns |    55,342.24 ns |    51,767.16 ns |  1.00 |    0.02 | 164.0625 | 164.0625 | 164.0625 |   2123.2 KB |        1.00 |
| UsingBuilder   | 100000        | 1000              |    302,560.5 ns |     6,007.40 ns |     5,900.07 ns |  0.07 |    0.00 |   2.9297 |        - |        - |    48.43 KB |        0.02 |
|                |               |                   |                 |                 |                 |       |         |          |          |          |             |             |
| NewHashSetCopy | 500000        | 2                 | 26,402,802.7 ns |   245,101.09 ns |   217,275.73 ns | 1.000 |    0.01 | 187.5000 | 187.5000 | 187.5000 | 10951.53 KB |       1.000 |
| UsingBuilder   | 500000        | 2                 |      1,055.0 ns |        17.30 ns |        15.33 ns | 0.000 |    0.00 |   0.1163 |        - |        - |      1.8 KB |       0.000 |
|                |               |                   |                 |                 |                 |       |         |          |          |          |             |             |
| NewHashSetCopy | 500000        | 100               | 26,269,299.8 ns |   209,670.02 ns |   175,083.96 ns | 1.000 |    0.01 | 187.5000 | 187.5000 | 187.5000 | 10951.53 KB |       1.000 |
| UsingBuilder   | 500000        | 100               |     30,507.7 ns |       557.29 ns |       521.29 ns | 0.001 |    0.00 |   0.3967 |        - |        - |     6.39 KB |       0.001 |
|                |               |                   |                 |                 |                 |       |         |          |          |          |             |             |
| NewHashSetCopy | 500000        | 1000              | 26,866,566.1 ns |   310,470.73 ns |   275,224.22 ns |  1.00 |    0.01 | 187.5000 | 187.5000 | 187.5000 | 10951.53 KB |       1.000 |
| UsingBuilder   | 500000        | 1000              |    338,714.7 ns |     6,720.05 ns |     7,469.32 ns |  0.01 |    0.00 |   2.9297 |        - |        - |    48.58 KB |       0.004 |
|                |               |                   |                 |                 |                 |       |         |          |          |          |             |             |
| NewHashSetCopy | 1000000       | 2                 | 55,139,674.3 ns | 1,059,349.74 ns | 1,133,492.36 ns | 1.000 |    0.03 | 250.0000 | 250.0000 | 250.0000 | 22710.11 KB |       1.000 |
| UsingBuilder   | 1000000       | 2                 |      1,118.1 ns |        14.74 ns |        13.07 ns | 0.000 |    0.00 |   0.1259 |        - |        - |     1.94 KB |       0.000 |
|                |               |                   |                 |                 |                 |       |         |          |          |          |             |             |
| NewHashSetCopy | 1000000       | 100               | 55,275,844.2 ns | 1,012,249.49 ns |   946,858.80 ns | 1.000 |    0.02 | 250.0000 | 250.0000 | 250.0000 | 22710.11 KB |       1.000 |
| UsingBuilder   | 1000000       | 100               |     34,245.7 ns |       634.77 ns |       593.77 ns | 0.001 |    0.00 |   0.3662 |        - |        - |     6.53 KB |       0.000 |
|                |               |                   |                 |                 |                 |       |         |          |          |          |             |             |
| NewHashSetCopy | 1000000       | 1000              | 54,216,472.9 ns |   454,195.05 ns |   354,605.62 ns | 1.000 |    0.01 | 250.0000 | 250.0000 | 250.0000 | 22710.11 KB |       1.000 |
| UsingBuilder   | 1000000       | 1000              |    351,159.0 ns |     3,297.98 ns |     2,923.57 ns | 0.006 |    0.00 |   2.9297 |        - |        - |    48.72 KB |       0.002 |

As we would expect, it is much more efficient to build the immutable sets rather than copy all the elements. Both the allocated memory and the CPU time required were two orders of magnitude smaller when building incrementally.

Keeping the history of a set

Another benefit of the incremental way in which immutable sorted sets are built is the fact that we can efficiently keep historical snapshots of the set efficiently. Each snapshot will share memory with the sets from which they were created.

static void StoringSnapshots()
{
    List<ImmutableSortedSet<string>> snapshots = new();
    // Initial dataset
    var version1 = ImmutableSortedSet.Create("Alice", "Bob", "Charlie");
    snapshots.Add(version1);

    // Create a new version by adding a user
    var version2 = version1.Add("Diana");
    snapshots.Add(version2);

    var version3 = version2.Remove("Bob");
    snapshots.Add(version3);

    // Snapshots will only take extra memory for the new nodes
    return snapshots;
}

When not to use an immutable sorted set

Conclusion

Immutable collections in .NET often seem exotic until you run into a workload that makes mutability itself the bottleneck. The immutable sorted set shines precisely in those “shared but changing” scenarios. Because every update is expressed as a new value that structurally shares with its predecessor, you get three concrete wins: