Using ImmutableSortedSet in C# for memory sharing

Intro
Immutable data structures in C# have always intrigued me: they come with many interesting properties, yet I very rarely find a situation where they would be appropriate to use. Finding one of those situations is exciting enough to motivate me to write an article about it!
What are immutable data structures?
Immutable data structures contain data that can only be set upon creation and cannot be mutated afterwards. In the C# standard library, they are implemented as objects that expose no mutating properties, members or methods. If a change needs to be made to an immutable data structure, a new object is created with the change applied and the previous object remains untouched. Let’s look at the ImmutableSortedSet collection to understand how it works in depth and how we can use it to optimize memory sharing.
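To make that concrete, here is a minimal sketch using ImmutableList from the same System.Collections.Immutable namespace (the specific type is just for illustration): an "update" hands back a brand-new instance and leaves the original exactly as it was.
using System;
using System.Collections.Immutable;

var numbers = ImmutableList.Create(1, 2, 3);
var moreNumbers = numbers.Add(4);      // returns a new list instead of mutating

Console.WriteLine(numbers.Count);      // 3 - the original is untouched
Console.WriteLine(moreNumbers.Count);  // 4 - the "modified" copy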
The Immutable Sorted Set
On the surface, the immutable sorted set is a data structure that is very close in capabilities to the sorted set collection. It represents a set where members can be:
- Added, removed and queried for membership with a logarithmic time complexity
- Retrieved in a sorted order
The immutability comes with some extra constraints:
- The elements can only be added on the creation of the set
- Adding/removing elements requires the creation of a new set
- Reviewers of your PR will likely need to lookup the documentation to figure out what this structure is and then ask you why you are not just using a hashset
How is an immutable sorted set created?
Creating a set is mostly done with the builder pattern. This gives you a mutable, non-thread-safe object on which you can make modifications to build up the data inside your structure. You get the actual immutable structure by invoking the ToImmutable method on the builder.
public static ImmutableSortedSet<string> CreateFruitsSet()
{
    // Create a builder
    var builder = ImmutableSortedSet.CreateBuilder<string>();

    // Add elements while mutable
    builder.Add("apple");
    builder.Add("banana");
    builder.Add("cherry");

    // Finalize the set - now it's immutable
    var fruitsSet = builder.ToImmutable();
    return fruitsSet;
}
How do I make changes to it?
When we add an element, the old fruit set is not modified; instead, we get a new set which also contains the new element. It works the same way for any mutating operation.
ImmutableSortedSet<string> newFruitsSet = fruitsSet.Add("new fruit");
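Assuming the fruitsSet returned by CreateFruitsSet above, a quick check confirms that the original instance never changes:
Console.WriteLine(fruitsSet.Contains("new fruit"));     // False - the original is untouched
Console.WriteLine(newFruitsSet.Contains("new fruit"));  // True  - only the new set has it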
Structural Sharing
The pattern above looks scary if you care about performance. If I have a set with 1M elements, do I need to re-create the entirety of it on every single modification? For n elements, that would explode the time complexity of insertion to O(n log n) and the memory complexity to O(n²).
Fortunately, unlike the string type, which is also immutable, the answer here is no. The immutable sorted set uses a pattern called structural sharing to avoid this.
Structural sharing refers to designing the data structure in such a way that when a new instance is created by modifying an existing one, the new instance shares as much memory as possible with the original instance. To understand this properly, let’s visit the details of the immutable sorted set and see how immutability makes it possible.
ImmutableSortedSet - Behind the scenes
If we look at the source code for the implementation of the ImmutableSortedSet, we find that it is backed by an AVL tree, a self-balancing binary search tree. That explains the SortedSet part of the name. The Immutable part is the magic that allows for structural sharing.
Let’s take an insertion into an AVL tree of integers as an example. The sapling below has an ambition to grow:
      20
     /  \
   10    30
  /  \   / \
 5   15 25  35
Let’s suppose we add the element ‘12’ to it. We will get the new tree:
      20'
     /  \
   10'   30
  /  \   / \
 5   15' 25  35
     /
   12'
Let’s compare the two trees:
- The new tree has to have a new root (the original root is immutable so it cannot be updated)
- For every node on the path to the new node, a new node has to be created (for the same reason as above)
- The right sub-tree (30, 25, 35) and the node for “5” are entirely unchanged
Since we know that the nodes from the original tree are immutable, all the unchanged sub-trees can be re-used! So in the end, the asymptotic complexity of modifications is the same as for the mutable sorted set.
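The actual node type inside System.Collections.Immutable is more involved (it tracks heights and performs AVL rotations), but a stripped-down sketch of the path-copying idea might look like the class below. Everything here (the PersistentNode name, the missing balancing logic) is illustrative, not the real implementation.
// Illustrative only: an unbalanced persistent BST that demonstrates path copying.
// The real ImmutableSortedSet node also stores heights and rebalances (AVL).
public sealed class PersistentNode
{
    public int Value { get; }
    public PersistentNode? Left { get; }
    public PersistentNode? Right { get; }

    public PersistentNode(int value, PersistentNode? left, PersistentNode? right)
    {
        Value = value;
        Left = left;
        Right = right;
    }

    // Returns the root of a NEW tree that contains `value`.
    // Only the nodes on the path from the root to the insertion point are copied;
    // every untouched subtree is reused by reference.
    public static PersistentNode Insert(PersistentNode? node, int value)
    {
        if (node is null)
            return new PersistentNode(value, null, null);

        if (value < node.Value)
            return new PersistentNode(node.Value, Insert(node.Left, value), node.Right);
        if (value > node.Value)
            return new PersistentNode(node.Value, node.Left, Insert(node.Right, value));

        return node; // already present: nothing to copy at all
    }
}
With this sketch, inserting 12 into the example tree allocates exactly four new nodes (20', 10', 15' and 12'), while the subtree rooted at 30 and the node for 5 are the very same objects in both versions.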
Uses
Batch Operations
When a mutable collection needs to be shared between multiple threads, there are usually two options to maintain developer sanity.
- Put it under a lock to serialize all operations. But a coarse lock can become a bottleneck if it is in the hot path and there’s contention.
- Use a thread-safe collection. But then you spend a long time pondering all the different ways in which its state may become unpredictable.
One of the more challenging issues that arises in concurrent code is performing batch operations atomically. For example, in C# we can use the ConcurrentDictionary collection as a thread-safe set to avoid using a lock.
However, it only allows us to add/remove/update one element at a time. If we need to batch these operations together, we either need to use a single lock for both writes and reads (which defeats the purpose of the concurrent dictionary), or we need to create a new dictionary, copy all the elements and atomically swap it in. The immutable sorted set is great for the latter scenario because creating the new copy only allocates nodes for the changed elements.
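As a sketch of that swap pattern (the _shared field and the ApplyBatch method are made-up names for the example), an immutable set can be published atomically with a compare-and-swap retry loop:
using System;
using System.Collections.Generic;
using System.Collections.Immutable;
using System.Threading;

public static class AtomicBatchExample
{
    // The shared set. Readers copy the reference and get a stable snapshot.
    private static ImmutableSortedSet<string> _shared = ImmutableSortedSet<string>.Empty;

    // Applies a whole batch of additions and removals as one atomic swap.
    public static void ApplyBatch(IEnumerable<string> toAdd, IEnumerable<string> toRemove)
    {
        while (true)
        {
            var current = Volatile.Read(ref _shared);

            // Build the new version off to the side; nobody can observe it half-done.
            var builder = current.ToBuilder();
            foreach (var item in toAdd) builder.Add(item);
            foreach (var item in toRemove) builder.Remove(item);
            var updated = builder.ToImmutable();

            // Publish atomically; retry if another thread swapped in the meantime.
            if (Interlocked.CompareExchange(ref _shared, updated, current) == current)
                return;
        }
    }

    public static bool Contains(string item) => Volatile.Read(ref _shared).Contains(item);
}
System.Collections.Immutable also ships an ImmutableInterlocked helper that wraps this kind of retry loop for you.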
Shared global state
When designing systems with multiple concurrent consumers, it is often the case that the state in the system consists of two parts: a global shared state and isolated state per consumer. As an example, let’s consider a search engine serving different users. The application maintains a global set of indexed URLs that all users can query. There are two kinds of URLs: standard URLs, which are part of the global index available to everyone, and user-submitted URLs, which users can choose to add to their personal search experience. Let’s also assume that the system represents this by keeping a sorted set of URLs that every user can access, with the option for users to supplement their own view separately.
The most naive implementation of a user view would be to create a new set containing all the URLs they want to have visible. One way to optimize this scenario is to use the ImmutableSortedSet type. If we store the global URLs in it, we can create views with more or fewer elements that share memory with the original set.
ImmutableSortedSet<string> MakeUrlView()
{
    //------------------------------------------------------------
    // Imagine a search engine that has indexed millions of URLs.
    // We store them in an immutable sorted set.
    //------------------------------------------------------------
    const int totalUrls = 1_000_000;
    var urls = Enumerable.Range(0, totalUrls)
        .Select(i => $"https://site{i}.example");

    var globalIndex = ImmutableSortedSet.CreateRange(
        StringComparer.OrdinalIgnoreCase, urls);

    Console.WriteLine($"Global index contains {globalIndex.Count:N0} URLs.\n");

    //------------------------------------------------------------
    // Now a user wants a personalized view:
    //   - Add some of their favorite sites.
    //   - Hide one of the global sites they don't want to see.
    //
    // Instead of copying a million URLs, we'll use a Builder.
    //------------------------------------------------------------
    var userViewBuilder = globalIndex.ToBuilder();
    userViewBuilder.Add("https://personal.dev/portfolio");
    userViewBuilder.Add("https://blog.user123.net");
    userViewBuilder.Remove("https://site500000.example"); // Hide one site

    // Re-uses memory from the original
    return userViewBuilder.ToImmutable();
}
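A quick sanity check on the view returned above: it contains the million global URLs minus the one that was hidden, plus the two personal ones.
var view = MakeUrlView();
Console.WriteLine(view.Count.ToString("N0")); // 1,000,001 = 1,000,000 - 1 hidden + 2 personal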
Benchmarking immutable set vs copying the entire set
Let’s have O3 write us a benchmark in BenchmarkDotNet so that we can check how much we gain by using the immutable set compared to copying the entire set. The benchmark will try out different sizes of the initial shared set and a different number of addition operations.
// Requires:
// - BenchmarkDotNet (NuGet package)
// - System.Collections.Immutable (NuGet package)
using System;
using System.Collections.Generic;
using System.Collections.Immutable;
using System.Linq;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

[MemoryDiagnoser]
public class ImmutableSortedSetBenchmark
{
    // Allow tuning the size of the global set
    [Params(100_000, 500_000, 1_000_000)]
    public int GlobalSetSize { get; set; }

    // Allow tuning how many personal URLs to add
    [Params(2, 100, 1_000)]
    public int PersonalUrlsCount { get; set; }

    private ImmutableSortedSet<string> _globalIndex;
    private List<string> _personalUrls;

    [GlobalSetup]
    public void Setup()
    {
        // Build the big global index
        var allUrls = Enumerable.Range(0, GlobalSetSize)
            .Select(i => $"https://site{i}.example");

        _globalIndex = ImmutableSortedSet.CreateRange(
            StringComparer.OrdinalIgnoreCase,
            allUrls);

        // Build personal URL list of the requested length
        _personalUrls = Enumerable.Range(0, PersonalUrlsCount)
            .Select(i => $"https://personal.site{i}.example")
            .ToList();
    }

    [Benchmark(Baseline = true)]
    public HashSet<string> NewHashSetCopy()
    {
        // Inefficient: copy all into a new HashSet
        var copy = new HashSet<string>(_globalIndex, StringComparer.OrdinalIgnoreCase);

        foreach (var url in _personalUrls)
            copy.Add(url);

        // Remove a middle element to simulate deletion
        copy.Remove($"https://site{GlobalSetSize / 2}.example");

        return copy;
    }

    [Benchmark]
    public ImmutableSortedSet<string> UsingBuilder()
    {
        // Efficient: use the builder
        var builder = _globalIndex.ToBuilder();

        foreach (var url in _personalUrls)
            builder.Add(url);

        builder.Remove($"https://site{GlobalSetSize / 2}.example");

        return builder.ToImmutable();
    }
}

public class Program
{
    public static void Main(string[] args)
        => BenchmarkRunner.Run<ImmutableSortedSetBenchmark>();
}
And the results:
| Method | GlobalSetSize | PersonalUrlsCount | Mean | Error | StdDev | Ratio | RatioSD | Gen0 | Gen1 | Gen2 | Allocated | Alloc Ratio |
|--------------- |-------------- |------------------ |----------------:|----------------:|----------------:|------:|--------:|---------:|---------:|---------:|------------:|------------:|
| NewHashSetCopy | 100000 | 2 | 4,216,144.3 ns | 62,031.32 ns | 58,024.13 ns | 1.000 | 0.02 | 164.0625 | 164.0625 | 164.0625 | 2123.24 KB | 1.000 |
| UsingBuilder | 100000 | 2 | 959.0 ns | 8.15 ns | 7.62 ns | 0.000 | 0.00 | 0.1068 | - | - | 1.65 KB | 0.001 |
| | | | | | | | | | | | | |
| NewHashSetCopy | 100000 | 100 | 4,287,246.0 ns | 20,877.69 ns | 17,433.81 ns | 1.000 | 0.01 | 164.0625 | 164.0625 | 164.0625 | 2123.2 KB | 1.000 |
| UsingBuilder | 100000 | 100 | 27,072.2 ns | 410.66 ns | 384.13 ns | 0.006 | 0.00 | 0.3967 | - | - | 6.24 KB | 0.003 |
| | | | | | | | | | | | | |
| NewHashSetCopy | 100000 | 1000 | 4,347,954.4 ns | 55,342.24 ns | 51,767.16 ns | 1.00 | 0.02 | 164.0625 | 164.0625 | 164.0625 | 2123.2 KB | 1.00 |
| UsingBuilder | 100000 | 1000 | 302,560.5 ns | 6,007.40 ns | 5,900.07 ns | 0.07 | 0.00 | 2.9297 | - | - | 48.43 KB | 0.02 |
| | | | | | | | | | | | | |
| NewHashSetCopy | 500000 | 2 | 26,402,802.7 ns | 245,101.09 ns | 217,275.73 ns | 1.000 | 0.01 | 187.5000 | 187.5000 | 187.5000 | 10951.53 KB | 1.000 |
| UsingBuilder | 500000 | 2 | 1,055.0 ns | 17.30 ns | 15.33 ns | 0.000 | 0.00 | 0.1163 | - | - | 1.8 KB | 0.000 |
| | | | | | | | | | | | | |
| NewHashSetCopy | 500000 | 100 | 26,269,299.8 ns | 209,670.02 ns | 175,083.96 ns | 1.000 | 0.01 | 187.5000 | 187.5000 | 187.5000 | 10951.53 KB | 1.000 |
| UsingBuilder | 500000 | 100 | 30,507.7 ns | 557.29 ns | 521.29 ns | 0.001 | 0.00 | 0.3967 | - | - | 6.39 KB | 0.001 |
| | | | | | | | | | | | | |
| NewHashSetCopy | 500000 | 1000 | 26,866,566.1 ns | 310,470.73 ns | 275,224.22 ns | 1.00 | 0.01 | 187.5000 | 187.5000 | 187.5000 | 10951.53 KB | 1.000 |
| UsingBuilder | 500000 | 1000 | 338,714.7 ns | 6,720.05 ns | 7,469.32 ns | 0.01 | 0.00 | 2.9297 | - | - | 48.58 KB | 0.004 |
| | | | | | | | | | | | | |
| NewHashSetCopy | 1000000 | 2 | 55,139,674.3 ns | 1,059,349.74 ns | 1,133,492.36 ns | 1.000 | 0.03 | 250.0000 | 250.0000 | 250.0000 | 22710.11 KB | 1.000 |
| UsingBuilder | 1000000 | 2 | 1,118.1 ns | 14.74 ns | 13.07 ns | 0.000 | 0.00 | 0.1259 | - | - | 1.94 KB | 0.000 |
| | | | | | | | | | | | | |
| NewHashSetCopy | 1000000 | 100 | 55,275,844.2 ns | 1,012,249.49 ns | 946,858.80 ns | 1.000 | 0.02 | 250.0000 | 250.0000 | 250.0000 | 22710.11 KB | 1.000 |
| UsingBuilder | 1000000 | 100 | 34,245.7 ns | 634.77 ns | 593.77 ns | 0.001 | 0.00 | 0.3662 | - | - | 6.53 KB | 0.000 |
| | | | | | | | | | | | | |
| NewHashSetCopy | 1000000 | 1000 | 54,216,472.9 ns | 454,195.05 ns | 354,605.62 ns | 1.000 | 0.01 | 250.0000 | 250.0000 | 250.0000 | 22710.11 KB | 1.000 |
| UsingBuilder | 1000000 | 1000 | 351,159.0 ns | 3,297.98 ns | 2,923.57 ns | 0.006 | 0.00 | 2.9297 | - | - | 48.72 KB | 0.002 |
As we would expect, it is much more efficient to build the immutable sets than to copy all the elements. Both the allocated memory and the CPU time were smaller by anywhere from one to more than four orders of magnitude, with the advantage growing as the number of added elements shrinks relative to the size of the shared set.
Keeping the history of a set
Another benefit of the incremental way in which immutable sorted sets are built is that we can efficiently keep historical snapshots of the set. Each snapshot shares memory with the sets from which it was created.
static List<ImmutableSortedSet<string>> StoringSnapshots()
{
    List<ImmutableSortedSet<string>> snapshots = new();

    // Initial dataset
    var version1 = ImmutableSortedSet.Create("Alice", "Bob", "Charlie");
    snapshots.Add(version1);

    // Create a new version by adding a user
    var version2 = version1.Add("Diana");
    snapshots.Add(version2);

    // Create a third version by removing a user
    var version3 = version2.Remove("Bob");
    snapshots.Add(version3);

    // Snapshots will only take extra memory for the new nodes
    return snapshots;
}
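Each snapshot stays valid and fully usable on its own. Iterating over the list returned above prints the three versions side by side:
foreach (var snapshot in StoringSnapshots())
    Console.WriteLine(string.Join(", ", snapshot));

// Output:
// Alice, Bob, Charlie
// Alice, Bob, Charlie, Diana
// Alice, Charlie, Diana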
When not to use an immutable sorted set
- An AVL tree lookup has extra overhead compared to a hashset lookup because the binary tree has to be traversed to find the correct node. Especially if the stored values have an expensive comparer, this cost has to be paid at every node on the path from the root to the destination. On top of that, data stored in an AVL tree will likely have much worse cache locality than data stored in a hashset.
- If the data you are storing is not shared or persisted, then immutability adds overhead without giving any advantages.
- If the data changes very infrequently, an immutable set may ironically not be worth it. The AVL tree structure comes with extra memory overhead because every node has to keep pointers to its children, and this overhead needs to be weighed against how much you actually save when making changes to the sets.
Conclusion
Immutable collections in .NET often seem exotic until you run into a workload that makes mutability itself the bottleneck. The immutable sorted set shines precisely in those “shared but changing” scenarios. Because every update is expressed as a new value that structurally shares with its predecessor, you get three concrete wins:
- Predictability under concurrency – readers always work with a stable snapshot, so you can drop most locks.
- Much lower copy-cost – the AVL-tree nodes that don’t change are literally the same objects, keeping allocation and GC pressure tiny even when you mutate frequently.
- Free history – snapshots come “for free,” enabling time-travel debugging, audit logs, or user-specific views without full duplication.