In my last post I mentioned that interoperability can be improved just by paying some attention and by using the newly available APIs.
I will start with writing native memory, avoiding any measurement that may involve the memory allocator. The primitive used here is Marshal.AllocCoTaskMem, but it could just as well be memory allocated by a native library.
The goal is to compare three different ways to write native memory using the awesome BenchmarkDotNet library. The unsafe pointers are created in the constructor to keep the "ToPointer" call, and its potential impact, out of the measurements.
using System;
using System.Runtime.InteropServices;
using BenchmarkDotNet.Attributes;

public class TestProducer : IDisposable
{
    public const int TotalSize = 1024 * 1024 * 1024;   // 1 GiB of native memory
    private int Elements = TotalSize / 8;               // number of Int64 slots
    public IntPtr Shared { get; }
    private unsafe Int64* pu;
    private unsafe Int64* ps;

    public TestProducer()
    {
        Shared = Marshal.AllocCoTaskMem(TotalSize);
        unsafe { pu = (Int64*)Shared.ToPointer(); }
        unsafe { ps = (Int64*)Shared.ToPointer(); }
    }

    // ...

    public void Dispose() => Marshal.FreeCoTaskMem(Shared);
}
The first and most obvious test is the one using the main .NET APIs: the Marshal.WriteXXX methods, writing the memory in a loop:
[Benchmark]
public void TestProducerManaged()
{
    var pm = Shared;
    for (int i = 0; i < Elements; i++)
    {
        Marshal.WriteInt64(pm, 5);
        pm += 8;
    }
}
In that loop, an arbitrary Int64 value (the integer 5) is written directly to memory, and the address, held in an IntPtr, is advanced manually by 8 bytes on each iteration. I kept the loop this bare because I really wanted to measure just the WriteInt64 method and nothing else.
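As a side note, the same managed loop can also be written without touching the address at all, using the Marshal.WriteInt64 overload that takes a byte offset. The sketch below is not part of the benchmark; the class name OffsetWriteSketch, the tiny 4-element buffer, and the Demo helper are assumptions made only for illustration.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper, not part of the benchmarked classes.
public static class OffsetWriteSketch
{
    // Writes 'value' into 'count' consecutive Int64 slots starting at 'buffer',
    // using the (IntPtr, int offset, long) overload instead of moving the pointer.
    public static void Fill(IntPtr buffer, int count, long value)
    {
        for (int i = 0; i < count; i++)
        {
            Marshal.WriteInt64(buffer, i * sizeof(long), value);
        }
    }

    // Allocates a small native buffer, fills it, and copies it back to a managed array.
    public static long[] Demo()
    {
        const int count = 4;
        IntPtr buffer = Marshal.AllocCoTaskMem(count * sizeof(long));
        try
        {
            Fill(buffer, count, 5);
            var copy = new long[count];
            Marshal.Copy(buffer, copy, 0, count);
            return copy;
        }
        finally
        {
            Marshal.FreeCoTaskMem(buffer);
        }
    }
}
```

The offset parameter is an int, so this form only reaches the first 2 GiB of a buffer, which is enough for the 1 GiB block used here.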
Another obvious strategy to write the native memory is doing everything unsafe, which is of course the most performant way to do it.
[Benchmark]
public void TestProducerUnsafe()
{
    unsafe
    {
        var p = pu;
        for (int i = 0; i < Elements; i++)
        {
            *p = 9;
            p++;
        }
    }
}
Again, a very simple loop to update the memory. The benchmark runs as a 64-bit process and both loops store one Int64 per iteration, so the two executions do equivalent work.
Finally, the third and last way: using the Span<T> ref struct:
[Benchmark]
public void TestProducerSpan()
{
    unsafe
    {
        var span = new Span<Int64>(ps, Elements);
        span.Fill(100);
    }
}
This method has the great advantage of being very readable and, while it still needs to be inside an unsafe region because of the raw pointer, it is very neat.
Those tests are very simple, but the outcome of the benchmark tells a very important story:
|              Method |      Mean |    Error |   StdDev |
|-------------------- |----------:|---------:|---------:|
| TestProducerManaged | 300.87 ms | 3.445 ms | 3.223 ms |
|  TestProducerUnsafe |  71.03 ms | 1.411 ms | 1.320 ms |
|    TestProducerSpan |  71.48 ms | 1.367 ms | 1.404 ms |
Lesson number 1 is to stay away from the Marshal.WriteXXX methods in hot loops. If you only occasionally write a value, it's not a big deal, but as soon as the operation is repeated many times, you definitely don't want to run more than four times slower.
Lesson number 2 is that we can keep unsafe code to a minimum even when reading and writing native memory. The Span solution is more elegant and ensures we don't randomly poke unallocated memory, provided we pass the right data to the Span constructor, namely the correct unsafe pointer and its length.
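To make the "no random poking" point concrete, here is a minimal sketch, assuming a small 16-element buffer and a hypothetical class named SpanBoundsSketch. It relies on the fact that the Span<T> indexer checks the index against the length given to the constructor and throws IndexOutOfRangeException, where a raw pointer would silently write past the end. Like the benchmarks, it must be compiled with unsafe blocks enabled.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper, for illustration only.
public static class SpanBoundsSketch
{
    // Returns true when writing one element past the end is rejected by the span.
    public static bool OutOfRangeIsRejected()
    {
        const int count = 16;
        IntPtr buffer = Marshal.AllocCoTaskMem(count * sizeof(long));
        try
        {
            Span<long> span;
            unsafe
            {
                // The only unsafe step: telling the span where the memory is and how long it is.
                span = new Span<long>(buffer.ToPointer(), count);
            }
            span.Fill(100);
            try
            {
                span[count] = 1;   // index equal to Length is out of range
                return false;
            }
            catch (IndexOutOfRangeException)
            {
                return true;
            }
        }
        finally
        {
            Marshal.FreeCoTaskMem(buffer);
        }
    }
}
```

The check is only as good as the length passed to the constructor: a wrong length makes every later access wrong in the same way.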
The Span<T> API is important not only for pure managed code, where it lets us avoid memory copies, but also with native memory. The power of "ref structs" like Span<T> is that they are guaranteed to live on the stack, never impacting the heap/GC. The best way to think of Span<T> is as a view over a contiguous region of memory. In other words, it does not allocate or take any ownership of the memory being observed.
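The "view" idea can be illustrated with Slice, which returns another span over the same memory without copying anything. This is only a sketch: the class name SpanViewSketch and the 8-element buffer are assumptions, and unsafe blocks must be enabled to compile it.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper, for illustration only.
public static class SpanViewSketch
{
    // Fills the second half of a native buffer through a sliced span and
    // returns a managed copy of the whole buffer to show the write went through.
    public static long[] Demo()
    {
        const int count = 8;
        IntPtr buffer = Marshal.AllocCoTaskMem(count * sizeof(long));
        try
        {
            Span<long> all;
            unsafe
            {
                all = new Span<long>(buffer.ToPointer(), count);
            }
            all.Clear();                            // zero the native memory
            Span<long> tail = all.Slice(count / 2); // a view over the last 4 elements, no copy
            tail.Fill(7);
            return all.ToArray();                   // the only copy, made to return the result
        }
        finally
        {
            Marshal.FreeCoTaskMem(buffer);
        }
    }
}
```

Demo returns four zeros followed by four sevens: the slice and the original span observe the same native memory.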
When Span<T> is used with managed objects, the compiler and the runtime do their best to prevent the memory from going away before you access it via Span<T>. With native memory, however, there is still the risk that the memory is deallocated too early: after all, we created the Span from a native pointer inside an unsafe region. Anyway, the risks are minimal in my opinion, and smaller than those of pure unsafe code.
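One possible way to reduce that risk is to keep the pointer private to an owner object and hand out spans only while the memory is alive. The class below is a sketch under that assumption; the name NativeBufferSketch and its members are hypothetical and not an API from .NET or from the benchmark code. It needs unsafe blocks enabled.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical owner of a native Int64 buffer, for illustration only.
public sealed class NativeBufferSketch : IDisposable
{
    private IntPtr _memory;
    private readonly int _elements;

    public NativeBufferSketch(int elements)
    {
        _elements = elements;
        _memory = Marshal.AllocCoTaskMem(checked(elements * sizeof(long)));
    }

    // Returns a view over the native memory, refusing to do so after Dispose.
    public Span<long> AsSpan()
    {
        if (_memory == IntPtr.Zero)
        {
            throw new ObjectDisposedException(nameof(NativeBufferSketch));
        }
        unsafe
        {
            return new Span<long>(_memory.ToPointer(), _elements);
        }
    }

    public void Dispose()
    {
        if (_memory != IntPtr.Zero)
        {
            Marshal.FreeCoTaskMem(_memory);
            _memory = IntPtr.Zero;   // later AsSpan calls now fail fast
        }
    }
}
```

This does not protect a span obtained before Dispose and used afterwards; it only narrows the window in which a stale pointer can be turned into a span.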
A slightly different test is the one where you want to both read and write the native memory.
public class TestReadWrite : IDisposable
{
    public const int TotalSize = 1024 * 1024 * 1024;
    private int Elements = TotalSize / 8;
    public IntPtr Shared { get; }
    public unsafe long* ptr;

    public TestReadWrite()
    {
        Shared = Marshal.AllocCoTaskMem(TotalSize);
        unsafe { ptr = (long*)Shared.ToPointer(); }
    }

    public void Dispose()
    {
        Marshal.FreeCoTaskMem(Shared);
    }

    //...
}
The setup of the test is very similar, so let's go straight to the managed code, which has nothing really special, just the ReadInt64 method before writing.
[Benchmark]
public void TestManaged()
{
    var pm = Shared;
    for (int i = 0; i < Elements; i++)
    {
        var value = Marshal.ReadInt64(pm);
        Marshal.WriteInt64(pm, value + 1);
        pm += 8;
    }
}
The pure unsafe code is also straightforward.
[Benchmark]
public void TestUnsafe()
{
    unsafe
    {
        var pm = ptr;
        for (int i = 0; i < Elements; i++)
        {
            var value = *pm;
            *pm = value + 1;
            pm++;
        }
    }
}
And finally the code using Span<T> where the "++" operator does both read and write.
[Benchmark]
public void TestSpan()
{
    Span<long> span;
    unsafe
    {
        span = new Span<Int64>(ptr, Elements);
    }
    for (int i = 0; i < span.Length; i++)
    {
        span[i]++;
    }
}
In a more realistic scenario, you may want to get the value out of the Span and use it before writing it back to the same location. Indexing the span twice may give you a small perf hit; the solution is to use a ref local:
[Benchmark]
public void TestSpanRef()
{
    Span<long> span;
    unsafe
    {
        span = new Span<Int64>(ptr, Elements);
    }
    for (int i = 0; i < span.Length; i++)
    {
        ref long value = ref span[i];
        value++;
    }
}
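To show the "use the value before writing it back" case, here is a small sketch where the element is inspected first and updated only when a condition holds. The class name RefLocalSketch and the "multiply even values by 10" rule are assumptions for illustration; the span is built over a managed array so the example runs without unsafe code, but the loop body is the same for a span over native memory.

```csharp
using System;

// Hypothetical helper, for illustration only.
public static class RefLocalSketch
{
    // Multiplies every even element by 10 in place, indexing the span once per element.
    public static void ScaleEvenValues(Span<long> span)
    {
        for (int i = 0; i < span.Length; i++)
        {
            ref long value = ref span[i];   // alias to the element, not a copy
            if (value % 2 == 0)
            {
                value *= 10;                // the write lands in the underlying memory
            }
        }
    }

    public static long[] Demo()
    {
        long[] data = { 1, 2, 3, 4 };
        ScaleEvenValues(data);              // long[] converts implicitly to Span<long>
        return data;
    }
}
```

Demo returns 1, 20, 3, 40: the read and the conditional write both go through the same reference.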
As you may now expect, the performance profile of the read and write test is similar to the write-only one:
|      Method |      Mean |    Error |   StdDev |
|------------ |----------:|---------:|---------:|
| TestManaged | 475.58 ms | 4.925 ms | 4.607 ms |
|  TestUnsafe |  82.88 ms | 1.342 ms | 1.121 ms |
|    TestSpan |  89.39 ms | 1.715 ms | 1.605 ms |
| TestSpanRef |  89.84 ms | 1.702 ms | 1.821 ms |
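The two tables can also be read as throughput, since every benchmark walks the same 1 GiB buffer once. The helper below is just that arithmetic written down; the class name ThroughputSketch and its method names are assumptions for illustration.

```csharp
using System;

// Hypothetical helper, for illustration only.
public static class ThroughputSketch
{
    // GiB of buffer processed per second, given the mean time in milliseconds
    // for one pass over the 1 GiB buffer.
    public static double GiBPerSecond(double meanMilliseconds)
    {
        return 1000.0 / meanMilliseconds;
    }

    // How many times slower the first measurement is compared to the second.
    public static double Slowdown(double slowMilliseconds, double fastMilliseconds)
    {
        return slowMilliseconds / fastMilliseconds;
    }
}
```

With the numbers above, GiBPerSecond(71.03) is roughly 14 GiB/s for the unsafe write loop, and Slowdown(300.87, 71.03) is about 4.2, which is the cost of going through Marshal.WriteInt64 in the write-only test.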
At this time these are good but old tools, because .NET 5 is going to introduce interesting new features for interoperability scenarios. But that is something I want to cover in another post.
Copyright © Raffaele Rialdi, 2009 - 2015