jemalloc.NET: A native memory manager for .NET

jemalloc.NET is a .NET API over the jemalloc native memory allocator and provides .NET applications with efficient data structures backed by native memory for large scale in-memory computation scenarios. jemalloc is "a general purpose malloc(3) implementation that emphasizes fragmentation avoidance and scalable concurrency support" that is widely used in the industry, particularly in applications that must scale and utilize large amounts of memory. In addition to its fragmentation and concurrency optimizations, jemalloc provides an array of developer options for debugging, monitoring and tuning allocations that make it a great choice for use in developing memory-intensive applications.

The jemalloc.NET project provides:

A low-level .NET API over the native jemalloc API functions like je_malloc, je_calloc, je_free, je_mallctl...
A safety-focused high-level .NET API providing data structures like arrays backed by native memory allocated using jemalloc together with management features like reference counting.
A benchmark CLI program, jembench, which uses the excellent BenchmarkDotNet library for easy and accurate benchmarking of operations on native data structures vs. managed objects using different parameters.

Data structures provided by the high-level API are more efficient than managed .NET arrays and objects at the scale of millions of elements, and memory allocation is much more resistant to fragmentation, while still providing necessary safety features like array bounds checking. Large .NET arrays must be allocated on the Large Object Heap, which is not compacted by default, and this leads to fragmentation and lower performance. For example, in the following jembench benchmark on my laptop, simply filling an existing array takes about the same time for managed and native memory and scales linearly with the size of the array, but creating and filling a UInt64[] managed array of 10,000,000 or 100,000,000 elements is more than 2x slower than using an equivalent native array provided by jemalloc.NET:

BenchmarkDotNet=v0.10.11, OS=Windows 10 Redstone 2 [1703, Creators Update] (10.0.15063.726)
Processor=Intel Core i7-6700HQ CPU 2.60GHz (Skylake), ProcessorCount=8
Frequency=2531251 Hz, Resolution=395.0616 ns, Timer=TSC
.NET Core SDK=2.1.2
  [Host] : .NET Core 2.0.3 (Framework 4.6.25815.02), 64bit RyuJIT

Job=JemBenchmark  Jit=RyuJit  Platform=X64  
Runtime=Core  AllowVeryLargeObjects=True  Toolchain=InProcessToolchain  
RunStrategy=Throughput

Method	Parameter	Mean	Error	StdDev	Median	Gen 0	Gen 1	Gen 2	Allocated
'Fill a managed array with a single value.'	10000000	9.059 ms	0.1745 ms	0.4777 ms	8.913 ms	-	-	-	208 B
'Fill a SafeArray on the system unmanaged heap with a single value.'	10000000	8.715 ms	0.1682 ms	0.2466 ms	8.623 ms	-	-	-	208 B
'Create and Fill a managed array with a single value.'	10000000	32.867 ms	0.9156 ms	1.3420 ms	32.175 ms	142.8571	142.8571	142.8571	80000769 B
'Create and Fill a SafeArray on the system unmanaged heap with a single value.'	10000000	13.809 ms	0.2679 ms	0.2506 ms	13.727 ms	-	-	-	192 B
'Fill a managed array with a single value.'	100000000	90.326 ms	1.7718 ms	2.4253 ms	89.468 ms	-	-	-	208 B
'Fill a SafeArray on the system unmanaged heap with a single value.'	100000000	88.377 ms	0.9775 ms	0.8665 ms	88.505 ms	-	-	-	208 B
'Create and Fill a managed array with a single value.'	100000000	310.880 ms	5.9732 ms	8.1762 ms	306.952 ms	125.0000	125.0000	125.0000	800000624 B
'Create and Fill a SafeArray on the system unmanaged heap with a single value.'	100000000	137.288 ms	0.9710 ms	0.9083 ms	137.111 ms	-	-	-	192 B

You can run this benchmark with the command jembench array --fill 10000000 100000000 -l -u. In this case we see that creating the managed array of 10 million elements allocated 80 MB on the managed heap (800 MB for 100 million elements), while using the native array did not cause any allocations on the managed heap for the array data. Avoiding the managed heap for very large but simple data structures like arrays is a key optimization for apps that do large-scale in-memory computation.
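
The allocation figures follow directly from the element size: a UInt64 is 8 bytes, so 10,000,000 elements take 80,000,000 bytes and 100,000,000 elements take 800,000,000 bytes, which matches the Allocated column apart from a few hundred bytes of overhead. The sketch below is not the jemalloc.NET API. It assumes only the standard Marshal.AllocHGlobal and Marshal.FreeHGlobal calls (the system unmanaged heap rather than jemalloc) to illustrate array data living outside the managed heap; the NativeFillSketch class and its method names are hypothetical.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical sketch, not part of jemalloc.NET. It uses the system
// unmanaged heap via Marshal.AllocHGlobal instead of jemalloc.
public static class NativeFillSketch
{
    // Bytes needed to store `count` UInt64 elements (8 bytes each).
    public static long BytesFor(long count) => count * sizeof(ulong);

    // Allocates `count` UInt64 values in native memory, fills them with
    // `value`, sums them as a sanity check, and frees the block.
    public static ulong FillNativeAndSum(int count, ulong value)
    {
        IntPtr ptr = Marshal.AllocHGlobal(new IntPtr(BytesFor(count)));
        try
        {
            for (int i = 0; i < count; i++)
                Marshal.WriteInt64(ptr, i * sizeof(ulong), unchecked((long)value));

            ulong sum = 0;
            for (int i = 0; i < count; i++)
                sum += unchecked((ulong)Marshal.ReadInt64(ptr, i * sizeof(ulong)));
            return sum;
        }
        finally
        {
            // Release the native block; the GC does not track this memory.
            Marshal.FreeHGlobal(ptr);
        }
    }
}
```

Because the element data never lives in a managed object, nothing is added to the Large Object Heap; only the small wrapper objects around the pointer would be visible to the garbage collector.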

Managed .NET arrays are also limited to Int32 indexing and a maximum size of about 2.15 billion elements. jemalloc.NET provides huge arrays through the HugeArray<T> class, which allows you to access all available memory as a flat contiguous buffer using array semantics. The next benchmark was run with jembench hugearray --fill -i 4200000000:

BenchmarkDotNet=v0.10.11, OS=Windows 10 Redstone 2 [1703, Creators Update] (10.0.15063.726)
Processor=Intel Core i7-6700HQ CPU 2.60GHz (Skylake), ProcessorCount=8
Frequency=2531251 Hz, Resolution=395.0616 ns, Timer=TSC
.NET Core SDK=2.1.2
  [Host] : .NET Core 2.0.3 (Framework 4.6.25815.02), 64bit RyuJIT

Job=JemBenchmark  Jit=RyuJit  Platform=X64  
Runtime=Core  AllowVeryLargeObjects=True  Toolchain=InProcessToolchain  
RunStrategy=ColdStart  TargetCount=7  WarmupCount=-1

Method	Parameter	Mean	Error	StdDev	Allocated
'Fill a managed array with the maximum size [2146435071] with a single value.'	4200000000	3.177 s	0.1390 s	0.0617 s	8585740456 B
'Fill a HugeArray on the system unmanaged heap with a single value.'	4200000000	4.029 s	3.2233 s	1.4312 s	0 B

An Int32[] of maximum size can be allocated and filled in 3.2 s. This array consumes 8.6 GB on the managed heap. But a jemalloc.NET HugeArray<Int32> of nearly double the size, at 4.2 billion elements, can be allocated and filled in about 4 s and again consumes no memory on the managed heap. The only limit on the size of a HugeArray<T> is the available system memory.
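
The sizes here can be checked by hand: 2,146,435,071 Int32 elements at 4 bytes each is 8,585,740,284 bytes (about 8.6 GB), and 4,200,000,000 elements would need 16,800,000,000 bytes (16.8 GB). The following is a minimal sketch of how 64-bit indexing over native memory can work. It is not the HugeArray<T> implementation; it assumes the standard Marshal.AllocHGlobal allocator rather than jemalloc, and the LongIndexedInt32Buffer class is a hypothetical name used only for illustration.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical sketch of a natively allocated Int32 buffer indexed by Int64.
// Not the jemalloc.NET HugeArray<T>; it uses the system unmanaged heap.
public sealed class LongIndexedInt32Buffer : IDisposable
{
    private IntPtr _base;

    public long Length { get; }

    public LongIndexedInt32Buffer(long length)
    {
        if (length < 0) throw new ArgumentOutOfRangeException(nameof(length));
        Length = length;
        // checked() guards against overflow when computing the byte count.
        _base = Marshal.AllocHGlobal(new IntPtr(checked(length * sizeof(int))));
    }

    public int this[long index]
    {
        get => Marshal.ReadInt32(AddressOf(index));
        set => Marshal.WriteInt32(AddressOf(index), value);
    }

    // Bounds-checks the index, then computes the element address using
    // 64-bit arithmetic so indexes beyond Int32.MaxValue are valid.
    private IntPtr AddressOf(long index)
    {
        if (_base == IntPtr.Zero)
            throw new ObjectDisposedException(nameof(LongIndexedInt32Buffer));
        if (index < 0 || index >= Length)
            throw new IndexOutOfRangeException();
        return new IntPtr(_base.ToInt64() + index * sizeof(int));
    }

    public void Dispose()
    {
        if (_base != IntPtr.Zero)
        {
            Marshal.FreeHGlobal(_base);
            _base = IntPtr.Zero;
        }
    }
}
```

Constructing this buffer with a length of 4,200,000,000 would request 16.8 GB from the operating system, so it only succeeds on a machine with enough available memory.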

Perhaps the killer feature of the recently introduced Span<T> type in .NET is its ability to efficiently reinterpret, without copying, numeric data structures (Int32, Int64 and their siblings) as other structures like the Vector<T> SIMD-enabled data types introduced in 2016. Vector<T> types are special in that the .NET RyuJIT compiler can compile operations on them to SIMD instructions like SSE, SSE2, and AVX, which process several data elements in parallel on a single CPU core.
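
As a rough illustration of this zero-copy reinterpretation, the sketch below uses the standard MemoryMarshal.Cast method from System.Runtime.InteropServices on a managed Int32 array. It does not use any jemalloc.NET type, and the SpanReinterpretSketch class and its method are hypothetical names. A write made through the Vector<int> view shows up in the original array because both views share the same memory.

```csharp
using System;
using System.Numerics;
using System.Runtime.InteropServices;

// Hypothetical sketch, not part of jemalloc.NET.
public static class SpanReinterpretSketch
{
    // Views `data` as a span of Vector<int> without copying, overwrites the
    // first vector with `value` in every lane, and returns the number of
    // array elements that were changed (0 if the array is shorter than one
    // vector).
    public static int OverwriteFirstVector(int[] data, int value)
    {
        Span<Vector<int>> vectors = MemoryMarshal.Cast<int, Vector<int>>(data.AsSpan());
        if (vectors.Length == 0) return 0;

        vectors[0] = new Vector<int>(value); // broadcast value to all lanes
        return Vector<int>.Count;
    }
}
```

The number of lanes, Vector<int>.Count, depends on the SIMD width of the machine running the code, so the number of elements covered by one vector is only known at run time.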

Using the SIMD-enabled SafeBuffer<T>.VectorMultiply(n) method provided by the jemalloc.NET API yields a more than 12x speedup for a simple in-place multiplication of a UInt64[] array of 10 million elements, compared to the unoptimized linear approach, allowing the operation to complete in 60 ms:

BenchmarkDotNet=v0.10.11, OS=Windows 10 Redstone 2 [1703, Creators Update] (10.0.15063.726)
Processor=Intel Core i7-6700HQ CPU 2.60GHz (Skylake), ProcessorCount=8
Frequency=2531251 Hz, Resolution=395.0616 ns, Timer=TSC
.NET Core SDK=2.1.2
  [Host] : .NET Core 2.0.3 (Framework 4.6.25815.02), 64bit RyuJIT

Job=JemBenchmark  Jit=RyuJit  Platform=X64  
Runtime=Core  AllowVeryLargeObjects=True  Toolchain=InProcessToolchain  
RunStrategy=Throughput

Method	Parameter	Mean	Error	StdDev	Gen 0	Gen 1	Allocated
'Multiply all values of a managed array with a single value.'	10000000	761.10 ms	10.367 ms	9.190 ms	254250.0000	62.5000	800000304 B
'Vector multiply all values of a native array with a single value.'	10000000	59.23 ms	1.170 ms	1.149 ms	-	-	360 B
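
The general shape of a vectorized in-place multiply can be sketched with the standard System.Numerics API. This is an assumption about the technique, not the source of SafeBuffer<T>.VectorMultiply(n): the VectorMultiplySketch class below is hypothetical and works on any Span<ulong>, whether it wraps managed or native memory.

```csharp
using System;
using System.Numerics;
using System.Runtime.InteropServices;

// Hypothetical sketch, not the jemalloc.NET implementation.
public static class VectorMultiplySketch
{
    // Multiplies every element of `data` by `factor` in place.
    public static void MultiplyInPlace(Span<ulong> data, ulong factor)
    {
        // Reinterpret the buffer as whole SIMD vectors without copying.
        Span<Vector<ulong>> vectors = MemoryMarshal.Cast<ulong, Vector<ulong>>(data);

        // Each iteration multiplies Vector<ulong>.Count elements at once.
        for (int i = 0; i < vectors.Length; i++)
            vectors[i] *= factor;

        // Scalar loop for the trailing elements that do not fill a vector.
        for (int i = vectors.Length * Vector<ulong>.Count; i < data.Length; i++)
            data[i] *= factor;
    }
}
```

The vector loop writes results back into the same buffer, so no second array is allocated; the actual speedup depends on the element type and on the SIMD instructions available on the CPU.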

For huge arrays of UInt16 values the speedup is smaller but still substantial:

BenchmarkDotNet=v0.10.11, OS=Windows 10 Redstone 2 [1703, Creators Update] (10.0.15063.726)
Processor=Intel Core i7-6700HQ CPU 2.60GHz (Skylake), ProcessorCount=8
Frequency=2531251 Hz, Resolution=395.0616 ns, Timer=TSC
.NET Core SDK=2.1.2
  [Host] : .NET Core 2.0.3 (Framework 4.6.25815.02), 64bit RyuJIT

Job=JemBenchmark  Jit=RyuJit  Platform=X64  
Runtime=Core  AllowVeryLargeObjects=True  Toolchain=InProcessToolchain  
RunStrategy=ColdStart  TargetCount=1

Method	Parameter	Mean	Error	Gen 0	Gen 1	Allocated
'Multiply all values of a managed array with the maximum size [2146435071] with a single value.'	4096000000	34.25 s	NA	16375000.0000	3000.0000	51514441704 B
'Vector multiply all values of a native array with a single value.'	4096000000	12.06 s	NA	-	-	0 B

For a huge array with 4.1 billion UInt16 values, it takes 12 seconds to do a SIMD-enabled multiplication on all the elements of the array. This is still nearly 3x faster than doing the same non-vectorized operation on a managed array of about half the size, which takes 34 seconds.

Inside a .NET application, jemalloc.NET native arrays and data structures can be accessed directly by native libraries without the need to make additional copies or allocations. The goal of the jemalloc.NET project is to make accessible to .NET the kind of big-data in-memory numeric, scientific and other computing that typically would require coding in a low-level language like C/C++ or assembler.

Installation

Requirements

jemalloc.NET currently runs only on 64-bit Windows; support for 64-bit Linux and other 64-bit platforms supported by .NET Core will be added soon.

Windows

The latest .NET Core 2.0 x64 runtime
The latest version of the Microsoft Visual C++ Redistributable for Visual Studio 2017

Steps

Grab the latest release from the releases page and unzip it to a folder. Type jembench to run the benchmark CLI program and you should see the program version and options printed. NuGet packages can be found in x64\Release. The API library assembly files themselves are in x64\Release\netstandard2.0.

Note that if using jemalloc.NET in your own projects you must put the native jemallocd.dll library somewhere where it can be located by the .NET runtime. You can create a post-build step to copy it to the output folder of your project or put it somewhere on your %PATH%.
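
If the runtime cannot locate the native library, one quick way to diagnose it is to check the likely locations yourself. The helper below is a hypothetical sketch using only standard .NET file APIs: it looks in the application's base directory and then in each PATH entry. It is a simplification, since the real Windows DLL search order covers more locations than these two.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical diagnostic helper, not part of jemalloc.NET.
public static class NativeLibraryCheck
{
    // Searches the application's base directory, then every directory on
    // PATH, for `fileName`. On success returns true and sets `directory` to
    // the first directory containing the file; otherwise returns false and
    // sets `directory` to an empty string.
    public static bool TryFindNativeLibrary(string fileName, out string directory)
    {
        var candidates = new List<string> { AppContext.BaseDirectory };
        string path = Environment.GetEnvironmentVariable("PATH") ?? "";
        candidates.AddRange(
            path.Split(new[] { Path.PathSeparator }, StringSplitOptions.RemoveEmptyEntries));

        foreach (string dir in candidates)
        {
            if (File.Exists(Path.Combine(dir, fileName)))
            {
                directory = dir;
                return true;
            }
        }

        directory = string.Empty;
        return false;
    }
}
```

Calling NativeLibraryCheck.TryFindNativeLibrary("jemallocd.dll", out var dir) at startup and logging the result can make a missing post-build copy step easier to spot.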

Building from source

Currently build instructions are only provided for Visual Studio 2017 on Windows, but instructions for building on Linux will also be provided. jemalloc.NET is a 64-bit library only.

Requirements

Visual Studio 2017 15.5 with at least the following components:

C# 7.2 compiler
.NET Core 2.0 SDK x64
MSVC 2017 compiler toolset v141 or higher
Windows 10 SDK for Desktop C++ version 10.0.15063 or higher. Note that if you only have higher versions installed you will need to retarget the jemalloc MSVC project to your SDK version from Visual Studio.

Per the instructions for building the native jemalloc library for Windows, you will also need Cygwin (32- or 64-bit) with the following packages:

autoconf
autogen
gcc
gawk
grep
sed

Cygwin tools aren't actually used for compiling jemalloc but for generating the header files. jemalloc on Windows is built using MSVC.

Steps

You must add the .NET Core NuGet feed on MyGet and also the CoreFxLab feed to your NuGet package sources. You can do this in Visual Studio 2017 from Tools->Options->NuGet Package Manager menu item.
Clone the project: git clone https://github.com/alllisterb/jemalloc.NET and init the submodules: git submodule update --init --recursive
Open an x64 Native Tools Command Prompt for VS 2017 and temporarily add Cygwin\bin to the PATH, e.g. set PATH=%PATH%;C:\cygwin\bin. Switch to the jemalloc subdirectory in your jemalloc.NET solution dir and run sh -c "CC=cl ./autogen.sh". This will generate some files in the jemalloc subdirectory and only needs to be done once.
From a Visual Studio 2017 Developer Command prompt run build.cmd. Alternatively you can load the solution in Visual Studio and using the "Benchmark" solution configuration build the entire solution.
The solution should build without errors.
Run jembench from the solution folder to see the project version and help.

Usage

jembench CLI

Examples:

jembench hugearray -l -u --math --cold-start -t 3 4096000000

Benchmark math operations on HugeArray<UInt64> arrays of size 4096000000 without benchmark warmup and using only 3 iterations of the target methods. Benchmarks on huge arrays can be lengthy, so you should carefully choose the benchmark parameters that affect how long the benchmark runs.
