Skip to content

billwillman/SimdJsonSharp

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

64 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

SimdJsonSharp: Parsing gigabytes of JSON per second

C# version of Daniel Lemire's SimdJson fully ported from C to C#, I tried to keep the same format and API). The library accelerates JSON parsing and minification using SIMD instructions including AVX2 (C# version uses System.Runtime.Intrinsics API).

UPD: Now it's also available as a set of pinvokes on top of native lib via NS2.0 library, thus there are two implementations:

  1. Fully managed netcoreapp3.0 library (100% port from C to C#)
  2. netstandard2.0 library with native lib (autogenerated bindings for C)

Benchmarks

The following benchmark compares SimdJsonSharp with .NET Core 3.0 Utf8JsonReader, Json.NET and SpanJson libraries. Test json files can be found here.

1. Parse doubles

Open canada.json and parse all coordinates as System.Double:

|          Method |     fileName |    fileSize |      Mean | Ratio |
|---------------- |------------- |-------------|----------:|------:|
|        SimdJson |  canada.json | 2,251.05 Kb |  4,733 ms |  1.00 |
|  Utf8JsonReader |  canada.json | 2,251.05 Kb | 56,692 ms | 11.98 |
|         JsonNet |  canada.json | 2,251.05 Kb | 70,078 ms | 14.81 |
|    SpanJsonUtf8 |  canada.json | 2,251.05 Kb | 54,878 ms | 11.60 |

2. Count all tokens

|            Method |           fileName |    fileSize |         Mean | Ratio |
|------------------ |------------------- |------------ |-------------:|------:|
|          SimdJson | apache_builds.json |   127.28 Kb |     99.28 us |  1.00 |
|    Utf8JsonReader | apache_builds.json |   127.28 Kb |    226.42 us |  2.28 |
|           JsonNet | apache_builds.json |   127.28 Kb |    461.30 us |  4.64 |
|      SpanJsonUtf8 | apache_builds.json |   127.28 Kb |    168.08 us |  1.69 |
|                   |                    |             |              |       |
|          SimdJson |        canada.json | 2,251.05 Kb |  4,494.44 us |  1.00 |
|    Utf8JsonReader |        canada.json | 2,251.05 Kb |  6,308.01 us |  1.40 |
|           JsonNet |        canada.json | 2,251.05 Kb | 67,718.12 us | 15.06 |
|      SpanJsonUtf8 |        canada.json | 2,251.05 Kb |  6,679.82 us |  1.49 |
|                   |                    |             |              |       |
|          SimdJson |  citm_catalog.json | 1,727.20 Kb |  1,572.78 us |  1.00 |
|    Utf8JsonReader |  citm_catalog.json | 1,727.20 Kb |  3,786.10 us |  2.41 |
|           JsonNet |  citm_catalog.json | 1,727.20 Kb |  5,903.38 us |  3.75 |
|      SpanJsonUtf8 |  citm_catalog.json | 1,727.20 Kb |  3,021.13 us |  1.92 |
|                   |                    |             |              |       |
|          SimdJson | github_events.json |    65.13 Kb |     46.01 us |  1.00 |
|    Utf8JsonReader | github_events.json |    65.13 Kb |    113.80 us |  2.47 |
|           JsonNet | github_events.json |    65.13 Kb |    214.01 us |  4.65 |
|      SpanJsonUtf8 | github_events.json |    65.13 Kb |     89.09 us |  1.94 |
|                   |                    |             |              |       |
|          SimdJson |     gsoc-2018.json | 3,327.83 Kb |  2,209.42 us |  1.00 |
|    Utf8JsonReader |     gsoc-2018.json | 3,327.83 Kb |  4,010.10 us |  1.82 |
|           JsonNet |     gsoc-2018.json | 3,327.83 Kb |  6,729.44 us |  3.05 |
|      SpanJsonUtf8 |     gsoc-2018.json | 3,327.83 Kb |  2,759.59 us |  1.25 |
|                   |                    |             |              |       |
|          SimdJson |   instruments.json |   220.35 Kb |    257.78 us |  1.00 |
|    Utf8JsonReader |   instruments.json |   220.35 Kb |    594.22 us |  2.31 |
|           JsonNet |   instruments.json |   220.35 Kb |    980.42 us |  3.80 |
|      SpanJsonUtf8 |   instruments.json |   220.35 Kb |    409.47 us |  1.59 |
|                   |                    |             |              |       |
|          SimdJson |      truenull.json |    12.00 Kb |  16,032.6 ns |  1.00 |
|    Utf8JsonReader |      truenull.json |    12.00 Kb |  58,365.2 ns |  3.64 |
|           JsonNet |      truenull.json |    12.00 Kb |  60,977.3 ns |  3.80 |
|      SpanJsonUtf8 |      truenull.json |    12.00 Kb |  24,069.2 ns |  1.50 |

3.Json minification:

|                Method |           fileName |    fileSize |         Mean | Ratio |
|---------------------- |------------------- |------------ |-------------:|------:|
|  SimdJsonNoValidation | apache_builds.json |   127.28 Kb |     186.8 us |  1.00 |
|              SimdJson | apache_builds.json |   127.28 Kb |     262.5 us |  1.41 |
|               JsonNet | apache_builds.json |   127.28 Kb |   1,802.6 us |  9.65 |
|                       |                    |             |              |       |
|  SimdJsonNoValidation |        canada.json | 2,251.05 Kb |   4,130.7 us |  1.00 |
|              SimdJson |        canada.json | 2,251.05 Kb |   7,940.7 us |  1.92 |
|               JsonNet |        canada.json | 2,251.05 Kb | 181,884.0 us | 44.06 |
|                       |                    |             |              |       |
|  SimdJsonNoValidation |  citm_catalog.json | 1,727.20 Kb |   2,346.9 us |  1.00 |
|              SimdJson |  citm_catalog.json | 1,727.20 Kb |   4,064.0 us |  1.75 |
|               JsonNet |  citm_catalog.json | 1,727.20 Kb |  34,831.0 us | 14.84 |

Usage

The C# API is not stable yet and currently fully copies the original C-style API thus it involves some Unsafe magic including pointers.

Add nuget package SimdJsonSharp.Managed (for .NET Core 3.0) or SimdJsonSharp.Bindings for a .NETStandard 2.0 package (.NET 4.x, .NET Core 2.x, etc).

dotnet add package SimdJsonSharp.Bindings
or
dotnet add package SimdJsonSharp.Managed

The following sample parses a file and iterate numeric tokens

byte[] bytes = File.ReadAllBytes(somefile);
fixed (byte* ptr = bytes) // pin bytes while we are working on them
using (ParsedJson doc = SimdJson.ParseJson(ptr, bytes.Length))
using (var iterator = doc.CreateIterator())
{
    while (iterator.MoveForward())
    {
        if (iterator.GetTokenType() == JsonTokenType.Number)
            Console.WriteLine("integer: " + iterator.GetInteger());
    }
}

UPD: for SimdJsonSharp.Bindings types are postfixed with 'N', e.g. ParsedJsonN

As you can see the API looks similiar to Utf8JsonReader that was introduced recently in .NET Core 3.0

Also it's possible to just validate JSON or minify it (remove whitespaces, etc):

string someJson = ...;
string minifiedJson = SimdJson.MinifyJson(someJson);

Requirements

  • AVX2 enabled CPU

About

C# version of lemire's SimdJson

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • C# 52.5%
  • C++ 47.5%