jmfn/protobuf-net-data


Protocol Buffers DataReader Extensions for .NET

A library for serializing ADO.NET DataTables and DataReaders into an efficient, portable binary format. Uses Marc Gravell's Google Protocol Buffers library, protobuf-net.

Install-Package protobuf-net-data

The latest protobuf-net-data is available on NuGet, or as a zip from the Google Code downloads page.

Usage examples

Writing a DataTable to a file:

DataTable dt = ...;

// Verbatim string (@"...") so that "\f" is not read as a form-feed escape
using (Stream stream = File.OpenWrite(@"C:\foo.dat"))
using (IDataReader reader = dt.CreateDataReader())
{
    DataSerializer.Serialize(stream, reader);
}

Loading a DataTable from a file:

DataTable dt = new DataTable();
    
// Verbatim string (@"...") so that "\f" is not read as a form-feed escape
using (Stream stream = File.OpenRead(@"C:\foo.dat"))
using (IDataReader reader = DataSerializer.Deserialize(stream))
{
    dt.Load(reader);
}

Serializing an IDataReader into a buffer... and back again:

Stream buffer = new MemoryStream();

// Serialize SQL results to a buffer
// (connection is assumed to be an open SqlConnection)
using (var command = new SqlCommand("SELECT * FROM ...", connection))
using (var reader = command.ExecuteReader())
    DataSerializer.Serialize(buffer, reader);

// Read them back
buffer.Seek(0, SeekOrigin.Begin);
using (var reader = DataSerializer.Deserialize(buffer))
{
    while (reader.Read())
    {
        ...
    }
}

Supported Data Types

DataSerializer supports all the types exposed by IDataReader:

  • Boolean
  • Byte
  • Byte[]
  • Char
  • Char[]
  • DateTime
  • Decimal
  • Double
  • Float
  • Guid
  • Int16
  • Int32
  • Int64
  • String

Note that in versions 2.0.4.480 and earlier (or when the SerializeEmptyArraysAsNull option is set), no distinction is made between null and zero-length arrays; both will be deserialized as null.
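
As an illustration of the list above, here is a minimal round-trip sketch that writes a DataTable containing several of the supported column types to a buffer and loads it back. It only uses the Serialize and Deserialize calls shown earlier; the ProtoBuf.Data namespace is an assumption, and the class, table and column names are made up for the example.

```csharp
using System;
using System.Data;
using System.IO;
using ProtoBuf.Data; // assumed namespace of protobuf-net-data

public static class SupportedTypesExample
{
    // Serializes a table to an in-memory buffer and loads it into a new table.
    public static DataTable RoundTrip(DataTable source)
    {
        using (var buffer = new MemoryStream())
        {
            using (IDataReader reader = source.CreateDataReader())
            {
                DataSerializer.Serialize(buffer, reader);
            }

            buffer.Seek(0, SeekOrigin.Begin);

            var copy = new DataTable();
            using (IDataReader reader = DataSerializer.Deserialize(buffer))
            {
                copy.Load(reader);
            }
            return copy;
        }
    }

    public static void Main()
    {
        // Hypothetical sample table using a handful of the supported types.
        var dt = new DataTable("sample");
        dt.Columns.Add("Id", typeof(int));
        dt.Columns.Add("Name", typeof(string));
        dt.Columns.Add("Price", typeof(decimal));
        dt.Columns.Add("Created", typeof(DateTime));
        dt.Columns.Add("Key", typeof(Guid));
        dt.Columns.Add("Payload", typeof(byte[]));
        dt.Rows.Add(1, "widget", 9.99m, new DateTime(2013, 1, 17),
                    Guid.NewGuid(), new byte[] { 1, 2, 3 });

        DataTable copy = RoundTrip(dt);

        // Print each column's name, type and value after the round trip.
        foreach (DataColumn column in copy.Columns)
        {
            Console.WriteLine("{0} ({1}): {2}",
                column.ColumnName, column.DataType.Name, copy.Rows[0][column]);
        }
    }
}
```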

Custom serialization options

This library supports custom serialization options for data tables/data readers.

var options = new ProtoDataWriterOptions
{
    SerializeEmptyArraysAsNull = true,
    IncludeComputedColumns = true
};

DataSerializer.Serialize(buffer, reader, options);

The following options are currently supported:

  • SerializeEmptyArraysAsNull: In versions 2.0.4.480 and earlier, zero-length arrays were serialized as null. After that, they are serialized properly as a zero-length array. Set this flag if you need to write to the old format. Default is false.
  • IncludeComputedColumns: Computed columns (columns whose values are determined by an Expression rather than a stored value) are ignored by default. Set to true to include computed columns in serialization.
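
To make the SerializeEmptyArraysAsNull flag more concrete, the following sketch serializes the same zero-length byte array twice, once with default options and once with the flag set, and reports what comes back. The ProtoBuf.Data namespace is an assumption, and the class, helper and column names are hypothetical. Going by the option's description, the default run should report a zero-length array and the flagged run a null.

```csharp
using System;
using System.Data;
using System.IO;
using ProtoBuf.Data; // assumed namespace of protobuf-net-data

public static class EmptyArrayOptionExample
{
    // Serializes the table with the given options and loads it back.
    public static DataTable RoundTrip(DataTable source, ProtoDataWriterOptions options)
    {
        using (var buffer = new MemoryStream())
        {
            using (IDataReader reader = source.CreateDataReader())
            {
                DataSerializer.Serialize(buffer, reader, options);
            }

            buffer.Seek(0, SeekOrigin.Begin);

            var copy = new DataTable();
            using (IDataReader reader = DataSerializer.Deserialize(buffer))
            {
                copy.Load(reader);
            }
            return copy;
        }
    }

    // Describes a cell value as "null", "byte[n]" or its string form.
    public static string Describe(object value)
    {
        if (value == null || value is DBNull) return "null";
        var bytes = value as byte[];
        return bytes != null ? "byte[" + bytes.Length + "]" : value.ToString();
    }

    public static void Main()
    {
        var dt = new DataTable();
        dt.Columns.Add("Payload", typeof(byte[]));
        // Wrap in object[] so the byte[] is treated as a single cell value.
        dt.Rows.Add(new object[] { new byte[0] });

        var current = RoundTrip(dt, new ProtoDataWriterOptions());
        var legacy = RoundTrip(dt, new ProtoDataWriterOptions { SerializeEmptyArraysAsNull = true });

        Console.WriteLine("default options: " + Describe(current.Rows[0]["Payload"]));
        Console.WriteLine("legacy format:   " + Describe(legacy.Rows[0]["Payload"]));
    }
}
```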

WCF Streaming support

protobuf-net-data provides a readable ProtoDataStream class, which incrementally serializes a data reader (row by row) as it is read.

This is required for WCF Streaming, where a readable Stream instance must be returned from your operation contract (instead of writing directly to the output stream, as in most other .NET stream-based interfaces such as HTTP and file I/O).

Usage example:

[ServiceContract]
public class MyWcfService : IMyWcfService
{
    [OperationContract]
    public Stream GetStream()
    {
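        // No using blocks here: ProtoDataStream serializes the reader lazily,
        // after this method has returned, so disposing the reader now would
        // leave the stream with nothing to read.
        // (connection is assumed to be an open SqlConnection)
        var command = new SqlCommand("SELECT * FROM ...", connection);
        var reader = command.ExecuteReader();
        return new ProtoDataStream(reader);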
    }
}
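
On the receiving side, the stream can be read back with DataSerializer.Deserialize like any other serialized source. The sketch below shows one possible consumer; to stay self-contained it wraps an in-memory DataTable reader in a ProtoDataStream instead of making a real WCF call. The ProtoBuf.Data namespace is an assumption, and StreamingExample and ReadTable are hypothetical names.

```csharp
using System;
using System.Data;
using System.IO;
using ProtoBuf.Data; // assumed namespace of protobuf-net-data

public static class StreamingExample
{
    // Client side: turn the Stream returned by the service back into a DataTable.
    public static DataTable ReadTable(Stream serviceStream)
    {
        var table = new DataTable();
        using (IDataReader reader = DataSerializer.Deserialize(serviceStream))
        {
            table.Load(reader);
        }
        return table;
    }

    public static void Main()
    {
        var dt = new DataTable();
        dt.Columns.Add("Id", typeof(int));
        dt.Columns.Add("Name", typeof(string));
        dt.Rows.Add(1, "first");
        dt.Rows.Add(2, "second");

        // Stand-in for the WCF call: rows are serialized incrementally
        // as ReadTable pulls bytes from the stream.
        using (Stream stream = new ProtoDataStream(dt.CreateDataReader()))
        {
            DataTable copy = ReadTable(stream);
            Console.WriteLine("Rows received: " + copy.Rows.Count);
        }
    }
}
```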

Why does this library exist?

The .NET languages, being mostly statically typed, have a lot of really good options for serializing statically typed objects. Protocol Buffers, MessagePack, JSON, BSON, XML, SOAP, and the BCL's own proprietary binary serialization are all great for CLR objects, where the fields are known at compile time.

However, for data that is tabular in nature, there aren't so many options. Protocol Buffers DataReader Extensions for .NET was born out of a need to serialize data:

  • That is tabular - not necessarily CLR DTOs.
  • Where the schema is unknown before it is deserialized - each data set can have totally different columns.
  • In a way that is streamable, so entire data sets do not have to be buffered in memory at once.
  • That can be as large as hundreds of thousands of rows/columns.
  • In a reasonably performant manner.
  • In a way that could potentially be read by different platforms.
  • Into as small a number of bytes as possible.

DataSerializer packs data faster and smaller than the equivalent XML written by DataTable.WriteXml:

[Figure: DataSerializer vs DataTable benchmarks]

FAQ

Are multiple result sets supported?

Yes! Multiple data tables (IDataReader.NextResult()) are now supported. For example, a DataSet containing the results of 3 SQL queries executed as a single batch.
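
A minimal sketch of what this might look like in practice: a DataSet with two tables is serialized through a single reader, and the deserialized reader is walked with NextResult(). The ProtoBuf.Data namespace is an assumption, and the class, table and column names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.IO;
using ProtoBuf.Data; // assumed namespace of protobuf-net-data

public static class MultipleResultSetsExample
{
    // Serializes every table in the DataSet, then counts the rows in each
    // result set of the deserialized reader.
    public static List<int> CountRowsPerResult(DataSet dataSet)
    {
        var counts = new List<int>();
        using (var buffer = new MemoryStream())
        {
            // DataSet.CreateDataReader() exposes each table as one result set.
            using (IDataReader reader = dataSet.CreateDataReader())
            {
                DataSerializer.Serialize(buffer, reader);
            }

            buffer.Seek(0, SeekOrigin.Begin);

            using (IDataReader reader = DataSerializer.Deserialize(buffer))
            {
                do
                {
                    int rows = 0;
                    while (reader.Read()) rows++;
                    counts.Add(rows);
                } while (reader.NextResult());
            }
        }
        return counts;
    }

    public static void Main()
    {
        var ds = new DataSet();

        DataTable customers = ds.Tables.Add("Customers");
        customers.Columns.Add("Id", typeof(int));
        customers.Rows.Add(1);
        customers.Rows.Add(2);

        DataTable orders = ds.Tables.Add("Orders");
        orders.Columns.Add("Total", typeof(decimal));
        orders.Rows.Add(12.5m);

        List<int> counts = CountRowsPerResult(ds);
        for (int i = 0; i < counts.Count; i++)
        {
            Console.WriteLine("Result set {0}: {1} row(s)", i, counts[i]);
        }
    }
}
```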

Are nested DataTables supported?

No. Nested DataTable support was removed in protobuf-net-data 2.0.5.601 because nested DataTables are not portable and the way they were implemented violated the protobuf spec. If nested DataTables are required, we recommend you dump them out into another format using an old version of this library.

What exactly from the data reader gets serialized?

Only the data reader's contents are serialized, i.e. the column name, data type, and values. Metadata about unique keys, auto increment, default value, base table name, data provider, data relations etc. is ignored. Any other DataRowVersions will also be ignored.

What about computed columns?

By default, computed columns (i.e. those with an Expression set) will be skipped and not written to the byte stream. Set the IncludeComputedColumns option to true to override this.
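
The difference can be observed by counting the columns that survive a round trip. The sketch below builds a table with one computed column and serializes it with and without IncludeComputedColumns. The ProtoBuf.Data namespace is an assumption, and the class, helper and column names are hypothetical. Based on the behaviour described above, the default run should report one column fewer than the run with the option set.

```csharp
using System;
using System.Data;
using System.IO;
using ProtoBuf.Data; // assumed namespace of protobuf-net-data

public static class ComputedColumnsExample
{
    // Serializes the table with the given options and returns how many
    // columns the deserialized reader exposes.
    public static int SerializedColumnCount(DataTable source, ProtoDataWriterOptions options)
    {
        using (var buffer = new MemoryStream())
        {
            using (IDataReader reader = source.CreateDataReader())
            {
                DataSerializer.Serialize(buffer, reader, options);
            }

            buffer.Seek(0, SeekOrigin.Begin);

            using (IDataReader reader = DataSerializer.Deserialize(buffer))
            {
                return reader.FieldCount;
            }
        }
    }

    // Builds a table whose "Total" column is computed from the other two.
    public static DataTable BuildTable()
    {
        var dt = new DataTable();
        dt.Columns.Add("Quantity", typeof(int));
        dt.Columns.Add("UnitPrice", typeof(decimal));
        dt.Columns.Add("Total", typeof(decimal), "Quantity * UnitPrice");
        dt.Rows.Add(3, 2.5m); // Total is filled in by the expression
        return dt;
    }

    public static void Main()
    {
        DataTable dt = BuildTable();

        int skipped = SerializedColumnCount(dt, new ProtoDataWriterOptions());
        int included = SerializedColumnCount(dt, new ProtoDataWriterOptions { IncludeComputedColumns = true });

        Console.WriteLine("Columns with default options:      " + skipped);
        Console.WriteLine("Columns with computed cols included: " + included);
    }
}
```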

Will protobuf-net v1 be supported?

No. Only protobuf-net v2 is supported right now, and it is unlikely any effort will be spent back-porting it to v1 (if indeed it is even possible with v1).

What about backwards compatibility?

This library is backwards compatible with itself (old versions can deserialize binary blobs produced by later versions and vice versa). The only change to the binary serialization format is that up to and including version 2.0.4.480, empty arrays were serialized as null. This is not a breaking change, but it does produce different output. The old behaviour can be restored in the current version by setting the SerializeEmptyArraysAsNull option to true.

How can I mock/stub out the DataSerializer class in my unit tests? All its methods are static.

You can use IDataSerializerEngine/DataSerializerEngine for testing and dependency injection - it has all the same methods as DataSerializer (new in 2.0.2.480). Alternatively, both the lower-level classes, ProtoDataReader and ProtoDataWriter, have interfaces and can be mocked out as well.
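
One way to apply this is constructor injection: the consuming class depends on IDataSerializerEngine, production code passes in a DataSerializerEngine, and tests pass in a mock or stub. In the sketch below, TableArchiver is a hypothetical class, the ProtoBuf.Data namespace is an assumption, and so is the parameterless DataSerializerEngine constructor.

```csharp
using System;
using System.Data;
using System.IO;
using ProtoBuf.Data; // assumed namespace of protobuf-net-data

// Hypothetical consumer that depends on the interface, not the static class.
public class TableArchiver
{
    private readonly IDataSerializerEngine engine;

    public TableArchiver(IDataSerializerEngine engine)
    {
        if (engine == null) throw new ArgumentNullException("engine");
        this.engine = engine;
    }

    // Serializes the table through the injected engine and returns the bytes.
    public byte[] Archive(DataTable table)
    {
        using (var buffer = new MemoryStream())
        using (IDataReader reader = table.CreateDataReader())
        {
            engine.Serialize(buffer, reader);
            return buffer.ToArray();
        }
    }
}

public static class TableArchiverExample
{
    public static void Main()
    {
        var dt = new DataTable();
        dt.Columns.Add("Id", typeof(int));
        dt.Rows.Add(1);

        // Production wiring; a unit test would pass a mocked engine instead.
        var archiver = new TableArchiver(new DataSerializerEngine());
        byte[] bytes = archiver.Archive(dt);

        Console.WriteLine("Serialized " + bytes.Length + " byte(s)");
    }
}
```

In a unit test, the IDataSerializerEngine instance would typically come from a mocking framework, so that Archive can be verified without touching the real serializer.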

How can I remap column values while streaming results?

Check out this guide and code example.

Is this library supported in any other languages? E.g. for Java ResultSets?

In theory, protobuf-net-data binary streams should be able to be serialized and deserialized by any programming language with a protocol-buffers implementation. The protocol buffer structure is documented in ProtoDataWriter.cs.

This would be a great item for the future roadmap: as far as I know, there is currently no tool for cross-platform binary (and streaming) serialization of tabular data.

Credits

Thanks to:

License

Protocol Buffers DataReader Extensions for .NET is available under the Apache License 2.0.

Release History / Changelog

2.0.6.621 - Jan 17 2013

  • Upgraded to protobuf-net 2.0.621.

2.0.6.619 - Jan 11 2013

  • Upgraded to protobuf-net 2.0.619.

2.0.6.614 - Jan 4 2013

  • Upgraded to protobuf-net 2.0.614.

2.0.6.612 - Dec 10 2012

  • Upgraded to protobuf-net 2.0.612.

2.0.6.611 - Dec 8 2012

  • Added ProtoDataStream to support WCF Streaming (issue #20).

2.0.5.611 - Dec 5 2012

  • Upgraded to protobuf-net 2.0.0.611.
  • Now supporting .NET 4.5.

2.0.5.602 - Nov 14 2012

  • Upgraded to protobuf-net 2.0.0.602.

2.0.5.601 - Nov 8 2012

  • Upgraded to protobuf-net 2.0.0.601.
  • Removed nested DataTables support.

2.0.5.480 - July 2 2012

  • New feature ProtoDataWriterOptions to specify handling of zero-length arrays and computed columns.
  • Bug fix for an issue where an exception would be thrown when serializing char column values (issue #15).

2.0.4.480 - June 27 2012

  • Bug fix for an issue where ProtoDataWriter incorrectly assumed all IDataReader schema tables have an 'Expression' column (issue #12).

2.0.3.480 - March 16 2012

  • Bug fix for an issue where computed columns were serialized (issue #11).

2.0.2.480 - February 18 2012

  • Extracted IDataSerializerEngine to make mocking and dependency injection easier.

2.0.1.480 - January 11 2012

  • Upgraded to protobuf-net 2.0.0.480.
  • Fixed an issue saving floats and doubles (issue #10).

2.0.1.470 - December 8 2011

  • Upgraded to protobuf-net 2.0.0.470 (issue #9).
  • The version number unfortunately had to be incremented because I uploaded a broken package to NuGet.org (version numbers can't be reused).

2.0.0.452 - November 1 2011

  • Initial release.
