
rouault/flatgeobuf


FlatGeobuf


A performant binary encoding for geographic data based on flatbuffers that can hold a collection of Simple Features including circular interpolations as defined by SQL-MM Part 3.

Inspired by geobuf and flatbush. It deliberately does not support random writes, both for simplicity and so that the data can be clustered on a packed Hilbert R-tree, enabling fast bounding-box spatial filtering. The spatial index is optional, so the format can be written efficiently as a stream and used where spatial filtering is not needed.
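A minimal sketch of the clustering step, in Python: features are ordered by the Hilbert curve index of their bounding-box centres, computed on a 2^16 by 2^16 grid in the spirit of flatbush. The grid resolution and the helper names `hilbert_xy2d` and `hilbert_sort` are assumptions for illustration; this is not the reference implementation.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (minx, miny, maxx, maxy)

# Assumed grid resolution: centres are scaled to 0..65535 on each axis.
HILBERT_MAX = (1 << 16) - 1


def hilbert_xy2d(n: int, x: int, y: int) -> int:
    """Distance along a Hilbert curve filling an n x n grid (n a power of two)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the canonical orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def hilbert_sort(boxes: List[Box]) -> List[int]:
    """Return feature indices ordered by the Hilbert value of each box centre."""
    if not boxes:
        return []
    minx = min(b[0] for b in boxes)
    miny = min(b[1] for b in boxes)
    width = (max(b[2] for b in boxes) - minx) or 1.0  # avoid division by zero
    height = (max(b[3] for b in boxes) - miny) or 1.0

    def key(i: int) -> int:
        b = boxes[i]
        cx = (b[0] + b[2]) / 2
        cy = (b[1] + b[3]) / 2
        hx = int(HILBERT_MAX * (cx - minx) / width)
        hy = int(HILBERT_MAX * (cy - miny) / height)
        return hilbert_xy2d(HILBERT_MAX + 1, hx, hy)

    return sorted(range(len(boxes)), key=key)
```

Writing features in this order tends to keep spatially close features close together in the file, which is what allows a packed tree built over them to prune well.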

The goals are to be suitable for large volumes of static data, to be significantly faster than legacy formats, to have no size limitations for contents or metainformation, and to be suitable for streaming/random access.

The site switchfromshapefile.org has more in-depth information about the problems of legacy formats and suggests some alternatives, but acknowledges that the current alternatives have drawbacks of their own; for example, they are not suitable for streaming.

Examples

Specification

File layout:

  • MB: Magic bytes (0x6667620366676200)
  • H: Header (variable size flatbuffer)
  • I (optional): Static packed Hilbert R-tree index (static size custom buffer)
  • DATA: Features (variable size flatbuffers)
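A minimal Python sketch of checking the start of a file. It relies on the magic bytes listed above; the assumption that the header flatbuffer is size-prefixed with a little-endian uint32 is ours and should be checked against the schema before relying on it. `read_preamble` is a hypothetical helper name.

```python
import struct

# Magic bytes from the layout above: ASCII "fgb", 0x03, "fgb", 0x00.
MAGIC = bytes.fromhex("6667620366676200")


def read_preamble(buf: bytes) -> tuple:
    """Check the magic bytes and return (header_offset, header_size).

    ASSUMPTION: the header flatbuffer is size-prefixed, i.e. a little-endian
    uint32 holding its length directly follows the 8 magic bytes.
    """
    if len(buf) < 12:
        raise ValueError("buffer too short for magic bytes and size prefix")
    if buf[:8] != MAGIC:
        raise ValueError("magic bytes do not match; not a FlatGeobuf file")
    (header_size,) = struct.unpack_from("<I", buf, 8)
    return 12, header_size
```

Under the same assumption, the optional index (or, without one, the first feature) would begin at offset 12 + header_size.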

Any 64-bit flatbuffer value anywhere in the file (for example, coordinates) is aligned to 8 bytes from the start of the file or feature, to allow direct memory access.
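The alignment rule can be illustrated with a small padding calculation: a writer pads so that each 64-bit value starts at an offset divisible by 8, and a reader can then view those bytes as doubles in place. This is a minimal Python sketch; `padding_for` and `append_aligned_doubles` are hypothetical helpers, not part of any FlatGeobuf API.

```python
import struct
from typing import Sequence, Tuple


def padding_for(offset: int, alignment: int = 8) -> int:
    """Zero bytes needed so that `offset` becomes a multiple of `alignment`."""
    return -offset % alignment


def append_aligned_doubles(prefix: bytes, values: Sequence[float]) -> Tuple[bytes, int]:
    """Append `values` as little-endian float64 after `prefix`, padded to 8 bytes.

    Returns the new buffer and the offset at which the doubles start.
    """
    start = len(prefix) + padding_for(len(prefix))
    buf = prefix.ljust(start, b"\x00") + struct.pack(f"<{len(values)}d", *values)
    return buf, start


# Example: 3 bytes of other data, then two coordinates starting at offset 8.
buf, start = append_aligned_doubles(b"\x01\x02\x03", [1.5, -2.25])
# Because `start` is a multiple of 8, the bytes can be viewed as doubles in place
# (memoryview uses native byte order, so this view assumes a little-endian host).
coords = memoryview(buf)[start:].cast("d")
```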

Encoding of any string value is assumed to be UTF-8.

Performance

Preliminary performance tests have been done using OSM road data for Denmark in SHP format from download.geofabrik.de, containing 906602 LineString features with a set of attributes. Values in the tables below are relative to Shapefile (Shapefile = 1); lower is better.

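|                       | Shapefile | GeoPackage | FlatGeobuf | GeoJSON | GML |
|-----------------------|-----------|------------|------------|---------|-----|
| Read full dataset     | 1         | 1.02       | 0.46       | 15      | 8.9 |
| Read w/spatial filter | 1         | 0.94       | 0.71       | 705     | 399 |
| Write full dataset    | 1         | 0.77       | 0.39       | 3.9     | 3.2 |
| Write w/spatial index | 1         | 1.58       | 0.65       | -       | -   |
| Size                  | 1         | 0.72       | 0.77       | 1.2     | 2.1 |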

The tests were done using GDAL, which implements FlatGeobuf as a driver. Repeated reads were measured with loops of ogrinfo -qq -oo VERIFY_BUFFERS=NO runs, and repeated writes with ogr2ogr conversions from the original to a new file using -lco SPATIAL_INDEX=NO and -lco SPATIAL_INDEX=YES respectively.
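The repeated-run methodology can be scripted roughly as follows. This Python sketch times repeated subprocess runs with `time.perf_counter`; the file names `roads.shp`/`roads.fgb`, the helpers `time_runs` and `median_seconds`, and the added `-al` flag are assumptions, while the remaining GDAL flags are the ones quoted above.

```python
import pathlib
import statistics
import subprocess
import sys
import time
from typing import List, Optional, Sequence


def time_runs(argv: Sequence[str], runs: int = 5,
              remove_before: Optional[str] = None) -> List[float]:
    """Run `argv` `runs` times and return the wall-clock seconds of each run.

    If `remove_before` is given, that file is deleted (untimed) before every run,
    so repeated ogr2ogr conversions always create a fresh output file.
    """
    timings = []
    for _ in range(runs):
        if remove_before:
            pathlib.Path(remove_before).unlink(missing_ok=True)
        start = time.perf_counter()
        subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return timings


def median_seconds(argv: Sequence[str], runs: int = 5,
                   remove_before: Optional[str] = None) -> float:
    """Median of `runs` timings, less sensitive to outliers than the mean."""
    return statistics.median(time_runs(argv, runs, remove_before))


# Hypothetical file names. "-al" is our addition so that ogrinfo iterates
# the features of every layer.
WRITE_CMD = ["ogr2ogr", "-f", "FlatGeobuf", "-lco", "SPATIAL_INDEX=YES",
             "roads.fgb", "roads.shp"]
READ_CMD = ["ogrinfo", "-qq", "-al", "-oo", "VERIFY_BUFFERS=NO", "roads.fgb"]

# A no-op command, handy for checking the timer without GDAL installed.
NOOP_CMD = [sys.executable, "-c", "pass"]

# Usage (requires the GDAL command-line tools on PATH):
#   print(median_seconds(WRITE_CMD, remove_before="roads.fgb"))
#   print(median_seconds(READ_CMD))
```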

Note that for the spatial filter test, a small bounding box was chosen that selects only 1204 features, primarily to test the performance of the spatial index search.
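To illustrate why a small bounding box mainly exercises the index, here is a simplified packed R-tree in the style of flatbush, written in Python: boxes (assumed already Hilbert-sorted) are grouped bottom-up into fixed-size nodes, and a query descends only into nodes whose extent intersects the search box. The node size of 16, the list-of-levels representation and the function names are assumptions; FlatGeobuf's on-disk index layout is defined by its specification, not by this sketch.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (minx, miny, maxx, maxy)
Node = Tuple[Box, int, int]  # (extent, first child, one past last child)


def intersects(a: Box, b: Box) -> bool:
    return a[0] <= b[2] and a[2] >= b[0] and a[1] <= b[3] and a[3] >= b[1]


def union(boxes: List[Box]) -> Box:
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))


def build_levels(boxes: List[Box], node_size: int = 16) -> List[List[Node]]:
    """Pack boxes (assumed already Hilbert-sorted) into levels, bottom-up.

    Level 0 has one entry per feature; each higher level groups `node_size`
    consecutive entries of the level below under their combined extent.
    """
    if node_size < 2:
        raise ValueError("node_size must be at least 2")
    levels: List[List[Node]] = [[(b, i, i + 1) for i, b in enumerate(boxes)]]
    while len(levels[-1]) > 1:
        below = levels[-1]
        parents: List[Node] = []
        for start in range(0, len(below), node_size):
            end = min(start + node_size, len(below))
            parents.append((union([n[0] for n in below[start:end]]), start, end))
        levels.append(parents)
    return levels


def search(levels: List[List[Node]], query: Box) -> List[int]:
    """Return indices of features whose box intersects `query`."""
    top = len(levels) - 1
    stack = [(top, i) for i in range(len(levels[top]))]
    hits: List[int] = []
    while stack:
        level, i = stack.pop()
        extent, start, end = levels[level][i]
        if not intersects(extent, query):
            continue  # skip this node and everything beneath it
        if level == 0:
            hits.append(start)  # at the leaf level, `start` is the feature index
        else:
            stack.extend((level - 1, j) for j in range(start, end))
    return sorted(hits)
```

When features are in Hilbert order, a small query box usually intersects only a few nodes per level, so most of the tree is never visited.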

As performance is highly data-dependent, I have also run similar tests on a larger dataset of Danish cadastral data consisting of 2511772 polygons with extensive attribute data.

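|                       | Shapefile | GeoPackage | FlatGeobuf |
|-----------------------|-----------|------------|------------|
| Read full dataset     | 1         | 0.23       | 0.12       |
| Read w/spatial filter | 1         | 0.31       | 0.26       |
| Write full dataset    | 1         | 0.95       | 0.63       |
| Write w/spatial index | 1         | 1.07       | 0.70       |
| Size                  | 1         | 0.77       | 0.95       |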

Features

TODO

  • Java index support
  • C language support
  • Go language support
  • Rust language support

FAQ

Why not use WKB geometry encoding?

WKB is not aligned on 8 bytes, so it is not always possible to consume it without copying first.
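The misalignment is easy to see in a WKB Point: a one-byte byte-order flag and a four-byte geometry type precede the coordinates, so the first double sits at offset 5 within the geometry. A small Python sketch using `struct` (the `wkb_point` helper is hypothetical):

```python
import struct


def wkb_point(x: float, y: float) -> bytes:
    """Little-endian WKB Point: byte order flag, uint32 type (1 = Point), x, y."""
    # "<" disables native alignment padding, matching WKB's packed layout.
    return struct.pack("<BIdd", 1, 1, x, y)


wkb = wkb_point(12.5, 55.7)
COORD_OFFSET = 1 + 4  # one byte-order byte plus a four-byte geometry type
print(len(wkb), COORD_OFFSET % 8)  # 21 5: the doubles are not 8-byte aligned

# A reader that cannot rely on alignment copies the values out instead:
x, y = struct.unpack_from("<dd", wkb, COORD_OFFSET)
```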

Why not use Protobuf?

Performance reasons and to allow streaming/random access.

Why am I not getting expected performance in GDAL?

The default behaviour is to assume untrusted data and verify buffer integrity for safety. If you have trusted data and want maximum performance, set the open option VERIFY_BUFFERS to NO.
