To quickly execute range queries in Apache Lucene, a range is divided recursively into multiple intervals for searching: The center of the range is searched only with the lowest possible precision in the trie, while the boundaries are matched more exactly. this reduces the number of terms dramatically.
this class generates terms to achieve this: First the numerical integer values need to be converted to bytes. For that integer values (32 bit or 64 bit) are made unsigned and the bits are converted to ASCII chars with each 7 bit. The resulting byte[] is sortable like the original integer value (even using UTF-8 sort order). Each value is also prefixed (in the first char) by the shift
value (number of bits removed) used during encoding.
To also index floating point numbers, this class supplies two methods to convert them to integer values by changing their bit layout: #doubleToSortableLong, #floatToSortableInt. You will have no precision loss by converting floating point numbers to integers and back (only that the integer form is not usable). Other data types like dates can easily converted to longs or ints (e.g. date to long: java.util.Date#getTime).
For easy usage, the trie algorithm is implemented for indexing inside NumericTokenStream that can index int
, long
, float
, and double
. For querying, NumericRangeQuery and NumericRangeFilter implement the query part for the same data types.
this class can also be used, to generate lexicographically sortable (according to BytesRef#getUTF8SortedAsUTF16Comparator()) representations of numeric data types for other usages (e.g. sorting). @lucene.internal @since 2.9, API changed non backwards-compliant in 4.0