Resin is a document-based search engine and analytics tool. It supports exact, fuzzy and prefix queries. Analyzers, tokenizers and scoring schemes are customizable.
The current query language is a copy of Lucene's, minus range queries and grouping (both coming soon).
Resin can be seen as an index of the same kind you attach to database tables when you want to make reading from them fast. Resin indices are fast to write to and read from, and they support near matches (as in "almost match"), which is out of scope for most database index types.
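To make the three match types concrete, here is a minimal, self-contained sketch of exact, prefix and near matching over a small term list. It is not Resin's implementation: the `TermMatching` class, its method names and the choice of Levenshtein edit distance with a `maxEdits` threshold are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative only: hypothetical helper, not part of Resin's API.
public static class TermMatching
{
    // Exact: the term must be present as-is.
    public static bool Exact(IEnumerable<string> terms, string query)
    {
        return terms.Contains(query);
    }

    // Prefix: every term that starts with the query.
    public static List<string> Prefix(IEnumerable<string> terms, string query)
    {
        return terms.Where(t => t.StartsWith(query, StringComparison.Ordinal)).ToList();
    }

    // Near: every term within maxEdits single-character edits of the query.
    public static List<string> Fuzzy(IEnumerable<string> terms, string query, int maxEdits)
    {
        return terms.Where(t => EditDistance(t, query) <= maxEdits).ToList();
    }

    // Levenshtein distance computed with two rolling rows.
    public static int EditDistance(string a, string b)
    {
        var prev = new int[b.Length + 1];
        var curr = new int[b.Length + 1];
        for (int j = 0; j <= b.Length; j++) prev[j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            curr[0] = i;
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                curr[j] = Math.Min(Math.Min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            var tmp = prev; prev = curr; curr = tmp; // the finished row becomes "prev"
        }
        return prev[b.Length];
    }

    public static void Main()
    {
        var terms = new[] { "bad", "band", "bar", "good", "goods" };
        Console.WriteLine(Exact(terms, "good"));                      // True
        Console.WriteLine(string.Join(",", Prefix(terms, "ba")));     // bad,band,bar
        Console.WriteLine(string.Join(",", Fuzzy(terms, "bad", 1)));  // bad,band,bar
    }
}
```

A linear scan like this is only meant to show what each match type returns; a real index would avoid comparing the query against every term.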
Apart from offering fast lookups, like a database index, Resin also scores documents based on their relevance. Relevance in turn is based on the distance between a document and a query in vector space.
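As a rough illustration of what "distance in vector space" can mean, the sketch below treats a query and each document as sparse vectors of term weights and compares them with cosine similarity. The text above does not say which distance measure Resin uses, so cosine similarity, the `VectorSpace` class and the weights in the example are assumptions, not Resin's API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative only: hypothetical helper, not part of Resin's API.
public static class VectorSpace
{
    // Cosine of the angle between two sparse vectors (term -> weight).
    // 1 means same direction, 0 means no terms in common.
    public static double Cosine(IDictionary<string, double> a, IDictionary<string, double> b)
    {
        double dot = 0;
        foreach (var kv in a)
        {
            double w;
            if (b.TryGetValue(kv.Key, out w)) dot += kv.Value * w;
        }
        double normA = Math.Sqrt(a.Values.Sum(v => v * v));
        double normB = Math.Sqrt(b.Values.Sum(v => v * v));
        return normA == 0 || normB == 0 ? 0 : dot / (normA * normB);
    }

    public static void Main()
    {
        var query = new Dictionary<string, double> { { "good", 1.0 }, { "bad", 1.0 } };
        var doc1 = new Dictionary<string, double> { { "good", 1.0 }, { "bad", 1.0 }, { "ugly", 1.0 } };
        var doc2 = new Dictionary<string, double> { { "ugly", 1.0 } };

        // doc1 shares two terms with the query, doc2 shares none.
        Console.WriteLine(Cosine(query, doc1) > Cosine(query, doc2)); // True
    }
}
```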
To support the default tf-idf scoring scheme, Resin stores term counts. Resin supports any scoring scheme and also gives you the ability to store additional document/sentence/token metadata your model might need (upcoming feature in RC4). That data will be delivered to you as a field on the document posting. In your custom IScoringScheme you then base your per-document posting calculations on that instead of just the term count.
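To show why a stored term count is enough for tf-idf, here is a small sketch that turns per-document term counts into weights and sums them for a multi-term query. It uses one common variant, `count * log(N / df)`; Resin's exact formula and the shape of IScoringScheme are not given here, so `TfIdfSketch`, its signatures and the summing of weights are assumptions for illustration only.

```csharp
using System;
using System.Collections.Generic;

// Illustrative only: hypothetical helper, not Resin's IScoringScheme.
public static class TfIdfSketch
{
    // idf = log(N / df): terms found in fewer documents weigh more.
    public static double Idf(int totalDocs, int docsWithTerm)
    {
        return Math.Log((double)totalDocs / docsWithTerm);
    }

    // Weight of one posting: the stored term count scaled by the term's idf.
    public static double Weight(int termCount, int totalDocs, int docsWithTerm)
    {
        return termCount * Idf(totalDocs, docsWithTerm);
    }

    // Score one document for a query by summing the weights of the query terms it contains.
    public static double Score(
        IDictionary<string, int> termCounts,   // term -> count in this document
        IDictionary<string, int> docFreq,      // term -> number of documents containing it
        int totalDocs,
        IEnumerable<string> queryTerms)
    {
        double score = 0;
        foreach (var term in queryTerms)
        {
            int count, df;
            if (termCounts.TryGetValue(term, out count) && docFreq.TryGetValue(term, out df))
                score += Weight(count, totalDocs, df);
        }
        return score;
    }

    public static void Main()
    {
        var docFreq = new Dictionary<string, int> { { "leone", 1 }, { "the", 4 } };
        var counts = new Dictionary<string, int> { { "leone", 2 }, { "the", 5 } };

        // "the" occurs in all 4 documents, so its idf is log(1) = 0.
        Console.WriteLine(Score(counts, docFreq, 4, new[] { "the" }) == 0);   // True
        // "leone" occurs in 1 of 4 documents, so it contributes a positive weight.
        Console.WriteLine(Score(counts, docFreq, 4, new[] { "leone" }) > 0);  // True
    }
}
```

A scheme that needs more than the term count would replace `Weight` with a calculation over whatever extra data is stored on the posting.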
In many scenarios Resin is already faster than the market leader at both querying and indexing. That makes it, in those scenarios, the fastest information retrieval system on the .NET platform, and certainly a good choice if you're on dotnet Core, where there is no real alternative.
If you have a scenario where you feel Resin should do better, this is important to me. Please let me know.
Five years ago the .NET community created the search engine we are still using today, Lucene 3.0.3.
Who could use a modern and powerful search engine based on sound mathematics that's open source, extensible and built on Core, though?
Resin's API and file format should be considered unstable until release candidate 4. Coming features are indexing support for numbers and dates as well as support for range queries.
Resin is built for dotnet Core 1.1.
Latest release is here
Start here.
{
  "_id": "Q1",
  "label": "universe",
  "description": "totality of planets, stars, galaxies, intergalactic space, or all matter or all energy",
  "aliases": "cosmos The Universe existence space outerspace"
}
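// GetWikipediaAsJson() stands in for your own document loader.
var docs = GetWikipediaAsJson();
var dir = @"C:\Users\Yourname\Resin\wikipedia";

// Write the documents to the index directory, using "_id" as the primary key.
using (var upsert = new DocumentUpsertOperation(dir, new Analyzer(), compression:true, primaryKey:"_id", docs))
{
    upsert.Commit();
}

// Fetch the first page (up to 15 documents) for a compound query.
var result = new Searcher(dir).Search("label:good bad~ description:leone", page:0, size:15);

// Document scores, i.e. the aggregated tf-idf weights a document receives from a simple
// or compound query, are included in the result.
// First() is assumed to be the LINQ extension method (using System.Linq).
var scoreOfFirstDoc = result.Docs.First().Fields["__score"];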
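The query string in the example above combines three clauses. Since the query language is described as a copy of Lucene's, the sketch below splits such a string into field, term and match type, using Lucene's conventions of `field:term`, a trailing `~` for fuzzy and a trailing `*` for prefix. This is not Resin's query parser: `Clause`, `QuerySketch` and the handling of a clause without a field are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Illustrative only: hypothetical types, not Resin's query parser.
public class Clause
{
    public string Field;   // null when the clause names no field
    public string Term;
    public bool Fuzzy;     // term ended with '~'
    public bool Prefix;    // term ended with '*'

    public override string ToString()
    {
        var kind = Fuzzy ? "fuzzy" : (Prefix ? "prefix" : "exact");
        return (Field ?? "(none)") + ":" + Term + " [" + kind + "]";
    }
}

public static class QuerySketch
{
    // Splits a space-separated query into clauses. No grouping, ranges or quoting.
    public static List<Clause> Parse(string query)
    {
        var clauses = new List<Clause>();
        foreach (var token in query.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
        {
            var clause = new Clause();
            var term = token;

            int colon = token.IndexOf(':');
            if (colon > 0)
            {
                clause.Field = token.Substring(0, colon);
                term = token.Substring(colon + 1);
            }

            if (term.EndsWith("~", StringComparison.Ordinal))
            {
                clause.Fuzzy = true;
                term = term.Substring(0, term.Length - 1);
            }
            else if (term.EndsWith("*", StringComparison.Ordinal))
            {
                clause.Prefix = true;
                term = term.Substring(0, term.Length - 1);
            }

            clause.Term = term;
            clauses.Add(clause);
        }
        return clauses;
    }

    public static void Main()
    {
        foreach (var clause in Parse("label:good bad~ description:leone"))
            Console.WriteLine(clause);
        // label:good [exact]
        // (none):bad [fuzzy]
        // description:leone [exact]
    }
}
```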
- Lay out the basic architecture and infrastructure of a modern IR system - v0.9b
- Query faster than Lucene - v1.0 RC1
- Index faster than Lucene - v1.0 RC2
- Compress more impressively than Lucene - v1.0 RC3
- Numbers, dates, range, grouping of query statements - v1.0
- Build Sir, a distributed search engine
Sir is Resin distributed: a search engine, map/reduce system and long-term data storage solution in one, an Elasticsearch+Hadoop replacement.