AhmedHamam/resin

 
 

Resin

Resin is a document-based search engine and analytics tool. It supports exact, fuzzy and prefix queries. Analyzers, tokenizers and scoring schemes are customizable.

Query language

The current query language follows Lucene's syntax, except for range queries and grouping, which are coming soon.

It's a smarter index

Resin can be seen as an index of the same kind you attach to database tables when you want to make reading from them fast. Resin indices are fast to write to and read from, and they support near (as in "almost match") lookups, which are out of scope for most database index types.

Apart from offering fast lookups, like a database index, Resin also scores documents based on their relevance. Relevance in turn is based on the distance between a document and a query in vector space.
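To make "distance in vector space" concrete, here is one textbook measure, the cosine of the angle between a query vector q and a document vector d. This is an illustrative assumption: the text above does not say which distance measure Resin uses, and the symbols q_t and d_t (the weight of term t in the query and in the document) are introduced here only for the example.

```latex
\[
\operatorname{sim}(q, d) = \cos\theta
  = \frac{q \cdot d}{\lVert q \rVert \, \lVert d \rVert}
  = \frac{\sum_{t} q_t \, d_t}{\sqrt{\sum_{t} q_t^{2}} \; \sqrt{\sum_{t} d_t^{2}}}
\]
```

Under this measure, a document whose term weights point in nearly the same direction as the query's gets a similarity close to 1, which corresponds to a small distance and a high relevance.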

Supports any scoring scheme

To support the default tf-idf scoring scheme, Resin stores term counts. Resin supports any scoring scheme and also lets you store additional document, sentence or token metadata that your model might need (an upcoming feature in RC4). That data will be delivered to you as a field on the document posting. In your custom IScoringScheme you then base your per-document posting calculations on that data instead of just the term count.
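For readers unfamiliar with tf-idf, the following sketches a common textbook formulation that shows why storing term counts is enough for the default scheme. It is an assumption for illustration, and Resin's exact weighting and normalization may differ. The symbols are hypothetical names: tf_{t,d} is the count of term t in document d, N is the number of documents in the index, and df_t is the number of documents that contain t.

```latex
\[
w_{t,d} = \mathrm{tf}_{t,d} \cdot \log\frac{N}{\mathrm{df}_t},
\qquad
\operatorname{score}(q, d) = \sum_{t \in q} w_{t,d}
\]
```

A custom scoring scheme would replace the per-posting weight with a calculation of its own, for example one based on the additional metadata stored with the posting.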

Fast at indexing and querying

In many scenarios Resin is already faster than the market leader at both querying and indexing. That makes it, in those scenarios, the fastest information retrieval system on the .net platform, and certainly a good choice if you're on dotnet core, since there is no real alternative.

If you have a scenario where you feel Resin should do better, this is important to me. Please let me know.

Deeply influenced by but not based on a java port

Five years ago the .net community created the search engine we are still using today, Lucene 3.0.3.

Who could use a modern and powerful search engine based on sound mathematics that's open source, extensible and built on Core, though?

Stable (API and file format) in RC4

Resin's API and file format should be considered unstable until release candidate 4. Coming features are indexing support for numbers and dates as well as support for range queries.

Supported .net version

Resin is built for dotnet Core 1.1.

Download

Latest release is here

Help out?

Start here.

Documentation

A document.

{
	"_id": "Q1",
	"label":  "universe",
	"description": "totality of planets, stars, galaxies, intergalactic space, or all matter or all energy",
	"aliases": "cosmos The Universe existence space outerspace"
}

Many like that.

var docs = GetWikipediaAsJson();

Index them.

var dir = @"C:\Users\Yourname\Resin\wikipedia";
using (var upsert = new DocumentUpsertOperation(dir, new Analyzer(), compression:true, primaryKey:"_id", docs))
{
	upsert.Commit();
}

Query the index.

var result = new Searcher(dir).Search("label:good bad~ description:leone", page:0, size:15);

// Document scores, i.e. the aggregated tf-idf weights a document receives from a simple
// or compound query, are included in the result:

var scoreOfFirstDoc = result.Docs.First().Fields["__score"];

More documentation here.

Roadmap

  • Lay out basic architecture and infrastructure of a modern IR system - v0.9b
  • Query faster than Lucene - v1.0 RC1
  • Index faster than Lucene - v1.0 RC2
  • Compress more impressively than Lucene - v1.0 RC3
  • Numbers, dates, range, grouping of query statements - v1.0
  • Build Sir, a distributed search engine

Sir

Sir is a distributed Resin: a search engine, map/reduce system and long-term data storage solution in one, an Elasticsearch+Hadoop replacement.

About

Fastest search engine on dotnet core
