Skip to content

🚀 Catalyst is a C# Natural Language Processing library built for speed. Inspired by spaCy's design, it brings pre-trained models, out-of-the box support for training word and document embeddings, and flexible entity recognition models.

License

yasmineChelly-95/catalyst

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Nuget Build Status

catalyst is a C# Natural Language Processing library built for speed. Inspired by spaCy's design, it brings pre-trained models, out-of-the box support for training word and document embeddings, and flexible entity recognition models.

âš¡ Features

✨ Getting Started

Using catalyst is as simple as installing its NuGet Package, and setting the storage to use our online repository. This way, models will be lazy loaded either from disk or downloaded from our online repository. Check out also some of the sample projects for more examples on how to use catalyst.

Storage.Current = new OnlineRepositoryStorage(new DiskStorage("catalyst-models"));
var nlp = await Pipeline.ForAsync(Language.English);
var doc = new Document("The quick brown fox jumps over the lazy dog", Language.English);
nlp.ProcessSingle(doc);
Console.WriteLine(doc.ToJson());

You can also take advantage of C# lazy evaluation and native multi-threading support to process a large number of documents in parallel:

var docs = GetDocuments();
var parsed = nlp.Process(docs);
DoSomething(parsed);

IEnumerable<IDocument> GetDocuments()
{
    //Generates a few documents, to demonstrate multi-threading & lazy evaluation
    for(int i = 0; i < 1000; i++)
    {
        yield return new Document("The quick brown fox jumps over the lazy dog", Language.English);
    }
}

void DoSomething(IEnumerable<IDocument> docs)
{
    foreach(var doc in docs)
    {
        Console.WriteLine(doc.ToJson());
    }
}

Training a new FastText word2vec embedding model is as simple as this:

var nlp = await Pipeline.ForAsync(Language.English);
var ft = new FastText(Language.English, 0, "wiki-word2vec");
ft.Data.Type = FastText.ModelType.CBow;
ft.Data.Loss = FastText.LossType.NegativeSampling;
ft.Train(nlp.Process(GetDocs()));
ft.StoreAsync();

For fast embedding search, we have also released a C# version of the "Hierarchical Navigable Small World" (HNSW) algorithm on NuGet, based on our fork of Microsoft's HNSW.Net. We have also released a C# version of the "Uniform Manifold Approximation and Projection" (UMAP) algorithm for dimensionality reduction on GitHub and on NuGet.

📖 Documentation (coming soon)

Documentation
Getting Started How to use catalyst and its features.
API Reference The detailed reference for catalyst's API.
Contribute How to contribute to catalyst codebase.
Samples Sample projects demonstrating catalyst capabilities
Slack Channel Join our Slack channel

About

🚀 Catalyst is a C# Natural Language Processing library built for speed. Inspired by spaCy's design, it brings pre-trained models, out-of-the box support for training word and document embeddings, and flexible entity recognition models.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • C# 100.0%