Unlike NGramTokenFilter, this class sets offsets so that the characters between startOffset and endOffset in the original stream are the same as the term chars.
For example, "abcde" would be tokenized as (minGram=2, maxGram=3):
| Term | ab | abc | bc | bcd | cd | cde | de |
|---|---|---|---|---|---|---|---|
| Position increment | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| Position length | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| Offsets | [0,2[ | [0,3[ | [1,3[ | [1,4[ | [2,4[ | [2,5[ | [3,5[ |
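As a minimal sketch, the table above can be reproduced with the standard Lucene analysis attributes; the two-argument constructor shown here assumes Lucene 5+ (4.x constructors also take a Version and a Reader):

```java
import java.io.StringReader;

import org.apache.lucene.analysis.ngram.NGramTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;

public class NGramOffsetsDemo {
    public static void main(String[] args) throws Exception {
        String input = "abcde";
        try (NGramTokenizer tokenizer = new NGramTokenizer(2, 3)) { // minGram=2, maxGram=3
            tokenizer.setReader(new StringReader(input));
            CharTermAttribute term = tokenizer.addAttribute(CharTermAttribute.class);
            OffsetAttribute offset = tokenizer.addAttribute(OffsetAttribute.class);
            tokenizer.reset();
            while (tokenizer.incrementToken()) {
                // The offsets point back into the original stream, so the
                // substring they delimit is identical to the term chars.
                System.out.printf("%s [%d,%d[ -> %s%n",
                        term,
                        offset.startOffset(),
                        offset.endOffset(),
                        input.substring(offset.startOffset(), offset.endOffset()));
            }
            tokenizer.end();
        }
    }
}
```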
This tokenizer changed a lot in Lucene 4.4 in order to:
- tokenize in a streaming fashion, so that input streams larger than 1024 chars (the limit of the previous implementation) are supported,
- count grams based on Unicode code points instead of Java chars (and never split in the middle of surrogate pairs),
- give the ability to pre-tokenize the stream, via isTokenChar(int), before n-grams are computed (see the sketch after this list).
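The pre-tokenization hook works by overriding isTokenChar(int). A minimal sketch, where the subclass name is hypothetical and the two-argument constructor again assumes Lucene 5+:

```java
import org.apache.lucene.analysis.ngram.NGramTokenizer;

// Hypothetical subclass: only letters and digits are fed into the n-gram
// computation; any other character acts as a boundary between tokens.
public class AlphanumGramTokenizer extends NGramTokenizer {
    public AlphanumGramTokenizer(int minGram, int maxGram) {
        super(minGram, maxGram);
    }

    @Override
    protected boolean isTokenChar(int chr) {
        return Character.isLetterOrDigit(chr);
    }
}
```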
Additionally, this class no longer trims trailing whitespace, and it emits tokens in a different order: tokens are now emitted by increasing start offset, whereas they used to be emitted by increasing length (which prevented large input streams from being supported).
Although highly discouraged, it is still possible to use the old behavior through Lucene43NGramTokenizer.