var analyzer = new StandardAnalyzer(Version.LUCENE_30);
var tokenStream = analyzer.TokenStream("field", new StringReader("Lucene.Net is an information retrieval library"));
var termAttribute = tokenStream.GetAttribute<ITermAttribute>();
while (tokenStream.IncrementToken())
{
    Console.WriteLine(termAttribute.Term);
}
var analyzer = new StopAnalyzer(Version.LUCENE_30);
var tokenStream = analyzer.TokenStream("field", new StringReader("Lucene.Net is an information retrieval library"));
var termAttribute = tokenStream.GetAttribute<ITermAttribute>();
while (tokenStream.IncrementToken())
{
    Console.WriteLine(termAttribute.Term);
}

This example uses the StopAnalyzer to analyze the same text as Example 1. The StopAnalyzer removes stop words such as "is" and "an" before tokenizing, so a different set of tokens is generated and written to the console. Package: Lucene.Net.Analysis.
A typical Analyzer implementation will first build a Tokenizer. The Tokenizer will break down the stream of characters from the System.IO.TextReader into raw Tokens. One or more TokenFilters may then be applied to the output of the Tokenizer.
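The Tokenizer-plus-TokenFilter pipeline described above can be sketched as a custom Analyzer. This is a minimal illustration, assuming the Lucene.Net 3.0.3 API; the class name SimpleStopAnalyzer is hypothetical, while LowerCaseTokenizer, StopFilter, and StopAnalyzer.ENGLISH_STOP_WORDS_SET are real members of Lucene.Net.Analysis.

```csharp
using System.IO;
using Lucene.Net.Analysis;

// Hypothetical analyzer for illustration: a Tokenizer followed by one TokenFilter.
public class SimpleStopAnalyzer : Analyzer
{
    public override TokenStream TokenStream(string fieldName, TextReader reader)
    {
        // The Tokenizer breaks the character stream from the TextReader
        // into raw tokens (here, lower-casing and splitting on non-letters).
        TokenStream result = new LowerCaseTokenizer(reader);

        // A TokenFilter is then applied to the Tokenizer's output;
        // StopFilter discards common English stop words.
        result = new StopFilter(true, result, StopAnalyzer.ENGLISH_STOP_WORDS_SET);
        return result;
    }
}
```

An instance of this class could be used exactly like the analyzers in the examples above, e.g. passed to an IndexWriter or called directly via its TokenStream method.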