A {@link TokenFilter} with a stop word table.
  • Numeric tokens are removed.
  • English tokens must be longer than one character.
  • Each Chinese character is treated as one Chinese word.
TO DO:
  1. Add Chinese stop words, such as \ue400
  2. Dictionary based Chinese word extraction
  3. Intelligent Chinese word extraction
Inheritance: Lucene.Net.Analysis.TokenFilter
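The rules listed above can be sketched as a standalone predicate. This is a minimal illustration only, not the Lucene.Net source: the class name `ChineseFilterSketch`, the method `Keep`, and the three sample stop words are hypothetical, and it assumes the token's first character decides its type (letter, Chinese character, or anything else).

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper illustrating the filter rules; not part of Lucene.Net.
public static class ChineseFilterSketch
{
    // Assumed sample stop word table; the real table is not shown in this document.
    private static readonly HashSet<string> StopWords =
        new HashSet<string>(StringComparer.Ordinal) { "and", "the", "of" };

    // Returns true if the token would be kept under the rules described above.
    public static bool Keep(string token)
    {
        if (string.IsNullOrEmpty(token) || StopWords.Contains(token))
        {
            return false;
        }

        switch (char.GetUnicodeCategory(token[0]))
        {
            case UnicodeCategory.LowercaseLetter:
            case UnicodeCategory.UppercaseLetter:
                // English tokens must be longer than one character.
                return token.Length > 1;
            case UnicodeCategory.OtherLetter:
                // Chinese characters fall in this category; each one is a word.
                return true;
            default:
                // Numeric and all other tokens are removed.
                return false;
        }
    }
}
```

Keying on the Unicode category of the first character keeps the check cheap, at the cost of treating every non-cased letter (not only Chinese) as a single-character word.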
Example #1
        /// <summary>
        /// Creates a TokenStream which tokenizes all the text in the provided Reader.
        /// </summary>
        /// <returns>A TokenStream built from a ChineseTokenizer filtered with ChineseFilter.</returns>
        public override sealed TokenStream TokenStream(String fieldName, TextReader reader)
        {
            TokenStream result = new ChineseTokenizer(reader);

            result = new ChineseFilter(result);
            return result;
        }
Example #2
		/// <summary>
		/// Creates a TokenStream which tokenizes all the text in the provided Reader.
		/// </summary>
		/// <returns>A TokenStream built from a ChineseTokenizer filtered with ChineseFilter.</returns>
		public override sealed TokenStream TokenStream(String fieldName, TextReader reader) 
		{
			TokenStream result = new ChineseTokenizer(reader);
			result = new ChineseFilter(result);
			return result;
		}