C# (CSharp) lucene.analysis.Token Examples

Programming language: C# (CSharp)

Examples from hotexamples.com: 3

C# (CSharp) lucene.analysis.Token: 3 examples found. These are top-rated real-world C# (CSharp) examples of lucene.analysis.Token, extracted from open source projects.

Frequently used methods

endOffset(1)

setPositionIncrement(1)

startOffset(1)

termBuffer(1)

termLength(1)

termText(1)

Example #1

        /**
         * Inserts bigrams for common words into a token stream. For each input token,
         * output the token. If the token and/or the following token are in the list
         * of common words also output a bigram with position increment 0 and
         * type="gram"
         */
        /*
         * TODO: Implement the new Lucene 2.9 API incrementToken() instead of the
         * deprecated Token.next().
         * TODO: Consider adding an option to not emit unigram stopwords, as in
         * CDL XTF BigramStopFilter. CommonGramsQueryFilter would need to be
         * changed to work with this.
         * TODO: Consider optimizing for the case of three commongrams, i.e.
         * "man of the year" normally produces 3 bigrams: "man-of", "of-the",
         * "the-year", but with proper management of positions we could
         * eliminate the middle bigram "of-the" and save a disk seek and a whole
         * set of position lookups.
         */

        protected override Token process(Token token)
        {
            Token next = peek(1);

            // If this is the last token, just return it. Any commongram would
            // have been output in the previous call.
            if (next == null)
            {
                return token;
            }

            // If this token or the next one is common, construct a bigram with
            // type="gram" and position increment 0, and put it in the output
            // queue. It is returned when the base class's next() is called,
            // before this method is called with a new token from the input
            // stream. See the implementation of next() in BufferedTokenStream.
            if (isCommon(token) || isCommon(next))
            {
                Token gram = gramToken(token, next);
                write(gram);
            }

            // The unigram token is always returned.
            return token;
        }
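
The comments in Example #1 describe the filter's behaviour: every input token is passed through, and a bigram of type "gram" with position increment 0 is added whenever the token or its successor is a common word. The following is a minimal, self-contained sketch of that idea on plain strings, not the project's actual implementation. SketchToken, CommonGramsSketch, Expand, the "word" type label, and the "_" separator are all hypothetical names and values chosen for illustration; the real filter works on lucene.analysis.Token and uses a SEPARATOR constant whose value is not shown on this page.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a Lucene token: only the fields this sketch needs.
public sealed class SketchToken
{
    public string Text { get; }
    public string Type { get; }
    public int PositionIncrement { get; }

    public SketchToken(string text, string type, int positionIncrement)
    {
        Text = text;
        Type = type;
        PositionIncrement = positionIncrement;
    }

    public override string ToString() => $"{Text}({Type},+{PositionIncrement})";
}

public static class CommonGramsSketch
{
    // Emits every input word as a unigram. When the word or its successor is
    // in commonWords, also emits a bigram of type "gram" at the same position.
    // The "_" default separator is an assumption made for this sketch.
    public static List<SketchToken> Expand(
        IReadOnlyList<string> words, ISet<string> commonWords, string separator = "_")
    {
        var output = new List<SketchToken>();
        for (int i = 0; i < words.Count; i++)
        {
            // The unigram is always emitted and advances the position by one.
            output.Add(new SketchToken(words[i], "word", 1));

            // The last word has no successor, so no bigram can start here.
            if (i + 1 >= words.Count)
            {
                continue;
            }

            if (commonWords.Contains(words[i]) || commonWords.Contains(words[i + 1]))
            {
                // Position increment 0 stacks the bigram on its first word.
                output.Add(new SketchToken(words[i] + separator + words[i + 1], "gram", 0));
            }
        }
        return output;
    }

    public static void Main()
    {
        var common = new HashSet<string> { "of", "the" };
        var tokens = Expand(new[] { "man", "of", "the", "year" }, common);
        Console.WriteLine(string.Join(" ", tokens));
    }
}
```

For the input "man of the year" with common words "of" and "the", the sketch emits seven tokens: the four unigrams plus the bigrams man_of, of_the, and the_year, each bigram following the unigram it starts on. This is the three-bigram case that the TODO comment proposes optimizing.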

Example #2

        /** Construct a compound token. */
        private Token gramToken(Token first, Token second)
        {
            buffer.setLength(0);
            // termText() is evidently marked obsolete, so warning CS0612 is
            // suppressed around both calls.
#pragma warning disable 612
            buffer.append(first.termText());
            buffer.append(SEPARATOR);
            buffer.append(second.termText());
#pragma warning restore 612
            // The bigram spans from the start of the first token to the end of
            // the second and is stacked on the first token's position.
            Token result = new Token(buffer.toString(), first.startOffset(),
                                     second.endOffset(), "gram");
            result.setPositionIncrement(0);
            return result;
        }

Example #3

        /** True if token is for a common term. */
        private bool isCommon(Token token)
        {
            return commonWords != null &&
                   commonWords.contains(token.termBuffer(), 0, token.termLength());
        }