/// <summary>
/// Matches entities by first running a coarse, index-free match to shrink the candidate set,
/// then re-matching the input sentence's tokens (with positional info) against only those candidates.
/// </summary>
/// <param name="processedDataset">Preprocessed dataset holding the TFIDF matrix and its absolute values.</param>
/// <param name="dataset">Raw dataset sentences, row-aligned with the TFIDF matrix.</param>
/// <param name="inputSentence">Sentence to match against the dataset.</param>
/// <param name="threshold">Final score threshold applied after post-processing.</param>
/// <param name="ngramSize">Character n-gram size used for tokenization and TFIDF (default 3).</param>
/// <param name="tfidfThreshold">Intermediate cosine-similarity threshold for the token-level pass (default 0.5).</param>
/// <returns>Filtered match results; empty list when the initial coarse match finds nothing.</returns>
private List <MatchResult> MatchEntitiesWithIndicesPostTokenizeApproach(ProcessedDataset processedDataset, List <string> dataset, string inputSentence, float threshold, int ngramSize = 3, float tfidfThreshold = 0.5f)
{
    // initial match (reduce database) — cheap pass that discards clearly irrelevant rows
    var initialMatchResult = MatchEntitiesWithoutIndices(processedDataset, dataset, inputSentence, ngramSize);
    if (initialMatchResult.Count == 0)
    {
        return(initialMatchResult);
    }

    // HashSet gives O(1) membership tests; the previous List<int>.Contains made each of the
    // three row filters below O(rows * matches).
    var matchingSentencesIndices = new HashSet<int>(initialMatchResult.Select(m => m.DatabaseMatchInfo.MatchIndex));

    // slice the TFIDF data and raw sentences down to the surviving rows (Where preserves row order,
    // so the three projections stay aligned with each other)
    var initialMatchTFIDFMatrix = processedDataset.TFIDFMatrix.Where((rowValue, rowIndex) => matchingSentencesIndices.Contains(rowIndex)).ToArray();
    var initialMatchTFIDFAbsoluteValues = processedDataset.TFIDFMatrixAbsoluteValues.Where((rowValue, rowIndex) => matchingSentencesIndices.Contains(rowIndex)).ToArray();
    var initialMatchAsDataset = dataset.Where((rowValue, rowIndex) => matchingSentencesIndices.Contains(rowIndex)).ToList();

    // get all possible tokens of input sentence and their TFIDF vectors
    var sentenceTokens = StringTokenizer.GetAllPossibleTokens(inputSentence, processedDataset.MaximumWordCount, ngramSize);
    var inputTokensTFIDFMatrix = TFIDFController.CalculateInputSenenteceTokensTFIDFMatrix(sentenceTokens, processedDataset, ngramSize);

    // re-matching (with resolution): cosine similarity of every token against every surviving row
    var similarityValuesMatrix = DotProductCalculator.GetDotProduct(inputTokensTFIDFMatrix, initialMatchTFIDFMatrix, matrix2Abs: initialMatchTFIDFAbsoluteValues);

    // re-filter at the intermediate TFIDF threshold
    var tfidfMatches = MatchFilter.FilterByThresholdBatch(similarityValuesMatrix, initialMatchAsDataset, sentenceTokens, tfidfThreshold);

    // post processing, then apply the caller-supplied final threshold
    var updatedScoresMatches = PostprocessingController.UpdateMatchScores(tfidfMatches);
    return(MatchFilter.FilterByThreshold(updatedScoresMatches, threshold));
}
/// <summary>
/// Diagnostic test: tokenizes every word list in the test set and prints each generated
/// token's text and index range to the console, separated by divider lines.
/// </summary>
/// <param name="wordListTestset">Word lists to tokenize, one per test sentence.</param>
/// <param name="expected">Expected token lists (currently unused by this dump routine).</param>
/// <param name="maxWordCount">Maximum word count passed to the tokenizer.</param>
/// <param name="ngrams">N-gram size passed to the tokenizer.</param>
public void TokenGenerationTest(List <List <TokenMatchInfo> > wordListTestset, List <List <TokenMatchInfo> > expected, int maxWordCount, int ngrams)
{
    Console.WriteLine("======================");

    // walk every sentence's word list in the testing set
    foreach (var words in wordListTestset)
    {
        var generatedTokens = StringTokenizer.GetAllPossibleTokens(words, maxWordCount, ngrams);

        // dump each token's text and span, followed by a divider
        foreach (var generated in generatedTokens)
        {
            Console.WriteLine(generated.TokenText);
            Console.WriteLine(generated.StartIndex);
            Console.WriteLine(generated.EndIndex);
            Console.WriteLine("======================");
        }
    }
}
/// <summary>
/// Matches entities by tokenizing the input sentence up front and comparing every possible
/// token against the full dataset's TFIDF matrix in a single pass (no candidate reduction).
/// </summary>
/// <param name="processedDataset">Preprocessed dataset holding the TFIDF matrix and its absolute values.</param>
/// <param name="dataset">Raw dataset sentences, row-aligned with the TFIDF matrix.</param>
/// <param name="inputSentence">Sentence to match against the dataset.</param>
/// <param name="threshold">Final score threshold applied after post-processing.</param>
/// <param name="ngramSize">Character n-gram size used for tokenization and TFIDF (default 3).</param>
/// <param name="tfidfThreshold">Intermediate cosine-similarity threshold for the token-level pass (default 0.5).</param>
/// <returns>Filtered match results after score post-processing.</returns>
private List <MatchResult> MatchEntitiesWithIndicesPreTokenizeApproach(ProcessedDataset processedDataset, List <string> dataset, string inputSentence, float threshold, int ngramSize = 3, float tfidfThreshold = 0.5f)
{
    // get all input sentence possible tokens
    var sentenceTokens = StringTokenizer.GetAllPossibleTokens(inputSentence, processedDataset.MaximumWordCount, ngramSize);

    // calculate tokens TFIDF matrix
    var inputTokensTFIDFMatrix = TFIDFController.CalculateInputSenenteceTokensTFIDFMatrix(sentenceTokens, processedDataset, ngramSize);

    // calculate tokens cosine similarity against every dataset row
    var similarityValuesMatrix = DotProductCalculator.GetDotProduct(inputTokensTFIDFMatrix, processedDataset.TFIDFMatrix, matrix2Abs: processedDataset.TFIDFMatrixAbsoluteValues);

    // filter results at the intermediate TFIDF threshold
    var tfidfMatches = MatchFilter.FilterByThresholdBatch(similarityValuesMatrix, dataset, sentenceTokens, tfidfThreshold);

    // post processing, then apply the caller-supplied final threshold
    var updatedScoresMatches = PostprocessingController.UpdateMatchScores(tfidfMatches);
    return(MatchFilter.FilterByThreshold(updatedScoresMatches, threshold));
}