C# (CSharp) UW.NLP.LanguageModels SentenceNormalizer Examples

Programming language: C# (CSharp)

Namespace/package name: UW.NLP.LanguageModels

Examples from hotexamples.com: 2

C# (CSharp) UW.NLP.LanguageModels SentenceNormalizer: 2 examples found. These are the top-rated real-world C# (CSharp) examples of UW.NLP.LanguageModels.SentenceNormalizer, extracted from open source projects.

Frequently used methods

Normalize(1)

Tokenize(1)

Example #1

File: Program.cs Project: elendil326/UWNLP

        static void Main(string[] args)
        {
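            // Absolute path to the Brown corpus on the original author's machine; adjust for your environment.
            string file1 = @"C:\Users\azend\Documents\GitHubVisualStudio\UWNLP\Assignment1\LanguageModels.UnitTests\TestData\brown.txt";

            // Constructor arguments, in order: n-gram order, start token, end token, separator, possible sentence end.
            SentenceNormalizer normalizer = new SentenceNormalizer(1, "{{*}}", "{{END}}", " ", ".");
            HashSet<string> vocabulary = new HashSet<string>(StringComparer.Ordinal);

            // SplitCorpus is defined elsewhere in the project; the arguments suggest an 80/10/10 split.
            List<List<string>> splitCorpus = SplitCorpus(file1, 80, 10, 10);

            // Build the vocabulary from every token in the first split.
            foreach (string sentence in splitCorpus[0])
            {
                string normalizedSentence = normalizer.Normalize(sentence);
                foreach (string token in normalizer.Tokenize(normalizedSentence))
                {
                    vocabulary.Add(token);
                }
            }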

            // Counters are doubles so that the final division is not an integer division.
            double validationWords = 0;
            double unknownWords = 0;

            // Count the tokens in the second split that never appeared in the vocabulary.
            foreach (string sentence in splitCorpus[1])
            {
                string normalizedSentence = normalizer.Normalize(sentence);
                foreach (string token in normalizer.Tokenize(normalizedSentence))
                {
                    validationWords++;
                    if (!vocabulary.Contains(token))
                    {
                        unknownWords++;
                    }
                }
            }

            Console.WriteLine("Total number of words in validation: {0}", validationWords);
            Console.WriteLine("Number of unseen words in validation is: {0}", unknownWords);
            // The value printed is a ratio between 0 and 1, not a value scaled to 100.
            Console.WriteLine("Fraction of unseen words is: {0}", unknownWords / validationWords);
        }
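
Example #1 calls SplitCorpus(file1, 80, 10, 10) without showing it. The sketch below is one plausible way such a helper could work, not the project's actual code. It assumes the corpus file holds one sentence per line, that the three integers are train/validation/test percentages, and that the split is sequential with no shuffling. The class name CorpusSplitter and the method SplitSentences are hypothetical names introduced here for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical stand-in for the SplitCorpus helper used in Example #1.
// Assumptions: one sentence per line, sequential (unshuffled) split.
public static class CorpusSplitter
{
    // Splits an in-memory list of sentences into [train, validation, test].
    public static List<List<string>> SplitSentences(
        IReadOnlyList<string> sentences,
        int trainPercent,
        int validationPercent,
        int testPercent)
    {
        if (trainPercent < 0 || validationPercent < 0 || testPercent < 0
            || trainPercent + validationPercent + testPercent != 100)
        {
            throw new ArgumentException("Percentages must be non-negative and sum to 100.");
        }

        int trainCount = sentences.Count * trainPercent / 100;
        int validationCount = sentences.Count * validationPercent / 100;

        return new List<List<string>>
        {
            sentences.Take(trainCount).ToList(),
            sentences.Skip(trainCount).Take(validationCount).ToList(),
            // The test split absorbs any remainder left by integer rounding.
            sentences.Skip(trainCount + validationCount).ToList(),
        };
    }

    // Reads the corpus from disk, skipping blank lines, then splits it.
    public static List<List<string>> SplitCorpus(
        string path, int trainPercent, int validationPercent, int testPercent)
    {
        List<string> sentences = File.ReadLines(path)
            .Where(line => !string.IsNullOrWhiteSpace(line))
            .ToList();
        return SplitSentences(sentences, trainPercent, validationPercent, testPercent);
    }
}
```

With this sketch, index 0 of the result is the training split and index 1 is the validation split, which matches how Example #1 uses splitCorpus[0] and splitCorpus[1]. Whether the original helper shuffles sentences first is not known from the snippet.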

Example #2

File: LanguageModel.cs Project: elendil326/UWNLP

 /// <summary>
 /// Initializes the class.
 /// </summary>
 private void Init()
 {
     Normalizer = new SentenceNormalizer(Settings.NGramOrder, Settings.StartToken, Settings.EndToken, Settings.Separator, Settings.PossibleEnd);
     NGramCounter = new NGramCounter(Settings);
     Vocabulary = new HashSet<string>(Settings.StringComparer);
     _wordsSeenOnlyOnce = new HashSet<string>(Settings.StringComparer);
     PMLCache = new Dictionary<NGram, double>();
 }
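
Neither example shows what Normalize and Tokenize do internally. The sketch below is a guess at the general idea based only on the constructor arguments (n-gram order, start token, end token, separator, possible end); it is not the UW.NLP.LanguageModels.SentenceNormalizer source. It assumes that Normalize drops a trailing sentence terminator, pads the sentence with (order - 1) start tokens and one end token, and that Tokenize splits on the separator. The class name SketchSentenceNormalizer is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical illustration only; NOT the real SentenceNormalizer implementation.
public sealed class SketchSentenceNormalizer
{
    private readonly int _nGramOrder;
    private readonly string _startToken;
    private readonly string _endToken;
    private readonly string _separator;
    private readonly string _possibleEnd;

    public SketchSentenceNormalizer(
        int nGramOrder, string startToken, string endToken, string separator, string possibleEnd)
    {
        if (nGramOrder < 1) throw new ArgumentOutOfRangeException(nameof(nGramOrder));
        _nGramOrder = nGramOrder;
        _startToken = startToken ?? throw new ArgumentNullException(nameof(startToken));
        _endToken = endToken ?? throw new ArgumentNullException(nameof(endToken));
        _separator = separator ?? throw new ArgumentNullException(nameof(separator));
        _possibleEnd = possibleEnd ?? throw new ArgumentNullException(nameof(possibleEnd));
    }

    // Assumed behavior: strip a trailing terminator, then add (order - 1)
    // start tokens in front and a single end token at the back.
    public string Normalize(string sentence)
    {
        string trimmed = sentence.Trim();
        if (_possibleEnd.Length > 0 && trimmed.EndsWith(_possibleEnd, StringComparison.Ordinal))
        {
            trimmed = trimmed.Substring(0, trimmed.Length - _possibleEnd.Length).TrimEnd();
        }

        IEnumerable<string> parts = Enumerable.Repeat(_startToken, _nGramOrder - 1)
            .Concat(Tokenize(trimmed))
            .Append(_endToken);
        return string.Join(_separator, parts);
    }

    // Assumed behavior: split on the separator and discard empty tokens.
    public IEnumerable<string> Tokenize(string sentence)
    {
        return sentence.Split(new[] { _separator }, StringSplitOptions.RemoveEmptyEntries);
    }
}
```

Under these assumptions, a bigram normalizer built with the tokens from Example #1 would turn "The cat sat ." into "{{*}} The cat sat {{END}}", while the unigram configuration (order 1) would add no start token at all. The real class may handle padding and the possible-end marker differently.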