C# (CSharp) Edu.Stanford.Nlp.IE.Machinereading.Domains.Ace.Reader RobustTokenizer Examples

Programming language: C# (CSharp)

Namespace/package name: Edu.Stanford.Nlp.IE.Machinereading.Domains.Ace.Reader

Class/type: RobustTokenizer

Examples on hotexamples.com: 2

C# (CSharp) Edu.Stanford.Nlp.IE.Machinereading.Domains.Ace.Reader RobustTokenizer - 2 examples found. These are the top-rated real-world C# (CSharp) examples of Edu.Stanford.Nlp.IE.Machinereading.Domains.Ace.Reader.RobustTokenizer, extracted from open source projects.

Frequently used methods

Tokenize(1)

TokenizeToWordTokens(1)

Example #1

        /// <exception cref="System.Exception"/>
        public static void Main(string[] argv)
        {
            if (argv.Length != 1)
            {
                log.Info("Usage: java edu.stanford.nlp.ie.machinereading.common.RobustTokenizer <file to tokenize>");
                System.Environment.Exit(1);
            }
            // read the whole file into memory; ReadAllText opens, reads, and closes the file
            string text = System.IO.File.ReadAllText(argv[0]);
            // create the tokenizer object
            RobustTokenizer<Word> t = new RobustTokenizer<Word>(text);
            IList<Word> tokens = t.Tokenize();

            // print one token per line
            foreach (Word token in tokens)
            {
                System.Console.WriteLine(token);
            }
        }

Example #2

File: AceSentenceSegmenter.cs Project: awesomedotnetcore/Stanford.CoreNLP.NET

        /// <param name="filenamePrefix">path to an ACE .sgm file (but not including the .sgm extension)</param>
        /// <exception cref="System.IO.IOException"/>
        /// <exception cref="Org.Xml.Sax.SAXException"/>
        /// <exception cref="Javax.Xml.Parsers.ParserConfigurationException"/>
        public static IList<IList<AceToken>> TokenizeAndSegmentSentences(string filenamePrefix)
        {
            IList<IList<AceToken>> sentences = new List<IList<AceToken>>();
            File inputFile = new File(filenamePrefix + AceDocument.OrigExt);
            string input = IOUtils.SlurpFile(inputFile);
            // now we can split the text into tokens
            RobustTokenizer<Word> tokenizer = new RobustTokenizer<Word>(input);
            IList<RobustTokenizer.WordToken> tokenList = tokenizer.TokenizeToWordTokens();
            // and group the tokens into sentences
            List<AceToken> currentSentence = new List<AceToken>();
            int quoteCount = 0;

            for (int i = 0; i < tokenList.Count; i++)
            {
                RobustTokenizer.WordToken token = tokenList[i];
                string tokenText = token.GetWord();
                AceToken convertedToken = WordTokenToAceToken(token, sentences.Count);
                // close the sentence in progress when we hit some SGML
                // (the check for 2+ skipped lines, e.g. after datelines, is disabled)
                // if (token.getNewLineCount() > 1 || AceToken.isSgml(tokenText)) {
                if (AceToken.IsSgml(tokenText))
                {
                    if (currentSentence.Count > 0)
                    {
                        sentences.Add(currentSentence);
                    }
                    currentSentence = new List<AceToken>();
                    quoteCount = 0;
                }
                currentSentence.Add(convertedToken);
                if (tokenText.Equals("\""))
                {
                    quoteCount++;
                }
                // start a new sentence whenever we hit sentence-final punctuation
                if (sentenceFinalPuncSet.Contains(tokenText))
                {
                    // include a closing quote that directly follows the end of the sentence
                    if (i < tokenList.Count - 1 && quoteCount % 2 == 1 && tokenList[i + 1].GetWord().Equals("\""))
                    {
                        AceToken quoteToken = WordTokenToAceToken(tokenList[i + 1], sentences.Count);
                        currentSentence.Add(quoteToken);
                        quoteCount++;
                        i++;
                    }
                    if (currentSentence.Count > 0)
                    {
                        sentences.Add(currentSentence);
                    }
                    currentSentence = new List<AceToken>();
                    quoteCount = 0;
                }
                else if (AceToken.IsSgml(tokenText))
                {
                    // start a new sentence after an SGML tag, so the tag forms a sentence of its own
                    if (currentSentence.Count > 0)
                    {
                        sentences.Add(currentSentence);
                    }
                    currentSentence = new List<AceToken>();
                    quoteCount = 0;
                }
            }
            return sentences;
        }
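
The grouping logic in Example #2 depends on types that the listing does not show (AceToken, RobustTokenizer.WordToken, sentenceFinalPuncSet). To make the control flow easier to follow, here is a minimal self-contained sketch of the same idea over plain strings. Everything in it is an assumption for illustration: SentenceGrouper, Group, and LooksLikeSgml are hypothetical names, the punctuation set is a guess at what sentenceFinalPuncSet might contain, and the SGML test is a crude stand-in for AceToken.IsSgml.

```csharp
using System;
using System.Collections.Generic;

public static class SentenceGrouper
{
    // Assumption: the contents of the real sentenceFinalPuncSet are not shown in the listing.
    private static readonly HashSet<string> SentenceFinal = new HashSet<string> { ".", "!", "?" };

    // Assumption: a crude stand-in for AceToken.IsSgml.
    private static bool LooksLikeSgml(string token)
    {
        return token.Length >= 2
            && token.StartsWith("<", StringComparison.Ordinal)
            && token.EndsWith(">", StringComparison.Ordinal);
    }

    public static List<List<string>> Group(IList<string> tokens)
    {
        var sentences = new List<List<string>>();
        var current = new List<string>();
        int quoteCount = 0;

        for (int i = 0; i < tokens.Count; i++)
        {
            string text = tokens[i];
            bool isSgml = LooksLikeSgml(text);

            // An SGML tag closes whatever sentence was in progress.
            if (isSgml)
            {
                if (current.Count > 0) sentences.Add(current);
                current = new List<string>();
                quoteCount = 0;
            }

            current.Add(text);
            if (text == "\"") quoteCount++;

            if (SentenceFinal.Contains(text))
            {
                // An odd quote count means a quotation is still open:
                // pull the closing quote into this sentence.
                if (i < tokens.Count - 1 && quoteCount % 2 == 1 && tokens[i + 1] == "\"")
                {
                    current.Add(tokens[i + 1]);
                    i++;
                }
                sentences.Add(current);
                current = new List<string>();
                quoteCount = 0;
            }
            else if (isSgml)
            {
                // The tag itself becomes a one-token sentence.
                sentences.Add(current);
                current = new List<string>();
                quoteCount = 0;
            }
        }
        // As in the listing, tokens after the last boundary are not emitted.
        return sentences;
    }
}

public static class Program
{
    public static void Main()
    {
        var tokens = new List<string>
        {
            "<TEXT>", "He", "said", "\"", "Go", ".", "\"", "Then", "left", ".", "</TEXT>"
        };
        foreach (List<string> sentence in SentenceGrouper.Group(tokens))
        {
            Console.WriteLine(string.Join(" ", sentence));
        }
        // Prints four lines:
        // <TEXT>
        // He said " Go . "
        // Then left .
        // </TEXT>
    }
}
```

The sketch keeps one behavior of the listing that is easy to miss: there is no flush after the loop, so input that does not end in sentence-final punctuation or an SGML tag loses its trailing tokens. ACE .sgm files end with closing tags, which is presumably why the listing gets away with this.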