void UpdateIndex(string fullPath, PendingRetrySource pendingRetrySource = null)
{
    if (IsFile(fullPath))
    {
        var fileInfo = new FileInfo(fullPath);

        try
        {
            Thread.Sleep(WaitMilliseconds); // Wait for the file to finish being written to disk

            if (fileInfo.Exists)
            {
                var content = FilesContentHelper.ReadAllText(fullPath);
                var document = CodeIndexBuilder.GetDocumentFromSource(CodeSource.GetCodeSource(fileInfo, content));

                CodeIndexBuilder.UpdateIndex(config.LuceneIndexForCode, GetNoneTokenizeFieldTerm(nameof(CodeSource.FilePath), fullPath), document);
                WordsHintBuilder.UpdateWordsHint(config, WordSegmenter.GetWords(content), log);
                pendingChanges++;
            }
        }
        catch (IOException)
        {
            // The file may still be locked by the writer; queue it for retry
            AddFileChangesToRetrySouce(fullPath, WatcherChangeTypes.Changed, pendingRetrySource);
        }
        catch (Exception ex)
        {
            log?.Error(ex.ToString());
        }
    }
}
void CreateNewIndex(string fullPath, PendingRetrySource pendingRetrySource = null)
{
    if (IsFile(fullPath))
    {
        var fileInfo = new FileInfo(fullPath);

        try
        {
            Thread.Sleep(WaitMilliseconds); // Wait for the file to finish being written to disk

            if (fileInfo.Exists)
            {
                var content = FilesContentHelper.ReadAllText(fullPath);

                CodeIndexBuilder.BuildIndex(config, false, false, false, new[] { CodeSource.GetCodeSource(fileInfo, content) });
                WordsHintBuilder.UpdateWordsHint(config, WordSegmenter.GetWords(content), log);
                pendingChanges++;
            }
        }
        catch (IOException)
        {
            // The file may still be locked by the writer; queue it for retry
            AddFileChangesToRetrySouce(fullPath, WatcherChangeTypes.Created, pendingRetrySource);
        }
        catch (Exception ex)
        {
            log?.Error(ex.ToString());
        }
    }
}
public override Exp Parse()
{
    var wordTree = this.ExpContext.ExpWordDictionary;
    WordSegmenter segmenter = new WordSegmenter(wordTree);
    List<Token> tokens = new List<Token>();

    foreach (var tok in RawTokens)
    {
        if (tok.Kind == TokenKind.Ident)
        {
            // Split compound identifiers into separate tokens using the word dictionary
            Token[] newTokens = segmenter.Split(tok);
            tokens.AddRange(newTokens);
        }
        else if (tok.Kind != TokenKind.NewLine)
        {
            tokens.Add(tok);
        }
    }

    ExpParser parser = new ExpParser();
    Exp exp = parser.Parse(tokens, this.FileContext);
    //exp.ParentExp = this.ParentExp;
    exp.SetContext(this.ExpContext);
    return exp;
}
public void TestGetWords()
{
    var content = "It's a content for test" + Environment.NewLine + "这是一个例句,我知道了";

    CollectionAssert.AreEquivalent(new[] { "It", "s", "a", "content", "for", "test", "这是一个例句", "我知道了" }, WordSegmenter.GetWords(content));
    Assert.Throws<ArgumentException>(() => WordSegmenter.GetWords(null));
}
/**
 * SmartChineseAnalyzer ships with a built-in default stop-word list, mainly punctuation.
 * Set useDefaultStopWords to true if punctuation should not appear in the results;
 * when useDefaultStopWords is false, no stop words are used at all.
 *
 * @param useDefaultStopWords
 */
public SmartChineseAnalyzer(bool useDefaultStopWords)
{
    if (useDefaultStopWords)
    {
        stopWords = loadStopWords(this.GetType().Assembly.GetManifestResourceStream("imdict.stopwords.txt"));
    }

    wordSegment = new WordSegmenter();
}
public void TestGetWords()
{
    var content = "It's a content for test" + Environment.NewLine + "这是一个例句,我知道了";

    CollectionAssert.AreEquivalent(new[] { "It", "s", "a", "content", "for", "test", "这是一个例句", "我知道了" }, WordSegmenter.GetWords(content));
    CollectionAssert.AreEquivalent(new[] { "It", "for", "test", "我知道了" }, WordSegmenter.GetWords(content, 2, 4));
    CollectionAssert.IsEmpty(WordSegmenter.GetWords("a".PadRight(201, 'b')));

    Assert.Throws<ArgumentException>(() => WordSegmenter.GetWords(null));
    Assert.Throws<ArgumentException>(() => WordSegmenter.GetWords(content, 0));
    Assert.Throws<ArgumentException>(() => WordSegmenter.GetWords(content, 200));
    Assert.Throws<ArgumentException>(() => WordSegmenter.GetWords(content, 3, 1));
    Assert.Throws<ArgumentException>(() => WordSegmenter.GetWords(content, 3, -1));
    Assert.Throws<ArgumentException>(() => WordSegmenter.GetWords(content, 3, 1001));
}
private ParseResult ParseNameBySegmenter(Token token, IWordDictionary collection)
{
    //WordCollection nameManager = this.procContext.ClassContext.FileContext.GetNameDimWordManger();
    WordSegmenter segmenter = new WordSegmenter(collection);
    Token[] newTokens = segmenter.Split(token);

    // A valid name declaration splits into exactly two tokens: type name, then variable name
    if (newTokens.Length == 2)
    {
        string argTypeName = newTokens[0].GetText();
        var argType = ZTypeManager.GetByMarkName(argTypeName)[0] as ZType;
        var result = new ParseResult()
        {
            TypeName = argTypeName,
            ZType = argType,
            VarName = newTokens[1].GetText()
        };
        return result;
    }
    else
    {
        return null;
    }
}
/**
 * Use a custom stop-word list instead of the built-in one. Stop words can be
 * loaded with SmartChineseAnalyzer.loadStopWords(InputStream).
 *
 * @param stopWords
 * @see SmartChineseAnalyzer.loadStopWords(InputStream)
 */
public SmartChineseAnalyzer(List<string> stopWords)
{
    this.stopWords = stopWords;
    wordSegment = new WordSegmenter();
}