GetParagraphSentences(IList<SentenceItems> paragraphSentenceItems, int paragraphStartPosition)
{
    List<TextParsing.Sentence> result = new List<TextParsing.Sentence>();
    // Sentence whose last entity runs past its end; it absorbs the sentence that follows.
    TextParsing.Sentence parent = null;
    foreach (var sentenceItems in paragraphSentenceItems)
    {
        Entity lastEntity = sentenceItems.Entities.LastOrDefault();
        TextParsing.Sentence current = sentenceItems.Sentence;
        if (parent != null)
        {
            // Merge the current sentence into the pending one and continue with the merged sentence.
            parent.AppendNextPart(current);
            current = parent;
            parent = null;
        }
        int sentenceEndPosition = paragraphStartPosition + current.StartPosition + current.Text.Length;
        if ((lastEntity != null) && (sentenceEndPosition < lastEntity.PositionInfo.End))
        {
            // The last entity crosses the sentence boundary: hold the sentence back for merging.
            parent = current;
        }
        else
        {
            result.Add(current);
        }
    }
    // Item2 is the sentence still pending at the end of the paragraph (null if there is none).
    return new Tuple<TextParsing.Sentence[], TextParsing.Sentence>(result.ToArray(), parent);
}
public SentenceItems(TextParsing.Sentence sentence, Entity[] entities, int contentIndex)
    : this()
{
    Sentence = sentence;
    Entities = entities;
    ContentIndex = contentIndex;
}
/// <summary>
/// Builds the items of a sentence
/// </summary>
/// <param name="sentence">the sentence</param>
/// <param name="entities">the entities</param>
/// <param name="currentContentIndex">index from which the contents of this sentence start</param>
/// <returns>the sentence items</returns>
private SentenceItems SetSentenceItems(TextParsing.Sentence sentence, IList<Entity> entities, ref int currentContentIndex)
{
    var result = new SentenceItems(sentence, entities.ToArray(), currentContentIndex);
    // Called only to advance currentContentIndex past the contents of this sentence.
    GetContentIndexesInsideSentence(sentence, ref currentContentIndex);
    entities.Clear();
    return result;
}
/// <summary>
/// Gets the indexes of the contents, starting from the current one, that lie inside the given sentence
/// </summary>
/// <param name="textSentence">the text sentence</param>
/// <param name="currentContentIndex">index of the current content</param>
/// <param name="sentenceShift">offset of the sentence</param>
/// <returns>list of the content indexes inside the given sentence</returns>
private List<int> GetContentIndexesInsideSentence(TextParsing.Sentence textSentence, ref int currentContentIndex, int sentenceShift = 0)
{
    List<int> result = new List<int>();
    int sentenceStart = textSentence.GetFullStartPosition() + sentenceShift;
    int sentenceEnd = textSentence.GetFullEndPosition() + sentenceShift;
    while (IsSentenceContainContent(currentContentIndex, textSentence.Text, sentenceStart, sentenceEnd))
    {
        result.Add(currentContentIndex);
        ++currentContentIndex;
    }
    return result;
}
/// <summary>
/// Gets the text of a sentence together with its child parts
/// </summary>
/// <param name="sentence">the sentence</param>
/// <param name="startPosition">start position of the sentence</param>
/// <param name="isOnlyPotencial">flag indicating that only potential parts should be selected</param>
/// <param name="contentIndex">index of the current content</param>
/// <returns>text of the sentence with its child parts</returns>
private string GetSentenceWithChildrenParts(TextParsing.Sentence sentence, int startPosition, bool isOnlyPotencial, ref int contentIndex)
{
    StringBuilder result = new StringBuilder(sentence.Text);
    var indexes = GetContentIndexesInsideSentence(sentence, ref contentIndex);
    // Total length of the parts inserted so far; shifts the insert position of later parts.
    int partsLength = 0;
    foreach (int index in indexes)
    {
        ChildContent child = Children[index];
        if (!isOnlyPotencial || child.IsPotencialParentPart)
        {
            result.Insert(GetContentCorrectedStart(index) - startPosition + partsLength, child.GetTextWithChildrenParts(isOnlyPotencial));
            partsLength += child.GetFullLength();
        }
    }
    return result.ToString();
}
/// <summary>
/// Creates a sub-sentence object
/// </summary>
/// <param name="parentSentence">the parent sentence</param>
/// <param name="textSentence">the text sentence</param>
/// <param name="entities">list of the entities of the sentence</param>
/// <param name="contentIndexes">indexes of the contents inside the sentence</param>
/// <returns>the sub-sentence object</returns>
private SubSentence CreateSubSentenceObject(
    Sentence parentSentence,
    TextParsing.Sentence textSentence,
    IEnumerable<Entity> entities,
    IEnumerable<int> contentIndexes)
{
    var result = new SubSentence(SubSentenceType.Default, 0) { ParentObject = parentSentence };
    var units = GetSubSentenceUnits(result, textSentence, entities);
    var subSentences = GetChildContentSubSentences(contentIndexes, units, parentSentence);
    result.SetUnits(units);
    result.AppendSubSentences(subSentences);
    return result;
}
/// <summary>
/// Builds a new paragraph
/// </summary>
/// <param name="sentenceItemsCollection">collection of sentence items</param>
/// <param name="startPosition">start position</param>
/// <param name="contentInfo">information about the content being corrected</param>
/// <returns>the new paragraph</returns>
private Paragraph GetNewParagraph(IEnumerable<SentenceItems> sentenceItemsCollection, int startPosition, CorrectedContentInfo contentInfo)
{
    List<TextParsing.Sentence> sentences = new List<TextParsing.Sentence>();
    int sentenceStartPosition = 0;
    foreach (var sentenceItems in sentenceItemsCollection)
    {
        var sentence = new TextParsing.Sentence(GetNewSentenceText(sentenceItems, contentInfo), sentenceStartPosition);
        sentences.Add(sentence);
        // Each sentence starts where the previous one ended.
        sentenceStartPosition = sentence.EndPosition;
    }
    Paragraph result = new Paragraph(sentences.GetText(), startPosition);
    result.SetSentences(sentences);
    return result;
}
/// <summary>
/// Creates a sentence object
/// </summary>
/// <param name="textSentence">the text sentence</param>
/// <param name="entities">list of the entities of the sentence</param>
/// <param name="currentContentIndex">index of the current content</param>
/// <returns>the sentence object</returns>
private Sentence CreateSentenceObject(TextParsing.Sentence textSentence, IEnumerable<Entity> entities, ref int currentContentIndex)
{
    var result = new Sentence(textSentence.GetFullStartPosition() + GetFullContentsLength(currentContentIndex));
    if (entities.Any())
    {
        // Take the language from the first entity that declares one; otherwise fall back to "EN".
        var entityWithLang = entities.FirstOrDefault(entity => !string.IsNullOrEmpty(entity.Language));
        if (entityWithLang != null)
        {
            result.Language = entityWithLang.Language;
        }
        else
        {
            result.Language = "EN";
        }
    }
    var subSentence = CreateSubSentenceObject(result, textSentence, entities, GetContentIndexesInsideSentence(textSentence, ref currentContentIndex));
    result.SetSubSentencesHeirarchy(new SubSentence[] { subSentence });
    return result;
}
// Sets the processing status for sentences
//---WTF-?!?!?!---//
/*
 * public void SetSentenceProcessedStatus()
 * {
 *     foreach ( var paragraph in _paragraphs )
 *     {
 *         foreach ( var sentence in paragraph.Sentences )
 *         {
 *             if ( GarbageSentenceSelector.IsGarbage( sentence.Text ) )
 *             {
 *                 sentence.IsProcessed = false;
 *             }
 *         }
 *     }
 * }
 */

/// <summary>
/// Merges sentences according to the entities
/// </summary>
/// <param name="entities">the entities</param>
public void UnionSentencesByEntities(IEnumerable<Entity> entities)
{
    var sentenceItems = GetSentencesItems(entities);
    int sentenceIndex = 0;
    List<Paragraph> paragraphs = new List<Paragraph>(_paragraphs.Length);
    // Sentence left pending by the previous paragraph; it becomes the first sentence of the next one.
    TextParsing.Sentence nextParagraphSentence = null;
    for (int i = 0; i < _paragraphs.Length; ++i)
    {
        var paragraphSentenceItems = sentenceItems.GetRange(sentenceIndex, _paragraphs[i].Sentences.Length);
        sentenceIndex += _paragraphs[i].Sentences.Length;

        var sentences = GetParagraphSentences(paragraphSentenceItems, _paragraphs[i].StartPosition);
        var paragraph = GetParagraph(_paragraphs[i], nextParagraphSentence, sentences.Item1, sentences.Item2);
        nextParagraphSentence = sentences.Item2;
        if (paragraph != null)
        {
            paragraphs.Add(paragraph);
        }
    }
    _paragraphs = paragraphs.ToArray();
}
/// <summary>
/// Gets the paragraph
/// </summary>
/// <param name="paragraph">the current paragraph</param>
/// <param name="firstSentence">first sentence of the current paragraph (inherited from the previous one)</param>
/// <param name="sentences">sentences of the current paragraph</param>
/// <param name="nextParagraphSentence">sentence of the next paragraph</param>
/// <returns>the paragraph</returns>
private Paragraph GetParagraph(
    Paragraph paragraph,
    TextParsing.Sentence firstSentence,
    TextParsing.Sentence[] sentences,
    TextParsing.Sentence nextParagraphSentence)
{
    Paragraph result = paragraph;
    if ((firstSentence != null) || (nextParagraphSentence != null))
    {
        int startPosition = paragraph.StartPosition;
        if (firstSentence != null)
        {
            if (sentences.Any())
            {
                sentences[0].InsertPreviousPart(firstSentence);
                startPosition = firstSentence.Parent.StartPosition + firstSentence.StartPosition;
            }
            else
            {
                // No complete sentence in this paragraph: prepend the inherited part
                // to the sentence that continues into the next paragraph.
                nextParagraphSentence.InsertPreviousPart(firstSentence);
            }
        }
        if (!sentences.Any())
        {
            result = null;
        }
        else
        {
            result = new Paragraph(sentences.GetText(), startPosition);
        }
    }
    if (result != null)
    {
        result.SetSentences(sentences);
    }
    return result;
}
/// <summary>
/// Gets the collection of units of a sub-sentence
/// </summary>
/// <param name="parentSubSentence">the sub-sentence object</param>
/// <param name="textSentence">the text sub-sentence</param>
/// <param name="entities">list of entities</param>
/// <returns>collection of the sentence units</returns>
private List<UnitTextBase> GetSubSentenceUnits(SubSentence parentSubSentence, TextParsing.Sentence textSentence, IEnumerable<Entity> entities)
{
    List<UnitTextBase> result = new List<UnitTextBase>();
    int currentTextPos = 0;
    foreach (Entity entity in entities)
    {
        entity.MovePosition(textSentence.GetFullStartPosition());
        // Add the unmarked text between the previous unit and this entity, if there is any.
        // (Previously checked Start != 0, which produced an empty unit for adjacent entities.)
        if (entity.PositionInfo.Start > currentTextPos)
        {
            result.Add(SetParentSubSentence(CreateUnmarkedText(textSentence.Text, currentTextPos, entity.PositionInfo.Start), parentSubSentence));
        }
        result.Add(SetParentSubSentence(entity, parentSubSentence));
        currentTextPos = entity.PositionInfo.End;
    }
    // Add the unmarked tail after the last entity.
    if (currentTextPos < textSentence.Text.Length)
    {
        result.Add(SetParentSubSentence(CreateUnmarkedText(textSentence.Text, currentTextPos, textSentence.Text.Length), parentSubSentence));
    }
    return result;
}