C# (CSharp) Xtractor.Xtractor.ExtractTextWithLocation Examples

Programming language: C# (CSharp)

Method/Function: ExtractTextWithLocation

Examples at hotexamples.com: 2

C# (CSharp) Xtractor.Xtractor.ExtractTextWithLocation - 2 examples found. These are the top-rated real-world examples of Xtractor.Xtractor.ExtractTextWithLocation extracted from open source projects.

Frequently used methods

CountInstances(2)

ExtractImagesToFile(2)

ExtractTextWithLocation(2)

ExtractText(2)

GetPageSize(2)

ExtractImages(1)

ExtractImageToFile(1)

ExtractTextAbove(1)

ExtractTextBelow(1)

ExtractTextLeft(1)

Dispose(1)

ExtractTextToFile(1)

ExtractImage(1)

ExtractTextWithin(1)

FindAllByRegex(1)

FindText(1)

GetBookmarksAsJson(1)

GetBookmarksAsTree(1)

GetDocumentInfo(1)

ExtractTextRight(1)

Example #1

File: ExtractTextWithLocation.cs Project: ActivePDF/-Legacy-Xtractor

using System;
using System.Drawing;
using System.Text.RegularExpressions;

public static void Example()
{
    // You have a few options, depending on how much detail you want.
    // Note that all of the "Xtractor.Xtractor.CoordinateOrigin" arguments are optional.
    using (Xtractor.Xtractor xtractor = new Xtractor.Xtractor(@"..\..\..\Input\Xtractor.Input.pdf"))
    {
        string searchPhrase = "ActivePDF";

        // Gives back the bounding box of each occurrence of "ActivePDF" on page 1;
        // results are relative to the top-left corner of the page.
        Console.WriteLine($"Retrieve the bounding box coordinates for all instances of {searchPhrase} on page one.");
        RectangleF[] page1BoundingBoxes = xtractor.FindText(text: searchPhrase, pageNumber: 1, origin: Xtractor.Xtractor.CoordinateOrigin.TopLeft);
        Console.WriteLine($"{page1BoundingBoxes.Length} instance(s) of {searchPhrase} found on page 1.");
        foreach (RectangleF boundingBox in page1BoundingBoxes)
        {
            Console.WriteLine($"  Box: ({boundingBox.X}, {boundingBox.Y}), ({boundingBox.X + boundingBox.Width}, {boundingBox.Y + boundingBox.Height})");
        }
        Console.WriteLine();

        // Gives back the bounding box of each occurrence of "ActivePDF" in the document.
        // Results are relative to the top-left corner of the page.
        // The first dimension of the array is indexed by page (index 0 is page 1),
        // so wholeDocumentBoundingBoxes[0] contains the same data as page1BoundingBoxes.
        Console.WriteLine($"Retrieve the bounding box coordinates for all instances of {searchPhrase} in the document.");
        RectangleF[][] wholeDocumentBoundingBoxes = xtractor.FindText(text: searchPhrase, origin: Xtractor.Xtractor.CoordinateOrigin.TopLeft);
        for (int i = 0; i < wholeDocumentBoundingBoxes.Length; ++i)
        {
            Console.WriteLine($"{wholeDocumentBoundingBoxes[i].Length} instance(s) of {searchPhrase} found on page {i + 1}.");
            for (int j = 0; j < wholeDocumentBoundingBoxes[i].Length; ++j)
            {
                RectangleF boundingBox = wholeDocumentBoundingBoxes[i][j];
                Console.WriteLine($"  Box: ({boundingBox.X}, {boundingBox.Y}), ({boundingBox.X + boundingBox.Width}, {boundingBox.Y + boundingBox.Height})");
            }
        }

        // Uses the regex @"\w+" to find all words on page 1, returning each word and its location.
        // Coordinates are relative to the bottom-left corner, in PDF units.
        Tuple<string, RectangleF>[] allWordsPage1 = xtractor.FindText(new Regex(@"\w+"), 1, Xtractor.Xtractor.CoordinateOrigin.BottomLeft);

        // Uses the regex @"\w+" to find all words in the document, returning each word and its location.
        // Coordinates are relative to the bottom-left corner, in PDF units.
        // allWordsWholeDocument[0] contains the same data as allWordsPage1.
        Tuple<string, RectangleF>[][] allWordsWholeDocument = xtractor.FindText(re: new Regex(@"\w+"), origin: Xtractor.Xtractor.CoordinateOrigin.BottomLeft);

        // Extracts the location of each individual character on page 1, relative to the top-left corner.
        // Characters come back in the order stored in the PDF, which is not necessarily natural reading order.
        Xtractor.CharAndBox[] eachCharacterPage1 = xtractor.ExtractTextWithLocation(pageNumber: 1, origin: Xtractor.Xtractor.CoordinateOrigin.TopLeft);

        // Extracts the location of each individual character in the whole document, relative to the bottom-left corner.
        // eachCharacterWholeDocument[0] contains the same characters, in the same order, as eachCharacterPage1,
        // but the coordinates differ because the two calls used different coordinate origins.
        // Had both calls used the same origin, the coordinates would match.
        // Again, the order is the PDF's storage order, which is not necessarily natural reading order.
        Xtractor.CharAndBox[][] eachCharacterWholeDocument = xtractor.ExtractTextWithLocation(origin: Xtractor.Xtractor.CoordinateOrigin.BottomLeft);
    }
}
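The comments above note that the same characters come back with different coordinates under the TopLeft and BottomLeft origins. As a rough illustration of how the two relate, the sketch below converts a box from one origin to the other. It assumes the page height is known in PDF units (1/72 inch) and that a box's Y is the edge nearest the origin (its bottom edge under BottomLeft, its top edge under TopLeft); that convention is an assumption, not something these examples confirm. OriginConversion and its methods are hypothetical helpers, not part of Xtractor.

```csharp
using System.Drawing;

// Hypothetical helper (not part of Xtractor): converts a bounding box between a
// bottom-left origin (Y grows upward) and a top-left origin (Y grows downward).
// Assumes box.Y is the edge closest to the origin and pageHeight is in the same units.
public static class OriginConversion
{
    public static RectangleF BottomLeftToTopLeft(RectangleF box, float pageHeight)
    {
        // The box's top edge sits box.Y + box.Height above the page bottom,
        // so it lies pageHeight - (box.Y + box.Height) below the page top.
        return new RectangleF(box.X, pageHeight - (box.Y + box.Height), box.Width, box.Height);
    }

    public static RectangleF TopLeftToBottomLeft(RectangleF box, float pageHeight)
    {
        // The mapping is its own inverse, so the same formula converts back.
        return BottomLeftToTopLeft(box, pageHeight);
    }
}
```

For a 792-unit-high page (US Letter), a box at Y = 700 with height 12 under a bottom-left origin maps to Y = 80 under a top-left origin, since 792 - (700 + 12) = 80; converting back yields Y = 700 again.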

Example #2

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

public static void Example()
{
    using (Xtractor.Xtractor xtractor = new Xtractor.Xtractor(filename: @"..\..\..\Input\Xtractor.Input.pdf"))
    {
        /*
         * PDF documents don't always record the desired reading order of the text.
         * Even when they do, the text is not required to be stored in the reading
         * order for that language, and some languages have multiple acceptable
         * reading orders. Thus, Xtractor cannot guarantee returning text in the
         * desired reading order for a given language.
         *
         * However, if you know the reading order you expect from your document,
         * it is still quite easy to get the desired result using LINQ. The example
         * below sorts the text for English: top -> bottom first,
         * then left -> right.
         */
        Console.WriteLine("Extracting document text by reading order ...");

        // With a top-left origin, Y grows downward, so ascending Y runs top -> bottom.
        // (With a bottom-left origin, ascending Y would run bottom -> top instead.)
        Xtractor.CharAndBox[] englishText = xtractor.ExtractTextWithLocation(pageNumber: 1, origin: Xtractor.Xtractor.CoordinateOrigin.TopLeft);
        IEnumerable<Xtractor.CharAndBox> sortedText = englishText.OrderBy(cab => cab.Box.Y).ThenBy(cab => cab.Box.X);

        StringBuilder stringBuilder = new StringBuilder();
        foreach (Xtractor.CharAndBox character in sortedText)
        {
            stringBuilder.Append(character.Character);
        }
        Console.WriteLine($"Document Text: {stringBuilder}");
    }
}
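Sorting by exact Y and then X, as in Example #2, can misplace characters that sit on the same visual line but report slightly different Y values (mixed font sizes, superscripts, rounding). One common remedy is to group characters into lines using a vertical tolerance, then sort each line by X. The sketch below assumes a top-left origin and a single-column layout; CharBox is a hypothetical stand-in for Xtractor.CharAndBox, and the tolerance is a value you would tune for your documents.

```csharp
using System.Collections.Generic;
using System.Drawing;
using System.Linq;
using System.Text;

// Hypothetical stand-in for Xtractor.CharAndBox: a character and its bounding box.
// Coordinates are assumed to use a top-left origin, so Y grows downward.
public readonly struct CharBox
{
    public CharBox(string character, RectangleF box)
    {
        Character = character;
        Box = box;
    }

    public string Character { get; }
    public RectangleF Box { get; }
}

public static class ReadingOrder
{
    // Builds English reading-order text: lines top to bottom, characters left to right.
    // A character joins the current line when its Y is within `tolerance` units of the
    // line's first (topmost) character; otherwise it starts a new line.
    public static string ToText(IEnumerable<CharBox> chars, float tolerance)
    {
        var lines = new List<List<CharBox>>();
        foreach (CharBox ch in chars.OrderBy(x => x.Box.Y))
        {
            List<CharBox> current = lines.Count > 0 ? lines[lines.Count - 1] : null;
            if (current != null && ch.Box.Y - current[0].Box.Y <= tolerance)
                current.Add(ch);
            else
                lines.Add(new List<CharBox> { ch });
        }

        var sb = new StringBuilder();
        foreach (List<CharBox> line in lines)
        {
            // Within a line, order characters left to right.
            foreach (CharBox ch in line.OrderBy(x => x.Box.X))
                sb.Append(ch.Character);
            sb.Append('\n');
        }
        return sb.ToString();
    }
}
```

With a tolerance of 2 units, an "i" at Y = 99.5 and an "H" at Y = 100 land on the same line and come out as "Hi", whereas a plain OrderBy(Y).ThenBy(X) would emit "iH". Comparing each character against the first character of its line keeps the logic simple; comparing baselines or vertical overlap would be more robust for mixed font sizes.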