miguelbandera/PdfDocumentParser

PdfDocumentParser

PdfDocumentParser is a parsing engine for finding and extracting text and images from PDF documents that conform to predictable graphic layouts - such as reports, bills, forms, tickets and the like. Its parsing approach is to find certain text or image fragments on a page and then extract the text or images located relative to those fragments.
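
The idea of extracting content located relative to an anchor fragment can be illustrated with a minimal sketch. It is not PdfDocumentParser's API: the `Word` class, the `AnchorExtraction.ExtractRelativeTo` method and all parameter names are hypothetical, and the sketch assumes that page text is already available as words with top-left coordinates.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical type for illustration only; not part of PdfDocumentParser.
// A word on a page, positioned by its top-left corner.
public sealed class Word
{
    public string Text { get; }
    public double X { get; }
    public double Y { get; }

    public Word(string text, double x, double y)
    {
        Text = text;
        X = x;
        Y = y;
    }
}

public static class AnchorExtraction
{
    // Finds the first word equal to anchorText, then returns the words whose
    // top-left corner lies inside a rectangle placed at (dx, dy) from the anchor.
    // Returns null when the anchor is not found on the page.
    public static string ExtractRelativeTo(
        IEnumerable<Word> page, string anchorText,
        double dx, double dy, double width, double height)
    {
        List<Word> words = page.ToList();
        Word anchor = words.FirstOrDefault(w => w.Text == anchorText);
        if (anchor == null)
            return null; // the page does not match this template

        double left = anchor.X + dx;
        double top = anchor.Y + dy;

        IEnumerable<string> hits = words
            .Where(w => w.X >= left && w.X < left + width
                     && w.Y >= top && w.Y < top + height)
            .OrderBy(w => w.Y)   // reading order: top to bottom,
            .ThenBy(w => w.X)    // then left to right
            .Select(w => w.Text);

        return string.Join(" ", hits);
    }
}
```

For example, given a page where the word "No:" sits at (100, 50) and "12345" at (140, 50), calling `AnchorExtraction.ExtractRelativeTo(page, "No:", 30, -5, 100, 10)` selects the rectangle just to the right of the anchor and yields "12345". Because the rectangle is defined relative to the anchor, the same template still works if the whole block shifts on another page of the same layout.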

PdfDocumentParser handles the tricky work of building parsing templates, searching, recognition and extraction, leaving you to code only your custom logic.

PdfDocumentParser is a .NET DLL.

For a sample of using PdfDocumentParser, or for a framework to build on, refer to the SampleParser project in the repository.

Known issues

  • Because the GUI is built with WinForms, it may appear mangled on a UHD display (or, depending on the version, on an FHD one). Don't worry: you can open the project in Visual Studio and tune it for your resolution. Development of the WPF version is frozen.

Support

Contact me if you want me to enhance PdfDocumentParser. Also, you can hire me for solving a parsing task of any complexity or for general development.

About

PdfDocumentParser is a .NET toolset for building PDF parsers.

Languages

  • C# 94.6%
  • HTML 2.7%
  • JavaScript 1.9%
  • Other 0.8%