Skip to content

SergiyStoyan/PdfDocumentParser

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

PdfDocumentParser

PdfDocumentParser is a parsing engine intended to find and extract text/images from PDF documents that conform to predictable graphic layouts - such as reports, bills, forms, tickets and the like. Its parsing approach is based on finding certain text or image fragments in page and then extracting text/images located relatively to those fragments.

PdfDocumentParser does all the tricky job of building parsing templates, search, recognition and extraction, thus, leaving you only to code a custom logic.

PdfDocumentParser is a .NET DLL.

For a usage example / framework refer to SampleParser project in the repository.

Known issues

  • because it is WinForm, GUI may appear mangled in UHD display (or otherwise in FHD, depending on version). Don't be afraid: you can open it in VS and tune for your resolution. WPF version is in stalled developement...

Documentation

Here

Support

Contact me if you want me to enhance PdfDocumentParser. Also, you can hire me for solving a parsing task of any complexity or for general development.