kurapica/doc2web

 
 

Free your document

Word to HTML converter engine (work in progress)

Goals of the project

This project is an attempt to create a lazy, extensible, cross-platform and high-performance WordprocessingML (Open XML) to HTML converter.

We will not accept pull requests until we reach version 1.0.

Doc2web is lazy

Doc2web gathers only the minimum text and CSS needed for the conversion. If you convert a single paragraph, you should expect slim HTML, even if the paragraph comes from a 200-page document that weighs 5 MB.

Doc2web is extensible

Doc2web provides a simple plugin system that allows any developer to add virtual nodes.

These nodes are then converted into tags, and the mutations are applied to the output. All the hard work is done for you: you only have to describe the result you want, and Doc2web will produce valid HTML.
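To make the virtual node idea concrete, here is a minimal sketch of a node that describes a tag, its attributes and its children, and is then rendered to HTML. The `VirtualNode` class, its members and the `Demo` helper are hypothetical names chosen for illustration; they are not Doc2web's actual API, which may look quite different.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Text;

// Hypothetical virtual node: an illustration only, NOT Doc2web's real API.
public sealed class VirtualNode
{
    public string Tag { get; set; } = "span";
    public string Text { get; set; } = "";
    public Dictionary<string, string> Attributes { get; } = new Dictionary<string, string>();
    public List<VirtualNode> Children { get; } = new List<VirtualNode>();

    // Renders the node as: opening tag, encoded text, children, closing tag.
    public string Render()
    {
        var sb = new StringBuilder();
        sb.Append('<').Append(Tag);

        // Sort attributes by name so the output is deterministic.
        foreach (var attr in Attributes.OrderBy(a => a.Key, StringComparer.Ordinal))
        {
            sb.Append(' ').Append(attr.Key)
              .Append("=\"").Append(WebUtility.HtmlEncode(attr.Value)).Append('"');
        }
        sb.Append('>');

        // Encode the text so characters such as '&' and '<' stay valid HTML.
        sb.Append(WebUtility.HtmlEncode(Text));
        foreach (var child in Children)
        {
            sb.Append(child.Render());
        }

        sb.Append("</").Append(Tag).Append('>');
        return sb.ToString();
    }
}

public static class Demo
{
    // Describes a bold run inside a paragraph and returns the rendered HTML.
    public static string Build()
    {
        var paragraph = new VirtualNode { Tag = "p" };
        paragraph.Attributes["class"] = "body-text";
        paragraph.Children.Add(new VirtualNode { Tag = "strong", Text = "Tom & Jerry" });
        return paragraph.Render();
    }

    public static void Main() => Console.WriteLine(Build());
}
```

In this sketch the plugin author only describes the desired result (a paragraph containing a bold run), and the rendering step takes care of encoding and tag nesting, which is the division of labour the paragraph above describes.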

Doc2web is cross-platform

Doc2web leverages the new .NET Standard 2.0, which is supported on .NET Core 2.0, .NET Framework 4.6.1, Mono 5.4, Xamarin.iOS 10.14, Xamarin.Mac 3.8 and Xamarin.Android 7.5.

Doc2web is fast

Doc2web is built for real-time use. Lazy evaluation and efficient use of CPU cycles and memory are at the core of this project's goals. You can render Open XML in parallel using the right parameter.

Our first benchmark (on a Ryzen 7 1800X) converts a 42-page document with complex styling and numbering, while highlighting all camel case keywords, in under 120 ms. Our second benchmark (on the same Ryzen 7 1800X) converts a 260-page transaction agreement with simple OpenXML in under 130 ms.


Roadmap 1.0 (2018 Q4)

  • Core
    • Virtual nodes
      • Tag, style and attributes
      • Tag optimization and rendering
  • Implemented plugins
    • Text processing
      • Paragraphs
      • Runs
      • Tabulation configuration
      • Break/tab/hyphen character insertions
    • Numbering
      • Roman, letters, ordinal, etc.
      • Indentation
      • Styling (theme and inline)
    • Styling
      • Media query
      • Dynamic styling
      • Paragraph styling
      • Run styling
      • Interconnected styling
      • OpenXML Properties support
        • Bold
        • Borders
        • Caps
        • Color
        • Font size
        • Highlighting
        • Indentation (responsive)
        • Italic
        • Justification
        • Run fonts
        • Spacing
        • Small caps
        • Vanish
        • Underline
    • Track changes
      • Inserted/Deleted content
      • Move to/Move from content
      • Inserted/Deleted numbering
      • Changed numbering via changed paragraph
      • Legacy changed numbering support
    • Comments
    • Tables
    • Table of contents
  • Benchmarks
    • Conversion
    • Conversion with pascal case highlighting
    • Rendering
    • Styling
    • Numbering
    • Comparing against OpenXmlPowerTools
  • CLI Tool
    • Convert documents
    • Verbose, debug and parallelism options
    • Crash tests
    • Search for keywords in documents
  • Documentation
    • XML documentation
    • How to use the C# API
    • How to extend with plugins
    • Benchmark and performance breakdown
    • Doc2web vs other tools
    • Contribution guide
    • Plugin samples
    • GitHub Pages
  • Other
    • Coverage > 90%
    • Continuous integration
    • NuGet package publicly available (pre-release)
    • Public, easy-to-use Docker container with CLI/Benchmark
