Convert LaTeX to plaintext using a long-running Pandoc process.

LaTeX2Plain

This is a kind of “daemon” mode for Pandoc: it converts single lines of LaTeX to plaintext with Pandoc, using one long-running process instead of one process per line.

The Dissemin project needed a way to batch-convert LaTeX titles to plaintext efficiently, as it was expected to run on a few million lines. The only reliable tool for this conversion is Pandoc, but Pandoc has no “daemon” mode, so every line requires spawning a new process. That overhead was far too high, and the structure of the Dissemin pipeline ruled out converting many lines in a single batch.

This is an attempt at solving that problem. It runs as a long-running process that you communicate with over sockets.
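The socket protocol is not documented here (the Usage section is still a TODO), so what follows is only a minimal C++ sketch of what a client might look like, assuming a newline-delimited exchange: one LaTeX line in, one plaintext line back. The helper name `convert_line` and the protocol itself are hypothetical. `fd` can be any already-connected stream socket (Unix-domain or TCP), since this README does not say which kind latex2plain listens on.

```cpp
#include <cerrno>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <sys/socket.h>  // socket()/connect()/socketpair() used by callers to obtain fd
#include <sys/types.h>
#include <unistd.h>

// Hypothetical client helper. ASSUMPTION: the daemon reads one '\n'-terminated
// LaTeX line and answers with one '\n'-terminated plaintext line.
// `fd` must be a connected stream socket.
std::string convert_line(int fd, const std::string& latex) {
    // An embedded newline would split the request under the assumed protocol.
    if (latex.find('\n') != std::string::npos) {
        throw std::invalid_argument("input must be a single line");
    }

    // Send the request, retrying on short writes and EINTR.
    const std::string request = latex + '\n';
    std::size_t sent = 0;
    while (sent < request.size()) {
        ssize_t n = write(fd, request.data() + sent, request.size() - sent);
        if (n < 0) {
            if (errno == EINTR) continue;
            throw std::runtime_error("write to daemon failed");
        }
        sent += static_cast<std::size_t>(n);
    }

    // Read the reply one byte at a time until the terminating newline.
    std::string reply;
    char c;
    for (;;) {
        ssize_t n = read(fd, &c, 1);
        if (n < 0) {
            if (errno == EINTR) continue;
            throw std::runtime_error("read from daemon failed");
        }
        if (n == 0) {
            throw std::runtime_error("daemon closed the connection mid-reply");
        }
        if (c == '\n') break;
        reply.push_back(c);
    }
    return reply;
}
```

Keeping one connection open and calling `convert_line` once per title is what amortizes Pandoc's startup cost. Reading a single byte per `read` call keeps the sketch short; a real client would buffer its reads.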

Once the Haskell stack has been loaded and initialized, processing a typical line (a scientific article title) takes around 1 ms.

Installation

To build and use this code, you need to build and install libpandoc. You also need an up-to-date installation of Pandoc.

You can then clone this repo and build the main code:

g++ main.cpp -lpandoc -o latex2plain

Usage

TODO

License

This code is released under GPLv2 or later, which is also the license for libpandoc and Pandoc.