This is a kind of “daemon” mode for Pandoc: it converts single lines of LaTeX to plain text using one long-running Pandoc process.
The Dissemin project was looking for an efficient way to batch-convert LaTeX titles to plain text (the conversion was expected to run on a few million lines). The only reliable tool available for this conversion is Pandoc, but Pandoc has no “daemon” mode, so one process has to be spawned for each line. That was far too expensive, and because of the way the Dissemin pipeline works, we could not run a single batch conversion over many lines.
This project is an attempt at solving this issue. It exposes a long-running process that you communicate with over sockets.
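To make the "one line in, one line out" idea concrete, here is a minimal client-side sketch in C. This README does not document the wire protocol, so the framing is an assumption: the sketch supposes that a request is one LaTeX line terminated by `\n` and that the reply is one plain-text line terminated by `\n`. The helper names `send_all` and `convert_line` are hypothetical and are not part of this repository. The caller is expected to pass an already-connected stream socket, so no address or port is assumed.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Send all n bytes of buf on sock, retrying on short sends and EINTR.
 * Returns 0 on success, -1 on error. */
int send_all(int sock, const char *buf, size_t n)
{
    while (n > 0) {
        ssize_t w = send(sock, buf, n, 0);
        if (w < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += w;
        n -= (size_t)w;
    }
    return 0;
}

/* ASSUMED protocol (not confirmed by this README): send one LaTeX line
 * followed by '\n', then read the reply up to the next '\n' or EOF.
 * The reply is stored NUL-terminated in out (at most cap - 1 bytes;
 * anything longer is dropped). Returns the stored length, or -1 on error. */
int convert_line(int sock, const char *latex, char *out, size_t cap)
{
    size_t len = 0;

    assert(cap > 0);
    if (send_all(sock, latex, strlen(latex)) < 0 || send_all(sock, "\n", 1) < 0)
        return -1;

    /* Byte-at-a-time reads keep the sketch short; a real client
     * would buffer its input. */
    for (;;) {
        char c;
        ssize_t r = recv(sock, &c, 1, 0);
        if (r < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (r == 0 || c == '\n')
            break;
        if (len + 1 < cap)
            out[len++] = c;
    }
    out[len] = '\0';
    return (int)len;
}
```

With a connected socket `sock`, a caller would keep the connection open and call `convert_line(sock, title, buf, sizeof buf)` once per title, which is what avoids spawning one Pandoc process per line.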
Once the Haskell stack has been loaded and initialized, it takes around 1 ms to process a typical line (a scientific article title).
To build and use this code, you will need to build and install libpandoc. You also need an up-to-date install of Pandoc.
You can then clone this repo and build the main code:
gcc main.c -lpandoc -o latex2plain
TODO
This code is released under GPLv2 or later, which is also the license for libpandoc and Pandoc.