This is a kind of “daemon” mode for Pandoc. It lets you convert single lines of LaTeX to plaintext with Pandoc, using a single long-running process.
The Dissemin project was looking for an efficient way to batch-convert LaTeX titles to plaintext, since the conversion was supposed to run on a few million lines. The only reliable tool for such a conversion is Pandoc, but Pandoc has no “daemon” mode, so converting line by line means spawning one Pandoc process per line. That overhead was far too high, and because of how the Dissemin pipeline works, we could not instead convert many lines in a single batch.
This is an attempt at solving that issue. It provides a long-running process that you can communicate with over sockets.
Once the Haskell stack has been loaded and initialized, processing a typical line (a scientific article title) takes around 1 ms.
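The socket protocol itself is not described here. Purely as an illustration, the following is a minimal client-side sketch in C that assumes a hypothetical newline-framed protocol: the client writes one LaTeX line terminated by `\n`, and the daemon answers with one plaintext line terminated by `\n`. The helper names `write_all` and `convert_line` and this framing are assumptions of the sketch, not this project's actual interface; check `main.c` for the real protocol and socket address.

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h> /* socket(), connect(): used to obtain the connected fd */
#include <sys/types.h>
#include <unistd.h>

/* Write all of buf to fd, retrying on short writes and EINTR. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/*
 * Hypothetical client helper (assumed newline framing, not the project's API):
 * send one LaTeX line followed by '\n' over an already-connected socket, then
 * read back one '\n'-terminated plaintext line into out, without the '\n'.
 * Returns 0 on success, -1 on I/O error, EOF, or if the reply does not fit.
 */
int convert_line(int fd, const char *latex, char *out, size_t outlen)
{
    size_t used = 0;

    if (outlen == 0)
        return -1;
    if (write_all(fd, latex, strlen(latex)) < 0 || write_all(fd, "\n", 1) < 0)
        return -1;

    /* Read one byte at a time so we never consume bytes past the reply's '\n'. */
    for (;;) {
        char c;
        ssize_t n = read(fd, &c, 1);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)               /* peer closed the connection mid-reply */
            return -1;
        if (c == '\n')
            break;
        if (used + 1 >= outlen)   /* keep room for the terminating NUL */
            return -1;
        out[used++] = c;
    }
    out[used] = '\0';
    return 0;
}
```

To use it, obtain `fd` with `socket()` and `connect()` to wherever the daemon listens, then call `convert_line` once per title over the same connection, so the Haskell startup cost is paid only once, by the daemon. The sketch assumes titles contain no embedded newline, and because it uses plain `write()`, a daemon that closes the connection can raise `SIGPIPE` unless that signal is ignored.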
To build and use this code, you need to build and install
libpandoc. You also need an up-to-date
install of Pandoc.
You can then clone this repo and build the main code:
gcc main.c -lpandoc -o latex2plain
This code is released under GPLv2 or later, which is also the license for