Add a statement about common issues with pdfextract

Lucas Verney 2016-01-19 17:58:24 +01:00
parent e9d7f3ad78
commit 65967cfa96


@@ -86,14 +86,21 @@ def pdfextract(pdf_file):
``gem install pdf-extract``, provided that you have a correct \
Ruby install on your system.
.. note::
``pdfextract`` is full of bugs and, at the time of writing, \
you had to manually ``gem install pdf-reader -v 1.2.0`` \
before installing ``pdfextract`` or you would get errors. See \
`this GitHub issue <https://github.com/CrossRef/pdfextract/issues/23>`_.
:param pdf_file: Path to the PDF file to handle.
:returns: Raw output from ``pdfextract`` or ``None`` if an error \
occurred. No post-processing is done. See \
``libbmc.citations.pdf.pdfextract_dois`` for a similar function \
with post-processing to return DOIs.
"""
# Run pdf-extract
try:
# Run pdf-extract
references = subprocess.check_output(["pdf-extract",
"extract", "--references",
pdf_file])
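
For context, here is a minimal sketch of how the function around this hunk might read. Only the `subprocess.check_output` call and the "return `None` on error" behaviour come from the diff and its docstring. The `return` statement and the `except` clause lie outside the hunk and are assumptions, as is catching `OSError` to cover a missing `pdf-extract` executable.

```python
import subprocess


def pdfextract(pdf_file):
    """
    Return the raw output of ``pdf-extract`` for ``pdf_file``, or ``None``
    if an error occurred. No post-processing is done.
    """
    try:
        # Run pdf-extract
        references = subprocess.check_output(["pdf-extract",
                                              "extract", "--references",
                                              pdf_file])
        return references
    except (subprocess.CalledProcessError, OSError):
        # CalledProcessError: pdf-extract exited with a non-zero status.
        # OSError (assumption of this sketch): the executable is missing
        # or cannot be started, e.g. the gem is not installed.
        return None
```

With this shape, a caller only needs to check for `None` to detect a failed extraction, whether the failure comes from the gem itself or from a broken Ruby install.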