Add some doc, especially about external dependencies

Lucas Verney 2016-01-19 18:17:12 +01:00
parent 65967cfa96
commit 9019833dbb
5 changed files with 57 additions and 5 deletions

3
.gitmodules vendored

@@ -1,6 +1,3 @@
[submodule "libbmc/external/opendetex"]
path = libbmc/external/opendetex
url = https://github.com/Phyks/opendetex
[submodule "libbmc/external/poppler"]
path = libbmc/external/poppler
url = git://git.freedesktop.org/git/poppler/poppler

43
README.md Normal file

@@ -0,0 +1,43 @@
libBMC
======

A generic Python library to manage bibliographies and play with scientific
papers.

_Note_: This library is written for Python 3 and may not work with Python 2.
Python 2 support is not a major priority for me, but if anyone needs it and
wants to make a PR, I will happily merge it :)

## Dependencies

Python dependencies are listed in the `requirements.txt` file at the root of
this repo, and can be installed with `pip install -r requirements.txt`.

External dependencies are [OpenDeTeX](https://code.google.com/p/opendetex/)
(an improved version of DeTeX) and the `pdftotext` and `djvutxt` programs.

OpenDeTeX is available as a Git submodule in the `libbmc/external` folder. If
you do not have it installed system-wide, you can use the following steps to
build it in this repo, and the library will use that build:

* `git submodule init; git submodule update` to initialize the Git submodules.
* `cd libbmc/external/opendetex; make` to build OpenDeTeX (see the `INSTALL`
  file in the same folder for more information; you will need `make`, `gcc`
  and `flex` to build it).

OpenDeTeX is used to get references from a `.bbl` file (or directly from arXiv,
as it uses the same pipeline).

`pdftotext` and `djvutxt` should be available in the packages of your
distribution and should be installed system-wide. Both are used to extract
identifiers from the PDF and DjVu files of papers.

If you plan on using the `libbmc.citations.pdf` functions, you should also
install the matching software (`CERMINE`, `Grobid` or `pdf-extract`). See the
docstrings of those functions for more information on this point.
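
To check quickly whether the external programs are installed system-wide, something like the following sketch could be used. It is not part of libBMC: the `missing_tools` helper and the `EXTERNAL_TOOLS` tuple are hypothetical names, and the executable names (in particular `delatex`, taken from the docstrings) are assumptions that may differ on your system. It only looks at the `PATH`, so it will not detect an OpenDeTeX build located in `libbmc/external/opendetex`.

```python
import shutil

# Assumed executable names, taken from the README text and the docstrings;
# adjust them if your distribution installs the tools under other names.
EXTERNAL_TOOLS = ("delatex", "pdftotext", "djvutxt")


def missing_tools(names=EXTERNAL_TOOLS, which=shutil.which):
    """Return the names from ``names`` that cannot be found on the PATH.

    ``which`` is injectable so the lookup can be replaced in tests.
    """
    return [name for name in names if which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing external tools: " + ", ".join(missing))
    else:
        print("All external tools were found on the PATH.")
```

An empty list means every listed program was found. Anything reported as missing should be installed through your distribution's packages or, for OpenDeTeX, built with the steps above.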


@@ -25,6 +25,12 @@ def bibitem_as_plaintext(bibitem):
This plaintext representation can be super ugly, contain URLs and so \
on.
.. note::
You need to have ``delatex`` installed system-wide, or to build it in \
this repo, according to the ``README.md`` before using this \
function.
:param bibitem: The text content of the bibitem.
:returns: A cleaned plaintext citation from the bibitem.
"""


@@ -68,7 +68,7 @@ def get_cited_DOIs(bibtex):
BibTeX file.
:returns: A dict of cleaned plaintext citations and their associated DOI.
"""
# Get the plaintext citations from the bbl file
# Get the plaintext citations from the bibtex file
plaintext_citations = get_plaintext_citations(bibtex)
# Use the plaintext citations parser on these citations
return plaintext.get_cited_DOIs(plaintext_citations)


@@ -20,6 +20,12 @@ def find_identifiers(src):
likely to be relevant for this file. However, it may fail and return an
identifier taken from the references or another paper.
.. note::
You will need to have ``pdftotext`` and/or ``djvutxt`` installed \
system-wide before processing files with this function.
:params src: Path to the file to scan.
:returns: a tuple (type, identifier) or ``None`` if not found or \