Add some doc, especially about external dependencies

Phyks (Lucas Verney) 3 years ago
commit 9019833dbb
5 changed files with 57 additions and 5 deletions
  1. .gitmodules (+0, -3)
  2. README.md (+43, -0)
  3. libbmc/citations/bbl.py (+6, -0)
  4. libbmc/citations/bibtex.py (+1, -1)
  5. libbmc/papers/identifiers.py (+7, -1)

.gitmodules (+0, -3)

@@ -1,6 +1,3 @@
 [submodule "libbmc/external/opendetex"]
 	path = libbmc/external/opendetex
 	url = https://github.com/Phyks/opendetex
-[submodule "libbmc/external/poppler"]
-	path = libbmc/external/poppler
-	url = git://git.freedesktop.org/git/poppler/poppler

README.md (+43, -0)

@@ -0,0 +1,43 @@
+libBMC
+======
+
+A generic Python library to manage bibliography and play with scientific
+papers.
+
+
+_Note_: This library is written for Python 3 and may not work with Python 2.
+This is not a major priority for me, but if anyone needs it to work with
+Python 2 and wants to make a PR, I will happily merge it :)
+
+
+## Dependencies
+
+Python dependencies are listed in the `requirements.txt` file at the root of
+this repo, and can be installed with `pip install -r requirements.txt`.
+
+
+External dependencies are [OpenDeTeX](https://code.google.com/p/opendetex/)
+(an improved version of DeTeX) and the `pdftotext` and `djvutxt` programs.
+
+
+OpenDeTeX is available as a Git submodule in the `libbmc/external` folder. If
+you do not have it installed system-wide, you can use the following steps to
+build it in this repo and the library will use it:
+
+* `git submodule init; git submodule update` to initialize the Git submodules.
+* `cd libbmc/external/opendetex; make` to build OpenDeTeX (see the `INSTALL`
+  file in the same folder for more info; you will need `make`, `gcc` and
+  `flex` to build it).
+
+OpenDeTeX is used to get references from a `.bbl` file (or directly from arXiv
+as it uses the same pipeline).
+
+
+`pdftotext` and `djvutxt` should be available in the packages of your
+distribution and should be installed system-wide. Both are used to extract
+identifiers from paper files (PDF and DjVu).
+
+
+If you plan on using the `libbmc.citations.pdf` functions, you should also
+install the matching software (`CERMINE`, `Grobid` or `pdf-extract`). See the
+docstrings of those functions for more information on this particular point.
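
Review note: since the README now lists external programs, a quick way to see which of them are reachable may help readers. The following is only a minimal sketch, not part of this commit. The helper `missing_tools` is hypothetical, and the executable names are taken from the README (`pdftotext`, `djvutxt`) and from the `bbl.py` docstring in this commit (`delatex`). It only looks at `PATH`, so an OpenDeTeX build inside `libbmc/external/opendetex` would still be reported as missing.

```python
import shutil

# Executable names as mentioned in this commit; the helper is hypothetical.
EXTERNAL_TOOLS = ("delatex", "pdftotext", "djvutxt")


def missing_tools(tools=EXTERNAL_TOOLS):
    """Return the names from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All external tools were found on PATH.")
```

Running it before calling the library gives an early hint about which system packages still need to be installed.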

libbmc/citations/bbl.py (+6, -0)

@@ -25,6 +25,12 @@ def bibitem_as_plaintext(bibitem):
         This plaintext representation can be super ugly, contain URLs and so \
         on.
 
+    .. note::
+
+        You need to have ``delatex`` installed system-wide, or to build it in \
+                this repo, according to the ``README.md`` before using this \
+                function.
+
     :param bibitem: The text content of the bibitem.
     :returns: A cleaned plaintext citation from the bibitem.
     """
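
Review note: the new docstring says `delatex` may be installed system-wide or built inside the repo. One way such a lookup could work is sketched below. This is an illustration under stated assumptions, not the code in `bbl.py`: the function `run_first_available` is hypothetical, and the candidate commands are supplied by the caller, so no particular `delatex` flags are assumed.

```python
import subprocess


def run_first_available(commands, text):
    """Feed `text` on stdin to the first command whose executable exists.

    `commands` is a list of argument lists, tried in order. A missing
    executable raises FileNotFoundError, which moves on to the next one.
    """
    for command in commands:
        try:
            output = subprocess.check_output(command,
                                             input=text.encode("utf-8"))
        except FileNotFoundError:
            continue  # Executable not found, try the next candidate.
        return output.decode("utf-8")
    raise FileNotFoundError("None of the candidate commands was found.")
```

A caller could, for instance, pass `[["delatex"], ["libbmc/external/opendetex/delatex"]]` so that the system-wide binary is preferred and the in-repo build is the fallback. Both paths here are assumptions based on the README instructions.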

libbmc/citations/bibtex.py (+1, -1)

@@ -68,7 +68,7 @@ def get_cited_DOIs(bibtex):
             BibTeX file.
     :returns: A dict of cleaned plaintext citations and their associated DOI.
     """
-    # Get the plaintext citations from the bbl file
+    # Get the plaintext citations from the bibtex file
     plaintext_citations = get_plaintext_citations(bibtex)
     # Use the plaintext citations parser on these citations
     return plaintext.get_cited_DOIs(plaintext_citations)

libbmc/papers/identifiers.py (+7, -1)

@@ -14,12 +14,18 @@ def find_identifiers(src):
     """
     Search for a valid identifier (DOI, ISBN, arXiv, HAL) in a given file.
 
-    .. note ::
+    .. note::
 
         This function returns the first matching identifier, that is the most
         likely to be relevant for this file. However, it may fail and return an
         identifier taken from the references or another paper.
 
+    .. note::
+
+        You will need to have ``pdftotext`` and/or ``djvutxt`` installed \
+                system-wide before processing files with this function.
+
+
     :params src: Path to the file to scan.
 
     :returns: a tuple (type, identifier) or ``None`` if not found or \