|Phyks (Lucas Verney) 96a85feec0 Merge branch 'master' of https://github.com/Phyks/BMC||4 years ago|
BiblioManager is a simple script to download and store your articles. Read on if you want more info :)
Note: This script is currently a work in progress.
Note: If you want to extract some functions from this repo, please consider using libbmc instead, which is specifically dedicated to this (and this repo should be using it, rather than duplicating code).
I used to have a folder of poorly named papers and books and wanted something to help me handle it. I don’t like Mendeley, Zotero and the like, which are heavy and overkill for my needs. I just want to feed a script with PDF files of papers and books, or URLs to PDF files, and have it automatically maintain a BibTeX index of these files, to help me cite them and find them again. Then, I want it to give me a way to easily retrieve a file, either by author, by title or with some other search method, and give me the associated BibTeX entry.
This is the goal of BiblioManager. This script can:
BiblioManager will always use standard formats such as BibTeX, so that you can easily edit your library, export it and manage it by hand, even if you quit this software for any reason.
It should be almost working and usable now, although it is still to be considered experimental. It can be broken at any commit and not repaired for a few days. I will update this notice when I have a version that I can consider “stable”.
Important note: I use it for personal use, but I don’t read articles from many journals. If you find any file which is not working, please file an issue or send me an e-mail with the relevant information. There are alternative ways to get the metadata, for example, and I didn’t really know which one was best when writing this code. Please do backups regularly if using this. I cannot be held responsible for any loss of papers.
Error reporting: If you have any issue with this script, please report the error. If possible, send me the article responsible for the error, or at least give me the reference so that I can test and debug easily.
git clone https://github.com/Phyks/BMC
isbnlib via PyPI (or better, in a virtualenv, or using your package manager, according to your preferences)
sudo pip install arxiv2bib PySocks bibtexparser pyPDF2 isbnlib (this script should be compatible with Python 2 and Python 3)
pdftotext (provided by Xpdf) and
djvulibre via your package manager or the way you want
python setup.py install.
~/.config/bmc/bmc.json according to your needs. Documentation of the available options can be found in file
Note: To update the script, just run
git pull in the script dir.
./bmc.py import PATH_TO_FILE [article|book].
[article|book] is an optional argument (article or book) that restricts the search to a DOI or an ISBN respectively, which speeds up the import.
It will automatically get the BibTeX entry corresponding to the document and prompt you for confirmation. It will then copy the file to your papers dir, renaming it according to the specified mask in
./bmc.py download URL_TO_PDF [article|book], where
[article|book] (article or book) is again a parameter that restricts the search to a DOI or an ISBN, which speeds up the import. The
URL_TO_PDF parameter should be a direct link to the PDF file: the link to the PDF itself, which may sit behind an authentication portal, and not the abstract page shown on many publishers' websites.
The script will try to download the file with the proxies specified in
~/.config/bmc/bmc.json until it manages to get the file, or runs out of available proxies.
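The "try each proxy until one works" behaviour described above can be sketched as follows. This is only an illustration, not BMC's actual code: `fetch_via_proxy` and `download_with_proxies` are hypothetical names, the sketch targets Python 3 only, it handles plain HTTP(S) proxies through `urllib` rather than PySocks, and it assumes that an empty proxy string means a direct connection and that a valid download starts with the `%PDF` magic bytes.

```python
import urllib.request


def fetch_via_proxy(url, proxy, timeout=30):
    """Fetch url through an HTTP(S) proxy.

    Assumption: an empty proxy string means "connect directly".
    """
    handlers = []
    if proxy:
        handlers.append(
            urllib.request.ProxyHandler({"http": proxy, "https": proxy})
        )
    opener = urllib.request.build_opener(*handlers)
    with opener.open(url, timeout=timeout) as response:
        return response.read()


def download_with_proxies(url, proxies, fetch=fetch_via_proxy):
    """Try each proxy in order and return (data, proxy) for the first success.

    Raises RuntimeError once every proxy has been tried without success.
    """
    errors = []
    for proxy in proxies:
        try:
            data = fetch(url, proxy)
        except OSError as exc:  # urllib.error.URLError subclasses OSError
            errors.append((proxy, exc))
            continue
        # Assumption: a real PDF starts with "%PDF"; an authentication
        # portal would typically answer with an HTML page instead.
        if data.startswith(b"%PDF"):
            return data, proxy
        errors.append((proxy, ValueError("response is not a PDF")))
    raise RuntimeError("all proxies failed: %r" % (errors,))
```

Passing the fetch function as a parameter keeps the retry loop independent of the transport, so it can be exercised without any network access.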
It will automatically get the BibTeX entry corresponding to the document and prompt you for confirmation. It will then put the file in your papers dir, renaming it according to the specified mask in
./bmc.py delete PARAM where
PARAM should be either a path to a paper file, or an ident in the bibtex index. This will remove the corresponding entry in the bibtex index, and will remove the file from your papers dir. Although it will prompt you for confirmation, there’s no way to recover your file after deletion, so use with care.
Note: There is currently no search engine implemented. I will first focus on stabilizing the script, and will implement it later. The
search.py file is not functional as of today and is only there to present a rough idea of what I expect the search engine to be. Ideally, it should understand complex expressions like
(author=foo or title=bar) or year=1111. However, in the meantime, you can
grep the generated
index.bib file to get basic search features.
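For something slightly more structured than `grep`, a naive field search can be written in a few lines of Python. The sketch below is not part of BMC and is not the planned search engine: `split_entries` and `search_index` are hypothetical helpers, and they assume that each field sits on its own line in the form `field = {value},`. It is not a real BibTeX parser (the `bibtexparser` package already listed in the dependencies would be the robust choice).

```python
import re

# Matches one "name = {value}," or 'name = "value",' line.
# Assumption: one field per line, no braces spanning several lines.
FIELD_RE = re.compile(r'^\s*(\w+)\s*=\s*[{"](.*?)[}"],?\s*$', re.MULTILINE)

# Matches one whole entry: from an "@" at the start of a line up to the
# next such "@" or the end of the text.
ENTRY_RE = re.compile(r'^@.*?(?=^@|\Z)', re.MULTILINE | re.DOTALL)


def split_entries(bibtex_text):
    """Split raw BibTeX text into a list of entry strings."""
    return [entry.strip() for entry in ENTRY_RE.findall(bibtex_text)]


def search_index(bibtex_text, **criteria):
    """Return the entries in which every given field contains the given
    substring (case-insensitive). Example: search_index(text, author="doe").
    """
    matches = []
    for entry in split_entries(bibtex_text):
        fields = {name.lower(): value for name, value in FIELD_RE.findall(entry)}
        if all(
            needle.lower() in fields.get(name, "").lower()
            for name, needle in criteria.items()
        ):
            matches.append(entry)
    return matches
```

Calling `search_index(open("index.bib").read(), author="foo", year="1111")` would return the raw entries matching both criteria; combining criteria with "or" is left out of this sketch.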
./bmc.py list to list all the papers in your paper folder.
./bmc.py edit PARAM where
PARAM should be either a path to a paper file or an ident in the bibtex index. This will open a text editor to edit the corresponding bibtex entry.
./bmc.py update to look for available updated versions of your arXiv papers. You can use the optional
--entries ID argument (where ID is either a bibtex index identifier or a filename) to search only for a limited subset of papers.
When you import a long article without any DOI or ISBN, the script will process the whole file before finding out that there is no such information. This can take a while for long articles, and you may feel the script has entered an infinite loop. If you think it’s taking too long, you can press
^C and you will be dropped to manual entry of the BibTeX information.
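To illustrate why a document without a DOI is the slow case, here is a minimal sketch of a page-by-page scan that stops at the first match, so only a file with no DOI at all is read to the end. It is an assumption about the general approach, not BMC's implementation: `find_doi` and its `max_pages` cap are hypothetical, and the regular expression is a commonly used approximation of DOI syntax rather than an exact grammar.

```python
import re

# Commonly used approximation of a DOI: "10.", a 4-9 digit registrant
# code, a slash, then a suffix. Not an exact grammar.
DOI_RE = re.compile(r'\b10\.\d{4,9}/[-._;()/:A-Z0-9]+', re.IGNORECASE)


def find_doi(pages, max_pages=None):
    """Return the first DOI found in an iterable of page texts, or None.

    pages may be a lazy generator (one extracted page at a time), so the
    scan stops as soon as a DOI is found. max_pages is a hypothetical cap
    that bounds the work for long documents without any DOI.
    """
    for number, text in enumerate(pages):
        if max_pages is not None and number >= max_pages:
            break
        match = DOI_RE.search(text)
        if match:
            # Drop punctuation that belongs to the sentence, not the DOI.
            return match.group(0).rstrip(".,;)")
    return None
```

With a cap such as `max_pages=5`, a long book without a DOI would fall through to manual entry quickly instead of requiring `^C`.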
All your documents will be stored in the papers dir specified in
~/.config/bmc/bmc.json. All the bibtex entries will be added to the
index.bib file. You should not add entries to this file by hand, as this will break synchronization between the documents in the papers dir and the index (editing existing entries is fine). If you do so, you can resync the index file with
The resync option checks that every BibTeX entry has a corresponding file and that every file has a corresponding BibTeX entry. It will ask you what to do with unmatched entries.
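The consistency check behind a resync boils down to two set differences. The sketch below shows the idea only: `find_unmatched` is a hypothetical helper, not BMC's code, and it assumes that `index.bib` lives inside the papers dir and that files are matched by base name.

```python
import os


def find_unmatched(indexed_files, papers_dir):
    """Compare the files referenced by the index with the files on disk.

    indexed_files: file paths taken from the BibTeX entries.
    Returns (missing, orphans) as two sorted lists of file names:
    missing = referenced by an entry but absent from papers_dir,
    orphans = present in papers_dir but referenced by no entry.
    """
    indexed = {os.path.basename(path) for path in indexed_files}
    on_disk = {
        name
        for name in os.listdir(papers_dir)
        # Assumption: the index itself is stored alongside the papers.
        if os.path.isfile(os.path.join(papers_dir, name)) and name != "index.bib"
    }
    missing = sorted(indexed - on_disk)
    orphans = sorted(on_disk - indexed)
    return missing, orphans
```

Each name in `missing` or `orphans` is a case where the user would have to be asked what to do, for example delete the entry, or import the file.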
Unit tests are available for all the files in the
lib/. You can simply run the tests using
nosetests. Builds are run after each commit on Travis.
All the source code I wrote is under a
no-alcohol beer-ware license. All functions that I didn’t write myself remain under their original license, and their origin is specified in the function itself.
* --------------------------------------------------------------------------------
* "THE NO-ALCOHOL BEER-WARE LICENSE" (Revision 42):
* Phyks (email@example.com) wrote this file. As long as you retain this notice you
* can do whatever you want with this stuff (and you can also do whatever you want
* with this stuff without retaining it, but that's not cool...). If we meet some
* day, and you think this stuff is worth it, you can buy me a <del>beer</del> soda
* in return.
* Phyks
* ---------------------------------------------------------------------------------
Here are some sources of inspiration for this project:
A list of ideas and TODOs. Don’t hesitate to give feedback on the ones you really want or to propose your own.
tests/src are under CC-BY license, from arXiv, HAL, New Journal of Physics and PhysRev.
test_watermark.pdf file originally had a blank first page, which is supposed to be torn off. For this test, I just duplicated the first page, as the original first page contained personal information.