Improved doc

Phyks 2014-04-26 15:32:34 +02:00
parent ffdbbedfbb
commit 6e18c16010
2 changed files with 57 additions and 27 deletions

@@ -1,21 +1,22 @@
BiblioManager BiblioManager
============= =============
BiblioManager is a simple script to download and store your articles. This is mostly based upon [the paperbot fork from a3nm](https://github.com/a3nm/paperbot). BiblioManager is a simple script to download and store your articles. Read on if you want more info :)
**Note :** This script is currently a work in progress. **Note :** This script is currently a work in progress.
## What is BiblioManager (or what it is **not**) ? ## What is BiblioManager (or what it is **not**) ?
I used to have a folder with poorly named papers and books and wanted something to help me handle it. I don't like Mendeley and Zotero and so on, which are heavy and overkill for my needs. I just want to feed a script with PDF files of papers and books, and I want it to automatically maintain a BibTeX index of these files, to help me cite them and find them back. I used to have a folder with poorly named papers and books and wanted something to help me handle it. I don't like Mendeley and Zotero and so on, which are heavy and overkill for my needs. I just want to feed a script with PDF files of papers and books, or URLs to PDF files, and I want it to automatically maintain a BibTeX index of these files, to help me cite them and find them back. Then, I want it to give me a way to easily retrieve a file, either by author, by title or with some other search method, and give me the associated bibtex entry.
This is the goal of BiblioManager. It will : This is the goal of BiblioManager. This script can :
* Download or import PDF/Djvu files * Download or import PDF/Djvu files
* Try to get automatically the metadata of the files (keywords, author, review, …) * Try to get automatically the metadata of the files (keywords, author, review, …)
* Store all the metadata in a BibTex file * Store all the metadata in a BibTex file
* Rename your files to store them in a logical and homogeneous way * Rename your files to store them in a logical and homogeneous way according to a user-defined mask
* Help you find them back * Help you find them back
* Give you directly the bibtex entry necessary to cite them * Give you directly the bibtex entry necessary to cite them
* Remove some of the watermarks included in those files (for instance, the front page that IOP adds with your IP address)
BiblioManager will always use standard formats such as BibTeX, so that you can easily edit your library, export it and manage it by hand, even if you quit this software for any reason. BiblioManager will always use standard formats such as BibTeX, so that you can easily edit your library, export it and manage it by hand, even if you quit this software for any reason.
@@ -31,49 +32,78 @@ Should be almost working and usable now, although still to be considered as **ex
## Installation ## Installation
TODO -- To be updated
* Clone this git repository where you want : `git clone https://github.com/Phyks/BMC`
* Install `requesocks` and `isbntools` _via_ PyPI
* Install `pdftotext` (provided by Xpdf) and `djvulibre` _via_ your package manager, or any other way you prefer
* Copy `params.py.example` to `params.py` and customize it to fit your needs
Install pdfminer, pdfparanoia (via pip) and requesocks. ## Usage
Copy params.py.example as params.py and customize it.
Install pdftotext.
Install djvulibre to use djvu files.
Install isbntools with pip.
### To import an existing PDF / Djvu file
## Used source codes Run `./main.py import PATH_TO_FILE [article|book]`. The optional `[article|book]` argument restricts the metadata search to a DOI (`article`) or an ISBN (`book`), which speeds up the import.
* [pdfparanoia](https://github.com/kanzure/pdfparanoia) : Watermark removal It will automatically fetch the BibTeX entry corresponding to the document and prompt you for confirmation. It will then copy the file to your papers dir, renaming it according to the mask specified in `params.py`.
* [paperbot](https://github.com/kanzure/paperbot) although my fetching of papers is way more basic
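The renaming step of the import can be pictured with a small sketch. The actual mask syntax used in `params.py` is not shown in this README, so the `{first_author}`, `{year}` and `{title}` placeholders and the `build_filename` helper below are assumptions made for illustration only, not the script's real implementation.

```python
import re


def build_filename(mask, entry, extension):
    """Fill a naming mask with fields taken from a BibTeX entry.

    The placeholder names are hypothetical; they are not BiblioManager's
    actual mask syntax.
    """
    name = mask.format(
        # Keep only the first author of an "A and B and C" author field
        first_author=entry.get("author", "unknown").split(" and ")[0].strip(),
        title=entry.get("title", "untitled"),
        year=entry.get("year", "nodate"),
    )
    # Replace characters that are awkward in file names with underscores
    name = re.sub(r"[^\w\-. ]", "_", name)
    return name + "." + extension.lstrip(".")


entry = {
    "author": "Doe, John and Smith, Jane",
    "title": "A Study: Part 1",
    "year": "2014",
}
filename = build_filename("{first_author}-{year}-{title}", entry, "pdf")
print(filename)
```

Sanitizing after filling the mask keeps the mask itself readable, and means a title containing `:` or `/` cannot produce an invalid path.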
### To download a PDF / Djvu file
## License Run `./main.py download URL_TO_PDF [article|book]`, where the optional `[article|book]` argument again restricts the metadata search to a DOI (`article`) or an ISBN (`book`), which speeds up the import. `URL_TO_PDF` should be a direct link to the PDF file itself (which may sit behind an authentication portal), not the abstract page that many publishers' websites show first.
The script will try to download the file with the proxies specified in `params.py` until it manages to get the file, or runs out of available proxies.
It will automatically fetch the BibTeX entry corresponding to the document and prompt you for confirmation. It will then put the file in your papers dir, renaming it according to the mask specified in `params.py`.
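The proxy fallback described above could look roughly like the following. This is a minimal sketch, not the code in `main.py`: `download_with_proxies` and the injected `fetch` callable are hypothetical names, and the network call is replaced by a fake so that the example runs offline.

```python
def download_with_proxies(url, proxies, fetch):
    """Try each proxy in order and return (data, proxy) for the first success.

    `fetch(url, proxy)` is a hypothetical callable that returns the file
    content as bytes, or raises an exception on failure.
    """
    for proxy in proxies:
        try:
            data = fetch(url, proxy)
        except Exception as error:
            # This proxy did not work, move on to the next one
            print("Proxy %r failed: %s" % (proxy, error))
            continue
        return data, proxy
    # Ran out of available proxies
    return None, None


def fake_fetch(url, proxy):
    """Stand-in for a real download: only one proxy 'works'."""
    if proxy != "socks5://localhost:4711":
        raise IOError("connection refused")
    return b"%PDF-1.4 fake content"


proxies = ["", "socks5://localhost:4711"]
data, used = download_with_proxies("http://example.org/paper.pdf",
                                   proxies, fake_fetch)
print(used)
```

Passing the download function as a parameter keeps the retry logic independent of the HTTP library, so the same loop would work with `requesocks` or any other client.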
### Delete an entry
Run `./main.py delete PARAM` where `PARAM` should be either a path to a paper file, or an ident in the bibtex index. This will remove the corresponding entry in the bibtex index, and will remove the file from your papers dir. Although it will prompt you for confirmation, there's no way to recover your file after deletion, so use with care.
### Search for an entry
TODO TODO
### List all entries
TODO
### Data storage
All your documents will be stored in the papers dir specified in `params.py`. All the BibTeX entries will be added to the `index.bib` file. You should **not** add entries to this file by hand (editing existing entries is fine), as this breaks the synchronization between the documents in the papers dir and the index. If you do, you can rebuild the index file with `./main.py rebuild`.
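To see what "synchronization" means here, the sketch below compares the files referenced by the index with the files actually present in the papers dir. It is only an illustration: `find_mismatches` is a hypothetical helper, the list of indexed files is passed in directly instead of being parsed from `index.bib`, and the assumption that `index.bib` lives inside the papers dir is mine.

```python
import os
import tempfile


def find_mismatches(indexed_files, papers_dir):
    """Compare files referenced by the index with files present on disk."""
    on_disk = set(os.listdir(papers_dir))
    indexed = set(os.path.basename(path) for path in indexed_files)
    return {
        # Files in the papers dir that no index entry points to
        # (index.bib itself is ignored, assuming it is stored there)
        "missing_from_index": sorted(on_disk - indexed - {"index.bib"}),
        # Index entries whose file is no longer in the papers dir
        "missing_from_disk": sorted(indexed - on_disk),
    }


# Build a throwaway papers dir to demonstrate the check
papers_dir = tempfile.mkdtemp()
for name in ("a.pdf", "c.pdf", "index.bib"):
    open(os.path.join(papers_dir, name), "w").close()

report = find_mismatches(["a.pdf", "b.pdf"], papers_dir)
print(report)
```

A non-empty result in either list is the kind of inconsistency that `./main.py rebuild` is meant to repair.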
## License
```
* --------------------------------------------------------------------------------
* "THE NO-ALCOHOL BEER-WARE LICENSE" (Revision 42):
* Phyks (webmaster@phyks.me) wrote this file. As long as you retain this notice you
* can do whatever you want with this stuff (and you can also do whatever you want
* with this stuff without retaining it, but that's not cool...). If we meet some
* day, and you think this stuff is worth it, you can buy me a <del>beer</del> soda
* in return.
* Phyks
* ---------------------------------------------------------------------------------
```
## Inspiration ## Inspiration
Here are some sources of inspiration for this project :
* MPC * MPC
* http://en.dogeno.us/2010/02/release-a-python-script-for-organizing-scientific-papers-pyrenamepdf-py/ * http://en.dogeno.us/2010/02/release-a-python-script-for-organizing-scientific-papers-pyrenamepdf-py/
* [Bibsoup](http://openbiblio.net/2012/02/09/bibsoup-beta-released/) * [Bibsoup](http://openbiblio.net/2012/02/09/bibsoup-beta-released/)
* [Paperbot](https://github.com/kanzure/paperbot)
## Ideas, TODO ## Ideas, TODO
A list of ideas and TODO. Don't hesitate to give feedback on the ones you really want or to propose your own. A list of ideas and TODO. Don't hesitate to give feedback on the ones you really want or to propose your own.
* pdfparanoia to remove the watermarks on pdf files * Open
* Confirmation for deletion
* Rebuild
* Remove the watermarks on pdf files : First page of IOP publishing articles
* Webserver interface * Webserver interface
* Various re.compile ? * Various re.compile ?
* check output of subprocesses before it ends * check output of subprocesses before it ends
* Split main.py * Split main.py
* Categories * Categories
* Edit an entry instead of deleting it and adding it again
## Roadmap
* Working with local files
[x] Import
[x] Deletion
[ ] Update ?
* Get distant files
* cf paperbot
* Search engine / list

@@ -195,14 +195,14 @@ def getExtension(filename):
def checkBibtex(filename, bibtex): def checkBibtex(filename, bibtex):
print("The bibtex entry found for "+filename+" is :") print("The bibtex entry found for "+filename+" is :")
print(bibtex)
check = rawInput("Is it correct ? [Y/n] ")
bibtex = StringIO(bibtex) bibtex = StringIO(bibtex)
bibtex = BibTexParser(bibtex, customization=homogeneize_latex_encoding) bibtex = BibTexParser(bibtex, customization=homogeneize_latex_encoding)
bibtex = bibtex.get_entry_dict() bibtex = bibtex.get_entry_dict()
bibtex_name = bibtex.keys()[0] bibtex_name = bibtex.keys()[0]
bibtex = bibtex[bibtex_name] bibtex = bibtex[bibtex_name]
print(parsed2Bibtex(bibtex))
check = rawInput("Is it correct ? [Y/n] ")
while check.lower() == 'n': while check.lower() == 'n':
fields = [u'type', u'id'] + [i for i in sorted(bibtex) fields = [u'type', u'id'] + [i for i in sorted(bibtex)