Compare commits

...

68 Commits

Author SHA1 Message Date
Lucas Verney 96a85feec0 Merge branch 'master' of https://github.com/Phyks/BMC 2016-01-10 17:59:41 +01:00
Lucas Verney 4dbc13e44c Add link to libbmc 2016-01-10 17:59:23 +01:00
Lucas Verney c84159068c Merge pull request #33 from bcbnz/fixdoisearch
Search for Digital Object Identifier as well as DOI in text.
2015-12-07 19:08:20 +01:00
Blair Bonnett 330c2f2b5f Search for Digital Object Identifier as well as DOI in text.
If the paper identifier is marked with Digital Object Identifier, but
one or more of its references has a DOI link in it, then the reference
DOI is taken as the paper one. This change replaces the words Digital
Object Identifier with DOI in the text being searched to pull out the
correct ID.
2015-12-07 15:39:57 +13:00
Phyks 5f8665940d Fix unittests for Python 2.7 2015-09-05 16:55:14 +02:00
Phyks f7bcdece5f Fix unittests for Python 2.7 2015-09-05 16:49:56 +02:00
Phyks 1b83f01581 Fix unittests 2015-09-05 16:46:11 +02:00
Phyks 851db96fa8 Fix entries names 2015-08-31 16:04:41 +02:00
Lucas Verney a25f21f451 Rename type dict entry to entrytype according to change in bibtexparser 0.6.1 2015-08-21 23:45:19 +02:00
Lucas Verney 94c2771e4e Merge pull request #28 from sciunto/license
add LICENSE file
2015-06-11 18:09:29 +02:00
François Boulogne 80b4064396 add LICENSE file 2015-06-11 11:46:58 -04:00
Phyks d2e415a1c5 Do not specify Python version by default 2015-06-11 16:56:14 +02:00
Phyks b655c50f07 Add --keep argument for delete action, see issue #26 2015-06-11 16:54:52 +02:00
Phyks 3232fc68be Ensure Papers dir exist, see issue #27 2015-06-11 16:54:52 +02:00
Phyks 82ed48a9e0 Fix on Windows + Fix issue #25 2015-06-11 16:54:43 +02:00
Phyks 84a7a1cd63 Absolute paths for bib index 2015-06-08 21:15:18 +02:00
Phyks e7edf7e5bf Fix remaining bug 2015-06-08 20:02:09 +02:00
Phyks 0ba55402d2 Trailing new line in isbn bib 2015-06-06 16:36:07 +02:00
Phyks ab27d96f96 Fix Travis build 2015-06-06 16:28:39 +02:00
Phyks 7c54c9fd2e Add an option to leave the imported file in place 2015-06-06 16:03:32 +02:00
Phyks fbb158543b Fix issue #21 2014-12-03 12:54:24 +01:00
Phyks 485516db07 Update doc for unittests 2014-12-03 12:18:39 +01:00
Phyks ce619b9cfe Fix issue #21 + encoding 2014-11-30 16:44:04 +01:00
Lucas Verney f357f4600c Merge pull request #22 from drvinceknight/master
Adding build file to gitignore
2014-11-30 16:30:22 +01:00
vince 75dd7b4e57 Adding build file to gitignore 2014-11-30 12:18:00 +00:00
Phyks 3e6d5f490f Forgot a comma 2014-11-05 14:46:39 +01:00
Phyks 2e6d9c0f79 Sort keys in config 2014-11-05 14:44:05 +01:00
Phyks c1055ecb8c Update config for pretty printing 2014-11-05 00:02:43 +01:00
Phyks 08f6b8846a Catch socket.error exceptions as seen in issue #20 2014-11-04 21:51:44 +01:00
Phyks 0fb579a58c Forgot to update function names in bmc.py 2014-11-04 21:46:47 +01:00
Phyks 2d949dd299 Forgot to push the PDF file for unittests 2014-11-04 21:12:22 +01:00
Phyks a39c1d94d0 Solve issue #19 2014-11-04 21:11:19 +01:00
Phyks 9f821a409c setup.py 2014-10-11 23:19:32 +02:00
Phyks 9a80f0c1fe Update src tests files 2014-10-07 11:37:26 +02:00
Phyks e7b409c8b7 Horrible file writing 2014-10-07 11:30:44 +02:00
Phyks d506124f9d Fix test files 2014-10-07 11:26:06 +02:00
Phyks 1ed4c4e623 Update README 2014-10-07 11:22:13 +02:00
Phyks a8d9f4e7b5 Kick Python 3.2 2014-08-04 12:58:43 +02:00
Phyks 096241ea48 Merge branch 'master' of https://github.com/Phyks/BMC 2014-08-04 00:18:49 +02:00
Phyks d059005946 Fix for Python 3 2014-08-03 23:34:17 +02:00
Phyks 5bf247205f Update README for Python3 compatibility 2014-08-03 23:12:43 +02:00
Phyks 9ab00fdded Small differences in between py2 and py3 2014-08-03 22:59:00 +02:00
Phyks f311de7043 Fix 2014-08-03 21:52:01 +02:00
Phyks a07c2ea292 Fix sys.stdout.encoding error 2014-08-03 21:37:34 +02:00
Phyks 3c88752cf9 Further bugfixes for python3 2014-08-03 21:20:48 +02:00
Phyks ed449e17e2 Fix for python3 2014-08-03 19:10:30 +02:00
Phyks bb297adfc5 Further fixes for python3 2014-08-03 12:38:40 +02:00
Phyks 07d8d43a7c Fix tools.py for python3 2014-08-03 00:40:37 +02:00
Phyks 15ccbb95c9 Update Travis instructions 2014-08-03 00:22:08 +02:00
Phyks 7fda1bd5fa Flake8 fixes 2014-08-03 00:17:01 +02:00
Phyks 35541a43e6 Fix imports + subparsers in python3 2014-08-03 00:09:07 +02:00
Phyks ce9d13eafa Edit README.md accordingly 2014-08-02 23:35:29 +02:00
Phyks 5f908a6d7b Rewrite to use PySocks 2014-08-02 23:34:34 +02:00
Lucas Verney da555c0bad Update README.md
Fix Travis icon
2014-08-02 22:03:50 +02:00
Phyks 1a03ab6d70 Fix imports 2014-08-01 01:33:32 +02:00
Phyks 8e79bf214e Update tests 2014-08-01 01:00:16 +02:00
Phyks 229a3617ee Update tests 2014-08-01 00:50:40 +02:00
Lucas Verney cbc2a175a5 Update README.md
Add Travis build status.
2014-08-01 00:45:04 +02:00
Lucas Verney 20517210ff Merge pull request #14 from sciunto/master
Tear pages
2014-07-13 17:44:16 +02:00
François Boulogne 61801c50f6 Update readme for tearpages 2014-07-12 23:03:02 -04:00
François Boulogne f2bfdf5336 I, F. Boulogne, as the author, relicense this code. 2014-07-12 23:00:59 -04:00
Lucas Verney 818b811e24 Merge pull request #13 from sciunto/setup
add a setup.py
2014-07-12 01:26:54 +02:00
François Boulogne df4c929ef8 add a setup.py 2014-07-11 19:22:00 -04:00
Lucas Verney 0742f3f4c0 Update README.md
Add @ßciunto in the thanks part.
2014-07-11 10:32:04 +02:00
Lucas Verney ae66f3b04c Merge pull request #12 from sciunto/lib
Store libs in a specific directory
2014-07-11 10:29:54 +02:00
François Boulogne 7e570322c0 fix paths 2014-07-10 22:56:47 -04:00
François Boulogne 22e4a09bda fix import 2014-07-10 22:52:49 -04:00
François Boulogne f123bc3ad1 add lib directory 2014-07-10 22:50:16 -04:00
29 changed files with 533 additions and 304 deletions
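The fixdoisearch change merged above (commit 330c2f2b5f) boils down to normalizing the long-form marker before extraction, so the paper's own identifier is matched instead of a reference's DOI link. A minimal sketch, where `find_doi` and its regex are simplified stand-ins for BMC's real extractor:

```python
import re

def find_doi(text):
    """Sketch of the fixdoisearch idea: rewrite the phrase
    "Digital Object Identifier" to "DOI" before searching, so a paper
    marked with the long form is found before any reference DOIs."""
    # Normalize the long-form marker to the short form first.
    text = text.replace("Digital Object Identifier", "DOI")
    # Minimal DOI pattern; the real extractor is more thorough.
    match = re.search(r"DOI[:\s]*(10\.\d{4,9}/[^\s]+)", text)
    return match.group(1) if match else None
```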

.gitignore (3 changes)

@@ -8,3 +8,6 @@
*.pdf
*.bib
*.djvu
# build
build/

.travis.yml

@@ -1,12 +1,13 @@
language: python
python:
- 2.7
- 3.3
before_install:
- sudo apt-get update
# command to install dependencies, e.g. pip install -r requirements.txt --use-mirrors
install:
- pip install arxiv2bib
- pip install requesocks
- pip install PySocks
- pip install pyPDF2
- pip install tear-pages
- pip install isbnlib
@@ -14,7 +15,7 @@ install:
- pip install coveralls
- sudo apt-get install -qq poppler-utils
- sudo apt-get install -qq djvulibre-bin
# - python setup.py install
- python setup.py install
# command to run tests, e.g. python setup.py test
script:
- nosetests

LICENSE (new file, 9 changes)

@@ -0,0 +1,9 @@
* --------------------------------------------------------------------------------
* "THE NO-ALCOHOL BEER-WARE LICENSE" (Revision 42):
* Phyks (webmaster@phyks.me) wrote this file. As long as you retain this notice you
* can do whatever you want with this stuff (and you can also do whatever you want
* with this stuff without retaining it, but that's not cool...). If we meet some
* day, and you think this stuff is worth it, you can buy me a <del>beer</del> soda
* in return.
* Phyks
* ---------------------------------------------------------------------------------

README.md

@@ -5,6 +5,11 @@ BiblioManager is a simple script to download and store your articles. Read on if
**Note :** This script is currently a work in progress.
**Note: If you want to extract some functions from this repo, please consider using [libbmc](https://github.com/Phyks/libbmc/) instead, which is specifically dedicated to this (and this repo should be using it, rather than duplicating code).**
Travis build status : [![Build Status](https://travis-ci.org/Phyks/BMC.svg?branch=master)](https://travis-ci.org/Phyks/BMC)
## What is BiblioManager (or what it is **not**) ?
I used to have a folder with poorly named papers and books and wanted something to help me handle it. I don't like Mendeley and Zotero and so on, which are heavy and overkill for my needs. I just want to feed a script with PDF files of papers and books, or URLs to PDF files, and I want it to automatically maintain a BibTeX index of these files, to help me cite them and find them back. Then, I want it to give me a way to easily retrieve a file, either by author, by title or with some other search method, and give me the associated bibtex entry.
@@ -56,12 +61,13 @@ Should be almost working and usable now, although still to be considered as **ex
```
git clone https://github.com/Phyks/BMC
```
* Install `arxiv2bib`, `tear-pages`, `requesocks`, `bibtexparser` (https://github.com/sciunto/python-bibtexparser), `PyPDF2` and `isbnlib` _via_ Pypi
* Install `arxiv2bib`, `PySocks`, `bibtexparser` (https://github.com/sciunto/python-bibtexparser), `PyPDF2` and `isbnlib` _via_ Pypi (or better, in a virtualenv, or using your package manager, according to your preferences)
```
sudo pip install arxiv2bib requesocks bibtexparser pyPDF2 isbnlib
sudo pip install arxiv2bib PySocks bibtexparser pyPDF2 isbnlib
```
(replace pip by pip2 if your distribution ships python3 by default)
(this script should be compatible with Python 2 and Python 3)
* Install `pdftotext` (provided by Xpdf) and `djvulibre` _via_ your package manager or the way you want
* Install the script _via_ `python setup.py install`.
* Run the script to initialize the conf in `~/.config/bmc/bmc.json`.
* Customize the configuration by editing `~/.config/bmc/bmc.json` according to your needs. A documentation of the available options can be found in file `config.py`.
* _Power users :_ Add your custom masks in `~/.config/bmc/masks.py`.
@@ -117,6 +123,12 @@ All your documents will be stored in the papers dir specified in `~/.config/bmc/
The resync option will check that all bibtex entries have a corresponding file and all file have a corresponding bibtex entry. It will prompt you what to do for unmatched entries.
## Unittests
Unittests are available for all the files in the `lib/`. You can simply run the tests using `nosetests`. Builds are run after each commit on [Travis](https://travis-ci.org/Phyks/BMC).
## License
All the source code I wrote is under a `no-alcohol beer-ware license`. All functions that I didn't write myself are under the original license and their origin is specified in the function itself.
@@ -132,7 +144,6 @@ All the source code I wrote is under a `no-alcohol beer-ware license`. All funct
* ---------------------------------------------------------------------------------
```
I used the `tearpages.py` script from sciunto, which can be found [here](https://github.com/sciunto/tear-pages) and is released under a GNU GPLv3 license.
## Inspiration
@@ -147,8 +158,6 @@ Here are some sources of inspirations for this project :
A list of ideas and TODO. Don't hesitate to give feedback on the ones you really want or to propose your owns.
60. Unittest
70. Python3 compatibility ?
80. Search engine
85. Anti-duplicate ?
90. Look for published version in arXiv
@@ -160,6 +169,7 @@ A list of ideas and TODO. Don't hesitate to give feedback on the ones you really
* Nathan Grigg for his [arxiv2bib](https://pypi.python.org/pypi/arxiv2bib/1.0.5#downloads) python module
* François Boulogne for his [python-bibtexparser](https://github.com/sciunto/python-bibtexparser) python module and his integration of new requested features
* pyparsing [search parser example](http://pyparsing.wikispaces.com/file/view/searchparser.py)
* François Boulogne (@sciunto) for his (many) contributions to this software !
## Note on test files

bmc.py (218 changes)

@@ -1,19 +1,21 @@
#!/usr/bin/env python2
#!/usr/bin/env python
# -*- coding: utf8 -*-
from __future__ import unicode_literals
import argparse
import os
import shutil
import subprocess
import sys
import tempfile
import backend
import fetcher
import tearpages
import tools
from bibtexparser.bparser import BibTexParser
import bibtexparser
from codecs import open
from config import Config
from libbmc.config import Config
from libbmc import backend
from libbmc import fetcher
from libbmc import tearpages
from libbmc import tools
config = Config()
@@ -23,23 +25,24 @@ EDITOR = os.environ.get('EDITOR') if os.environ.get('EDITOR') else 'vim'
def checkBibtex(filename, bibtex_string):
print("The bibtex entry found for "+filename+" is:")
bibtex = BibTexParser(bibtex_string)
bibtex = bibtex.get_entry_dict()
bibtex = bibtexparser.loads(bibtex_string)
bibtex = bibtex.entries_dict
try:
bibtex = bibtex[bibtex.keys()[0]]
bibtex = bibtex[list(bibtex.keys())[0]]
# Check entries are correct
assert bibtex['title']
if bibtex['type'] == 'article':
assert bibtex['authors']
elif bibtex['type'] == 'book':
assert bibtex['author']
assert bibtex['year']
if "title" not in bibtex:
raise AssertionError
if "authors" not in bibtex and "author" not in bibtex:
raise AssertionError
if "year" not in bibtex:
raise AssertionError
# Print the bibtex and confirm
print(tools.parsed2Bibtex(bibtex))
check = tools.rawInput("Is it correct? [Y/n] ")
except KeyboardInterrupt:
sys.exit()
except (KeyError, AssertionError):
except (IndexError, KeyError, AssertionError):
print("Missing author, year or title in bibtex.")
check = 'n'
try:
@@ -49,16 +52,16 @@ def checkBibtex(filename, bibtex_string):
while check.lower() == 'n':
with tempfile.NamedTemporaryFile(suffix=".tmp") as tmpfile:
tmpfile.write(bibtex_string)
tmpfile.write(bibtex_string.encode('utf-8'))
tmpfile.flush()
subprocess.call([EDITOR, tmpfile.name])
tmpfile.seek(0)
bibtex = BibTexParser(tmpfile.read()+"\n")
bibtex = bibtexparser.loads(tmpfile.read().decode('utf-8')+"\n")
bibtex = bibtex.get_entry_dict()
bibtex = bibtex.entries_dict
try:
bibtex = bibtex[bibtex.keys()[0]]
except KeyError:
bibtex = bibtex[list(bibtex.keys())[0]]
except (IndexError, KeyError):
tools.warning("Invalid bibtex entry")
bibtex_string = ''
tools.rawInput("Press Enter to go back to editor.")
@@ -90,7 +93,7 @@ def checkBibtex(filename, bibtex_string):
return bibtex
def addFile(src, filetype, manual, autoconfirm, tag):
def addFile(src, filetype, manual, autoconfirm, tag, rename=True):
"""
Add a file to the library
"""
@@ -101,9 +104,11 @@ def addFile(src, filetype, manual, autoconfirm, tag):
if not manual:
try:
if filetype == 'article' or filetype is None:
doi = fetcher.findDOI(src)
if doi is False and (filetype == 'article' or filetype is None):
arxiv = fetcher.findArXivId(src)
id_type, article_id = fetcher.findArticleID(src)
if id_type == "DOI":
doi = article_id
elif id_type == "arXiv":
arxiv = article_id
if filetype == 'book' or (doi is False and arxiv is False and
filetype is None):
@@ -172,10 +177,10 @@ def addFile(src, filetype, manual, autoconfirm, tag):
else:
bibtex = ''
bibtex = BibTexParser(bibtex)
bibtex = bibtex.get_entry_dict()
bibtex = bibtexparser.loads(bibtex)
bibtex = bibtex.entries_dict
if len(bibtex) > 0:
bibtex_name = bibtex.keys()[0]
bibtex_name = list(bibtex.keys())[0]
bibtex = bibtex[bibtex_name]
bibtex_string = tools.parsed2Bibtex(bibtex)
else:
@@ -190,30 +195,33 @@ def addFile(src, filetype, manual, autoconfirm, tag):
tag = args.tag
bibtex['tag'] = tag
new_name = backend.getNewName(src, bibtex, tag)
if rename:
new_name = backend.getNewName(src, bibtex, tag)
while os.path.exists(new_name):
tools.warning("file "+new_name+" already exists.")
default_rename = new_name.replace(tools.getExtension(new_name),
" (2)"+tools.getExtension(new_name))
rename = tools.rawInput("New name ["+default_rename+"]? ")
if rename == '':
new_name = default_rename
else:
new_name = rename
bibtex['file'] = new_name
try:
shutil.copy2(src, new_name)
except shutil.Error:
new_name = False
sys.exit("Unable to move file to library dir " +
config.get("folder")+".")
while os.path.exists(new_name):
tools.warning("file "+new_name+" already exists.")
default_rename = new_name.replace(tools.getExtension(new_name),
" (2)" +
tools.getExtension(new_name))
rename = tools.rawInput("New name ["+default_rename+"]? ")
if rename == '':
new_name = default_rename
else:
new_name = rename
try:
shutil.copy2(src, new_name)
except shutil.Error:
new_name = False
sys.exit("Unable to move file to library dir " +
config.get("folder")+".")
else:
new_name = src
bibtex['file'] = os.path.abspath(new_name)
# Remove first page of IOP papers
try:
if 'IOP' in bibtex['publisher'] and bibtex['type'] == 'article':
tearpages.main(new_name)
if 'IOP' in bibtex['publisher'] and bibtex['ENTRYTYPE'] == 'article':
tearpages.tearpage(new_name)
except (KeyError, shutil.Error, IOError):
pass
@@ -268,13 +276,13 @@ def editEntry(entry, file_id='both'):
try:
with open(config.get("folder")+'index.bib', 'r', encoding='utf-8') \
as fh:
index = BibTexParser(fh.read())
index = index.get_entry_dict()
index = bibtexparser.load(fh)
index = index.entries_dict
except (TypeError, IOError):
tools.warning("Unable to open index file.")
return False
index[new_bibtex['id']] = new_bibtex
index[new_bibtex['ID']] = new_bibtex
backend.bibtexRewrite(index)
return True
@@ -287,7 +295,7 @@ def downloadFile(url, filetype, manual, autoconfirm, tag):
print('Download finished')
tmp = tempfile.NamedTemporaryFile(suffix='.'+contenttype)
with open(tmp.name, 'w+') as fh:
with open(tmp.name, 'wb+') as fh:
fh.write(dl)
new_name = addFile(tmp.name, filetype, manual, autoconfirm, tag)
if new_name is False:
@@ -303,13 +311,13 @@ def openFile(ident):
try:
with open(config.get("folder")+'index.bib', 'r', encoding='utf-8') \
as fh:
bibtex = BibTexParser(fh.read())
bibtex = bibtex.get_entry_dict()
bibtex = bibtexparser.load(fh)
bibtex = bibtex.entries_dict
except (TypeError, IOError):
tools.warning("Unable to open index file.")
return False
if ident not in bibtex.keys():
if ident not in list(bibtex.keys()):
return False
else:
subprocess.Popen(['xdg-open', bibtex[ident]['file']])
@@ -326,7 +334,7 @@ def resync():
entry = diff[key]
if entry['file'] == '':
print("\nFound entry in index without associated file: " +
entry['id'])
entry['ID'])
print("Title:\t"+entry['title'])
loop = True
while confirm:
@@ -336,23 +344,23 @@ def resync():
if filename == '':
break
else:
if 'doi' in entry.keys():
doi = fetcher.findDOI(filename)
if 'doi' in list(entry.keys()):
doi = fetcher.findArticleID(filename, only=["DOI"])
if doi is not False and doi != entry['doi']:
loop = tools.rawInput("Found DOI does not " +
"match bibtex entry " +
"DOI, continue anyway " +
"? [y/N]")
loop = (loop.lower() != 'y')
if 'Eprint' in entry.keys():
arxiv = fetcher.findArXivId(filename)
if 'Eprint' in list(entry.keys()):
arxiv = fetcher.findArticleID(filename, only=["arXiv"])
if arxiv is not False and arxiv != entry['Eprint']:
loop = tools.rawInput("Found arXiv id does " +
"not match bibtex " +
"entry arxiv id, " +
"continue anyway ? [y/N]")
loop = (loop.lower() != 'y')
if 'isbn' in entry.keys():
if 'isbn' in list(entry.keys()):
isbn = fetcher.findISBN(filename)
if isbn is not False and isbn != entry['isbn']:
loop = tools.rawInput("Found ISBN does not " +
@@ -362,19 +370,19 @@ def resync():
loop = (loop.lower() != 'y')
continue
if filename == '':
backend.deleteId(entry['id'])
print("Deleted entry \""+entry['id']+"\".")
backend.deleteId(entry['ID'])
print("Deleted entry \""+entry['ID']+"\".")
else:
new_name = backend.getNewName(filename, entry)
try:
shutil.copy2(filename, new_name)
print("Imported new file "+filename+" for entry " +
entry['id']+".")
entry['ID']+".")
except shutil.Error:
new_name = False
sys.exit("Unable to move file to library dir " +
config.get("folder")+".")
backend.bibtexEdit(entry['id'], {'file': filename})
backend.bibtexEdit(entry['ID'], {'file': filename})
else:
print("Found file without any associated entry in index:")
print(entry['file'])
@@ -430,46 +438,70 @@ def update(entry):
print("Previous version successfully deleted.")
def commandline_arg(bytestring):
# UTF-8 encoding for python2
if sys.version_info >= (3, 0):
unicode_string = bytestring
else:
unicode_string = bytestring.decode(sys.getfilesystemencoding())
return unicode_string
if __name__ == '__main__':
parser = argparse.ArgumentParser(description="A bibliography " +
"management tool.")
subparsers = parser.add_subparsers(help="sub-command help")
subparsers = parser.add_subparsers(help="sub-command help", dest='parser')
subparsers.required = True # Fix for Python 3.3.5
parser_download = subparsers.add_parser('download', help="download help")
parser_download.add_argument('-t', '--type', default=None,
choices=['article', 'book'],
help="type of the file to download")
help="type of the file to download",
type=commandline_arg)
parser_download.add_argument('-m', '--manual', default=False,
action='store_true',
help="disable auto-download of bibtex")
parser_download.add_argument('-y', default=False,
help="Confirm all")
parser_download.add_argument('--tag', default='', help="Tag")
parser_download.add_argument('--tag', default='',
help="Tag", type=commandline_arg)
parser_download.add_argument('--keep', default=False,
help="Do not remove the file")
parser_download.add_argument('url', nargs='+',
help="url of the file to import")
help="url of the file to import",
type=commandline_arg)
parser_download.set_defaults(func='download')
parser_import = subparsers.add_parser('import', help="import help")
parser_import.add_argument('-t', '--type', default=None,
choices=['article', 'book'],
help="type of the file to import")
help="type of the file to import",
type=commandline_arg)
parser_import.add_argument('-m', '--manual', default=False,
action='store_true',
help="disable auto-download of bibtex")
parser_import.add_argument('-y', default=False,
help="Confirm all")
parser_import.add_argument('--tag', default='', help="Tag")
parser_import.add_argument('--tag', default='', help="Tag",
type=commandline_arg)
parser_import.add_argument('--in-place', default=False,
dest="inplace", action='store_true',
help="Leave the imported file in place",)
parser_import.add_argument('file', nargs='+',
help="path to the file to import")
help="path to the file to import",
type=commandline_arg)
parser_import.add_argument('--skip', nargs='+',
help="path to files to skip", default=[])
help="path to files to skip", default=[],
type=commandline_arg)
parser_import.set_defaults(func='import')
parser_delete = subparsers.add_parser('delete', help="delete help")
parser_delete.add_argument('entries', metavar='entry', nargs='+',
help="a filename or an identifier")
help="a filename or an identifier",
type=commandline_arg)
parser_delete.add_argument('--skip', nargs='+',
help="path to files to skip", default=[])
help="path to files to skip", default=[],
type=commandline_arg)
group = parser_delete.add_mutually_exclusive_group()
group.add_argument('--id', action="store_true", default=False,
help="id based deletion")
@@ -482,9 +514,11 @@ if __name__ == '__main__':
parser_edit = subparsers.add_parser('edit', help="edit help")
parser_edit.add_argument('entries', metavar='entry', nargs='+',
help="a filename or an identifier")
help="a filename or an identifier",
type=commandline_arg)
parser_edit.add_argument('--skip', nargs='+',
help="path to files to skip", default=[])
help="path to files to skip", default=[],
type=commandline_arg)
group = parser_edit.add_mutually_exclusive_group()
group.add_argument('--id', action="store_true", default=False,
help="id based deletion")
@@ -500,12 +534,14 @@ if __name__ == '__main__':
parser_open = subparsers.add_parser('open', help="open help")
parser_open.add_argument('ids', metavar='id', nargs='+',
help="an identifier")
help="an identifier",
type=commandline_arg)
parser_open.set_defaults(func='open')
parser_export = subparsers.add_parser('export', help="export help")
parser_export.add_argument('ids', metavar='id', nargs='+',
help="an identifier")
help="an identifier",
type=commandline_arg)
parser_export.set_defaults(func='export')
parser_resync = subparsers.add_parser('resync', help="resync help")
@@ -513,12 +549,14 @@ if __name__ == '__main__':
parser_update = subparsers.add_parser('update', help="update help")
parser_update.add_argument('--entries', metavar='entry', nargs='+',
help="a filename or an identifier")
help="a filename or an identifier",
type=commandline_arg)
parser_update.set_defaults(func='update')
parser_search = subparsers.add_parser('search', help="search help")
parser_search.add_argument('query', metavar='entry', nargs='+',
help="your query, see README for more info.")
help="your query, see README for more info.",
type=commandline_arg)
parser_search.set_defaults(func='search')
args = parser.parse_args()
@@ -543,7 +581,7 @@ if __name__ == '__main__':
skipped = []
for filename in list(set(args.file) - set(args.skip)):
new_name = addFile(filename, args.type, args.manual, args.y,
args.tag)
args.tag, not args.inplace)
if new_name is not False:
print(filename+" successfully imported as " +
new_name+".")
@@ -567,8 +605,9 @@ if __name__ == '__main__':
confirm = 'y'
if confirm.lower() == 'y':
if args.file or not backend.deleteId(filename):
if args.id or not backend.deleteFile(filename):
if args.file or not backend.deleteId(filename, args.keep):
if(args.id or
not backend.deleteFile(filename, args.keep)):
tools.warning("Unable to delete "+filename)
sys.exit(1)
@@ -594,13 +633,14 @@ if __name__ == '__main__':
sys.exit()
elif args.func == 'list':
listPapers = tools.listDir(config.get("folder"))
listPapers = backend.getEntries(full=True)
if not listPapers:
sys.exit()
listPapers = [v["file"] for k, v in listPapers.items()]
listPapers.sort()
for paper in listPapers:
if tools.getExtension(paper) not in [".pdf", ".djvu"]:
continue
print(paper)
sys.exit()
elif args.func == 'search':
raise Exception('TODO')
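Many hunks in bmc.py replace `bibtex.keys()[0]` with `list(bibtex.keys())[0]`. The reason is a Python 2/3 difference, shown here with a plain dict standing in for bibtexparser's `entries_dict`:

```python
# Under Python 2, dict.keys() returned a list, so keys()[0] worked;
# under Python 3 it returns a view object that does not support indexing.
entries = {"doe2014": {"title": "Some Paper"}}

# Python-2-only style (raises TypeError on Python 3):
#   first = entries.keys()[0]

# Portable style used throughout the diff:
first = list(entries.keys())[0]
assert first == "doe2014"
```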

libbmc/__init__.py (new file, 2 changes)

@@ -0,0 +1,2 @@
#!/usr/bin/env python2
# -*- coding: utf-8 -*-

libbmc/backend.py

@@ -9,13 +9,13 @@
# Phyks
# -----------------------------------------------------------------------------
from __future__ import unicode_literals
import os
import re
import tools
import fetcher
from bibtexparser.bparser import BibTexParser
from config import Config
import libbmc.tools as tools
import libbmc.fetcher as fetcher
import bibtexparser
from libbmc.config import Config
from codecs import open
@@ -29,7 +29,7 @@ def getNewName(src, bibtex, tag='', override_format=None):
"""
authors = re.split(' and ', bibtex['author'])
if bibtex['type'] == 'article':
if bibtex['ENTRYTYPE'] == 'article':
if override_format is None:
new_name = config.get("format_articles")
else:
@@ -38,7 +38,7 @@ def getNewName(src, bibtex, tag='', override_format=None):
new_name = new_name.replace("%j", bibtex['journal'])
except KeyError:
pass
elif bibtex['type'] == 'book':
elif bibtex['ENTRYTYPE'] == 'book':
if override_format is None:
new_name = config.get("format_books")
else:
@@ -103,8 +103,8 @@ def bibtexEdit(ident, modifs):
try:
with open(config.get("folder")+'index.bib', 'r', encoding='utf-8') \
as fh:
bibtex = BibTexParser(fh.read())
bibtex = bibtex.get_entry_dict()
bibtex = bibtexparser.load(fh)
bibtex = bibtex.entries_dict
except (IOError, TypeError):
tools.warning("Unable to open index file.")
return False
@@ -131,13 +131,13 @@ def bibtexRewrite(data):
return False
def deleteId(ident):
def deleteId(ident, keep=False):
"""Delete a file based on its id in the bibtex file"""
try:
with open(config.get("folder")+'index.bib', 'r', encoding='utf-8') \
as fh:
bibtex = BibTexParser(fh.read().decode('utf-8'))
bibtex = bibtex.get_entry_dict()
bibtex = bibtexparser.load(fh)
bibtex = bibtex.entries_dict
except (IOError, TypeError):
tools.warning("Unable to open index file.")
return False
@@ -145,11 +145,12 @@ def deleteId(ident):
if ident not in bibtex.keys():
return False
try:
os.remove(bibtex[ident]['file'])
except (KeyError, OSError):
tools.warning("Unable to delete file associated to id "+ident+" : " +
bibtex[ident]['file'])
if not keep:
try:
os.remove(bibtex[ident]['file'])
except (KeyError, OSError):
tools.warning("Unable to delete file associated to id " + ident +
" : " + bibtex[ident]['file'])
try:
if not os.listdir(os.path.dirname(bibtex[ident]['file'])):
@@ -167,27 +168,28 @@ def deleteId(ident):
return True
def deleteFile(filename):
def deleteFile(filename, keep=False):
"""Delete a file based on its filename"""
try:
with open(config.get("folder")+'index.bib', 'r', encoding='utf-8') \
as fh:
bibtex = BibTexParser(fh.read().decode('utf-8'))
bibtex = bibtex.get_entry_dict()
bibtex = bibtexparser.load(fh)
bibtex = bibtex.entries_dict
except (TypeError, IOError):
tools.warning("Unable to open index file.")
return False
found = False
for key in bibtex.keys():
for key in list(bibtex.keys()):
try:
if os.path.samefile(bibtex[key]['file'], filename):
found = True
try:
os.remove(bibtex[key]['file'])
except (KeyError, OSError):
tools.warning("Unable to delete file associated to id " +
key+" : "+bibtex[key]['file'])
if not keep:
try:
os.remove(bibtex[key]['file'])
except (KeyError, OSError):
tools.warning("Unable to delete file associated " +
"to id " + key+" : "+bibtex[key]['file'])
try:
if not os.listdir(os.path.dirname(filename)):
@@ -222,8 +224,8 @@ def diffFilesIndex():
try:
with open(config.get("folder")+'index.bib', 'r', encoding='utf-8') \
as fh:
index = BibTexParser(fh.read())
index_diff = index.get_entry_dict()
index = bibtexparser.load(fh)
index_diff = index.entries_dict
except (TypeError, IOError):
tools.warning("Unable to open index file.")
return False
@@ -237,7 +239,7 @@ def diffFilesIndex():
for filename in files:
index_diff[filename] = {'file': filename}
return index.get_entry_dict()
return index.entries_dict
def getBibtex(entry, file_id='both', clean=False):
@@ -250,8 +252,8 @@ def getBibtex(entry, file_id='both', clean=False):
try:
with open(config.get("folder")+'index.bib', 'r', encoding='utf-8') \
as fh:
bibtex = BibTexParser(fh.read())
bibtex = bibtex.get_entry_dict()
bibtex = bibtexparser.load(fh)
bibtex = bibtex.entries_dict
except (TypeError, IOError):
tools.warning("Unable to open index file.")
return False
@@ -277,18 +279,21 @@ def getBibtex(entry, file_id='both', clean=False):
return bibtex_entry
def getEntries():
def getEntries(full=False):
"""Returns the list of all entries in the bibtex index"""
try:
with open(config.get("folder")+'index.bib', 'r', encoding='utf-8') \
as fh:
bibtex = BibTexParser(fh.read())
bibtex = bibtex.get_entry_dict()
bibtex = bibtexparser.load(fh)
bibtex = bibtex.entries_dict
except (TypeError, IOError):
tools.warning("Unable to open index file.")
return False
return bibtex.keys()
if full:
return bibtex
else:
return list(bibtex.keys())
def updateArXiv(entry):
@@ -313,9 +318,9 @@ def updateArXiv(entry):
continue
ids.add(bibtex['eprint'])
last_bibtex = BibTexParser(fetcher.arXiv2Bib(arxiv_id_no_v))
last_bibtex = last_bibtex.get_entry_dict()
last_bibtex = last_bibtex[last_bibtex.keys()[0]]
last_bibtex = bibtexparser.loads(fetcher.arXiv2Bib(arxiv_id_no_v))
last_bibtex = last_bibtex.entries_dict
last_bibtex = last_bibtex[list(last_bibtex.keys())[0]]
if last_bibtex['eprint'] not in ids:
return last_bibtex
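The `keep` parameter threaded through `deleteId` and `deleteFile` above (the `--keep` option from issue #26) can be summarized as: always drop the entry from the index, but only unlink the file when `keep` is false. `delete_entry` below is a hypothetical simplification, not BMC's actual function:

```python
import os
import tempfile

def delete_entry(index, ident, keep=False):
    """Hypothetical sketch of the --keep behaviour: remove the entry
    from the in-memory index, and only delete the file on disk when
    keep is False."""
    entry = index.pop(ident, None)
    if entry is None:
        return False
    if not keep:
        try:
            os.remove(entry["file"])
        except OSError:
            pass  # file already gone; the index entry is removed regardless
    return True

# Demo: with keep=True the file survives deletion of its index entry.
fd, path = tempfile.mkstemp()
os.close(fd)
assert delete_entry({"doe2014": {"file": path}}, "doe2014", keep=True)
kept = os.path.exists(path)
assert delete_entry({"doe2014": {"file": path}}, "doe2014", keep=False)
removed = not os.path.exists(path)
```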

libbmc/config.py

@@ -1,10 +1,11 @@
from __future__ import unicode_literals
import os
import errno
import imp
import inspect
import json
import sys
import tools
import libbmc.tools as tools
# List of available options (in ~/.config/bmc/bmc.json file):
# * folder : folder in which papers are stored
@@ -81,12 +82,20 @@ class Config():
except (ValueError, IOError):
tools.warning("Config file could not be read.")
sys.exit(1)
try:
folder_exists = make_sure_path_exists(self.get("folder"))
except OSError:
tools.warning("Unable to create paper storage folder.")
sys.exit(1)
self.load_masks()
def save(self):
try:
with open(self.config_path + "bmc.json", 'w') as fh:
fh.write(json.dumps(self.config))
fh.write(json.dumps(self.config,
sort_keys=True,
indent=4,
separators=(',', ': ')))
except IOError:
tools.warning("Could not write config file.")
sys.exit(1)
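The new arguments passed to `json.dumps` in `Config.save()` are standard library options; this standalone snippet shows their effect on a config like the one stored in `~/.config/bmc/bmc.json`:

```python
import json

config = {"folder": "~/papers/", "proxies": [""]}

# Old behaviour: everything serialized on one line.
compact = json.dumps(config)

# New behaviour from the "Sort keys in config" and "Update config for
# pretty printing" commits: stable key order and indentation, so
# hand-editing the config file is practical.
pretty = json.dumps(config, sort_keys=True, indent=4, separators=(',', ': '))
print(pretty)
```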

libbmc/fetcher.py

@@ -12,16 +12,30 @@
import isbnlib
import re
import requesocks as requests # Requesocks is requests with SOCKS support
import socket
import socks
import subprocess
import sys
try:
# For Python 3.0 and later
from urllib.request import urlopen, Request
from urllib.error import URLError
except ImportError:
# Fall back to Python 2's urllib2
from urllib2 import urlopen, Request, URLError
import arxiv2bib as arxiv_metadata
import tools
from bibtexparser.bparser import BibTexParser
from config import Config
import libbmc.tools as tools
import bibtexparser
from libbmc.config import Config
config = Config()
default_socket = socket.socket
try:
stdout_encoding = sys.stdout.encoding
assert(stdout_encoding is not None)
except (AttributeError, AssertionError):
stdout_encoding = 'UTF-8'
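The import fallback added above is the usual Python 2/3 shim for `urllib`. A self-contained sketch; the URL and header here are purely illustrative (building a `Request` does not open a connection):

```python
try:
    # Python 3.0 and later
    from urllib.request import urlopen, Request
    from urllib.error import URLError
except ImportError:
    # Fall back to Python 2's urllib2
    from urllib2 import urlopen, Request, URLError

# Construct (but do not send) a request with a custom header.
req = Request("http://example.com", headers={"accept": "text/html"})
```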
def download(url):
@@ -32,39 +46,81 @@ def download(url):
false if it could not be downloaded.
"""
for proxy in config.get("proxies"):
r_proxy = {
"http": proxy,
"https": proxy,
}
if proxy.startswith('socks'):
if proxy[5] == '4':
proxy_type = socks.SOCKS4
else:
proxy_type = socks.SOCKS5
proxy = proxy[proxy.find('://')+3:]
try:
proxy, port = proxy.split(':')
except ValueError:
port = None
socks.set_default_proxy(proxy_type, proxy, port)
socket.socket = socks.socksocket
elif proxy == '':
socket.socket = default_socket
else:
try:
proxy, port = proxy.split(':')
except ValueError:
port = None
socks.set_default_proxy(socks.HTTP, proxy, port)
socket.socket = socks.socksocket
try:
r = requests.get(url, proxies=r_proxy)
size = int(r.headers['Content-Length'].strip())
dl = ""
r = urlopen(url)
try:
size = int(dict(r.info())['content-length'].strip())
except KeyError:
try:
size = int(dict(r.info())['Content-Length'].strip())
except KeyError:
size = 0
dl = b""
dl_size = 0
for buf in r.iter_content(1024):
while True:
buf = r.read(1024)
if buf:
dl += buf
dl_size += len(buf)
done = int(50 * dl_size / size)
sys.stdout.write("\r[%s%s]" % ('='*done, ' '*(50-done)))
sys.stdout.write(" "+str(int(float(done)/52*100))+"%")
sys.stdout.flush()
if size != 0:
done = int(50 * dl_size / size)
sys.stdout.write("\r[%s%s]" % ('='*done, ' '*(50-done)))
sys.stdout.write(" "+str(int(float(done)/52*100))+"%")
sys.stdout.flush()
else:
break
contenttype = False
if 'pdf' in r.headers['content-type']:
contenttype = 'pdf'
elif 'djvu' in r.headers['content-type']:
contenttype = 'djvu'
contenttype_req = None
try:
contenttype_req = dict(r.info())['content-type']
except KeyError:
try:
contenttype_req = dict(r.info())['Content-Type']
except KeyError:
continue
try:
if 'pdf' in contenttype_req:
contenttype = 'pdf'
elif 'djvu' in contenttype_req:
contenttype = 'djvu'
except KeyError:
pass
if r.status_code != 200 or contenttype is False:
if r.getcode() != 200 or contenttype is False:
continue
return dl, contenttype
except ValueError:
tools.warning("Invalid URL")
return False, None
except requests.exceptions.RequestException:
tools.warning("Unable to get "+url+" using proxy "+proxy+". It " +
"may not be available.")
except (URLError, socket.error):
if proxy != "":
proxy_txt = "using proxy "+proxy
else:
proxy_txt = "without using any proxy"
tools.warning("Unable to get "+url+" "+proxy_txt+". It " +
"may not be available at the moment.")
continue
return False, None
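The progress-bar arithmetic in the new download loop can be factored into a small helper. A sketch only: it uses the bar width (50) consistently for the percentage, where the diff above divides by 52:

```python
def progress_bar(dl_size, total_size, width=50):
    """Render a fixed-width bar like the one printed by the download loop."""
    done = int(width * dl_size / total_size)
    percent = int(100 * dl_size / total_size)
    return "[%s%s] %d%%" % ('=' * done, ' ' * (width - done), percent)
```

The caller still has to guard against `total_size == 0`, as the loop above does when no Content-Length header is available.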
@@ -91,7 +147,7 @@ def findISBN(src):
return False
while totext.poll() is None:
extractfull = ' '.join([i.strip() for i in totext.stdout.readlines()])
extractfull = ' '.join([i.decode(stdout_encoding).strip() for i in totext.stdout.readlines()])
extractISBN = isbn_re.search(extractfull.lower().replace('&#338;',
'-'))
if extractISBN:
@@ -117,7 +173,7 @@ def isbn2Bib(isbn):
try:
return isbnlib.registry.bibformatters['bibtex'](isbnlib.meta(isbn,
'default'))
except (isbnlib.ISBNLibException, isbnlib.ISBNToolsException, TypeError):
except (isbnlib.ISBNLibException, TypeError):
return ''
@@ -128,13 +184,16 @@ clean_doi_re = re.compile('^/')
clean_doi_fabse_re = re.compile('^10.1096')
clean_doi_jcb_re = re.compile('^10.1083')
clean_doi_len_re = re.compile(r'\d\.\d')
arXiv_re = re.compile(r'arXiv:\s*([\w\.\/\-]+)', re.IGNORECASE)
def findDOI(src):
"""Search for a valid DOI in src.
def findArticleID(src, only=["DOI", "arXiv"]):
"""Search for a valid article ID (DOI or ArXiv) in src.
Returns the DOI or False if not found or an error occurred.
Returns a tuple (type, first matching ID) or False if not found
or an error occurred.
From : http://en.dogeno.us/2010/02/release-a-python-script-for-organizing-scientific-papers-pyrenamepdf-py/
and https://github.com/minad/bibsync/blob/3fdf121016f6187a2fffc66a73cd33b45a20e55d/lib/bibsync/utils.rb
"""
if src.endswith(".pdf"):
totext = subprocess.Popen(["pdftotext", src, "-"],
@@ -145,33 +204,48 @@ def findDOI(src):
stdout=subprocess.PIPE,
stderr=subprocess.PIPE)
else:
return False
return (False, False)
extractfull = ''
extract_type = False
extractID = None
while totext.poll() is None:
extractfull += ' '.join([i.strip() for i in totext.stdout.readlines()])
extractDOI = doi_re.search(extractfull.lower().replace('&#338;', '-'))
if not extractDOI:
# PNAS fix
extractDOI = doi_pnas_re.search(extractfull.
lower().
replace('pnas', '/pnas'))
if not extractDOI:
# JSB fix
extractDOI = doi_jsb_re.search(extractfull.lower())
if extractDOI:
totext.terminate()
extractfull += ' '.join([i.decode(stdout_encoding).strip() for i in totext.stdout.readlines()])
# Try to extract DOI
if "DOI" in only:
extractlower = extractfull.lower().replace('digital object identifier', 'doi')
extractID = doi_re.search(extractlower.replace('&#338;', '-'))
if not extractID:
# PNAS fix
extractID = doi_pnas_re.search(extractlower.replace('pnas', '/pnas'))
if not extractID:
# JSB fix
extractID = doi_jsb_re.search(extractlower)
if extractID:
extract_type = "DOI"
totext.terminate()
# Try to extract arXiv
if "arXiv" in only:
tmp_extractID = arXiv_re.search(extractfull)
if tmp_extractID:
if not extractID or extractID.start(0) > tmp_extractID.start(1):
# Only use arXiv id if it is before the DOI in the pdf
extractID = tmp_extractID
extract_type = "arXiv"
totext.terminate()
if extract_type is not False:
break
err = totext.communicate()[1]
if totext.returncode > 0:
# Error happened
tools.warning(err)
return False
return (False, False)
cleanDOI = False
if extractDOI:
cleanDOI = extractDOI.group(0).replace(':', '').replace(' ', '')
if extractID is not None and extract_type == "DOI":
# If DOI extracted, clean it and return it
cleanDOI = False
cleanDOI = extractID.group(0).replace(':', '').replace(' ', '')
if clean_doi_re.search(cleanDOI):
cleanDOI = cleanDOI[1:]
# FABSE J fix
@@ -191,7 +265,11 @@ def findDOI(src):
if cleanDOItemp[i].isalpha() and digitStart:
break
cleanDOI = cleanDOI[0:(8+i)]
return cleanDOI
return ("DOI", cleanDOI)
elif extractID is not None and extract_type == "arXiv":
# If arXiv id is extracted, return it
return ("arXiv", extractID.group(1))
return (False, False)
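The new `findArticleID` logic (normalize "Digital Object Identifier" to "DOI", then prefer an arXiv id only when it occurs before the DOI in the text) can be sketched in isolation. The DOI pattern below is a generic stand-in, not the project's actual `doi_re`; the arXiv pattern is the one from the diff:

```python
import re

# Generic DOI pattern for illustration only -- not the project's doi_re.
doi_re = re.compile(r'\b10\.\d{4,9}/[-._;()/:a-z0-9]+')
# arXiv pattern as defined in the diff above.
arxiv_re = re.compile(r'arXiv:\s*([\w\.\/\-]+)', re.IGNORECASE)

def find_article_id(text, only=("DOI", "arXiv")):
    """Return (type, id) for the first matching article ID, else (False, False)."""
    doi_match = arxiv_match = None
    if "DOI" in only:
        # Normalize the spelled-out form so "Digital Object Identifier: 10.x/y"
        # is found just like "DOI: 10.x/y".
        lowered = text.lower().replace('digital object identifier', 'doi')
        doi_match = doi_re.search(lowered)
    if "arXiv" in only:
        arxiv_match = arxiv_re.search(text)
    # Only use the arXiv id if it appears before the DOI in the text.
    if arxiv_match and (not doi_match
                        or arxiv_match.start(1) < doi_match.start(0)):
        return ("arXiv", arxiv_match.group(1))
    if doi_match:
        return ("DOI", doi_match.group(0))
    return (False, False)
```

The position comparison is what resolves the conflict from issue #19: a paper whose references carry DOI links no longer shadows its own arXiv id.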
def doi2Bib(doi):
@@ -201,58 +279,29 @@ def doi2Bib(doi):
"""
url = "http://dx.doi.org/" + doi
headers = {"accept": "application/x-bibtex"}
req = Request(url, headers=headers)
try:
r = requests.get(url, headers=headers)
r = urlopen(req)
if r.headers['content-type'] == 'application/x-bibtex':
return r.text
else:
return ''
except requests.exceptions.ConnectionError:
try:
if dict(r.info())['content-type'] == 'application/x-bibtex':
return r.read().decode('utf-8')
else:
return ''
except KeyError:
try:
if dict(r.info())['Content-Type'] == 'application/x-bibtex':
return r.read().decode('utf-8')
else:
return ''
except KeyError:
return ''
except:
tools.warning('Unable to contact remote server to get the bibtex ' +
'entry for doi '+doi)
return ''
arXiv_re = re.compile(r'arXiv:\s*([\w\.\/\-]+)', re.IGNORECASE)
def findArXivId(src):
"""Searches for a valid arXiv id in src.
Returns the arXiv id or False if not found or an error occurred.
From : https://github.com/minad/bibsync/blob/3fdf121016f6187a2fffc66a73cd33b45a20e55d/lib/bibsync/utils.rb
"""
if src.endswith(".pdf"):
totext = subprocess.Popen(["pdftotext", src, "-"],
stdout=subprocess.PIPE,
stderr=subprocess.PIPE)
elif src.endswith(".djvu"):
totext = subprocess.Popen(["djvutxt", src],
stdout=subprocess.PIPE,
stderr=subprocess.PIPE)
else:
return False
extractfull = ''
while totext.poll() is None:
extractfull += ' '.join([i.strip() for i in totext.stdout.readlines()])
extractID = arXiv_re.search(extractfull)
if extractID:
totext.terminate()
break
err = totext.communicate()[1]
if totext.returncode > 0:
# Error happened
tools.warning(err)
return False
elif extractID is not None:
return extractID.group(1)
else:
return False
def arXiv2Bib(arxiv):
"""Returns bibTeX string of metadata for a given arXiv id
@@ -263,9 +312,9 @@ def arXiv2Bib(arxiv):
if isinstance(bib, arxiv_metadata.ReferenceErrorInfo):
continue
else:
fetched_bibtex = BibTexParser(bib.bibtex())
fetched_bibtex = fetched_bibtex.get_entry_dict()
fetched_bibtex = fetched_bibtex[fetched_bibtex.keys()[0]]
fetched_bibtex = bibtexparser.loads(bib.bibtex())
fetched_bibtex = fetched_bibtex.entries_dict
fetched_bibtex = fetched_bibtex[list(fetched_bibtex.keys())[0]]
try:
del(fetched_bibtex['file'])
except KeyError:
@@ -295,7 +344,7 @@ def findHALId(src):
return False
while totext.poll() is None:
extractfull = ' '.join([i.strip() for i in totext.stdout.readlines()])
extractfull = ' '.join([i.decode(stdout_encoding).strip() for i in totext.stdout.readlines()])
extractID = HAL_re.search(extractfull)
if extractID:
totext.terminate()


@@ -168,7 +168,7 @@ class SearchQueryParser:
return self._methods[argument.getName()](argument)
def Parse(self, query):
#print self._parser(query)[0]
#print(self._parser(query)[0])
return self.evaluate(self._parser(query)[0])
def GetWord(self, word):
@@ -278,21 +278,21 @@ class ParserTest(SearchQueryParser):
def Test(self):
all_ok = True
for item in self.tests.keys():
print item
print(item)
r = self.Parse(item)
e = self.tests[item]
print 'Result: %s' % r
print 'Expect: %s' % e
print('Result: %s' % r)
print('Expect: %s' % e)
if e == r:
print 'Test OK'
print('Test OK')
else:
all_ok = False
print '>>>>>>>>>>>>>>>>>>>>>>Test ERROR<<<<<<<<<<<<<<<<<<<<<'
print ''
print('>>>>>>>>>>>>>>>>>>>>>>Test ERROR<<<<<<<<<<<<<<<<<<<<<')
print('')
return all_ok
if __name__=='__main__':
if ParserTest().Test():
print 'All tests OK'
print('All tests OK')
else:
print 'One or more tests FAILED'
print('One or more tests FAILED')

libbmc/tearpages.py (new file, 57 lines)

@@ -0,0 +1,57 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Author: Francois Boulogne
import shutil
import tempfile
from PyPDF2 import PdfFileWriter, PdfFileReader
from PyPDF2.utils import PdfReadError
def _fixPdf(pdfFile, destination):
"""
Fix malformed pdf files when data are present after '%%EOF'
:param pdfFile: PDF filepath
:param destination: destination
"""
tmp = tempfile.NamedTemporaryFile()
output = open(tmp.name, 'wb')
with open(pdfFile, "rb") as fh:
for line in fh:
output.write(line)
if b'%%EOF' in line:
break
output.close()
shutil.copy(tmp.name, destination)
def tearpage(filename, startpage=1):
"""
Copy filename to a tempfile, write pages startpage..N to filename.
:param filename: PDF filepath
:param startpage: page number for the new first page
"""
# Copy the pdf to a tmp file
tmp = tempfile.NamedTemporaryFile()
shutil.copy(filename, tmp.name)
# Read the copied pdf
try:
input_file = PdfFileReader(open(tmp.name, 'rb'))
except PdfReadError:
_fixPdf(filename, tmp.name)
input_file = PdfFileReader(open(tmp.name, 'rb'))
# Seek for the number of pages
num_pages = input_file.getNumPages()
# Write pages excepted the first one
output_file = PdfFileWriter()
for i in range(startpage, num_pages):
output_file.addPage(input_file.getPage(i))
tmp.close()
outputStream = open(filename, "wb")
output_file.write(outputStream)


@@ -2,11 +2,11 @@
doi = {10.1103/physreva.88.043630},
url = {http://dx.doi.org/10.1103/physreva.88.043630},
year = 2013,
month = {Oct},
publisher = {American Physical Society (APS)},
month = {oct},
publisher = {American Physical Society ({APS})},
volume = {88},
number = {4},
author = {Yan-Hua Hou and Lev P. Pitaevskii and Sandro Stringari},
title = {First and second sound in a highly elongated Fermi gas at unitarity},
journal = {Physical Review A}
journal = {Phys. Rev. A}
}


@@ -1,6 +1,6 @@
@book{9780198507192,
title = {Bose-Einstein Condensation},
author = {Lev Pitaevskii and Sandro Stringari},
author = {Lev. P. Pitaevskii and S. Stringari},
isbn = {9780198507192},
year = {2004},
publisher = {Clarendon Press}

Binary file not shown.


@@ -8,9 +8,10 @@
# <del>beer</del> soda in return.
# Phyks
# -----------------------------------------------------------------------------
from __future__ import unicode_literals
import unittest
from backend import *
from bibtexparser.bparser import BibTexParser
from libbmc.backend import *
import bibtexparser
import os
import shutil
import tempfile
@@ -21,7 +22,7 @@ class TestFetcher(unittest.TestCase):
config.set("folder", tempfile.mkdtemp()+"/")
self.bibtex_article_string = """
@article{1303.3130v1,
abstract={We study the role of the dipolar interaction, correctly accounting for the
abstract={We study the role of the dipolar interaction, correctly accounting for the
Dipolar-Induced Resonance (DIR), in a quasi-one-dimensional system of ultracold
bosons. We first show how the DIR affects the lowest-energy states of two
particles in a harmonic trap. Then, we consider a deep optical lattice loaded
@@ -30,20 +31,20 @@ atom-dimer extended Bose-Hubbard model. We analyze the impact of the DIR on the
phase diagram at T=0 by exact diagonalization of a small-sized system. In
particular, the resonance strongly modifies the range of parameters for which a
mass density wave should occur.},
archiveprefix={arXiv},
author={N. Bartolo and D. J. Papoular and L. Barbiero and C. Menotti and A. Recati},
eprint={1303.3130v1},
file={%sN_Bartolo_A_Recati-j-2013.pdf},
link={http://arxiv.org/abs/1303.3130v1},
month={Mar},
primaryclass={cond-mat.quant-gas},
tag={},
title={Dipolar-Induced Resonance for Ultracold Bosons in a Quasi-1D Optical
archiveprefix={arXiv},
author={N. Bartolo and D. J. Papoular and L. Barbiero and C. Menotti and A. Recati},
eprint={1303.3130v1},
file={%sN_Bartolo_A_Recati-j-2013.pdf},
link={http://arxiv.org/abs/1303.3130v1},
month={Mar},
primaryclass={cond-mat.quant-gas},
tag={},
title={Dipolar-Induced Resonance for Ultracold Bosons in a Quasi-1D Optical
Lattice},
year={2013},
year={2013},
}""" % config.get("folder")
self.bibtex_article = BibTexParser(self.bibtex_article_string).get_entry_dict()
self.bibtex_article = self.bibtex_article[self.bibtex_article.keys()[0]]
self.bibtex_article = bibtexparser.loads(self.bibtex_article_string).entries_dict
self.bibtex_article = self.bibtex_article[list(self.bibtex_article.keys())[0]]
self.bibtex_book_string = """
@book{9780521846516,
@@ -54,8 +55,8 @@ Lattice},
year={2008},
}
"""
self.bibtex_book = BibTexParser(self.bibtex_book_string).get_entry_dict()
self.bibtex_book = self.bibtex_book[self.bibtex_book.keys()[0]]
self.bibtex_book = bibtexparser.loads(self.bibtex_book_string).entries_dict
self.bibtex_book = self.bibtex_book[list(self.bibtex_book.keys())[0]]
def test_getNewName_article(self):
self.assertEqual(getNewName("test.pdf", self.bibtex_article),
@@ -81,7 +82,7 @@ Lattice},
def test_bibtexEdit(self):
bibtexAppend(self.bibtex_article)
bibtexEdit(self.bibtex_article['id'], {'id': 'bidule'})
bibtexEdit(self.bibtex_article['ID'], {'ID': 'bidule'})
with open(config.get("folder")+'index.bib', 'r') as fh:
self.assertEqual(fh.read(),
'@article{bidule,\n\tabstract={We study the role of the dipolar interaction, correctly accounting for the\nDipolar-Induced Resonance (DIR), in a quasi-one-dimensional system of ultracold\nbosons. We first show how the DIR affects the lowest-energy states of two\nparticles in a harmonic trap. Then, we consider a deep optical lattice loaded\nwith ultracold dipolar bosons. We describe this many-body system using an\natom-dimer extended Bose-Hubbard model. We analyze the impact of the DIR on the\nphase diagram at T=0 by exact diagonalization of a small-sized system. In\nparticular, the resonance strongly modifies the range of parameters for which a\nmass density wave should occur.},\n\tarchiveprefix={arXiv},\n\tauthor={N. Bartolo and D. J. Papoular and L. Barbiero and C. Menotti and A. Recati},\n\teprint={1303.3130v1},\n\tfile={'+config.get("folder")+'N_Bartolo_A_Recati-j-2013.pdf},\n\tlink={http://arxiv.org/abs/1303.3130v1},\n\tmonth={Mar},\n\tprimaryclass={cond-mat.quant-gas},\n\ttag={},\n\ttitle={Dipolar-Induced Resonance for Ultracold Bosons in a Quasi-1D Optical\nLattice},\n\tyear={2013},\n}\n\n\n')
@@ -97,9 +98,9 @@ Lattice},
self.bibtex_article['file'] = config.get("folder")+'test.pdf'
bibtexAppend(self.bibtex_article)
open(config.get("folder")+'test.pdf', 'w').close()
deleteId(self.bibtex_article['id'])
deleteId(self.bibtex_article['ID'])
with open(config.get("folder")+'index.bib', 'r') as fh:
self.assertEquals(fh.read().strip(), "")
self.assertEqual(fh.read().strip(), "")
self.assertFalse(os.path.isfile(config.get("folder")+'test.pdf'))
def test_deleteFile(self):
@@ -108,7 +109,7 @@ Lattice},
open(config.get("folder")+'test.pdf', 'w').close()
deleteFile(self.bibtex_article['file'])
with open(config.get("folder")+'index.bib', 'r') as fh:
self.assertEquals(fh.read().strip(), "")
self.assertEqual(fh.read().strip(), "")
self.assertFalse(os.path.isfile(config.get("folder")+'test.pdf'))
def test_diffFilesIndex(self):
@@ -117,12 +118,12 @@ Lattice},
def test_getBibtex(self):
bibtexAppend(self.bibtex_article)
got = getBibtex(self.bibtex_article['id'])
got = getBibtex(self.bibtex_article['ID'])
self.assertEqual(got, self.bibtex_article)
def test_getBibtex_id(self):
bibtexAppend(self.bibtex_article)
got = getBibtex(self.bibtex_article['id'], file_id='id')
got = getBibtex(self.bibtex_article['ID'], file_id='id')
self.assertEqual(got, self.bibtex_article)
def test_getBibtex_file(self):
@@ -133,16 +134,16 @@ Lattice},
self.assertEqual(got, self.bibtex_article)
def test_getBibtex_clean(self):
config.set("ignore_fields", ['id', 'abstract'])
config.set("ignore_fields", ['ID', 'abstract'])
bibtexAppend(self.bibtex_article)
got = getBibtex(self.bibtex_article['id'], clean=True)
got = getBibtex(self.bibtex_article['ID'], clean=True)
for i in config.get("ignore_fields"):
self.assertNotIn(i, got)
def test_getEntries(self):
bibtexAppend(self.bibtex_article)
self.assertEqual(getEntries(),
[self.bibtex_article['id']])
[self.bibtex_article['ID']])
def test_updateArxiv(self):
# TODO


@@ -8,12 +8,13 @@
# <del>beer</del> soda in return.
# Phyks
# -----------------------------------------------------------------------------
from __future__ import unicode_literals
import unittest
from config import Config
import json
import os
import tempfile
import shutil
from libbmc.config import Config
class TestConfig(unittest.TestCase):


@@ -10,16 +10,16 @@
# -----------------------------------------------------------------------------
import unittest
from fetcher import *
from libbmc.fetcher import *
class TestFetcher(unittest.TestCase):
def setUp(self):
with open("tests/src/doi.bib", 'r') as fh:
with open("libbmc/tests/src/doi.bib", 'r') as fh:
self.doi_bib = fh.read()
with open("tests/src/arxiv.bib", 'r') as fh:
with open("libbmc/tests/src/arxiv.bib", 'r') as fh:
self.arxiv_bib = fh.read()
with open("tests/src/isbn.bib", 'r') as fh:
with open("libbmc/tests/src/isbn.bib", 'r') as fh:
self.isbn_bib = fh.read()
def test_download(self):
@@ -35,13 +35,13 @@ class TestFetcher(unittest.TestCase):
def test_findISBN_DJVU(self):
# ISBN is incomplete in this test because my djvu file is bad
self.assertEqual(findISBN("tests/src/test_book.djvu"), '978295391873')
self.assertEqual(findISBN("libbmc/tests/src/test_book.djvu"), '978295391873')
def test_findISBN_PDF(self):
self.assertEqual(findISBN("tests/src/test_book.pdf"), '9782953918731')
self.assertEqual(findISBN("libbmc/tests/src/test_book.pdf"), '9782953918731')
def test_findISBN_False(self):
self.assertFalse(findISBN("tests/src/test.pdf"))
self.assertFalse(findISBN("libbmc/tests/src/test.pdf"))
def test_isbn2Bib(self):
self.assertEqual(isbn2Bib('0198507194'), self.isbn_bib)
@@ -50,16 +50,22 @@ class TestFetcher(unittest.TestCase):
self.assertEqual(isbn2Bib('foo'), '')
def test_findDOI_PDF(self):
self.assertEqual(findDOI("tests/src/test.pdf"),
"10.1103/physrevlett.112.253201")
self.assertEqual(findArticleID("libbmc/tests/src/test.pdf"),
("DOI", "10.1103/physrevlett.112.253201"))
def test_findDOI_DJVU(self):
def test_findOnlyDOI(self):
self.assertEqual(findArticleID("libbmc/tests/src/test.pdf",
only=["DOI"]),
("DOI", "10.1103/physrevlett.112.253201"))
def test_findDOID_DJVU(self):
# DOI is incomplete in this test because my djvu file is bad
self.assertEqual(findDOI("tests/src/test.djvu"),
"10.1103/physrevlett.112")
self.assertEqual(findArticleID("libbmc/tests/src/test.djvu"),
("DOI", "10.1103/physrevlett.112"))
def test_findDOI_False(self):
self.assertFalse(findDOI("tests/src/test_arxiv_multi.pdf"))
self.assertFalse(findArticleID("libbmc/tests/src/test_arxiv_multi.pdf",
only=["DOI"])[0])
def test_doi2Bib(self):
self.assertEqual(doi2Bib('10.1103/physreva.88.043630'), self.doi_bib)
@@ -68,8 +74,18 @@ class TestFetcher(unittest.TestCase):
self.assertEqual(doi2Bib('blabla'), '')
def test_findArXivId(self):
self.assertEqual(findArXivId("tests/src/test_arxiv_multi.pdf"),
'1303.3130v1')
self.assertEqual(findArticleID("libbmc/tests/src/test_arxiv_multi.pdf"),
("arXiv", '1303.3130v1'))
def test_findOnlyArXivId(self):
self.assertEqual(findArticleID("libbmc/tests/src/test_arxiv_multi.pdf",
only=["arXiv"]),
("arXiv", '1303.3130v1'))
def test_findArticleID(self):
# cf https://github.com/Phyks/BMC/issues/19
self.assertEqual(findArticleID("libbmc/tests/src/test_arxiv_doi_conflict.pdf"),
("arXiv", '1107.4487v1'))
def test_arXiv2Bib(self):
self.assertEqual(arXiv2Bib('1303.3130v1'), self.arxiv_bib)
@@ -78,7 +94,7 @@ class TestFetcher(unittest.TestCase):
self.assertEqual(arXiv2Bib('blabla'), '')
def test_findHALId(self):
self.assertTupleEqual(findHALId("tests/src/test_hal.pdf"),
self.assertTupleEqual(findHALId("libbmc/tests/src/test_hal.pdf"),
('hal-00750893', '3'))
if __name__ == '__main__':


@@ -8,9 +8,10 @@
# <del>beer</del> soda in return.
# Phyks
# -----------------------------------------------------------------------------
from __future__ import unicode_literals
import unittest
from tools import *
from libbmc.tools import *
class TestTools(unittest.TestCase):
@@ -18,7 +19,7 @@ class TestTools(unittest.TestCase):
self.assertEqual(slugify(u"à&é_truc.pdf"), "ae_trucpdf")
def test_parsed2Bibtex(self):
parsed = {'type': 'article', 'id': 'test', 'field1': 'test1',
parsed = {'ENTRYTYPE': 'article', 'ID': 'test', 'field1': 'test1',
'field2': 'test2'}
expected = ('@article{test,\n\tfield1={test1},\n' +
'\tfield2={test2},\n}\n\n')


@@ -10,11 +10,17 @@
# -----------------------------------------------------------------------------
from __future__ import print_function
from __future__ import print_function, unicode_literals
import os
import re
import sys
from termios import tcflush, TCIOFLUSH
if os.name == "posix":
from termios import tcflush, TCIOFLUSH
try:
input = raw_input
except NameError:
pass
_slugify_strip_re = re.compile(r'[^\w\s-]')
_slugify_hyphenate_re = re.compile(r'[\s]+')
@@ -27,18 +33,22 @@ def slugify(value):
From Django's "django/template/defaultfilters.py".
"""
import unicodedata
if not isinstance(value, unicode):
value = unicode(value)
value = unicodedata.normalize('NFKD', value).encode('ascii', 'ignore')
value = unicode(_slugify_strip_re.sub('', value).strip())
try:
unicode_type = unicode
except NameError:
unicode_type = str
if not isinstance(value, unicode_type):
value = unicode_type(value)
value = unicodedata.normalize('NFKD', value).encode('ascii', 'ignore').decode('ascii')
value = unicode_type(_slugify_strip_re.sub('', value).strip())
return _slugify_hyphenate_re.sub('_', value)
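On Python 3 (where `unicode_type` resolves to `str`), the rewritten `slugify` reduces to the following self-contained sketch, using the same regexes and NFKD folding as above:

```python
import re
import unicodedata

_slugify_strip_re = re.compile(r'[^\w\s-]')
_slugify_hyphenate_re = re.compile(r'[\s]+')

def slugify(value):
    """ASCII-fold accents, drop punctuation, replace whitespace runs with '_'."""
    value = str(value)
    # NFKD splits accented characters into base + combining mark,
    # so the ascii/ignore round-trip keeps the base letter.
    value = unicodedata.normalize('NFKD', value).encode('ascii',
                                                        'ignore').decode('ascii')
    value = _slugify_strip_re.sub('', value).strip()
    return _slugify_hyphenate_re.sub('_', value)
```

The expected value in the first assertion below is the one from `test_tools.py` above.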
def parsed2Bibtex(parsed):
"""Convert a single bibtex entry dict to bibtex string"""
bibtex = '@'+parsed['type']+'{'+parsed['id']+",\n"
bibtex = '@'+parsed['ENTRYTYPE']+'{'+parsed['ID']+",\n"
for field in [i for i in sorted(parsed) if i not in ['type', 'id']]:
for field in [i for i in sorted(parsed) if i not in ['ENTRYTYPE', 'ID']]:
bibtex += "\t"+field+"={"+parsed[field]+"},\n"
bibtex += "}\n\n"
return bibtex
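After the `ENTRYTYPE`/`ID` rename, `parsed2Bibtex` can be checked against the expected string from the test suite above:

```python
def parsed2Bibtex(parsed):
    """Convert a single parsed bibtex entry dict to a bibtex string."""
    bibtex = '@' + parsed['ENTRYTYPE'] + '{' + parsed['ID'] + ",\n"
    # Emit every field except the type/key metadata, in sorted order.
    for field in [i for i in sorted(parsed) if i not in ['ENTRYTYPE', 'ID']]:
        bibtex += "\t" + field + "={" + parsed[field] + "},\n"
    bibtex += "}\n\n"
    return bibtex
```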
@@ -51,21 +61,21 @@ def getExtension(filename):
def replaceAll(text, dic):
"""Replace all the dic keys by the associated item in text"""
for i, j in dic.iteritems():
for i, j in dic.items():
text = text.replace(i, j)
return text
def rawInput(string):
"""Flush stdin and then prompt the user for something"""
tcflush(sys.stdin, TCIOFLUSH)
return raw_input(string).decode('utf-8')
if os.name == "posix":
tcflush(sys.stdin, TCIOFLUSH)
return input(string)
def warning(*objs):
"""Write warnings to stderr"""
printed = [i.encode('utf-8') for i in objs]
print("WARNING: ", *printed, file=sys.stderr)
print("WARNING: ", *objs, file=sys.stderr)
def listDir(path):

setup.py (new file, 15 lines)

@@ -0,0 +1,15 @@
#!/usr/bin/env python
from distutils.core import setup
setup(
name = 'BMC',
version = "0.3dev",
url = "https://github.com/Phyks/BMC",
author = "",
license = "no-alcohol beer-ware license",
author_email = "",
description = "simple script to download and store your articles",
packages = ['libbmc'],
scripts = ['bmc.py'],
)