| File | Last commit | Date |
|---|---|---|
| translation-server@4d35648672 | translation server submodule: switch over to kanzure's github and the paperbot branch specifically | 2013-02-05 01:21:41 -08:00 |
| .gitignore | initial commit | 2013-01-07 22:27:46 -08:00 |
| .gitmodules | translation server submodule: switch over to kanzure's github and the paperbot branch specifically | 2013-02-05 01:21:41 -08:00 |
| papers.py | Config file, SOCKS support, multiple servers | 2013-05-11 16:10:48 +02:00 |
| params.py.example | Config file, SOCKS support, multiple servers | 2013-05-11 16:10:48 +02:00 |
| README.md | Config file, SOCKS support, multiple servers | 2013-05-11 16:10:48 +02:00 |
| requirements.txt | Config file, SOCKS support, multiple servers | 2013-05-11 16:10:48 +02:00 |

# paperbot

Paperbot is a command-line utility that fetches academic papers. When given a URL on stdin or as a CLI argument, it fetches the content and returns a public link on stdout. This seems to enhance the quality of discussion and make us less ignorant.
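
The input handling described above can be sketched roughly as follows. This is a minimal illustration, not the code in papers.py: the function name `collect_urls` is hypothetical, and the rule that CLI arguments take precedence over stdin is an assumption.

```python
def collect_urls(argv, stdin_lines):
    """Return the URLs to fetch.

    Assumption: CLI arguments win; stdin is only read (one URL per
    line) when no arguments were given.
    """
    if argv:
        return [arg.strip() for arg in argv if arg.strip()]
    # Skip blank lines so piped input with trailing newlines still works.
    return [line.strip() for line in stdin_lines if line.strip()]
```

A caller would typically pass `sys.argv[1:]` and `sys.stdin`, then print one public link per URL on stdout.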

Paperbot can easily be turned back into an IRC bot with irctk.

## deets

All content is scraped using zotero/translators. These are JavaScript scrapers that work on a large number of academic publisher sites and are actively maintained. Paperbot offloads links to zotero/translation-server, which runs the zotero scrapers headlessly in a gecko and xulrunner environment. The scrapers return metadata and a link to the pdf, and paperbot then fetches that pdf. When given a link straight to a pdf, paperbot is also happy to compulsively archive it.
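
As a rough sketch of the "metadata plus pdf link" step, the function below picks a pdf URL out of one scraped item. It assumes the item is a dict with an `attachments` list whose entries carry `url` and `mimeType` keys; that shape and the name `find_pdf_url` are assumptions for illustration, not taken from papers.py.

```python
def find_pdf_url(item):
    """Return the first pdf attachment URL in a scraped item, or None.

    Assumption: `item` looks like
    {"title": ..., "attachments": [{"url": ..., "mimeType": ...}, ...]}.
    """
    for attachment in item.get("attachments", []):
        url = attachment.get("url", "")
        mime = attachment.get("mimeType", "")
        # Accept either an explicit pdf MIME type or a .pdf file extension.
        if url and (mime == "application/pdf" or url.lower().endswith(".pdf")):
            return url
    return None


# Hypothetical item, shaped as assumed above.
example_item = {
    "title": "Example paper",
    "attachments": [
        {"title": "Snapshot", "mimeType": "text/html", "url": "http://example.org/a"},
        {"title": "Full Text PDF", "mimeType": "application/pdf", "url": "http://example.org/a.pdf"},
    ],
}
pdf_url = find_pdf_url(example_item)
```

Once a pdf URL is found, the remaining work is an ordinary HTTP download of that URL.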

Paperbot can try multiple instances of translation-server (configured to use different ways to access content) and different SOCKS proxies to retrieve the content.
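
The fallback behaviour can be sketched as a loop over server and proxy combinations. This is only an illustration under stated assumptions: the function `first_success`, the injected `fetch` callable, and the choice to try every server with every proxy in order are all hypothetical, and the real layout of params.py is not shown here.

```python
import itertools


def first_success(servers, proxies, fetch):
    """Try each (server, proxy) pair until one returns content.

    `fetch(server, proxy)` is a caller-supplied function that returns
    the content, returns None when nothing was found, or raises OSError
    on a connection failure. A proxy of None means a direct connection.
    """
    for server, proxy in itertools.product(servers, proxies):
        try:
            result = fetch(server, proxy)
        except OSError:
            # Connection-level failure: move on to the next combination.
            continue
        if result is not None:
            return result
    return None
```

Passing the fetch function in as an argument keeps the retry logic separate from the networking code, so the order of attempts can be checked without touching the network.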

## license

BSD. Original project is: https://github.com/kanzure/paperbot