Update doc and fix relationships
* Update README.md documentation. * Fix "cite" relationship import. Papers created when parsing references were not given a "cite" relationship. TODO: The latest fix creates too much recursion when adding a given paper. Adding a paper now basically means "crawling arXiv".
This commit is contained in:
parent
206369e0bb
commit
ca1b24e66a
241
README.md
241
README.md
@ -3,7 +3,41 @@ Metadata for arXiv
|
||||
|
||||
The goal of this repository is to provide a minimal API to put metadata on arXiv papers.
|
||||
|
||||
TODO: Better description + API description.
|
||||
## Introduction
|
||||
|
||||
Most of the published scientific papers are availabe online, as preprints. For
|
||||
physics and computer science, most of them are available on the
|
||||
[arXiv](http://arxiv.org/) repository. Published paper get a unique (global)
|
||||
identifier, a [DOI](https://en.wikipedia.org/wiki/Digital_object_identifier).
|
||||
Preprint papers released on arXiv get a unique
|
||||
[identifier](https://arxiv.org/help/arxiv_identifier). Correspondance between
|
||||
these two identifiers can be made quite easily once the preprint is published,
|
||||
as some publishers pushes back the DOIs to arXiv.
|
||||
|
||||
Then, all these articles can be easily identified and tracked. However, very
|
||||
small use of this is done, and especially there is no way to post metadata
|
||||
between articles. For example, getting a (usable) list of articles referencing
|
||||
a given article, or referenced by it, is very difficult (and a textual
|
||||
bibliography is not a usable list of articles, as it is truly difficult to
|
||||
parse).
|
||||
|
||||
This basic Python code offers a way to add some metadata between articles. One
|
||||
can import articles in it. It automatically tries to fetch referenced papers
|
||||
and add the corresponding relationships between these papers and the added
|
||||
paper. Relationships are reversible which means one can easily get the papers
|
||||
citing a given paper.
|
||||
|
||||
It offers an API to add extra metadata. One could for instance imagine adding
|
||||
others relations between papers to say they are similar, extra possible
|
||||
reference, or so on.
|
||||
|
||||
One could even imagine extending this further to tag papers, just as arXiv do
|
||||
(in some sort) with their "categories" (such as
|
||||
[cond-mat](http://arxiv.org/archive/cond-mat)) so that researchers could follow
|
||||
tags relative to their area of research, and get a narrower and better targeted
|
||||
list of papers everyday. Plus everyone could tag articles in a collaborative
|
||||
way, so that some papers which might be of interest for a field, but were not
|
||||
tagged as such, would reach it anyway.
|
||||
|
||||
|
||||
## Installation
|
||||
@ -23,11 +57,214 @@ For building `opendetex` (which is a necessary dependency), you will need
|
||||
|
||||
You can test it easily using the Bottle built-in webserver. This is the default configuration.
|
||||
|
||||
To start the app, just run `python3 ./main.py` and head to http://localhost:8080.
|
||||
To start the app, just run `python3 ./main.py` and head to [http://localhost:8080](http://localhost:8080).
|
||||
|
||||
|
||||
You should not use this server in production, and should edit `main.py` accordingly.
|
||||
|
||||
## API
|
||||
|
||||
### Index
|
||||
|
||||
```
|
||||
GET /
|
||||
```
|
||||
|
||||
```json
|
||||
{
|
||||
"papers": "/papers/?id={id}&doi={doi}&arxiv_id={arxiv_id}",
|
||||
}
|
||||
```
|
||||
|
||||
### Get papers
|
||||
|
||||
```
|
||||
GET /papers
|
||||
Accept: application/vnd.api+json
|
||||
```
|
||||
|
||||
One can filter further using `id={id}`, `doi={doi}` or `arxiv_id={arxiv_id}`
|
||||
query parameters.
|
||||
|
||||
```json
|
||||
{
|
||||
"data": [
|
||||
{
|
||||
"type": "papers",
|
||||
"id": 1,
|
||||
"attributes": {
|
||||
"doi": "10.1126/science.1252319",
|
||||
"arxiv_id": "1401.2910"
|
||||
},
|
||||
"links": {
|
||||
"self": "/papers/1"
|
||||
},
|
||||
"relationships": {
|
||||
"cite": {
|
||||
"links": {
|
||||
"related": "/papers/1/relationships/cite"
|
||||
}
|
||||
},
|
||||
…
|
||||
}
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
|
||||
### Get a paper
|
||||
|
||||
```
|
||||
GET /papers/1
|
||||
Accept: application/vnd.api+json
|
||||
```
|
||||
|
||||
```json
|
||||
{
|
||||
"data": {
|
||||
"type": "papers",
|
||||
"id": 1,
|
||||
"attributes": {
|
||||
"doi": "10.1126/science.1252319",
|
||||
"arxiv_id": "1401.2910"
|
||||
},
|
||||
"links": {
|
||||
"self": "/papers/1"
|
||||
},
|
||||
"relationships": {
|
||||
"cite": {
|
||||
"links": {
|
||||
"related": "/papers/1/relationships/cite"
|
||||
}
|
||||
},
|
||||
…
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
|
||||
### Get the relationships of a paper
|
||||
|
||||
```
|
||||
GET /papers/1/relationships/cite
|
||||
Accept: application/vnd.api+json
|
||||
```
|
||||
|
||||
```json
|
||||
{
|
||||
"links": {
|
||||
"self": "/papers/1/relationships/cite",
|
||||
"related": "/papers/1/cite"
|
||||
},
|
||||
"data": [
|
||||
{
|
||||
"type": "papers",
|
||||
"id": 2,
|
||||
},
|
||||
…
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
The previous relationship is to be understood as `paper 1 cites paper 2`.
|
||||
|
||||
Using `?reverse=1`, one can reverse the relationships (ie get results for
|
||||
papers that cites the paper identified by the id in the URL, in the previous
|
||||
case).
|
||||
|
||||
|
||||
### Post a paper
|
||||
|
||||
```
|
||||
POST /papers
|
||||
Content-Type: application/vnd.api+json
|
||||
Accept: application/vnd.api+json
|
||||
|
||||
{
|
||||
"data": {
|
||||
"doi": "10.1126/science.1252319",
|
||||
// OR
|
||||
"arxiv_id": "1401.2910"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
`arxiv_id` (respectively `doi`) is fetched automatically if available.
|
||||
|
||||
```json
|
||||
{
|
||||
"data": {
|
||||
{
|
||||
"type": "papers",
|
||||
"id": 1,
|
||||
"attributes": {
|
||||
"doi": "10.1126/science.1252319",
|
||||
"arxiv_id": "1401.2910"
|
||||
},
|
||||
"links": {
|
||||
"self": "/papers/1"
|
||||
},
|
||||
"relationships": {
|
||||
"cite": {
|
||||
"links": {
|
||||
"related": "/papers/1/relationships/cite"
|
||||
}
|
||||
},
|
||||
…
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
|
||||
### Create a relationship between two papers
|
||||
|
||||
```
|
||||
POST /papers/1/relationships/cite
|
||||
Content-Type: application/vnd.api+json
|
||||
Accept: application/vnd.api+json
|
||||
|
||||
{
|
||||
"data": [
|
||||
{ "type": "cite", "id": "2" },
|
||||
…
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
Response is empty HTTP 204.
|
||||
|
||||
|
||||
### Delete a paper and associated relationships
|
||||
|
||||
```
|
||||
DELETE /papers/1
|
||||
Accept: application/vnd.api+json
|
||||
```
|
||||
|
||||
Response is empty HTTP 204.
|
||||
|
||||
|
||||
### Delete a relationship between two papers
|
||||
|
||||
```
|
||||
DELETE /papers/1/relationships/cite
|
||||
Content-Type: application/vnd.api+json
|
||||
Accept: application/vnd.api+json
|
||||
|
||||
{
|
||||
"data": [
|
||||
{ "type": "cite", "id": "2" },
|
||||
…
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
Response is empty HTTP 204.
|
||||
|
||||
|
||||
## Associated library
|
||||
|
||||
|
@ -81,18 +81,7 @@ def create_paper(db):
|
||||
"data": paper.json_api_repr()
|
||||
}
|
||||
# Import "cite" relation
|
||||
if paper.arxiv_id is not None:
|
||||
# Get the cited DOIs
|
||||
cited_dois = arxiv.get_cited_dois(paper.arxiv_id)
|
||||
# Filter out the ones that were not matched
|
||||
cited_dois = [cited_dois[k]
|
||||
for k in cited_dois if cited_dois[k] is not None]
|
||||
for doi in cited_dois:
|
||||
right_paper = create_by_doi(doi, db)
|
||||
if right_paper is None:
|
||||
right_paper = (db.query(database.Paper).
|
||||
filter_by(doi=doi).first())
|
||||
update_relationship_backend(paper.id, right_paper.id, "cite", db)
|
||||
add_cite_relationship(paper, db)
|
||||
# Return 200 with the correct body
|
||||
headers = {"Location": "/papers/%d" % (paper.id,)}
|
||||
return tools.APIResponse(status=200,
|
||||
@ -157,10 +146,61 @@ def create_by_arxiv(arxiv, db):
|
||||
return paper
|
||||
|
||||
|
||||
def add_cite_relationship(paper, db):
|
||||
"""
|
||||
Add the "cite" relationships between the provided paper and the papers
|
||||
referenced by it.
|
||||
|
||||
:param paper: The paper to fetch references from.
|
||||
:param db: A database session
|
||||
:returns: Nothing.
|
||||
"""
|
||||
# TODO: Known bug: too many levels of recursion!
|
||||
# If paper is on arXiv
|
||||
if paper.arxiv_id is not None:
|
||||
# Get the cited DOIs
|
||||
cited_dois = arxiv.get_cited_dois(paper.arxiv_id)
|
||||
# Filter out the ones that were not matched
|
||||
cited_dois = [cited_dois[k]
|
||||
for k in cited_dois if cited_dois[k] is not None]
|
||||
for doi in cited_dois:
|
||||
# Get the associated paper in the db
|
||||
right_paper = (db.query(database.Paper).
|
||||
filter_by(doi=doi).first())
|
||||
if right_paper is None:
|
||||
# If paper does not exist in db, add it
|
||||
right_paper = create_by_doi(doi, db)
|
||||
# Update cite relationship for this paper, recursively
|
||||
add_cite_relationship(right_paper, db)
|
||||
# Update the relationships
|
||||
update_relationship_backend(paper.id, right_paper.id, "cite", db)
|
||||
# If paper is not on arXiv, nothing to do
|
||||
else:
|
||||
return
|
||||
|
||||
|
||||
def update_relationships(id, name, db):
|
||||
"""
|
||||
Update the relationships associated to a given paper.
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
POST /papers/1/relationships/cite
|
||||
Content-Type: application/vnd.api+json
|
||||
Accept: application/vnd.api+json
|
||||
|
||||
{
|
||||
"data": [
|
||||
{ "type": "cite", "id": "2" },
|
||||
…
|
||||
]
|
||||
}
|
||||
|
||||
|
||||
.. code-block:: json
|
||||
|
||||
HTTP 204
|
||||
|
||||
:param id: The id of the paper to update relationships.
|
||||
:param name: The name of the relationship to update.
|
||||
:param db: A database session, passed by Bottle plugin.
|
||||
|
Loading…
Reference in New Issue
Block a user