cozyweboob/README.md

174 lines
6.6 KiB
Markdown
Raw Permalink Blame History

This file contains invisible Unicode characters

This file contains invisible Unicode characters that are indistinguishable to humans but may be processed differently by a computer. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

CozyWeboob
==========
This is an attempt at using [Weboob](http://weboob.org/) as a
[Cozy](http://cozy.io/) [Konnector](https://github.com/cozy-labs/konnectors).
It wraps around Weboob, receiving a JSON description of the modules to fetch
on `stdin` and returning a JSON of the fetched results on `stdout`.
Although the primary goal is to wrap around Weboob to use it in Cozy, this
script might be of interest for anyone willing to wrap around Weboob and
communicate with JSON pipes.
## Usage
First, you need to have Weboob installed on your system.
## Cozyweboob script
Typical command-line usage for this script is:
```bash
cat konnectors.json | python -m cozyweboob.__main__
```
where `konnectors.json` is a valid JSON file defining konnectors to be used.
## Server script
Typical command-line usage for this script is:
```bash
./server.py
```
This script spawns a Bottle webserver, listening on `localhost:8080` (by
default).
It exposes a couple of routes:
* the `/fetch` route, which supports `POST` method to send a valid JSON string
defining konnectors to be used as the request body. Typical example to send
it some content is:
```bash
curl -X POST --data "$(cat konnectors.json)" "http://localhost:8080/"
```
where `konnectors.json` is a valid JSON file defining konnectors to be used.
Downloaded files will be stored in a temporary directory, and their file URI
will be passed back in the output JSON. If you do not have a direct access
to the filesystem, you can use the `/retrieve` endpoint below to retrieve
such downloaded files through the network.
* the `/list` route, which will provide you a JSON dump of all the available
modules, their descriptions and the configuration options you should provide
them.
* the `/retrieve` route, which supports `POST` method and a single `path` `POST`
parameter which is the path to the previously downloaded file to retrieve.
Note that this route will not delete the temporary file whose content has
been retrieved, and you should delete it manually.
* the `/clean` route, which will delete all temporary downloaded files. This
route will return a JSON list of deleted folders.
**IMPORTANT:** Note this small webserver is **not** production ready and only
here as a proof of concept and to be used in a controlled development
environment. The `/retrieve` route will basically provide anyone to access any
file from your temp directory, which is a real security concern in production.
Note: You can specify the host and port to listen on using the
`COZYWEBOOB_HOST` and `COZYWEBOOB_PORT` environment variables.
## Conversation script
There is another command-line script available if you would rather communicate
with it in a conversation manner, using `stdin` and `stdout` (typically to
integrate it with Node modules using
[Python-shell](https://github.com/extrabacon/python-shell)). To run it, use:
```bash
./stdin_conversation.py
```
Then, you can write on `stdin` and fetch the responses from `stdout`.
Available commands are:
* `GET /list` to list all available modules.
* `POST /fetch JSON_PARAMS` where `JSON_PARAMS` is an input JSON for module
parameters.
Downloaded files will be stored in a temporary directory, and their file URI
will be passed back in the output JSON.
* `GET /clean` to clean temporary downloaded files.
* `exit` to quit the script and end the conversation.
JSON responses are the same one as from the HTTP server script. It is
basically the same script without HTTP encapsulation.
_Note_: To simplify the script, note that it only supports single line
commands. Then, your `JSON_PARAMS` should be the same single `stdin` line as
the `GET /fetch` part.
## Notes concerning all the available scripts
Using `COZYWEBOOB_ENV=debug`, you can enable debug features for all of these
scripts, which might be useful for development. These features are:
* Logging
* If you pass a blank field in a JSON konnector description
(typically `password: ""`), the script will ask you its value at runtime,
using `getpass`.
## Input JSON file
The JSON file read on `stdin` should have a specific structure. A typical
example is given in `konnectors.json.sample`.
Basically, it consists of a list of maps. Each map corresponds to a given
Weboob module to run, with a given set of parameters (then allowing the script
to run multiple times the same module with different configurations). Each
map should have at the following three keys:
* `name` is the name of the Weboob module to run (same name as used in
Weboob).
* `parameters` is a map of parameters to use for this particular module, as
required by the associated Weboob backend.
* `id` should be a unique string of your choice, to uniquely identify this run
of the specified module with the specified set of parameters.
* `actions` is an optional list of actions to perform. It should contains two
keys, `fetch` and `download`. For each key, you can either pass `true` to
completely handle the actions, or a map of capabilities associated to list
of contents to fetch.
Typically, you can pass `"fetch": { "CapDocument": ["bills"]}` to fetch only
bills from the `CapDocuments` capability. You can also pass
`"download": { "CapDocument": ["someID"] }` to download a specific document,
identified by its ID.
If not provided, the default is to fetch only, and do not download anything.
## Output JSON file
The resulting JSON file, on `stdout` is a map associating the `id` fields as
provided in input JSON file to a map of fetched data by this module.
Each module map has a `cookies` entry containing the cookies used to fetch the
data, so that any program running afterwards can download documents.
**Important** note: Most of such websites have very short lived sessions,
meaning in most cases these `cookies` will be useless for extra download as
the session will most likely be destroyed on the server side.
The other entries in these maps depend on the module capabilities as defined
by Weboob. Detailed informations about these other entires can be found in the
`doc/capabilities` folder.
## Contributing
All contributions are welcome. Feel free to make a PR :)
Python code is currently Python 2, but should be Python 3 compatible as Weboob
is moving towards Python 3. All Python code should be PEP8 compliant. I use
some extra rules, taken from PyLint.
## License
The content of this repository is licensed under an MIT license, unless
explicitly mentionned otherwise.
## Credits
* [Cozy](http://cozy.io/) and the cozy guys on #cozycloud @ freenode
* [Weboob](http://weboob.org/) and the weboob guys on #weboob @ freenode
* [Kresus](https://github.com/bnjbvr/kresus/) for giving the original idea and
base code.