2016-09-30 05:24:35 +02:00

CozyWeboob
==========

This is an attempt at using [Weboob](http://weboob.org/) as a
[Cozy](http://cozy.io/) [Konnector](https://github.com/cozy-labs/konnectors).
It wraps around Weboob, receiving a JSON description of the modules to fetch
on `stdin` and returning a JSON of the fetched results on `stdout`.

Although the primary goal is to wrap around Weboob to use it in Cozy, this
script might be of interest to anyone willing to wrap around Weboob and
communicate through JSON pipes.

## Usage

First, you need to have Weboob installed on your system.

## Cozyweboob script

Typical command-line usage for this script is:

```bash
cat konnectors.json | python -m cozyweboob.__main__
```

where `konnectors.json` is a valid JSON file defining konnectors to be used.
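
The same pipe can also be driven from another program. The following Python 3 sketch assumes only the contract described above (konnector descriptions as JSON on `stdin`, results as JSON on `stdout`); `run_konnectors` and its `command` parameter are illustrative names, not part of cozyweboob.

```python
import json
import subprocess

# Same invocation as the shell example above. Assumption: `python` on the PATH
# is the interpreter for which Weboob and cozyweboob are installed.
DEFAULT_COMMAND = ("python", "-m", "cozyweboob.__main__")


def run_konnectors(konnectors, command=DEFAULT_COMMAND):
    """Send konnector descriptions on stdin and decode the JSON from stdout.

    `konnectors` is the Python equivalent of the content of konnectors.json.
    Raises subprocess.CalledProcessError if the script exits with a non-zero
    status, and ValueError if its output is not valid JSON.
    """
    completed = subprocess.run(
        list(command),
        input=json.dumps(konnectors),  # JSON description written to stdin
        stdout=subprocess.PIPE,        # fetched results read from stdout
        universal_newlines=True,       # exchange text rather than bytes
        check=True,
    )
    return json.loads(completed.stdout)
```

With a `konnectors` list loaded from `konnectors.json`, `run_konnectors(konnectors)` should return the same map the shell pipeline prints. Keeping the command as a parameter makes it easy to substitute a stand-in script when testing the calling code.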

## Server script

Typical command-line usage for this script is:

```bash
./server.py
```

This script spawns a Bottle webserver, listening on `localhost:8080` (by
default).

It exposes the following routes:

* the `/fetch` route, which supports the `POST` method, with a valid JSON string
  defining the konnectors to be used as the request body. A typical example of
  sending it some content is:

  ```bash
  curl -X POST --data "$(cat konnectors.json)" "http://localhost:8080/fetch"
  ```

  where `konnectors.json` is a valid JSON file defining konnectors to be used.
  Downloaded files will be stored in a temporary directory, and their file URI
  will be passed back in the output JSON. If you do not have direct access
  to the filesystem, you can use the `/retrieve` route below to retrieve
  such downloaded files through the network.

* the `/list` route, which will provide you with a JSON dump of all the
  available modules, their descriptions and the configuration options you
  should provide them.

* the `/retrieve` route, which supports the `POST` method with a single `path`
  `POST` parameter, the path to the previously downloaded file to retrieve.
  Note that this route will not delete the temporary file whose content has
  been retrieved, so you should delete it manually.

* the `/clean` route, which will delete all temporary downloaded files. This
  route will return a JSON list of deleted folders.
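
The routes above can also be called from Python using only the standard library. This is a sketch assuming the default `localhost:8080` address; the helper names are hypothetical, and using `GET` for `/list` and `/clean` is an assumption (mirroring the `GET /list` and `GET /clean` commands of the conversation script), since no method is given for these two routes here.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Default address of the server, as described above.
BASE_URL = "http://localhost:8080"


def fetch_request(konnectors, base_url=BASE_URL):
    """Build a POST to /fetch with the konnectors JSON as the raw request body.

    Like `curl --data`, urllib sends a form-urlencoded Content-Type by default.
    """
    body = json.dumps(konnectors).encode("utf-8")
    return Request(base_url + "/fetch", data=body, method="POST")


def retrieve_request(path, base_url=BASE_URL):
    """Build a POST to /retrieve carrying a single form-encoded `path` parameter."""
    body = urlencode({"path": path}).encode("utf-8")
    return Request(base_url + "/retrieve", data=body, method="POST")


def list_request(base_url=BASE_URL):
    """Build a request for /list (assumption: a plain GET)."""
    return Request(base_url + "/list", method="GET")


def clean_request(base_url=BASE_URL):
    """Build a request for /clean (assumption: a plain GET)."""
    return Request(base_url + "/clean", method="GET")


def send(request):
    """Send a request to the running server and return the raw response body."""
    with urlopen(request) as response:
        return response.read()
```

For example, `json.loads(send(fetch_request(konnectors)))` would decode the output JSON, while `send(retrieve_request(path))` returns the raw bytes of a previously downloaded file. Building requests separately from sending them keeps the helpers usable without a running server.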

**IMPORTANT:** Note that this small webserver is **not** production ready and is
only here as a proof of concept, to be used in a controlled development
environment. The `/retrieve` route basically allows anyone to access any file
from your temporary directory, which is a real security concern in production.

Note: You can specify the host and port to listen on using the
`COZYWEBOOB_HOST` and `COZYWEBOOB_PORT` environment variables.
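
A client can derive the server address from the same two variables. A minimal sketch, assuming that unset variables mean the documented `localhost:8080` default; `server_environment` and `server_base_url` are hypothetical helpers.

```python
import os

# Defaults documented for the server when the variables are not set.
DEFAULT_HOST = "localhost"
DEFAULT_PORT = 8080


def server_environment(host, port, base=None):
    """Return a copy of `base` (default: os.environ) that selects host and port."""
    env = dict(os.environ if base is None else base)
    env["COZYWEBOOB_HOST"] = host
    env["COZYWEBOOB_PORT"] = str(port)  # environment values must be strings
    return env


def server_base_url(env=None):
    """Build the base URL a client should use for a given environment."""
    if env is None:
        env = os.environ
    host = env.get("COZYWEBOOB_HOST", DEFAULT_HOST)
    port = int(env.get("COZYWEBOOB_PORT", DEFAULT_PORT))
    return "http://{}:{}".format(host, port)
```

For instance, `env = server_environment("127.0.0.1", 9000)` can be passed as `subprocess.Popen(["./server.py"], env=env)`, and `server_base_url(env)` then gives the matching URL, `http://127.0.0.1:9000`.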

## Conversation script

There is another command-line script available if you would rather communicate
with it in a conversational manner, using `stdin` and `stdout` (typically to
integrate it with Node modules using
[Python-shell](https://github.com/extrabacon/python-shell)). To run it, use:

```bash
./stdin_conversation.py
```

Then, you can write to `stdin` and read the responses from `stdout`.
Available commands are:

* `GET /list` to list all available modules.
* `POST /fetch JSON_PARAMS`, where `JSON_PARAMS` is an input JSON for module
  parameters. Downloaded files will be stored in a temporary directory, and
  their file URI will be passed back in the output JSON.
* `GET /clean` to clean temporary downloaded files.
* `exit` to quit the script and end the conversation.

JSON responses are the same as the ones from the HTTP server script. It is
basically the same script without the HTTP encapsulation.

_Note_: To keep the script simple, it only supports single-line commands.
Hence, your `JSON_PARAMS` should be on the same single `stdin` line as the
`POST /fetch` part.
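
Besides Python-shell on the Node side, the same line-based protocol can be driven from Python. This sketch assumes that the script answers every command with exactly one JSON line on `stdout`, which is not stated above; `command_line` and `Conversation` are hypothetical names.

```python
import json
import subprocess


def command_line(verb, route=None, params=None):
    """Format one command as a single line for stdin_conversation.py.

    json.dumps without `indent` never emits a raw newline (newlines inside
    strings are escaped), so JSON_PARAMS stays on the command's line.
    """
    parts = [verb] if route is None else [verb, route]
    if params is not None:
        parts.append(json.dumps(params))
    return " ".join(parts) + "\n"


class Conversation(object):
    """Talk to the conversation script through its stdin and stdout pipes.

    Assumption: the script answers every command with exactly one JSON line.
    """

    def __init__(self, command=("./stdin_conversation.py",)):
        self.process = subprocess.Popen(
            list(command),
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            universal_newlines=True,  # exchange text lines rather than bytes
        )

    def ask(self, verb, route=None, params=None):
        """Send one command and decode the single line it gets as an answer."""
        self.process.stdin.write(command_line(verb, route, params))
        self.process.stdin.flush()  # the script only sees flushed lines
        return json.loads(self.process.stdout.readline())

    def close(self):
        """Send `exit`, close the pipes and return the script's exit status."""
        self.process.stdin.write(command_line("exit"))
        self.process.stdin.close()  # closing also flushes pending output
        status = self.process.wait()
        self.process.stdout.close()
        return status
```

`Conversation().ask("GET", "/list")` would then return the module list, and `close()` sends `exit` before waiting for the script to terminate.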

## Notes concerning all the available scripts

Using `COZYWEBOOB_ENV=debug`, you can enable debug features for all of these
scripts, which might be useful for development. These features are:

* Logging.
* If you pass a blank field in a JSON konnector description
  (typically `"password": ""`), the script will ask you for its value at
  runtime, using `getpass`.
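
The blank-field behavior can be pictured with the following sketch. It is not the scripts' actual code: it assumes that only the `parameters` map is inspected and that "blank" means the empty string. `fill_blank_parameters` is a hypothetical name, and the prompt function is injectable so the logic can be exercised without a terminal.

```python
import getpass
import os


def fill_blank_parameters(konnector, prompt=getpass.getpass, env=None):
    """Return a copy of `konnector` whose blank parameters have been prompted for.

    Only acts when COZYWEBOOB_ENV is "debug"; otherwise the copy is unchanged.
    Assumptions: only the `parameters` map is inspected, and "blank" means "".
    """
    if env is None:
        env = os.environ
    parameters = dict(konnector.get("parameters", {}))
    if env.get("COZYWEBOOB_ENV") == "debug":
        for key in sorted(parameters):
            if parameters[key] == "":
                # getpass does not echo the typed value to the terminal
                parameters[key] = prompt(
                    "{} for {}: ".format(key, konnector.get("id")))
    filled = dict(konnector)
    filled["parameters"] = parameters
    return filled
```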

## Input JSON file

The JSON file read on `stdin` should have a specific structure. A typical
example is given in `konnectors.json.sample`.

Basically, it consists of a list of maps. Each map corresponds to a given
Weboob module to run, with a given set of parameters (thus allowing the script
to run the same module multiple times with different configurations). Each
map should have at least the following three keys:

* `name` is the name of the Weboob module to run (same name as used in
  Weboob).
* `parameters` is a map of parameters to use for this particular module, as
  required by the associated Weboob backend.
* `id` should be a unique string of your choice, to uniquely identify this run
  of the specified module with the specified set of parameters.
* `actions` is an optional map of actions to perform. It should contain two
  keys, `fetch` and `download`. For each key, you can either pass `true` to
  completely handle the action, or a map associating capabilities with lists
  of contents to fetch (or, for `download`, IDs to download).

  Typically, you can pass `"fetch": { "CapDocument": ["bills"] }` to fetch only
  bills from the `CapDocument` capability. You can also pass
  `"download": { "CapDocument": ["someID"] }` to download a specific document,
  identified by its ID.

  If not provided, the default is to fetch only, and not to download anything.
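
A small validation pass can catch structural mistakes before the file is fed to any of the scripts. This lenient Python sketch only encodes the rules listed above (required keys, unique string `id`, optional `actions` made of `true` or capability-to-list maps) and does not insist on both `fetch` and `download` being present; the module name and parameters in the example are placeholders, not a real Weboob module.

```python
import json

REQUIRED_KEYS = ("name", "parameters", "id")
ACTION_KEYS = ("fetch", "download")


def check_konnectors(konnectors):
    """Raise ValueError if `konnectors` does not follow the structure above."""
    if not isinstance(konnectors, list):
        raise ValueError("the top-level value must be a list of maps")
    seen_ids = set()
    for konnector in konnectors:
        if not isinstance(konnector, dict):
            raise ValueError("each konnector must be a map")
        missing = [key for key in REQUIRED_KEYS if key not in konnector]
        if missing:
            raise ValueError("missing keys: " + ", ".join(missing))
        if not isinstance(konnector["parameters"], dict):
            raise ValueError("`parameters` must be a map")
        if not isinstance(konnector["id"], str):
            raise ValueError("`id` must be a string")
        if konnector["id"] in seen_ids:
            raise ValueError("duplicate id: {}".format(konnector["id"]))
        seen_ids.add(konnector["id"])
        actions = konnector.get("actions", {})
        if not isinstance(actions, dict):
            raise ValueError("`actions` must be a map")
        for action, value in actions.items():
            if action not in ACTION_KEYS:
                raise ValueError("unknown action: {}".format(action))
            # Either `true`, or a map from capability names to lists.
            is_map_of_lists = isinstance(value, dict) and all(
                isinstance(items, list) for items in value.values())
            if value is not True and not is_map_of_lists:
                raise ValueError("invalid value for action: {}".format(action))


# Placeholder values: "some_module" and its parameters are hypothetical.
konnectors = [
    {
        "id": "some_module-bills",
        "name": "some_module",
        "parameters": {"login": "me", "password": "secret"},
        "actions": {
            "fetch": {"CapDocument": ["bills"]},
            "download": {"CapDocument": ["someID"]},
        },
    },
]
check_konnectors(konnectors)
stdin_payload = json.dumps(konnectors)  # content suitable for konnectors.json
```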

## Output JSON file

The resulting JSON, on `stdout`, is a map associating the `id` fields, as
provided in the input JSON file, to a map of the data fetched by this module.

Each module map has a `cookies` entry containing the cookies used to fetch the
data, so that any program running afterwards can download documents.

**Important** note: Most of these websites have very short-lived sessions,
meaning that in most cases these `cookies` will be useless for extra downloads,
as the session will most likely have been destroyed on the server side.

The other entries in these maps depend on the module capabilities, as defined
by Weboob. Detailed information about these other entries can be found in the
`doc/capabilities` folder.
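
Since the entries other than `cookies` depend on the module capabilities, a consumer may want to set the cookies aside and handle the rest per capability. A minimal sketch, assuming nothing about the inner shape of `cookies` or of the other entries; `split_results` is a hypothetical name.

```python
import json


def split_results(output_json):
    """Map each konnector `id` to a (cookies, other_entries) pair.

    `output_json` is the text printed on stdout. The `cookies` value and the
    capability-dependent entries are returned as decoded, without assuming
    anything about their inner structure.
    """
    results = json.loads(output_json)
    split = {}
    for konnector_id, data in results.items():
        entries = dict(data)
        cookies = entries.pop("cookies", None)  # None if a module map lacks it
        split[konnector_id] = (cookies, entries)
    return split
```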

## Contributing

All contributions are welcome. Feel free to make a PR :)

Python code is currently Python 2, but should be Python 3 compatible, as Weboob
is moving towards Python 3. All Python code should be PEP8 compliant. I use
some extra rules, taken from PyLint.

## License

The content of this repository is licensed under an MIT license, unless
explicitly mentioned otherwise.

## Credits

* [Cozy](http://cozy.io/) and the cozy guys on #cozycloud @ freenode
* [Weboob](http://weboob.org/) and the weboob guys on #weboob @ freenode
* [Kresus](https://github.com/bnjbvr/kresus/) for giving the original idea and
  base code.