
Getting started
===============


## Dependency on Weboob

**Important**: Flatisfy relies on [Weboob](http://weboob.org/) to fetch
housing posts from housing websites. You should therefore install the [`devel`
branch](https://git.weboob.org/weboob/devel/) and update it regularly,
especially if Flatisfy suddenly stops fetching housing posts.

Running `pip install -r requirements.txt` installs the latest
development version of [Weboob](https://git.weboob.org/weboob/devel/) and the
[Weboob modules](https://git.weboob.org/weboob/modules/), which should be the
best version available. You should update these packages regularly,
as they evolve quickly.

Weboob is made of two parts: a core and modules (the actual code that
fetches data from websites). Modules tend to break often and are therefore
updated often, so you should keep them up to date. This can be done by
installing the `weboob-modules` package listed in `requirements.txt` and using
the default configuration.

This is a safe default configuration. However, a better option is usually to
clone the [Weboob git repo](https://git.weboob.org/weboob/devel/) somewhere on
your disk, point the `modules_path` configuration option to
`path_to_weboob_git/modules` (see the configuration section below), and run
`git pull; python setup.py install` in the Weboob git repo often.


## TL;DR

An alternative method is available using Docker. See [2.docker.md](2.docker.md).

1. Clone the repository.
2. Install required Python modules: `pip install -r requirements.txt`.
3. Init a configuration file: `python -m flatisfy init-config > config.json`.
   Edit it according to your needs (see below).
4. Build the required data files:
   `python -m flatisfy build-data --config config.json`.
5. Use it to `fetch` (and output a filtered JSON list of flats) or `import`
   (into an SQLite database, for the web visualization) a list of flats
   matching your criteria.
6. Install JS libraries and build the webapp:
   `npm install && npm run build:dev` (use `build:prod` in production).
7. Use `python -m flatisfy serve --config config.json` to serve the web app.
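
The steps above can be sketched as a small Python helper that lists the commands in order. This is only an illustration: the `import` step is written with the same `--config` argument as the other commands (an assumption based on the common arguments described below), and the `init-config` step is left out because it relies on a shell redirection. `STEPS` and `render` are hypothetical names, not part of Flatisfy.

```python
import shlex

CONFIG = "config.json"  # file name used in the steps above

# Commands from the steps above, in order. The `import` invocation with
# `--config` is an assumption; the steps do not spell it out.
STEPS = [
    ["pip", "install", "-r", "requirements.txt"],
    ["python", "-m", "flatisfy", "build-data", "--config", CONFIG],
    ["python", "-m", "flatisfy", "import", "--config", CONFIG],
    ["npm", "install"],
    ["npm", "run", "build:dev"],
    ["python", "-m", "flatisfy", "serve", "--config", CONFIG],
]


def render(steps):
    """Return each command as a single shell-quoted string."""
    return [shlex.join(step) for step in steps]


if __name__ == "__main__":
    for line in render(STEPS):
        print(line)
```

Each entry could be passed to `subprocess.run(step, check=True)` to execute it; the sketch only prints the commands so that nothing is run by accident. `shlex.join` requires Python 3.8 or later.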


## Available commands

The available commands are:

* `init-config` to generate an empty configuration file, either on `stdout`
  or in the specified file.
* `build-data` to rebuild OpenData datasets.
* `fetch` to load and filter housing posts and output a JSON dump.
* `filter` to filter again the flats in the database (and update their status)
  according to changes in config. It can also filter a previously fetched list
  of housing posts, provided as a JSON dump (with a `--input` argument).
* `import` to import and filter housing posts into the database.
* `serve` to serve the built-in webapp with the development server. Do not use
  in production.

_Note:_ Fetching flats can take a while, up to a few minutes. This
should be better optimized. To get verbose output and a hint about the
progress, use the `-v` argument.


### Common arguments

The following command-line arguments are common to all the available
commands:

* `--help`/`-h` to get a help message about the current command.
* `--data-dir DIR` to override the `data_directory` value from config.
* `--config CONFIG` to use the config file located at `CONFIG`.
* `--passes [0, 1, 2, 3]` to override the `passes` value from config.
* `--max-entries N` to override the `max_entries` value from config.
* `-v` to enable verbose output.
* `-vv` to enable debug output.
* `--constraints` to specify a list of constraints to use (e.g. to restrict
  import to a subset of available constraints from the config). This list
  should be passed as a comma-separated list.
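
As a rough illustration of how these arguments combine, the sketch below builds an argument list for a Flatisfy command. The helper `flatisfy_argv` and the constraint names `paris` and `lyon` are hypothetical; only the flags themselves come from the list above.

```python
def flatisfy_argv(command, config=None, data_dir=None, passes=None,
                  max_entries=None, constraints=None, verbosity=0):
    """Build an argv list for `python -m flatisfy COMMAND` (hypothetical helper)."""
    argv = ["python", "-m", "flatisfy", command]
    if config is not None:
        argv += ["--config", config]
    if data_dir is not None:
        argv += ["--data-dir", data_dir]
    if passes is not None:
        argv += ["--passes", str(passes)]
    if max_entries is not None:
        argv += ["--max-entries", str(max_entries)]
    if constraints:
        # Constraint names are passed as a single comma-separated value.
        argv += ["--constraints", ",".join(constraints)]
    if verbosity == 1:
        argv.append("-v")   # verbose output
    elif verbosity >= 2:
        argv.append("-vv")  # debug output
    return argv


if __name__ == "__main__":
    print(flatisfy_argv("import", config="config.json",
                        constraints=["paris", "lyon"], verbosity=1))
```

The resulting list can be handed to `subprocess.run` as is, which avoids shell quoting issues.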


## Configuration

List of configuration options:

* `data_directory` is the directory in which you want data files to be stored.
  `null` is the default value and means the default `XDG` location (typically
  `~/.local/share/flatisfy/`).
* `max_entries` is the maximum number of entries to fetch.
* `passes` is the number of passes to run on the data. The first pass is a
  basic filtering that uses only the information from the housing list page.
  The second pass loads any possible information about the filtered flats and
  does better filtering.
* `database` is an SQLAlchemy URI to a database file. Defaults to `null` which
  means that it will store the database in the default location, in
  `data_directory`.
* `navitia_api_key` is an API token for [Navitia](https://www.navitia.io/)
  which is required to compute travel times.
* `modules_path` is the path to the Weboob modules. It can be `null` if you
  want Weboob to use the locally installed [Weboob
  modules](https://git.weboob.org/weboob/modules), which you should install
  yourself. This is the default value. If it is a string, it should be an
  absolute path to the folder containing Weboob modules.
* `port` is the port on which the development webserver should be
  listening (defaults to `8080`).
* `host` is the host on which the development webserver should be listening
  (defaults to `127.0.0.1`).
* `webserver` is a server to use instead of the default Bottle built-in
  webserver, see [Bottle deployment
  doc](http://bottlepy.org/docs/dev/deployment.html).
* `backends` is a list of Weboob backends to enable. It defaults to any
  available and supported Weboob backend.

_Note:_ In production, you can either use the `serve` command with a reliable
webserver instead of the default Bottle webserver (specifying a `webserver`
value) or use the `wsgi.py` script at the root of the repository to use WSGI.
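
To make the options above concrete, here is a minimal sketch that assembles a configuration dictionary in Python and serializes it to JSON. The values under `documented_defaults` are the defaults stated above; the values under `overrides` are placeholders chosen for illustration, not Flatisfy defaults. In practice, starting from the output of `init-config` is likely the safer route.

```python
import json

# Defaults as documented in the list above.
documented_defaults = {
    "data_directory": None,  # null: default XDG location
    "database": None,        # null: database stored in data_directory
    "modules_path": None,    # null: locally installed Weboob modules
    "port": 8080,
    "host": "127.0.0.1",
}

# Illustrative values only (assumptions, not defaults).
overrides = {
    "navitia_api_key": "YOUR_NAVITIA_TOKEN",  # placeholder token
    "max_entries": 100,
    "passes": 2,
}


def apply_overrides(config, changes):
    """Return a copy of `config` updated with `changes`."""
    merged = dict(config)
    merged.update(changes)
    return merged


config = apply_overrides(documented_defaults, overrides)
text = json.dumps(config, indent=4)

if __name__ == "__main__":
    print(text)
```

Python's `None` is serialized as JSON `null`, which matches the `null` defaults described above.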


### Constraints

You should specify some constraints to filter the resulting housing list,
under the `constraints` key. The available constraints are:

* `type` is the type of housing you want, either `RENT` (to rent), `SALE` (to
  buy) or `SHARING` (for a shared housing).
* `housing_types` is a list of house types you are looking for. Values can be
  `APART` (flat), `HOUSE`, `PARKING`, `LAND`, `OTHER` (everything else) or
  `UNKNOWN` (anything which was not matched with one of the previous
  categories).
* `area` (in m²), `bedrooms`, `cost` (in currency unit), `rooms`: each is a
  tuple of `(min, max)` values, defining an interval in which the value should
  lie. A `null` value means that the interval is unbounded on that side.
* `postal_codes` is a list of postal codes. You should include any postal code
  you want, and especially the postal codes close to the precise location you
  want.
* `time_to` is a dictionary of places to which travel times should be computed
  (using public transport, relies on [Navitia API](http://navitia.io/)).
  Typically,

  ```
  "time_to": {
    "foobar": {
        "gps": [LAT, LNG],
        "time": [min, max]
    }
  }
  ```

  means that the travel time from a housing to the place identified by the GPS
  coordinates `LAT` and `LNG` (latitude and longitude) must lie between the
  `min` and `max` bounds (possibly `null`). This place is called `foobar` in
  human-readable form. Beware that `time` constraints are in **seconds**.
* `minimum_nb_photos` lets you filter out posts with fewer than this number of
  photos.
* `description_should_contain` lets you specify a list of terms that should
  be present in the post descriptions. Typically, if you expect "parking" to
  be in all the posts Flatisfy fetches for you, you can set
  `description_should_contain: ["parking"]`.


You can think of a constraint as "a set of criteria to filter out flats". You
can specify as many constraints as you want in the configuration file,
provided that you name each of them uniquely.
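
The sketch below puts the constraint keys together for one named constraint and shows how a `(min, max)` interval with `null` bounds can be read. The constraint name `example`, the place name `work`, the coordinates, and all numeric values are made up for illustration. The helpers `within` and `minutes` are hypothetical, and treating the bounds as inclusive is an assumption, not something the text above specifies.

```python
def within(value, bounds):
    """Check a value against (min, max); None means no bound on that side.

    Bounds are treated as inclusive here, which is an assumption.
    """
    lower, upper = bounds
    if lower is not None and value < lower:
        return False
    if upper is not None and value > upper:
        return False
    return True


def minutes(n):
    """Convert minutes to seconds, the unit expected by `time`."""
    return n * 60


# One uniquely named constraint; every value is illustrative.
constraints = {
    "example": {
        "type": "RENT",
        "housing_types": ["APART"],
        "area": [30, None],    # at least 30 m², no upper bound
        "cost": [None, 1200],  # at most 1200, no lower bound
        "rooms": [2, None],
        "bedrooms": [1, None],
        "postal_codes": ["75005", "75013"],
        "time_to": {
            "work": {"gps": [48.8566, 2.3522], "time": [None, minutes(30)]},
        },
        "minimum_nb_photos": 2,
        "description_should_contain": ["parking"],
    }
}

if __name__ == "__main__":
    print(within(45, constraints["example"]["area"]))
    print(within(1500, constraints["example"]["cost"]))
```

Writing travel times through a helper such as `minutes` makes it harder to forget that the configuration expects seconds.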


## Building the web assets

If you want to build the web assets, you can use `npm run build:dev`
(or `npm run watch:dev` to build continuously and monitor changes in
source files). You can use `npm run build:prod` (`npm run watch:prod`) to do
the same in production mode (with minification, etc.).