194 lines
8.8 KiB
Markdown
194 lines
8.8 KiB
Markdown
Getting started
|
|
===============
|
|
|
|
|
|
## Dependency on Weboob
|
|
|
|
**Important**: Flatisfy relies on [Weboob](http://weboob.org/) to fetch
|
|
housing posts from housing websites. Then, you should install the [`devel`
|
|
branch](https://git.weboob.org/weboob/devel/) and update it regularly,
|
|
especially if Flatisfy suddenly stops fetching housing posts.
|
|
|
|
If you `pip install -r requirements.txt` it will install the latest
|
|
development version of [Weboob](https://git.weboob.org/weboob/devel/) and the
|
|
[Weboob modules](https://git.weboob.org/weboob/modules/), which should be the
|
|
best version available out there. You should update these packages regularly,
|
|
as they evolve quickly.
|
|
|
|
Weboob is made of two parts: a core and modules (which is the actual code
|
|
fetching data from websites). Modules tend to break often and are then updated
|
|
often, you should keep them up to date. This can be done by installing the
|
|
`weboob-modules` package listed in the `requirements.txt` and using the
|
|
default configuration.
|
|
|
|
This is a safe default configuration. However, a better option is usually to
|
|
clone [Weboob git repo](https://git.weboob.org/weboob/devel/) somewhere, on
|
|
your disk, to point `modules_path` configuration option to
|
|
`path_to_weboob_git/modules` (see the configuration section below) and to run
|
|
a `git pull; python setup.py install` in the Weboob git repo often.
|
|
|
|
|
|
## TL;DR
|
|
|
|
An alternative method is available using Docker. See [2.docker.md](2.docker.md).
|
|
|
|
1. Clone the repository.
|
|
2. Install required Python modules: `pip install -r requirements.txt`.
|
|
3. Init a configuration file: `python -m flatisfy init-config > config.json`.
|
|
Edit it according to your needs (see below).
|
|
4. Build the required data files:
|
|
`python -m flatisfy build-data --config config.json`.
|
|
5. You can now run `python -m flatisfy import --config config.json` to fetch
|
|
available flats, filter them and import everything in a SQLite database,
|
|
usable with the web visualization.
|
|
6. Install JS libraries and build the webapp:
|
|
`npm install && npm run build:dev` (use `build:prod` in production).
|
|
7. Use `python -m flatisfy serve --config config.json` to serve the web app.
|
|
|
|
_Note_: `Flatisfy` requires an up-to-date Node version. You can find
|
|
instructions on the [NodeJS website](https://nodejs.org/en/) to install latest
|
|
LTS version.
|
|
|
|
_Note_: Alternatively, you can `python -m flatisfy fetch --config config.json`
|
|
to fetch available flats, filter them and output them as a filtered JSON list
|
|
(the web visualization will not be able to display them). This is mainly
|
|
useful if you plan in integrating Flatisfy in your own pipeline.
|
|
|
|
|
|
|
|
## Available commands
|
|
|
|
The available commands are:
|
|
|
|
* `init-config` to generate an empty configuration file, either on the `stdin`
|
|
or in the specified file.
|
|
* `build-data` to rebuild OpenData datasets.
|
|
* `fetch` to load and filter housings posts and output a JSON dump.
|
|
* `filter` to filter again the flats in the database (and update their status)
|
|
according to changes in config. It can also filter a previously fetched list
|
|
of housings posts, provided as a JSON dump (with a `--input` argument).
|
|
* `import` to import and filter housing posts into the database.
|
|
* `serve` to serve the built-in webapp with the development server. Do not use
|
|
in production.
|
|
|
|
_Note:_ Fetching flats can be quite long and take up to a few minutes. This
|
|
should be better optimized. To get a verbose output and have an hint about the
|
|
progress, use the `-v` argument.
|
|
|
|
|
|
### Common arguments
|
|
|
|
You can pass some command-line arguments to Flatisfy commands, common to all the available commands. These are
|
|
|
|
* `--help`/`-h` to get some help message about the current command.
|
|
* `--data-dir DIR` to overload the `data_directory` value from config.
|
|
* `--config CONFIG` to use the config file located at `CONFIG`.
|
|
* `--passes [0, 1, 2, 3]` to overload the `passes` value from config.
|
|
* `--max-entries N` to overload the `max_entries` value from config.
|
|
* `-v` to enable verbose output.
|
|
* `-vv` to enable debug output.
|
|
* `--constraints` to specify a list of constraints to use (e.g. to restrict
|
|
import to a subset of available constraints from the config). This list
|
|
should be passed as a comma-separated list.
|
|
|
|
|
|
## Configuration
|
|
|
|
List of configuration options:
|
|
|
|
* `data_directory` is the directory in which you want data files to be stored.
|
|
`null` is the default value and means default `XDG` location (typically
|
|
`~/.local/share/flatisfy/`)
|
|
* `max_entries` is the maximum number of entries to fetch.
|
|
* `passes` is the number of passes to run on the data. First pass is a basic
|
|
filtering and using only the informations from the housings list page.
|
|
Second pass loads any possible information about the filtered flats and does
|
|
better filtering.
|
|
* `database` is an SQLAlchemy URI to a database file. Defaults to `null` which
|
|
means that it will store the database in the default location, in
|
|
`data_directory`.
|
|
* `navitia_api_key` is an API token for [Navitia](https://www.navitia.io/)
|
|
which is required to compute travel times.
|
|
* `modules_path` is the path to the Weboob modules. It can be `null` if you
|
|
want Weboob to use the locally installed [Weboob
|
|
modules](https://git.weboob.org/weboob/modules), which you should install
|
|
yourself. This is the default value. If it is a string, it should be an
|
|
absolute path to the folder containing Weboob modules.
|
|
* `port` is the port on which the development webserver should be
|
|
listening (default to `8080`).
|
|
* `host` is the host on which the development webserver should be listening
|
|
(default to `127.0.0.1`).
|
|
* `webserver` is a server to use instead of the default Bottle built-in
|
|
webserver, see [Bottle deployment
|
|
doc](http://bottlepy.org/docs/dev/deployment.html).
|
|
* `backends` is a list of Weboob backends to enable. It defaults to any
|
|
available and supported Weboob backend.
|
|
* `store_personal_data` is a boolean indicated whether or not Flatisfy should
|
|
fetch personal data from housing posts and store them in database. Such
|
|
personal data include contact phone number for instance. By default,
|
|
Flatisfy does not store such personal data.
|
|
* `max_distance_housing_station` is the maximum distance (in meters) between
|
|
an housing and a public transport station found for this housing (default is
|
|
`1500`). This is useful to avoid false-positive.
|
|
* `duplicate_threshold` is the minimum score in the deep duplicate detection
|
|
step to consider two flats as being duplicates (defaults to `15`).
|
|
|
|
_Note:_ In production, you can either use the `serve` command with a reliable
|
|
webserver instead of the default Bottle webserver (specifying a `webserver`
|
|
value) or use the `wsgi.py` script at the root of the repository to use WSGI.
|
|
|
|
|
|
### Constraints
|
|
|
|
You should specify some constraints to filter the resulting housings list,
|
|
under the `constraints` key. The available constraints are:
|
|
|
|
* `type` is the type of housing you want, either `RENT` (to rent), `SALE` (to
|
|
buy) or `SHARING` (for a shared housing).
|
|
* `house_types` is a list of house types you are looking for. Values can be
|
|
`APART` (flat), `HOUSE`, `PARKING`, `LAND`, `OTHER` (everything else) or
|
|
`UNKNOWN` (anything which was not matched with one of the previous
|
|
categories).
|
|
* `area` (in m²), `bedrooms`, `cost` (in currency unit), `rooms`: this is a
|
|
tuple of `(min, max)` values, defining an interval in which the value should
|
|
lie. A `null` value means that any value is within this bound.
|
|
* `postal_codes` (as strings) is a list of postal codes. You should include any postal code
|
|
you want, and especially the postal codes close to the precise location you
|
|
want.
|
|
* `time_to` is a dictionary of places to compute travel time to them (using
|
|
public transport, relies on [Navitia API](http://navitia.io/)).
|
|
Typically,
|
|
|
|
```
|
|
"time_to": {
|
|
"foobar": {
|
|
"gps": [LAT, LNG],
|
|
"time": [min, max]
|
|
}
|
|
}
|
|
```
|
|
|
|
means that the housings must be between the `min` and `max` bounds (possibly
|
|
`null`) from the place identified by the GPS coordinates `LAT` and `LNG`
|
|
(latitude and longitude), and we call this place `foobar` in human-readable
|
|
form. Beware that `time` constraints are in **seconds**.
|
|
* `minimum_nb_photos` lets you filter out posts with less than this number of
|
|
photos.
|
|
* `description_should_contain` lets you specify a list of terms that should
|
|
be present in the posts descriptions. Typically, if you expect "parking" to
|
|
be in all the posts Flatisfy fetches for you, you can set
|
|
`description_should_contain: ["parking"]`.
|
|
|
|
|
|
You can think of constraints as "a set of criterias to filter out flats". You
|
|
can specify as many constraints as you want, in the configuration file,
|
|
provided that you name each of them uniquely.
|
|
|
|
|
|
## Building the web assets
|
|
|
|
If you want to build the web assets, you can use `npm run build:dev`
|
|
(respectively `npm run watch:dev` to build continuously and monitor changes in
|
|
source files). You can use `npm run build:prod` (`npm run watch:prod`) to do
|
|
the same in production mode (with minification etc).
|