245 lines
11 KiB
Markdown
245 lines
11 KiB
Markdown
Getting started
|
|
===============
|
|
|
|
|
|
## Dependency on Woob
|
|
|
|
**Important**: Flatisfy relies on [Woob](https://gitlab.com/woob/woob/) to fetch
|
|
housing posts from housing websites.
|
|
|
|
If you `pip install -r requirements.txt` it will install the latest
|
|
development version of [Woob](https://gitlab.com/woob/woob/) and the
|
|
[Woob modules](https://gitlab.com/woob/modules/), which should be the
|
|
best version available out there. You should update these packages regularly,
|
|
as they evolve quickly.
|
|
|
|
Woob is made of two parts: a core and modules (which is the actual code
|
|
fetching data from websites). Modules tend to break often and are then updated
|
|
often, you should keep them up to date. This can be done by installing and
|
|
upgrading the packages listed in the `requirements.txt` and using the default
|
|
configuration.
|
|
|
|
This is a safe default configuration. However, a better option is usually to
|
|
clone [Woob git repo](https://gitlab.com/woob/woob/) somewhere, on
|
|
your disk, to point `modules_path` configuration option to
|
|
`path_to_woob_git/modules` (see the configuration section below) and to run
|
|
a `git pull; python setup.py install` in the Woob git repo often.
|
|
|
|
A copy of the Woob modules is available in the `modules` directory at the
|
|
root of this repository, you can use `"modules_path": "/path/to/flatisfy/modules"` to use them.
|
|
This copy may or may not be more up to date than the current state of official
|
|
Woob modules. Some changes are made there, which are not backported
|
|
upstream. Woob official modules are not synced in the `modules` folder on a
|
|
regular basis, so try both and see which ones match your needs! :)
|
|
|
|
|
|
## TL;DR
|
|
|
|
An alternative method is available using Docker. See [2.docker.md](2.docker.md).
|
|
|
|
1. Clone the repository.
|
|
2. Install required Python modules: `pip install -r requirements.txt`.
|
|
3. Init a configuration file: `python -m flatisfy init-config > config.json`.
|
|
Edit it according to your needs (see below).
|
|
4. Build the required data files:
|
|
`python -m flatisfy build-data --config config.json`.
|
|
5. You can now run `python -m flatisfy import --config config.json` to fetch
|
|
available flats, filter them and import everything in a SQLite database,
|
|
usable with the web visualization.
|
|
6. Install JS libraries and build the webapp:
|
|
`npm install && npm run build:dev` (use `build:prod` in production).
|
|
7. Use `python -m flatisfy serve --config config.json` to serve the web app.
|
|
|
|
_Note_: `Flatisfy` requires an up-to-date Node version. You can find
|
|
instructions on the [NodeJS website](https://nodejs.org/en/) to install latest
|
|
LTS version.
|
|
|
|
_Note_: Alternatively, you can `python -m flatisfy fetch --config config.json`
|
|
to fetch available flats, filter them and output them as a filtered JSON list
|
|
(the web visualization will not be able to display them). This is mainly
|
|
useful if you plan in integrating Flatisfy in your own pipeline.
|
|
|
|
|
|
|
|
## Available commands
|
|
|
|
The available commands are:
|
|
|
|
* `init-config` to generate an empty configuration file, either on the `stdin`
|
|
or in the specified file.
|
|
* `build-data` to rebuild OpenData datasets.
|
|
* `fetch` to load and filter housings posts and output a JSON dump.
|
|
* `filter` to filter again the flats in the database (and update their status)
|
|
according to changes in config. It can also filter a previously fetched list
|
|
of housings posts, provided as a JSON dump (with a `--input` argument).
|
|
* `import` to import and filter housing posts into the database.
|
|
* `serve` to serve the built-in webapp with the development server. Do not use
|
|
in production.
|
|
|
|
_Note:_ Fetching flats can be quite long and take up to a few minutes. This
|
|
should be better optimized. To get a verbose output and have an hint about the
|
|
progress, use the `-v` argument. It can remain stuck at "Loading flats for
|
|
constraint XXX...", which simply means it is fetching flats (using Woob
|
|
under the hood) and this step can be super long if there are lots of flats to
|
|
fetch. If this happens to you, you can set `max_entries` in your config to
|
|
limit the number of flats to fetch.
|
|
|
|
|
|
### Common arguments
|
|
|
|
You can pass some command-line arguments to Flatisfy commands, common to all the available commands. These are
|
|
|
|
* `--help`/`-h` to get some help message about the current command.
|
|
* `--data-dir DIR` to overload the `data_directory` value from config.
|
|
* `--config CONFIG` to use the config file located at `CONFIG`.
|
|
* `--passes [0, 1, 2, 3]` to overload the `passes` value from config.
|
|
* `--max-entries N` to overload the `max_entries` value from config.
|
|
* `-v` to enable verbose output.
|
|
* `-vv` to enable debug output.
|
|
* `--constraints` to specify a list of constraints to use (e.g. to restrict
|
|
import to a subset of available constraints from the config). This list
|
|
should be passed as a comma-separated list.
|
|
|
|
|
|
## Configuration
|
|
|
|
List of configuration options:
|
|
|
|
* `data_directory` is the directory in which you want data files to be stored.
|
|
`null` is the default value and means default `XDG` location (typically
|
|
`~/.local/share/flatisfy/`)
|
|
* `max_entries` is the maximum number of entries to fetch.
|
|
* `passes` is the number of passes to run on the data. First pass is a basic
|
|
filtering and using only the informations from the housings list page.
|
|
Second pass loads any possible information about the filtered flats and does
|
|
better filtering.
|
|
* `database` is an SQLAlchemy URI to a database file. Defaults to `null` which
|
|
means that it will store the database in the default location, in
|
|
`data_directory`.
|
|
* `navitia_api_key` is an API token for [Navitia](https://www.navitia.io/)
|
|
which is required to compute travel times for `PUBLIC_TRANSPORT` mode.
|
|
* `mapbox_api_key` is an API token for [Mapbox](http://mapbox.com/)
|
|
which is required to compute travel times for `WALK`, `BIKE` and `CAR`
|
|
modes.
|
|
* `modules_path` is the path to the Woob modules. It can be `null` if you
|
|
want Woob to use the locally installed [Woob
|
|
modules](https://gitlab.com/woob/modules/), which you should install
|
|
yourself. This is the default value. If it is a string, it should be an
|
|
absolute path to the folder containing Woob modules.
|
|
* `port` is the port on which the development webserver should be
|
|
listening (default to `8080`).
|
|
* `host` is the host on which the development webserver should be listening
|
|
(default to `127.0.0.1`).
|
|
* `webserver` is a server to use instead of the default Bottle built-in
|
|
webserver, see [Bottle deployment
|
|
doc](http://bottlepy.org/docs/dev/deployment.html).
|
|
* `backends` is a list of Woob backends to enable. It defaults to any
|
|
available and supported Woob backend.
|
|
* `store_personal_data` is a boolean indicated whether or not Flatisfy should
|
|
fetch personal data from housing posts and store them in database. Such
|
|
personal data include contact phone number for instance. By default,
|
|
Flatisfy does not store such personal data.
|
|
* `max_distance_housing_station` is the maximum distance (in meters) between
|
|
an housing and a public transport station found for this housing (default is
|
|
`1500`). This is useful to avoid false-positive.
|
|
* `duplicate_threshold` is the minimum score in the deep duplicate detection
|
|
step to consider two flats as being duplicates (defaults to `15`).
|
|
* `serve_images_locally` lets you download all the images from the housings
|
|
websites when importing the posts. Then, all your Flatisfy works standalone,
|
|
serving the local copy of the images instead of fetching the images from the
|
|
remote websites every time you look through the fetched housing posts.
|
|
|
|
_Note:_ In production, you can either use the `serve` command with a reliable
|
|
webserver instead of the default Bottle webserver (specifying a `webserver`
|
|
value) or use the `wsgi.py` script at the root of the repository to use WSGI.
|
|
|
|
|
|
### Constraints
|
|
|
|
You should specify some constraints to filter the resulting housings list,
|
|
under the `constraints` key. The available constraints are:
|
|
|
|
* `type` is the type of housing you want, either `RENT` (to rent), `SALE` (to
|
|
buy), `SHARING` (for a shared housing), `FURNISHED_RENT` (for a furnished
|
|
rent), `VIAGER` (for a viager, lifetime sale).
|
|
* `house_types` is a list of house types you are looking for. Values can be
|
|
`APART` (flat), `HOUSE`, `PARKING`, `LAND`, `OTHER` (everything else) or
|
|
`UNKNOWN` (anything which was not matched with one of the previous
|
|
categories).
|
|
* `area` (in m²), `bedrooms`, `cost` (in currency unit), `rooms`: this is a
|
|
tuple of `(min, max)` values, defining an interval in which the value should
|
|
lie. A `null` value means that any value is within this bound.
|
|
* `postal_codes` (as strings) is a list of postal codes. You should include any postal code
|
|
you want, and especially the postal codes close to the precise location you
|
|
want.
|
|
* `time_to` is a dictionary of places to compute travel time to them.
|
|
Typically,
|
|
|
|
```
|
|
"time_to": {
|
|
"foobar": {
|
|
"gps": [LAT, LNG],
|
|
"mode": A transport mode,
|
|
"time": [min, max]
|
|
}
|
|
}
|
|
```
|
|
|
|
means that the housings must be between the `min` and `max` bounds (possibly
|
|
`null`) from the place identified by the GPS coordinates `LAT` and `LNG`
|
|
(latitude and longitude), and we call this place `foobar` in human-readable
|
|
form. `mode` should be either `PUBLIC_TRANSPORT`, `WALK`, `BIKE` or `CAR`.
|
|
Beware that `time` constraints are in **seconds**. You should take
|
|
some margin as the travel time computation is done with found nearby public
|
|
transport stations, which is only a rough estimate of the flat position. For
|
|
`PUBLIC_TRANSPORT` the travel time is computed assuming a route the next
|
|
Monday at 8am.
|
|
* `minimum_nb_photos` lets you filter out posts with less than this number of
|
|
photos.
|
|
* `description_should_contain` lets you specify a list of terms that should
|
|
be present in the posts descriptions. Typically, if you expect "parking" to
|
|
be in all the posts Flatisfy fetches for you, you can set
|
|
`description_should_contain: ["parking"]`. You can also use list of terms
|
|
which acts as an "or" operation. For example, if you are looking for a flat
|
|
with a parking and with either a balcony or a terrace, you can use
|
|
`description_should_contain: ["parking", ["balcony", "terrace"]]`
|
|
* `description_should_not_contain` lets you specify a list of terms that should
|
|
never occur in the posts descriptions. Typically, if you wish to avoid
|
|
"coloc" in the posts Flatisfy fetches for you, you can set
|
|
`description_should_not_contain: ["coloc"]`.
|
|
|
|
|
|
You can think of constraints as "a set of criterias to filter out flats". You
|
|
can specify as many constraints as you want, in the configuration file,
|
|
provided that you name each of them uniquely.
|
|
|
|
|
|
## Building the web assets
|
|
|
|
If you want to build the web assets, you can use `npm run build:dev`
|
|
(respectively `npm run watch:dev` to build continuously and monitor changes in
|
|
source files). You can use `npm run build:prod` (`npm run watch:prod`) to do
|
|
the same in production mode (with minification etc).
|
|
|
|
|
|
## Upgrading
|
|
|
|
To update the app, you can simply `git pull` the latest version. The database
|
|
schema might change from time to time. Here is how to update it automatically:
|
|
|
|
* First, edit the `alembic.ini` file and ensure the `sqlalchemy.url` entry
|
|
points to the database URI you are actually using for Flatisfy.
|
|
* Then, run `alembic upgrade head` to run the required migrations.
|
|
|
|
## Misc
|
|
|
|
### Other tools more or less connected with Flatisfy
|
|
|
|
+ [ZipAround](https://github.com/guix77/ziparound) generates a list of ZIP codes centered on a city name, within a radius of N kilometers and within a certain travel time by car (France only). You can invoke it with:
|
|
|
|
```sh
|
|
npm ziparound
|
|
# or alternatively
|
|
npm ziparound --code 75001 --distance 3
|
|
```
|