Getting started
Dependency on Weboob
Important: Flatisfy relies on Weboob to fetch housing posts from housing
websites. You should therefore install its devel branch and update it
regularly, especially if Flatisfy suddenly stops fetching housing posts.
Running pip install -r requirements.txt installs the latest development
version of Weboob and the Weboob modules, which should be the best version
available. You should update these packages regularly, as they evolve quickly.
Weboob is made of two parts: a core and modules (the actual code fetching
data from websites). Modules tend to break often and are therefore updated
often, so you should keep them up to date. This can be done by installing the
weboob-modules package listed in requirements.txt and using the default
configuration.
This is a safe default configuration. However, a better option is usually to
clone the Weboob git repo somewhere on your disk, point the modules_path
configuration option to path_to_weboob_git/modules (see the configuration
section below), and run git pull; python setup.py install in the Weboob git
repo often.
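As a minimal sketch of that second option, the snippet below sets modules_path in a configuration dictionary. The helper name with_modules_path and the clone location /opt/weboob are hypothetical; only the modules_path key and the need for an absolute path come from this document.

```python
import json
import os


def with_modules_path(config, weboob_repo):
    """Return a copy of config whose modules_path is <weboob_repo>/modules.

    Hypothetical helper for illustration, not part of Flatisfy.
    """
    updated = dict(config)
    # The configuration section asks for an absolute path to the modules folder.
    updated["modules_path"] = os.path.abspath(os.path.join(weboob_repo, "modules"))
    return updated


# Example with a hypothetical clone location.
config = with_modules_path({"modules_path": None}, "/opt/weboob")
print(json.dumps(config, indent=4))
```

The resulting dictionary can be merged into the config.json produced by init-config.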
TL;DR
An alternative method is available using Docker. See 2.docker.md.
- Clone the repository.
- Install required Python modules: pip install -r requirements.txt.
- Init a configuration file: python -m flatisfy init-config > config.json. Edit it according to your needs (see below).
- Build the required data files: python -m flatisfy build-data --config config.json.
- You can now run python -m flatisfy import --config config.json to fetch available flats, filter them and import everything in a SQLite database, usable with the web visualization.
- Install JS libraries and build the webapp: npm install && npm run build:dev (use build:prod in production).
- Use python -m flatisfy serve --config config.json to serve the web app.
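For unattended setups, the steps above that need no manual editing can be scripted. The following is only a sketch: setup_steps and run_steps are hypothetical helper names, the script assumes it is run from the repository root after config.json has been generated and edited, and it calls pip through the current interpreter rather than the bare pip command.

```python
import subprocess
import sys


def setup_steps(config_path="config.json"):
    """Return the non-interactive TL;DR commands, in order, as argument lists."""
    flatisfy = [sys.executable, "-m", "flatisfy"]
    return [
        [sys.executable, "-m", "pip", "install", "-r", "requirements.txt"],
        flatisfy + ["build-data", "--config", config_path],
        flatisfy + ["import", "--config", config_path],
        ["npm", "install"],
        ["npm", "run", "build:dev"],
    ]


def run_steps(steps):
    """Run each command in order and stop at the first failure."""
    for argv in steps:
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Print the commands instead of running them; call run_steps() to execute.
    for argv in setup_steps():
        print(" ".join(argv))
```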
Note: Flatisfy requires an up-to-date Node version. You can find instructions
on the NodeJS website to install the latest LTS version.
Note: Alternatively, you can run python -m flatisfy fetch --config config.json
to fetch available flats, filter them and output them as a filtered JSON list
(the web visualization will not be able to display them). This is mainly
useful if you plan to integrate Flatisfy in your own pipeline.
Available commands
The available commands are:
- init-config to generate an empty configuration file, either on stdout or in the specified file.
- build-data to rebuild OpenData datasets.
- fetch to load and filter housing posts and output a JSON dump.
- filter to filter again the flats in the database (and update their status) according to changes in config. It can also filter a previously fetched list of housing posts, provided as a JSON dump (with an --input argument).
- import to import and filter housing posts into the database.
- serve to serve the built-in webapp with the development server. Do not use in production.
Note: Fetching flats can be slow and take up to a few minutes. This should be
better optimized. To get verbose output and a hint about the progress, use
the -v argument.
Common arguments
You can pass some command-line arguments to Flatisfy commands, common to all the available commands. These are:
- --help / -h to get a help message about the current command.
- --data-dir DIR to override the data_directory value from config.
- --config CONFIG to use the config file located at CONFIG.
- --passes [0, 1, 2, 3] to override the passes value from config.
- --max-entries N to override the max_entries value from config.
- -v to enable verbose output.
- -vv to enable debug output.
- --constraints to specify a list of constraints to use (e.g. to restrict import to a subset of available constraints from the config). This list should be passed as a comma-separated list.
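To illustrate how these arguments combine, here is a small sketch that assembles a command line from the flags listed above. The helper flatisfy_command and the constraint names paris and lyon are hypothetical; the flags themselves are the ones documented in this section, placed after the command name as in the TL;DR examples.

```python
import shlex


def flatisfy_command(command, config, constraints=None, verbosity=0,
                     passes=None, max_entries=None):
    """Build the argument list for one Flatisfy command (hypothetical helper)."""
    argv = ["python", "-m", "flatisfy", command, "--config", config]
    if passes is not None:
        argv += ["--passes", str(passes)]
    if max_entries is not None:
        argv += ["--max-entries", str(max_entries)]
    if constraints:
        # --constraints takes a single comma-separated value.
        argv += ["--constraints", ",".join(constraints)]
    if verbosity:
        # -v enables verbose output, -vv enables debug output.
        argv.append("-" + "v" * min(verbosity, 2))
    return argv


# Restrict an import to two hypothetical constraints, with verbose output.
argv = flatisfy_command("import", "config.json",
                        constraints=["paris", "lyon"], verbosity=1)
print(shlex.join(argv))
```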
Configuration
List of configuration options:
- data_directory is the directory in which you want data files to be stored. null is the default value and means the default XDG location (typically ~/.local/share/flatisfy/).
- max_entries is the maximum number of entries to fetch.
- passes is the number of passes to run on the data. The first pass does basic filtering, using only the information from the housing list page. The second pass loads any possible information about the filtered flats and does better filtering.
- database is an SQLAlchemy URI to a database file. Defaults to null, which means that the database is stored in the default location, in data_directory.
- navitia_api_key is an API token for Navitia, which is required to compute travel times.
- modules_path is the path to the Weboob modules. It can be null if you want Weboob to use the locally installed Weboob modules, which you should install yourself. This is the default value. If it is a string, it should be an absolute path to the folder containing Weboob modules.
- port is the port on which the development webserver should be listening (defaults to 8080).
- host is the host on which the development webserver should be listening (defaults to 127.0.0.1).
- webserver is a server to use instead of the default Bottle built-in webserver, see the Bottle deployment doc.
- backends is a list of Weboob backends to enable. It defaults to any available and supported Weboob backend.
- store_personal_data is a boolean indicating whether or not Flatisfy should fetch personal data from housing posts and store them in the database. Such personal data include contact phone numbers, for instance. By default, Flatisfy does not store such personal data.
- max_distance_housing_station is the maximum distance (in meters) between a housing and a public transport station found for this housing (default is 1500). This is useful to avoid false positives.
- duplicate_threshold is the minimum score in the deep duplicate detection step to consider two flats as being duplicates (defaults to 15).
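The defaults stated in the list above can be collected into a configuration fragment. This sketch only covers the options whose default is given in this section; make_config is a hypothetical helper, and the real file generated by init-config may contain more keys.

```python
import json

# Options whose default value is documented in the list above.
DOCUMENTED_DEFAULTS = {
    "data_directory": None,      # null: default XDG location
    "database": None,            # null: stored in data_directory
    "modules_path": None,        # null: locally installed Weboob modules
    "port": 8080,
    "host": "127.0.0.1",
    "store_personal_data": False,
    "max_distance_housing_station": 1500,  # meters
    "duplicate_threshold": 15,
}


def make_config(**overrides):
    """Return the documented defaults with the given options overridden."""
    return {**DOCUMENTED_DEFAULTS, **overrides}


# Example: listen on another port; None is serialized as JSON null.
print(json.dumps(make_config(port=9000), indent=4))
```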
Note: In production, you can either use the serve
command with a reliable
webserver instead of the default Bottle webserver (specifying a webserver
value) or use the wsgi.py
script at the root of the repository to use WSGI.
Constraints
You should specify some constraints to filter the resulting housings list,
under the constraints
key. The available constraints are:
- type is the type of housing you want, either RENT (to rent), SALE (to buy) or SHARING (for a shared housing).
- house_types is a list of house types you are looking for. Values can be APART (flat), HOUSE, PARKING, LAND, OTHER (everything else) or UNKNOWN (anything which was not matched with one of the previous categories).
- area (in m²), bedrooms, cost (in currency unit), rooms: each is a tuple of (min, max) values, defining an interval in which the value should lie. A null value means that any value is within this bound.
- postal_codes (as strings) is a list of postal codes. You should include any postal code you want, and especially the postal codes close to the precise location you want.
- time_to is a dictionary of places to compute travel time to (using public transport, relies on the Navitia API). Typically, "time_to": { "foobar": { "gps": [LAT, LNG], "time": [min, max] } } means that the housing must be between the min and max bounds (possibly null) from the place identified by the GPS coordinates LAT and LNG (latitude and longitude), and we call this place foobar in human-readable form. Beware that time constraints are in seconds.
- minimum_nb_photos lets you filter out posts with fewer than this number of photos.
- description_should_contain lets you specify a list of terms that should be present in the post descriptions. Typically, if you expect "parking" to be in all the posts Flatisfy fetches for you, you can set description_should_contain: ["parking"].
You can think of constraints as "a set of criteria to filter out flats". You can specify as many constraints as you want in the configuration file, provided that you name each of them uniquely.
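As an illustration, the sketch below builds one named constraint and shows how a (min, max) interval with null bounds can be read. All values, the constraint name my-search and the helper in_interval are hypothetical. Storing constraints as a dictionary keyed by name and treating bounds as inclusive are assumptions, not confirmed Flatisfy behavior.

```python
import json


def in_interval(value, bounds):
    """Return True if value lies within (min, max); None means unbounded.

    Bounds are treated as inclusive here, which is an assumption.
    """
    low, high = bounds
    if low is not None and value < low:
        return False
    if high is not None and value > high:
        return False
    return True


# Hypothetical constraint; the name and every value are illustrative.
constraints = {
    "my-search": {
        "type": "RENT",
        "house_types": ["APART"],
        "area": [30, None],        # at least 30 m², no upper bound
        "cost": [None, 1500],      # at most 1500 currency units
        "rooms": [2, None],
        "bedrooms": [None, None],  # any number of bedrooms
        "postal_codes": ["75005", "75006"],
        "time_to": {
            # At most 1800 seconds (30 minutes) from this place.
            "foobar": {"gps": [48.8566, 2.3522], "time": [0, 1800]}
        },
        "minimum_nb_photos": 2,
        "description_should_contain": ["parking"],
    }
}

print(json.dumps({"constraints": constraints}, indent=4))
print(in_interval(45, constraints["my-search"]["area"]))
```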
Building the web assets
If you want to build the web assets, you can use npm run build:dev
(respectively npm run watch:dev
to build continuously and monitor changes in
source files). You can use npm run build:prod
(npm run watch:prod
) to do
the same in production mode (with minification etc).