|Phyks (Lucas Verney) 837199ef9d Update doc + remove useless files||5 years ago|
To preserve bandwidth, the index is stored in a binary file, using Bloom filters, instead of a JSON index as Lunr.JS does.
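To get a rough feel for why a Bloom filter index stays small, here is a minimal sketch using the standard Bloom filter sizing formulas. The function name `bloom_size` and the example figures (500 distinct words, 10% false-positive rate) are assumptions chosen for illustration; they are not taken from this project's code or settings.

```python
import math


def bloom_size(n_items, error_rate):
    """Return (bits, hash_count) from the standard Bloom filter formulas.

    Hypothetical helper for illustration; not part of pybloom.py.
    """
    # Optimal number of bits: m = -n * ln(p) / (ln 2)^2
    bits = math.ceil(-n_items * math.log(error_rate) / (math.log(2) ** 2))
    # Optimal number of hash functions: k = (m / n) * ln 2
    hashes = max(1, round(bits / n_items * math.log(2)))
    return bits, hashes


# Assumed example: an article with 500 distinct words, 10% false positives.
bits, hashes = bloom_size(n_items=500, error_rate=0.1)
print(math.ceil(bits / 8), "bytes,", hashes, "hash functions")
```

The size depends only on the number of words and the accepted error rate, not on the length of the words themselves, which is what keeps the binary index compact compared to a list of words in JSON.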
For full details about BloomySearch, please refer to this blog post.
I have a static weblog, generated with Blogit. As I only want to have HTML files on my server, I needed a way to let users search my blog.
An index is generated by a Python script when the pages are generated, and is downloaded dynamically by the client when a user wants to search the content.
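The general idea of one Bloom filter per article can be sketched as follows. This is a simplified illustration, not the code from this repository: the `TinyBloom` class, the filter size, the SHA-256 based hashing, and the `build_index` and `search` helpers are all assumptions, and the stemming step is left out to keep the example short.

```python
import hashlib


class TinyBloom:
    """Minimal Bloom filter; illustrative only, not the pybloom.py API."""

    def __init__(self, n_bits=1024, n_hashes=4):
        # n_bits is assumed to be a multiple of 8.
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray(n_bits // 8)

    def _positions(self, word):
        # Derive n_hashes bit positions by salting a single hash function.
        for i in range(self.n_hashes):
            digest = hashlib.sha256(f"{i}:{word}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:4], "big") % self.n_bits

    def add(self, word):
        for pos in self._positions(word):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, word):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(word)
        )


def build_index(articles):
    """Build one filter per article, in the same order as the article list."""
    filters = []
    for text in articles:
        bloom = TinyBloom()
        for word in text.lower().split():
            bloom.add(word)
        filters.append(bloom)
    return filters


def search(filters, word):
    """Return the indices of articles that may contain the word."""
    return [i for i, bloom in enumerate(filters) if word.lower() in bloom]


articles = ["static blog search", "bloom filters are compact"]
filters = build_index(articles)
print(search(filters, "blog"))
```

A Bloom filter never misses a word that was added, but it can occasionally report a word that is not there, so a search may return a few irrelevant articles and never drops a matching one.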
pybloom.py: Library to handle bloom filters in Python
stemmer.py: Implementation of the Porter stemming algorithm in Python, from Vivake Gupta.
js/bloom.js: main JS code
js/bloomfilters.js: JS library to use BloomFilters
samples/: samples for testing purposes (taken from my blog articles)
Data from the Python script is just the array of Bloom filter bit arrays written as a binary file (data/search_index), which I open with JS. The list of articles is also written in JSON form in a specific file (
Here’s the format of the output from the python script:
But I wasn’t fully satisfied by the first one, and I found the second one too heavy and complicated for my purpose, so I ended up coding this.
This code is mainly a proof of concept. As such, it is not fully optimized (actually, I just tweaked it until the resulting files and calculations could be considered “acceptable”). For those looking for more effective solutions, here are a few things I found while looking for information on the web:
TL;DR: I don’t give a damn about anything you do with this code. It would just be nice to mention where the original code comes from. All the included libraries (pybloom and the stemming library) have their own license.