From eaaa64bdea93c11ee24833deb2eeb486c3dea294 Mon Sep 17 00:00:00 2001
From: Phyks
Date: Thu, 26 Dec 2013 17:16:12 +0100
Subject: [PATCH] Initial commit

Python script generates the index correctly, but not optimized at all...
---
 README.md                      | 22 ++++++++++++++++
 generate_index.py              | 48 ++++++++++++++++++++++++++++++++++
 samples/cryptdevice_multi.html | 22 ++++++++++++++++
 samples/highmon_weechat.html   | 42 +++++++++++++++++++++++++++++
 4 files changed, 134 insertions(+)
 create mode 100644 README.md
 create mode 100755 generate_index.py
 create mode 100644 samples/cryptdevice_multi.html
 create mode 100644 samples/highmon_weechat.html

diff --git a/README.md b/README.md
new file mode 100644
index 0000000..c03c6ec
--- /dev/null
+++ b/README.md
@@ -0,0 +1,22 @@
+BloomJS
+====
+
+A JavaScript search engine.
+
+## Basic idea
+I have a static weblog, generated with [Blogit](https://github.com/phyks/blogit) (caution, this code is ugly) and, as I only want to have HTML files on my server, I needed a way to let users search my blog.
+
+An index is generated by a Python script when the pages are generated, and is downloaded dynamically by the client when a user wants to search the content.
+
+## Files
+
+* `generate_index.py` : Python script to generate the index (runs only at page generation) in a nice format for JavaScript
+* `samples/` : samples for testing purposes (taken from my blog articles)
+
+## Notes
+I got the idea while reading [this page](http://www.stavros.io/posts/bloom-filter-search-engine/?print) found on [Sebsauvage's shaarli](http://sebsauvage.net/links/). I searched a bit for code doing what I wanted and found these:
+
+* https://github.com/olivernn/lunr.js
+* https://github.com/reyesr/fullproof
+
+But I wasn't fully satisfied with the first one, and I found the second one too heavy and complicated for my purpose, so I ended up coding this.
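The idea described in the README (a word-to-page index built at generation time, then fetched and queried by the client) can be sketched roughly as follows. This is only an illustration under stated assumptions: `build_index`, `search`, the in-memory `pages` list and the JSON serialization are hypothetical names and choices, not something this commit defines.

```python
import json
import re
from collections import defaultdict


def build_index(pages):
    """Map each lowercase word longer than 3 characters to the pages containing it.

    `pages` is a list of plain-text strings (hypothetical input; the real
    script reads HTML files from disk).
    """
    index = defaultdict(list)
    for page_number, text in enumerate(pages):
        # Deduplicate words per page so each page is listed once per word
        words = {word.lower() for word in re.findall(r"\w+", text)}
        for word in sorted(words):
            if len(word) > 3:
                index[word].append(page_number)
    return dict(index)


def search(index, query):
    """Return the page numbers whose text contains the queried word."""
    return index.get(query.lower(), [])


pages = ["Unlock two LUKS devices at boot", "Weechat highmon in a bar"]
index = build_index(pages)

# Assumption: JSON is one possible "nice format for JavaScript"
payload = json.dumps(index, sort_keys=True)

print(search(index, "LUKS"))
print(search(index, "highmon"))
```

A client would download `payload` once and perform the same dictionary lookup locally, so no server-side code is needed at search time.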
diff --git a/generate_index.py b/generate_index.py
new file mode 100755
index 0000000..6b3ebd0
--- /dev/null
+++ b/generate_index.py
@@ -0,0 +1,48 @@
+#!/usr/bin/env python3
+
+import os
+from lxml import html
+import re
+from collections import defaultdict
+
+
+# List all files in path directory
+def list_directory(path):
+    fichier = []
+    for root, dirs, files in os.walk(path):
+        for i in files:
+            fichier.append(os.path.join(root, i))
+    return fichier
+
+
+def remove_common_words(words):
+    returned = [word for word in words if len(word) > 3]
+    return returned
+
+# =============================================================================
+samples = list_directory("samples/")
+index = defaultdict(list)
+
+i = 0
+for sample in samples:
+    with open(sample, 'r') as sample_fh:
+        content = sample_fh.read()
+
+    # Get text from HTML content (newlines become spaces so words don't merge)
+    words = html.fromstring(content).text_content().replace("\n", " ")
+    words = re.findall(r"[\w]+", words)
+    # Remove all punctuation etc., convert words to lower and delete duplicates
+    words = list(set([word.lower() for word in words]))
+
+    # Remove common words
+    words = remove_common_words(words)
+    # Stemming to reduce the number of words
+    # TODO : Could use http://tartarus.org/martin/PorterStemmer/
+
+    for word in words:
+        index[word].append(i)
+
+    i += 1
+
+print(samples)
+print(index.items())
diff --git a/samples/cryptdevice_multi.html b/samples/cryptdevice_multi.html
new file mode 100644
index 0000000..9723f41
--- /dev/null
+++ b/samples/cryptdevice_multi.html
@@ -0,0 +1,22 @@
+

I installed Arch on my laptop with an LVM on LUKS setup. But I have two disks in my laptop (which means at least two LUKS containers), and my LVM install spans both disks. So, I needed to unlock two devices at boot to be able to mount my system (which is something the default encrypt hook doesn't support in Arch). Here's a way to unlock multiple encrypted devices at boot (presented with two devices, but it can be used for more).

+ +

First, you need to install what is necessary to use cryptsetup and set the encrypt hook to be loaded (in mkinitcpio.conf), as described in the Arch wiki.

+ +

Then, copy the file /usr/lib/initcpio/hooks/encrypt to /usr/lib/initcpio/hooks/encrypt2. Edit this new file and replace every occurrence of cryptdevice and cryptkey with cryptdevice2 and cryptkey2. Also change the line

+
mkdir /ckey
+

to

+
if [ ! -d /ckey ]; then
+    mkdir /ckey
+fi
+
+

to avoid a warning being displayed at boot. Then load this encrypt2 hook in your mkinitcpio.conf.

+ +

Finally, edit your kernel command line parameters (in GRUB for example), adding the required cryptdevice and cryptkey (for the first device) and cryptdevice2 and cryptkey2 (for the second device).

+ +

This is the best solution I've found so far, but it requires manually updating the second hook whenever an update of the cryptsetup package changes the encrypt hook (not all updates do). Another solution was provided by the cryptsetup-multi package, but it is now obsolete, and this setup is the one that works best for me.
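Since the second hook has to be regenerated by hand after such updates, the edits described above could be scripted. The following is a minimal sketch, assuming the hook text contains the parameter names as whole words and a line consisting only of `mkdir /ckey`; the function name `make_second_hook` and the sample text are hypothetical, and the real hook file may be laid out differently.

```python
import re


def make_second_hook(hook_text: str) -> str:
    """Derive the encrypt2 hook text from the text of the stock encrypt hook."""
    # Rename the parameters; \b leaves already-renamed names untouched
    text = re.sub(r"\b(cryptdevice|cryptkey)\b", r"\g<1>2", hook_text)
    # Guard the mkdir so it only runs when /ckey is missing,
    # keeping the indentation of the original line
    text = re.sub(
        r"^([ \t]*)mkdir /ckey[ \t]*$",
        r"\1if [ ! -d /ckey ]; then\n\1    mkdir /ckey\n\1fi",
        text,
        flags=re.MULTILINE,
    )
    return text


# Hypothetical excerpt, only for illustration (not the real hook content)
sample = "run_hook() {\n    mkdir /ckey\n    echo ${cryptdevice} ${cryptkey}\n}\n"
print(make_second_hook(sample))
```

In practice one would read /usr/lib/initcpio/hooks/encrypt, pass its content through the function and write the result to /usr/lib/initcpio/hooks/encrypt2, then review the output before rebuilding the initramfs.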

diff --git a/samples/highmon_weechat.html b/samples/highmon_weechat.html new file mode 100644 index 0000000..14a3567 --- /dev/null +++ b/samples/highmon_weechat.html @@ -0,0 +1,42 @@ + +

I recently moved from Irssi+Screen to Weechat+Screen (and I'm planning to look at weechat interfaces in the future, to have a local IRC client connecting to my server and avoid any latency while typing on a slow internet connection). My first step was to get almost the same setup as with irssi. I'm very pleased with what I achieved, and weechat is definitely an excellent IRC client, although it sometimes lacks usable documentation…

+ +

To get something like my old irssi, I had to install some extensions, including :

+ + +

I extensively used this link and the other articles on weechat on this website, which is a reference in my opinion, to get a working base weechat configuration.

+ +

But one point that wasn't documented very well is the use of a hilight window without dedicating a buffer to it. Dedicating a buffer to the hilight window means having an open buffer in the main window, which is useless. You always select it accidentally by typing the wrong number for another buffer, and it's hideous in your buffer list (even though you can hide it from there). I don't know if this could be done in irssi, but in weechat, you can set highmon to use a bar instead of a buffer to display the "hilight window", and this is what we'll see in the following. I will assume you start with the highmon plugin installed and configured, with a hilight window such as the one from Pascal Poitras.

+ +

So, the first step is to tell highmon to use a bar for output instead of the standard buffer:

+
/set plugins.var.perl.highmon.output bar
+ +

Highmon should have created a bar automatically, to put the messages in. Check weechat.bar.highmon.* options to make sure it did. Next, type :

+
/set plugins.var.perl.highmon.bar_lines 250
+

to set the number of lines to be stored in your freshly created bar.

+ +

Then, you can edit all the preferences for the bar (size, size_max, position, priority, hide etc.) as for a standard bar, using the weechat.bar.highmon.* options. Note that priority is important if you have two bars with the same position. For instance, if two bars are positioned at the top, the priority property will determine which one is above the other.

+ +

One last point is that we'd like to have a title for the new hilight bar (which by default doesn't have any title). The hack is to use another plugin, text_item.py to display a bar with some text. To make a title "[Hilight Monitor]", just run (after having installed text_item.py):

+
/set plugins.var.python.text_item.hilight_monitor_title_text all "[Hilight Monitor]"
+/bar add highmon_title top 1 0 hilight_monitor_title_text
+
+

And play with the position, priority and colors for the newly created bar to have a nice setup :)

+ +

One last important thing is that, contrary to the buffer solution, you won't be able to easily clear the hilight window or scroll in it. But I found two aliases on #weechat (ty @silverd for the aliases) that you can bind to any key if you want:

+
/alias clear_highmon /mute /set plugins.var.perl.highmon.bar_lines -1;/mute /set weechat.bar.highmon.items "";/mute /set weechat.bar.highmon.items "highmon";/mute /set plugins.var.perl.highmon.bar_lines 250
+/alias scroll_highmon_down /bar scroll highmon * y+100%
+/alias scroll_highmon_up /bar scroll highmon * y-100%
+
+ +

You can now clear the hilight window with /clear_highmon and scroll in it with the other aliases. So, I think you are good to go for a (quite) perfect weechat setup :)