Initial commit

Python script generates the index correctly, but not optimized at all...
Phyks 2013-12-26 17:16:12 +01:00
commit eaaa64bdea
4 changed files with 134 additions and 0 deletions

22
README.md Normal file

@ -0,0 +1,22 @@
BloomJS
====
A JavaScript search engine.
## Basic idea
I have a static weblog, generated with [Blogit](https://github.com/phyks/blogit) (caution, this code is ugly). As I only want to serve HTML files from my server, I needed a way to let users search my blog.
An index is generated by a Python script when the pages are generated, and is downloaded on demand by the client when they want to search the content.
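As a rough illustration of the idea, here is a minimal sketch of an inverted index and a lookup against it. The `documents` data and the `search` helper are hypothetical and only meant to show the principle; the real client-side search would run in JavaScript.

```python
from collections import defaultdict

# Hypothetical miniature corpus: document id -> words found in that page
documents = {
    0: ["luks", "arch", "boot"],
    1: ["weechat", "hilight", "arch"],
}

# Inverted index: word -> list of ids of the documents containing it
index = defaultdict(list)
for doc_id, words in documents.items():
    for word in set(words):
        index[word].append(doc_id)


def search(index, query):
    """Return the sorted ids of the documents containing every query word."""
    terms = query.lower().split()
    if not terms:
        return []
    results = set(index.get(terms[0], []))
    for term in terms[1:]:
        results &= set(index.get(term, []))
    return sorted(results)


print(search(index, "arch"))       # [0, 1]
print(search(index, "arch luks"))  # [0]
```

Intersecting the per-word lists means a multi-word query only returns pages that contain all of its words.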
## Files
* `generate_index.py`: Python script to generate the index (runs only at page generation) in a format that is easy to use from JavaScript
* `samples/`: samples for testing purposes (taken from my blog articles)
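One possible "nice format for JavaScript" is JSON. The sketch below is an assumption, not what `generate_index.py` currently outputs (it only prints the index); the `pages` file names and the layout of `payload` are hypothetical.

```python
import json

# Hypothetical index in the shape generate_index.py builds: word -> page numbers
index = {"luks": [0], "weechat": [1], "arch": [0, 1]}
# Page numbers refer to positions in this list (file names are made up)
pages = ["samples/luks.html", "samples/weechat.html"]

# Compact separators keep the file the client downloads small
payload = json.dumps(
    {"pages": pages, "index": index},
    sort_keys=True,
    separators=(",", ":"),
)
print(payload)
```

On the client side, such a file could be read with `JSON.parse` once it has been downloaded.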
## Notes
I got the idea while reading [this page](http://www.stavros.io/posts/bloom-filter-search-engine/?print) found on [Sebsauvage's shaarli](http://sebsauvage.net/links/). I searched a bit for existing code doing what I wanted and found these:
* https://github.com/olivernn/lunr.js
* https://github.com/reyesr/fullproof
But I wasn't fully satisfied by the first one, and I found the second one too heavy and complicated for my purpose, so I ended up coding this.
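The page linked above is, as its URL suggests, about using Bloom filters for search. For reference, here is a minimal sketch of a Bloom filter. It is not what `generate_index.py` does in this commit, and the `BloomFilter` class, its parameters and the hashing scheme are assumptions chosen for illustration.

```python
import hashlib


class BloomFilter:
    """Set-like structure with possible false positives but no false negatives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, word):
        # Derive num_hashes bit positions by salting the word with a counter
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{word}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, word):
        for pos in self._positions(word):
            self.bits |= 1 << pos

    def __contains__(self, word):
        return all((self.bits >> pos) & 1 for pos in self._positions(word))


article_filter = BloomFilter()
for word in ["luks", "cryptsetup", "arch"]:
    article_filter.add(word)

print("luks" in article_filter)  # True: words that were added are always found
```

A word that was never added is usually reported as absent, but may occasionally be reported as present. That false-positive rate is the price paid for the compact size.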

48
generate_index.py Executable file

@ -0,0 +1,48 @@
#!/usr/bin/env python3
import os
import re
from collections import defaultdict

from lxml import html


def list_directory(path):
    """List all files under the `path` directory, recursively."""
    fichier = []
    for root, dirs, files in os.walk(path):
        for i in files:
            fichier.append(os.path.join(root, i))
    return fichier


def remove_common_words(words):
    """Drop words of 3 characters or fewer, as a crude common-word filter."""
    returned = [word for word in words if len(word) > 3]
    return returned


# =============================================================================
samples = list_directory("samples/")
index = defaultdict(list)
i = 0  # position of the current sample in `samples`
for sample in samples:
    with open(sample, 'r', encoding='utf-8') as sample_fh:
        content = sample_fh.read()

    # Get text from HTML content (newlines become spaces so words stay apart)
    words = html.fromstring(content).text_content().replace("\n", " ")
    words = re.findall(r"\w+", words)
    # Remove all punctuation etc., convert words to lower and delete duplicates
    words = list(set([word.lower() for word in words]))
    # Remove common words
    words = remove_common_words(words)
    # Stemming to reduce the number of words
    # TODO : Could use http://tartarus.org/martin/PorterStemmer/

    # Record that sample number i contains each remaining word
    for word in words:
        index[word].append(i)
    i += 1

print(samples)
print(index.items())


@ -0,0 +1,22 @@
<!--
@author=Phyks
@date=17112013-0800
@title=Decrypt multiple LUKS containers at boot on Arch
@tags=Arch, Linux
-->
<p>I installed Arch on my laptop with an LVM on LUKS setup. But I have two disks in my laptop (which means at least two LUKS containers) and my LVM install extends over both disks. So, I needed to unlock two devices at boot to be able to mount my system (which is something the default encrypt hook doesn't support in Arch). Here's a way to unlock multiple encrypted devices (presented with 2 devices, but it can be used for more).</p>
<p>First, you need to install the necessary stuff to use cryptsetup and set the encrypt hook to be loaded (in mkinitcpio.conf) as described in <a href="https://wiki.archlinux.org/index.php/Dm-crypt_with_LUKS#.2Fetc.2Fmkinitcpio.conf">Arch wiki</a>.</p>
<p>Then, copy the file /usr/lib/initcpio/hooks/encrypt to /usr/lib/initcpio/hooks/encrypt2. Edit this new file and replace every occurrence of cryptdevice and cryptkey with cryptdevice2 and cryptkey2. Also replace the line</p>
<pre>mkdir /ckey</pre>
<p>with</p>
<pre>if [ ! -d /ckey ]; then
    mkdir /ckey
fi
</pre>
<p>in order to avoid the display of a warning on boot. Load this encrypt2 hook in your mkinitcpio.conf.</p>
<p>Finally, edit your command line parameters (in Grub for example), adding the required cryptdevice, cryptkey (for first device) and cryptdevice2, cryptkey2 (for second device).</p>
<p>This is the best solution I've found so far, but it requires manually updating the second hook whenever an update of the cryptsetup package changes the encrypt hook (not all updates do). Another solution was provided by the cryptsetup-multi package, but it is now obsolete, and this setup is the one that works best for me.</p>


@ -0,0 +1,42 @@
<!--
@author=Phyks
@date=25122013-0133
@title=Hilight window in weechat
@tags=Weechat
-->
<p>I recently moved from Irssi+Screen to Weechat+Screen (and I'm planning to look at weechat interfaces in the future, to have a local irc client connecting to my server and avoid any latency while typing on a low-speed internet connection). My first step was to get almost the same setup as irssi. I'm very pleased with what I achieved, and weechat is definitely an excellent irc client, although it sometimes lacks a bit of usable documentation…</p>
<p>To get something like my old irssi, I had to install some extensions, including:</p>
<ul>
<li>text_effects.lua to have some inline text decoration such as *bold* to display bold in bold</li>
<li>buffers.pl to have a list of opened buffers</li>
<li>iset.pl to set configuration options easily</li>
<li>screen_away.py (which is very efficient!) to automatically set me away when I detach my screen session</li>
</ul>
<p>I extensively used <a href="http://pascalpoitras.com/2013/05/25/my-weechat-configuration/">this link</a> and the other articles on weechat on this website, which is a reference in my opinion, to get a working base weechat configuration.</p>
<p>But one point that wasn't documented very well is the use of a hilight window without dedicating a buffer to it. Dedicating a buffer to the hilight window means having an open buffer in the main window, which is useless. You always select it accidentally by typing the wrong number for another buffer, and it's hideous in your buffer list (even though you can hide it from there). I don't know if this could be done in irssi, but in weechat, you can set highmon to use a bar instead of a buffer to display the "hilight window", and this is what we'll see in the following. I will assume you start with the highmon plugin installed and configured, with a hilight window such as the one from Pascal Poitras.</p>
<p>So, the first step is to tell highmon to use a bar for output instead of the standard buffer:</p>
<pre>/set plugins.var.perl.highmon.output bar</pre>
<p>Highmon should have created a bar automatically, to put the messages in. Check the weechat.bar.highmon.* options to make sure it did. Next, type:</p>
<pre>/set plugins.var.perl.highmon.bar_lines 250</pre>
<p>to set the number of lines to be stored in your freshly created bar.</p>
<p>Then, you can edit all the preferences for the bar (size, size_max, position, priority, hide etc.) as for a standard bar, using the weechat.bar.highmon.* options. Note that priority is important if you have two bars with the same position. For instance, if two bars are positioned at the top, the priority property will determine which one is above the other.</p>
<p>One last point is that we'd like to have a title for the new hilight bar (which by default doesn't have any title). The hack is to use another plugin, text_item.py to display a bar with some text. To make a title "[Hilight Monitor]", just run (after having installed text_item.py):</p>
<pre>/set plugins.var.python.text_item.hilight_monitor_title_text all "[Hilight Monitor]"
/bar add highmon_title top 1 0 hilight_monitor_title_text
</pre>
<p>And play with the position, priority and colors for the newly created bar to have a nice setup :)</p>
<p>One last important thing is that, contrary to the buffer solution, you won't be able to easily clear the hilight window or scroll in it. But I found two aliases on #weechat (ty @silverd for the aliases) that you can bind to any key if you want:</p>
<pre>/alias clear_highmon /mute /set plugins.var.perl.highmon.bar_lines -1;/mute /set weechat.bar.highmon.items "";/mute /set weechat.bar.highmon.items "highmon";/mute /set plugins.var.perl.highmon.bar_lines 250
/alias scroll_highmon_down /bar scroll highmon * y+100%
/alias scroll_highmon_up /bar scroll highmon * y-100%
</pre>
<p>You can now clear the hilight window with /clear_highmon and scroll in it with the other aliases. So, I think you are good to go for a (quite) perfect weechat setup :)</p>