<!DOCTYPE html>
<html lang="fr">
<head>
<meta charset="utf-8">
<title>Phyks' blog - 2014/11</title>
<link rel="stylesheet" href="//"/>
<link type="text/plain" rel="author" href="//"/>
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<link rel="alternate" type="application/rss+xml" title="RSS" href="//" />
</head>
<body>
<div id="wrapper">
<!-- Sidebar -->
<aside id="sidebar-wrapper">
<header><h1><a href="//">~Phyks</a></h1></header>
<h2>Categories</h2>
<nav id="sidebar-tags">
<div class="tag"><a href="//">/Arch (6)</a> </div><div class="tag"><a href="//ébergement.html">/Autohébergement (6)</a> </div><div class="tag"><a href="//">/Dev (16)</a> </div><div class="tag"><a href="//">/DIY (4)</a> </div><div class="tag"><a href="//Électronique.html">/Électronique (4)</a> </div><div class="tag"><a href="//">/Libre (14)</a> </div><div class="tag"><a href="//">/Linux (12)</a> </div><div class="tag"><a href="//">/Phyks (16)</a> </div><div class="tag"><a href="//">/Smartphone (4)</a> </div><div class="tag"><a href="//">/Vim (2)</a> </div><div class="tag"><a href="//">/Web (14)</a> </div><div class="tag"><a href="//">/Weechat (4)</a> </div>
</nav>
<h2>Latest articles</h2>
<ul id="sidebar-articles">
<li><a href="//">Getting ipv6 to work with a Kimsufi server</a></li><li><a href="//">Proof-of-concept: BloomySearch, a (JavaScript) client-side search engine for static websites</a></li><li><a href="//">Balancer le son de ses hauts-parleurs sur le réseau</a></li><li><a href="//">Utiliser son PC sous Arch pour connecter un Raspberry Pi à Internet</a></li><li><a href="//">Sortez vos emails, c'est pas sale&nbsp;!</a></li><li><a href="//">Archives</a></li>
</ul>
<h2>Links</h2>
<ul id="sidebar-links">
<li><a href="//" title="Contact">Contact me</a></li>
<li class="monospace"><a href="//" title="Mon Shaarli">find ~phyks -type l</a></li>
<li><a href="" rel="me" title="Github">My GitHub</a></li>
<li><a href="//" title="Divers">Miscellaneous</a></li>
</ul>
</aside>
<!-- Page content -->
<header id="header">
<h1><a href="//">~Phyks</a></h1>
</header>
<div id="note_responsive">
<p><em>Note</em>: click the blue strip on the left to show the menu.</p>
</div>
<div id="articles">
<article>
<aside>
<p class="day">09</p>
<p class="month">November</p>
</aside>
<div class="article">
<header><h1 class="article_title"><a href="//">Getting ipv6 to work with a Kimsufi server</a></h1></header>
<!--
@author=Phyks
@date=09112014-1945
@title=Getting ipv6 to work with a Kimsufi server
@tags=Phyks
-->
<p>Since yesterday, my server (<a href=""></a>) should be reachable over IPv6. It was not before, out of laziness and for lack of configuration. As setting up IPv6 on a Kimsufi server is not really straightforward (outdated documentation, information scattered across the web, and real errors that are hard to tell apart from basic mistakes…), I think it is worth keeping some notes here. I hope they help someone.</p>
<p>The <a href="">documentation</a> explains that OVH does not enable IPv6 autoconfiguration on its servers, so you have to configure the IP address and the default route yourself.</p>
<p>Finding your IPv6 address is easy: it is listed in the IP section of your OVH manager. Add it to your network interface:</p>
<p><code>ip -6 addr add YOUR_IPV6_ADDRESS/64 dev eth0</code></p>
<p>Then you have to add the default gateway manually. To get its address, take the /64 prefix of your IPv6 address (its first four groups), replace the last two hexadecimal digits of that prefix with <code>FF</code>, and append <code>FF:FF:FF:FF</code> as the last four groups. For example, <code>2001:41d0:1:4462::1/64</code> gives the default gateway <code>2001:41d0:1:44FF:FF:FF:FF:FF</code>.</p>
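<p>The gateway rule above can be sketched in Python with the standard <code>ipaddress</code> module. This is a minimal sketch that applies the rule exactly as described in this post (keep the first three groups and the first two hex digits of the fourth, set everything else to <code>ff</code>); the helper name <code>kimsufi_gateway</code> is made up for this example, and nothing guarantees the rule holds for every OVH server.</p>

```python
import ipaddress


def kimsufi_gateway(address: str) -> ipaddress.IPv6Address:
    """Hypothetical helper: derive the default gateway of a Kimsufi server
    from its IPv6 address, following the rule described in this post."""
    # Accepts "2001:41d0:1:4462::1/64" as well as a bare address.
    ip = ipaddress.IPv6Interface(address).ip
    # exploded gives 8 groups of 4 hex digits: "2001:41d0:0001:4462:0000:..."
    groups = ip.exploded.split(":")
    # Replace the last two hex digits of the /64 prefix with "ff"...
    groups[3] = groups[3][:2] + "ff"
    # ...and use ff:ff:ff:ff as the last four groups.
    groups[4:] = ["ff"] * 4
    return ipaddress.IPv6Address(":".join(groups))


if __name__ == "__main__":
    print(kimsufi_gateway("2001:41d0:1:4462::1/64"))  # 2001:41d0:1:44ff:ff:ff:ff:ff
```

<p>Python prints the address in lowercase; <code>ip</code> accepts either case.</p>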
<p>Next, add a default route <em>via</em> this gateway:</p>
<p><code>ip -6 r a default via 2001:41d0:1:44FF:FF:FF:FF:FF</code></p>
<p>This is the standard procedure described in the OVH guide and in many posts around the web, such as <a href="">this one</a> (in French). It may work in some cases, but in mine the gateway was unreachable, so this route could not be added.</p>
<p>I found <a href=";p=44965&amp;viewfull=1#post44965">a comment</a> on the OVH forum giving a solution.</p>
<p>The catch is that the gateway lies outside your /64 (<code>44FF</code> instead of <code>4462</code> in the fourth group), so the kernel has no route telling it that the gateway is directly reachable. You should first add a route to the gateway through <code>eth0</code>:</p>
<p><code>ip -6 r a 2001:41d0:1:44FF:FF:FF:FF:FF dev eth0</code></p>
<p>Then you can add the default route <em>via</em> this gateway:</p>
<p><code>ip -6 r a default via 2001:41d0:1:44FF:FF:FF:FF:FF</code></p>
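<p>To summarize the working order of operations, here is a small Python sketch that generates the full command sequence. The address, gateway and interface name are placeholders taken from the example above, and <code>ipv6_commands</code> is a hypothetical helper; <code>ip -6 r a</code> is simply the abbreviated form of <code>ip -6 route add</code>. These commands only change the running configuration, so they still have to be made persistent in your distribution's network configuration to survive a reboot.</p>

```python
def ipv6_commands(address: str, gateway: str, dev: str = "eth0") -> list[str]:
    """Hypothetical helper returning the iproute2 commands, in order."""
    return [
        # 1. Assign the static IPv6 address (no autoconfiguration on Kimsufi).
        f"ip -6 addr add {address} dev {dev}",
        # 2. Declare the gateway as directly reachable on the interface,
        #    since it lies outside the /64 configured in step 1.
        f"ip -6 route add {gateway} dev {dev}",
        # 3. Only now can the default route go through that gateway.
        f"ip -6 route add default via {gateway}",
    ]


if __name__ == "__main__":
    for command in ipv6_commands("2001:41d0:1:4462::1/64",
                                 "2001:41d0:1:44FF:FF:FF:FF:FF"):
        print(command)
```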
<footer><p class="date">On 09/11/2014 at 19:45</p>
<p class="tags">Tags: <a href="//">Phyks</a></p></footer>
</div>
</article>
<article>
<aside>
<p class="day">08</p>
<p class="month">November</p>
</aside>
<div class="article">
<header><h1 class="article_title"><a href="//">Proof-of-concept: BloomySearch, a (JavaScript) client-side search engine for static websites</a></h1></header>
<!--
@author=Phyks
@date=08112014-1845
@title=Proof-of-concept: BloomySearch, a (JavaScript) client-side search engine for static websites
@tags=Dev, Web
-->
<h2>Overview</h2>
<p>Many websites and blogs are statically generated, and the web server only serves static files. This is the case for many documentation websites and for more and more blogs, this one included, as tools like <a href="">Jekyll</a>&nbsp;/ <a href="">Pelican</a> spread.</p>
<p>This greatly reduces the complexity of the website and the load on the web server: all the complex logic runs once, at generation time.</p>
<p>However, it also means there are no dynamic pages to handle search queries. You are then left with two (or three) options:</p>
<ol>
<li>Use an external search engine, such as an embedded Google search box. This raises privacy concerns and makes you depend on an external service.</li>
<li>(Use a JavaScript filter such as the <a href="">filters</a> provided by AngularJS. This only works on the displayed content and is not a real solution.)</li>
<li>Stop worrying about search on your website and let users <code>wget</code> and <code>grep</code> it on their own computers. This is not the most user-friendly solution…</li>
</ol>
<p>There are a couple of existing solutions, mostly based on <a href="">Lunr.js</a>, which builds an index from the available articles and uses it for full-text search. This is the best solution I have found so far, but it is still not perfect: although it uses a stemmer and generates an index to reduce the amount of data to transfer, the data is not stored very efficiently, and the full index is sent as JSON. An example implementation for Jekyll is available through <a href="">the jekyll-lunr-js-search plugin</a>.</p>
<p>I had the idea of a client-side search engine in mind for a while, but faced the same problem as Lunr.js: how to avoid sending a full (very large) index over the network to every single client? Without an optimized data structure, this basically means sending the content of the website to the client a second time. This may not be a practical problem nowadays, as bandwidth is not always the limiting resource, but in my opinion it is still bad practice, especially if your website may be accessed from mobile devices.</p>
<p>I came across <a href="">this article</a> by Stavros Korokithakis and thought something similar could be done directly in the browser. Instead of storing the index in a standard dictionary, the article proposes using one Bloom filter per article. Bloom filters are probabilistic data structures that record whether an element belongs to a set using a fixed number of bits. They can return false positives but never false negatives: if an element is in the set, the filter always answers <code>True</code>, but if it is not, the filter may still, with a small probability, claim that it is. The <a href="">Wikipedia page</a> on the subject covers everything needed to understand these data structures.</p>
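<p>To make the idea concrete, here is a minimal Bloom filter in Python. It is only a sketch: the bit array size <code>num_bits</code> and the number of hash functions <code>num_hashes</code> are given explicitly, and the bit positions are derived from a single SHA-256 digest by double hashing. That hashing scheme is an arbitrary choice for this example, not the one used by the bloomfilters.js library mentioned below.</p>

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter sketch: a bit array plus k bit positions per item."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)  # all bits start at 0

    def _positions(self, item: str):
        # Double hashing: k positions h1 + i * h2 (mod m), with h1 and h2 taken
        # from one SHA-256 digest (an arbitrary choice for this sketch).
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # True only if all k bits are set: possibly a false positive,
        # never a false negative.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


if __name__ == "__main__":
    bf = BloomFilter(num_bits=1024, num_hashes=4)
    for word in ("bloom", "filter", "search"):
        bf.add(word)
    print("bloom" in bf)  # True
```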
<p>I wrote it for my blog, which means a Python script that generates the index when the pages are generated, and a client-side search engine in JavaScript, running in the browser.</p>
<p>A demo is available <a href="">here</a>. It indexes all the articles of my blog as of this writing, 160k characters in total, with an index of only 7&nbsp;kB at a 10% false positive rate, which may be a bit high for a really reliable search engine. Lowering the error rate increases the index size (11&nbsp;kB at a 1% false positive rate for the same 160k characters).</p>
<h2>Details of the implementation</h2>
<p>As JavaScript is not the easiest language for hashing and binary data manipulation, I started with the client-side search engine: adapting the Python code to the JS library would then be easier than the other way around. I found <a href="">this bloomfilters.js library</a> by Jason Davies, which did most of the job and needed few modifications. I edited it a bit so that a filter can be built from a <code>capacity</code> and an <code>error_rate</code>, instead of an explicit number of bits and number of hash functions. This forked version is available <a href="">here</a>.</p>
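<p>Building a filter from a <code>capacity</code> and an <code>error_rate</code> boils down to computing the number of bits <em>m</em> and the number of hash functions <em>k</em>. The sketch below uses the standard textbook formulas, <em>m</em> = -<em>n</em> ln <em>p</em> / (ln 2)² and <em>k</em> = (<em>m</em> / <em>n</em>) ln 2, for <em>n</em> items and a target false positive rate <em>p</em>. The function name <code>bloom_parameters</code> is made up, and the forked bloomfilters.js may compute or round these values differently.</p>

```python
import math


def bloom_parameters(capacity: int, error_rate: float) -> tuple[int, int]:
    """Return (num_bits, num_hashes) for `capacity` items at `error_rate`."""
    if capacity <= 0 or not 0 < error_rate < 1:
        raise ValueError("need capacity > 0 and 0 < error_rate < 1")
    # m = -n ln(p) / (ln 2)^2, rounded up to a whole number of bits
    num_bits = math.ceil(-capacity * math.log(error_rate) / math.log(2) ** 2)
    # k = (m / n) ln 2, with at least one hash function
    num_hashes = max(1, round(num_bits / capacity * math.log(2)))
    return num_bits, num_hashes


if __name__ == "__main__":
    print(bloom_parameters(1000, 0.1))   # (4793, 3)
    print(bloom_parameters(1000, 0.01))  # (9586, 7)
```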
<p>Then I reimplemented this library in Python, to generate Bloom filters that the JavaScript code can read.</p>
<h3>Server side</h3>
<p>The generation script takes every article in a given directory and, for each of them:</p>
<ol>
<li>It builds the set of all the words in the article, ignoring words that are too short.</li>
<li>It applies the <a href="">Porter stemming algorithm</a> to drastically reduce the number of words to keep.</li>
<li>It generates a Bloom filter containing all of these words.</li>
</ol>
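<p>The first two steps above can be sketched as follows. The minimum word length of 3 is an assumption, and <code>crude_stem</code> is a deliberately naive suffix-stripping stand-in, <em>not</em> the Porter algorithm; a real Porter stemmer (for instance <code>PorterStemmer().stem</code> from NLTK's <code>nltk.stem.porter</code>) can be passed through the <code>stem</code> parameter instead. Each resulting set would then be added to the article's own Bloom filter.</p>

```python
import re


def crude_stem(word: str) -> str:
    """Naive stand-in for a real stemmer (NOT the Porter algorithm)."""
    for suffix in ("ing", "ed", "s"):
        # Only strip the suffix if at least 3 characters remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def extract_words(text: str, min_length: int = 3, stem=crude_stem) -> set[str]:
    """Steps 1 and 2: unique words of an article, short words dropped, stemmed."""
    # \w matches Unicode word characters, so accented letters are kept.
    words = re.findall(r"\w+", text.lower())
    return {stem(word) for word in words if len(word) >= min_length}


if __name__ == "__main__":
    print(sorted(extract_words("A Bloom filters are filtering words")))
    # ['are', 'bloom', 'filter', 'word']
```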
<p>Finally, it concatenates all the per-article Bloom filters into a binary file, to be sent to the client. It also generates a JSON index mapping the id of each Bloom filter in the binary file to the URL and title of the corresponding article.</p>
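<p>The packing step could look like the sketch below. The layout is an assumption made for this example: each filter is already serialized to <code>bytes</code>, the filters are simply concatenated, and the index stores, for each filter id, the article URL and title plus the byte offset and length of its filter so that the client can slice the binary file. The names <code>pack_index</code>, <code>offset</code> and <code>length</code> are hypothetical, and the URLs in the usage are placeholders.</p>

```python
import json


def pack_index(filters: list[bytes], articles: list[dict]) -> tuple[bytes, dict]:
    """Concatenate serialized Bloom filters and build the JSON-ready index.

    `articles[i]` holds the "url" and "title" of the article whose filter is
    `filters[i]`; a filter's id is its position in the binary file.
    """
    if len(filters) != len(articles):
        raise ValueError("expected exactly one filter per article")
    blob = bytearray()
    index = {}
    for filter_id, (data, article) in enumerate(zip(filters, articles)):
        index[filter_id] = {
            "url": article["url"],
            "title": article["title"],
            "offset": len(blob),  # where this filter starts in the binary file
            "length": len(data),
        }
        blob += data
    return bytes(blob), index


if __name__ == "__main__":
    blob, index = pack_index(
        [b"\x01\x02", b"\x03"],
        [{"url": "/a.html", "title": "A"}, {"url": "/b.html", "title": "B"}],
    )
    print(len(blob))  # 3
    # json.dumps turns the integer ids into string keys ("0", "1").
    print(json.dumps(index))
```

<p>Writing <code>blob</code> to a file opened in binary mode and the <code>json.dumps</code> output to a text file gives the two files sent to the client.</p>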
<h3>Client side</h3>
<p>On page load, the JavaScript code downloads the binary file containing the Bloom filters (see <a href="">this MDN doc</a> for more details) and the JSON index, and rebuilds the Bloom filters on the client side.</p>
<p>When the user searches for something, the script splits the query into words and checks each article's Bloom filter for these words. That's it =)</p>
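<p>The search loop itself is short. The sketch below mirrors it in Python rather than JavaScript, to keep a single language across these examples. It assumes that a page matches only if <em>all</em> the query words may be in its filter (the post does not say whether words are combined with AND or OR), and it accepts any object supporting <code>in</code>, so plain Python sets stand in for Bloom filters in the usage, with placeholder URLs. A real implementation should also normalize and stem the query words exactly as the index generator does.</p>

```python
def search(query: str, filters: list, index: list[dict]) -> list[dict]:
    """Return the index entries whose filter may contain every query word.

    `filters[i]` supports `word in filters[i]` (e.g. a Bloom filter) and
    `index[i]` holds the URL and title of the corresponding article.
    """
    words = query.lower().split()
    if not words:
        return []
    results = []
    for filter_id, bloom in enumerate(filters):
        # Bloom filters never give false negatives, so every page containing
        # all the words is returned; other pages may slip in as false positives.
        if all(word in bloom for word in words):
            results.append(index[filter_id])
    return results


if __name__ == "__main__":
    filters = [{"bloom", "filter"}, {"ipv6", "kimsufi"}]
    index = [{"url": "/bloomysearch.html", "title": "BloomySearch"},
             {"url": "/ipv6.html", "title": "IPv6"}]
    print(search("Bloom filter", filters, index))
    # [{'url': '/bloomysearch.html', 'title': 'BloomySearch'}]
```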
<h2>(Fun) facts found while reimplementing the Bloom filters library in Python</h2>
<p>The first problem I had to deal with was the difference between the JavaScript <code>Number</code> type and Python's <code>int</code>. JavaScript has a single type for all numbers (integers or floats), <code>Number</code> (see <a href="">this SO thread</a>). Numbers are 64-bit floating point values, which can represent integers exactly only up to 2<sup>53</sup> in magnitude. However, bitwise operators first convert their operands to 32-bit integers (signed, except for <code>&gt;&gt;&gt;</code>). This needs care, because Python's <code>int</code> is not limited to 32 bits (<a href=""></a>): when a bitwise operation overflows 32 bits in JavaScript, the result wraps around, whereas the same operation in Python does not overflow the same way.</p>
<p>The solution to this problem was to use <code>ctypes.c_int</code> in Python for bitwise operations, as proposed <a href="">here</a>.</p>
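<p>Here is a small sketch of JavaScript's conversion to a signed 32-bit integer (the ToInt32 step applied by bitwise operators) in Python, once with <code>ctypes.c_int32</code>, in the spirit of the <code>ctypes.c_int</code> trick above, and once with plain arithmetic, which avoids depending on the platform's C types. Both function names are made up for this example, and both only handle integer inputs.</p>

```python
import ctypes


def to_int32_ctypes(x: int) -> int:
    # ctypes does no overflow checking: the value is silently truncated to 32 bits.
    return ctypes.c_int32(x).value


def to_int32(x: int) -> int:
    # Pure-Python equivalent: wrap into [-2**31, 2**31), like JavaScript's ToInt32.
    return (x + 2**31) % 2**32 - 2**31


if __name__ == "__main__":
    # In JavaScript, (1 << 31) is -2147483648 and (0xFFFFFFFF | 0) is -1.
    print(1 << 31, to_int32(1 << 31))  # 2147483648 -2147483648
    print(to_int32(0xFFFFFFFF), to_int32_ctypes(0xFFFFFFFF))  # -1 -1
```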
<p>Another problem was the behaviour of the modulo operator with negative numbers, which differs between Python and JavaScript. Unlike in C, C++ and JavaScript, where the result has the sign of the dividend, Python's modulo operator (<code>%</code>) always returns a number with the same sign as the divisor (<a href="">Source</a>). So the C behaviour has to be reimplemented in a modulo function in Python.</p>
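<p>A minimal sketch of such a function (the name <code>c_mod</code> is made up): the result takes the sign of the dividend, as with <code>%</code> in JavaScript and in C99 and later. For floats, the standard <code>math.fmod</code> already follows this convention; the integer version below avoids going through floating point, which would lose precision on large integers.</p>

```python
import math


def c_mod(a: int, b: int) -> int:
    """Remainder with the sign of the dividend, like JavaScript's a % b."""
    if b == 0:
        # JavaScript would return NaN here; raising is a choice of this sketch.
        raise ZeroDivisionError("modulo by zero")
    r = abs(a) % abs(b)
    return -r if a < 0 else r


if __name__ == "__main__":
    print(-7 % 3, c_mod(-7, 3), math.fmod(-7, 3))  # 2 -1 -1.0
    print(7 % -3, c_mod(7, -3))  # -2 1
```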
<p>Finally, unlike JavaScript, Python has no logical right shift operator (JavaScript's <code>&gt;&gt;&gt;</code>, which shifts in zeros), see <a href="">this SO thread</a>.</p>
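<p>JavaScript's <code>&gt;&gt;&gt;</code> can be emulated by first reinterpreting the value as an unsigned 32-bit integer and then shifting; the shift count is reduced modulo 32, as JavaScript does. A minimal sketch (the name <code>urshift</code> is made up):</p>

```python
def urshift(x: int, n: int) -> int:
    """Emulate JavaScript's x >>> n (unsigned, zero-filling right shift)."""
    # Reinterpret x as an unsigned 32-bit integer, then shift by n mod 32.
    return (x % 2**32) >> (n & 31)


if __name__ == "__main__":
    # Python's own >> keeps the sign: -1 >> 28 is still -1.
    print(urshift(-1, 0), urshift(-1, 28), -1 >> 28)  # 4294967295 15 -1
```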
<footer><p class="date">On 08/11/2014 at 18:45</p>
<p class="tags">Tags: <a href="//">Dev</a>, <a href="//">Web</a></p></footer>
</div>
</article>
</div>
<footer id="rss">
<p><a href="//"><img src="//" alt="RSS"/></a></p>
</footer>
</div>
</body>
</html>