Friday, March 27, 2015

Diceware for polyglots — Ultra-secure passphrases

The Intercept had an interesting article yesterday explaining how to set up a good master passphrase using Diceware. In a nutshell, in Diceware you roll a die several times to choose each word from a large word list. These words make up your passphrase, and the more words you choose this way, the higher the entropy of your passphrase.

While the standard Diceware list contains English words, lists for several other languages are now available as well. That is nice, especially if English isn't your first language and you might have trouble remembering some of the words. On the other hand, if you have no trouble remembering words in a second language, why stop there? Let's combine several languages. Hurray for polyglots!

Diceware for polyglots

Diceware for polyglots adds one layer to the standard Diceware protocol. Here's how it works:
  1. Choose the word lists for the languages you are comfortable with.
  2. Roll the dice to select a word list.
  3. Roll the dice several times to select a word from that list (as in standard Diceware).
  4. Go back to step 2 and repeat until you have selected enough words.
"Enough words" in step 4 is around 7 nowadays in standard Diceware, although polyglot Diceware may reach the same entropy with fewer words, since picking the list adds some randomness to every word.
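The steps above can be sketched in a few lines of Python. This is a minimal sketch, not a replacement for physical dice: it assumes Python's `secrets` module as the source of randomness, and the function names and the `word_lists` dictionary are hypothetical. It also computes the entropy per word, assuming the lists share no words and each list is picked with equal probability.

```python
import math
import secrets


def polyglot_diceware(word_lists, n_words):
    """Pick n_words: first a word list uniformly at random, then a word from it.

    word_lists maps a (hypothetical) language name to its list of words.
    secrets.choice stands in for physical dice rolls here.
    """
    languages = sorted(word_lists)
    words = []
    for _ in range(n_words):
        lang = secrets.choice(languages)                 # step 2: pick a list
        words.append(secrets.choice(word_lists[lang]))   # step 3: pick a word
    return words


def entropy_per_word(list_sizes):
    """Bits of entropy per word, assuming the lists share no words.

    With k lists of sizes N_i, each picked with probability 1/k:
    H = log2(k) + (1/k) * sum(log2(N_i)).
    """
    k = len(list_sizes)
    return math.log2(k) + sum(math.log2(n) for n in list_sizes) / k
```

For example, `entropy_per_word([7776])` gives about 12.9 bits per word for the standard list, so seven words give about 90.5 bits. With five lists of 7,776 words each, `entropy_per_word([7776] * 5)` gives about 15.2 bits per word, so six words already exceed seven standard Diceware words. Words that appear in more than one list (e.g. loanwords) lower these numbers slightly. With real dice and five lists, one way to keep the list choice uniform is to re-roll whenever a six comes up.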

Here's a random seven-word passphrase I just generated that way using five languages:
i've lauf ugh heuvel lanudo myope 31º
Entropy should be pretty good on this one.

Addendum: Diceware for coders

If you're a coder, why stop at natural languages? The key to Diceware is that the words are easy to remember. So if you're a coder, you can also add lists for programming languages you're very familiar with, though these lists probably have fewer words.

Here's a random seven-word passphrase combining two natural languages and one programming language:
volt typedef gordon dedans static_cast foyers
Again, very nice. Sorting them into a "sentence" may even improve memorability without harming entropy too much.
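How much can reordering cost? If the words were drawn in a uniformly random order and you then arrange them in an order an attacker could predict, you lose at most log2(n!) bits for n words. A small Python sketch of that bound (the function name is illustrative):

```python
import math


def max_reordering_loss(n_words):
    """Upper bound (in bits) on the entropy lost by putting n_words in a fixed order.

    An attacker who knows the ordering rule only has to try one arrangement
    instead of up to n! of them.
    """
    return math.log2(math.factorial(n_words))


for n in (6, 7):
    print(n, round(max_reordering_loss(n), 2))
```

For six words the bound is about 9.5 bits and for seven about 12.3 bits. A "sentence" order is usually less predictable than, say, alphabetical sorting, so the actual loss is at most this bound and likely below it.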

Tuesday, May 27, 2014

Desaturation bookmarklet

I recently needed to protect my eyes from a web site whose editors just love saturation. I figured there should be an easy fix for that, and I quickly found the Desaturate extension for Chrome. Nevertheless, nowadays I prefer to install as few extensions as possible, since malicious parties have figured out how to exploit Chrome extensions for profit. Luckily, many extensions are simple JavaScript snippets that can be replaced by a bookmarklet. So here it is, a bookmarklet to desaturate a page:

Just drag it to your bookmarks bar and you're done.

Update: clicking the button a second time now returns the page to full color.

Wednesday, March 19, 2014

Machine learning and cryptocurrencies

Here's a thought on cryptocurrencies. It's fairly well known that the proof-of-work required to discover new blocks is just a large number of calculations that typically serve no purpose other than producing a hash. And it takes a lot of computing power; check for yourself. Some cryptocurrencies, like Riecoin, try to make block discovery useful outside the cryptocurrency world as well by giving it an additional purpose, such as finding prime numbers.

Discovering a block is basically done by solving a problem that is hard to solve but easy to check (think of a really hard sudoku). There are still a myriad of problems that fit this description but are not used in cryptocurrency hashing.
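To make "hard to solve, easy to check" concrete, here is a toy hash-based proof-of-work in Python. It is a simplified sketch assuming a single SHA-256 and a target expressed as a number of leading zero bits; real cryptocurrencies differ in the details (Bitcoin, for instance, hashes the block header twice with SHA-256 and encodes its target differently).

```python
import hashlib


def pow_hash(data: bytes, nonce: int) -> int:
    """SHA-256 of the block data followed by an 8-byte nonce, as an integer."""
    digest = hashlib.sha256(data + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big")


def solve(data: bytes, difficulty_bits: int) -> int:
    """Try nonces in order until the hash has difficulty_bits leading zero bits.

    Expected work: about 2**difficulty_bits hash evaluations.
    """
    target = 1 << (256 - difficulty_bits)
    nonce = 0
    while pow_hash(data, nonce) >= target:
        nonce += 1
    return nonce


def verify(data: bytes, nonce: int, difficulty_bits: int) -> bool:
    """Checking a claimed solution costs a single hash evaluation."""
    return pow_hash(data, nonce) < (1 << (256 - difficulty_bits))


nonce = solve(b"block data", 16)
print(nonce, verify(b"block data", nonce, 16))
```

Solving takes about 2**difficulty_bits hash evaluations on average, while checking takes one. Each hash evaluation is effectively a lottery ticket with a 2**-difficulty_bits chance of winning, which matches the lottery analogy further down.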

So here's the thought. Why not use the proof-of-work calculations to solve a problem in machine learning? Many problems in machine learning are hard to solve but easy to check. One such problem is training a machine, e.g. a big neural network. Discovering a block would then mean producing a set of weights that allows the network to perform correctly on a test data set. (I'm leaving out many practical details here, such as how to treat hashes.)
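A sketch of the "easy to check" half of this idea, in Python: verifying a submitted model takes only one pass over the test set. The linear classifier, the `verify_block` acceptance rule and the accuracy threshold are all hypothetical stand-ins for a real network; like the paragraph above, the sketch leaves out how to tie the model to the block data.

```python
def accuracy(weights, bias, samples):
    """Fraction of (features, label) pairs that a linear classifier gets right.

    Labels are +1 or -1; the prediction is the sign of w.x + b.
    """
    correct = 0
    for x, label in samples:
        score = sum(w * xi for w, xi in zip(weights, x)) + bias
        prediction = 1 if score >= 0 else -1
        correct += prediction == label
    return correct / len(samples)


def verify_block(weights, bias, test_set, required_accuracy):
    """Hypothetical acceptance rule: the block is valid if the submitted
    model reaches the required accuracy on the test set."""
    return accuracy(weights, bias, test_set) >= required_accuracy
```

For example, with `test_set = [([1.0, 2.0], 1), ([-1.0, -0.5], -1), ([2.0, -1.0], 1)]`, `verify_block([1.0, 0.0], 0.0, test_set, 0.9)` returns `True`, while the weights `[-1.0, 0.0]` classify every sample wrongly and are rejected. Finding good weights (training) is the expensive part; this check is not.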

Another approach may be to treat the entire network of crypto miners as the machine itself, and use it e.g. to detect interesting instances, such as in SETI, though that example may be too esoteric.

[Update 2014/07/29: I'll be pasting some remarks here that have appeared on other social networks in an effort to centralize ideas.]

Specific properties of the computations in typical proof-of-work schemes:

  • Solution is hard to obtain but easy to verify;
  • Input data for the computation depends on the block / transaction data;
  • A small change in the input data triggers a big change in the solution.
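The third property is easy to observe with a cryptographic hash. A minimal Python sketch, assuming SHA-256 as the hash and two made-up inputs that differ in a single character:

```python
import hashlib


def differing_bits(a: bytes, b: bytes) -> int:
    """Number of bit positions in which the SHA-256 digests of a and b differ."""
    da = int.from_bytes(hashlib.sha256(a).digest(), "big")
    db = int.from_bytes(hashlib.sha256(b).digest(), "big")
    return bin(da ^ db).count("1")


print(differing_bits(b"transaction 1", b"transaction 2"))
```

For a good hash, roughly half of the 256 output bits are expected to differ, even though the inputs differ in one character.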
A meta-remark by Kevin Peno:
"Block-finding is like winning the lottery and mining throughput is proportional to the number of tickets bought."
So how about formulating a problem that rewards intelligence instead of computational power, possibly maintaining some of the properties listed above?

Tuesday, February 25, 2014

New Chrome extension: 'Nuff Tabs

I wrote a Chrome extension to limit the number of tabs in Google Chrome. It's called 'Nuff Tabs. There were already some extensions in the wild that do something similar, but none with the properties I was looking for (lightweight, specific options, etc.).

It works as follows. In the options you can specify the maximum number of open tabs. When you open a tab beyond that limit, the extension automatically closes one of the previous tabs. In the options you can also specify which rule it follows to decide which tab to close: the oldest tab, the least recently used tab, the least frequently used tab, or a random tab. My current favorite setting is "least recently used", as this way I don't have to worry about losing older tabs that are still relevant.
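The four rules can be sketched as follows. This is an illustrative Python model, not the extension's source (which runs on Chrome's extension APIs); the `Tab` record, its fields and the rule names are hypothetical. The newly opened tab is never a candidate for closing.

```python
import random
from dataclasses import dataclass


@dataclass
class Tab:
    tab_id: int
    opened_at: float   # time the tab was created
    last_used: float   # time the tab was last activated
    use_count: int     # how often the tab has been activated


def choose_tab_to_close(candidates, rule):
    """Apply one of the four closing rules to the candidate tabs."""
    if rule == "oldest":
        return min(candidates, key=lambda t: t.opened_at)
    if rule == "least_recently_used":
        return min(candidates, key=lambda t: t.last_used)
    if rule == "least_frequently_used":
        return min(candidates, key=lambda t: t.use_count)
    if rule == "random":
        return random.choice(candidates)
    raise ValueError(f"unknown rule: {rule}")


def tab_to_close(open_tabs, new_tab, max_tabs, rule):
    """Return the tab to close after new_tab was opened, or None if within the limit."""
    if len(open_tabs) <= max_tabs:
        return None
    candidates = [t for t in open_tabs if t is not new_tab]
    return choose_tab_to_close(candidates, rule)
```

Keeping the rules behind a single function makes the option screen a simple mapping from a setting to a rule name.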

You can also choose to display the number of open tabs in the extension icon.

Install it and let me know what you think.

Chrome store link:

Source code:

Wednesday, September 18, 2013

New Drupal web site for our research group

A few months ago we finished installing a new web site for our research group. The previous site was in Drupal 5, and we made an improved version in Drupal 7. You can visit the site here.


These are the most important parts of the site:
  1. Personal profiles for each researcher and professor
  2. A general list of publications and individual listings for each researcher
  3. Information pages for courses
  4. A news section

With respect to the previous version, we made several big changes:
  • We went back to a single language. Previous versions were in English and Spanish (as we're located in Spain), but we decided to go English-only, as English is the language of research. That removed the hassle of updating profiles in two languages. We still have Spanish-only pages for the courses, but we no longer use the internationalization system.
  • We removed any anonymous user input. There is still a contact form, but we removed the guestbook and comments on announcements.
  • Previously we used a custom-made publication management system. We have now switched to the Drupal module Biblio, which has some great out-of-the-box features like publication listings per author and several types of sorting.
  • We made more use of the views module and added some tighter interaction between the different content types on the site. We now have
    • news pages and news blocks showing up in several parts of the site;
    • a page for research projects, with links to related Biblio publications on each project and vice-versa;
    • an area dedicated to reproducible research / open science in which we post code to reproduce experiments from our papers (with links back from the paper pages) and several tools we are building.

Here's a short rundown of how we built the site.


The non-core module footprint was relatively small, since Drupal 7 comes standard with many features. The major non-core extensions we used are the following:
Sticky footer!


Apart from configuring the modules, we had to get our hands dirty in three areas:
  1. We had to hack the Biblio module, as it doesn't allow for proper hooking / theming. Mainly, this was for sorting the publications by type in a custom (non-alphabetical) order, plus some aesthetic aspects.
  2. The theme is a Zen subtheme. The user profiles are built with Profile2 and some additional fields, and on top of that they have some CSS but no theming templates.
  3. Some of the views are using complex combinations of contextual filters and relationships, e.g. to show selected publications in our member profiles.


Drupal 7 was a good choice for our research group web site:
  1. The core of this web site is content management, and Drupal met our content management requirements nicely.
  2. We kept the back-end clean and simple, and we didn't bother with any additional graphics or fancy buttons. For instance, text markup is done code-like with BUEditor which is perfect for us technical people.
  3. Drupal is highly extensible in almost any direction, which meant that we could find almost any component we needed as a contributed module. It also came in handy for linking external services and doing experimental stuff.
The development took a few months, as we did this alongside our normal research tasks, but we're very happy with the results. Now let's see if it lasts another 5 years.

Tuesday, May 21, 2013

Matlab code and demo for kernel density estimation

I've made it a habit to release the source code publicly every time somebody asks me for help with a publicly available algorithm. Having my source code in public has also turned out to improve its readability, and it helps me find it again (because, let's face it, everybody knows it is easier to find a file on Google than on your own computer).

So today I am releasing some Matlab code to perform Parzen-window (kernel) density estimation of one-dimensional data. It's included in the KMBOX toolbox now, and you can download a standalone version with a demo from here:

The image shows the output of the included demo.
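For readers without Matlab, here is a minimal sketch of the same technique in Python: a one-dimensional Parzen-window estimate with a Gaussian kernel. It is not a port of the KMBOX code; the function name and the choice of a Gaussian kernel are assumptions.

```python
import math


def gaussian_kde(data, xs, bandwidth):
    """Parzen-window density estimate with a Gaussian kernel.

    p(x) = 1/(n*h) * sum_i K((x - x_i)/h), with K the standard normal pdf
    and h the bandwidth. Returns p evaluated at every point in xs.
    """
    n = len(data)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data)
        for x in xs
    ]


grid = [-3 + 0.5 * i for i in range(13)]
print(gaussian_kde([-1.0, 0.0, 0.5, 2.0], grid, bandwidth=0.5))
```

The bandwidth `h` controls the smoothness of the estimate. A common default is Silverman's rule of thumb, h ≈ 1.06·σ·n^(-1/5), with σ the sample standard deviation.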

Friday, May 17, 2013

Logos in vector format

I started a small GitHub repo to organize all the logos I've been using in my work lately, mainly in conference posters. As it takes a few minutes to extract a vector logo from a PDF, I thought other people might benefit from the result too.

You can find them here:

As an example, here goes the ICASSP 2013 logo in vector format. Click the image for the SVG file, use Inkscape or any other software to edit/convert.