Thursday, July 08, 2021

Oposiciones and probabilities

[The following is a guest post by my Spanish alter ego, Steven, originally written in Spanish.]

Secondary-school oposiciones (the competitive exams for public teaching positions) are taking place in Spain these days, and in this post I will describe a calculation that every candidate should do.

Photo by Elijah Hail on Unsplash

The first test in most oposiciones consists of developing one complete topic from the official syllabus. This syllabus can easily contain more than 70 topics, each with its corresponding complexity, which makes memorizing all of it a major feat, or even an impossible one. Fortunately, it is not necessary to study every topic, since the candidate gets to choose one topic from a small selection. In particular, the typical procedure is as follows: out of all the available topics, 5 are drawn at random, and from those 5 the candidate picks one to develop. So if the syllabus contains 70 topics, it makes no sense to study all of them, nor 69, nor 68, because the minimum number you need to study to be certain that at least one studied topic is drawn is clearly 66, that is, 70 - 4. It is not even unreasonable to study a few less than that, because with, say, 60 studied topics it is still very likely that one of them appears among the 5 that are drawn. Which brings us to the wonderful world of probability and combinatorics, and in particular to the following question:

"Si estudio 20 temas de un temario de 70 temas, ¿cuál es la probabilidad de que cuando se escogen 5 salga por lo menos uno de los que he estudiado?"

This question is very important, because it allows us to settle on a number of topics to study with some rigor. For example, if that probability turns out to be 95%, it will probably be worth studying only those 20 topics, in great depth. On the other hand, if the probability is low, say 30%, we should study more topics to increase it, although, since the available time is fixed, this also means that the time devoted to each topic will be less than with those 20 topics.

In the case of the example, the calculation is not very complicated:

P(at least 1 is drawn) = 1 - P(none is drawn)

Here, P(none is drawn) is the probability that none of the studied topics comes up when the 5 numbers are drawn, and it is obtained as

P(none is drawn) = P(none in 1st draw) · P(none in 2nd draw) · ... · P(none in 5th draw)

The probabilities on the right-hand side are not independent, since once the first number has been drawn it can no longer come up in the second draw; each factor is therefore conditioned on the previous draws having also missed the studied topics. (This is what is known as sampling without replacement.) We have the following probabilities:

P(none in 1st draw) = 50/70

because there are 50 ways of drawing a number that does not belong to the 20 studied topics, out of a total of 70. Since there is no replacement, we obtain the remaining probabilities as follows:

P(none in 2nd draw) = 49/69

P(none in 3rd draw) = 48/68

P(none in 4th draw) = 47/67

P(none in 5th draw) = 46/66

Putting it all together, we obtain the probability we were looking for:

P(at least 1 is drawn) = 1 - 50/70 · 49/69 · 48/68 · 47/67 · 46/66 ≈ 82%

It turns out that, studying only 20 topics out of a total of 70, the probability is 82%, which is quite high. That probability is roughly 4/5, which means that if we took the exam 5 times, on average only once would we have the bad luck that none of the drawn topics is among those we studied.

But although 82% is a reasonably high probability, the exam is important enough to try to secure success a bit more. We can turn the question around and ask: what is the minimum number of topics I should study so that the probability that at least one of the studied topics is drawn is at least 95%? The quickest way to obtain this number is to compute the probability for 21 topics, 22 topics, and so on, for example with a spreadsheet, and identify at which point we reach the target probability.
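A short script along these lines reproduces the full table of probabilities (a sketch in JavaScript; a spreadsheet works just as well, as mentioned above):

    // Probability that at least one studied topic is among the 5 drawn,
    // out of 70 topics in total, drawn without replacement.
    function pAtLeastOne(studied, total = 70, drawn = 5) {
      let pNone = 1;
      for (let i = 0; i < drawn; i++) {
        pNone *= (total - studied - i) / (total - i);
      }
      return 1 - pNone;
    }

    for (let k = 1; k <= 70; k++) {
      console.log(k + ' topics: ' + (100 * pAtLeastOne(k)).toFixed(1) + '%');
    }
    // For example: 20 topics -> 82.5%, 31 topics -> 95.2%, 9 topics -> 50.8%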


Or, graphically: [chart of the probability of at least one studied topic being drawn, as a function of the number of topics studied]

In other words, for the probability that at least one of the studied topics is drawn to reach 95%, you need to study 31 topics. Curiously, if we study only 9 topics, that probability is already 50%. This is rather counter-intuitive, and it highlights the need to do a proper probability calculation when making this kind of decision. Let's not trust our intuition when it comes to probabilities.




Tuesday, January 19, 2021

There was a bug...

... in the latest update of my Chrome extension, RevEye v1.5.0, which went live today. Thanks to the alertness of several users, the bug is fixed now and the new, corrected version v1.5.1 will be rolled out soon.

In any case, if you need to perform reverse image searches right now and are stuck with the buggy version, you can follow these steps to work around the bug until the extension updates on your machine:

  1. Go to the options screen (right-click on the extension icon > Options).
  2. Open DevTools (F12 on Windows, Command+Option+I on Mac).
  3. In the "Console" tab, type the following code and press Enter:

    localStorage.services = ['google', 'bing', 'yandex', 'tineye'];

  4. Visit the page chrome://extensions/, disable the extension and enable it again.

Now it should work correctly, including the new options introduced in v1.5.0.

Tuesday, May 27, 2014

Desaturation bookmarklet

I recently needed to protect my eyes from a web site whose editors just love saturation. I figured there should be an easy fix for that, and indeed I quickly found the Desaturate extension for Chrome. Nevertheless, nowadays I prefer to install as few extensions as possible, since malicious parties have figured out how to exploit Chrome extensions for profit. Luckily, many extensions are simple pieces of JavaScript that can be replaced by a bookmarklet. So here it goes, a bookmarklet to desaturate a page:


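The bookmarklet itself is embedded as a draggable link in the original post. As a rough idea of what it can look like, a minimal desaturation bookmarklet based on the CSS grayscale filter could be written as follows (an illustrative sketch, not necessarily the exact code behind the link):

    javascript:(function(){var s=document.documentElement.style;var on=(s.webkitFilter||s.filter||'').indexOf('grayscale')!==-1;s.webkitFilter=s.filter=on?'':'grayscale(100%)';})();

It toggles a grayscale(100%) filter on the page's root element, so clicking it again restores the original colors.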
Just drag it to your bookmarks bar and you're done.

Update: clicking the button a second time now returns the page to full color.

Wednesday, March 19, 2014

Machine learning and cryptocurrencies

Here's a thought on cryptocurrencies. It's fairly well known that the proof-of-work required to discover new blocks is really just a bunch of calculations that typically serve no purpose other than producing a hash. And it adds up to a lot of computation power; check for yourself. Some cryptocurrencies, like Riecoin, try to make block discovery useful outside of the cryptocurrency world as well by having it serve an additional purpose, such as finding prime numbers.

Discovering a block basically amounts to solving a problem that is hard to solve but easy to check (think of a really hard sudoku). There are still myriad problems that fit this description but are not used in cryptocurrency hashing.
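To make that asymmetry concrete, here is a toy proof-of-work sketch in JavaScript (Node.js): finding a valid nonce takes brute force, while checking a proposed nonce is a single hash. This is a simplified illustration, not the actual scheme of any particular coin.

    const crypto = require('crypto');

    function hash(data, nonce) {
      return crypto.createHash('sha256').update(data + nonce).digest('hex');
    }

    // Hard: try nonces until the hash starts with `difficulty` zeros.
    function mine(data, difficulty) {
      const target = '0'.repeat(difficulty);
      let nonce = 0;
      while (!hash(data, nonce).startsWith(target)) nonce++;
      return nonce;
    }

    // Easy: verifying a candidate nonce takes a single hash computation.
    function verify(data, nonce, difficulty) {
      return hash(data, nonce).startsWith('0'.repeat(difficulty));
    }

    const nonce = mine('block data', 4);
    console.log(nonce, verify('block data', nonce, 4)); // prints the found nonce and true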

So here's the thought: why not use the proof-of-work calculations to solve a problem in machine learning? Many problems in machine learning are hard to solve but easy to check. One such problem is training a machine, e.g. a big neural network. Discovering a block would then mean producing a set of weights that allow the network to correctly classify a test data set. (I'm leaving out many practical details here, such as how to treat hashes.)
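As a hypothetical sketch of the "easy to check" side of such a scheme (toy data and a toy linear classifier, entirely made up for illustration; how the weights would interact with block hashes is left out, as noted above):

    // A miner submits weights; any node can cheaply verify them by
    // measuring accuracy on an agreed-upon test set.
    const testSet = [
      { x: [1.0, 0.2], y: 1 },
      { x: [0.1, 0.9], y: 0 },
      { x: [0.8, 0.3], y: 1 },
      { x: [0.2, 1.1], y: 0 },
    ];

    function predict(weights, x) {
      const score = weights[0] * x[0] + weights[1] * x[1] + weights[2];
      return score > 0 ? 1 : 0;
    }

    function verifyWeights(weights, threshold) {
      const correct = testSet.filter(s => predict(weights, s.x) === s.y).length;
      return correct / testSet.length >= threshold; // the cheap check
    }

    console.log(verifyWeights([1.0, -1.0, 0.0], 0.95)); // true for this toy set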

Another approach may be to treat the entire network of crypto miners as the machine itself, and use it e.g. to detect interesting instances, such as in SETI, though that example may be too esoteric.


[Update 2014/07/29: I'll be pasting some remarks here that have appeared on other social networks in an effort to centralize ideas.]

Specific properties of the computations in typical proof-of-work:

  • Solution is hard to obtain but easy to verify;
  • Input data for the computation depends on the block / transaction data;
  • A small change in the input data triggers a big change in the solution.

A meta-remark by Kevin Peno:
"Block-finding is like winning the lottery and mining throughput is proportional to the number of tickets bought."
So how about formulating a problem that rewards intelligence instead of computational power, possibly maintaining some of the properties listed above?

Tuesday, February 25, 2014

New Chrome extension: 'Nuff Tabs

I wrote a Chrome extension to limit the number of open tabs in Google Chrome. It's called 'Nuff Tabs. There were already some extensions in the wild that do something similar, but none with the properties I was looking for (lightweight, specific options, etc.).

It works as follows. In the options you can specify the maximum number of open tabs. When you open one tab too many, the extension automatically closes a previous tab. In the options you can also choose the rule it follows to decide which tab to close: the oldest tab, the least recently used tab, the least frequently used tab, or a random tab. My current favorite setting is "least recently used", as this way I don't have to worry about losing older tabs that are still relevant.
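As an illustration of the "least recently used" rule, a background script along these lines could implement it (a rough sketch, not the actual 'Nuff Tabs source; see the GitHub link below for the real thing):

    // Rough sketch of a "close the least recently used tab" rule.
    // Assumes a background script with the "tabs" permission.
    var MAX_TABS = 10;     // in 'Nuff Tabs this would come from the options page
    var lastUsed = {};     // tabId -> timestamp of last activation

    chrome.tabs.onActivated.addListener(function (info) {
      lastUsed[info.tabId] = Date.now();
    });

    chrome.tabs.onCreated.addListener(function (newTab) {
      chrome.tabs.query({}, function (tabs) {
        if (tabs.length <= MAX_TABS) return;
        // Close the open tab (other than the new one) used least recently.
        var victim = tabs
          .filter(function (t) { return t.id !== newTab.id; })
          .sort(function (a, b) { return (lastUsed[a.id] || 0) - (lastUsed[b.id] || 0); })[0];
        if (victim) chrome.tabs.remove(victim.id);
      });
    });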

You can also choose to display the number of open tabs in the extension icon.

Install it and let me know what you think.

Chrome store link: https://chrome.google.com/webstore/detail/nuff-tabs/kemeihccgedidlokcbfhdekcfojpjjmp

Source code: https://github.com/steven2358/NuffTabs


Wednesday, September 18, 2013

New Drupal web site for our research group

A few months ago we finished setting up a new web site for our research group. The previous site ran on Drupal 5, and we built an improved version in Drupal 7. You can visit the site here.


Overview

These are the most important parts of the site:
  1. Personal profiles for each researcher and professor
  2. A general list of publications and individual listings for each researcher
  3. Information pages for courses
  4. A news section


With respect to the previous version, we made several big changes:
  • We went back to a single language. Previous versions were in English and in Spanish (as we're located in Spain), but we decided to go English-only, as this is the language of research. That removed the hassle of having to update profiles in two languages. We still have Spanish-only pages for the courses, but we no longer use the internationalization system.
  • We removed any anonymous user input. There is still a contact form, but we removed the guestbook and comments on announcements.
  • Previously we used a custom-made publication management system. We have now switched to the Drupal module Biblio. Biblio has some great out-of-the-box features, like publication listings per author and several types of sorting.
  • We made more use of the Views module and added tighter interaction between the different content types on the site. We now have
    • news pages and news blocks showing up in several parts of the site;
    • a page for research projects, with links to related Biblio publications on each project and vice-versa;
    • an area dedicated to reproducible research / open science in which we post code to reproduce experiments from our papers (with links back from the paper pages) and several tools we are building.


Here's a short rundown of how we built the site.

Modules

The non-core module footprint was relatively small, since Drupal 7 comes standard with many features. The major non-core extensions we used are the following:
Sticky footer!

Customizations

Apart from configuring the modules, we got our hands dirty in 3 areas:
  1. We had to hack the Biblio module as it doesn't allow for hooking / theming properly. Mainly, this was for sorting the publications by type with a custom order (not alphabetic) and some aesthetic aspects.
  2. The theme is a zen subtheme. The user profiles are built with Profile2 and some additional fields, and on top of that they have some CSS but no theming templates.
  3. Some of the views are using complex combinations of contextual filters and relationships, e.g. to show selected publications in our member profiles.


Conclusion

Drupal 7 was a good choice for our research group web site:
  1. The core of this web site is content management, and our content management requirements were nicely met by Drupal.
  2. We kept the back-end clean and simple, and we didn't bother with any additional graphics or fancy buttons. For instance, text markup is done code-style with BUEditor, which is perfect for technical people like us.
  3. Drupal offers great extensibility in every direction, which meant that we could find almost any component we needed as a contributed module. It also came in handy for linking external services and doing experimental stuff.

The development took a few months, as this is something we did alongside our normal research tasks, but we're very happy with the results. Now let's see if it lasts another 5 years.

Tuesday, May 21, 2013

Matlab code and demo for kernel density estimation


I've made it a habit to release the source code publicly every time somebody asks me for help with a publicly available algorithm. Having my source code out in public has actually also turned out to improve its readability, and it helps me find it again later (because, let's face it, everybody knows it is easier to find a file on Google than on your own computer).

So today I am releasing some Matlab code that performs Parzen kernel density estimation on one-dimensional data. It is now included in the KMBOX toolbox, and you can download a standalone version with a demo from here: https://sourceforge.net/projects/kmbox/files/packs/
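For anyone unfamiliar with the method: a Parzen (kernel) density estimate places a kernel on each data point and averages them. A minimal one-dimensional sketch with a Gaussian kernel (in JavaScript, purely for illustration; the released code is Matlab and lives in KMBOX) looks like this:

    // 1-D Gaussian kernel density estimate: f(x) = 1/(n*h) * sum_i K((x - x_i)/h)
    function kde(data, bandwidth) {
      const c = 1 / (data.length * bandwidth * Math.sqrt(2 * Math.PI));
      return x => c * data.reduce((sum, xi) => {
        const u = (x - xi) / bandwidth;
        return sum + Math.exp(-0.5 * u * u);
      }, 0);
    }

    const density = kde([1.1, 1.9, 2.3, 5.0, 5.2], 0.5); // made-up data and bandwidth
    console.log(density(2.0)); // estimated density at x = 2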


The image shows the output of the included demo.