Jochen's Blog

By Jochen Voss, last updated 2015-04-17

Fast-dm version 30.2 released

I am happy to announce the release of fast-dm version 30.2. This release fixes some memory leaks found by Valgrind. You can download the program from the Fast-dm homepage.

"new" paper

A new paper, together with my PhD student Mark Webster and my colleague Stuart Barber, has been published today!

This paper took a long time to be published, but I think it is worth the wait.

slow article

I am very happy to announce that a very old (nearly forgotten?) article has finally appeared. We chose the journal because we had heard the rumour that they have a fast turnaround, but this plan didn't work out: the article was submitted 23 September 2011 and accepted on 21 February 2012. But then came a two year wait, and the article only has appeared just now:

New PhD position

Do you want to do a PhD on Bayesian Uncertainty Quantification with me? Are you good with maths and computers? As part of the new, NERC funded, doctoral training centre I am advertising for a PhD position. Details can be found on the DTP web page, the deadline for applications is 24 January 2014, start of the PhD is October 2014.

The Rate of Convergence for Approximate Bayesian Computation

I am very happy to report that after a long struggle we have finally finished (and submitted) a new paper today:

The paper studies the convergence of ABC estimates to the correct value. In particular,

Hopefully, these results prove useful for others, too.

strange random number generator

I just tried to find out what the default random number generator of the Go programming language is. The source code for this RNG can be found in the file rng.go; the only reference in the source code is the cryptic comment algorithm by DP Mitchell and JA Reeds. Does anybody have a reference for this, or can at least identify DP Mitchell or JA Reeds? Some things I tried:

Any hints about where the algorithm comes from would be most welcome.

the book is out!!!


I am very happy to report that my book has finally appeared. While the book officially is out for a few days already, it was only very recently that it appeared on Amazon. You can get your copy by clicking on the thingy below (if it shows; otherwise by clicking here).

The book is written as a text-book for postgraduate students and covers basic topics around the area of Monte Carlo methods. Some more information can be found on the publisher's web page.

This is quite exciting for me. It's a book!!! I have a paper copy: you can touch it and all, it's really real!

Frontier Research

Today I got an email from a Dr. R. K. Dixit (HON), bringing good news:

I came across to your research paper titled "Upper and Lower Bounds in Exponential Tauberian Theorems" and feel that your research is having a very good impact. With a view to begin a long-term fruitful association with you, I invite you to submit your upcoming research articles / papers for publication in Global Journal of Science Frontier Research (GJSFR), an international double blind peer reviewed research journal.

It seems that I could be an expert in Frontier Research!!! The email refers to the following paper:

While this is a nice paper (it got much better during the refereeing process; had I known this in advance I would have submitted it to a better known journal), it is still a relief to hear that the paper has very good impact. Google currently doesn't know of any citations of my paper.

As it happens, the publisher of GJSFR, Global Journals Inc. (US), is also listed on Beall's list of potential, possible, or probable predatory scholarly open-access publishers.

about the "Gem Forge" in Infinity Blade 2

A while ago, the computer game Infinity Blade 2 was given away for free as part of some advertisement campaign and I got myself a copy at the time.

One of the features in the game is a "gem forge", where one can put in three gems and, after a few hours, gets a different gem in return. Gems have both financial value (gems can be sold) and practical value (gems can be used to enhance fighting skills and equipment). I searched for advice about how to best use the gem forge, but found only vague rumours and incomprehensible to me lists.

To understand the gem forge better, I started taking notes about which output I got for which combination of inputs. The procedure is a bit slow, so I have only 16 data points so far, but I got a few results:

I haven't looked at other properties (shape, colour, function, etc.) of the gems yet.

If anybody knows more about the gem forge, or if there is a good reference available on the internet, I'd be happy to learn about this.

The Fortuna Random Number Generator

During the last days I finished an implementation of the Fortuna random number generator in the Go programming language. The result can be found on the new Fortuna home page and on

This is very different from the random number generators I usually deal with: Normally I am interested in random number generators for use in Monte Carlo simulations; this is, for example, the topic of my upcoming book (click the image below for more information). In contrast, the Fortuna random number generator is made for use in cryptographic algorithms; for example it could be used to generate session tokens for a web app.

new software package: a simple tracing framework for Go

I have just published a small package for the excellent Go programming language: The new package, trace, implements a simple tracing framework. Code using this framework can emit diagnostic messages using the trace.T() function. In normal operation, calls to trace.T() have no effect and are very fast. When tracing is enabled (in order to track down a problem or to explore the inner workings of a program), listeners can record/display the trace messages.

Details can be found on the tracke package home page. Comments are very welcome!

new paper

Today I submitted a new paper. Preparing this took forever, but it's finally out:

my first book!!!

I am very happy to report that I have finished writing my first-ever book! The title is An Introduction to Statistical Computing and the book will be published by Wiley. This book is a textbook, mostly for postgraduate students, and it covers random number generation, Monte Carlo methods, and a few more advanced topics.

happy 2013 to everybody!


Police and Crime Commissioner elections

The Police and Crime Commissioner elections will take place on Thursday. The question is whom to vote for.

Some thoughts:

The candidates for Leeds are the following (summarised from the choose my PCC web page):

3D graphics in R (updated)

For (my own) future reference, here are example commands to create a 3D (surface) plot in R:


x <- seq(-10, 10, length=20)
y <- seq(-10, 10, length=20)
z <- outer(x,y, function(x,y) dnorm(x, 2, 3)*dnorm(y, 3, 7))

palette <- colorRampPalette(c("blue", "green", "yellow", "red"))
col.table <- palette(256)
col.ind <- cut(z, 256)
persp3d(x, y, z, col=col.table[col.ind])

These commands open the plot in a new window. The windowRect parameter in the call to open3d determines the initial position and size of the window (many thanks to Jem Corcoran for pointing this out). The output can be rotated and scaled using the mouse.

[a 3D plot, generated with R]

publication-quality figures with R, part 2

Following up on my recent blog entry about generating publication quality figures with R, here are some more hints, this time about setting the figure margins.

An R figure consists of several regions. From the outside in, these are as follows:

[figure regions in R]

figure 1. The different regions of a figure generated in R

These different regions are illustrated in figure 1, above. The figure was created using the following R code:

par(oma=c(2, 2, 2, 2), mar=c(4, 4, 2, 2))

plot(rnorm(10), xlab="x label", ylab="y label")

box("outer", col="red", lwd=2)
mtext("device region", side=3, line=1, col="red", outer=TRUE)

box("inner", col="green")
mtext("figure region", side=3, line=1, col="green")

box("plot", col="blue")
text(5, 0, "plot region", col="blue")

A plot for use in a publication will normally need to make efficient use of the available space. For this purpose, the default margins used by R are too generous. This is illustrated in figure 2.

[default plot margins for R plots]

figure 2. A simple R plot, using the default margin settings. The red box was added to show the total area covered by the figure. There are unnecessarily large margins on the top and the right of the figure.

To maximise the area available for displaying information, unnecessary white space around the plot should be avoided. To achieve this, the following suggestions may be useful:

A version of the above plot, with the margins adjusted as shown above, is displayed in figure 3:

[R plot with adjusted margins]

figure 3. The plot from figure 2, but with the margin sizes adjusted to maximise the area available for displaying information. This figure will take up the same amount of space as figure 2 in the enclosing document, but provides much more space for displaying the plot data.


Finite Element Discretisation for SPDEs

I am happy to notice that one of my papers has been published, after a bit of a wait:

If you are interested in numerical solution of stochastics partial differential equations, this may be of interest. Comments about the article are very welcome.

time zone aware timestamps in JavaScript and Python

For one of my programming experiments, I want to compute timestamps by using the number of seconds since the beginning of 1970. The resulting value will be used for communication between a web browser and my web server. My aim is to find a method to get the very similar values for the timestamp in JavaScript code running on the web browser and in a Python program running on my web server.

To solve this problem, I need a method which does not depend on the time zone the web user is in, and which does not break when the daylight saving time (DST) starts or ends. One solution to this problem is, to perform the computation of the timestamp in Coordinated Universal Time (UTC, a successor of the Greenwich Mean Time). For reference, this post describes what I found.

In the web browser, I am using the following JavaScript code:

  now = new Date()
  timestamp = now.getTime() / 1000

Here, the getTime method does all the work; since JavaScript timestamps use milliseconds since the beginning of 1970, we have to divide by 1000 to convert to seconds. According to the ECMAScript specification (also governing JavaScript), the resulting value should be independent of time zone and DST. For testing, I changed the clock of my computer to a different time zone; everything still worked: after restarting the browser, now reflects the new time zone, but timestamp is only changed by the few seconds I needed to restart the browser.

The code on the web server is written in Python. After a lot of experimenting, I've found two working combinations: The more straightforeward way of computing a timestamp is as follows.

  import datetime
  import time

  now =
  timestamp = time.mktime(now.timetuple()) + now.microsecond * 1e-6

Despite the fact that there is no explicit conversion to UTC in this code, the computed timestamp gives the number of seconds since the start of 1970 in UTC. This depends on time.mktime's ability to magically know the current time zone of the computer this code is running on. The same computation can also be done using times in UTC directly.

  import datetime
  import calendar

  now = datetime.datetime.utcnow()
  timestamp = calendar.timegm(now.utctimetuple()) + now.microsecond * 1e-6

This time, the code depends on the utcnow function knowing the current timezone. An ugly point of the second solution is, that it requires the slightly obscure calendar module from the Python standard library. Again, I tested this code by changing the time zone of the computer's clock and all worked as expected.

neat trick!

I just learned a new trick: the "texdoc" program automatically finds and displays information for LaTeX packages. On my shiny Mac, using the MacTeX distribution, I can type commands like the following:

texdoc geometry
texdoc tikz
texdoc jvlisting

This shows the documentation for the corresponding package as a PDF file.

using R to generate publication-quality figures

I am currently in the process of writing a book, and I plan to produce several figures for this project using the statistical computing package R. While R is generally quite good at creating scientific graphics, some adjustments help to create high-quality images for inclusion in another document. This blog entry gives some simple hints about the required steps.

Wisent release 0.6.2

I am happy to announce release 0.6.2 of Wisent, the Python parser generator. This release fixes a few minor issues which have been found in the previous release of Wisent. The source code of Wisent 0.6.2 can be downloaded from the Wisent homepage or from

[cave painting of a wisent]

jvqplot release 0.2

I am happy to announce release 0.2 of the jvqplot data plotting program. One of the main features of jvqplot is that the plot is automatically updated whenever the data in the input file changes.

Changes in this release:

The new jvqplot release can be downloaded from the jvqplot homepage.

[jvqplot demo with squares]

jvjsdoc, a documentation generator for JavaScript

This weekend, I finished a new program: JvJsDoc, a program to extract documentation from JavaScript source code and to present the collected information in a set of HTML pages. It is meant to be used together with the Google Closure Library and the Google Closure Compiler. As an example: JvJsDoc can convert the JavaScript source file goog/disposable/disposable.js into the HTML documentation in goog/Disposable.html.

More information can be found on the JvJsDoc homepage, and you can download the program here:

jvjsdoc version 0.5, 2011-12-11

First public release.

new paper

After taking much longer than I intended to, today I finally submitted a new paper (about a year later than planned). You can have a look at the preprint here:

new jvlisting release

jvlisting is a LaTeX package which provides an alternative to LaTeX's built-in verbatim environment. The new release, version 0.5, fixes some bugs and cleans up the TeX source code a bit.

I hope you find this package useful. Bug reports and comments are very welcome!

Some Fundamental Properties of Multivariate von Mises Distributions

Last week, Kanti and I submitted a new paper:

I am particularly curious about how the referees will react to my nice figures on pages 6 and 7.

statistical computing

I have typed my lectures notes for the statistical computing module I am teaching this term. The notes are available online:

Comments are very welcome!

playing with Google+

I just started playing with Google+, not sure yet whether this will be useful or interesting to me. If you also want to play, and don't have an account yet, you can probably sign up at this link.

watching DVDs on an Apple iPad

Just for reference, here is a recipe about how to watch your DVDs on an Apple iPad. I have read that some of the required steps may be illegal to perform when you are in the USA (or an American citizen?); don't follow these instructions if you are not allowed to!

The instructions provided here use MPlayer/MEncoder on Ubuntu Linux to convert the DVD content into a file of the required format, but many other options are available.

new LaTeX package "jvlisting"

A while ago I finished my first home-made LaTeX package. The new package is called "jvlisting" and provides a replacement for LaTeX's verbatim environment.

This package provides the LaTeX environment listing, an alternative to the built-in verbatim environment. The listing environment is specially taylored for including listings of computer program source code into documents. The main advantages over the original verbatim environment are that

You can download the package and its documentation from my webserver or from the jvlisting page at the Common TeX Archive Network (CTAN). I'd be happy if you could give my package a try. Comments about jvlisting are very welcome!

LaTeX: calling a macro for every line of input

Recently I spent some time to understand how one can execute a macro repeatedly, once for every line of text in a LaTeX environment. Since the solution is a bit tricky and I found it diffcult to find answers on the web, here is a summary of what I learned.

Step 1. In order to prevent input lines from being concatenated by TeX before we get access to them, we can use the \obeylines macro. This allows to define a macro which matches everything until the end of line:

\def\doline#1{line found: `#1'\par}

\getline This is the \textbf{first} line.
This is the second line.}

Note that \obeylines must be in effect both while we define the \getline macro and while scanning the text. The outer pairs of enclosing brackets are there to contain the effect of \obeylines. To make \getline visible outside these brackets we use \gdef to define the macro in the global namespace. The output of the above TeX code is

line found: ‘This is the first line.’
This is the second line.

Step 2. We would like to have the \getline macro called for every line instead of just for the first one. This can be achieved by putting a \getline (without arguments) at the end of the \getline replacement text. The only complication is that we somehow need to stop this procedure once the end of the region of interest has been reached:

  \ifx\text\marker \let\next\empty
    \else \doline{#1}\let\next\getlines \fi

\getlines This is the \textbf{first} line.
This is the second line.

The \ifx command is used to test whether the matched line is the same as \marker. Until we find the maker, we put a new call to \getlines at the end of the \getlines replacement text, thereby looping over all lines until the marker is found. For this to work, the END needs to occur on a line on its own; therefore we have to move the closing brackets down one line. The TeX code above leads to the following output.

line found: ‘This is the first line.’
line found: ‘This is the second line.’

Step 3. We are now ready to pack the commands from above into the definition of a LaTeX environment:

  \ifx\text\marker \let\next\text
    \else \doline{#1}\let\next\getlines \fi

This is the \textbf{first} line.
This is the second line.

Here we had to change the definition of \getlines in order to include the \end{dolines} in the replacement text when the \ifx is true; otherwise it would have been swallowed by \dolines. The resulting output is, a bit surprisingly, as follows:

line found: ‘’
line found: ‘This is the first line.’
line found: ‘This is the second line.’

The extra empty line at the beginning is caused by the \getlines macro matching the (empty) text between \begin{dolines} and the end of line. The \end{listing} must be on a line on its own for this environment to work.

Step 4. If we want to prevent processing of the TeX commands inside the dolines environment, we can do so by changing the category codes of special characters like \ to 12 (other). A complication with this plan is, that \ifx also compares category codes. Therefore, when we try to detect the end of our environment while special characters are switched off, we need to define \marker as the string \end{dolines}, but with the category codes of all special charcters (the backslash and the curly brackets) set to 12. This can be done by using the following, contorted sequence of commands:

\catcode`|=0 \catcode`[=1 \catcode`]=2
\catcode`\{=12 \catcode`\}=12 \catcode`\\=12

Two more changes are required to make this work: first, we need to disable all special characters at the beginning of the environment:


If you want to use this command outside a style file, you will need to turn @ into a letter by bracketing your definitions with


Secondly, inside \getlines, the expansion of \text still has the category codes as found in the input. Therefore, if the input was read with special characters switched of, we cannot write \let\next\text to get the closing \end{dolines} (since the category codes of the backslash and the curly brackets would be wrong). Instead, we need to use a definition like \def\next{\end{dolines}}.

Example. Using the techniques explained above, we can define a replacement for the LaTeX comment environment (contained in the verbatim package) as follows:



\catcode`|=0 \catcode`[=1 \catcode`]=2
\catcode`\{=12 \catcode`\}=12 \catcode`\\=12

  \ifx\text\marker \def\next{\end{comment}}
    \else \let\next\getlines \fi
    \let\do=\@makeother\dospecials \obeylines\getlines}%



  Everything here will be ignored.
  \Invalid LaTeX code and incomplete constructs like
  without the closing end are no problem.


old papers

Funny coincidence: Yesterday I learned that the following paper which we first submitted in 2010 has been accepted:

And today I received an email notice that a paper we submitted in 2009(!) is finally going into print now:

Advert: new lectureship in Leeds

We are currently advertising a new lectureship in the statistics department of the University of Leeds. Thus, if you are a PostDoc and want to become a collegue of me, have a look at the job advert. The closing date for applications is 31st March 2011.

Jvqplot version 0.1 released

I am happy to announce the release of Jvqplot version 0.1.

Jvqplot is a data plotting program, resembling a simplified version of gnuplot. It is very simple to use, but it can only plot simple data files. The format and scaling of the plot is automatically chosen and cannot be changed. One of the main features of jvqplot is that the plot is automatically updated whenever the data in the input file changes. More information can be found on the Jvqplot homepage.

jvqplot version 0.2, 2012-04-09

empty lines in data files separate data sets, clean up the program sources

jvqplot version 0.1, 2010-11-29 (experimental version)

first public release

This is the first public release of jvqplot. Any feedback about the program would be most welcome.

[jvqplot screenshot]

TV licensing

Since I don't watch TV programmes and also don't own a TV, I do not need a TV license. The TV licensing people seem to find this similarly difficult to believe than their German counterparts, so they keep hassling me about this.

Their latest letter seems to allow to declare online that I don't need a license. Unfortunatly, the web form quickly forces me to choose one of the following possibilities:

The last option seems like a possible choice for me, except for the fact that my computer probably qualifies as equipment. So which option am I meant to choose????

What Every Computer Scientist Should Know About Floating-Point Arithmetic

I just discovered the (1991) article What Every Computer Scientist Should Know About Floating-Point Arithmetic by David Goldberg. Very useful reading!

Wisent version 0.6.1 released

I am happy to announce the release of Wisent version 0.6.1. Wisent is a parser generator for Python. The new release fixes a problem (introduced in version 0.6) where comments and multi-line strings in grammar files were broken. You can download the program from the wisent homepage.

birthday paradox

I am slightly embarrassed by the fact that I've been caught out by the birthday paradox yesterday. The encounter went as follows:

While testing a new random number generator by analysing a list of 1000 generated standard normal distributed random numbers, I discovered that the list contained one of the numbers twice!!! This is suspicious, because this event has probability 0 in theory. After an (unsuccessful) hunt for bugs in my program, I finally found the following explanation.

The program prints the numbers using a C command like

printf("%f\n", normal(0, 1));

By default, the %f format string outputs numbers with a precision of six significant digits:


Most of the numbers will lay between -2 and 2, i.e. they are concentrated in a set of about 4 million possible values. A quick check reveals that 1000 independent uniform draws out of a set of this size contains a number twice with a probability of more than 10%, and for the normal distribution the probability will be even higher because the numbers are more concentrated around 0. Thus, seeing a number twice is something which will actually happen from time to time and is no indication that the program is malfunctioning!

scary phone message

Just now, my (not very fancy) phone showed the following corrupted message to me:

[phone screen]

I'm a bit at a loss about what is going on? If you have any idea, I would be happy to receive hints via email. Do I need to worry?

using DSSP in the "bio3d" R package

The program DSSP is used to determine the secondary structure of a protein, taking the three dimensional coordinates of its atoms as the input. Bio3d is a library for the statistical software package R which makes it easier to analyse protein structure; as part of this, bio3d contains an interface to DSSP (in the function dssp). Since I had a bit of trouble using this interface, here are some hints.

  1. Get the DSSP program from the DSSP webpage. For academic use, the program is free of charge, but you need to fill in a license agreement.
  2. Rename the executable to dssp (mine was called DSSP_MAC.EXE after I had downloaded it).
  3. Move the file to the directory where it will live. For my system (and for the examples below) I chose /usr/local/bin/. The name of this directory is not allowed to contain white space.
  4. Call the dssp function with this directory as the exepath argument. The value of the exepath argument must end in / (on Linux/Unix/MacOS) or \ (on Microsoft Windows). Also note that every \ in an R string constant needs to be doubled, e.g. on Microsoft Windows you will probably need to write something like exepath="C:\\path\\to\\file\\".

Example. In R one can now use the following commands.

pdb <- read.pdb("12as")
x <- dssp(pdb, exepath="/usr/local/bin/")

Then the secondary structure information of the protein 12AS can be accessed as follows:

> x$helix
  1   2   3   4   5   6   7   8   9   1   2
  5  76 130 170 182 258 277 297 320 271 310

  1   2   3   4   5   6   7   8   9   1   2
 27  83 155 176 193 268 283 305 325 275 312

 1  2  3  4  5  6  7  8  9  1  2
23  8 26  7 12 11  7  9  6  5  3

 [1] "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A"

This tells us that the first helix reaches from residue 5 to residue 27 (both inclusive).

cereal grains

I learned about the different kinds of grains long ago, when I was in primary school. Since then I have somehow managed to confuse myself about which grain is which. Here's to unconfusion! (All information and pictures are from Wikipedia; follow the links for details.)

name picture comments
wheat / Weizen
(Triticum aestivum, Triticum durum, etc.)
[wheat] Used for bread, flour, couscous, beer, biofuel, …

Needs good soil and climate, several species of wheat are used, main source of vegetable protein in human food.

barley / Gerste
(Hordeum vulgare)
[barley] Used for beer, whisky, animal feeding, …

One of the first cultivated crops (together with einkorn and emmer).

oat / Hafer
(Avena sativa)
[oat] Used for animal feeding (e.g. horses), oat flakes, porridge, …

Can be grown in regions with relatively cold, wet summers.

rye / Roggen
(Secale cereale)
[rye] Used for bread, animal feeding, …

Can grow on relatively poor soil and in colder climates.

new puzzle game

In the last days I have finished my first ever browser-based game: a puzzle, based on the classical fifteen puzzle. Cou can play the game here and download the source code from the program's homepage.

[webpuzzle screenshot]


[Jochen on the Cam]

new paper

Today, Martin and I finally finished our paper "Approximations to the Stochastic Burgers Equation". A preprint can be downloaded below, comments are very welcome.

flouridation of drinking water

It is of course well known that water fluoridation is a Communist conspiracy to sap and impurify all of our precious bodily fluids. Still, I am surprised to learn that one of the political parties running in the UK's upcoming general elections uses their opposition to water fluoridation as a campaign topic.

Can you guess which party this is? (for the answer click here)

Sampling Conditioned Hypoelliptic Diffusions

Only one day after we submitted the revised version of our paper Sampling Conditioned Hypoelliptic Diffusions (see my previous blog post), the article was accepted! It will appear in the Annals of Applied Probability.

updated paper

In order to address the referee's comments, we have prepared an updated version the following article:

Talk at Maths2010

I put up the slides from my talk at the Maths2010 conference in Edinburgh. Since I had only 15 minutes, the talk is nearly free of contents, but in compensation there are many nice pictures. I'm particularly fond of the torus on page 8 (which was surprisingly difficult to generate).

[3D torus]

Ilkley Moor

This is a (some weeks old) photo of all twelve of me on Ilkley Moor.




fast vector operations using SSE2

Yesterday I learned how to speed up the vector operation y := y + αx by using SSE2 instructions.

The function I was trying to speed up was as follows:

static void
daxpy0(int n, double a, const double *x, double *y)
  int i;
  for (i=0; i<n; ++i) {
    y[i] += a * x[i];

Using SSE2 instructions, operating on two double precision numbers at a time, this can be written as follows:

#include <emmintrin.h>

static void
daxpy1(int n, double a, const double *x, double *y)
  __m128d aa, xx, yy;
  int i;

  aa = _mm_set1_pd(a);       /* store a in both registers */
  for (i=0; i+1<n; i+=2) {
    xx = _mm_load_pd(x+i);   /* load two elements from x ... */
    xx = _mm_mul_pd(aa, xx); /* ... and multiply both by a */
    yy = _mm_load_pd(y+i);   /* load two elements from y ... */
    yy = _mm_add_pd(yy, xx); /* ... and add */
    _mm_store_pd(y+i, yy)    /* write back both elements of y */
  if (i < n) {
    y[i] += a * x[i];        /* handle last element if n is odd */

This code required that the vectors x and y are 16-byte aligned (otherwise a segmentation fault will occur). This assumption holds, for example, for memory blocks allocated by malloc on 64bit Linux and MacOS X systems. Also, obviously, this only works on CPUs which support the SSE2 instruction set. A description of the SSE2 instructions used here can be found in the Intrinsics Reference of the Intel C++ Compiler Documentation (also seems to apply to the GNU C compiler; direct link here).

For comparison, I also tried to use the daxpy function provided by BLAS:

static void
daxpy2(int n, double a, const double *x, double *y)
  extern void daxpy_(int *Np, double *DAp,
                     const double *X, int *INCXp,
                     double *Y, int *INCYp);
  int inc = 1;
  daxpy_(&n, &a, x, &inc, y, &inc);

Details about this approach are described on my page about linear algebra packages on Linux.

Results. I tried the three functions using two vectors of length 2000. The following table gives the execution time for a million calls to daxpy (on a newish quad-core Linux machine):

method function time [s]
direct daxpy0 1.52
SSE2 daxpy1 0.91
BLAS daxpy2 2.47

As can be seen, the SSE2 code in daxpy1 is fastest, compared to the naive implementation daxpy0 it takes 40% off the execution time! For some reason, the BLAS version seems to be very slow; and the results on my MacOS machine are similar. Currently I have no explanation for this effect.

tunnelling http over ssh

Recently I learned how to tunnel http traffic (e.g. web surfing) over an ssh connection. The effect of this is that you can browse the web on one computer A, say, but for the web servers you are visiting it will look like your requests originate from a different host B. You need to be able to log into host B via ssh for this to work.

There are several situations where such tunneling is useful:

Setting up a tunnel is done in two steps:

  1. Create a tunnel using the following command:
    ssh -D 8080 -f -q -N login@host

    You will need to replace login and host with your login details. The machine you type this command on is machine A in the description above, and host is machine B. This command will start an ssh process which will run in the background and will act as a tunnel to forward the web traffic.

  2. Set up your web browser to use a SOCKS web proxy on host localhost and port 8080. Both SOCKS4 and SOCKS5 should work. This will tell the web browser to connect to the local end of the ssh tunnel.


Over the weekend I finished a helper which allows to easily distribute a number of programs over the available CPUs on a multi-core system. You can find the program on the Parallel homepage.

juggling is good for you

I just found somebody's blog post stating 8 Reasons Normal People Should Juggle and I agree very much with his reasons.

Wisent release 0.6

Yesterday, I completed version 0.6 of my Python parser generator Wisent. The new version allows to use UTF-8 encoded grammar files (originally, only ASCII characters could be used in grammar files), adds a CSS parser as an additional example, and fixes some minor bugs. You can download the program from the Wisent homepage.

Upper and Lower Bounds in Exponential Tauberian Theorems

Yesterday I finished a revised version of my article about exponential Tauberian theorems. The main change is, that it transpired that my proof could be trivially modified to get a more general statement. Many thanks to the (anonymous) referee for pointing this out.

paper accepted

Today, one of our papers was accepted:

colorspace conversion in Python (updated)

Just in case this is useful for anybody: here is an implementation of the HSV to RGB colorspace conversion in Python:

from __future__ import division

def hsv2rgb(h, s, v):
    hi = int(h//60 % 6)
    f = h/60 - h//60
    p = v * (1-s)
    q = v * (1-f*s)
    t = v * (1-(1-f)*s)
    return [ (v,t,p), (q,v,p), (p,v,t), (p,q,v), (t,p,v), (v,p,q) ][hi]

Update. In the meantime I learned that the Python standard library has a built-in version of this function in the colorsys module.

Sampling Conditioned Hypoelliptic Diffusions

Today, I finally submitted the SPDE-paper I wrote with Martin Hairer and Andrew Stuart. This paper took a long time to complete (and it changed from an applied maths paper into a pure maths paper in the process). You can download a pre-print here; comments are very welcome:

new paper

A while ago, Andreas and I submitted another paper. Here is a pre-print:

Wisent release 0.5

Over the week-end I completed the work on release 0.5 of my Python parser generator Wisent. You can now download the program from the Wisent homepage. The main news is the addition of a users' manual.

Amazon is weird

I just found out that Amazon sells Uranium ore. Not sure whether I should get some …

web browser differences

Today I discovered by chance the page. Somebody there has spent lots and lots of time to work out the differences between different web-browsers, in particular in CSS and JavaScript support. Very useful!!!

smuggling $134,000,000,000 (updated)

Some days ago I found this strange story about two men trying to smuggle bonds worth 134 billion US dollars across the Italian-Swiss border. Or maybe it was fake bonds? Everything about this case seems quite mysterious to me.

Media coverage is strangly spotty, but summaries are published by the online editions of the Times and the German newspaper Die Welt (which I don't normally read). Of course there are also lots of conspiracy theories.

Update: It transpires that the bonds were fakes.

Hudson River location

A sentence from the BBC web page:

Although passengers survived a landing on the Hudson River in New York in January — it is rarely successful, especially in the middle of an ocean the size of the Atlantic.

New York!!!

I added a few photos from New York to my Photo album:

[a road in Manhattan] [Manhattan] [Union Square, New York] [houses in New York] [Times Square] [central park] [nice figures made from rubbish]

Upper and Lower Bounds in Exponential Tauberian Theorems

Recently I submitted a new paper:

: Upper and Lower Bounds in Exponential Tauberian Theorems. Tbilisi Mathematical Journal, vol. 2, pp. 41–50, 2009.
link, arXiv:0908.0642, more…

This is another leftover bit from my PhD thesis. Comments are most welcome!

new job

Since 1st April 2009 I'm a lecturer at the statistics department of the University of Leeds.

New Book Chapter

Recently we submitted a book chapter for the upcoming Oxford Handbook of Nonlinear Filtering:

, and : Signal Processing Problems on Function Space: Bayesian Formulation, Stochastic PDEs and Effective MCMC Methods. Pages 833–873 in The Oxford Handbook of Nonlinear Filtering, Dan Crisan and Boris Rozovsky (editors), Oxford University Press, 2011.
link, preprint:pdf, more…

figures with matplotlib

I like to use the matplotlib Python library to create figures for use in my articles. Unfortunately, the matplotlib default settings seem to be optimised for screen-viewing, so creating pictures for inclusion in a scientific paper needs some fiddling. Here is an example script which sets up things for such a figure.

#! /usr/bin/env python

import matplotlib
from pylab import *

rc("font", family="serif")
rc("font", size=10)

width = 4.5
height = 1.4
rc("figure.subplot", left=(22/72.27)/width)
rc("figure.subplot", right=(width-10/72.27)/width)
rc("figure.subplot", bottom=(14/72.27)/height)
rc("figure.subplot", top=(height-7/72.27)/height)
figure(figsize=(width, height))

x = loadtxt("x.dat")
plot(x[:,0], x[:,1], "k-")


The script performs the following steps:

CO2 emissions for different modes of transportation (updated)

Today I found the British government's guidelines for computing the CO2 emission for different modes of transportation. I like the careful explanation of the authors' methodology.

The results are summarised in the table below. All data is for 2008, and I rounded and simplified the values to make them easier to digest. You can find the true results in the report. The values for cars (marked with an asterisk) are given in g/km, you need to divide them by the number of passengers in the car.

mode of transportation CO2 emission
[g / passenger km]
national rail 60 table 21 (page 25)
london underground 65 table 21 (page 25)
trams etc. 78 table 21 (page 25)
Eurostar 18 table 21 (page 25)
bus 107 table 17 (page 21)
coach 29 table 17 (page 21)
petrol car 207* table 11 (page 18)
diesel car 198* table 11 (page 18)
motor cycle 106* table 18 (page 22)
domestic flights 175 table 3 (page 9)
short-haul flights 98 table 3 (page 9)
long-haul flights 111 table 3 (page 9)

Update (2010-04-13). DEFRA reorganised their web page, so the above link does no longer work. Their reports can now be found on the Greenhouse gas (GHG) conversion factor methodology papers page. I believe that the 2008 report there is identical to what I used for the table.


It is Christmas very soon, now. Why don't you get me a Christmas present from my Amazon wishlist?

debian-vote release 0.8

I released version 0.8 of my implementation of the voting algorithm used by the Debian project. The new version should be functionally equivalent to version 0.7, but the source code is greatly cleaned up. Many thanks to Neil Schemenauer for providing patches! You can download the program from my page about the Debian voting system.

North Carolina state fair 2008

Some photos from the North Carolina state fair are now on my photo album:

[steak on a stick] [pig racing audience] [pig racing] [baby pig] [people] [more people] [Julien and Jochen] [giant horse] [goat] [goats] [North Carolina's finest] [Ana, Megan and Jochen] [a potter's wheel] [roller coaster] [demolition derby] [roller coaster in the dark]

Durham, NC

Last Saturday I finally managed to visit Durham. Some photos are on my photo album:

[Duke University campus] [Duke University campus] [bus stop] [Duke University campus] [Duke Gardens] [Banana tree] [red palm tree] [man eating plant] [Duke botanical gardens] [Christmas decoration plant] [Durham downtown] [Durham downtown] [Durham downtown] [Durham Centerfest 2008] [old motherboard] [mechanical knight]

My favourite is the last one: it shows a mechanical knight guarding a railway crossing.

Parsing Apache Log Files with Python

The common log format of the Apache web server contains information about page views on a web page. Today I figured out how this format can be parsed in a Python program.

Typical log entries look like this (I added line breaks for better readability): - - [20/Sep/2008:10:09:09 +0100]
  "GET / HTTP/1.1"
  200 4235 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +" - - [20/Sep/2008:13:35:06 +0100]
  "GET //errors.php?error= HTTP/1.1"
  404 2272 "-" "libwww-perl/5.79"

The first step of parsing these lines is easy, the regular expression library allows to break a line into the individual fields:

import re

parts = [
    r'(?P<host>\S+)',                   # host %h
    r'\S+',                             # indent %l (unused)
    r'(?P<user>\S+)',                   # user %u
    r'\[(?P<time>.+)\]',                # time %t
    r'"(?P<request>.+)"',               # request "%r"
    r'(?P<status>[0-9]+)',              # status %>s
    r'(?P<size>\S+)',                   # size %b (careful, can be '-')
    r'"(?P<referer>.*)"',               # referer "%{Referer}i"
    r'"(?P<agent>.*)"',                 # user agent "%{User-agent}i"
pattern = re.compile(r'\s+'.join(parts)+r'\s*\Z')

line = ... a line from the log file ...
m = pattern.match(line)
res = m.groupdict()

This results in a Python dictionary, containing the individual fields as strings.

The second step is to convert the field values from strings to Python data types and to remove the oddities of the log file format. Most fields are easy:

if res["user"] == "-":
    res["user"] = None

res["status"] = int(res["status"])

if res["size"] == "-":
    res["size"] = 0
    res["size"] = int(res["size"])

if res["referer"] == "-":
    res["referer"] = None

Converting the time value into a Python datetime object is surprisingly messy, mainly because of the timezone information. The best way I found until now is the following:

import time, datetime

class Timezone(datetime.tzinfo):

    def __init__(self, name="+0000"): = name
        seconds = int(name[:-2])*3600+int(name[-2:])*60
        self.offset = datetime.timedelta(seconds=seconds)

    def utcoffset(self, dt):
        return self.offset

    def dst(self, dt):
        return timedelta(0)

    def tzname(self, dt):

tt = time.strptime(res["time"][:-6], "%d/%b/%Y:%H:%M:%S")
tt = list(tt[:6]) + [ 0, Timezone(res["time"][-5:]) ]
res["time"] = datetime.datetime(*tt)

If you know of any better way of doing this, please let me know. Hints are very welcome!

Large Deviations for One Dimensional Diffusions with a Strong Drift

After many delays, the paper containing the main result of my PhD thesis finally was published today:

This is more theoretical than the things I currently do, but I believe that it is a good paper.

Abstract. We derive a large deviation principle which describes the behaviour of a diffusion process with additive noise under the influence of a strong drift. Our main result is a large deviation theorem for the distribution of the end-point of a one-dimensional diffusion with drift θb where b is a drift function and θ a real number, when θ converges to ∞. It transpires that the problem is governed by a rate function which consists of two parts: one contribution comes from the Freidlin-Wentzell theorem whereas a second term reflects the cost for a Brownian motion to stay near a equilibrium point of the drift over long periods of time.

Supermassive Black Hole at the Center of the Milky Way

Mark Reid wrote a very nice review article about the current (very strong) evidence for the existence of a supermassive black hole at the centre of the Milky Way. Definitely worth a read! (Found via the KFC blog.)


Today I discovered the very exciting wordle tool at It can convert texts into word clouds like the following:

[worlde created word field]

My Favourite MacOS X Command

A nice MacOS X feature:

voss@soup [~] osascript -e 'tell app "ARDAgent" to do shell script "whoami"'

A Review of the Universe

Today I discovered, by chance, the web page A Review of the Universe - Structures, Evolutions, Observations, and Theories. I have not yet read it all but the parts I looked at seem quite nice.

Link Path Lengths in Wikipedia

Stephen Dolan provides a very nice page which can find the shortest link between any two Wikipedia articles. The results are sometimes surprising: for example, can you guess how to get from Pleistocene to Microprocessor in three clicks?

Blenheim Palace

This week-end we had an exciting trip to Blenheim Palace. [Blenheim palace]


In a recent questionaire, our IT services asked the following, interesting question: Please rank the following in your priority order (1 = Highest, 2 = Medium, 3 = Lowest)

A university with working hours only (presumably theirs not mine) network access is an interesting thought. And I wonder whether guaranteed access differs from access or whether the first option is just a subset of the second option.


I finally got around to put some photos from my trip to Aachen onto my web page. Some weeks late, but here they are:

[sparse tree] [dead horse] [some road] [Cathedral I] [Cathedral II] [The Throne of Charlemagne] [Pulpit]

A fast numerical algorithm for the estimation of diffusion model parameters

Our article

is now in print.

Wisent release 0.4

Today I completed version 0.4 of the Wisent parser generator for Python. The program is now efficient enough to deal with grammars of the size of the C grammar.

I spent some effort on generating useful error messages for grammar conflicts. For example the usual "if-else conflict" in the C grammar is reported as follows:

cg.wi:164:61: shift-reduce conflict: the input
cg.wi:164:61:     IDENTIFIER { if ( IDENTIFIER ) ;.else ...
cg.wi:164:61:   can be shifted using the production rule
cg.wi:164:61:     selection_statement: if ( expression ) statement.else statement;
cg.wi:164:25:   or can be reduced to
cg.wi:164:25:     IDENTIFIER { selection_statement.else ...
cg.wi:164:25:   using the production rule
cg.wi:164:25:     selection_statement: if ( expression ) statement;
cg.wi: 1 conflict, aborting ...

Now you can even tell Wisent to ignore this conflict by placing an ! before the offending else.

Efficient LR(1) Parsers

While trying to add the capability to create LALR(1) parsers to Wisent, I came across the following paper:

This article is nicely written and I hope that using his algorithm I can teach Wisent to create efficient LR(1) parsers, thus removing the need for LALR(1) support.

Wisent: A Parser Generator for Python

Today I published the first version of a parser generator for Python. Currently it only generates LR(1) parsers, i.e. it is not very useful for big grammars. I hope to add support for LALR(1) parsers in the future.

One problem is the choice of a project name: I came up with Wisent since this fits together well with Bison. But unfortunately I am not the first one to think so, there are at least two other parser generators which also use the name Wisent, and most likely I will rename my program. Suggestions for a better name, preferably with Monty Python connotations, are very welcome.

gravity powered lamp (updated)

There is a lamp powered by gravity. The energy is created by a weight slowly sliding down some rail and you recharge the lamp by moving the weight up again. The motion is then converted to electricity and used to power the LEDs which create the light. I want one of these lamps!!!

Update. A quick calculation shows that this lamp cannot possibly work: the best possible efficiency for generating light is approximately 683 lumen per watt (real efficiencies are much lower), thus to get the advertised 600-800 lumens … over a period of 4 hours one would need more than 12650 joule. Since lifting a weight of 1 kg by 1 meter only provides 9.81 joule, the lamp, even assuming 100% efficiency, will never be able to generate enough electricity to power the LEDs.

Understanding Art for Geeks

Understanding Art for Geeks is a very nice picture collection on Flickr.

ancient Egyptian hippo

During the Christmas holidays I carved an ancient Egytian hippo from soapstone. It turned out to be quite a happy animal. The photo shows the almost finished statuette, only some polishing is left to do.

My photo page has a picture of a very early stage.

fast-dm release 29

We prepared another fast-dm release over the Chrismas holidays: release 29 fixes a bug where for fixed z large values of sz were not correctly recognised.

fast-dm bug fix release

The previous fast-dm version (release 27) introduced an unfortunate bug. We released a fixed version today. As a nice side effect the program is now a bit faster for some parameter constellations (big st0 and high precision). If you are using an older version of fast-dm, you should upgrade to the new release 28.

Government Sanctioned Kidnapping

According to an article in the Times it seems that the United States' government believes it to be legal to kidnap anybody who is suspected of a crime by them. I find this scary!

New Fast-dm Release

Andreas and I finished release 27 of our program fast-dm for parameter estimation in Ratcliff's diffusion model (in Psychology). You can download the new release from my fast-dm web page.

coma terror

I had missed this until now: In Leeds a man created a terrorist threat by falling into coma on a bus. Most exciting!

Found on a Blackboard

If M's a complete metric space
(and non-empty), it's always the case:
    If f's a contraction
    Then, under its action,
Exactly one point stays in place.

Sampling Conditioned Diffusions

Today I completed another article, which will eventually appear in the conference proceedings for Heinrich v. Weizsäcker's birthday conference:

This is a (hopefully) very nice and readable summary of our project to use SPDEs to construct MCMC methods in infinite dimensional spaces.

Flash Cookies

Yesterday I learned of the existence of flash cookies (aka local shared objects) which somehow had evaded my attention until now. Flash cookies seem to be very similar to the usual HTTP cookies, except that they can store more information (up to 100kb by default, instead of 4kb for HTTP cookies).

Flash cookies are independent of the browser's normal cookie settings and are not blocked by by tools like CookieSafe. It seems that the only way to view flash cookies is by visiting Adobe's Setting Manager web page.

Spam statistics

Today I spent some time looking through my spam folder. Since beginning of this year I received 47587 spam messages but only 38281 ham (i.e. good) messages. This means that 55% of my mails are now spam.

Not surprisingly, the spam messages came from a much wider set of hosts than the spam messages: 43673 hosts sent only spam, 3034 hosts sent only ham, and 145 hosts sent both types of messages. This effect can also be seen in my old map of the internet.

The worst offender is the host with the IP address It sent 49 spam emails to me this year, the first one on 2007-02-23 (Bitte Geld abholen !!), the last one until now on 2007-11-02 (Bitte zurück Rufen). It is sad that this spammer manages to use the same address for 10 month in a row for an, in my eyes, criminal activity without being stopped.

What every Programmer should know about memory

I started reading Ulrich Drepper's excellent series What every Programmer should know about memory. It is published by Linux Weekly News in weekly installments. Until now, the following parts appeared:

The author shows, for example, how the naive matrix-matrix multiplication

for (i = 0; i < N; ++i)
  for (j = 0; j < N; ++j)
    for (k = 0; k < N; ++k)
      res[i][j] += A[i][k] * B[k][j];

can be improved to run 10 times faster for big matrices!

Random walk bridges

Recently I learned how to compute the distribution of the number N of visits in zero of a one-dimensional, symmetric, simple random walk (Xn), conditioned on X2n=0. There are two ways to compute this.

Plan A. Using the nice equality

P(X1≠0, …, X2n≠0) = P(X2n=0)

which I learned long ago as a student, we can see that there is a bijection between all bridge paths and all paths which don't hit zero. (The equality is proved by constructing a bijection between the complements of these sets, using the reflection principle.) We can apply this to bridge paths with at least k zeros, mapping the path segment starting at the kth zero to a path which does not hit zero. This results in a bijection between all bridge paths having at least k zeros and all paths having exactly k zeros. If you happen to know this number, you are done. Otherwise you can use

Plan B. Consider the following map: for each bridge path with at least k zeros, flip the last k excusions so that they are all positive and then remove the last step of each of these excursions. This procedure defines a map from paths of length 2n with at least k zeros which end at zero to paths of length 2n-k (we removed k steps) which end at level k (all removed steps where downwards). It transpires that this map is surjective and every image has exactly 2k pre-images. Thus the number of bridge paths with at least k zeros is

/ 2n-k \ 2k | | \ k /

and dividing by the total number of bridge paths gives the required probability:

P(Nk | X2n=0) / 2n-k \ 2k | | \ k / = --------------- . / 2n \ | | \ n /

Shiny New Shelf

[kitchen shelf during construction]

For my birthday I built myself a new kitchen shelf, starting with timber from homebase. At the back it has specially crafted gaps so that the wires and tubes from my heating can go through. Say hello to my new shelf!

New Article!!!

Another paper I wrote together with my brother Andreas finally has been accepted:

This paper describes the mathematical foundations which underly our fast-dm program.

Older entries can be found on the next page

Copyright © 2007, Jochen Voss. All content on this website (including text, pictures, and any other original works), unless otherwise noted, is licensed under a Creative Commons Attribution-Share Alike 3.0 License.