Harald Kirsch

genug Unfug.


Monq.jfa faster again

The implementation of a deterministic finite automaton (DFA) has one central component: transition tables. Each node of the DFA contains one. It maps the incoming character to an outgoing edge, or rather to another node of the DFA, namely the target node of the transition for this character. The fastest implementation is an array indexed by the incoming character. For a character ch and a transition table trans this is just one array access:
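In Java this could look roughly as follows (my own sketch; State stands in for whatever node type monq actually uses):

  State next = trans[ch];   // one array slot per possible char value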


Many transition tables are, however, extremely sparse, with sometimes only a handful of characters leading out of a node. In this case, using the full array of $2^{16}$ character values is an extreme waste of main memory. From the beginning monq optimized these array-based transition tables such that they only contain entries for the range of characters between the first and the last character used. Retrieving the transition then looks like
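A sketch of this range-restricted lookup, under the same assumptions as above:

  // first/last: smallest and largest character that has a transition out of this node
  State next = null;
  if (ch >= first && ch <= last) {
    next = trans[ch - first];   // array only covers the range [first, last]
  }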


which is only minimally slower than storing the full array. But when only a few of the characters between the first and the last used character actually have a transition, this is still a big waste of main memory. Therefore there was always a second implementation of the transition table that stores only the available transitions and finds the requested one with a binary search.
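A sketch of how the binary-search variant could look (again my own illustration, not monq's actual code):

  // chars[] holds the characters that have a transition, sorted ascending;
  // targets[] holds the corresponding target nodes
  int idx = java.util.Arrays.binarySearch(chars, ch);
  State next = (idx >= 0) ? targets[idx] : null;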

For every node, the decision between the two implementations was based on an estimated memory footprint. And the smaller one was always chosen.

In version 1.4 of monq I introduced a parameter that allows trading memory usage for speed. If it is set high enough, no transition table will use a binary search. And the good news: my performance tests show that for the use case of searching a number of words in a long text, a speedup between 20% and 40% is possible.

(all times in seconds)
no. words                                   3    4   11   17   26   35
$\Delta t$ regex                            9   12   28   42   68   82
$\Delta t$ monq                            40   47   49   54   55   56
$\Delta t$ monq with StringBuilder         24   24   26   27   29   30
$\Delta t$ monq trading memory for speed   21   22   23   25   25   25


Monq.jfa even faster with StringBuilder

As mentioned in the previous article, my revived DFA class library for Java dates back to the time of Java 1.4. It always made heavy use of StringBuffer. Now I found the time to completely replace all instances of StringBuffer with StringBuilder and rerun the performance test reported in the previous article. Here is the table again with the new data:

(all times in seconds)
no. words                            3    4   11   17   26   35
$\Delta t$ regex                     9   12   28   42   68   82
$\Delta t$ monq                     40   47   49   54   55   56
$\Delta t$ monq with StringBuilder  24   24   26   27   29   30

Now the break-even point, where monq overtakes java.util.regex, is already at around 11 words. But I am already looking at some profiling results to see whether this can be improved even more.


Monq.jfa performance comparison with java.util.regex

Recently I found the time to revive the Java package for deterministic finite automata (DFA) that I wrote many years ago at the EBI: Monq.jfa.

The software dates from times when Java was at version 1.4 and generics were not yet available, but I noticed that it is quite complete with regard to functionality, and the test coverage shows that most of it is exercised in tests.

My biggest surprise though came from a performance comparison with java.util.regex. While java.util.regex uses "only" nondeterministic finite automata (NFA) and should therefore naturally be slower than a DFA implementation, I expected that implementation tricks and algorithmic developments in the last 10 years had improved its speed such that my DFA implementation provides no real gain anymore. But, as is often the case: it depends.

I downloaded Wikicorpus 1.0 and combined 10 of the files into one file of around 210 MB with 35 million words, according to wc -w. The task was to count the occurrences of a number of words. The gist of the implementation with java.util.regex is as follows:

Matcher matcher = Pattern.compile("word1|word2|... ").matcher(...);
while (/*input available*/) {
  if (matcher.find(start)) {
    Counter cnt = result.get(matcher.group());
    cnt.increment();
  }
  /* advance start */
}
The class Counter is merely a wrapper around an int with the increment() method. For brevity I leave out the details of buffering and looping over the input which is read in chunks.
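A minimal sketch of such a Counter (my own illustration; the accessor name is made up here):

  class Counter {
    private int count = 0;
    public void increment() { count++; }
    public int getCount() { return count; }   // hypothetical accessor, not mentioned in the text
  }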

For monq.jfa the central code looks like

  CounterAction cntAction = new CounterAction();
  Nfa nfa = new Nfa().or("word1").or("word2")...;
  Dfa dfa = nfa.compile(...);
  new DfaRun(dfa).setIn(...).filter();

The CounterAction is a callback which is called inside the filter() method for each match found in order to count it.

I tested with an increasing number of randomly picked words, where "the" was one of them and, with a total count of 2.2 million, the most frequent. The table below shows the running times for an increasing number of words.

(all times in seconds)
no. words           3    4   11   17   26   35
$\Delta t$ regex    9   12   28   42   68   82
$\Delta t$ monq    40   47   49   54   55   56

When I started the measurements with three words, I was disappointed to see that all the lines of code seemed to be useless: java.util.regex is 4 times faster. But then I remembered why monq.jfa was implemented in the first place: to filter text for whole dictionaries of strings or even regular expressions. Therefore I started adding words and, hey, the table shows the trend. As soon as your dictionary exceeds around 20 words, the DFA shows its strength.

In fact, starting from 17 words and onwards, the DFA implementation exhibits nearly the theoretical limiting behavior, whereby the running time depends only on the input length, not on the complexity of the regular expression. Given that at the EBI we were using dictionaries with several thousand words, it seems monq.jfa was just the right thing to use. ☺


Linux Mint with Cinnamon

My old laptop, a Lenovo T500, has served me well for many years now. And it could have served me further, had I not bought an Android 10" tab 2 years ago. Uuuuhu, how can this be related?

Well, the tab was intended for casual surfing on the web and sometimes for wasting some time with a small game. Yet, when surfing the web, I like to give feedback, and not just by clicking a button with a thumb on it. Rather I tend to write, and write a lot. But this is a complete pain with the on-screen keyboard of a pad. And not only the writing. I came to the conclusion that the general idea of touching the very screen you want to read with your fingers is silly, except for the most constrained situations.

So I decided to buy one of these 12.5" laptops, an Asuspro BU201. While it is of course larger than the tab, it is still light and small enough to carry through the house and use on the couch, if necessary --- in particular so compared to the T500 monster brick.

I got it from ixsoft.de with Linux Mint pre-installed and the Cinnamon desktop environment selected. Everything has worked just as it should so far. Compared to Xfce, Cinnamon looks a bit nicer, more sleek, more modern, but keeps all the benefits described over there as compared to Ubuntu Unity or Gnome-3. Fully configurable, for example to ludicrous settings like having F12 close windows and a double click on the title bar move the window to the background. — Very nice.


A mechanistic explanation of Time

$ \def\Vec#1{\mathbf{#1}} \def\vt#1{\Vec{v}_{#1}(t)} \def\v#1{\Vec{v}_{#1}} \def\av{\bar{\Vec{v}}} $

If you search the Internet for an explanation of what time is, there are a lot of sites with a lot of words basically saying that they cannot say anything about time. Here I have a different approach in that I ask you to compare two simple formulas and let you draw your own conclusions.

The speed of light

Given an observer whose local time is $t$ and an observed object passing by with a velocity of $\Vec{v}(t)\in\mathbb{R}^3$, Special Relativity tells us that for the proper time $\tau$ of the object the following holds: \begin{equation}\label{eq:srt} c^2 = |\Vec{v}(t)|^2 + \left(\frac{c\,d\tau(t)}{dt} \right)^2 \end{equation} where $c$ is the speed of light and $|\cdot|$ denotes the absolute value of a vector.

The term $d\tau/dt$ could be called the speed of aging of the object with respect to the observer. If the object is at rest with respect to the observer, i.e. $|\Vec{v}(t)|=0$, then $d\tau/dt = 1$, meaning the observer and the object age with the same speed.

The other extreme case is where the relative velocity approaches the speed of light $c$, so that $d\tau/dt\to 0$, meaning that with respect to the observer the object is not aging anymore.
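To get a feeling for equation \eqref{eq:srt}, here is a quick numeric example of my own: for an object passing by at $|\Vec{v}| = 0.6\,c$ we get \begin{equation*} \frac{d\tau}{dt} = \sqrt{1-\frac{|\Vec{v}|^2}{c^2}} = \sqrt{1-0.36} = 0.8 , \end{equation*} i.e. the object ages at only 80% of the observer's rate.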

Average velocity of an ensemble of points

The second formula describes the average velocity of an ensemble of $n$ points. For each point particle $0\leq i<n$, its velocity shall be $\vt{i}$. Further, all absolute values of the velocities shall be identical, namely $|\vt{i}|=c$. The average velocity of the ensemble is then \begin{equation} \av(t) = \frac{1}{n}\sum_{i=0}^{n-1} \vt{i} . \end{equation} For brevity we leave out the dependence on $t$ for now as we compute the difference $c^2-\av^2$. \begin{align} c^2-\av^2 &= c^2 - \frac{1}{n^2} \left(\sum_{i=0}^{n-1} \v{i}\right)^2 \\ &= c^2 - \frac{1}{n^2} \sum_{i,j=0}^{n-1} \v{i}\v{j} \end{align}

The last sum is symmetric in $i$ and $j$ and therefore contains each pair $\v{i}\v{j}$ twice, except where $i=j$. These quadratic terms are extracted from the sum to arrive at \begin{align} c^2-\av^2&= \underbrace{c^2 - \frac{1}{n^2}\sum_{i=0}^{n-1} \v{i}^2}_{(A)} - \frac{1}{n^2}\sum_{i<j} 2\v{i}\v{j} . \end{align} Since $\v{i}^2=|\v{i}|^2=c^2$, the terms denoted as $(A)$ in the formula can be written as \begin{align} (A)&= c^2 - \frac{1}{n^2}\cdot n c^2 \\ &= c^2(1-1/n) \\ &= \frac{n-1}{n}c^2 \\ &= \frac{1}{2} n (n-1) \frac{2c^2}{n^2} . \end{align} The term $\frac{1}{2} n(n-1)= \frac{1}{2}(n^2-n)$ is the number of elements in the lower triangle of an $n\times n$ matrix without the diagonal elements and is therefore the same as $\sum_{i<j} 1$, which we multiply by $\frac{2c^2}{n^2}$ and plug into $(A)$ to get \begin{align} c^2 -\av^2 &= \sum_{i<j}\frac{2c^2}{n^2} - \frac{1}{n^2}\sum_{i<j} 2\v{i}\v{j}\\ &= \frac{1}{n^2} \sum_{i<j} \left(c^2-2\v{i}\v{j}+c^2\right) \\ \scriptsize \text{(since $c^2=\v{i}^2$)}\qquad &= \frac{1}{n^2} \sum_{i<j} (\v{i}-\v{j})^2 . \label{eq:final} \end{align}

What does this last equation mean? Remember, we are talking about an ensemble of $n$ point particles all moving at the speed of light. Let's envisage the particles trapped in a box; with $n$ sufficiently large and their movements sufficiently random, the velocity of the box, $\Vec{v}_B$, is a close approximation to $\av$, the average over the particle velocities.
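Since equation \eqref{eq:final} is a plain algebraic identity, it can be checked numerically. Here is a small Java sketch of such a check (my own addition, not part of the original argument): draw $n$ random velocities of absolute value $c$ and compare both sides of the equation.

  import java.util.Random;

  // Sanity check of  c^2 - |v_avg|^2  ==  (1/n^2) * sum_{i<j} |v_i - v_j|^2
  // for n random 3D velocities that all have absolute value c.
  public class EnsembleCheck {
    public static void main(String[] args) {
      final double c = 1.0;   // speed of light in arbitrary units
      final int n = 1000;     // number of point particles
      Random rnd = new Random(42);
      double[][] v = new double[n][3];
      for (int i = 0; i < n; i++) {
        // random direction, scaled to |v_i| = c
        double x = rnd.nextGaussian(), y = rnd.nextGaussian(), z = rnd.nextGaussian();
        double norm = Math.sqrt(x*x + y*y + z*z);
        v[i][0] = c*x/norm; v[i][1] = c*y/norm; v[i][2] = c*z/norm;
      }
      // left-hand side: c^2 - |average velocity|^2
      double[] avg = new double[3];
      for (double[] vi : v) for (int k = 0; k < 3; k++) avg[k] += vi[k] / n;
      double lhs = c*c - (avg[0]*avg[0] + avg[1]*avg[1] + avg[2]*avg[2]);
      // right-hand side: (1/n^2) * sum over pairs i<j of |v_i - v_j|^2
      double rhs = 0;
      for (int i = 0; i < n; i++) {
        for (int j = i + 1; j < n; j++) {
          double dx = v[i][0]-v[j][0], dy = v[i][1]-v[j][1], dz = v[i][2]-v[j][2];
          rhs += (dx*dx + dy*dy + dz*dz) / ((double) n * n);
        }
      }
      System.out.printf("lhs = %.12f, rhs = %.12f%n", lhs, rhs);   // agree up to rounding
    }
  }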

Putting it together

Suppose the box is the observed object we talked about in the first section. Then, by comparing equation \eqref{eq:srt} with the last equation \eqref{eq:final}, we get \begin{equation} \left(\frac{c\,d\tau(t)}{dt} \right)^2 = c^2- |\Vec{v}_B|^2 \approx c^2 - \av^2 = \frac{1}{n^2} \sum_{i<j} (\v{i}-\v{j})^2 \end{equation}

  1. On the far left we have the term which describes the flow of time in the box with respect to the observer.
  2. On the far right we have a purely mechanical term, essentially the averaged squared delta velocities of the point particles in the box.

Does this make sense?

Here comes the part that I have to leave to the reader, except that I throw in my opinion anyway: it does make a lot of sense to me. Consider the box moving faster and faster, until it reaches the speed of light. What happens to the point particles in the box? Since they, too, move at the speed of light, along with the whole box, the delta velocities $\v{i}-\v{j}$ have to converge to zero. Consequently the relative positions of the points inside the box become constant: the content of the box is kind of frozen, nothing changes anymore. This is precisely the situation where $c\,d\tau/dt=0$, meaning that proper time comes to a halt.

My conclusion is that (proper) time is nothing but the integrated average of change happening in a closed system. For a (closed) system of point particles, change is simply the delta velocity between points.

Open questions

Of course there are many. My priority one question is whether a similar derivation is possible for a system "filled" with a changing field, such that $d\tau/dt$ turns out to be equal to some measure of change of the field.

$\def\v#1{\mathfrak{#1}} \def\vx{\v{x}} \def\vy{\v{y}} \def\vz{\v{z}} \def\mA#1#2{a_{#1 #2}} \def\ma#1#2{a^{#1}_{#2}} \def\t#1{\tilde #1} \def\tx{\t{x}} \def\ty{\t{y}} \def\d#1{\partial #1} \def\dd#1{\partial_{#1}} \def\pderiv#1{\frac{\partial}{\partial #1}} $


Contravariant, Covariant, Tensor

(IV) Differentiation is Covariant

This is already the fourth part about tensors and the like that I write down to understand this stuff myself, but others may benefit too. For the notation, please read the first and second parts.

Before I start, I would like to introduce a notation which I rarely see used, but which I find quite helpful. For a function $f:(x^1,\dots,x^n)\to K$ let's denote the derivative with regard to the $k$-th parameter as $\dd{k}f$. Normally this is denoted with $\d{f}/\d{x_k}$, but this gets confusing when we have something like $\d{f(2r,4s,7t)}/\d{x_2}$, where the arguments do not contain $x_2$. From the index of $x_2$ we can see that the partial derivative with regard to the second parameter is meant, but, well, I don't like it that way. Instead I will write $\dd{2}f(2r,4s,7t)$. To be more general, I define \begin{equation} \dd{k}f(x^1,\dots,x^n) := \pderiv{x_k}f(x^1,\dots,x^n) \end{equation} to denote the derivative with regard to the $k$-th argument, without applying the chain rule to the inner functions. If the arguments are not just the plain variables, but functions $g_k(t)$ of some other parameter $t$, we then write \begin{align*} \frac{d}{dt}f(\dots,g_k(t),\dots) &= \sum_{k=1}^n \dd{k}f(\dots,g_k(t),\dots)\cdot \frac{d}{dt}g_k(t) \end{align*}
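A small worked example of this notation (my own, not from the original text): take $f(a,b) = a\,b^2$ with the arguments $2r$ and $7t$. Then \begin{equation*} \dd{2}f(2r,7t) = 2\cdot(2r)(7t) = 28rt \end{equation*} is just the derivative of $f$ with respect to its second slot, evaluated at $(2r,7t)$, while the full derivative with respect to $t$ additionally picks up the inner derivative: \begin{equation*} \frac{d}{dt}f(2r,7t) = \dd{2}f(2r,7t)\cdot\frac{d}{dt}(7t) = 28rt\cdot 7 = 196rt . \end{equation*}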

That said, let's look at a function defined like $f$ above on the coordinates $x^i$ or $y^j$ of a vector $\vz=x^i\vx_i=y^j\vy_j$, using Einstein notation again for summing over a pair of upper and lower indexes, where the $\vx_i$ and $\vy_j$ are basis vectors of two different bases $\v{X}$ and $\v{Y}$ of our vector space $V$. Since the function is defined on just the coordinates of a vector, the value of the function is typically not unique for a given vector, but differs with the basis used, i.e. except for special cases we have $$f(x^1,\dots,x^n)\neq f(y^1,\dots,y^n).$$

But let's look at the derivatives of $f$ with regard to the coordinates and remember that $x^i = \ma{i}{j} y^j$ for the basis change matrix $\ma{i}{j}$. \begin{align*} \pderiv{y_j}f(x^1,\dots,x^n) &= \pderiv{y_j} f(\ma{1}{k}y^k,\dots,\ma{n}{k}y^k) \\ &= \sum_{i=1}^n \dd{i}f(\ma{1}{k}y^k,\dots,\ma{n}{k}y^k) \cdot \pderiv{y_j} \ma{i}{k}y^k \\ &= \sum_{i=1}^n \dd{i}f(\ma{1}{k}y^k,\dots,\ma{n}{k}y^k) \cdot \ma{i}{j} \\ &= \sum_{i=1}^n \dd{i}f(x^1,\dots,x^n) \cdot \ma{i}{j} \\ &= \sum_{i=1}^n \ma{i}{j}\pderiv{x_i} f(x^1,\dots,x^n) \end{align*} Using Einstein notation, this is $$ \pderiv{y_j}f(x^1,\dots,x^n) = \ma{i}{j}\pderiv{x_i} f(x^1,\dots,x^n) $$ where I very consciously do not remove the function arguments just to make the formula look neat. Otherwise one could be tempted to think that this covariant transformation not only converts $\partial/\d{x_i}$ into $\partial/\d{y_j}$, but also magically replaces the $x^i$ with $y^j$ in the argument list, which it does not.

The result shows that the differential operators $\partial/\d{x_i}$ form a basis-dependent $n$-tuple which transforms covariantly. Hence it is correct that the index is a subscript.

$\def\v#1{\mathfrak{#1}} \def\vx{\v{x}} \def\vy{\v{y}} \def\vz{\v{z}} \def\mA#1#2{a_{#1 #2}} \def\ma#1#2{a^{#1}_{#2}} \def\t#1{\tilde #1} \def\tx{\t{x}} \def\ty{\t{y}} \def\d#1{\partial #1} \def\dd#1{\partial_{#1}} \def\pderiv#1{\frac{\partial}{\partial #1}} $


Contravariant, Covariant, Tensor

(III) Index Notation

If you were reading the previous two parts of this series in the hope of seeing indexes hop up and down between subscript and superscript, you may be disappointed. But don't despair. Now that we understand that there exist two different types of basis-dependent $n$-tuples, it is time to talk about superscript indexes to distinguish contravariant tuples from covariant ones.

For the notation please refer to the previous blog post. To summarize, we have seen three transformations between vector space bases, \begin{equation*} \vy_j = \sum_{i=1}^n \mA{i}{j}\vx_i, \qquad x_i = \sum_{j=1}^n \mA{i}{j} y_j, \qquad \ty_j = \sum_{i=1}^n \mA{i}{j} \tx_i \end{equation*} where

  • the first is the basis vector transformation serving as a reference for the direction of the transformations,
  • the second is the transformation of coordinates of some vector $\vz\in V$, called contravariant because it goes in the opposite direction of the first, and
  • the third is the transformation of the values $\tx_i:=f(\vx_i)$ and $\ty_j:=f(\vy_j)$ of a linear form $f:V\to K$, called covariant because it goes in the same direction as the reference.

So there are basis-dependent $n$-tuples which are covariant and others which are contravariant. The simple idea is to distinguish them by raising the index of contravariant tuples to a superscript. According to this rule, we must raise the index of coordinate values and from now on write them as $x^i$ and $y^i$, such that a vector $\vz$ is now written as \begin{equation} \vz = \sum_{i=1}^n x^i\vx_i = \sum_{j=1}^n y^j\vy_j . \label{eq:z} \end{equation}

And that was all? Not quite! Remember that for a fixed $j$ the $\mA{i}{j}$ are actually the coordinates of $\vy_j$ with regard to basis $\v{X}$. This means we must, from now on, write $\ma{i}{j}$. Our transformations are then \begin{equation*} \vy_j = \sum_{i=1}^n \ma{i}{j}\vx_i, \qquad x^i = \sum_{j=1}^n \ma{i}{j} y^j, \qquad \ty_j= \sum_{i=1}^n \ma{i}{j} \tx_i \end{equation*} Strikingly, these three sums, as well as the one in \eqref{eq:z}, always zip up an upper with a lower index. And if this is the case then, as proposed by Einstein, the summation sign can be left out. Having matching pairs of upper and lower indexes is enough to let us know that there is a sum over this index. In this Einstein notation, our three transformations can now be written as \begin{equation*} \vy_j = \ma{i}{j}\vx_i, \qquad x^i = \ma{i}{j} y^j, \qquad \ty_j= \ma{i}{j} \tx_i. \end{equation*} Very concise. This reminds me of programming languages, where some, like Java, are more verbose than others, like Scala or Perl. The tradeoff is that the less verbose a notation or language is, the more you need to know by heart. For the expert, verbosity is less efficient, while the beginner, or even someone who has not used the notation for some time, may easily get lost.

But since the Einstein notation is so common I will use it too in the parts to come.

$\def\v#1{\mathfrak{#1}} \def\vx{\v{x}} \def\vy{\v{y}} \def\vz{\v{z}} \def\mA#1#2{a_{#1 #2}} \def\ma#1#2{a^{#1}_{#2}} \def\t#1{\tilde #1} \def\tx{\t{x}} \def\ty{\t{y}} \def\d#1{\partial #1} \def\dd#1{\partial_{#1}} \def\pderiv#1{\frac{\partial}{\partial #1}} $


Contravariant, Covariant, Tensor

(II) Covariance

After I understood where the term contravariant comes from, I am now ready to explain covariant. As before we have a vector space $V$ over a field $K$ with two bases \begin{align*} \v{X} &= (\vx_1,\dots,\vx_n), \qquad \vx_i\in V, \\ \v{Y} &= (\vy_1,\dots,\vy_n), \qquad \vy_j\in V \end{align*} and a set of $\mA{i}{j}\in K$ that transform $\v{X}$ into $\v{Y}$ according to \begin{equation} \vy_j= \sum_{i=1}^n \mA{i}{j}\vx_i . \label{eq:vy} \end{equation} Further we look at a linear form $f:V\to K$, i.e. a function from $V$ into $K$ that assigns an element $f(\vz)$ to each $\vz\in V$ and is linear. In particular $f$ provides us with two $n$-tupels $\tx_i:=f(\vx_i) \in K$ and $\ty_j:=f(\vy_j) \in K$, one for each of the bases.

This is reminiscent of the coordinates of a vector $\vz\in V$, which are also $n$-tuples of values depending on the selected basis, and we can ask whether and how we can transform the $\tx_i$ into the $\ty_j$. But this is not difficult: \begin{align*} \ty_j &= f(\vy_j) \\ &= f\left(\sum_{i=1}^n \mA{i}{j}\vx_i\right) && \text{by \eqref{eq:vy}} \\ &= \sum_{i=1}^n \mA{i}{j} f(\vx_i) \\ &= \sum_{i=1}^n \mA{i}{j} \tx_i. \end{align*} We see that the $\mA{i}{j}$ transform the basis vectors $\vx_i$ into $\vy_j$ (see \eqref{eq:vy}) as well as the coefficients $\tx_i$ into $\ty_j$. Hence these coefficients of a linear form $f$ transform in the same direction as the bases and are therefore called covariant.

To summarize, the $\mA{i}{j}$ perform for us the following transformations:

  1. basis vectors $\vx_i \longrightarrow \vy_j$ (reference)
  2. vector coordinates $y_j \longrightarrow x_i$ (contravariant, opposite direction to the reference)
  3. linear form coefficients $\tx_i \longrightarrow \ty_j$ (covariant, same direction as the reference)
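To make these three transformations a bit more tangible, here is a small numerical sketch in Java (my own illustration, with made-up numbers for the $\mA{i}{j}$, the coordinates and the linear form). It applies the contravariant and the covariant transformation and checks that $f(\vz)$, computed as $\sum_i x_i\tx_i$ or as $\sum_j y_j\ty_j$, comes out the same:

  // Concrete 2D stand-in: basis Y is defined by y_j = sum_i a[i][j] * x_i.
  public class CoContraCheck {
    public static void main(String[] args) {
      double[][] a = {{1, 2},   // a[i][j]: coordinates of y_j with respect to basis X
                      {1, 3}};
      double[] y = {5, -2};     // coordinates of some vector z with respect to basis Y
      // contravariant: x_i = sum_j a[i][j] * y_j  (opposite direction to the basis change)
      double[] x = new double[2];
      for (int i = 0; i < 2; i++)
        for (int j = 0; j < 2; j++)
          x[i] += a[i][j] * y[j];
      // a linear form f, given by its values on basis X: tx_i = f(x_i)
      double[] tx = {4, 7};
      // covariant: ty_j = sum_i a[i][j] * tx_i  (same direction as the basis change)
      double[] ty = new double[2];
      for (int j = 0; j < 2; j++)
        for (int i = 0; i < 2; i++)
          ty[j] += a[i][j] * tx[i];
      // f(z) must not depend on the basis used to compute it
      double fzViaX = x[0]*tx[0] + x[1]*tx[1];
      double fzViaY = y[0]*ty[0] + y[1]*ty[1];
      System.out.println(fzViaX + " == " + fzViaY);   // both print the same value
    }
  }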

If you were reading this hoping to see indexes hop up and down between subscript and superscript, you may be disappointed, but don't despair. Only now that we clearly understand that there exist two different types of basis-dependent $n$-tuples is it time to talk about superscript indexes to differentiate contravariant tuples from covariant ones.

And the rules are simple. The components of a basis-dependent $n$-tuple have their index

  1. as a subscript, like $\tx_i$ and the basis vectors $\vx_i$ themselves, if the $n$-tuple transforms covariantly, and
  2. as a superscript, like $x^i$, if the $n$-tuple is contravariant.
$\def\v#1{\mathfrak{#1}} \def\vx{\v{x}} \def\vy{\v{y}} \def\vz{\v{z}} \def\mA#1#2{a_{#1 #2}} \def\ma#1#2{a^{#1}_{#2}} \def\t#1{\tilde #1} \def\tx{\t{x}} \def\ty{\t{y}} \def\d#1{\partial #1} \def\dd#1{\partial_{#1}} \def\pderiv#1{\frac{\partial}{\partial #1}} $


Contravariant, Covariant, Tensor

(I) Contravariance

For some time now I have struggled to understand what covariant and contravariant mean in the context of vectors and tensors. By writing this series of articles, I primarily explain the concepts to myself, but others may like it too.

I started reading Raum, Zeit, Materie by Hermann Weyl, a classic that explains these concepts really well. Well, at least after having read about the topic in other books and on Wikipedia, Weyl's book seems to have helped to overcome the last hurdle to understanding. Much more so, anyway, than statements like "a tensor is something that transforms like a tensor" in an otherwise nice book.

Let me describe what I learned. In the discussion of a vector space $V$ over a field $K$ the terms covariant and contravariant become relevant in particular when more than one basis comes into play. Something varies (changes) either along with the change from one basis to the other — "co-", or against it — "contra-".

Let the $n$ dimensional vector space $V$ have a basis $$ \v{X} = (\vx_1,\dots,\vx_n) , $$ which in particular means that an arbitrary element $\vz\in V$ has a unique representation with regards to $\v{X}$ as \begin{equation} \vz = x_1\vx_1 +\dots+ x_n\vx_n = \sum_{i=1}^n x_i\vx_i \,, \qquad x_i\in K . \label{eq:zFromX} \end{equation} Here is a point that can easily lead to confusion, since the $n$-tuple $(x_1,\dots,x_n)$ might be called a "vector" in other contexts. But $\vz$, as an element of the vector space $V$, is a vector, while $(x_1,\dots,x_n)$ are just the coordinates of $\vz$ with respect to $\v{X}$.

We should look at the vectors in $V$ as opaque items for which we do not know any inner structure. They are not numbers, nor tuples of numbers, nor anything we can manipulate directly. All we know about them is that we can add them and multiply them with a value from $K$ to get another one of them. The coordinates are merely a handle for the vector, a view, a representation, but with a catch: they only make sense if we know the respective basis. Without reference to the basis, $(x_1,\dots,x_n)$ are not coordinates, but just a meaningless bunch of numbers.

We see how arbitrary the coordinates are as soon as we introduce another basis $$\v{Y} = (\vy_1,\dots,\vy_n),$$ different from $\v{X}$. Now the same $\vz$ has different coordinates $(y_1,\dots,y_n)$ such that \begin{equation} \vz = y_1\vy_1+\dots +y_n\vy_n = \sum_{j=1}^n y_j\vy_j . \label{eq:zFromY} \end{equation} Luckily, the two sets of coordinates are not completely arbitrary but have a relation to each other, dictated by the relation between the two bases, as can be derived as follows.

In the same way as $\vz$ is a weighted sum of basis vectors, each $\vy_j$ of the second basis is a weighted sum of the basis vectors $\vx_i$: \begin{equation} \vy_j = \mA{1}{j}\vx_1+\dots+\mA{n}{j}\vx_n = \sum_{i=1}^n \mA{i}{j}\vx_i \qquad \mA{i}{j}\in K, \, \forall j\in\{1,\dots,n\}. \label{eq:switchbase} \end{equation} So for a given $j$, the $\mA{i}{j}$ are the coordinates of $\vy_j$ with respect to basis $\v{X}$. Now we can ask how $\vz$, written in basis $\v{Y}$, looks when we express each $\vy_j$ through the $\vx_i$: \begin{align} \vz &= \sum_{j=1}^n y_j\vy_j \\ &= \sum_{j=1}^n y_j\left( \sum_{i=1}^n \mA{i}{j} \vx_i\right)\\ &= \sum_{i=1}^n \left(\sum_{j=1}^n \mA{i}{j} y_j\right) \vx_i \end{align} By matching the term in parentheses with equation \eqref{eq:zFromX}, and invoking the uniqueness of coordinates given a basis, we get \begin{equation} x_i = \sum_{j=1}^n \mA{i}{j} y_j. \end{equation} Comparing this to equation \eqref{eq:switchbase}, $$ \vy_j= \sum_{i=1}^n \mA{i}{j}\vx_i , $$ we see that on the one hand the $\mA{i}{j}$ transform the basis vectors from $\v{X}$ to $\v{Y}$, while on the other hand they transform the coordinates in the opposite direction, from $\v{Y}$ coordinates to $\v{X}$ coordinates.

This is how the term contravariant comes about: the coordinates transform contravariantly to the bases. And typically one just says that the coordinates are (or transform) contravariant.
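A small concrete example of my own with $n=2$: let $\vy_1 = \vx_1 + \vx_2$ and $\vy_2 = \vx_2$, i.e. $\mA{1}{1}=1$, $\mA{2}{1}=1$, $\mA{1}{2}=0$, $\mA{2}{2}=1$. For a vector with $\v{Y}$-coordinates $(y_1,y_2)$ we get \begin{equation*} \vz = y_1\vy_1 + y_2\vy_2 = y_1(\vx_1+\vx_2) + y_2\vx_2 = y_1\vx_1 + (y_1+y_2)\vx_2 , \end{equation*} so the $\v{X}$-coordinates are $x_1 = y_1$ and $x_2 = y_1 + y_2$, which is exactly $x_i = \sum_j \mA{i}{j} y_j$: the same numbers that build the new basis vectors from the old ones transform the coordinates in the opposite direction.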

The next part will show that there are also $n$-tuples that are transformed by the $\mA{i}{j}$ in the same direction as the basis and are hence called covariant.

My recent experiment is an HTML app showing OpenStreetMap maps. It is particularly targeted at mobile devices with JavaScript support for geolocation.

Here is the map.