Thursday, June 25, 2015

The percent of valid formulae in random strings

In the solution to the Arithmetic hide and seek problem, I mentioned a practical observation that among all strings composed of 18 characters "0123456789+-*/^k()" roughly 20% are valid formulas, and conjectured that it is true for arbitrarily long strings. This appears to be wrong: the percent gradually converges to 0.

Proof

Let, for some formal language, \(n_k\) is the number of words of length k. Then Z-transform of the sequence \(n_0,n_1,...\) is called its growth function \(f(z)\). (Apparently, it is common to use non-standard Z-transform by positive powers of z for growth function; I instead use the usual negative transform). For a regular language, that is set of strings, accepted by some finite automaton, growth functions are always rational and can be calculated by a simple formula:
$$f(z) = e(zI - A)^{-1}s,$$ where A is transition matrix, showing number of ways to get from state i to state j; e is row matrix of accepting states (1 for accepted 0 otherwise); and s is column matrix of initial states.

Unfortunately, formulas are not a regular language, because accepting automaton must have infinite number of states to balance braces. So instead of the exact count, I would estimate it by upper and lower bounds.

Lower bound

Lower bound is given by the automaton, that accepts valid formulas without braces.

Automaton, accepting formulas without braces. Blue are accepting states.

To better match grammar in my solution, it disallows leading zeros in integers. States are named by the last accepted character; blue color indicates accepting states. State "OP" is achieved after consuming a binary operator. If 5 states are ordered as [OP,0,1-9,0-9,K] then the transition matrix is:
$$A = \left[\begin{matrix} 1 & 5 & 5 & 5 & 5\\ 1 & 0 & 0 & 0 & 0 \\ 9 & 0 & 0 & 0 & 0\\ 0 & 0 & 10 & 10 & 0\\ 1 & 0 & 0 & 0 & 0 \end{matrix}\right]$$ which, with \(e=(0,1,1,1,1)\) and \(s^T=(1,0,0,0,0)\) gives growth function:
$$f_{lower}(z) = \frac{11z^2-20z}{z^3-11z^2-45z+100}.$$ After inverse transform, this gives an expression for number of valid strings of length k:
$$n_{lower}(k) = 0.623\cdot(-4.399)^k+0.02467\cdot{1.654}^k+0.598\cdot{13.745}^k. $$

Upper bound

To get upper bound, consider an automaton, accepting strings with unbalanced braces, only requiring that operator is not following an opening brace, and number not following a closing brace. Such strings are clearly a superset of all valid formulas.

Automaton, accepting formulas with unbalanced braces. Blue are accepting states, "op" is one of 5 binary operators "+-*/^".

Its transition matrix, for state ordering (OP, 0, 1-9, 0-9, ")", K) is:
$$A = \left[ \begin{matrix} 2 & 5 & 5 & 5 & 5 & 5\\ 1 & 0 & 0 & 0 & 0 & 0\\ 9 & 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 10 & 10 & 0 & 0\\ 0 & 1 & 1 & 1 & 1 & 1\\ 1 & 0 & 0 & 0 & 0 & 0 \end{matrix} \right]$$ which gives growth function:
$$f_{upper}(z) = \frac{11z-20}{z^3-13z^2-23z+80}$$ and the number of accepted strings is:
$$n_{upper}(k) = -0.6339*(-3.06)^k-0.003648 \cdot (1.8383)^k+0.6375 \cdot {14.2217}^k $$

Conclusion

Thus, the number of correct formulas grow not slower than \({13.745}^k\) and not faster than \({14.2217}^k\) where k is string length. Since total number of strings is \(18^k\), the percent of valid formulas is gradually decreasing with rate between \({0.7636}^k\) and \({0.7901}^k\). Moreover, \(k \approx \log_{18}t\), where t is current string index in shortlex ordering. Using this, asymptotic percentage bounds can be written as: \({0.7636}^{log_{18}t} = t^{-0.0933}\) and \(t^{-0.0815}\). Slow decrease rate explains why it was not apparent in the numeric test results.

Despite that my initial conjecture on percentage of valid formulas failed, the conclusion on growth rate is nevertheless correct, since number of formulas grows exponentially with length.

Note on Z-transform

Articles I have seen define growth function by means of non-standard Z-transform by positive powers of z:

$$f(z) = \sum_{i=0}^\infty a_i z^i.$$

Wikipedia calls it "Geophysical definition". I here took a regular definition applied in control systems and signal processing, since it allows me to use standard computer algebra tools. Its another advantage is that bases of the exponentials in expression for string count are just negated roots of the denominator polynomial. Traditional growth functions may be obtained by substituting \(z'=1/z\).

Sunday, June 14, 2015

Crystal growing: Mohr's salt

Some fresh crystals of the common laboratory preparate of iron (II): Mohr's salt, or ammonium iron sulfate hexahydrate (NH4)2Fe(SO4)2·6H2O. (Whole gallery)

Some crystals of Mohr's salt. Cell size is 5mm.

Among other iron (II) compounds it is notable for its stability when exposed to air. Most other salts are easily oxidized by air, but Mohr's salt can be stored for years without notable changes (by some reports). This (together with non-toxicity and availability) makes it a good material for hobby crystal growing.

Obtaining

Surely, the easiest way to get Mohr's salt is to buy it in the chemicals store. It is the most boring way too.

I prepared my salt from more common compounds:

  • Iron (II) sulfate Fe2SO4 from the gardener's store,
  • Sulfuric acid H2SO4, used as electrolyte for acid accumulators,
  • And ammonia NH3 from the drug store, in a 10% aqueous solution.

First step was to prepare ammonia sulfate from the NH3 solution and sulfuric acid, by equation:
$$ 2NH_3 + H_2SO_4 \rightarrow (NH_4)_2SO4 $$
The reaction is very exothermic, and must be done carefully. With 10% NH3 and 43% H2SO4, temperature reached 80-90°. Moreover, NH3 is evaporating quickly from the hot solution, while the reaction is not finished. So wear goggle, do it outside and ensure that the reaction vessel will not crack of sudden temperature change. I added acid with a small excess.

Then I added Fe2SO4·7H2O to the hot solution. After stirring, everything dissolved, giving brown solution (pure compound is green, brown color is caused by Fe(III) contamination). After evaporating it partially and cooling, light-green crystals of Mohr's salt precipitated.
$$(NH_4)_2SO_4 + FeSO_4 \cdot 7H_2O \rightarrow (NH_4)_2Fe(SO_4)_2 \cdot 6H_2O \downarrow + H_2O $$
To grow bigger crystals, I used them for traditional growing procedure with a glass and a thread.

Three biggest crystals

Growing

The compound easily forms large, pale-green crystals. I used slow evaporation technique, with a seed crystal suspended on a thin nylon thread. Crystalline Mohr's salt has good resistance to oxidation by air, but its solution is more prone to it. Initially light green solution gradually becomes brown in a couple of days. To reduce oxidation, I added some excess of the sulfuric acid to it. There are also reports, that additional acid improves crystal clarity too.

Crystals are monoclinic, their form depends on impurities and additions. When solution is fresh, flat tabular crystals grow.

Crystals grown from the fresh solution has flat tabular form.

From the older solution, contaminated with Fe3+ ions, more elongated crystals are growing.

Crystal grown from the older solution is elongated.

Despite all my efforts, I failed to grow crystals with even perfect faces. In all cases, faces had irregularities crystals were not entirely transparent. I tried to add glycerol (which gives good results with copper sulfate), which caused flattening of crystals and slightly improved transparency.

Safety notes

The compound itself is generally safe and low toxic, its pure form is used as iron supplement for humans. Acidic solution is corrosive and irritant. The preparation procedure I used, involving hot acid and ammonia could be dangerous. Also, spilled solution leaves permanent stains on everything, handle it with caution.

Sunday, June 7, 2015

A rainbow on the asphalt

It was a sunny summer day. I was walking to the bus stop when a rainbow, not in the clean sky, but on the gray asphalt of the road. I stopped to look better, and it almost disappeared. It was best visible in motion.

That's what I saw when I stopped. But where is the rainbow…

After a closer examination, it appeared that road workers were updating road surface marking and spilled some glass spheres, used to add retro-reflectivity to the marking. White sugar-like thing on the photo above is made of these small glass spheres. And just like water drops create rainbow in the sky, glass drops made rainbow on the road.

I wanted to make a better photo, but big contrast between the white glass "sand" and dark asphalt made rainbow hardly visible on a still photos. So I decided to try to average-out the noise, using the fact that rainbow observed position does not move when observer moves.

All frames taken, camera position is slightly different on every frame, but the rainbow is on the same place.

It's a common way to deal with noise in measurements: make many measurements and then average them. Since noise is random, it will become become weaker, as the number of measurements grows. (Noise amplitude is inverse proportional to the square root of the measurements number).

So I made 15 photos, moving a little between frames, and averaged them.

All 15 pictures averaged with Gimp.

Here it is! There is something unsettling in this picture… I intentionally made all photos in the same pose to use the shadow for aligning images, and on the merged photo black shadow makes strange view on the blurred road.

Color contrast exaggerated to make rainbow more prominent.
Finally, some circular blur added to reduce noise and show the rainbow clearer.

Monday, June 1, 2015

Arithmetic hide-and-seek: solution

Here is my solution for the problem of arithmetic hide and seek. Please read it if you prefer to solve the problem by yourself.

Many users on G+ (comments thread) and in the comments has correctly solved it and additional questions. The first ones were Suhail Sherif, Brent Werness and Heffalump Horton. Hope that the problem entertained you.

Solution

First note, that if Angel knows the formula, then he can win instantly, whatever the current turn number \(k\) is. To do this, just substitute \(k\) to the formula and obtain exactly, where Demon hides. Indeed, Angel does not know the formula. However, it is not a problem, since the set of possible formulas is countably infinite. This means that it is possible to build an infinite sequence of formulas that contain every possible formulas (moreover, this can be done explicitly, see below). Angel can use this sequence to check formulas one-by-one, on each turn substituting current turn number to \(k\) and checking for Demon. Sooner or later, Angel will try the formula, Demon has conceived and win the game (though he can win earlier, by pure luck). This gives the winning strategy.

Explicit solution

The above solution could be enough for a mathematician, but as a programmer, I wanted to write down the explicit algorithm. To do this, first the sequence of all formulas should be built. It can be done in a multitude of ways, some of them are more computationally effective (faster and contain less duplicates) than others. For the proof-of-concept code, I used the simplest approach.

Using the usual linear notation, every allowed formula can be written as a string, composed of characters “0123456789+-*/^()k” (where ^ stands for powering). It is easy to enumerate all possible strings. A simple way is shortlex order: first all strings of 1 character, then of 2 characters in alphabetic order and so on. By excluding non-valid expressions, one gets a sequence of all possible formulas, from the shortest ones to the longest.

I implemented this enumeration algorithm in Python. To check whether the string is a valid formula it uses a simple parser, written with pyparsing library. To reduce numbers of duplicates (equivalent formulas), it disallows numbers starting with 0, except for the 0 itself. The implementation is not breathtakingly fast, but fast enough to enumerate expressions up to 5 characters in a reasonable time.
Source code is on Github: dmishin/arithmetic-hide-and-seek.

Here are few first formulas and corresponding search attempts done by Angel.

Turn kFormula fValue f(k)
100
211
322
1099
11k11
121010
1019999
102-00
47629512
4772k3'902'185'687'894'990'289'226'996'537'241'457'882'185'747
1729kk1.40457…×105598

The sequence of attempts behaves erratically, but its amplitude grows very fast. At the step \(k=320282\), the formula \(k^{k^k}\) is reached, which evaluates to tremendous \(320282^{320282^{320282}} \approx 10^{10^{1763323.7111}} \), a number that is so big that to write down even the number of digits in it one needs 1'763'324 digits.

This sequence is shown on the graph below. The graph uses logarithmic vertical scale because of the fast growth rate. Positive values are show in red, and negative in blue. Note that not all actual data points are shown, some values are so large, that even in the logarithmic coordinates the rest of the graph will be too small.

First 10000 search attempts of Angel, using the implementation above.

Growth rate

How fast the amplitude of Angel's search grows? Quite fast, actually. Even the logarithmic plots above do not properly reflect it, since the highest data points are not shown on them.

For the proposed implementation, growth rate is easy to estimate. Quite obviously, when \(k\) is large enough, the biggest possible values have form k^k^…^k. Since the percent of valid formulas remains roughly constant at about 20% (a rather curious fact, actually. Is it true, and what is the limit value? I don't know UPD: Actually, it is not true, but number of formulas nevertheless grows exponentially), the length of the string grows approximately as logarithm of the turn number: \(L ~ \log k\). Therefore, maximum value grows as k^k^k^… (log k times). This can be written as: \(^{\log k}k\) (see tetration) or, with Knuth's up-arrow notation, as \(k\uparrow\uparrow\log k\).

Growth rate, however, depends on the enumeration method chosen. By rearranging formulas, it is always possible to make growth faster.

It is also possible to decrease growth rate. For example, it can be done by “diluting” sequence with growing sequences of zeros. However, there is a lower limit: the growth rate should be strictly faster than of any fixed height power tower: \(k^{k^{k^\ldots}} = k\uparrow\uparrow n\).

Angel's winning strategy, first 50000 steps. Slightly zoomed to show steps with smaller amplitude.

Extensions

Here are few thoughts on possible extensions of the problem, with a result that I was not aware of.

Extension first: more operators

It can be easily seen that the set of allowed operators is rather arbitrary. It can be expanded with arbitrary number of additional operators, functions or constants like \(\pi\).

Extension second: Turing machine

But what if instead of a formula, Demon uses a Turing machine: an ideal computer, capable to implement any algorithm. This change the game.

At first, it may seem that nothing has changed significantly. The set of all programs is countable and enumerable, but there is one obstacle … It can be formulated as a theorem:

Theorem 1: weaker Demon

If a winning strategy for the Angel exists, and Demon's computation method is expressive enough (see below), then Angel's search sequence is not expressible by Demon.
Here expressive enough means that for every expressible formula \(f(k)\), there is another expressible formula \(f_1(k)\), such that \(f_1(k) \ne f(k) \forall k\). For example, \(f_1(k)=f(k)+1\). The rules from the original problem are, indeed, expressible enough in this sense.

The proof is by contradiction:suppose that Angel's sequence (winning strategy) \(A(k)\) can be implemented by Demon. Then he can implement sequence \(A(k)+1\), which is always 1 step before Angel. Then Angel can't win. Contradiction.

Theorem 2: slower Demon

The above theorem is a corollary of another theorem:

Growth rate of Angel's search sequence is strictly faster than growth rate of any sequence that Demon can compute.
This theorem can also be proven by contradiction.

So, back to the Turing machines. Now it is clear that if Demon uses an algorithm instead of formula, then Angel's sequence can't be calculated by an algorithm: it is uncomputable. This does not means that the winning strategy does not exist: we just can't write an algorithm that produces it.

However, if Angel (with some divine help maybe) can obtain a Turing oracle - an entity that can determine, whether any given Turing machine hangs or terminates, then the situation changes. Using the oracle, Angel can win with the following algorithm:

k := 1
for each algorithm A:
    if oracle says that A(k) terminates, then:
        search Demon at point A(k)
        k := k + 1
 

Since oracle can not be implemented algorithmically, Demon can't go a step before angel and can't escape.

Go deeper

So what if we go further, and give oracle to Demon?

Now Angel can't catch him for sure even with the help of an oracle. Since the set of all oracle-enabled algorithms is also enumerable, it essentially proves that Turing oracle is not enough to solve termination problem of an algorithm with oracle!

Thus, winning strategy, while existing in some sense, can not be computer neither by an algorithm nor by an algorithm with oracle. To catch Demon, Angel needs something more powerful: a second order oracle.

This race or arms can be extended indefinitely, defining an infinite hierarchy of oracles.