Wavelets and the Abel Prize

The Abel Prize is a Norwegian mathematics prize, awarded yearly for extraordinary contributions to mathematics. Each year I try to read a bit about the winner’s work, to at least get an idea of what he or she has contributed to the field.

This year’s winner is Yves Meyer, who is awarded the prize for his role in developing the mathematical theory of wavelets, a theory he accidentally came across by the photocopier at the École Polytechnique in Paris [1]. He is in Oslo now, and will receive the prize at the award ceremony tomorrow, May 23.

So what are wavelets and what can they be used for?
I’m certainly not the right person to answer, with my few hours of knowledge on the topic, but I will share what I have discovered so far.
A wavelet is, as the word indicates, a wave-like function, or rather a family of functions, with some important properties that make them useful for signal processing. They are, for instance, better at detecting sudden changes in a signal than the Fourier transform.

Meyer has his own wavelet as shown in the image below [2].
[Figure: the Meyer wavelet]
The Meyer wavelet looks a bit complicated, doesn’t it? So I will stick to a much simpler wavelet, actually the very first example of a wavelet: the Haar wavelet. Haar constructed this function system in 1909 [3], long before the term wavelet was introduced, and the system later appears as a special case of the Daubechies wavelets [4].
The wavelet is defined by the wavelet function \psi(t), called the mother wavelet, and a scaling function \phi(t), sometimes referred to as the father wavelet. The mother wavelet is defined as

    \[    \psi(t) = \begin{cases}     1 & 0 \leq t < \frac{1}{2} \\     -1 & \frac{1}{2} \leq t < 1 \\     0 & \text{otherwise}   \end{cases} \]

with the scaling function

    \[  \phi(t) = \begin{cases}     1 & 0 \leq t < 1 \\     0 & \text{otherwise}  \end{cases} \]

If we continue to split the unit interval [0, 1] into 2^n pieces, we get a system of functions \psi_i^n(t) = 2^{\frac{n}{2}}\psi(2^n t - i) and \phi_i^n(t) = 2^{\frac{n}{2}}\phi(2^n t - i), where i \in \{ 0, \ldots, 2^n - 1\}. The functions \psi_i^n form an orthonormal basis for the square integrable functions on the unit interval. That means that such functions can be approximated by linear combinations of the \psi_i^n, and the same holds for the set of \phi_i^n.
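
To make the scaling and translation concrete, here is a tiny Java sketch that transcribes the definitions above directly (my own illustration; the class and method names are made up):

// A direct transcription of the Haar definitions above (illustrative sketch only).
public class HaarFunctions {

    // The mother wavelet psi(t).
    static double psi(double t) {
        if (t >= 0 && t < 0.5) return 1;
        if (t >= 0.5 && t < 1) return -1;
        return 0;
    }

    // The scaling ("father") function phi(t).
    static double phi(double t) {
        return (t >= 0 && t < 1) ? 1 : 0;
    }

    // psi_i^n(t) = 2^(n/2) * psi(2^n * t - i), and similarly for phi_i^n.
    static double psi(int n, int i, double t) {
        return Math.pow(2, n / 2.0) * psi(Math.pow(2, n) * t - i);
    }

    static double phi(int n, int i, double t) {
        return Math.pow(2, n / 2.0) * phi(Math.pow(2, n) * t - i);
    }

    public static void main(String[] args) {
        // psi_0^1 is sqrt(2) on [0, 1/4), -sqrt(2) on [1/4, 1/2) and 0 elsewhere.
        System.out.println(psi(1, 0, 0.1) + " " + psi(1, 0, 0.3) + " " + psi(1, 0, 0.7));
    }
}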

The YouTube video below shows what the \psi_i^n look like and how they are scaled and translated from the mother wavelet.

We will start by looking at the following function, the four purple line fragments, which one can think of as the four one-dimensional pixels [9, 5, 2, 4].

If we look at the father wavelets \phi_0^2, \phi_1^2, \phi_2^2 and \phi_3^2, we see that f can be written as

    \[ f = 9\phi_0^2 + 5\phi_1^2 + 2\phi_2^2 + 4\phi_3^2 \]

From the mother and the father wavelets one can construct subspaces V_n = \text{span}(\phi_i^n) and W_n = \text{span}(\psi_i^n), and we have that V_n is the orthogonal complement of W_n in V_{n+1} [2]:

    \[ V_n \oplus W_n = V_{n+1} \]

Let us see how we can represent f by a point in V_n and a point in W_n. The function is in V_2, and we want to write it decomposed over V_1 and W_1. In V_1 and W_1 we have the functions \phi_0^1 = \sqrt{2}\phi(2t), \phi_1^1 = \sqrt{2}\phi(2t-1), \psi_0^1 = \sqrt{2}\psi(2t) and \psi_1^1 = \sqrt{2}\psi(2t-1). I skip the square root of two to simplify the calculations; it only contributes a constant factor anyway. We then have the following functions:

    \[    \psi_0^1(t) = \begin{cases}     1 & 0 \leq t < \frac{1}{4} \\     -1 & \frac{1}{4} \leq t < \frac{1}{2} \\     0 & \text{otherwise}   \end{cases} \]

    \[    \psi_1^1(t) = \begin{cases}     1 & \frac{1}{2} \leq t < \frac{3}{4} \\     -1 & \frac{3}{4} \leq t < 1 \\     0 & \text{otherwise}   \end{cases} \]

    \[  \phi_0^1(t) = \begin{cases}     1 & 0 \leq t < \frac{1}{2} \\     0 & \text{otherwise} \end{cases}  \]

    \[  \phi_1^1(t) = \begin{cases}     1 & \frac{1}{2} \leq t < 1 \\     0 & \text{otherwise} \end{cases}  \]

We see that \psi_0^1 and \phi_0^1 are non-zero in [0, \frac{1}{2}) and \psi_1^1 and \phi_1^1 are non-zero in [\frac{1}{2}, 1), so we want to find a and b such that

    \[ a\phi_0^1(t) + b\psi_0^1(t) = \begin{cases} a + b & 0 \leq t < \frac{1}{4} \\ a - b & \frac{1}{4} \leq t < \frac{1}{2} \end{cases} = \begin{cases} 9 &  0 \leq t < \frac{1}{4} \\ 5 & \frac{1}{4} \leq t < \frac{1}{2} \end{cases} \]

If we solve these equations we find that a = \frac{9+5}{2} and b = \frac{9-5}{2}, and the equivalent holds for the pair 2 and 4. Hence f decomposes into the point (\frac{9+5}{2}, \frac{2+4}{2}) in V_1 and the point (\frac{9-5}{2}, \frac{2-4}{2}) in W_1. The point in V_1 consists of the averages of the pairs of coefficients of f, and the point in W_1 consists of half the differences of the pairs, so we can represent f by [7, 3, 2, -1]. The two first numbers are the approximation coefficients of the original function (if you decide to zoom out and only keep half of the points in the unit interval, it seems quite natural to average each pair of points), and the two last numbers are called the detail coefficients.
The calculations for this example hold in general for 2^k points: we can approximate the original with half as many points by averaging pairs of points, and the detail coefficients give information about the difference within each pair. The detail coefficients can be used to go the other way and reconstruct the original points.
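
In code, one level of this averaging and differencing might look like the sketch below (my own illustration, not part of the original gist); applied to [9, 5, 2, 4] it gives exactly [7, 3, 2, -1].

// One level of the 1-D Haar transform: averages in the first half of the result,
// differences (detail coefficients) in the second half (illustrative sketch).
static double[] haarStep(double[] input) {
    int half = input.length / 2;
    double[] result = new double[input.length];
    for (int j = 0; j < half; j++) {
        result[j] = (input[2 * j] + input[2 * j + 1]) / 2.0;        // approximation
        result[half + j] = (input[2 * j] - input[2 * j + 1]) / 2.0; // detail
    }
    return result;
}
// haarStep(new double[]{9, 5, 2, 4}) returns [7.0, 3.0, 2.0, -1.0]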

The same calculations can be applied to a 2-D image by first calculating averages and differences for each row, and then for each column. I have written a simple Java program which does these calculations for an image, and thus transforms the image into a compressed image together with three images of detail coefficients. The interesting part of the program is where the averages and differences are computed; the rest is just converting Java’s BufferedImage from color to gray scale, from an Image to a matrix, and back again. The code snippet below shows how I computed the averages and differences for the rows; the rest of the code can be found in the gist here.

  
private double[][] transformRows(double[][] input) {
    int size = input.length;
    double[][] result = new double[size][size];
    for (int i = 0; i < size; i++) {
        for (int j = 0; j < size / 2; j++) {
            // the average of each pair goes into the first half of the row ...
            result[i][j] = (input[i][2 * j] + input[i][2 * j + 1]) / 2.0;
            // ... and the difference (detail coefficient) into the second half
            result[i][size / 2 + j] = (input[i][2 * j] - input[i][2 * j + 1]) / 2.0;
        }
    }
    return result;
}


Many of the pixels in an image will be very similar to their neighbouring pixels, hence many of the differences will be close to zero. The differences are largest where there are sudden changes in the image, for instance between the hair and the face of Lenna. Another interesting observation is that in the upper right image the vertical details are most prominent, like the nose, whereas the lower left has horizontal details, like the mouth. This is because of the way we calculate the averages and differences: the upper right contains averaged differences of pixels in the same row, and the lower left contains averaged differences of pixels in the same column.

So what can this wavelet transform be used for? One obvious application is image compression. We get a compressed version of the original, and the detail coefficients are often zero (or very small numbers). One can repeat the process and transform the transformed image several times, and then use the detail coefficients to reconstruct the original, with or without loss [5]. Another interesting application is to use the detail coefficients as features in image recognition: in [6] they were used with logistic regression to detect facial expressions, and in [7] the Haar transform was used to classify leaves.
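
To illustrate the reconstruction, here is a sketch of the inverse of one row step (my own addition, mirroring the transformRows method above): each pair is recovered as average plus difference and average minus difference.

// Inverse of one row step: rebuilds each pair from its average and difference
// (illustrative sketch, not part of the gist).
private double[][] inverseTransformRows(double[][] input) {
    int size = input.length;
    double[][] result = new double[size][size];
    for (int i = 0; i < size; i++) {
        for (int j = 0; j < size / 2; j++) {
            double avg = input[i][j];
            double diff = input[i][size / 2 + j];
            result[i][2 * j] = avg + diff;
            result[i][2 * j + 1] = avg - diff;
        }
    }
    return result;
}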

References
[1] If it were true, it would be known – The Abel Committee explains the choice of Yves Meyer as this year’s Abel Prize Laureate

[2] Wavelet. (2017, April 24). In Wikipedia, The Free Encyclopedia. Retrieved 16:17, May 21, 2017, from https://en.wikipedia.org/w/index.php?title=Wavelet&oldid=776912745
[3] A. Haar, Zur Theorie der orthogonalen Funktionensysteme
[4] Haar wavelet. (2017, March 4). In Wikipedia, The Free Encyclopedia. Retrieved 21:24, May 21, 2017, from https://en.wikipedia.org/w/index.php?title=Haar_wavelet&oldid=768512588
[5] Colm Mulcahy, Image compression using the Haar wavelet transform, pdf
[6] Mahesh Goyani and Narendra Patel, Multi-Level Haar Wavelet based Facial Expression Recognition using Logistic Regression, pdf
[7] Haiyan Zhang and Xingke Tao, Leaf Image Recognition Based on Wavelet and Fractal Dimension, pdf

What do you feel?

This weekend I spent some time making a simple mobile app for emotion recognition, using Microsoft Cognitive Services. There are several cool APIs there, for instance the Computer Vision API, which can recognize content in an image, recognize celebrities, and extract text from images. I guess I ended up with the Emotion API because it might also be useful for work.
The APIs are well documented, there are client libraries and sample applications on GitHub, and the client libraries are available from the Maven central repository for Java and from NuGet for C#. They are free to use for registered developers, within some limits (30,000 transactions per month for the Emotion API).
I first decided to try to make the app in Xamarin, since it’s now free with Visual Studio, but I didn’t get the client library to work there; it seemed to be something wrong with serialization in Mono, so the library might not be entirely cross-platform (or I messed up somewhere). So I switched to Android Studio and made the very simple demo app there instead.

The app allows you to take a photo, and the photo is then posted to Microsoft’s service for emotion recognition. The service returns a bounding box for each recognized face, together with a confidence score for each of the eight emotions: happiness, sadness, surprise, anger, fear, contempt, disgust and neutral.
The initial testing showed that the old doll is neutral, but it also scored some on surprise. The clown was happy with very high confidence, 0.99997, and scored similarly on happiness without the nose. Further, the faces of teddy bears and guinea pigs are not recognized, and the same was occasionally true for my husband; too much beard, perhaps?
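
I used the client library in the app, but under the hood the call is essentially an HTTP POST of the image bytes; a rough stand-alone sketch with java.net.HttpURLConnection could look like the following. The endpoint URL, the header name and the response shape are my recollection of the documentation at the time and should be treated as assumptions.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Paths;

public class EmotionApiSketch {
    public static void main(String[] args) throws Exception {
        // Assumed endpoint and header name -- check the current documentation.
        URL url = new URL("https://westus.api.cognitive.microsoft.com/emotion/v1.0/recognize");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "application/octet-stream");
        conn.setRequestProperty("Ocp-Apim-Subscription-Key", "YOUR_SUBSCRIPTION_KEY");
        conn.setDoOutput(true);
        try (OutputStream os = conn.getOutputStream()) {
            os.write(Files.readAllBytes(Paths.get("photo.jpg"))); // the captured photo
        }
        // The response is (roughly) a JSON array with one element per detected face,
        // each holding a faceRectangle and a scores object for the eight emotions.
        try (BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}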

K-means clustering in Neo4j

The last few days I’ve been playing with Neo4j. I’ve always been intrigued by graphs, and I’ve wanted to learn more about graph databases for some time now, and finally, I was able to find some time for it.

One challenge I usually have when I want to test out new technology is to find a suitable use case: simple enough, but still one that illustrates a fair amount of the features and functionality of the new technology. In this case I ended up with a simple toy example to illustrate how the k-means clustering algorithm, used in machine learning, can partition a data set into k disjoint clusters.

The k-means clustering algorithm

The algorithm itself is fairly simple. Assume that we have a data set of observations, each with data for d features; hence each sample can be considered a real d-dimensional vector [x_1, \ldots, x_d]. This means that we can calculate the squared Euclidean distance of two vectors \textbf{x} = [x_1, \ldots, x_d] and \textbf{y} = [y_1, \ldots, y_d] as

    \[ \|\textbf{x} - \textbf{y}\|^2 = (x_1 - y_1)^2 + (x_2 - y_2)^2 + \dots + (x_d - y_d)^2, \]

and if we have m vectors \textbf{x}^1,\textbf{x}^2, \ldots, \textbf{x}^m we can calculate the mean or average as

    \[ \frac{1}{m}\sum_{j = 1}^{m}{\textbf{x}^j} = \frac{1}{m}[\sum_{j = 1}^{m}{x_1^j}, \ldots, \sum_{j = 1}^{m}{x_d^j}]. \]

To use the algorithm, we first have to decide the number of clusters k, and then initialize what are called centroids, one for each cluster, \{ c_1, \ldots, c_k\}. The centroids will be the centres of the clusters. One way of initializing the centroids is to randomly pick k samples from the data set. Once we have the initial k centroids, the following two steps are repeated.

Cluster assignment step

In this step each sample s is assigned to the cluster corresponding to the centroid with the smallest Euclidean distance to the sample, i.e. the centroid closest to s. Hence, s is assigned to one of the centroids in the following set. (It might happen that the sample is exactly in the middle of two or more centroids, but usually this set consists of only one centroid.)

    \[ \{c_i:  \|c_i - s\|^2 =  \min_{j}{\|c_j - s\|^2}\} \]

Centroid update step

When all the samples are assigned to a centroid, this step will calculate new centroids for each of the k clusters by taking the mean of all the assigned samples. So for assigned samples s^1, \ldots, s^r, the vector for the new centroid c_i will be

    \[ \textbf{c}_i = \frac{1}{r}\sum_{j = 1}^{r}{\textbf{s}^j} \]

These two steps are repeated until the algorithm converges, i.e., until the assignment step no longer changes the assignments. But the algorithm might converge to a local optimum; it doesn’t guarantee that the global optimum is found. So a common approach is to run the algorithm several times with different initializations of the centroids and choose the best result.
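
To make the two steps concrete before moving to Neo4j, here is a compact sketch of the algorithm in plain Java (my own illustration, independent of the Neo4j implementation below); it assumes the samples are given as double arrays and that the caller picks the initial centroids.

import java.util.Arrays;

public class KMeansSketch {

    static double squaredDistance(double[] x, double[] y) {
        double sum = 0;
        for (int d = 0; d < x.length; d++) {
            sum += (x[d] - y[d]) * (x[d] - y[d]);
        }
        return sum;
    }

    // Cluster assignment step: index of the closest centroid for each sample.
    static int[] assign(double[][] samples, double[][] centroids) {
        int[] assignment = new int[samples.length];
        for (int i = 0; i < samples.length; i++) {
            int best = 0;
            for (int c = 1; c < centroids.length; c++) {
                if (squaredDistance(samples[i], centroids[c])
                        < squaredDistance(samples[i], centroids[best])) {
                    best = c;
                }
            }
            assignment[i] = best;
        }
        return assignment;
    }

    // Centroid update step: mean of the samples assigned to each cluster.
    static double[][] updateCentroids(double[][] samples, int[] assignment, int k) {
        int dim = samples[0].length;
        double[][] centroids = new double[k][dim];
        int[] counts = new int[k];
        for (int i = 0; i < samples.length; i++) {
            counts[assignment[i]]++;
            for (int d = 0; d < dim; d++) {
                centroids[assignment[i]][d] += samples[i][d];
            }
        }
        for (int c = 0; c < k; c++) {
            for (int d = 0; d < dim; d++) {
                centroids[c][d] /= counts[c]; // assumes every cluster got at least one sample
            }
        }
        return centroids;
    }

    // Repeat the two steps until the assignments no longer change.
    static int[] kMeans(double[][] samples, double[][] initialCentroids) {
        double[][] centroids = initialCentroids;
        int[] assignment = assign(samples, centroids);
        while (true) {
            centroids = updateCentroids(samples, assignment, centroids.length);
            int[] next = assign(samples, centroids);
            if (Arrays.equals(next, assignment)) {
                return next;
            }
            assignment = next;
        }
    }
}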

The data set

In order to do the clustering I needed a data set, and I ended up with the seeds data set from the UCI Machine Learning Repository. The set consists of 210 samples of kernels of three types of wheat, 70 samples of each type. Each sample has seven attributes: area, perimeter, compactness, length of kernel, width of kernel, asymmetry coefficient and length of kernel groove, and hence each sample can be considered a 7-dimensional vector. The set is labelled, so we know which samples belong to the same type of wheat. The k-means algorithm is often used with unlabelled sets and doesn’t use the labels, but I’ve included them in the example so we can easily see how the clustering works compared to the original labelling.

Implementation in Neo4j

I have two kinds of nodes in my model, seeds and centroids. They both contain the seven attributes from the data set, and seeds also have their original labelling. Further, centroids have an index, which gives the number of the corresponding cluster, and an attribute “iteration”, which contains the iteration the centroid is used in. (Another approach would be to remove the old centroids after calculating the new ones, so that there would be no need to keep track of the iterations. But we want to see the change, so we keep all the centroids.) The centroids are chosen very non-randomly: I’ve just picked the data from one seed of each label, so the clustering will be good already after the first assignment step.

The creation of the nodes is pretty straightforward. The attributes of a node in Neo4j are given in a JSON-like format, so I started off by reading the data set file in Java, creating Seed objects, and then using Gson to convert the objects to JSON strings. Before the colon and the type name one can add an identifier if one wants to refer to the same node later in the same script, but we don’t need that in our case.

CREATE (:Seed {area:15.26,perimeter:14.84,compactness:0.871,kernelLength:5.763,kernelWidth:3.312,asymmetryCoefficient:2.221,kernelGrooveLength:5.22,label:1})
CREATE (:Centroid {area:13.84,perimeter:13.94,compactness:0.8955,kernelLength:5.324,kernelWidth:3.379,asymmetryCoefficient:2.259,kernelGrooveLength:4.805, index:1, iteration:1})
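
The statements above were generated along these lines (a simplified sketch; the Seed class with the seven numeric attributes and the label is assumed, and since all the values are numeric, the quotes Gson puts around the keys can simply be stripped to match the property syntax above):

import com.google.gson.Gson;
import java.util.List;

// Simplified sketch: turn Seed objects into CREATE statements like the ones above.
// The Seed class (the seven numeric attributes plus the label) is assumed.
public class SeedScriptGenerator {
    static void printCreateStatements(List<Seed> seeds) {
        Gson gson = new Gson();
        for (Seed seed : seeds) {
            // Gson quotes the property keys; since all values here are numeric,
            // stripping the quotes gives the property syntax shown above.
            String properties = gson.toJson(seed).replace("\"", "");
            System.out.println("CREATE (:Seed " + properties + ")");
        }
    }
}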

The complete script file for creating all the nodes is in this file.

Cluster assignment

The code below shows how I assign the seeds to the right centroid. It feels a bit clumsy, but it was the best I could come up with in pure Neo4j. If this were part of an application, one would probably let the application code calculate the distances and find the minimum. And as far as I can tell, Neo4j doesn’t support user-defined functions, so I cannot create a distance function and instead have to duplicate the distance calculation three times.
The MATCH statement does pattern matching, and one can specify paths, nodes and relationships, with or without types (which I think are called labels in the graph database world), and with identifiers to refer to the elements later in the statement. So in our case we need all the seeds, and for each seed we will calculate the distances to each of the three centroids and then find the closest one. The MATCH clause finds 210 rows, one for each seed, containing the seed and the three centroids.
The three SET statements add new attributes distC1, distC2 and distC3 to each seed, containing the squared distance from the seed to each centroid. The following WITH clause is used to bind variables from the matching to be used later: we keep each seed s, and for each seed the closest centroid, kept as minC, and finally we create an IN_CLUSTER relationship from the seed s to the centroid minC. After the CREATE one could have tidied up the seeds and deleted the three distance attributes.

MATCH (s:Seed), (c1:Centroid{index: 1, iteration: 1}), (c2:Centroid{index: 2, iteration: 1}), (c3:Centroid{index: 3, iteration: 1}) 
SET s.distC1 = (s.area - c1.area)^2 + (s.perimeter - c1.perimeter)^2 + (s.compactness - c1.compactness)^2 + (s.kernelLength - c1.kernelLength)^2 + (s.kernelWidth - c1.kernelWidth)^2 + (s.asymmetryCoefficient - c1.asymmetryCoefficient)^2 + (s.kernelGrooveLength - c1.kernelGrooveLength)^2
SET s.distC2 = (s.area - c2.area)^2 + (s.perimeter - c2.perimeter)^2 + (s.compactness - c2.compactness)^2 + (s.kernelLength - c2.kernelLength)^2 + (s.kernelWidth - c2.kernelWidth)^2 + (s.asymmetryCoefficient - c2.asymmetryCoefficient)^2 + (s.kernelGrooveLength - c2.kernelGrooveLength)^2
SET s.distC3 = (s.area - c3.area)^2 + (s.perimeter - c3.perimeter)^2 + (s.compactness - c3.compactness)^2 + (s.kernelLength - c3.kernelLength)^2 + (s.kernelWidth - c3.kernelWidth)^2 + (s.asymmetryCoefficient - c3.asymmetryCoefficient)^2 + (s.kernelGrooveLength - c3.kernelGrooveLength)^2
WITH s,
  CASE
    WHEN s.distC1 <= s.distC2 AND s.distC1 <= s.distC3 THEN c1
    WHEN s.distC2 <= s.distC1 AND s.distC2 <= s.distC3 THEN c2
    ELSE c3
  END AS minC
CREATE (s)-[:IN_CLUSTER]->(minC)
RETURN *

The picture shows the three clusters after the seeds have been assigned, and the table shows the statistics. The first cluster has too few seeds (each cluster should have 70 seeds if the algorithm gets everything right), but it has a high rate of correct seeds, compared to the third cluster, where only 80% of the seeds are correct.
[Figure: the three clusters after the first assignment step]

Cluster no | Total assigned | Correctly assigned | Percentage correct
1          | 50             | 44                 | 88%
2          | 80             | 69                 | 86.25%
3          | 80             | 64                 | 80%

Centroid update

The next step is to find the new centroids for the next assignment step, by calculating the average values of the seven attributes over the seeds assigned to each cluster. I found the useful avg function, which calculates the average of an attribute value over all nodes bound to the same identifier. But in this case it is important to think through what you include in a single match. If the match statement were MATCH (s1:Seed)-[:IN_CLUSTER]->(c1:Centroid {index: 1, iteration: 1}), (s2:Seed)-[:IN_CLUSTER]->(c2:Centroid {index: 2, iteration: 1}), (s3:Seed)-[:IN_CLUSTER]->(c3:Centroid {index: 3, iteration: 1}), we would get the Cartesian product over s1, s2 and s3, and the number of rows returned would be 50 * 80 * 80 = 320 000. This would still give the right numbers when using the average function, since taking the average over multiple copies of a value gives back the original value, but for other aggregate functions, like sum, one would of course get wrong values.

MATCH (s1:Seed)-[:IN_CLUSTER]->(c1:Centroid {index: 1, iteration: 1})
WITH avg(s1.area) as s1Area, avg(s1.perimeter) as s1Perimeter, avg(s1.compactness) as s1Compactness, avg(s1.kernelLength) as s1KernelLength, avg(s1.kernelWidth) as s1KernelWidth, avg(s1.asymmetryCoefficient) as s1AsymmertryCoefficient, avg(s1.kernelGrooveLength) as s1KernelGrooveLength
MATCH (s2:Seed)-[:IN_CLUSTER]->(c2:Centroid {index: 2, iteration: 1})
WITH s1Area, s1Perimeter, s1Compactness, s1KernelLength, s1KernelWidth, s1AsymmertryCoefficient, s1KernelGrooveLength, avg(s2.area) as s2Area, avg(s2.perimeter) as s2Perimeter, avg(s2.compactness) as s2Compactness, avg(s2.kernelLength) as s2KernelLength, avg(s2.kernelWidth) as s2KernelWidth, avg(s2.asymmetryCoefficient) as s2AsymmertryCoefficient, avg(s2.kernelGrooveLength) as s2KernelGrooveLength
MATCH (s3:Seed)-[:IN_CLUSTER]->(c3:Centroid {index: 3, iteration: 1})
WITH s1Area, s1Perimeter, s1Compactness, s1KernelLength, s1KernelWidth, s1AsymmertryCoefficient, s1KernelGrooveLength, 
s2Area, s2Perimeter, s2Compactness, s2KernelLength, s2KernelWidth, s2AsymmertryCoefficient, s2KernelGrooveLength, 
avg(s3.area) as s3Area, avg(s3.perimeter) as s3Perimeter, avg(s3.compactness) as s3Compactness, avg(s3.kernelLength) as s3KernelLength, avg(s3.kernelWidth) as s3KernelWidth, avg(s3.asymmetryCoefficient) as s3AsymmertryCoefficient, avg(s3.kernelGrooveLength) as s3KernelGrooveLength
CREATE (:Centroid {area: s1Area, perimeter: s1Perimeter, compactness: s1Compactness, kernelLength: s1KernelLength, kernelWidth: s1KernelWidth, asymmetryCoefficient: s1AsymmertryCoefficient, kernelGrooveLength: s1KernelGrooveLength, index: 1, iteration: 2})
CREATE (:Centroid {area: s2Area, perimeter: s2Perimeter, compactness: s2Compactness, kernelLength: s2KernelLength, kernelWidth: s2KernelWidth, asymmetryCoefficient: s2AsymmertryCoefficient, kernelGrooveLength: s2KernelGrooveLength, index: 2, iteration: 2})
CREATE (:Centroid {area: s3Area, perimeter: s3Perimeter, compactness: s3Compactness, kernelLength: s3KernelLength, kernelWidth: s3KernelWidth, asymmetryCoefficient: s3AsymmertryCoefficient, kernelGrooveLength: s3KernelGrooveLength, index: 3, iteration: 2})
RETURN *

Next cluster assignment

The next cluster assignment is done with the same statement as the last, only with the iteration number of the centroids increased to two. Now we can easily see which of the seeds have moved to a new cluster. The statistics show that clusters two and three have improved, whereas the first cluster now has more seeds, but its correctness has decreased.
[Figure: the clusters after the second assignment step]

Cluster no | Total assigned | Correctly assigned | Percentage correct
1          | 66             | 55                 | 83.3%
2          | 73             | 66                 | 90.4%
3          | 71             | 63                 | 88.7%

Now it is up to you to continue the clustering if you’d like; all the code is in this gist. If you find errors in this post, have suggestions for improvements, or have other comments or questions, please let me know! 🙂

References

  1. http://neo4j.com/
  2. http://archive.ics.uci.edu/ml/datasets/seeds
  3. https://en.wikipedia.org/wiki/K-means_clustering

A tiny theorem prover

At work we have many great (and some quite funny and not so great) old books, and when I came across this LISP book, I had to borrow it. (At some point there must have been a real library: the books are numbered and have an old-fashioned book card at the back, with the dates and names of previous borrowers. So maybe I didn’t borrow the book properly, I just took it off the shelf.) The book has many great chapters, for instance “Symbolic pattern matching and simple theorem proving”, “Program writing programs and natural language interfaces” and “LISP in LISP”. Here is a little theorem prover from the chapter “Symbolic pattern matching and simple theorem proving”, based on the resolution rule and rewritten in Racket.

The resolution rule produces a new clause if two clauses contain complementary literals, i.e., one clause contains c and the other contains \neg c.

    \[ \frac{a_1\lor \ldots \lor a_i \lor c \lor a_{i+1}\lor \ldots \lor a_n, b_1\lor \ldots \lor b_j \lor \neg c \lor b_{j+1}\lor \ldots \lor b_m}{a_1\lor \ldots \lor a_i \lor a_{i+1}\lor \ldots \lor a_n, b_1\lor \ldots \lor b_j \lor b_{j+1}\lor \ldots \lor b_m} \]

The simplest case is when we have the clauses a \lor c and b \lor \neg c. Since c and \neg c cannot both be true at the same time we know that either a or b has to be true, hence \frac{a \lor c, b \lor \neg c}{a \lor b}.

(If you’re not familiar with logic operators and terms, here is the shortest intro, from a non-expert: the sign \neg is negation, a literal is either a letter or the negation of a letter, and a letter is a variable that can be either true or false. The operator \lor is “or” (disjunction) and \land is “and” (conjunction), and a clause is a finite disjunction of literals, i.e., letters or negations of letters connected with or. In the fraction above, the clauses above the line are the arguments, and the clause below the line is the conclusion, which follows from, or can be deduced from, the arguments. Hence, the conclusion is true if all the arguments are true.)

The logic operators and the literals are symbols/quotes in the code (since we’re not in the “LISP in LISP” chapter of the book), and the following resolve function is my Racket version of the equivalent Common Lisp function from the book. The code in the book uses SETQ to update variables, and GO for jumping around in the code, a lot, but I’m using let expressions and recursion instead.
The resolve function uses two helper functions: invert puts the symbol 'not in front of a literal, or removes the 'not if the literal is already a negation, and combine is a cleanup function that builds the literals of the resolvent from the matched literal and the literals of both clauses, leaving out the matched literal and its negation and removing duplicates.

;; Resolve the clauses x and y (both of the form '(or literal ...)): find a literal
;; in x whose negation occurs in y, and combine the remaining literals of the two
;; clauses into a new clause. If no such literal exists, return 'no-resolvent.
(define (resolve x y)
  (letrec ([res (λ (rest-x rest-y)
                  (cond
                    [(null? rest-x) 'no-resolvent]
                    [(member? (invert (car rest-x)) rest-y)
                     (cons 'or (combine (car rest-x) (append (cdr x) (cdr y))))]
                    [else (res (cdr rest-x) rest-y)]))])
    (res (cdr x) (cdr y))))

And some tests to show how the function works:

(check-equal? (resolve '(or P) '(or Q)) 'no-resolvent)
(check-equal? (resolve '(or P) '(or (not P))) '(or))
(check-equal? (resolve '(or P (not Q)) '(or Q (not R))) '(or P (not R)))

So how can we use the resolution rule in a proof?

A conjecture is proved if the premise implies the conclusion; hence a conjecture is false if there is a combination of truth values for the letters that makes the premise true and the conclusion false at the same time. This is equivalent to saying that the conjecture is false if there is a combination that makes the premise and the negation of the conclusion true at the same time.

We require both the premise and the negation of the conclusion to be in conjunctive normal form, which means that they both consist of clauses connected with “and”s. So for the premise and the negation of the conclusion to be true at the same time, all the clauses have to be true.

If the combined clauses end up containing both a single literal and its negation, it is impossible for all the clauses to be true at the same time; hence the conjecture is not false, it is true. And this is the key idea of the proof algorithm: loop through all the clauses and apply the resolution rule to each pair of clauses. If the resolvent is empty, we know that the theorem is true. If we get a non-empty resolvent back, it is added to the list of clauses, and the resolvent is again used in the resolution rule with each of the other clauses in the list. If we have looped through all the clauses without meeting an empty resolvent, then the conjecture is false.

(define (prove premise negation)
  (letrec ([try-next-remainder (λ (remainder clauses)
                     (cond [(null? remainder)
                            (displayln '(theorem not proved))
                            #f]
                           [else (try-next-resolvent (car remainder) (cdr remainder) remainder clauses)]))]
           [try-next-resolvent (λ (first rest remainder clauses)
                       (if (null? rest) (try-next-remainder (cdr remainder) clauses)
                       (let ([resolvent (resolve first (car rest))])
                         (cond
                           [(or (equal? resolvent 'no-resolvent)
                                (member? resolvent clauses)) (try-next-resolvent first (cdr rest) remainder clauses)]
                           [(null? (cdr resolvent)) (display-clauses first (car rest))
                                                    (displayln '(produce the empty resolvent))
                                                    (displayln '(theorem proved))
                                                    #t]
                           [else (display-clauses first (car rest))
                                 (displayln (append '(produce a resolvent:) (list resolvent)))
                                 (try-next-remainder (cons resolvent clauses) (cons resolvent clauses))]))))]
           [clauses (append (cdr premise) (cdr negation))])
    (try-next-remainder clauses clauses)))
(test-case
 "prove true"
 (let ([premise '(and (or Q (not P))
                      (or R (not Q))
                      (or S (not R))
                      (or (not U) (not S)))]
       [negation '(and (or P) (or U))])
   (check-true (prove premise negation))))
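
Written out in propositional form (my reading of the clauses), the premise in this test encodes P \to Q, Q \to R, R \to S and S \to \neg U, and the negated conclusion encodes P \land U, so what the prover establishes is the conjecture

    \[ \big((P \to Q) \land (Q \to R) \land (R \to S) \land (S \to \neg U)\big) \implies \neg(P \land U) \]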

The complete code, with tests, can be found here.

References

  1. https://en.wikipedia.org/wiki/Propositional_variable
  2. https://en.wikipedia.org/wiki/Propositional_calculus
  3. https://en.wikipedia.org/wiki/Resolution_(logic)
  4. https://en.wikipedia.org/wiki/Conjunctive_normal_form
  5. Patrick Henry Winston and B.K.P. Horn, LISP

Week summary

One of the reasons for getting this blog running again was to share more of what I do at work. I have been working as a software developer for some years, but a couple of months ago I started as a research scientist at a research institute, in the ICT department. I’m still sort of figuring out what I’m doing and how things work, but I also think that others are curious about what researchers do for a living.

At work

Currently I’m working on three projects, but this week I only managed to work on two of them: a pre-project for writing a project proposal to the research council, and a development project for making a web site in PHP.

The income of the institute consists to a great extent of project funding from the research council here in Norway, or from the EU, and there is always continuous work on writing proposals and staying up to date with upcoming calls. The proposal I’m working on is for a large project with many partners, and I’m learning a lot about how these types of projects work. I’ve met many great people from industry and other research institutions over the last few weeks at various workshops and meetings, and I have met users that will benefit from our solutions if we get the funding. This week I’ve worked on the budget for the project (really not that fun, but it has to be done 🙂 ), and I’ve been writing and searching for references for the proposal and a state-of-the-art report we are working on, so I’m learning a lot about the actual research area as well.

When it comes to the web site, developing in PHP is also quite new to me. I’m not very fond of the dynamic typing, or the tedious syntax with $this->that, not to mention the extensive use of arrays, where new keys and values are added ad hoc as needed. The latter is probably not just the language’s fault.
I’ve suggested that we should do some tidying in the code, and next week we will discuss what the domain model should look like. Hopefully we will be able to replace some arrays with real objects, do some general tidying and write some tests, and that will put the code in better shape.
Other than that, I also spent some time trying to get a constructor to work for a new class I wrote. The reason it didn’t work was the most newbie reason of all: I didn’t notice that the constructor name should start with a double underscore, and I had only a single one in my code 🙂

After work

  • Often I read books on the subway to and from work, but this week I started listening to the Type Theory Podcast. I listened to the first two episodes, which were great, and the webpage has good posts on the episodes, including many relevant references. I hope they will make more episodes.
  • I also watched Jenny Tong’s talk from Google’s Ubiquity, and now I want to make all sorts of stuff with Arduino. Great video!
  • I ran into one of the managers at the last customer I worked with when I was still a consultant, and I was invited to a cake celebration the next day, celebrating that the application I worked on had been put into production. It was nice to meet all my great friends and former colleagues there, both from my old company and from the customer.
  • It is the final week of the Machine Learning course I’m following on Coursera, taught by Andrew Ng. It’s a great course, and I would like to do more machine learning in the future. I also believe that all programmers should learn a bit about machine learning and data mining; nowadays everyone wants applications that make suggestions or decisions for the user based on earlier behaviour.
  • We had a great meeting in the Lambda club, a group that originated at my former company, with enthusiastic programmers interested in functional programming, languages in general, and all the weird and fun stuff one can do with code. At the meeting we had great fun solving the following puzzle: given squares divided into eight regions, where each region can be coloured white or black, how many different squares are there, when squares obtained from another square by rotation or flipping (reflection) are considered to be the same? It’s a great puzzle which can be solved in various ways, and it would fit into a code kata or similar; a brute-force sketch follows after this list.
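
Here is a brute-force sketch of the puzzle in Java (my own, and it bakes in an assumption about the layout: the eight regions are taken to be the triangles cut out by the two diagonals and the two centre lines, numbered clockwise, so that a 90° rotation shifts the numbering by two and a reflection reverses it). Each colouring is an 8-bit mask, and two colourings count as the same square if some rotation or reflection maps one to the other.

import java.util.HashSet;
import java.util.Set;

public class SquareColourings {
    // Apply a permutation of the eight regions to a colouring (bit i = colour of region i).
    static int apply(int[] perm, int colouring) {
        int result = 0;
        for (int i = 0; i < 8; i++) {
            if ((colouring & (1 << i)) != 0) {
                result |= 1 << perm[i];
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed numbering: a 90-degree rotation sends region i to i+2 (mod 8),
        // and one reflection sends region i to 7-i. The eight symmetries of the
        // square are the four rotations and the four rotations composed with that reflection.
        int[][] symmetries = new int[8][8];
        for (int r = 0; r < 4; r++) {
            for (int i = 0; i < 8; i++) {
                symmetries[r][i] = (i + 2 * r) % 8;            // rotation
                symmetries[4 + r][i] = ((7 - i) + 2 * r) % 8;  // reflection, then rotation
            }
        }
        Set<Integer> canonical = new HashSet<>();
        for (int colouring = 0; colouring < 256; colouring++) {
            int smallest = colouring;
            for (int[] perm : symmetries) {
                smallest = Math.min(smallest, apply(perm, colouring));
            }
            canonical.add(smallest); // one representative per equivalence class
        }
        System.out.println(canonical.size() + " distinct squares");
    }
}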

Ant dependency graph

I needed to see the dependencies between the ant tasks in our project.
I first came across this older page with different tools for creating a graph of the dependencies, but unfortunately the tools mentioned there do not seem to be maintained any more.
But then I found this great groovy script embedded as an ant target.
I downloaded groovy, and separated the taskdef from the groovy script, in order to place the groovy libraries where I wanted.

   <taskdef name="groovy" classname="org.codehaus.groovy.ant.Groovy">
     <classpath>
	<fileset dir="groovy-1.8.6/lib" includes="*.jar"/>
      </classpath>
    </taskdef>
    <groovy>
      ....
    </groovy>

Graphviz was installed in no time (from the Ubuntu APT repositories), and the resulting graph is quite good!

Trying out PhoneGap and Sencha with Eclipse on Ubuntu

Yesterday at work we discussed mobile app development on different platforms and technologies. Someone mentioned Sencha, which I hadn’t heard of before, so today I had to find out more 🙂
I came across the tutorial A Sencha Touch MVC application with PhoneGap, which I of course had to try out, despite the fact that it is marked as “hard”, that I know very little about JavaScript and nothing about PhoneGap, and that the tutorial is based on developing in Xcode on a Mac, whereas I use Eclipse on Ubuntu.
So here are some notes on what I did to get it to work on Ubuntu.
* I started with this list for getting started with PhoneGap
* When I wanted to install the ADT plugin in Eclipse I got an error saying “missing org.eclipse.wst.sse.core 0.0.0”. I then followed the steps in post #7 here, which did the magic.
* After all the installing I continued with the “Hello World” example here, which also gives a good setup for following the Sencha tutorial. (One might as well name the project Contacts right away.)
* I happily continued with the Sencha tutorial for a while, copy-pasting code and getting the folder structure as described. But after a while the need to test the app on a real device emerged.
* So to get my Samsung Galaxy to communicate with the computer, I first put the phone in development mode in Settings -> Programs -> Development, then followed the steps described in this forum post and restarted Eclipse, and then suddenly the app ran on the phone rather than on the slow emulator.
* After that, I really wanted the app to fetch and display my phone’s contact list, but I was only getting an empty page and an error in the log saying “Uncaught DataView requires tpl, store and itemSelector configurations to be defined.” Of course, the order in which the js files are included in index.html matters! Include Contact.js before ContactsList.js, and then, ta-daa, the app lists my contacts! (The rest of the tutorial will be food for another day 🙂 )