Sudoku Satisfiability

Modern neural networks are becoming increasingly powerful perceptual machines, but their ability to perform logic is still rudimentary at best. The early days of AI research focused on logic and symbol manipulation, and for many decades computers have had superhuman logical capabilities (in some respects, anyway). Recently there has been a surge of interest in trying to combine modern neural architectures with classical symbolic/logical AI. These sorts of systems are starting to be described as "neuro-symbolic AI". I think the general idea of incorporating sophisticated exact logical systems into a framework alongside fuzzier, more "intuitive" perceptual systems is a very interesting one.

What does this have to do with Sudoku, you ask? Sudoku is exactly the sort of problem that is relatively easy to tackle with classical AI techniques and somewhat awkward to tackle using nothing but neural networks (not that it isn't fun and interesting to try; see, for example, the many notebooks using this million-puzzle sudoku dataset).

But with a little work you can encode sudoku as a standard sort of constraint satisfaction problem, like a linear program or a boolean satisfiability problem. With such systems you can quickly and efficiently find solutions which you can be extremely confident are 100% correct. By contrast, if I were presented with, say, a multi-layer perceptron or convolutional neural net showing 100% prediction accuracy on a test set of a few hundred thousand boards, I would not be convinced that the network had truly learned a function which was a perfect sudoku solver; it would be much more likely that it had somehow learned a close (but imperfect) approximation to such a thing. I certainly would not say that such a neural network "understands" sudoku in any meaningful way. Though if I'm being honest, I also do not think that expressing the rules of sudoku in propositional logic represents a much stronger form of "understanding" in the human sense.

I have a lot more knowledge of and experience with neural networks than I do with logical inference, but I have been wanting to add some logic-based tools to my toolbox. The ability to do complex exact logical inference is useful even if I never apply it in my work as an ML practitioner. I have some experience with linear programming, but I have never used a satisfiability solver for anything, and thinking through how to encode sudoku as a satisfiability problem seemed like a good exercise to help me become more familiar with those sorts of logical systems.
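To give a concrete taste of what such an encoding looks like, here is a minimal sketch (my own illustrative encoding, not taken from any particular solver or library) of how just the per-cell constraints become CNF clauses over DIMACS-style integer variables:

```python
def var(r, c, v):
    """Map (row, col, value), each in 1..9, to a unique positive integer,
    following the DIMACS convention SAT solvers expect."""
    return 81 * (r - 1) + 9 * (c - 1) + v


def cell_clauses():
    """Clauses asserting that every cell holds exactly one value."""
    clauses = []
    for r in range(1, 10):
        for c in range(1, 10):
            # At least one value per cell.
            clauses.append([var(r, c, v) for v in range(1, 10)])
            # At most one value per cell (pairwise exclusions).
            for v1 in range(1, 10):
                for v2 in range(v1 + 1, 10):
                    clauses.append([-var(r, c, v1), -var(r, c, v2)])
    return clauses
```

The row, column, and box uniqueness constraints follow the same "at least one / at most one" pattern, and the given clues simply become single-literal clauses.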

Read more…

Blogging Blitz 2020

The end of the year is just around the corner once again, and once again I have generated just a couple of actual blog posts, several times as many unfinished drafts, and dozens of hastily scribbled ideas. Again this year I'm taking some time off and am going to make a push to get as many posts out as I can.

With last year's "blogging blitz" I managed to get out 2 extra posts before the year was out. Naively, I had thought I might be able to finish 10 or 12 "quick" posts. My working theory was that since I had more than half a dozen very detailed, partially finished blog posts, it should have been possible to extract small, relatively complete portions of those posts and polish them up for quick publication. That, along with a handful of quick explorations of completely new ideas, could get me all the way to my (apparently ridiculously overambitious) goal of an average of one post a day.

But what actually happened was rather different. Each time I revisited one of the unfinished blog posts into which I had already poured a significant amount of work, there would be one or more compelling but difficult-to-solve problems laid out for me. In short, this strategy amounted to continually nerd-sniping myself.

Of the posts that I actually managed to put out in last year's blogging blitz, one was a quick treatment of a fresh idea (the permutation-cnn post) and the other was a somewhat more detailed write-up of applying the exponential moving average trick to the calculation of streaming Fourier transforms. The streaming Fourier transform post started out as an extraction of an idea from a much longer, half-finished post (which is still languishing for want of attention). However, almost none of the code or prose came from the parent post; if anything, I think trying to incorporate it may have slowed me down.

This year I am planning a very different strategy. Instead of focusing on the ideas into which I have sunk the most time in the past, I am going to actively avoid things which have already received a lot of attention, and focus mostly on ideas that I have put little or no time into previously. I will also try to somewhat favor the simpler, better-defined ideas over the more complicated or unusual ones (though how fun I think an idea is will definitely still factor in).

A lot of my half-finished posts are of the form "achieve the ambitious but somewhat vague goal X". For example, I have put a hundred hours or more into a post attempting to turn the path a neural network takes through parameter space as it trains into audio, in such a way that it could be used as a training diagnostic tool. Apparently that is a difficult task (or at least very difficult for me). Incidentally, that is the post from which I extracted the idea for the "streaming spectral summaries" post in last year's blogging blitz. But for this year's blitz I will do my best to avoid any such goal-based posts (as opposed to idea/concept-based ones).

Additionally, I will try to intersperse the longer, more involved idea-driven posts with short "tips and tricks" posts focusing on simple tricks I find useful (for example, perhaps a post about why you should sometimes be using np.newaxis instead of the .reshape method of numpy arrays).
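As a quick taste of that last example (a hypothetical snippet of my own, just to show the distinction):

```python
import numpy as np

a = np.arange(3)          # shape (3,)
col = a[:, np.newaxis]    # shape (3, 1); no dimension sizes hardcoded
same = a.reshape(3, 1)    # equivalent here, but bakes in the length 3
```

The newaxis version keeps working unchanged if the input length changes, which is most of the argument.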

The blogging blitz 2020 starts now!

A real world loopy lock

A while back I got a Master Lock "speed dial" padlock. The padlock has a central nub that you push up/down/left/right in the correct sequence to unlock it. Like most combination locks, you can set this sequence to one of your choosing. However, unlike most locks, you can set the pass sequence to be of any length! This raises the question: how many different combinations does this lock allow? And if you were so inclined, what would be the fastest possible way to crack its passcode?

lock_teaser_image

Read more…

Visualizing Lattices

John Conway recently passed away and I wanted to do something in homage.

More than a decade ago, I happened to pick up the book "Sphere Packings, Lattices and Groups" by Conway and Sloane. The deep beauty and complexity of the mathematics of lattices took me completely by surprise. I'm not much of a group theorist, but visualizing and trying to understand high-dimensional structures is one of my lifelong passions. So in this post I'm going to spend some time playing with visualizations of high-dimensional lattices.

Read more…

Streaming Spectral Summaries

In this post I'm going to tackle the question of how to take the Fourier transform of a stream of data when you aren't allowed to keep more than the most recent K data points in memory, but must be able to estimate the spectral power at frequencies much lower than 1/K.
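The core trick can be sketched in a few lines. This is a simplified single-frequency version with names of my own choosing, not the full treatment from the post: an exponential moving average of the demodulated signal stands in for a windowed Fourier coefficient, at O(1) memory per tracked frequency.

```python
import cmath


class StreamingFourierBin:
    """EMA of the demodulated signal approximates a windowed Fourier
    coefficient at one frequency, using O(1) state per bin."""

    def __init__(self, freq, alpha):
        self.freq = freq    # frequency in cycles per sample
        self.alpha = alpha  # smoothing factor; effective window ~ 1/alpha samples
        self.t = 0
        self.coef = 0j

    def update(self, x):
        # Demodulate the incoming sample, then fold it into the moving average.
        phase = cmath.exp(-2j * cmath.pi * self.freq * self.t)
        self.coef = (1 - self.alpha) * self.coef + self.alpha * x * phase
        self.t += 1
        return self.coef
```

Fed a pure cosine at the matched frequency, |coef| settles near 0.5, since the other half of the power sits in the corresponding negative-frequency bin.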

Read more…

Permutation CNN

To kick off the blitz I'm going to begin with an idea I had back in April of this year which I think could be a lot of fun to play with but which I don't feel like I have had enough time to get a real feel for. The idea is in essence "why not replace all the dense layers in a neural network with convolutional layers?".
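For context, here is the textbook equivalence the question builds on (not the construction from the post itself): a dense layer is a convolution whose kernel spans the entire input, so the swap is always possible in principle.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=8)          # an 8-dimensional input
W = rng.normal(size=(4, 8))     # dense layer with 4 output units (no bias)

dense = W @ x

# Each output unit is one "valid" correlation whose kernel is as wide
# as the input itself, so the valid output has length 1.
conv = np.array([np.correlate(x, W[i], mode="valid")[0] for i in range(4)])

assert np.allclose(dense, conv)
```

The interesting question is what happens when you instead use narrow kernels and give up that full generality.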

Read more…

Blogging Blitz 2019

I haven't blogged this year as much as I would have liked to. The one significant post I did manage to get out, about binary trees and coordinate systems, is one of my all-time favorites. But that post took me more than 6 months to write, mostly in 30-minute stints on the train to and from work.

In fact, if you count from the first moment the idea was scribbled in one of my notebooks to the publication of the post, that is probably 5 years or more. I'm not exactly sure when those first scribbles were written down. There was a pause of at least 3 years between the first time I wrote down the idea and when I stumbled onto it again and decided it was worth putting in more time. I thought that the binary tree to spherical coordinate system correspondence was a particularly beautiful idea, and I wanted to rescue it from the semi-infinite limbo of ideas scribbled into my notebooks and then forgotten.

old_notebooks

There are many, many more ideas that I have played with for a few hours, days, or even weeks, and then left to languish in some stage of partial completion. If I continue doing things the way I have been, I will never have the time to polish most of these ideas up enough to publish. To at least partially combat this, I'm going to finish up 2019 with a spate of posts where the goal is to get out as many as I can while suppressing my internal censor. This will mean that the next few posts will be a little less polished and the ideas a little less well thought out. My usual style is probably a little too detailed and long-winded anyway. Ready, set, go!

Audio Yo

This is the notebook for the presentation that I will be giving at the SLC python meetup tomorrow. This slide, and other slides like it containing the rough planned text of the talk, are "skip" slides and should (hopefully) get removed when I compile the notebook into slides format. (Bet you didn't know nbconvert could do that!)

To make a notebook into a slide show you will also need to annotate the cells of the notebook to say whether each represents a new slide, a continuation of the previous slide (slide type indicated by a "-"), a sub-slide (optional material for a slide which can be navigated with up/down or skipped by going left/right), a fragment (which is on the previous slide but appears when you next hit right), or is to be skipped altogether. This information can be added by changing the cell toolbar type to slideshow (View->Cell Toolbar->Slideshow).
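Under the hood, as I understand the nbformat convention, the toolbar just writes a bit of cell metadata; a fragment cell, for example, ends up carrying something like:

```json
{
  "slideshow": {
    "slide_type": "fragment"
  }
}
```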

You can then launch the slideshow using an nbconvert command like the following:

jupyter nbconvert slideshow.ipynb --to slides --post serve

Anyway, the style of this post will be slightly different from normal, and you will just have to imagine me waving my hands around frantically and waiting for dramatic ... pauses.

Read more…

Binary Trees and Hyper Spherical Coordinates

When dealing with vectors it is often very helpful to decouple the direction of the vector from its magnitude. This is a trick I was first taught in high school physics, and one I have never stopped finding extremely useful. In high school, physics problems were mostly in 2D, so a direction was uniquely specified by just one rotation angle. As an undergraduate I was properly introduced to spherical polar coordinates, which let you express directions in 3D as a function of 2 angles. But it wasn't until close to the end of my graduate education in physics that I stumbled onto the idea that polar coordinates in 2D and spherical polar coordinates in 3D can be thought of as special cases of a more general formalism: one that lets you turn any binary tree with D leaves into expressions for the D components of a unit direction vector as a function of a set of standard angles.

I recently saw the outline of this idea scribbled down in one of my notebooks and I was again struck by the beauty of it, and I would like to share it.

At the cost of getting a little ahead of myself here is the tree that corresponds to the usual spherical polar coordinate expansion.

teaser_image

To read off the expression for each coordinate you simply run up the tree from the appropriate labeled node, collect terms, and then remember to put the overall vector magnitude $r$ out front of each term. I think even this diagram alone might be worth a blog post, since it is much easier for me to remember than the pile of trigonometric terms it represents. I certainly would have liked to have seen something like this as a mnemonic for spherical polar coordinates as an undergrad (or, for that matter, as a grad student).
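The construction as I understand it fits in a few lines of code (the function name and tree representation here are my own): each internal node carries one angle, and a leaf's coordinate is the product, along the path from the root, of cos for one branch and sin for the other, which automatically puts the coordinates on the unit sphere.

```python
import math


def coords(tree, angles):
    """Unit-vector components from a binary tree given as nested 2-tuples.
    Consumes one angle (in radians) per internal node in preorder; the
    left subtree picks up cos(theta), the right subtree sin(theta)."""
    if not isinstance(tree, tuple):
        return {tree: 1.0}
    theta = angles.pop(0)
    out = {k: math.cos(theta) * v for k, v in coords(tree[0], angles).items()}
    out.update({k: math.sin(theta) * v for k, v in coords(tree[1], angles).items()})
    return out


# The lopsided 3-leaf tree reproduces ordinary spherical polar coordinates:
# z = cos(theta), x = sin(theta)cos(phi), y = sin(theta)sin(phi).
xyz = coords(('z', ('x', 'y')), [0.7, 1.1])
```

Summing the squares of the leaf values telescopes via cos^2 + sin^2 = 1 at every internal node, which is why the result is always a unit vector.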

Read more…