Time to Think

Tim Anderton

Over the past few years ideas have been piling up in my notebooks and in half-finished drafts of blog posts and papers. More than once during that time I have been actively working on an idea in my free time only to see a fleshed-out version of that same idea (or one very much like it) get published by someone working at DeepMind or Facebook AI a few months later. Sometimes this was demoralizing since it meant that I could no longer be the first to publish some novel idea. But more often than not it felt like an invitation, evidence that my ideas were worth pursuing more aggressively. Even when the published papers were in very close alignment with my own unpublished musings, it always felt like there was something I could add, a direction of investigation that seemed important or obvious to me but which wasn't being addressed. For example, I was recently working on a series of related ideas in which the weights in a neural network are themselves the outputs of another network. A few months later I saw this paper, and a few months after that another paper, in which the authors train a network to generate the weights for other networks. Although the idea was very much the same, their approaches are very different from what I had in mind, and I think that my approach may even work better. But I was never going to find out unless I could find the time and energy to spend more than a couple of hours a week working on it. I want in on the fun!

I have also been thinking more and more about how much I was learning over time and how much creative work I was producing, and comparing it to other periods of my life. By far the year of my adult life in which I learned most rapidly and did the most creative work is the year of transition between grad school and taking an ML job in industry. The first two years of grad school and the first year as a professional ML scientist might be competitive if you consider only the raw rate of learning. The first couple of years of grad school made me revisit all the physics that I wasn't quite ready for on first exposure as an undergrad (I'm looking at you, Hamiltonian/Lagrangian mechanics), and I found that on second exposure what had been quite impossible to grasp now regularly just seemed to click into place. Likewise, in my first year as a professional ML scientist I had to quickly become very proficient in a whole host of skills and technologies that I previously had little exposure to (git, docker, CI/CD, SQL, API design, monitoring/alerting, airflow, pulsar, ... ), not to mention the absolute torrent of business jargon and hyper-specific company information. But although I learned a ton in both of those periods, my creative output was relatively low. It is hard to tackle open-ended questions when you are flooded with an endless series of very specific, seemingly urgent questions like "What is the value of this integral?" or "How do I use a docker volume?".

When I look at the creativity of the ideas in my notebooks from the year in between grad school and work and compare it to those other periods, the difference is night and day. Even compared to the later years of grad school (where in principle producing creative and original thought was my overarching goal) the difference is stark. In the year after grad school I gave myself permission to explore more freely. Instead of feeling constant intense pressure to pursue only those ideas which were expedient, I suddenly felt free to pursue whatever ideas I found compelling. The resulting increase in raw rate of learning, creativity, and personal happiness was dramatic.

If I look back on the past three years working on ML in industry, it isn't the production ML systems which I value most (though there are some aspects of those systems of which I am proud). It is the creative and seemingly useless things that I have done of which I am by far the most proud. The perfect example is the correspondence between binary trees and hyper-spherical polar coordinate systems that I wrote up on the train headed to/from work. The correspondence seems so straightforward that I would actually be quite surprised if I were the first person to ever notice it, but I have certainly never seen it used before by anyone other than myself. At first glance this might look like a useless mathematical curio. After all, why would anyone want a lot of different high dimensional spherical coordinate systems? For that matter, why would anyone want a single high dimensional spherical coordinate system in the first place? I think the answer is clear: no one really does, or, more to the point, no one yet knows that they want such a thing. To me, though, a flexible family of ways to reparameterize the unit sphere in any number of dimensions sounds like something I absolutely want in my toolbox. I may not yet have found a great application for it (and maybe I never will), but it seems so compelling and beautiful that I don't care if I never do get much use out of it.
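
For readers curious what such a correspondence can look like, here is a minimal sketch of one plausible construction. It is an assumption for illustration and not necessarily the one from the write-up: every internal node of a binary tree carries one angle, the left branch is scaled by the cosine of that angle and the right branch by the sine, and each leaf becomes one Cartesian coordinate. The names `LEAF` and `tree_to_point` are hypothetical.

```python
import math

# Hypothetical encoding: a tree is either LEAF or a pair (left, right) of subtrees.
LEAF = None

def tree_to_point(tree, angles):
    """Turn one angle per internal node (in preorder) into one coordinate per leaf."""
    remaining = iter(angles)

    def walk(node, scale):
        if node is LEAF:
            # The product of cos/sin factors along the path is this leaf's coordinate.
            return [scale]
        theta = next(remaining)
        left, right = node
        # cos^2 + sin^2 = 1, so splitting the scale this way preserves the norm.
        return (walk(left, scale * math.cos(theta))
                + walk(right, scale * math.sin(theta)))

    return walk(tree, 1.0)

# A "caterpillar" tree with three leaves and a balanced tree with four leaves.
caterpillar = (LEAF, (LEAF, LEAF))
balanced = ((LEAF, LEAF), (LEAF, LEAF))

for tree, angles in [(caterpillar, [0.3, 1.1]), (balanced, [0.3, 1.1, 2.0])]:
    point = tree_to_point(tree, angles)
    print(point, sum(x * x for x in point))
```

Under this assumed construction a tree with n leaves has n - 1 internal nodes, so it takes n - 1 angles to a unit vector in n dimensions. The caterpillar tree gives [cos a, sin a cos b, sin a sin b], the familiar spherical coordinates up to axis ordering, while the balanced tree gives a different parameterization of the sphere in four dimensions, so each tree shape yields its own coordinate system.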

Looking at things from this perspective, it seems that I've gotten the allocation of my brain power budget all wrong. I have been spending the overwhelming majority of my brain power on the most seemingly urgent (and frankly sometimes inane) problems and questions of the moment. But I deeply want to be able to spend more of my time on the things which I find most compelling (like pursuing "real" AGI), even when those things are seemingly useless (like turning binary trees into coordinate systems) or perhaps impossible (like achieving "real" AGI).

Ideally I would love to land a job that affords me the freedom to explore compelling novel ideas, something like a research scientist position at a place like DeepMind. But without a strong collection of publications in prestigious ML venues, landing such a position seems very unlikely. In principle I could have continued trying to write up ideas as paper drafts in my spare time, but that hasn't been working out so well for me recently. Even if I do manage to squeeze out a paper draft once every other year by giving up nights, weekends, and vacations, one paper isn't likely to be enough to change my current trajectory. So I decided it was time to take a little leap of faith. I quit my job so that I can focus on the things that I find most compelling and turn some of the ever-growing pile of interesting ideas into prototypes, papers and blog posts.

March 14 marked the first day of another year of transition, a year of time to focus and think. I'm perhaps a little late in announcing my intent to the world. But I couldn't help but immediately dig into some ideas that I've been saving up and see if they really work out (turns out some of them do!). Now, two weeks in and with some paper drafts under way, I've relaxed enough that I feel ok to slow down and officially acknowledge this moment.
