---

Claude, Gemini And Me Go Hunt The Busy Beaver

I have an admission to make. I got nerd-sniped the other day. It started innocently, with an article about Busy Beavers [0].

Those are the Turing machines of size N that take the maximum number of steps to terminate among all machines of that size that do terminate; BB(N) is that number of steps. With BB(N) in hand, deciding whether a Turing machine of size N terminates is "easy": if it has not terminated after BB(N) steps, it never will. BB(5) was found just last year to be 47,176,870, and now the game is about finding BB(6), which promises to be a phenomenally large number. The article talked about BB(6), and one of the current candidates is a function that "almost certainly never terminates, but we can't prove either way using the current mathematics". The equivalent Python function looks deceptively simple:

# Does this Python function halt?
def antihydra():
    a = 8
    b = 0
    while b != -1:
        if a % 2 == 0:
            b += 2
        else:
            b -= 1
        a += a//2

It builds a sequence in which each number is the previous one plus the result of its integer division by two, and it terminates if the share of odd numbers in the sequence ever goes above two thirds: every odd number decreases the counter b by one, and every even number increases it by two.
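To make the "two thirds" condition concrete: if E even and O odd numbers have been seen so far, then b = 2E - O, so b reaches -1 exactly when O = 2E + 1, which is a share of odd numbers just above 2/3. Here is a minimal sketch that runs the same loop for a bounded number of steps and counts parities. The function name antihydra_prefix and the max_steps cut-off are my own additions for illustration; a bounded run can only show what happens in that prefix and says nothing about whether the real function halts.

```python
def antihydra_prefix(max_steps):
    """Run the antihydra loop for at most max_steps iterations.

    Returns (halted, steps, evens, odds, b), where evens/odds count
    how many values of a were even/odd in the steps performed.
    """
    a, b = 8, 0
    evens = odds = 0
    for step in range(max_steps):
        if b == -1:
            # Same exit condition as the original while loop.
            return True, step, evens, odds, b
        if a % 2 == 0:
            b += 2
            evens += 1
        else:
            b -= 1
            odds += 1
        a += a // 2
    return b == -1, max_steps, evens, odds, b


if __name__ == "__main__":
    halted, steps, evens, odds, b = antihydra_prefix(1000)
    print(f"halted={halted} steps={steps} evens={evens} odds={odds} b={b}")
    print(f"share of odd numbers: {odds / steps:.3f} (threshold is 2/3)")
```

The first ten values of a are 8, 12, 18, 27, 40, 60, 90, 135, 202, 303, so antihydra_prefix(10) sees 7 even and 3 odd numbers and leaves b at 11. Python integers are arbitrary precision, so a can grow freely, but it gains roughly 0.58 bits per step and long runs get slow.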

I did not heed the gentle warning in the article that this problem is related to the Collatz conjecture - a problem in math that's been unsolved for almost a century - and decided to muck around with LLMs to try to produce something looking like a proof. At a casual glance it looks like it will never terminate, as the even and odd numbers seem to be split roughly 50/50, which is a far cry from the 33/66 even/odd proportion needed for this to terminate. However, math is about rigorous proofs, because what looks "obvious" at a casual glance is often wrong.

I was okay with math at uni, though I could never bear the burden of repetitive calculations (after calculating by hand the Fourier transform of a custom interval-defined function three times and getting three different results, I wrote a program which calculated the correct result symbolically and printed the sequence of calculations). Of course, not needing much of it for many years didn't add to my skills, but I can at least get a vague grasp when I am reading "the real math", and it seemed like a fun occasion to squint at some.

And boy, squint I did. Feeding the problem to Claude and giving it a few ideas on how to attack it rigorously produced a three-page manuscript claiming that it had solved the problem! With my math skills being rusty and given the warning in the article, I decided to feed the results back into Claude for a review. It seemed happy with the result. "That can't be", I thought - so I decided to try Gemini as a reviewer. And it did a pretty fine job of devastatingly deconstructing the flaws present in the original "proof" from Claude. I asked it to suggest directions, and it did - and I fed them back to Claude.

For the first iteration I had used just the Claude web interface, but with the large amount of text and the math symbols, having the "research" in files and under version control seemed like a better idea, so I quickly switched to that. This meant Claude was also doing the majority of the organisation (and dare I judge, the repository looks like a mad scientist's lab!! :)

Despite the slight chaos with the file organisation, it was much easier to work this way, and I made a bunch of new iterations, moderating this funny "scientific collaboration": Claude doing most of the generative work, Gemini doing the critique and finding flaws (it was pretty good at that!), and me moderating the process, copy-pasting between the two and throwing in occasional curve-ball ideas like "can we try to attack this part using Fourier transform?" and "what if we look at the result as an angle of a point on a circle that makes turns, rather than a fraction?".

The most enjoyable part of the process for me was that, despite being completely incompetent in the subject matter, I could ask these questions, which often did not even make sense, and watch the result. It felt like playing with the sand on the seaside - except the sand was the ideas and concepts that were flying at various heights over my head. My "theoretical" justification for asking these questions was an attempt to throw the LLM state into something semantically different but still "congruent" with the problem.

What's there to show for all of this?

The Claude+Gemini+AY trio did successfully discover the connection between the termination problem and the Collatz conjecture, and also made a connection with another conjecture, which posits that the fractional parts of the sequence "(3/2)^n" are uniformly distributed between 0 and 1. If that were true, the antihydra would not terminate. But, much like the antihydra "seems like it's not terminating", the fractional parts "seem like they are uniformly distributed", and proving it is an open hard problem. A few other avenues that seemed to find agreement were in the area of Generalised Baker's Transformations and Beta-transformations, but I did not pursue them for two reasons: both were flying a bit too high over my head, and it was far too long past bedtime.
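For anyone who wants to eyeball the "seems uniformly distributed" part, here is a small sketch that computes the fractional parts of (3/2)^n exactly and sorts them into equal-width buckets. It relies on the identity frac((3/2)^n) = (3^n mod 2^n) / 2^n, so there is no floating-point rounding. The function names and the bucket count are my own choices for illustration, and a histogram over a finite range is of course not evidence for the conjecture, just a way to look at the numbers.

```python
from fractions import Fraction


def frac_part_three_halves(n):
    """Exact fractional part of (3/2)**n, using 3**n mod 2**n."""
    modulus = 1 << n  # 2**n
    return Fraction(pow(3, n, modulus), modulus)


def histogram(n_max, bins=4):
    """Count how many of frac((3/2)**n), n = 1..n_max, fall in each bin."""
    counts = [0] * bins
    for n in range(1, n_max + 1):
        f = frac_part_three_halves(n)
        # f is in [0, 1), so int(f * bins) is a valid index 0..bins-1.
        counts[int(f * bins)] += 1
    return counts


if __name__ == "__main__":
    print(histogram(10_000, bins=4))
```

As a sanity check, (3/2)^2 = 2.25 and (3/2)^3 = 3.375, and the function returns 1/4 and 3/8 for n = 2 and n = 3.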

So, other than pretending that I have learned some math and showing no shame in attacking an 80-year-old unsolved conjecture, do I have some useful result to share? Yes, I think I do.

My anecdotal experience indicates that Claude + Gemini seem to work pretty well in tandem, specifically: Claude as generator, and Gemini as a critic/reviewer. Gemini sounds more rigorous, while Claude seems more open to experimenting. Why am I saying so? In a few 1:1 tests, Gemini was more precise than Claude in catching and describing the issues; whereas in a couple of tests where I wanted each of them to do the generative work, Claude seemed to produce a longer and more rigorous-looking result - Gemini's seemed weaker when I asked it to do all the "nuts and bolts" from A to Z. However, Gemini gave the impression of having better access to the math knowledge, thus my decision to assign them the "Professor/reviewer" and "Researcher" roles, with me acting mostly as a source of randomness when they inevitably got stuck.

This "Professor/reviewer" role for Gemini was a first for me during this experiment - before, I almost exclusively used Claude. When I have a chance, I will try Gemini as a "software reviewer" in the same tandem with Claude, and see how it fares. But that will be a topic for another write-up.

In the meantime, in case you are curious about the subject of this post in more detail, I give you a few teaser links - to the original article, and to some of the fun references that I have found in the process.

The original Quanta article that started it all:
https://www.quantamagazine.org/busy-beaver-hunters-reach-numbers-that-overwhelm-ordinary-math-20250822/

The wiki page about the short program with long run time:
https://wiki.bbchallenge.org/wiki/Antihydra

A conjecture by Mahler that is very closely related to the one we "discovered":
https://en.m.wikipedia.org/wiki/Mahler%27s_3/2_problem

Another conjecture by Mahler that has absolutely nothing to do with this article, other than being a testament to Mahler's very broad expertise:
https://www.ias.edu/sites/default/files/video/IAS-2018.pdf

A very interesting paper that links Baker's maps with random Markov chains, which may well have nothing to do with this story but is anyway fun:
https://www.cambridge.org/core/services/aop-cambridge-core/content/view/1A09A014313051A6DCDE0D0A2EAF92B3/S0008414X00015418a.pdf/on-a-class-of-generalized-bakers-transformations.pdf
