For the first time ever, I tried using AI to help with a non-exercise mathematical proof. I used Claude Opus 4.5, working through a mix of Claude Code and Anthropic’s new Cowork feature. The experiment was a success: We managed to close a gap in a proof, and Claude’s help made the process both faster and more pleasant than working alone would have been.
Starting from old notes
I started with some old, incomplete draft notes for a paper I hope to finish one day. Claude read through them and found several typos. The most interesting mistake was a case where I had written “$\leq$” instead of “$\geq$” within a rather technical section. Thankfully, this turned out to be just a typo and not a genuine mathematical error.
The troublesome lemma
The task I used for this experiment was to find a proof for the following lemma:
Lemma. Let $\mu_1 \geq \cdots \geq \mu_n \geq 0$ and $\delta > 0$. For any subset $I \subseteq \{1, \ldots, n\}$, define $$\phi(I) := \prod_{i \in I} \Bigl(1 - \frac{\mu_i}{\delta+\sum_{j \in I}\mu_j}\Bigr).$$ Then $$\min\Bigl\{ \phi(I) \Bigm| I \subseteq \{ 1, \ldots, n \} \Bigr\} = \min\Bigl\{ \phi\bigl(\{1,\ldots,k\}\bigr) \Bigm| k \in \{1, \ldots, n\} \Bigr\}.$$
My original idea had been to argue that any given set $I$ can be modified step-by-step to reach a set of the form $\{1, \ldots, k\}$, making $\phi(I)$ smaller at every step. The problem was that for all the types of steps I had previously tried, it was either hard to show (or wrong) that $\phi(I)$ always decreases.
For example: replacing a larger index with a smaller one sometimes increases $\phi$. Filling in gaps (say, replacing $\{1, 3\}$ with $\{1, 2, 3\}$) also sometimes makes $\phi$ larger.
Systematic exploration
I asked Claude to brainstorm ten ideas for such modification steps. The list Claude came up with included all the approaches I had already tried, plus some new ones. Then I had Claude use Python scripts to search for counterexamples to each idea. Claude used a combination of systematic search, random sampling, and numerical optimisation, and found counterexamples for all ten. The screenshot below shows a small part of this process.

We kept adding more ideas. In the end we had 24 different approaches, of which only two had not been ruled out by counterexamples. One useful idea that Claude introduced was to consider rules where you derive two different sets from $I$, with the hypothesis that at least one of them always reduces the value of $\phi$. The most promising rule at this stage was “either remove $\max(I)$, or fill the first gap”. For example, this hypothesis states that either $\{1\}$ or $\{1, 2, 4\}$ must be better than $\{1, 4\}$.
I could have done all of this myself, but it would have taken hours or days. Claude managed it in less than two hours (thinking, writing and running Python scripts, waiting for my input), with only occasional supervision on my part.
Finding a proof
I picked one of the two remaining modification rules and asked Claude to come up with a formal proof that this rule works. Claude tried for a while but never got anywhere.
At this point I suggested defining $$f(m) = \phi\bigl(I \cup \{m\}\bigr)$$ and asked Claude to try using this function to prove the lemma. This unlocked the problem! Without further help by me, Claude produced a correct proof within about five minutes. Claude’s main idea was to use the Cauchy–Schwarz inequality to show that whenever $f’(m) = 0$, we have $f’’(m) < 0$. This implies that $f$ has no local minima, and then on any interval the minimum of $f(m)$ is therefore attained at one of the two boundary points, which turn out to correspond to the two sets derived from $I$.
I believe I would have come up with this idea myself eventually, but it would certainly have taken me longer.
Tidying up together
From there, we worked together to simplify and tidy the proof. One example where Claude was helpful: at some point I noticed that we had used the symbol $k$ for two different quantities. Claude was able to replace one group of $k$s with $\ell$ automatically, while leaving the other $k$ alone. Here is the first page of our proof:

Conclusion
Overall, this was pleasant to do. It helped to resurrect an old draft, and was much faster than doing it all myself. Claude contributed useful ideas and saved a lot of tedious work, but the process still needed human guidance at the critical moments.