3 February 2016

A few days ago I came across a thread on Hacker News which contained a lot of misinformation about primality testing. I'm not an expert on the subject, so I don't necessarily know the state of the art. However, the methods that I do know illustrate the tradeoffs that can be made. I'll be leaving out a lot of details, but hopefully will give enough to search for if you want to learn more.

The Problem

The fundamental problem that we're discussing is: Given a number \( N \), how can we determine whether it is prime? Throughout this post, we'll use \( n \) to denote the number of bits of \( N \), so for the number in the Hacker News thread, \( n = 1024 \). In other words, \( 2^{n-1} \leq N < 2^n \).

Trial Division

The simplest algorithm for testing primality is to try dividing \( N \) by every possible divisor up to \( \sqrt{N} \). This method will run in time \( O( p(n)2^{\frac{n}{2}} ) \), where the polynomial factor \( p(n) \) depends on factors like how division is implemented, and so on. We could also try to be a bit smarter and only try dividing by primes, but there are \( \Theta\left(\frac{\sqrt{N}}{\log \sqrt{N}}\right) \) primes up to \( \sqrt{N} \), so this is only going to improve the runtime by a factor of \( O(\log \sqrt{N}) = O(n) \) at most.

PRIMES is in P

In 2002, there was a (theoretical) breakthrough in primality testing where it was discovered that there is a deterministic algorithm for primality testing that runs in \( O(n^{12+\epsilon}) \) time for any \( \epsilon > 0 \), which is now known as the AKS primality test. Over the next few years, the runtime was improved to \( O(n^{6+\epsilon}) \), which will be our baseline.

Probable Primes

One well-known alternative to AKS is the Miller-Rabin primality test. One iteration of Miller-Rabin consists of choosing a random number between \( 1 \) and \( N-1 \) and then doing a test with that number. If \( N \) is actually prime, then the test will always pass. If \( N \) is composite, then at least three quarters of the numbers between \( 0 \) and \( N-1 \) will cause the test to fail.

Generally Miller-Rabin is done with multiple iterations. If any single iteration fails, then \( N \) is certainly composite. However, if every iteration succeeds, then we still don't have a proof that \( N \) is prime. If we had an ideal random number source such that each of \( k \) iterations chose uniformly and independently, then we would have a probability of at most \( \frac{1}{4^k} \) that \( N \) is not prime.

Each iteration of Miller-Rabin takes \( O(n^{2+\epsilon}) \) time with an efficient multiplication algorithm, so overall it takes \( O(k \cdot n^{2+\epsilon}) \) time to verify a probable prime with at most \( \frac{1}{4^k} \) false positive rate. This fast runtime explains why Miller-Rabin is used so often in practice, but sometimes you would like more certainty that your number really is prime.

Provable Primes

If there's one thing I would like someone reading this post to take away, it's that you can verify the primality of a number faster than running AKS. The idea is that instead of working from just the number itself, we'll work from both a number and a certificate that gives us a hint about how to prove that number is prime. Once we have such a certificate, we'll be able to prove with certainty that our number is prime.

The first type of primality certificate was introduced in 1975 and is known as a Pratt certificate. A Pratt certificate for \( N \) consists of:

  • A number \( a \) such that
    • \( a^{N-1} \equiv 1 \pmod{N} \)
    • For any prime factor \( p \) of \( N-1 \), \( a^{\frac{N-1}{p}} \not\equiv 1 \pmod{N} \).
  • The prime factorization of \( N-1 \).
  • A Pratt certificate for each prime divisor of \( N-1 \) larger than some threshold.

The existence of such a certificate proves that \( N \) is prime, because the constraints on \( a \) ensure that it has order \( N-1 \), and if \( N \) is composite, then no such \( a \) exists (for example, if \( N = pq \) where both \( p \) and \( q \) are prime, then the maximum order is \( (p-1)(q-1) \)). However, in order to check those constraints, we need the factorization of \( N-1 \), and we then need to verify that all of the claimed primes are actually prime, hence the recursive nature of the certificate.

The threshold chosen should be such that it is reasonable to use a different primality test for numbers below that threshold (for example, trial division). For illustrative purposes, we'll use the threshold 100. Let's look at an example for \( N = 107 \). In this case, one possible certificate would be:

  • \( a = 2 \)
  • \( 106 = 2 \cdot 53 \)

To prove that 107 is prime, we then check:

  • \( 2^{106} \equiv 1 \pmod{107} \)
  • \( 2^{\frac{106}{2}} \not\equiv 1 \pmod{107} \)
  • \( 2^{\frac{106}{53}} \not\equiv 1 \pmod{107} \)
  • \( 2 \cdot 53 = 106 \)
  • 2 is prime.
  • 53 is prime.

Once we've verified all of these facts, we have proven that 107 is prime. It can be shown that the size of a Pratt certificate is at most \( O(n^2) \) bits, and that it can be verified in \( O(n^{3 +\epsilon}) \) time. However, generating a Pratt certificate is difficult, so this method isn't quite good enough in practice.

Elliptic Curve Primality Proving

Rather than using Pratt certificates, one can use Atkin-Goldwasser-Kilian-Morain certificates. I won't go into any details of the certificates here, but the intuition is that the problem with Pratt certificates is that in order to verify the order of an element of the multiplicative group modulo \( N \), one has to know the factorization of \( N-1 \), and therefore needs to be able to verify primality of all the prime factors of \( N-1 \), which could be as large as \( \frac{N-1}{2} \).

By using elliptic curve groups rather than the multiplicative group, you can try multiple elliptic curves until you find one that happens to have favorable properties for constructing the certificate.

The upshot of all of this is that such a certificate will have \( O(n^2) \) bits and be verifiable in \( O(n^4) \) time. Furthermore, there are heuristic arguments that say that a certificate should be able to be generated efficiently for almost all primes, and that the fraction for which it fails falls toward 0 exponentially.

As a result, if you need a number that is proven to be prime, elliptic curve primality is a good choice. While there are not guarantees on how fast a certificate is found, in practice a certificate will be found in an acceptable time. Once you have a certificate, then store it alongside the prime, and anyone can verify it to prove for themselves that the number is truly prime. While the certificate generation does not have a guaranteed runtime, the verification does.

Conclusion

As demonstrated by the Hacker News thread, most people only know of a small subset of these primality testing methods, which leads them to believe many false things. For example, people who know only of Miller-Rabin might believe that there's no efficient way to prove primality. People who know about Miller-Rabin and AKS might believe that primality can be proven but is impractical because of the runtime. In fact, there are many methods that have different tradeoffs. It's important to recognize where you don't know about a solution, and not confuse that with a solution not existing.