The LLL algorithm can efficiently “reduce” a “lattice” “basis”, something that turns out to be quite useful in cryptography, finding equations that a given number is a root of, integer programming, and lots of other cool stuff.
It has a (well deserved) reputation for being rather mysterious, despite the fact that you can kind of implement it in a dozen lines of code.
But what on earth does it do and how does it work? I spent the last 3 weeks nerd-sniped by those questions and this article is my answer.
Pre-reqs: Knowing what vectors, inner products (aka dot products) of vectors and determinants are.
Introduction
Lattices
Say you take a bunch of vectors with integer elements, and add them and subtract them from each other to get all the new vectors you possibly can. What you get is called a lattice. It’s kind of like a vector space but it’s discrete – has big holes in it. Here’s an example starting with the vectors $v_1 = (2, 0)$ and $v_2 = (1, 1)$:
# Plot all integer combinations i*v1 + j*v2 that land inside the window.
def visualize_lattice(v1, v2, bounds=4, window=10):
    G = Graphics()
    for i in range(-bounds, bounds + 1):
        for j in range(-bounds, bounds + 1):
            point = i*vector(v1) + j*vector(v2)
            if abs(point[0]) <= window and abs(point[1]) <= window:
                G += point2d(point, color='blue', size=30)
    return G

v1 = [2,0]
v2 = [1,1]
L = visualize_lattice(v1, v2, bounds=5, window=10)
show(L)
Sometimes, however, if we start with weird vectors, it can be a bit harder to figure out what the lattice looks like just from doing a few additions and subtractions. Let’s try $v_3 = (6, 7)$ and $v_4 = (7, 8)$:
v3 = [6,7]
v4 = [7,8]
L = visualize_lattice(v3, v4, bounds=5, window=10)
show(L)
Looks kind of 1-dimensional. But if we do more additions and subtractions….
L = visualize_lattice(v3, v4, bounds=150, window=10)
show(L)
Huh, it’s the whole integer plane! It would have been nice to know we could get both $(1, 0)$ and $(0, 1)$ from our original vectors; that would have told us the lattice is the whole integer plane directly.
Any set of vectors that lets you generate the whole lattice is called a basis. $(6, 7)$ and $(7, 8)$ was our original basis (by definition), but it turns out $(1, 0)$ and $(0, 1)$ is also a basis for the same lattice.
Something that takes a bunch of vectors that form a basis and gives you a “nicer” basis is called a lattice basis reduction algorithm.
The LLL algorithm is a famous algorithm that does basis reduction – it gives you back a new basis that has vectors that are pretty short (the elements aren’t super big integers) and the vectors are nearly orthogonal to each other. Let’s try it on our vectors
M = matrix([v3, v4])
M.LLL()
Pretty cool!
Integer Relations
Here’s a fun example of why it might be useful to be able to shorten and orthogonalize integer vectors.
Suppose you vaguely remember that the golden ratio is about 1.618 (call that $r$) and satisfies a quadratic equation, but you don’t remember the equation itself.

You can find that equation using LLL.

Here’s the idea. Take the integer vectors $(1, 0, 0, \lfloor 1000 r^2 \rceil)$, $(0, 1, 0, \lfloor 1000 r \rceil)$ and $(0, 0, 1, 1000)$. They form a basis of some lattice. The first vector in the LLL-reduced basis will be an integer linear combination of these three, thus necessarily of the form $(a, b, c, 1000(ar^2 + br + c))$ (up to rounding). But such a vector is “short” only if $a$, $b$, $c$ are small and $ar^2 + br + c$ is even smaller. Thus the first three entries of this short vector are likely to be the coefficients of the integral quadratic polynomial which has $r$ as a root.
Let’s try it!
r = 1.618
v1 = [1,0,0, round(1000 * r * r)]
v2 = [0,1,0, round(1000 * r)]
v3 = [0,0,1, 1000]
M = matrix([v1, v2, v3])
M.LLL()
Our short vector is $(1, -1, -1, 0)$ (up to sign), which means $r$ approximately satisfies the equation $r^2 - r - 1 = 0$. Or equivalently $r^2 = r + 1$, which is indeed the equation for the golden ratio!
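As a quick sanity check (assuming the reduced basis really does hand us the coefficients $1, -1, -1$, possibly negated), we can plug them back into the polynomial:

a, b, c = 1, -1, -1          # coefficients read off the first reduced basis vector
r = 1.618
print(a*r^2 + b*r + c)       # about -0.000076: tiny, as expected from a 3-digit approximation of r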
Gram-Schmidt Orthogonalization
If we were working with real vectors, shortening is super easy – just divide the vector by its norm to get a norm-1 vector. Orthogonalizing is also quite easy: there is a known process for doing so, called the Gram-Schmidt orthogonalization (GSO) process.
Given a basis $\mathbf{b}_1, \dots, \mathbf{b}_n$, we can construct a new basis $\mathbf{b}^*_1, \dots, \mathbf{b}^*_n$ such that $\mathbf{b}^*_1 = \mathbf{b}_1$ and for $i > 1$,
$$\mathbf{b}^*_i = \mathbf{b}_i - \sum_{j < i} \frac{\langle \mathbf{b}_i, \mathbf{b}^*_j \rangle}{\langle \mathbf{b}^*_j, \mathbf{b}^*_j \rangle}\, \mathbf{b}^*_j.$$

Let us introduce the notation that (non-bolded) $b_i$ refers to the 2-norm of the vector $\mathbf{b}_i$ (and likewise $b^*_i$ for $\mathbf{b}^*_i$).

Notice that $\frac{\langle \mathbf{b}_i, \mathbf{b}^*_j \rangle}{\langle \mathbf{b}^*_j, \mathbf{b}^*_j \rangle}\, \mathbf{b}^*_j$ is the projection of $\mathbf{b}_i$ onto the $\mathbf{b}^*_j$ direction. So the new vector $\mathbf{b}^*_i$ is the projection of $\mathbf{b}_i$ onto the subspace orthogonal to $\mathrm{span}(\mathbf{b}^*_1, \dots, \mathbf{b}^*_{i-1})$.

We define $\mu_{i,j} = \frac{\langle \mathbf{b}_i, \mathbf{b}^*_j \rangle}{\langle \mathbf{b}^*_j, \mathbf{b}^*_j \rangle}$, and we can write $\mathbf{b}_i = \mathbf{b}^*_i + \sum_{j < i} \mu_{i,j}\, \mathbf{b}^*_j$.

Thus the new $\mathbf{b}^*_i$’s have the same span as the $\mathbf{b}_i$’s, but they are orthogonal. This process is very useful in many applications, but it is not enough (on its own) for our purposes, because we are working with integer vectors, and the coefficients $\mu_{i,j}$ we multiply our vectors with are not necessarily integers.

To better understand these vectors, consider the orthonormal basis obtained by normalizing the Gram-Schmidt vectors. In this basis (where $\mathbf{e}_1 = \mathbf{b}^*_1 / b^*_1$, $\mathbf{e}_2 = \mathbf{b}^*_2 / b^*_2$ and so on), the original vectors $\mathbf{b}_1, \dots, \mathbf{b}_n$ can be written as
$$\begin{aligned}
\mathbf{b}_1 &= (b^*_1,\; 0,\; 0,\; \dots,\; 0) \\
\mathbf{b}_2 &= (\mu_{2,1} b^*_1,\; b^*_2,\; 0,\; \dots,\; 0) \\
\mathbf{b}_3 &= (\mu_{3,1} b^*_1,\; \mu_{3,2} b^*_2,\; b^*_3,\; \dots,\; 0) \\
&\;\;\vdots \\
\mathbf{b}_n &= (\mu_{n,1} b^*_1,\; \mu_{n,2} b^*_2,\; \mu_{n,3} b^*_3,\; \dots,\; b^*_n)
\end{aligned}$$
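To make this concrete, here is a small from-scratch helper of my own (just a sketch; Sage’s built-in gram_schmidt method does the same job) that computes the GSO vectors and the $\mu_{i,j}$ coefficients exactly as in the formulas above:

# A from-scratch Gram-Schmidt sketch: returns the orthogonal vectors b*_i
# and the coefficients mu_{i,j} = <b_i, b*_j> / <b*_j, b*_j>.
def gso(basis):
    basis = [vector(QQ, b) for b in basis]
    bstar, mu = [], {}
    for i, b in enumerate(basis):
        proj = zero_vector(QQ, len(b))
        for j in range(i):
            mu[(i, j)] = (b * bstar[j]) / (bstar[j] * bstar[j])
            proj += mu[(i, j)] * bstar[j]
        bstar.append(b - proj)   # remove all components along earlier b*'s
    return bstar, mu

bstar, mu = gso([[6, 7], [7, 8]])
print(bstar)   # [(6, 7), (7/85, -6/85)] -- orthogonal, but no longer integer vectors
print(mu)      # {(1, 0): 98/85} -- the mu coefficients are rational, not integer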
Two Dimensions: Lagrange’s Algorithm
Let’s think about applying GSO to two integer vectors $\mathbf{b}_1, \mathbf{b}_2$. Our starting point is the Gram-Schmidt orthogonalization process, in which we set $\mathbf{b}^*_1 = \mathbf{b}_1$ and $\mathbf{b}^*_2 = \mathbf{b}_2 - \mu_{2,1}\mathbf{b}^*_1$, where $\mu_{2,1} = \frac{\langle \mathbf{b}_2, \mathbf{b}^*_1 \rangle}{\langle \mathbf{b}^*_1, \mathbf{b}^*_1 \rangle}$.
Intuitively, $\mathbf{b}^*_2$ is the shortest vector we can hope for, since we removed all of $\mathbf{b}_1$’s components from it (this fact is what makes the Gram-Schmidt basis orthogonal). However, $\mu_{2,1}$ might not be an integer, and we can only multiply our vectors by integers. To fix this issue, we transform $\mathbf{b}_1, \mathbf{b}_2$ into the following basis: $\mathbf{c}_1 = \mathbf{b}_1$ and $\mathbf{c}_2 = \mathbf{b}_2 - \lfloor\mu_{2,1}\rceil\,\mathbf{b}_1$, where $\lfloor\cdot\rceil$ means rounding to the closest integer. Now $\mathbf{c}_2$ is the shortest lattice vector we can hope for, as we removed all the integer multiples of $\mathbf{b}_1$ that we could from $\mathbf{b}_2$. Note that when projecting $\mathbf{c}_2$ onto the line generated by $\mathbf{b}_1$, this projection lies between $-\frac{b_1}{2}$ and $\frac{b_1}{2}$.
The reduced basis of $\mathbf{b}_1, \mathbf{b}_2$ is $\mathbf{c}_1, \mathbf{c}_2$. Image and discussion adapted from Vinod Vaikuntanathan’s notes.
So far we reduced $\mathbf{b}_2$, but left $\mathbf{b}_1$ as is. But what if $\mathbf{b}_1$ is very long to begin with? Then there is no guarantee that the reduced basis is short. Let’s apply rounded-GSO, which is known as size-reduction, to the vectors $\mathbf{b}_1 = (4, 4)$ and $\mathbf{b}_2 = (-3, -1)$, in that order.
# Create the plot with dots for grid points
x = [-4..4]
y = [-4..4]
dots = sum(point((i,j), size=20, color='lightgray') for i in x for j in y)
# Create the vectors
b1 = vector([4,4])
b2 = vector([-3,-1])
# Adjust these coordinates to match your desired vectors
arrow1 = arrow((0,0), b1, color='green', width=2)
arrow2 = arrow((0,0), b2, color='red', width=2)
# Add labels
l1 = text('b₁', (4.2,4.2), color='green')
l2 = text('b₂', (-3.2,-1.1), color='red')
# Combine and display
show(dots + arrow1 + arrow2 + l1 + l2, aspect_ratio=1, axes=False)
B = matrix([b1,b2])
B.gram_schmidt()[0]
b1_tilde = B.gram_schmidt()[0][0]
b2_reduced = b2 - round((b2 * b1_tilde)/(b1_tilde * b1_tilde)) * b1_tilde
b2_reduced
g1 = arrow((0,0), (4,4), color='gray', width=2)
g2 = arrow((0,0), (1,3), color='gray', width=2)
show(dots + arrow1 + arrow2 + l1 + l2 + g1 + g2, aspect_ratio=1, axes=False)
Not very impressive, neither short nor particularly orthogonal. What if we had done it the other way around, with the shorter red arrow as b1?
l1_new = text('b2', (4.2,4.2), color='green')
l2_new = text('b1', (-3.2,-1.1), color='red')
# Combine and display
show(dots + arrow1 + arrow2 + l1_new + l2_new, aspect_ratio=1, axes=False)
c1 = b2
c2 = b1
C = matrix([c1,c2])
C.gram_schmidt()[0]
c1_tilde = C.gram_schmidt()[0][0]
c2_reduced = c2 - round((c2 * c1_tilde)/(c1_tilde * c1_tilde)) * c1_tilde
c2_reduced
g3 = arrow((0,0), (-3,-1), color='gray', width=2)
g4 = arrow((0,0), (-2,2), color='gray', width=2)
show(dots + arrow1 + arrow2 + l1_new + l2_new + g3 + g4, aspect_ratio=1, axes=False)
Much nicer! We have learned something about rounded-GSO applied to integer vectors: the order of the vectors really matters, and we should start with the smallest vectors and use them to orthogonalize the others. In fact, if we start with the shorter vector, this guarantees that the resulting basis is “close” to orthogonal: the angle $\theta$ between $\mathbf{b}_1$ and the reduced $\mathbf{b}_2$ is at least 60 degrees. This is because if we calculated $\mu_{2,1}$ on the new reduced basis as $\mu_{2,1} = \frac{\langle \mathbf{b}_2, \mathbf{b}_1 \rangle}{\langle \mathbf{b}_1, \mathbf{b}_1 \rangle}$, then $|\mu_{2,1}| \le \frac{1}{2}$, because we are left with at most 1/2 in the worst case of rounding.

But $\cos\theta = \frac{\langle \mathbf{b}_2, \mathbf{b}_1 \rangle}{b_1 b_2} = \mu_{2,1}\,\frac{b_1}{b_2}$, so $b_1 \le b_2$ implies $|\cos\theta| \le \frac{1}{2}$, i.e. $60^\circ \le \theta \le 120^\circ$.
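This loop of “put the shorter vector first, then size-reduce the other one” is essentially Lagrange’s two-dimensional reduction algorithm. Here is a minimal sketch of it (my own illustration, not a library routine):

# Lagrange's (a.k.a. Gauss's) 2D reduction: keep the shorter vector first,
# size-reduce the other against it, and repeat until nothing improves.
def lagrange_reduce(b1, b2):
    b1, b2 = vector(ZZ, b1), vector(ZZ, b2)
    if b1 * b1 > b2 * b2:
        b1, b2 = b2, b1
    while True:
        mu = (b2 * b1) / (b1 * b1)
        b2 = b2 - round(mu) * b1      # size-reduce b2 against b1
        if b2 * b2 < b1 * b1:
            b1, b2 = b2, b1           # b2 became strictly shorter: swap and repeat
        else:
            return b1, b2

lagrange_reduce([4, 4], [-3, -1])     # a short, nearly orthogonal basis such as ((-2, 2), (-1, -3))

Each swap strictly shrinks the first vector, and squared norms are positive integers, so the loop has to terminate.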
We use this property to define one of the things we would like to see in a reduced basis.
A basis $\mathbf{b}_1, \dots, \mathbf{b}_n$ is size-reduced if the Gram-Schmidt orthogonalization of the basis satisfies the following condition: $|\mu_{i,j}| \le \frac{1}{2}$ for every $i > j$.
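A quick way to check this condition (reusing the gso() sketch from the Gram-Schmidt section; the helper name is mine, not standard):

def is_size_reduced(basis):
    bstar, mu = gso(basis)                  # Gram-Schmidt coefficients mu_{i,j}
    return all(abs(m) <= 1/2 for m in mu.values())

print(is_size_reduced([[6, 7], [7, 8]]))    # False: mu_{2,1} = 98/85
print(is_size_reduced([[4, 4], [-3, -1]]))  # True: mu_{2,1} = -1/2 exactly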
The other thing we would like to see is $b_1 \le b_2$, i.e. the first vector is no longer than the second.

Suppose we apply the Gram-Schmidt orthogonalization process to $\mathbf{b}_1$ and $\mathbf{b}_2$ to get $\mathbf{b}^*_1$ and $\mathbf{b}^*_2$.

Let’s rewrite $b_1 \le b_2$ in terms of the Gram-Schmidt vectors: $\mathbf{b}_1 = \mathbf{b}^*_1$ and $\mathbf{b}_2 = \mathbf{b}^*_2 + \mu_{2,1}\mathbf{b}^*_1$. We notice that
$$b_2^2 = \|\mathbf{b}^*_2 + \mu_{2,1}\mathbf{b}^*_1\|^2 = (b^*_2)^2 + \mu_{2,1}^2\,(b^*_1)^2,$$
where there are no cross terms in the square because $\mathbf{b}^*_1$ and $\mathbf{b}^*_2$ are orthogonal. So $b_1 \le b_2$ becomes
$$(b^*_2)^2 \ge (1 - \mu_{2,1}^2)\,(b^*_1)^2.$$
This equivalent size comparison in terms of GSO vectors instead of the original vectors will be useful for understanding LLL.
More Dimensions: Hermite’s Algorithm
How can we generalize Lagrange’s algorithm to multiple dimensions? We have two ideas: size reduction, and making sure that we size reduce using the vectors in the right order.
Hermite (in a letter to Jacobi) invented a generalization based on the idea of picking the smallest vector first, and then recursing in the subspace orthogonal to that vector.
- Let $\mathbf{b}_1, \dots, \mathbf{b}_n$ be a basis of a lattice $L$.
- Ensure that $\mathbf{b}_1$ has minimal norm among all the $\mathbf{b}_i$’s: otherwise, swap $\mathbf{b}_1$ with the shortest basis vector $\mathbf{b}_i$.
- Apply the algorithm recursively to the projected basis $\pi(\mathbf{b}_2), \dots, \pi(\mathbf{b}_n)$ (where $\pi$ projects onto the subspace orthogonal to $\mathbf{b}_1$), in such a way that all the basis vectors $\mathbf{b}_i$ are size-reduced with respect to $\mathbf{b}_1$: $|\mu_{i,1}| \le \frac{1}{2}$ for all $i > 1$.
- If $\mathbf{b}_1$ has minimal norm among all the $\mathbf{b}_i$’s, the algorithm terminates. Otherwise, swap $\mathbf{b}_1$ with the shortest basis vector $\mathbf{b}_i$, and restart from the beginning.
What does this look like? Suppose we have already sorted and reduced some of the vectors. Remember that (written in the normalized original GSO basis) the vectors look like
$$\begin{aligned}
\mathbf{b}_1 &= (b^*_1,\; 0,\; 0,\; \dots,\; 0) \\
\mathbf{b}_2 &= (\mu_{2,1} b^*_1,\; b^*_2,\; 0,\; \dots,\; 0) \\
&\;\;\vdots \\
\mathbf{b}_n &= (\mu_{n,1} b^*_1,\; \mu_{n,2} b^*_2,\; \dots,\; b^*_n)
\end{aligned}$$
If we look at the projection left after we drop the coordinates corresponding to the vectors we’ve already sorted (say the first $k-2$ of them), what’s left is
$$\begin{aligned}
\pi(\mathbf{b}_{k-1}) &= (b^*_{k-1},\; 0,\; 0,\; \dots) \\
\pi(\mathbf{b}_k) &= (\mu_{k,k-1} b^*_{k-1},\; b^*_k,\; 0,\; \dots) \\
\pi(\mathbf{b}_{k+1}) &= (\mu_{k+1,k-1} b^*_{k-1},\; \mu_{k+1,k} b^*_k,\; b^*_{k+1},\; \dots) \\
&\;\;\vdots
\end{aligned}$$
We need to pick the smallest of these. We could also be lazy and pick the smaller of just the first two options ($\pi(\mathbf{b}_{k-1})$ and $\pi(\mathbf{b}_k)$), since there are fewer coefficients to work with. This is still fine as long as we then backtrack and compare $\pi(\mathbf{b}_{k-2})$ to the new $\pi(\mathbf{b}_{k-1})$.

If we do this, we want to ensure $\|\pi(\mathbf{b}_{k-1})\| \le \|\pi(\mathbf{b}_k)\|$, which means (after squaring both sides)
$$(b^*_k)^2 + \mu_{k,k-1}^2\,(b^*_{k-1})^2 \ge (b^*_{k-1})^2, \qquad \text{i.e.} \qquad (b^*_k)^2 \ge (1 - \mu_{k,k-1}^2)\,(b^*_{k-1})^2,$$
exactly like in the 2D case.
Pseudocode for Hermite’s Algorithm
Note that the inner product of a vector with itself is the square of its norm: $\langle \mathbf{b}, \mathbf{b} \rangle = b^2$.
INPUT
a lattice basis b1, b2, ..., bn in Z^m
PROCEDURE
B* <- GramSchmidt({b1, ..., bn}) = {b1*, ..., bn*}; and do not normalize
μi,j <- InnerProduct(bi, bj*)/InnerProduct(bj*, bj*); using the most current values of bi and bj*
k <- 2;
while k <= n do
(SIZE REDUCTION STEP)
for j from k−1 to 1 do
if |μk,j| > 1/2 then
bk <- bk − ⌊μk,j⌉bj;
Update B* and the related μi,j's as needed.
end if
end for
(NOW CHECK SHORTNESS)
if InnerProduct(bk*, bk*) >= (1 − (μ_k,k−1)^2) InnerProduct(bk−1*, bk−1*) then
k <- k + 1;
else
Swap bk and bk−1;
Update B* and the related μi,j's as needed.
k <- max(k−1, 2);
end if
end while
return B the Hermite reduced basis of {b1, ..., bn}
OUTPUT
the reduced basis b1, b2, ..., bn in Z^m
Minimizing a Loss Function
Can we frame the idea of putting the vectors with shortest GSO-version first in a numerical way which can be optimized? One idea is to define a loss function that measures basis quality by multiplying the norms of the GSO-vectors together, but using more copies of the earlier vectors, which weights them more. Then we try to minimize this loss.
Formally, let where
Where the determinant for non full rank matrices is defined as .
We can also write
because the determinant is not changed by orthogonalizing vectors (its the product of their othogonal parts that matters).
For 3 vectors this function would be

Let’s try it on our old 2D vectors:
# We compare Phi^2 = (d_1 * d_2)^2 rather than Phi itself, to avoid square
# roots and stay in integers; this doesn't change which basis wins.
def phi2vectors(v1, v2):
    d1_sq = (matrix([v1]) * matrix([v1]).transpose()).determinant()   # d_1^2 = |v1|^2
    d2_sq = matrix([v1, v2]).determinant()^2                          # d_2^2
    return d1_sq * d2_sq

b1 = vector([4,4])
b2 = vector([-3,-1])
print(phi2vectors(b1,b2))
print(phi2vectors(b2,b1))
Clearly swapping them is good for the quality of the reduced basis.
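The same loss generalizes to any number of vectors. Here is a sketch (mine, not the algorithm’s internals; again computing $\Phi^2$ so everything stays an integer):

# Phi^2 = product over k of det(B_k B_k^T), where B_k is the first k rows.
def phi_squared(basis):
    B = matrix(ZZ, basis)
    total = 1
    for k in range(1, B.nrows() + 1):
        Bk = B.matrix_from_rows(range(k))
        total *= (Bk * Bk.transpose()).determinant()   # d_k^2, a Gram determinant
    return total

print(phi_squared([[4, 4], [-3, -1]]))   # 2048
print(phi_squared([[-3, -1], [4, 4]]))   # 640: swapping lowers the loss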
Termination of Hermite’s Algorithm
We can use this basis quality loss function to prove that Hermite’s algorithm terminates. The idea is simple. The only place the algorithm could loop forever is if we keep on incrementing and decrementing $k$ (in the pseudocode above). But we only ever decrease $k$ when we swap $\mathbf{b}_k$ and $\mathbf{b}_{k-1}$ because $\mathbf{b}_k$ was actually smaller (in the space orthogonal to the first $k-2$ vectors).

But every time we do this, $d_{k-1}$ decreases (because the last vector in that determinant became smaller), and hence $\Phi$ decreases!

So $\Phi$ starts out as some number. Every time we swap and loop again it decreases. But $\Phi^2$ is a product of determinants of integer Gram matrices, which means it’s a positive integer! A strictly decreasing sequence of positive integers can’t go on forever.
Complexity
Great. We proved the algorithm terminates. But how long does it take? Surprisingly, WE DON’T KNOW. We know it’s in polynomial time up to dimension 4, but beyond that the complexity of this algorithm is actually an open problem!
The trouble is that our termination argument said “the loss decreases every swap”… but we don’t know how much it decreases.
The LLL Algorithm
Lenstra, Lenstra and Lovasz’s great insight was (as it always is) to ask the right question. That question was: “What if we change our goal? What if we accept a worse quality output in exchange for a faster algorithm?”
Remember that the reason we wanted the GSO vectors in order of size was because reducing with a much bigger vector first gave bad results. But what if we were slightly more chill about this?
What if, instead of saying “We want the first vector smaller than the next.”, we said “We just don’t want the first vector to be too much bigger than the next.”
Let’s introduce a parameter $\delta$ (traditionally in the range $\frac{1}{4} < \delta < 1$).

Now instead of demanding $\|\pi(\mathbf{b}_{k-1})\| \le \|\pi(\mathbf{b}_k)\|$, we ask for $\delta\,\|\pi(\mathbf{b}_{k-1})\|^2 \le \|\pi(\mathbf{b}_k)\|^2$. For example, if $\delta = \frac{3}{4}$, we’d be fine if $\|\pi(\mathbf{b}_k)\|^2 = \frac{3}{4}\|\pi(\mathbf{b}_{k-1})\|^2$, so the first vector isn’t shorter, but at least its squared length is no more than 4/3 times that of the next vector.

Writing the projections in terms of the GSO vectors, exactly as before, the relaxed condition becomes
$$\delta\,(b^*_{k-1})^2 \le (b^*_k)^2 + \mu_{k,k-1}^2\,(b^*_{k-1})^2.$$

So the condition we need is
$$(b^*_k)^2 \ge (\delta - \mu_{k,k-1}^2)\,(b^*_{k-1})^2,$$
and that’s exactly what the LLL algorithm is: just Hermite’s algorithm but with a weaker condition on the GSO vector size ordering.
Pseudocode for LLL Algorithm
INPUT
a lattice basis b1, b2, ..., bn in Z^m
a parameter δ with 1/4 < δ < 1, most commonly δ = 3/4
PROCEDURE
B* <- GramSchmidt({b1, ..., bn}) = {b1*, ..., bn*}; and do not normalize
μi,j <- InnerProduct(bi, bj*)/InnerProduct(bj*, bj*); using the most current values of bi and bj*
k <- 2;
while k <= n do
for j from k−1 to 1 do
if |μk,j| > 1/2 then
bk <- bk − ⌊μk,j⌉bj;
Update B* and the related μi,j's as needed.
(The naive method is to recompute B* whenever bi changes:
B* <- GramSchmidt({b1, ..., bn}) = {b1*, ..., bn*})
end if
end for
if InnerProduct(bk*, bk*) > (δ − (μk,k−1)^2) InnerProduct(bk−1*, bk−1*) then
k <- k + 1;
else
Swap bk and bk−1;
Update B* and the related μi,j's as needed.
k <- max(k−1, 2);
end if
end while
return B the LLL reduced basis of {b1, ..., bn}
OUTPUT
the reduced basis b1, b2, ..., bn in Z^m
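Since the post opened by claiming this fits in a couple dozen lines, here is a naive sketch of the pseudocode above in Sage (my own code, deliberately slow: it recomputes the Gram-Schmidt data from scratch whenever anything changes; setting delta = 1 essentially recovers Hermite’s condition):

def lll_reduce(basis, delta=3/4):
    B = [vector(QQ, b) for b in basis]
    n = len(B)

    def gram_schmidt():
        # recompute all GSO vectors b*_i and coefficients mu_{i,j} from scratch
        bstar, mu = [], {}
        for i in range(n):
            v = B[i]
            for j in range(i):
                mu[(i, j)] = (B[i] * bstar[j]) / (bstar[j] * bstar[j])
                v = v - mu[(i, j)] * bstar[j]
            bstar.append(v)
        return bstar, mu

    k = 1                                    # 0-indexed version of "k <- 2"
    while k < n:
        bstar, mu = gram_schmidt()
        # size reduction step
        for j in range(k - 1, -1, -1):
            if abs(mu[(k, j)]) > 1/2:
                B[k] = B[k] - round(mu[(k, j)]) * B[j]
                bstar, mu = gram_schmidt()   # naive: redo everything
        # Lovász condition
        if bstar[k] * bstar[k] >= (delta - mu[(k, k - 1)]^2) * (bstar[k - 1] * bstar[k - 1]):
            k += 1
        else:
            B[k], B[k - 1] = B[k - 1], B[k]  # swap and backtrack
            k = max(k - 1, 1)
    return matrix(ZZ, B)

print(lll_reduce([[6, 7], [7, 8]]))
print(lll_reduce([[1, 0, 0, 2618], [0, 1, 0, 1618], [0, 0, 1, 1000]]))

This is only to build intuition; in practice you’d call Sage’s built-in M.LLL(), which implements much faster variants.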
Termination and Complexity of LLL
We use exactly the same loss function argument as before. Let’s do it a bit more rigorously.
The main difference from Hermite is that we don’t swap and backtrack so often: we only swap if it’s worth it (we get a big reduction in loss); otherwise we just go to the next dimension or terminate.

The basis quality loss function is $\Phi = \prod_{k=1}^{n} d_k$, where $d_k = \prod_{i=1}^{k} b^*_i$ is the determinant of the first $k$ basis vectors.
Observation 1. $\Phi$ is not too large to begin with.

Each $d_k$ is at most the product of the vector norms $b_1 \cdots b_k$. So $\Phi \le 2^{\mathrm{poly}(n, s)}$, where $s$ is the bit length of the vectors. We also know $\Phi \ge 1$, because we’re dealing with integer lattices and each piece of the loss $d_k$ can be interpreted as the $k$-dimensional volume enclosed by the vectors $\mathbf{b}_1, \dots, \mathbf{b}_k$, whose square (a Gram determinant) is a positive integer.
Observation 2. The reduction step does not change the loss.
This is clear by looking at $d_k = \prod_{i=1}^{k} b^*_i$ and observing that the reduction step leaves the $\mathbf{b}^*_i$’s unchanged.
Observation 3. The swap step reduces $\Phi$ by a constant factor.

Proof. Let us say that $\mathbf{b}_k$ and $\mathbf{b}_{k-1}$ were swapped. This only affects the value of $d_{k-1}$, because every other $d_j$ contains either both vectors or neither, and changing the order of vectors does not affect the determinant.

The old value of $d_{k-1}$ is $d_{k-2}\, b^*_{k-1}$, while the new value is $d_{k-2}\, \|\pi(\mathbf{b}_k)\|$. We see that the component of $\mathbf{b}_k$ orthogonal to the span of $\mathbf{b}_1, \dots, \mathbf{b}_{k-2}$ is $\mathbf{b}^*_k + \mu_{k,k-1}\mathbf{b}^*_{k-1}$ (written in the old GSO basis created using $\mathbf{b}_{k-1}$ first). So,
$$\frac{d_{k-1}^{\text{new}}}{d_{k-1}^{\text{old}}} = \frac{\|\mathbf{b}^*_k + \mu_{k,k-1}\mathbf{b}^*_{k-1}\|}{b^*_{k-1}} = \frac{\sqrt{(b^*_k)^2 + \mu_{k,k-1}^2\,(b^*_{k-1})^2}}{b^*_{k-1}} < \sqrt{\delta},$$
where the last inequality holds because we only swap when the Lovász condition fails, i.e. when $(b^*_k)^2 + \mu_{k,k-1}^2\,(b^*_{k-1})^2 < \delta\,(b^*_{k-1})^2$.
So as long as $\delta < 1$ is a fixed constant, we know that the loss shrinks by a constant factor ($\sqrt{\delta}$) at every swap.

Combining the observations: $\Phi$ starts out at most $2^{\mathrm{poly}(n, s)}$, never drops below 1, and shrinks by a factor of at least $\sqrt{\delta}$ at every swap, so there can only be $\mathrm{poly}(n, s)$ swaps and the algorithm has to terminate in time $\mathrm{poly}(n, s)$.
We have built up LLL step by step, from Gram-Schmidt to Lagrange to Hermite to Lovasz’s relaxation. And now we’re done! But there’s still one final mystery waiting for us.
In theory, the downside of LLL is that (in the worst case) the basis quality can be poorer than what you get from Hermite. But in practice the basis quality is much better than the worst case bound… and no one knows why!
Maybe you’ll find out. Good luck 🙂
Credits
I extracted various ideas and text from:
An LLL Algorithm with Quadratic Complexity
Mathematics of Public Key Cryptography, Chapter 17
Why is the Lovasz condition used in the LLL Algorithm? (cryptography stackexchange)