Understanding the LLL Lattice Basis Reduction Algorithm

The LLL algorithm can efficiently “reduce” a “lattice” “basis”, something that turns out to be quite useful in cryptography, finding equations that a given number is a root of, integer programming, and lots of other cool stuff.

It has a (well deserved) reputation for being rather mysterious, despite the fact that you can kind of implement it in a dozen lines of code.

But what on earth does it do and how does it work? I spent the last 3 weeks nerd-sniped by those questions and this article is my answer.

Pre-reqs: Knowing what vectors, inner products (aka dot products) of vectors and determinants are.

lllfromvv.png

Introduction

Lattices

Say you take a bunch of vectors with integer elements, and add them and subtract them from each other to get all the new vectors you possibly can. What you get is called a lattice. It's kind of like a vector space, but it's discrete – it has big holes in it. Here's an example starting with the vectors [2,0] and [1,1]:

In [1]:
def visualize_lattice(v1, v2, bounds=4, window=10):
    G = Graphics()
    for i in range(-bounds, bounds + 1):
        for j in range(-bounds, bounds + 1):
            point = i*vector(v1) + j*vector(v2)
            if abs(point[0]) <= window and abs(point[1]) <= window:
                G += point2d(point, color='blue', size=30)
    return G

v1 = [2,0]
v2 = [1,1]
L = visualize_lattice(v1, v2, bounds=5, window=10)
show(L)

Sometimes, however, if we start with weird vectors, it can be a bit harder to figure out what the lattice looks like just from doing a few additions and subtractions. Let's try [6,7] and [7,8]:

In [2]:
v3 = [6,7]
v4 = [7,8]
L = visualize_lattice(v3, v4, bounds=5, window=10)
show(L)

Looks kind of 1-dimensional. But if we do more additions and subtractions….

In [3]:
L = visualize_lattice(v3, v4, bounds=150, window=10)
show(L)

Huh, it's the whole integer plane! It would have been nice to know that we could get both [1,0] and [0,1] from our original vectors – that would have told us directly that the lattice is the whole integer plane.

Any set of vectors that lets you generate the whole lattice is called a basis. [6,7] and [7,8] was our original basis (by definition), but it turns out [1,0] and [0,1] also form a basis for the same lattice.

Something that takes a bunch of vectors that form a basis and gives you a "nicer" basis is called a lattice basis reduction algorithm.

The LLL algorithm is a famous algorithm that does basis reduction – it gives you back a new basis whose vectors are pretty short (the elements aren't super big integers) and nearly orthogonal to each other. Let's try it on our vectors:

In [4]:
M = matrix([v3, v4])
M.LLL()
Out[4]:
[0 1]
[1 0]

Pretty cool!

Integer Relations

Here’s a fun example of why it might be useful to be able to shorten and orthogonalize integer vectors. Suppose you vaguely remember that the golden ratio is about 1.618 (call that r) and satisfies a quadratic equation but you don’t remember the equation itself. You can find that equation using LLL.

Here's the idea. Take the integer vectors [1,0,0, 1000(r^2)], [0,1,0, 1000(r)] and [0,0,1, 1000], rounding the entries involving r to integers. They form a basis of some lattice. The first vector in the LLL-reduced basis will be an integer linear combination of these three, thus necessarily of the form [a,b,c,1000(ar^{2}+br+c)]. But such a vector is "short" only if a, b, c are small and ar^{2}+br+c is even smaller. Thus the first three entries of this short vector are likely to be the coefficients of an integral quadratic polynomial which has r as a root.

Let’s try it!

In [5]:
r = 1.618
v1 = [1,0,0, round(1000 * r * r)]
v2 = [0,1,0, round(1000 * r)]
v3 = [0,0,1, 1000]
M = matrix([v1, v2, v3])
M.LLL()
Out[5]:
[ -1   1   1   0]
[ -3  16 -18  34]
[  4 -25  30  22]

Our short vector [a,b,c,1000(ar^{2}+br+c)] is [-1,1,1,0], which means r satisfies the equation -x^2 + x + 1 = 0. Or equivalently x^2 - x - 1 = 0, which is indeed the equation for the golden ratio!

Gram-Schmidt Orthogonalization

If we were working with real vectors, shortening would be super easy – just divide the vector by its norm to get a norm-1 vector. Orthogonalizing would also be quite easy: there is a known process for doing so, called the Gram-Schmidt orthogonalization (GSO) process.

Given a basis \mathbf{b}_1, \ldots, \mathbf{b}_n, we can construct a new basis \tilde{\mathbf{b}}_1, \ldots, \tilde{\mathbf{b}}_n such that \tilde{\mathbf{b}}_1 = \mathbf{b}_1 and for i > 1, \tilde{\mathbf{b}}_i = \mathbf{b}_i - \sum_{j=1}^{i-1} \frac{\langle \mathbf{b}_i, \tilde{\mathbf{b}}_j \rangle}{\langle \tilde{\mathbf{b}}_j, \tilde{\mathbf{b}}_j \rangle} \tilde{\mathbf{b}}_j.

Let us introduce the notation that b_i (non-bolded) refers to the 2-norm of the vector \mathbf{b}_i.

Notice that \frac{\langle \mathbf{b}_i, \tilde{\mathbf{b}}_j \rangle}{\langle \tilde{\mathbf{b}}_j, \tilde{\mathbf{b}}_j \rangle}\tilde{\mathbf{b}}_j = \langle \mathbf{b}_i, \tilde{\mathbf{b}}_j/\tilde{b}_j \rangle(\tilde{\mathbf{b}}_j/\tilde{b}_j), which is the projection of \mathbf{b}_i onto the \tilde{\mathbf{b}}_j direction. So the new vector \tilde{\mathbf{b}}_i is the projection of \mathbf{b}_i onto the subspace orthogonal to \tilde{\mathbf{b}}_1, \ldots, \tilde{\mathbf{b}}_{i-1}.

We define \mu_{ij} = \frac{\langle \mathbf{b}_i, \tilde{\mathbf{b}}_j \rangle}{\langle \tilde{\mathbf{b}}_j, \tilde{\mathbf{b}}_j \rangle}, and we can write \tilde{\mathbf{b}}_i = \mathbf{b}_i - \sum_{j=1}^{i-1} \mu_{ij} \tilde{\mathbf{b}}_j.

Thus the new \tilde{\mathbf{b}}_i‘s have the same span as the \mathbf{b}_i‘s, but they are orthogonal. This process is very useful in many applications, but it is not enough (on its own) for our purposes, because we are working with integer vectors, and the coefficients \mu_{ij} we multiply our vectors with are not necessarily integers.
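
To make the formulas concrete, here is a minimal sketch of the process in plain Python, using exact rational arithmetic so the \mu_{ij}'s come out as fractions rather than rounded floats (the function name gso and its return convention are mine, not from any library):

from fractions import Fraction

def gso(basis):
    """Return the Gram-Schmidt vectors b~_i and the coefficients mu[i][j]."""
    n = len(basis)
    b_tilde = []                       # orthogonalized (but not normalized) vectors
    mu = [[Fraction(0)] * n for _ in range(n)]
    for i in range(n):
        v = [Fraction(x) for x in basis[i]]
        for j in range(i):
            # mu_ij = <b_i, b~_j> / <b~_j, b~_j>
            mu[i][j] = (sum(x * y for x, y in zip(basis[i], b_tilde[j]))
                        / sum(x * x for x in b_tilde[j]))
            # subtract the projection of b_i onto the b~_j direction
            v = [x - mu[i][j] * y for x, y in zip(v, b_tilde[j])]
        b_tilde.append(v)
    return b_tilde, mu

b_tilde, mu = gso([[4, 4], [-3, -1]])   # the vectors used in the plots further down
print(b_tilde)    # b~_1 = (4, 4), b~_2 = (-1, 1), as Sage's gram_schmidt() will confirm below
print(mu[1][0])   # -1/2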

To better understand these vectors, consider the orthonormal basis \tilde{\mathbf{b}}_1/\tilde{b}_1, \ldots, \tilde{\mathbf{b}}_n/\tilde{b}_n obtained by normalizing the Gram-Schmidt vectors. In this basis (where \tilde{\mathbf{b}}_1/\tilde{b}_1 = [1,0,\dots,0]^\top and so on), the original vectors \mathbf{B} = [\mathbf{b}_1, \ldots, \mathbf{b}_n] can be written as

    \[\left(\begin{array}{cccc} \tilde{b}_1 & \mu_{21}\tilde{b}_1 & \cdots & \mu_{n1}\tilde{b}_1 \\ 0 & \tilde{b}_2 & \cdots & \mu_{n2}\tilde{b}_2 \\ \vdots & & \ddots & \vdots \\ 0& 0& & \mu_{n,n-1}\tilde{b}_{n-1}  \\ 0 & 0 & \cdots & \tilde{b_n} \end{array}\right)\]

where column i shows the coordinates of \mathbf{b}_i in this orthonormal basis.
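
As a quick sanity check of this picture, here is a small float-based sketch (names mine) that writes a few integer vectors in their normalized GSO basis; the resulting coordinate matrix should come out upper triangular, with the \tilde{b}_i's on the diagonal:

import math

def gso_coordinates(basis):
    """Matrix whose column i holds the coordinates of b_i in the
    orthonormal basis b~_1/|b~_1|, ..., b~_n/|b~_n|."""
    n = len(basis)
    b_tilde = []
    for v in basis:
        w = [float(x) for x in v]
        for u in b_tilde:
            c = sum(x * y for x, y in zip(v, u)) / sum(x * x for x in u)
            w = [a - c * b for a, b in zip(w, u)]
        b_tilde.append(w)
    q = [[x / math.sqrt(sum(y * y for y in u)) for x in u] for u in b_tilde]
    # entry (j, i) is <b_i, q_j>: (approximately) zero below the diagonal,
    # |b~_i| on the diagonal, and mu_ij * |b~_j| above it
    return [[sum(x * y for x, y in zip(basis[i], q[j])) for i in range(n)]
            for j in range(n)]

for row in gso_coordinates([[6, 7, 1], [7, 8, 0], [1, 0, 3]]):
    print([round(x, 3) for x in row])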

Two Dimensions: Lagrange’s Algorithm

Let's think about applying GSO to two integer vectors: \mathbf{B}= \left[\mathbf{b}_{1}, \mathbf{b}_{2}\right]. Our starting point is the Gram-Schmidt orthogonalization process in which we set \tilde{\mathbf{b}}_{1}:=\mathbf{b}_{1} and \tilde{\mathbf{b}}_{2}:=\mathbf{b}_{2}-\mu_{21} \tilde{\mathbf{b}}_{1}, where \mu_{21}=\left\langle\mathbf{b}_{2}, \tilde{\mathbf{b}}_{1}\right\rangle /\left\langle\tilde{\mathbf{b}}_{1}, \tilde{\mathbf{b}}_{1}\right\rangle.

Intuitively, \tilde{\mathbf{b}}_{2} is the shortest vector we can hope for, since we removed all of \mathbf{b}_{2}'s component in the \tilde{\mathbf{b}}_{1} direction (this is what makes the Gram-Schmidt vectors orthogonal). However, \mu_{21} might not be an integer, and we can only multiply our vectors by integers. To fix this issue, we transform \mathbf{B} into the following basis: \mathbf{b}_{1}^{\prime}=\mathbf{b}_{1} and \mathbf{b}_{2}^{\prime}=\mathbf{b}_{2}-\left\lfloor\mu_{21}\right\rceil \mathbf{b}_{1}^{\prime}, where \lfloor\cdot\rceil means rounding to the closest integer. Now \mathbf{b}_{2}^{\prime} is the shortest lattice vector we can hope for, as we removed as much of the \mathbf{b}_{1}^{\prime} component as an integer multiple allows. Note that when we project \mathbf{b}_{2}^{\prime} onto the line generated by \mathbf{b}_{1}^{\prime}, the projection lies between -\mathbf{b}_{1}^{\prime} / 2 and \mathbf{b}_{1}^{\prime} / 2.

lll_size_reduction.jpg

The reduced basis of \left[\mathbf{b}_{1}=(2,0), \mathbf{b}_{2}=(3,2)\right] is \left[\mathbf{b}_{1}^{\prime}=(2,0), \mathbf{b}_{2}^{\prime}=(1,2)\right]. Image and discussion adapted from Vinod Vaikuntanathan’s Notes

So far we reduced \mathbf{b}_{2}, but left \mathbf{b}_{1} as is. But what if \mathbf{b}_{1} is very long to begin with? There is no guarantee that the reduced basis is short. Let’s apply rounded-GSO, which is known as size-reduction, to the vectors [4,4] and [-3,-1] in that order.

In [6]:
# Create the plot with dots for grid points
x = [-4..4]
y = [-4..4]
dots = sum(point((i,j), size=20, color='lightgray') for i in x for j in y)

# Create the vectors
b1 = vector([4,4])
b2 = vector([-3,-1])
# Adjust these coordinates to match your desired vectors
arrow1 = arrow((0,0), b1, color='green', width=2)
arrow2 = arrow((0,0), b2, color='red', width=2)

# Add labels
l1 = text('b₁', (4.2,4.2), color='green')
l2 = text('b₂', (-3.2,-1.1), color='red')

# Combine and display
show(dots + arrow1 + arrow2 + l1 + l2, aspect_ratio=1, axes=False)
In [7]:
B = matrix([b1,b2])
B.gram_schmidt()[0]
Out[7]:
[ 4  4]
[-1  1]
In [8]:
b1_tilde = B.gram_schmidt()[0][0]
b2_reduced = b2 - round((b2 * b1_tilde)/(b1_tilde * b1_tilde)) * b1_tilde
b2_reduced
Out[8]:
(1, 3)
In [9]:
g1 = arrow((0,0), (4,4), color='gray', width=2)
g2 = arrow((0,0), (1,3), color='gray', width=2)

show(dots + arrow1 + arrow2 + l1 + l2 + g1 + g2, aspect_ratio=1, axes=False)

Not very impressive, neither short nor particularly orthogonal. What if we had done it the other way around, with the shorter red arrow as b1?

In [10]:
l1_new = text('b2', (4.2,4.2), color='green')
l2_new = text('b1', (-3.2,-1.1), color='red')

# Combine and display
show(dots + arrow1 + arrow2 + l1_new + l2_new, aspect_ratio=1, axes=False)
In [11]:
c1 = b2
c2 = b1
C = matrix([c1,c2])
C.gram_schmidt()[0]
Out[11]:
[  -3   -1]
[-4/5 12/5]
In [12]:
c1_tilde = C.gram_schmidt()[0][0]
c2_reduced = c2 - round((c2 * c1_tilde)/(c1_tilde * c1_tilde)) * c1_tilde
c2_reduced
Out[12]:
(-2, 2)
In [13]:
g3 = arrow((0,0), (-3,-1), color='gray', width=2)
g4 = arrow((0,0), (-2,2), color='gray', width=2)

show(dots + arrow1 + arrow2 + l1_new + l2_new + g3 + g4, aspect_ratio=1, axes=False)

Much nicer! We have learned something about rounded-GSO applied to integer vectors: the order of the vectors really matters – we should start with the smallest vectors and use them to orthogonalize the others. In fact, if we start with the shorter vector, this keeps the resulting basis "close" to orthogonal – the angle between \mathbf{b}_{1}^{\prime} and \mathbf{b}_{2}^{\prime} is at least 60 degrees. This is because if we recompute \mu_{21} on the reduced basis as \left\langle\mathbf{b}_{2}^\prime, \mathbf{b}_{1}^\prime\right\rangle /\left\langle\mathbf{b}_{1}^\prime, \mathbf{b}_{1}^\prime\right\rangle, then \left|\mu_{21}\right| \leq 1 / 2, because rounding leaves behind at most 1/2 in the worst case.

But \left\langle\mathbf{b}_{2}^\prime, \mathbf{b}_{1}^\prime\right\rangle = |\mathbf{b}_{2}^\prime| |\mathbf{b}_{1}^\prime| \cos(\theta), so \mu_{21} = |\mathbf{b}_{2}^\prime|  \cos(\theta) /|\mathbf{b}_{1}^\prime|. As long as |\mathbf{b}_{2}^\prime| \geq |\mathbf{b}_{1}^\prime| (the other condition we will ask for in a moment), \left|\mu_{21}\right| \leq 1/2 implies |\cos(\theta)| \leq 1/2.
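
We can check this on the reduced basis from the second run above (the gray arrows), with a quick float computation:

import math

u = [-3, -1]   # the shorter vector we reduced with
v = [-2, 2]    # the size-reduced second vector
cos_theta = (sum(x * y for x, y in zip(u, v))
             / (math.hypot(*u) * math.hypot(*v)))
print(math.degrees(math.acos(cos_theta)))   # about 63.4 degrees, comfortably more than 60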

We use this property to define one of the things we would like to see in a reduced basis.

A basis is size-reduced if the Gram-Schmidt orthogonalization of the basis satisfies the following condition: \left|\mu_{i j}\right| \leq 1 / 2 for every i>j.

The other thing we would like to see is \left\|\mathbf{b}_{2}\right\| \geq\left\|\mathbf{b}_{1}\right\|.

Suppose we apply the Gram-Schmidt orthogonalization process to \mathbf{b}_{1} and \mathbf{b}_{2} to get \tilde{\mathbf{b}}_{1} = \mathbf{b}_{1} and \tilde{\mathbf{b}}_{2} = \mathbf{b}_{2} - \mu_{21} \tilde{\mathbf{b}}_{1}.

Let’s rewrite \left\|\mathbf{b}_{2}\right\| \geq\left\|\mathbf{b}_{1}\right\| in terms of the Gram-Schmidt vectors: \mathbf{b}_{1} = \tilde{\mathbf{b}}_{1} and \mathbf{b}_{2} = \tilde{\mathbf{b}}_{2} + \mu_{21} \tilde{\mathbf{b}}_{1}

We notice that

    \[\left\|\mathbf{b}_{2}\right\| \geq\left\|\mathbf{b}_{1}\right\| \iff \left\|\tilde{\mathbf{b}}_{2}+ \mu_{21} \tilde{\mathbf{b}}_{1}\right\|^{2} \geq \left\|\tilde{\mathbf{b}}_{1}\right\|^{2}\]

    \[\iff \left\|\tilde{\mathbf{b}}_{2}\right\|^{2} + \mu_{21}^{2} \left\|\tilde{\mathbf{b}}_{1}\right\|^{2} \geq \left\|\tilde{\mathbf{b}}_{1}\right\|^{2} \iff \left\|\tilde{\mathbf{b}}_{2}\right\|^{2} \geq \left(1-\mu_{21}^{2}\right)\left\|\tilde{\mathbf{b}}_{1}\right\|^{2}\]

where there are no cross terms in the square because \tilde{\mathbf{b}}_{1} and \tilde{\mathbf{b}}_{2} are orthogonal.

This equivalent size comparison in terms of GSO vectors instead of the original vectors will be useful for understanding LLL.
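
Putting the two conditions together gives the classical two-dimensional reduction procedure (usually attributed to Lagrange, sometimes to Gauss): size-reduce \mathbf{b}_{2} against \mathbf{b}_{1}, and if \mathbf{b}_{2} ended up shorter than \mathbf{b}_{1}, swap them and repeat. Here is a small sketch in plain Python (the function name is mine; floats are fine for small inputs, exact fractions would be safer for huge ones):

def lagrange_reduce(b1, b2):
    """2D lattice reduction: size-reduce and swap until |b2| >= |b1|."""
    b1, b2 = list(b1), list(b2)
    while True:
        # size reduction: subtract the nearest-integer multiple of b1 from b2
        mu = sum(x * y for x, y in zip(b2, b1)) / sum(x * x for x in b1)
        b2 = [x - round(mu) * y for x, y in zip(b2, b1)]
        # if b2 is still at least as long as b1, we are done; otherwise swap and repeat
        if sum(x * x for x in b2) >= sum(x * x for x in b1):
            return b1, b2
        b1, b2 = b2, b1

print(lagrange_reduce([4, 4], [-3, -1]))   # should give ([-2, 2], [-3, -1])

Note that this is the same pair of vectors as the gray arrows in the second plot above, just ordered shortest-first.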

More Dimensions: Hermite’s Algorithm

How can we generalize Lagrange’s algorithm to multiple dimensions? We have two ideas: size reduction, and making sure that we size reduce using the vectors in the right order.

Hermite (in a letter to Jacobi) invented a generalization based on the idea of picking the smallest vector first, and then recursing in the subspace orthogonal to that vector.

  • Let \left(\mathbf{b}_{1}, \ldots, \mathbf{b}_{d}\right) be a basis of a lattice L.
  • Ensure that \mathbf{b}_{1} has minimal norm among all \mathbf{b}_{i} ‘s: otherwise, swap \mathbf{b}_{1} with the shortest basis vector \mathbf{b}_{i}.
  • Apply recursively the algorithm to the projected basis \left(\mathbf{b}_{2}^{\prime}, \ldots, \mathbf{b}_{d}^{\prime}\right), in such a way that all the basis vectors \mathbf{b}_{2}, \ldots, \mathbf{b}_{d} are size-reduced with respect to \mathbf{b}_{1} : \left|\mu_{i, 1}\right| \leq 1 / 2 for all i \geq 2.
  • If \mathbf{b}_{1} has minimal norm among all \mathbf{b}_{i} ‘s, the algorithm terminates. Otherwise, swap \mathbf{b}_{1} with the shortest basis vector \mathbf{b}_{i}, and restart from the beginning.

What does this look like? Suppose we have reduced i-1 vectors. Remember that (written in the normalized original GSO basis) the vectors look like

    \[\left(\begin{array}{cccc} \tilde{b}_1 & \mu_{21}\tilde{b}_1 & \cdots & \mu_{n1}\tilde{b}_1 \\ 0 & \tilde{b}_2 & \cdots & \mu_{n2}\tilde{b}_2 \\ \vdots & & \ddots & \vdots \\ 0& 0& & \mu_{n,n-1}\tilde{b}_{n-1}  \\ 0 & 0 & \cdots & \tilde{b_n} \end{array}\right)\]

If we drop the coordinates corresponding to the i-1 vectors we've already dealt with (i.e., project onto the orthogonal complement of their span), what's left is

    \[\left(\begin{array}{cccc} \tilde{b}_i & \mu_{i+1,i}\tilde{b}_i & \cdots & \mu_{n,i}\tilde{b}_i \\ 0 & \tilde{b}_{i+1} & \cdots & \mu_{n,i+1}\tilde{b}_{i+1}\\ \vdots & & \ddots & \vdots \\ 0& 0& & \mu_{n,n-1}\tilde{b}_{n-1}  \\ 0 & 0 & \cdots & \tilde{b}_n \end{array}\right)\]

We need to pick the smallest of these. We could also be lazy and just pick the smaller of the first two options (\mathbf{b}'_i and \mathbf{b}'_{i+1}), since there are fewer coefficients to work with. This is still fine as long as we then backtrack and compare \mathbf{b}_{i-1} to the new \mathbf{b}_i.

If we do this, we want to ensure b'_{i+1}  \geq b'_i, which means (after squaring both sides)

    \[\left\|\mathbf{b}_{i+1}'\right\|^2 \geq\left\|\mathbf{b}_{i}'\right\|^2 \iff \left\|\tilde{\mathbf{b}}_{i+1}+ \mu_{i+1,i} \tilde{\mathbf{b}}_{i}\right\|^{2} \geq \left\|\tilde{\mathbf{b}}_{i}\right\|^{2}\]

    \[\iff \left\|\tilde{\mathbf{b}}_{i+1}\right\|^{2} + \mu_{i+1,i}^{2} \left\|\tilde{\mathbf{b}}_{i}\right\|^{2} \geq \left\|\tilde{\mathbf{b}}_{i}\right\|^{2} \iff \left\|\tilde{\mathbf{b}}_{i+1}\right\|^{2} \geq \left(1-\mu_{i+1,i}^{2}\right)\left\|\tilde{\mathbf{b}}_{i}\right\|^{2}\]

exactly like in the 2D case.

Pseudocode for Hermite’s Algorithm

Note that the inner product of a vector with itself defines the square of its norm: \langle \mathbf{v}, \mathbf{v} \rangle = \|\mathbf{v}\|^2.

INPUT
    a lattice basis b1, b2, ..., bn in Z^m
PROCEDURE
    B* <- GramSchmidt({b1, ..., bn}) = {b1*, ..., bn*};  and do not normalize
    μi,j <- InnerProduct(bi, bj*)/InnerProduct(bj*, bj*);   using the most current values of bi and bj*
    k <- 2;
    while k <= n do 
        (SIZE REDUCTION STEP)
        for j from k−1 to 1 do
            if |μk,j| > 1/2 then
                bk <- bk − ⌊μk,j⌉bj;
               Update B* and the related μi,j's as needed.
            end if
        end for
        (NOW CHECK SHORTNESS)
        if InnerProduct(bk*, bk*) >= (1 − (μ_k,k−1)^2) InnerProduct(bk−1*, bk−1*) then
            k <- k + 1;
        else
            Swap bk and  bk−1;
            Update B* and the related μi,j's as needed.
            k <- max(k−1, 2);
        end if
    end while
    return B the Hermite reduced basis of {b1, ..., bn}
OUTPUT
    the reduced basis b1, b2, ..., bn in Z^m

Minimizing a Loss Function

Can we frame the idea of putting the vectors with the shortest GSO versions first as a numerical quantity that can be optimized? One idea is to define a loss function that measures basis quality by multiplying the norms of the GSO vectors together, but using more copies of the earlier vectors, which weights them more. Then we try to minimize this loss.

Formally, let \phi(\mathbf{B})=\prod_{i} \phi_{i}(\mathbf{B}) where

    \[\phi_{i}(\mathbf{B})=\left|\operatorname{det}\left(\mathcal{L}\left(\mathbf{b}_{1}, \mathbf{b}_{2} \cdots \mathbf{b}_{i}\right)\right)\right|\]

where, for a matrix that isn't square (we only take the first i vectors), the determinant is defined as \operatorname{det}(\mathbf{B})=\operatorname{det}\left(\sqrt{\mathbf{B}^{\top} \mathbf{B}}\right).

We can also write

    \[\phi_{i}(\mathbf{B})=\left\|\tilde{\mathbf{b}}_{1}\right\| \cdot\left\|\tilde{\mathbf{b}}_{2}\right\| \cdots\left\|\tilde{\mathbf{b}}_{i}\right\|,\]

because the determinant is not changed by orthogonalizing the vectors (it's the product of their orthogonal parts that matters).

For 3 vectors \mathbf{b}_1,\mathbf{b}_2,\mathbf{b}_3 this function would be

    \[\sqrt{\operatorname{det}\left(\mathbf{b}_1^\top \mathbf{b}_1\right) \times \operatorname{det}\left([\mathbf{b}_1,\mathbf{b}_2]^\top [\mathbf{b}_1,\mathbf{b}_2]\right) \times \operatorname{det}\left([\mathbf{b}_1,\mathbf{b}_2,\mathbf{b}_3]^\top [\mathbf{b}_1,\mathbf{b}_2,\mathbf{b}_3]\right)}\]

which weights the vectors \tilde{\mathbf{b}_1},\tilde{\mathbf{b}_2},\tilde{\mathbf{b}_3} created by orthogonalizing in that order with multiplicative weights 3:2:1.

Let’s try it on our old 2D vectors:

In [14]:
def phi2vectors(v1,v2):
    phi1 = (matrix([v1]) * matrix([v1]).transpose()).determinant()
    phi2 = matrix([v1,v2]).determinant()
    return (phi1 * phi2).abs()

b1 = vector([4,4])
b2 = vector([-3,-1])
print(phi2vectors(b1,b2))
print(phi2vectors(b2,b1))
256
80

Clearly swapping them is good for the quality of the reduced basis.
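
The same comparison works in any number of dimensions. Here is a sketch (names mine) that computes \phi(\mathbf{B})^2 for a list of integer vectors, using the fact that \operatorname{det}(\mathcal{L}(\mathbf{b}_1 \cdots \mathbf{b}_i))^2 is just the running product of the squared GSO norms. The absolute numbers come out differently from the quick 2D check above (this one computes the square of \phi), but the conclusion about which ordering is better is the same:

from fractions import Fraction

def phi_squared(basis):
    """phi(B)^2: the product over i of det(L(b_1..b_i))^2, via squared GSO norms."""
    b_tilde, sq_norms = [], []
    for v in basis:
        w = [Fraction(x) for x in v]
        for u in b_tilde:
            c = sum(x * y for x, y in zip(w, u)) / sum(x * x for x in u)
            w = [a - c * b for a, b in zip(w, u)]
        b_tilde.append(w)
        sq_norms.append(sum(x * x for x in w))
    total, running = Fraction(1), Fraction(1)
    for s in sq_norms:
        running *= s      # running = det(L(b_1..b_i))^2 = |b~_1|^2 ... |b~_i|^2
        total *= running  # multiply in phi_i^2
    return total

print(phi_squared([[4, 4], [-3, -1]]))   # 2048
print(phi_squared([[-3, -1], [4, 4]]))   # 640 -- smaller, so this ordering is better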

Termination of Hermite’s Algorithm

We can use this basis quality loss function to prove that Hermite’s algorithm terminates. The idea is simple. The only place the algorithm could loop forever is if we keep on incrementing and decrementing k (in the pseudocode above). But we only ever decrease k when we swap \mathbf{b}_k and \mathbf{b}_{k-1} because \mathbf{b}_k was actually smaller (in the space orthogonal to the first k-1 vectors). But every time we do this, \phi_{k}(\mathbf{B})=\left\|\tilde{\mathbf{b}}_{1}\right\| \cdot\left\|\tilde{\mathbf{b}}_{2}\right\| \cdots\left\|\tilde{\mathbf{b}}_{k}\right\| decreases (because the last vector became smaller)!

So \phi(\mathbf{B}) starts out as some number. Every time we swap and loop again it decreases. But \phi(\mathbf{B})^2 is a product of determinants of integer Gram matrices (the \mathbf{B}^{\top}\mathbf{B}'s from the definition above), which means \phi(\mathbf{B})^2 is a positive integer! A strictly decreasing sequence of positive integers can't go on forever, so the algorithm terminates.

Complexity

Great. We proved the algorithm terminates. But how long does it take? Surprisingly, WE DON'T KNOW. We know it's in polynomial time up to dimension 4, but beyond that the complexity of this algorithm is actually an open problem!

The trouble is that our termination argument said “the loss decreases every swap”… but we don’t know how much it decreases.

The LLL Algorithm

Lenstra, Lenstra and Lovasz’s great insight was (as it always is) to ask the right question. That question was: “What if we change our goal? What if we accept a worse quality output in exchange for a faster algorithm?”

Remember that the reason we wanted the GSO vectors in order of size was because reducing with a much bigger vector first gave bad results. But what if we were slightly more chill about this?

What if, instead of saying “We want the first vector smaller than the next.”, we said “We just don’t want the first vector to be too much bigger than the next.”

Let's introduce a parameter \delta (traditionally in the range (1/4, 1)).

Now instead of demanding b'_{i+1} \geq b'_i, we only ask for b'_{i+1} \geq \sqrt{\delta}\, b'_i. For example, if \delta = 3/4 then \sqrt{\delta} \approx 0.87, so we'd be fine with b'_{i+1} = 0.87\, b'_i: the first vector isn't shorter, but at least it's no more than about 1.15 times the size of the next vector.

After squaring both sides

    \[\left\|\mathbf{b}_{i+1}'\right\|^2 \geq \delta \left\|\mathbf{b}_{i}'\right\|^2 \iff \left\|\tilde{\mathbf{b}}_{i+1}+ \mu_{i+1,i} \tilde{\mathbf{b}}_{i}\right\|^{2} \geq \delta \left\|\tilde{\mathbf{b}}_{i}\right\|^{2}\]

    \[\iff \left\|\tilde{\mathbf{b}}_{i+1}\right\|^{2} + \mu_{i+1,i}^{2} \left\|\tilde{\mathbf{b}}_{i}\right\|^{2} \geq \delta \left\|\tilde{\mathbf{b}}_{i}\right\|^{2} \iff \left\|\tilde{\mathbf{b}}_{i+1}\right\|^{2} \geq \left(\delta -\mu_{i+1,i}^{2}\right)\left\|\tilde{\mathbf{b}}_{i}\right\|^{2}\]

So the condition we need is \left\|\tilde{\mathbf{b}}_{i+1}\right\|^{2} \geq \left(\delta -\mu_{i+1,i}^{2}\right)\left\|\tilde{\mathbf{b}}_{i}\right\|^{2} … and that's exactly what the LLL algorithm is, just Hermite's algorithm but with a weaker condition on the GSO vector size ordering.

Pseudocode for LLL Algorithm

INPUT
    a lattice basis b1, b2, ..., bn in Z^m
    a parameter δ with 1/4 < δ < 1, most commonly δ = 3/4
PROCEDURE
    B* <- GramSchmidt({b1, ..., bn}) = {b1*, ..., bn*};  and do not normalize
    μi,j <- InnerProduct(bi, bj*)/InnerProduct(bj*, bj*);   using the most current values of bi and bj*
    k <- 2;
    while k <= n do
        for j from k−1 to 1 do
            if |μk,j| > 1/2 then
                bk <- bk − ⌊μk,j⌉bj;
               Update B* and the related μi,j's as needed.
               (The naive method is to recompute B* whenever bi changes:
                B* <- GramSchmidt({b1, ..., bn}) = {b1*, ..., bn*})
            end if
        end for
        if InnerProduct(bk*, bk*) > (δ − (μ_k,k−1)^2) InnerProduct(bk−1*, bk−1*) then
            k <- k + 1;
        else
            Swap bk and  bk−1;
            Update B* and the related μi,j's as needed.
            k <- max(k−1, 2);
        end if
    end while
    return B the LLL reduced basis of {b1, ..., bn}
OUTPUT
    the reduced basis b1, b2, ..., bn in Z^m
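
To make the pseudocode concrete, here is a naive but runnable sketch in plain Python (names are mine, not from any library). It recomputes the Gram-Schmidt data from scratch whenever the basis changes, exactly as the parenthetical naive method above suggests, and uses exact Fractions for the μ's. Setting delta = 1 essentially recovers Hermite's algorithm from the earlier pseudocode (the only difference there is the non-strict inequality).

from fractions import Fraction

def gram_schmidt_frac(b):
    """Unnormalized GSO vectors b*_i and coefficients mu[i][j] (exact Fractions)."""
    b_star, mu = [], [[Fraction(0)] * len(b) for _ in b]
    for i, v in enumerate(b):
        w = list(v)
        for j in range(i):
            mu[i][j] = (sum(x * y for x, y in zip(v, b_star[j]))
                        / sum(x * x for x in b_star[j]))
            w = [x - mu[i][j] * y for x, y in zip(w, b_star[j])]
        b_star.append(w)
    return b_star, mu

def lll_reduce(basis, delta=Fraction(3, 4)):
    b = [[Fraction(x) for x in v] for v in basis]
    n = len(b)
    k = 1                                # 0-indexed, so this is the pseudocode's k = 2
    while k < n:
        b_star, mu = gram_schmidt_frac(b)
        # SIZE REDUCTION STEP
        for j in range(k - 1, -1, -1):
            if abs(mu[k][j]) > Fraction(1, 2):
                b[k] = [x - round(mu[k][j]) * y for x, y in zip(b[k], b[j])]
                b_star, mu = gram_schmidt_frac(b)      # naive update
        # LOVASZ CONDITION: |b*_k|^2 > (delta - mu_{k,k-1}^2) |b*_{k-1}|^2
        if (sum(x * x for x in b_star[k])
                > (delta - mu[k][k - 1] ** 2) * sum(x * x for x in b_star[k - 1])):
            k += 1
        else:
            b[k], b[k - 1] = b[k - 1], b[k]
            k = max(k - 1, 1)
    return [[int(x) for x in v] for v in b]

print(lll_reduce([[6, 7], [7, 8]]))      # should give [[0, 1], [1, 0]], like Sage's M.LLL() above
print(lll_reduce([[1, 0, 0, 2618],       # the golden-ratio lattice from earlier;
                  [0, 1, 0, 1618],       # the first output vector should again encode
                  [0, 0, 1, 1000]]))     # something like -x^2 + x + 1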

Termination and Complexity of LLL

We use exactly the same loss function argument as before. Let’s do it a bit more rigorously.

The main difference from Hermite is that we don't swap and backtrack as often: we only swap when it's worth it – when we get a big reduction in loss – otherwise we just move on to the next index or terminate.

The basis quality loss function is \phi(\mathbf{B})=\prod_{i} \phi_{i}(\mathbf{B}) where

    \[\phi_{i}(\mathbf{B})=\left|\operatorname{det}\left(\mathcal{L}\left(\mathbf{b}_{1}, \mathbf{b}_{2} \cdots \mathbf{b}_{i}\right)\right)\right| = \left\|\tilde{\mathbf{b}}_{1}\right\| \cdot\left\|\tilde{\mathbf{b}}_{2}\right\| \cdots\left\|\tilde{\mathbf{b}}_{i}\right\|\]

Observation 1. \phi(\mathbf{B}) is not too large to begin with

    \[\phi\left(\mathbf{B}_{\text {init }}\right) \leq \prod_{i=1}^{n}\left\|\mathbf{b}_{1}\right\| \cdot\left\|\mathbf{b}_{2}\right\| \cdots\left\|\mathbf{b}_{i}\right\| \leq \max _{i}\left(\left\|\mathbf{b}_{i}\right\|\right)^{O\left(n^{2}\right)}\]

So, \log \left(\phi\left(\mathbf{B}_{\text {init }}\right)\right)=\operatorname{poly}(n, W) where W is the bit length of the vectors. We also know \phi(\mathbf{B}) \geq 1, because each piece of the loss \phi_{i} can be interpreted as the i-dimensional volume enclosed by the vectors \mathbf{b}_{1}, \mathbf{b}_{2} \cdots \mathbf{b}_{i}, and since we're dealing with integer lattices its square (an integer Gram determinant) is a positive integer.

Observation 2. The reduction step does not change the loss. This is clear by looking at \phi_{i}(\mathbf{B})=\left\|\tilde{\mathbf{b}}_{1}\right\|\left\|\tilde{\mathbf{b}}_{2}\right\| \cdots\left\|\tilde{\mathbf{b}}_{i}\right\| and observing that the reduction step leaves \tilde{\mathbf{b}}_{i} ‘s unchanged.

Observation 3. The swap step reduces \phi by a constant factor. Proof. Let us say that \mathbf{b}_{i} and \mathbf{b}_{i+1} were swapped. This only affects the value of \phi_{i}(\mathbf{B}): for j<i the first j vectors are untouched, and for j>i the first j vectors are the same set in a different order, and changing the order of vectors does not affect the determinant.

The old value of \phi_{i} is \phi_{i}^{\text {old }}=\left\|\tilde{\mathbf{b}}_{1}\right\| \cdot\left\|\tilde{\mathbf{b}}_{2}\right\| \cdots\left\|\tilde{\mathbf{b}}_{i}\right\|. After the swap, the vector in position i is \mathbf{b}_{i+1}, and its component orthogonal to the span of \mathbf{b}_{1}, \mathbf{b}_{2} \cdots \mathbf{b}_{i-1} is \tilde{\mathbf{b}}_{i+1}+\mu_{i+1, i} \tilde{\mathbf{b}}_{i} (written in the old GSO basis, the one created using \mathbf{b}_{i} first). So the new value is \phi_{i}^{\text {new }}=\left\|\tilde{\mathbf{b}}_{1}\right\| \cdot\left\|\tilde{\mathbf{b}}_{2}\right\| \cdots\left\|\tilde{\mathbf{b}}_{i-1}\right\| \cdot\left\|\tilde{\mathbf{b}}_{i+1}+\mu_{i+1, i} \tilde{\mathbf{b}}_{i}\right\|, and

    \[\frac{\phi^{\text {new }}}{\phi^{\text {old }}}=\frac{\phi_{i}^{\text {new }}}{\phi_{i}^{\text {old }}}=\frac{\left\|\tilde{\mathbf{b}}_{i+1}+\mu_{i+1, i} \tilde{\mathbf{b}}_{i}\right\|}{\left\|\tilde{\mathbf{b}}_{i}\right\|}<\sqrt{\delta},\]

because \left\|\tilde{\mathbf{b}}_{i+1}+ \mu_{i+1,i} \tilde{\mathbf{b}}_{i}\right\|^{2} < \delta \left\|\tilde{\mathbf{b}}_{i}\right\|^{2} is our condition for swapping.

So as long as \delta<1 is a fixed constant, we know that the loss decreases by a constant factor at every swap.

Combining the observations, we see that the algorithm has to terminate in time \operatorname{poly}(n, W).

Conclusion

We have built up LLL step by step, from Gram-Schmidt to Lagrange to Hermite to Lovasz’s relaxation. And now we’re done! But there’s still one final mystery waiting for us.

In theory, the downside of LLL is that (in the worst case) the basis quality can be poorer than what you get from Hermite. But in practice the basis quality is much better than the worst case bound… and no one knows why!

Maybe you’ll find out. Good luck 🙂

Credits

I extracted various ideas and text from:

An LLL Algorithm with Quadratic Complexity

Mathematics of Public Key Cryptography Chapter 17

Oded Regev’s Notes

Vinod Vaikuntanathan’s Notes

Why is the Lovasz condition used in the LLL Algorithm? (cryptography stackexchange)

Building Lattice Reduction (LLL) Intuition (the post that originally nerd-sniped me into thinking about LLL)
