Background
Modular reduction algorithms are widely used in hardware design, because the modulo operation (mod) is essentially a division, which is much slower than multiplication in hardware and consumes far more resources. It is therefore desirable to replace the modulo operation with multiplications and other simple operations. A modular reduction algorithm computes the remainder of a large number using only multiplication, addition (or subtraction), and shifts, avoiding the division hidden in the modulo operation. Common methods include Montgomery reduction and Barrett reduction. This article introduces the principle of the Barrett modular reduction algorithm.
Barrett reduction
Reduction replaces the division with simple operations so that it is easy to implement in hardware. Take A mod q as an example: how is the remainder of A modulo q computed with the Barrett reduction algorithm?
First, fix the terminology: in A mod q, A is the number being reduced and q is the modulus.
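Let the bit width of A be $m$ and the bit width of q be $n$. Two values need to be pre-computed for the hardware implementation. A standard choice, consistent with the bounds derived below, is:

$$q_1 = \left\lfloor \frac{A}{2^{n-1}} \right\rfloor, \qquad H = \left\lfloor \frac{2^{m}}{q} \right\rfloor$$

Both results are rounded down, so $q_1$ and $H$ satisfy the following inequalities:

$$\frac{A}{2^{n-1}} - 1 < q_1 \le \frac{A}{2^{n-1}}, \qquad \frac{2^{m}}{q} - 1 < H \le \frac{2^{m}}{q}$$

Multiplying the two, the following inequality holds:

$$\left(\frac{A}{2^{n-1}} - 1\right)\left(\frac{2^{m}}{q} - 1\right) < q_1 H \le \frac{A}{2^{n-1}} \cdot \frac{2^{m}}{q}$$

Let $q_2 = q_1 H$ and divide every part of the inequality by $2^{m-n+1}$ to get:

$$\frac{A}{q} - \frac{A}{2^{m}} - \frac{2^{n-1}}{q} + \frac{1}{2^{m-n+1}} < \frac{q_2}{2^{m-n+1}} \le \frac{A}{q}$$

Since the bit width of A is $m$ and the bit width of q is $n$, A and q satisfy:

$$2^{m-1} \le A < 2^{m}, \qquad 2^{n-1} \le q < 2^{n}$$

so $A / 2^{m} < 1$ and $2^{n-1} / q \le 1$. Substituting these bounds into the inequality gives:

$$\frac{A}{q} - 2 < \frac{q_2}{2^{m-n+1}} \le \frac{A}{q}$$

Rounding down once more, $q_3 = \left\lfloor q_2 / 2^{m-n+1} \right\rfloor$ satisfies $A/q - 3 < q_3 \le A/q$. Multiplying every part by q gives:

$$A - 3q < q_3 \, q \le A$$

Therefore, the modulo operation can be simplified to:

$$A \bmod q = (A - q_3 \, q) \bmod q, \qquad 0 \le A - q_3 \, q < 3q$$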
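Because $q_3 \, q$ lies between $A - 3q$ and $A$, the difference $A - q_3 \, q$ lies in $[0, 3q)$. To obtain A mod q, it is only necessary to determine which of the intervals [0,q), [q,2q), [2q,3q) the difference falls in, and then subtract 0, q, or 2q accordingly. For example, if it falls in the [q,2q) interval, then:

$$A \bmod q = A - q_3 \, q - q$$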
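The steps above can be sketched in a few lines of Python. This is only an illustrative software model, not a hardware description. It assumes the common parameter choice H = floor(2^m / q), q1 = A >> (n-1) and q3 = q2 >> (m-n+1), where m and n are the bit widths of A and q; the function name `barrett_reduce` and its signature are hypothetical.

```python
def barrett_reduce(A: int, q: int, m: int) -> int:
    """Return A mod q for 0 <= A < 2**m without dividing by q at run time.

    Assumed parameters (not taken from the article's lost formulas):
    H = floor(2^m / q), q1 = A >> (n-1), q3 = q2 >> (m-n+1).
    """
    n = q.bit_length()               # bit width of q
    assert m >= n and 0 <= A < (1 << m)

    H = (1 << m) // q                # pre-computed once per (q, m); stored in RAM in hardware
    q1 = A >> (n - 1)                # floor(A / 2^(n-1)), a pure shift
    q2 = q1 * H                      # first multiplication
    q3 = q2 >> (m - n + 1)           # floor(q2 / 2^(m-n+1)), a pure shift
    r = A - q3 * q                   # second multiplication; 0 <= r < 3q

    # Decide which of [0,q), [q,2q), [2q,3q) r falls in.
    if r >= 2 * q:
        r -= 2 * q
    elif r >= q:
        r -= q
    return r
```

The only division, `(1 << m) // q`, depends on q and m alone, so in a real design it would be computed offline rather than on every call.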
This completes the Barrett modular reduction. The same algorithm can also be applied to modular multiplication, giving Barrett modular multiplication. To compute AB mod q, simply treat the product AB as the A in the derivation above and then perform the reduction.
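As a rough illustration of Barrett modular multiplication, the sketch below reduces the product of two already-reduced operands. It assumes both operands are smaller than q, so the product is smaller than 2^(2n) and m = 2n can be used; the constant H = floor(2^(2n) / q) and the name `barrett_mulmod` are assumptions made for this example.

```python
def barrett_mulmod(a: int, b: int, q: int) -> int:
    """Return (a * b) mod q for 0 <= a, b < q using Barrett reduction."""
    n = q.bit_length()               # bit width of q
    assert 0 <= a < q and 0 <= b < q

    m = 2 * n                        # a*b < q^2 < 2^(2n), so 2n bits suffice
    H = (1 << m) // q                # pre-computed constant for this q

    A = a * b                        # the product plays the role of A
    q1 = A >> (n - 1)
    q3 = (q1 * H) >> (m - n + 1)     # equals a shift by n+1 when m = 2n
    r = A - q3 * q                   # 0 <= r < 3q

    while r >= q:                    # at most two subtractions
        r -= q
    return r
```

For example, `barrett_mulmod(1234, 3000, 3329)` should agree with `(1234 * 3000) % 3329`.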
The Barrett modular reduction calculation process is roughly shown in the following figure:
Hardware implementation
After reading the derivation of the reduction formula, a natural question arises:
Two constants were pre-computed earlier, and the whole derivation depends on them. How are they obtained? Look at H first. In cryptography, for example in some homomorphic encryption schemes where the polynomial coefficients must be kept within the range of the modulus, the chosen modulus q is usually a fixed value. The cost of computing H is therefore very small, and H can be pre-computed directly and stored in RAM. Even if the bit width of A ranges from 1 to 200 bits, at most 200 values of H need to be pre-computed once the modulus q is fixed.
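The pre-computation of H can be modeled as a small lookup table indexed by the bit width of A. The sketch below assumes H = floor(2^m / q) for an m-bit input, which is one common choice; the helper name `precompute_H_table` and the default limit of 200 bits are illustrative only.

```python
def precompute_H_table(q: int, max_bits: int = 200) -> dict:
    """Map each supported bit width m of A to H = floor(2^m / q).

    Inputs narrower than q's bit width n are already smaller than q,
    so entries are only needed for m >= n.
    """
    n = q.bit_length()
    return {m: (1 << m) // q for m in range(n, max_bits + 1)}
```

Since the division by q happens only when the table is built, it can be done offline and the results loaded into RAM.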
H is easy to compute once the modulus q is fixed, but A is an input variable that can take any value, so how can $q_1$ be pre-computed?
In fact, $q_1$ does not need to be pre-computed, because it is simply A divided by a power of 2. In hardware, division by a power of 2 is realized by a shift, and since the result has to be rounded down, a right shift of A is all that is required. For example, if the bit width of q is $n$ and the divisor is $2^{n-1}$:

$$q_1 = \left\lfloor \frac{A}{2^{n-1}} \right\rfloor = A \gg (n-1)$$

Rounding the quotient down is exactly what a right shift of A produces.
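A quick way to convince oneself that the shift and the rounded-down division agree is to compare them directly. The values below are arbitrary, and the shift amount n-1 is the assumption used in this sketch; the helper name `q1_by_shift` is hypothetical.

```python
def q1_by_shift(A: int, n: int) -> int:
    """Compute floor(A / 2^(n-1)) with a right shift instead of a division."""
    return A >> (n - 1)


A = 0b1101_0110_1011            # arbitrary 12-bit input (3435)
n = 5                           # arbitrary bit width of q
q1 = q1_by_shift(A, n)          # drops the low 4 bits: 0b1101_0110 = 214
assert q1 == A // 2 ** (n - 1)  # same as the rounded-down division
```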
To sum up, the values of $q_1$ and $H$ are easy to obtain, with little cost in computing resources or latency. The later division that turns $q_2$ into $q_3$ is also a division by a power of 2, so it too becomes a shift. The main cost of Barrett modular reduction therefore lies in the two multiplications: $q_2 = q_1 \cdot H$ and $q_3 \cdot q$.
Hardware optimization
As shown above, the main cost of Barrett modular reduction is the two multiplications, q2 = q1*H and q3*q.
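For a hardware implementation, the second multiplication can be optimized. After computing A - q3*q, the range of the result has to be determined; for example, if it falls in [q, 2q), then A mod q = A - q3*q - q. All that matters is which range the result falls in, so there is no need to compute and compare all the bits. If the bit width of q is $n$, the result is smaller than $3q < 2^{n+2}$, so it is enough to compute and compare only the low $n+2$ bits of A and q3*q to determine the range.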
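The low-bit idea can be modeled as follows. This sketch assumes the same parameters as before (H = floor(2^m / q), q1 = A >> (n-1), q3 = q2 >> (m-n+1)) and keeps only n+2 bits of the subtraction, which is sufficient because the true difference lies in [0, 3q) and 3q < 2^(n+2). The function name `barrett_reduce_lowbits` is hypothetical.

```python
def barrett_reduce_lowbits(A: int, q: int, m: int) -> int:
    """Barrett reduction where the final subtraction uses only n+2 bits."""
    n = q.bit_length()
    assert m >= n and 0 <= A < (1 << m)

    H = (1 << m) // q                     # pre-computed constant
    q3 = ((A >> (n - 1)) * H) >> (m - n + 1)

    mask = (1 << (n + 2)) - 1             # keep the low n+2 bits
    low_A = A & mask
    low_prod = (q3 * q) & mask            # hardware only needs these product bits
    r = (low_A - low_prod) & mask         # equals A - q3*q, since 0 <= A - q3*q < 2^(n+2)

    # Range check on an (n+2)-bit value instead of a full-width one.
    if r >= 2 * q:
        r -= 2 * q
    elif r >= q:
        r -= q
    return r
```

In software the full product is still formed before masking; the saving applies to hardware, where the upper product bits never need to be generated.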
Therefore, with Barrett modular reduction, a hardware implementation replaces the division with two multiplications and one or two additions (subtractions).