Detailed explanation of the principle of Barrett reduction and its hardware optimization

background introduction

        Modular reduction algorithms are widely used in hardware design, because the modulo operation (mod) is essentially a division, which is much slower than multiplication in hardware and consumes far more resources. It is therefore desirable to replace the modulo operation with multiplications and other simple operations. A modular reduction algorithm computes the remainder of a large number using only multiplications, additions, and shifts, avoiding the division. Common methods include Montgomery reduction and Barrett reduction. This article introduces the principle of the Barrett reduction algorithm.

Barrett reduction

        The goal of reduction is to replace division with simple operations that are easy to implement in hardware. Taking A mod q as an example: how is the Barrett reduction algorithm used to compute the remainder of A modulo q?

        First, some terminology: in A mod q, A is the value being reduced and q is the modulus (also referred to as the base).

       Assume that the bit width of A is w_{1} and the bit width of q is w_{2}. Two quantities need to be computed up front for the hardware implementation:

\begin{cases} & \ q_1=\frac{A}{2^{w_{2}}} \\ & \ H=\frac{2^{w_{1}+1}}{q} \end{cases}

        When q_1 and H are computed, the results must be rounded down to integers, so q_1 and H satisfy the following inequalities:

\begin{cases} & \ \ \ \frac{A}{2^{w_{2}}}-1 <q_1\leqslant \frac{A}{2^{w_{2}}} \\ & \ \frac{2^{w_{1}+1}}{q}-1<H\leqslant \frac{2^{w_{1}+1}}{q} \end{cases}
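As a quick numerical check of these two bounds, here is a minimal Python sketch. The helper name `precompute` and the sample values A = 1000 and q = 13 are assumptions chosen only for illustration; Python's `int.bit_length()` stands in for the bit widths w_{1} and w_{2}.

```python
from fractions import Fraction

def precompute(A, q):
    """Return (q1, H, w1, w2), with both quotients rounded down."""
    w1 = A.bit_length()          # bit width of A
    w2 = q.bit_length()          # bit width of q
    q1 = A >> w2                 # floor(A / 2^w2)
    H = (1 << (w1 + 1)) // q     # floor(2^(w1+1) / q)
    return q1, H, w1, w2

A, q = 1000, 13                  # illustrative values, not from the article
q1, H, w1, w2 = precompute(A, q)

# Exact (unrounded) quotients, kept as fractions to avoid float error.
exact_q1 = Fraction(A, 1 << w2)
exact_H = Fraction(1 << (w1 + 1), q)

# Rounding down loses strictly less than 1 in each case.
assert exact_q1 - 1 < q1 <= exact_q1
assert exact_H - 1 < H <= exact_H
print(q1, H)
```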

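        Let q_2 =q_1\times H. Since q_1 and H are the rounded-down values of \frac{A}{2^{w_{2}}} and \frac{2^{w_{1}+1}}{q}, multiplying the two inequalities above gives the following bounds:

\small q_2=q_1\times H\leqslant\frac{A}{2^{w_{2}}} \times\frac{2^{w_{1}+1}}{q}

\small \frac{2^{w_{1}-w_{2}+1}A}{q}-\frac{A}{2^{w_{2}}}-\frac{2^{w_{1}+1}}{q}+1<q_2\leqslant \frac{2^{w_{1}-w_{2}+1}A}{q}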

        Let q_3=q_2 / 2^{w_{1}-w_{2}+1}. Dividing both sides of the inequality for q_2 by 2^{w_{1}-w_{2}+1} gives:

\small \frac{A}{q}-\frac{A}{2^{w_{1}+1}}-\frac{2^{w_{2}}}{q}+\frac{1}{2^{w_{1}-w_{2}+1}}<q_3\leqslant \frac{A}{q}

        Since the bit width of A is w_{1} and the bit width of q is w_{2} (that is, A < 2^{w_{1}} and q \geqslant 2^{w_{2}-1}), A and q satisfy the following inequalities:

\begin{cases} & \frac{A}{2^{w_{1}+1}} \leqslant1 \\ & \ \ \frac{2^{w_2}}{q} \leqslant2 \end{cases}

        Substituting these two inequalities into the inequality for q_3 and dropping the positive term \frac{1}{2^{w_{1}-w_{2}+1}}, we get:

\small \frac{A}{q}-3<q_3\leqslant \frac{A}{q}

        So multiplying both sides by q gives:

A-3q<q_3\times q\leqslant A
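The bound can be checked exhaustively for small bit widths. The following is a sketch under stated assumptions: the widths w_{1} = 10 and w_{2} = 4 and the helper name `q3_exact` are illustrative choices, and q_3 is kept as an exact fraction (not truncated), matching the derivation above.

```python
from fractions import Fraction

def q3_exact(A, q, w1, w2):
    """q3 = q1 * H / 2^(w1 - w2 + 1), kept as an exact fraction."""
    q1 = A >> w2                       # floor(A / 2^w2)
    H = (1 << (w1 + 1)) // q           # floor(2^(w1+1) / q)
    return Fraction(q1 * H, 1 << (w1 - w2 + 1))

w1, w2 = 10, 4                         # illustrative widths, not from the article
violations = 0
for q in range(1 << (w2 - 1), 1 << w2):        # every q that is exactly w2 bits wide
    for A in range(1 << (w1 - 1), 1 << w1):    # every A that is exactly w1 bits wide
        q3 = q3_exact(A, q, w1, w2)
        if not (A - 3 * q < q3 * q <= A):
            violations += 1

print(violations)   # number of (A, q) pairs that break the bound
```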

        Therefore, the modulo operation can be simplified as:

A\ mod\ q=(A-q_{3}\times q)\ mod\ q

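        Because q_{3}\times q lies between A-3q and A, the difference A-q_{3}\times q lies in [0,3q). To reduce it modulo q, it is only necessary to determine which of the intervals [0,q), [q,2q), [2q,3q) it falls in, and to subtract 0, q, or 2q accordingly. For example, if A-q_{3}\times q falls in the [q,2q) interval, then:

(A-q_{3}\times q)\ mod\ q=A-q_{3}\times q-q

        Note that this bound treats q_3 as the exact quotient. If q_3 is additionally truncated to an integer (as a right shift does), the derivation only guarantees a difference below 4q, so an implementation may need to allow for one further subtraction.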

         This completes the Barrett modular reduction. The same algorithm can be applied to modular multiplication, giving Barrett modular multiplication: to compute AB mod q, treat the product AB as the A in the derivation above and then reduce it.
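The whole procedure, including the modular multiplication case, can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: the function names `barrett_constant`, `barrett_reduce`, and `barrett_mulmod` and the modulus 12289 are assumptions, and the final correction uses a loop instead of a fixed set of comparisons so that the sketch stays correct however many subtractions are needed.

```python
def barrett_constant(q, w1):
    """Precompute H = floor(2^(w1+1) / q); done once, offline, per (q, w1)."""
    return (1 << (w1 + 1)) // q

def barrett_reduce(A, q, w1, H):
    """Return A mod q for 0 <= A < 2^w1 using shifts, two multiplies, subtracts."""
    w2 = q.bit_length()              # bit width of q
    q1 = A >> w2                     # floor(A / 2^w2), a shift
    q2 = q1 * H                      # first multiplication
    q3 = q2 >> (w1 - w2 + 1)         # floor(q2 / 2^(w1-w2+1)), a shift
    r = A - q3 * q                   # second multiplication
    while r >= q:                    # conditional subtractions
        r -= q
    return r

def barrett_mulmod(a, b, q):
    """Return a*b mod q for 0 <= a, b < q by reducing the product a*b."""
    w1 = 2 * q.bit_length()          # the product has at most 2*w2 bits
    return barrett_reduce(a * b, q, w1, barrett_constant(q, w1))

q = 12289                            # illustrative modulus, not from the article
H = barrett_constant(q, 27)          # 123456789 is 27 bits wide
print(barrett_reduce(123456789, q, 27, H) == 123456789 % q)
print(barrett_mulmod(1234, 5678, q) == (1234 * 5678) % q)
```

In hardware the loop would be replaced by a small fixed number of comparators and subtractors, one per interval discussed above.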

The Barrett modular reduction calculation process is roughly shown in the following figure:

hardware implementation

        After reading the derivation of the reduction formula, some readers will have questions about the two quantities that were computed up front:

\begin{cases} & \ q_1=\frac{A}{2^{w_{2}}} \\ & \ H=\frac{2^{w_{1}+1}}{q} \end{cases}

        Two quantities were computed up front, and the whole derivation above depends on them. Consider H first. In cryptography, for example when implementing homomorphic encryption schemes whose polynomial coefficients must be kept within the range of the modulus, the modulus q is usually a fixed value. H is therefore cheap to obtain: it can be pre-computed and stored in RAM. Even if the bit width of A ranges from 1 to 200 bits, at most 200 values of H need to be pre-computed once q is fixed.
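Such a table of H values can be sketched as below. The function name `build_h_table` and the modulus 12289 are assumptions for illustration; the dictionary simply stands in for the RAM that would hold the constants in hardware.

```python
def build_h_table(q, max_width):
    """Map each bit width w1 of A to H = floor(2^(w1+1) / q)."""
    return {w1: (1 << (w1 + 1)) // q for w1 in range(1, max_width + 1)}

q = 12289                          # illustrative fixed modulus, not from the article
h_table = build_h_table(q, 200)    # one entry per possible bit width of A

# At run time, H is looked up by the bit width of the input.
A = 123456789
H = h_table[A.bit_length()]
print(len(h_table), H)
```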

        H is easy to compute once the modulus q is fixed, but A is an input variable that can take any value, so how can q_1 be pre-computed?

        In fact, q_1 does not need to be pre-computed, because q_1 is A divided by a power of 2, and in hardware division by a power of 2 is a shift. Since q_1 is rounded down, it is obtained simply by shifting A right by w_{2} bits. For example:

7/4=7>>2=3'b111 >>2=3'b001=1

floor(7/4) = floor(1.75) = 1

        Rounding q_1 down is therefore exactly what a right shift of A produces.
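The equivalence between a right shift and a rounded-down division by a power of 2 can be confirmed with a few lines of Python. The helper name `shift_floor` is an assumption for illustration; the first check reuses the 7/4 example.

```python
import math

def shift_floor(A, s):
    """floor(A / 2^s) for a non-negative integer A, computed with a shift."""
    return A >> s

# The 7/4 example: 3'b111 >> 2 = 3'b001.
assert shift_floor(7, 2) == math.floor(7 / 4) == 1

# The same identity holds for every non-negative A and shift amount tried here.
for A in range(0, 1 << 10):
    for s in range(0, 12):
        assert shift_floor(A, s) == A // (1 << s)
```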

        To sum up, the values of q_1 and H are easy to obtain: they consume few computing resources and add little latency. The subsequent computation of q_3 is also a division by a power of 2, which again becomes a shift. The main computational cost of Barrett reduction therefore lies in:

\small \begin{cases} &q_2=q_1\times H \\ & A-q_3\times q \end{cases}

        that is, in the two multiplications q2 = q1*H and q3*q.

hardware optimization

        As shown above, the main cost of Barrett modular reduction is the two multiplications q2 = q1*H and q3*q.

        For a hardware implementation, the second computation can be optimized. After computing A-q3*q, its range has to be determined; if it falls in the [q, 2q) range, then A mod q = A-q3*q-q. Only the range matters, so there is no need to compute and compare all the bits. The bit width of q is w_2, and the difference is only a small multiple of q, so it fits in w_2+2 bits; it is enough to compute and compare the low w_2+2 bits to determine the range.
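A possible reading of this optimization, assuming that only the low w_2+2 bits of A and of q3*q are kept, is sketched below. The function name `barrett_reduce_lowbits`, the mask-based truncation, and the test modulus are assumptions for illustration; the sketch relies on the difference A-q3*q being non-negative and smaller than 4q, which is below 2^{w_2+2}.

```python
def barrett_reduce_lowbits(A, q, w1, H):
    """Barrett reduction whose final stage works on w2+2 bits only."""
    w2 = q.bit_length()
    mask = (1 << (w2 + 2)) - 1               # keeps the low w2+2 bits
    q3 = ((A >> w2) * H) >> (w1 - w2 + 1)    # quotient estimate, as before
    # The true difference is in [0, 4q), so it fits in w2+2 bits; the
    # subtraction can therefore be done modulo 2^(w2+2) on truncated operands.
    r = ((A & mask) - ((q3 * q) & mask)) & mask
    while r >= q:                            # narrow comparisons and subtractions
        r -= q
    return r

q, w1 = 13, 10                               # illustrative values, not from the article
H = (1 << (w1 + 1)) // q
print(all(barrett_reduce_lowbits(A, q, w1, H) == A % q for A in range(1 << w1)))
```

The design point is that the subtractor and comparators only need to be w_2+2 bits wide, independent of the width of A.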

        Therefore, with Barrett modular reduction, a hardware implementation replaces the division with two multiplications and one or two additions (the subtractions above).

        


Origin blog.csdn.net/qq_57502075/article/details/130052118