As you know, the processor does not provide long number mathematics. At the moment, processors work with 64-bit numbers. In the C language, this corresponds to the long long type. In cryptography, it is necessary to work with numbers of much greater length. There are two ways to solve this problem: use existing third-party libraries, or write everything yourself. The Extra Systems company in this product (as we usually do) has chosen the second option. In the Extra Systems Cypher Net encryption system, only the code written by us works.
To process long numbers, Extra Systems Cypher Net uses unsigned char arrays of a length large enough to hold numbers of the required size. There are two storage formats: regular and packed. The packed format uses all the bits, while the regular format uses only half. Thus, in the regular format, each byte contains one hexadecimal digit - four bits. The length of packed numbers is determined by the constant PACKED_VALUE_LENTH. If we are working with 1024-bit keys, then the math must be 2048-bit (since the product can contain twice as many digits as the factors). A byte contains 8 bits, so the value of the PACKED_VALUE_LENTH parameter is 256 (2048 divided by 8). And VALUE_LENTH will be twice as large, that is, 512.
Packing and unpacking of long numbers is performed by the pack_value and unpack_value procedures:
void pack_value(unsigned char *value, unsigned char *packed_value) { unsigned char x, y; unsigned int i = 0, j = 0; memset(packed_value, 0, PACKED_VALUE_LENTH); while (i < PACKED_VALUE_LENTH) { x = value[j++]; y = value[j++]; packed_value[i++] = x | (y << 4); } } void unpack_value(unsigned char *packed_value, unsigned char *value) { unsigned char cur_char; unsigned int i = 0, j = 0; clear_value(value); while (i < PACKED_VALUE_LENTH) { cur_char = packed_value[i++]; value[j++] = cur_char & 0xF; value[j++] = cur_char >> 4; } }
We will also give an example of the procedure for subtracting long numbers:
void value_sub(signed char *operand1, signed char *operand2) { //operand1 -= operand2 int i; for (i = 0; i < VALUE_LENTH; i++) { *(operand1 + i) -= *(operand2 + i); if (*(operand1 + i) < 0) { *(operand1 + i) += RADIX; if ((i + 1) < VALUE_LENTH) *(operand1 + i + 1) -= 1; } } }
We do not present other procedures (multiplication, division, etc.) here, since they (as well as the subtraction procedure published above) have nothing to do with cryptography.
Of course, the algorithms given above are related to the implementation of the system without using assembler inserts. When using such inserts, the situation changes radically. In the assembler implementation, the packing and unpacking operations are reduced to the identical transformation:
void pack_value(void *value, void *packed_value) { memcpy(packed_value, value, VALUE_LENTH); } void unpack_value(void *packed_value, void *value) { memcpy(value, packed_value, VALUE_LENTH); }
And the constants VALUE_LENTH and PACKED_VALUE_LENTH in our implementation of long math in assembly language have the same value (256 for 1024-bit keys). This is due to the fact that assembler modules do not use any "rarefaction" of bits, and information is stored densely, using all bits.
We are currently using 64-bit code, so the unit of long number processing is an 8-byte object. In fact, our assembler arithmetic uses 64-bit registers as digits, so the "base" in our assembler is 2 to the power 64. As an example, here is our assembler source code for the long number increment routine (the operand address is passed here in the rsi register):
;====================================================== WORD_SIZE EQU 8 LONG_INT_LENTH EQU VALUE_LENTH / WORD_SIZE ;====================================================== SECTION .text ;====================================================== ii_inc_long_int: mov rcx,LONG_INT_LENTH mov rax,1 mov rdx,0 .m10: add [rsi+rdx*WORD_SIZE],rax jnc .fin inc rdx loop .m10 .fin: ret ;======================================================
Thus, in the assembler, in the case of 1024-bit RSA keys, numbers of only 32 (the LONG_INT_LENTH parameter) "signs" are processed (each of which is a 64-bit machine word for this architecture). This, of course, provides a significant performance boost compared to long math modules in the C language (which work with words of only 4 bits, and, unlike assembler, do not have hardware support for overflow control).
The content of this page is also available in French, German, Ukrainian and Russian.
© Extra Systems, 2024 |