Implementing 32-bit Floating-Point Multiplication

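I implemented 32-bit (single-precision) floating-point multiplication in Verilog.

This post is for readers who already understand floating-point representation.

For an explanation of floating point, see the link below. A Korean version of the page is also available.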

https://en.wikipedia.org/wiki/Floating-point_arithmetic


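Verilog's wire and reg vectors hold plain binary values, so by default everything is effectively fixed point.

To multiply floats, the sign, exponent, and mantissa fields therefore have to be processed according to the float format.

The special cases of single precision are encoded in the exponent. For an out-of-range result it is enough to set the exponent to the special value (255 for infinity, 0 for zero) and the mantissa to 0. The result is infinity or zero anyway, so this is reasonable.

Indeed, in C the hexadecimal representation of positive float infinity is 0x7F800000.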

Let's look at the Verilog code.

module float_mul
#(parameter bias = 8'h7f)
(
    output [31:0] out_float,
    input [31:0] inA_float, inB_float
);                                          // single precision multiplier

/*-----Variables' name Description-----
sign: 0 : +, 1 : -
expA/B/Out: exponent, bias with 127
expSum: underflow == [9], overflow == ([9:8] == 2'b01)
mantA/B: mantissa 1.xxxx
mantOut: mantissa (1).xxxx
mantAxB: 48bit (24x24 result) raw multiply result
inf: infinity, exp = 255 (8'hff)
zero: zero, exp = 0 (8'h00)
b47: mantAxB[47]
Under/OverFlag: Underflow/Overflow Flag
*/

parameter Zero = 8'b0;
parameter Inf = 8'hff;

wire signA, signB, signOut;
wire [7:0] expA, expB, expOut;
wire [9:0] expSum;
wire [23:0] mantA, mantB;
wire [22:0] mantOut, mantOut_bef_special;
wire infA, infB, zeroA, zeroB, b47;
wire [48-1:0] mantAxB;
wire UnderFlag, OverFlag;

assign signA = inA_float[31];
assign expA  = inA_float[30 -: 8];
assign mantA = {1'b1, inA_float[0 +: 23]};  // real mant Value

assign signB = inB_float[31];
assign expB  = inB_float[30 -: 8];
assign mantB = {1'b1, inB_float[0 +: 23]};

float_check CheckA(infA, zeroA, expA);
float_check CheckB(infB, zeroB, expB);

assign mantAxB = mantA * mantB;
assign b47 = mantAxB[47];

assign signOut = signA ^ signB;
assign expSum  = (zeroA | zeroB) ? Zero : ((infA | infB)? Inf : (expA + expB - bias + b47));
assign UnderFlag = expSum[9] || (expSum == 10'd0);   // negative or zero; compare all 10 bits so that 256 is not mistaken for 0
assign OverFlag  = (expSum[9:8] == 2'b01) || (&expSum[7:0]);
assign expOut  = UnderFlag ? Zero : ((OverFlag) ? Inf : expSum[7:0]);

assign mantOut_bef_special = b47 ? (mantAxB[46 -: 23] + mantAxB[22]) : (mantAxB[45 -: 23] + mantAxB[21]);
assign mantOut = (UnderFlag || OverFlag) ? 0 : mantOut_bef_special;

assign out_float = {signOut, expOut, mantOut};

endmodule

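The comment block at the top of the module describes each signal:

sign: 0 is positive, 1 is negative
expA/B/Out: exponent field, stored with a bias of 127
expSum: 10-bit exponent sum; bit [9] set means underflow, [9:8] == 2'b01 means overflow
mantA/B: 24-bit mantissa with the implicit leading 1 restored (1.xxxx)
mantOut: 23-bit output fraction; the leading 1 is implicit, (1).xxxx
mantAxB: 48-bit raw product of the two 24-bit mantissas
inf: infinity, exp = 255 (8'hff)
zero: zero, exp = 0 (8'h00)
b47: mantAxB[47], the top bit of the product
Under/OverFlag: underflow/overflow flags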

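Each 32-bit float is split into three fields (sign, exponent, mantissa), and each field is computed separately.

float_check, whose source is not listed in this post, detects operands that are already infinity or zero; judging from its ports, it looks only at the exponent.

expSum is the sum of the two exponents minus the bias, plus b47.

b47 is needed because when bit 47 of the mantissa product is set, the product is 2.0 or greater and has to be normalized by one position, so the exponent must be incremented by 1.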

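mantOut keeps the 23 fraction bits below the leading 1 and rounds by adding one of the low bits that the mantissa multiplication would otherwise discard; the intent is to round on the most significant discarded bit.

When underflow or overflow occurs, the mantissa is set to 0.

The testbench is shown below.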

module tb_float_mul();

reg [31:0] A_vector [0:99];
reg [31:0] B_vector [0:99];
reg [31:0] Output [0:99];
reg [31:0] A, B;
wire [31:0] Out;
wire [31:0] Error, AbsErr;

integer i, err;

float_mul TEST(Out, A, B);

assign Error = Output[i] - Out;
assign AbsErr = (Error[31]) ? -Error : Error;

initial begin
    $readmemh("input_a.txt", A_vector);
    $readmemh("input_b.txt", B_vector);
    $readmemh("output.txt",  Output);
    err = 0;
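    for (i=0;i<100;i=i+1) begin
        A = A_vector[i];
        B = B_vector[i];
        #(10);                          // wait so that Out and AbsErr reflect vector i before checking
        // if (AbsErr != 0) begin       // strict check: exact match only
        if (AbsErr > 1) begin           // compare the whole word, not just AbsErr[3:0]
            err = err + 1;
        end
    end
    $display("vectors off by more than 1: %0d", err);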
    $stop();
end

endmodule

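AbsErr is the absolute value of the error. The error itself always turned out to be -1, 0, or 1, so AbsErr was either 0 or 1.

The A, B, and Output vector files were generated with C++; the code is below.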

#include <iostream>
#include <string>
#include <bitset>
#include <sstream>
#include <random>
#include <limits>
#include <cstdlib> // rand, srand
#include <ctime>   // std::time

using namespace std;

typedef union {
    float input; // assumes sizeof(float) == sizeof(int)
    int   output; // raw bits; reading the inactive union member relies on a compiler extension in C++
} DataFloat2Int;

int main() {
    ios::sync_with_stdio(0); // false
    cin.tie(0); // NULL
    cout.tie(NULL); // 0
    srand(static_cast<unsigned int>(std::time(0)));

    string STR1, STR2;
    DataFloat2Int data1, data2, data3;
    
    for (int i = 0; i < 100; i++) {
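        // Note: the random integer is converted to float by value here, not
        // reinterpreted as raw bits, so the operands are large integer-valued floats.
        data1.input = (rand() << 17 | rand() << 2 | rand() >> 13);
        data2.input = (rand() << 17 | rand() << 2 | rand() >> 13);
        data3.input = data1.input * data2.input;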

        bitset<32> BS1(data1.output);
        bitset<32> BS2(data2.output);
        bitset<32> BS3(data3.output);

        unsigned data1Hex = BS1.to_ulong();
        unsigned data2Hex = BS2.to_ulong();
        unsigned data3Hex = BS3.to_ulong();

        stringstream ss1, ss2, ss3;
        ss1 << hex << data1Hex;
        ss2 << hex << data2Hex;
        ss3 << hex << data3Hex;

        // cout << data1.input << " " << data2.input << " " << data3.input << "\n";
        cout << ss1.str() << " " << ss2.str() << " " << ss3.str() << "\n";
    }

    DataFloat2Int A, B, C;
    A.input = numeric_limits<float>::max();
    B.input = numeric_limits<float>::max();
    C.input = A.input * B.input;

    bitset<32> BS4(C.output);
    unsigned data4Hex = BS4.to_ulong();
    stringstream ss4;
    ss4 << hex << data4Hex;

    cout << ss4.str() << "\n";
    return 0;
}

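I copied and pasted the program's output to build the vector files.

Copy everything and split it into the three columns. VSCode can edit several lines at once, so the vector files take only a few edits to produce.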

At the end, append values that are likely to cause overflow or underflow.

The values for the special cases are as follows.

A:
02000000
7d000002
00800000
7d000000

B:
3d800000
42000000
00800000
7d000000

Output:
00000000
7f800000
00000000
7f800000

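[Figure: ModelSim test waveform]

In the tests, the mantissa either matches exactly or differs by 1 (+1 or -1).

Typically about half of the results match C exactly and the other half are off by 1.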

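I have not worked out why the results differ from C. One suspect is the rounding step: the code adds mantAxB[22] (or mantAxB[21] when b47 is 0), which sits one position below the first discarded bit, and C's default rounding mode is round-to-nearest-even rather than a plain add of one bit.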

Next time I plan to implement addition.

P.S. If anyone knows why the 24-bit mantissa shows this slight difference, please let me know.
