mersenneforum.org  

Go Back   mersenneforum.org > Great Internet Mersenne Prime Search > Hardware

Reply
 
Thread Tools
Old 2005-09-23, 07:38   #1
rdotson
 
rdotson's Avatar
 
Jul 2005

23·5 Posts
Question Number-theoretic FPGA function implementation suggestions?

Hi all,

I'm not sure where to post the following question. I'm open to suggestions and the moderators are welcome to simply move this post to the appropriate sub-forum. Thank you.

Q: Has anyone ever considered implementing portions of the FFT algorithm that I believe is (or used to be) included in Prime95 and were written in assembly language for speed - has anyone ever considered implementing them in hardware; ie, in Verilog HDL on an FPGA for example? I remember looking at the code and thinking it would be ideal for implementation on a closely coupled FPGA (perhaps one with a built-in PCI interface).

Since then, just for the fun of it I've learned enough Verilog to implement both Pollard Rho and ECM in Verilog HDL for 640 and 384 bit-width numbers respectively, and am awaiting delivery of a an Actel ProASIC3 development board so I can try them out (it will take two months to get the board!). Meanwhile, I'm open to suggestions for interesting mathematical algorithms that would be useful to have implemented at the hardware level in an FPGA.

Regards,

Ron
rdotson is offline   Reply With Quote
Old 2005-09-23, 11:47   #2
alpertron
 
alpertron's Avatar
 
Aug 2002
Buenos Aires, Argentina

31·43 Posts
Default

I think you need to post charts comparing the times your ECM implementation in hardware takes against known implementations in software, for example GMP-ECM (which are cheaper since no additional hardware should be bought).

For example, fill in these blanks for 384-bit numbers:

Code:
          rdotson FPGA (??? MHz)    GMP-ECM (Pentium 4 2.4 GHz)

B1 = xxx       .... us                         .... us
B1 = yyy       .... us                         .... us
B1 = zzz       .... us                         .... us
where xxx, yyy, zzz are possible values of B1 in your implementation. You can also cite your values of B2.
alpertron is offline   Reply With Quote
Old 2005-09-23, 13:27   #3
rdotson
 
rdotson's Avatar
 
Jul 2005

23·5 Posts
Default

Quote:
Originally Posted by alpertron
I think you need to post charts...
Actually alpertron, I needn't do anything at all. Nor am I interested in rehashing designs that I've already completed and am waiting to get some hardware to implement them on so I can get some timing figures on realistic numbers like you're asking for. At present, both designs work fine in simulation mode with small numbers, but since the simulator runs thousands of times slower (or less) than a real FPGA, I cannot use it to get timing figures on large numbers. That will have to wait until I'm able to actually implement my designs on real hardware. The synthesis reports (area usage, propagation delays, etc) all look fine however so I don't anticipate any major problems.

I'll tell you what alpertron, if you'll provide some B1 numbers and timings for GMP-ECM, I'll provide them for my FPGA implementation as soon as I get it actually running on real hardware. At the moment I'm just looking for new design ideas.

Here's something for you to think about however alpertron: If GMP-ECM s/w were faster than my ECM implementation in hardware, I would simply get a copy of the GMP-ECM source code (I assume it's available somewhere), and convert THAT code into Verilog HDL for execution in hardware and I would be guaranteed to be at least 100 to 1000 times faster than the software version of GMP-ECM. It turns out that some designs (Pollard Rho and ECM are two examples) are ideally suited to FPGA implementation, whereas others would be essentially impossible because of huge storage requirements or other factors. People (computer programmers in general, and myself in particular when I was a computer programmer) tend to think of Verilog HDL as hardware design and think that it's difficult just because of the words "hardware design." In actual fact, Verilog is nothing more than a very primitive programming language with some very unique features (and notable LACK of some features) that any C/C++ programmer could pick up in a few days like I did. In fact the precedence rules, and most of the operators are the same in both C and Verilog. Yes, there are lot's of timing issues with FPGAs that you have to worry about that you needn't worry about with a conventional sequential computer program, but just because Verilog is extremely tedious to code doesn't mean that it's difficult to do. In fact there are only a handful of constructs in the language - far fewer than in most other languages.

In any case I'm not looking to get into some sort of chest beating contest, I'm just looking for ideas for numeric applications that would be suitable for FPGA implementation to keep me entertained while I'm awaiting my new development board. I may or may not use any ideas that may or may not be submitted. That is strictly my own option because I'm not getting paid for any of this.

Regards,

Ron
rdotson is offline   Reply With Quote
Old 2005-09-23, 13:51   #4
alpertron
 
alpertron's Avatar
 
Aug 2002
Buenos Aires, Argentina

31·43 Posts
Default

Some timings for GMP-ECM 6 in a particular 384-bit number on a Pentium 4 2.4 GHz:

B1 = 11000, B2 = 1100000 -> step 1: 153 ms, step 2: 111 ms
B1 = 50000, B2 = 5000000 -> step 1: 700 ms, step 2: 350 ms
B1 = 250000, B2 = 25000000 -> step 1: 3560 ms, step 2: 960 ms
alpertron is offline   Reply With Quote
Old 2005-09-23, 23:56   #5
rdotson
 
rdotson's Avatar
 
Jul 2005

23·5 Posts
Default P.S.

By the way, lest anyone think I'm some kind of crackpot or troll, the thread that started this whole thing (my interest in Verilog HDL for factoring) can be found over on the Yahoo PrimeNumbers forum at:

http://groups.yahoo.com/group/primen.../message/16775

I've also posted a couple of followups since then at:

http://groups.yahoo.com/group/primen.../message/16896
and
http://groups.yahoo.com/group/primen.../message/17028

I am a retired computer software engineer with 35+ years of experience developing things like embedded systems for spacecraft and device drivers for prototype hardware and I can easily prove it to anyone who wishes to contact me privately. Now that I'm retired I have the time to do things that I want to do just for fun, instead of being paid to do things are aren't so interesting to me.

-- Ron
rdotson is offline   Reply With Quote
Old 2005-09-24, 01:04   #6
ColdFury
 
ColdFury's Avatar
 
Aug 2002

26×5 Posts
Default

A hardware siever for NFS or QS could probably get quite fast.
ColdFury is offline   Reply With Quote
Old 2005-09-24, 04:38   #7
jasonp
Tribal Bullet
 
jasonp's Avatar
 
Oct 2004

348910 Posts
Default

Quote:
Originally Posted by ColdFury
A hardware siever for NFS or QS could probably get quite fast.
I'm not so sure. If some part of the memory needed by the algorithm could fit completely within the FPGA (either the factor base or the sieve interval) then it would really fly, but for interesting size problems I think that's impossible. You'd be IO limited, and a desktop microprocessor will cycle memory a lot faster than an FPGA can.

There were several papers in the late 90s on hardware QS, and several more recent papers that proposed ASIC solutions to NFS sieving for RSA-contest-size problems. The latter assume wafer scale integration containing gigs of DRAM.

My vote for something cool to implement is a number-theoretic FFT using modular math. The problem can be broken up into large chunks that can fit in an FPGA to run at super speed, and there's a lot of IO but the access pattern is pretty regular. This is a pretty big job though; at least you'd be able to run LL tests with it when you're done. Actually I think somebody has already implemented a first draft of this on a really big Xilinx.

jasonp
jasonp is offline   Reply With Quote
Old 2005-09-24, 05:13   #8
rdotson
 
rdotson's Avatar
 
Jul 2005

23·5 Posts
Default

Quote:
Originally Posted by ColdFury
A hardware siever for NFS or QS could probably get quite fast.
Hi ColdFury,

Yes that's true, but it would probably take an ASIC rather than an FPGA to do it and those things are out of the question because of cost. Even so, it would probably spend all of it's time waiting on data to be transferred to and from external media (probably some mix of RAM and Disk Space). Not to mention that mapping a direct interface from the FPGA to the disk controller (there is no operating system inside an FPGA) would probably be a great deal of work in itself.

I was thinking more in terms of some repetitive arithmetic operation that is perhaps part of NFS or QS or some other factoring program, but that takes place with localized data and is computationally intensive. Maybe a subroutine within the greater NFS and QS programs for example. I don't know enough about the way it's actually implemented in these huge factoring programs to even know where an FPGA might be appropriate, and that's mainly why I'm here asking.

Where an FPGA truly shines is in parallelization. You can actually have multiple processes running concurrently inside an FPGA, and I'm referring to true concurrency not the time-slicing variety used in operating systems. For example, in my IGCDEX module that uses Euclid's extendend algorithm to calculate a GCD and inverse, I simply declare three multipliers and run them all in parallel at the same time. Here's how it's done (this the Verilog source code for my IGCDEX module):
Code:
// igcdex.v, (c) Sept 08, 2005, Ron Dotson
// Use Euclid's extendend algorithm to calculate:
// g=GCD(p1,p2), and s and t such that: g= s*p1 + t*p2

`include "ECM.h"

module IGCDEX(reset,clock,p1,p2,  s,t,g,finished);
parameter N= `L; // Bus Width
parameter S0=0,S1=1,S2=2,S3=3, S4=4,S5=5,S6=6,S7=7;
input       reset, clock;
input       [N-1:0] p1, p2;
output reg  [N-1:0] s, t, g;    // "g" is GCD, and t and s are inverses
output reg  finished;

reg         resetDivider, resetMul, swappedInputs;
reg         [N-1:0] c1, a2,b2,c2, a3,b3,c3; // multiplier inputs
reg         [N-1:0] b,u,v;
reg         [2:0] state;

wire        dividerFinished, mulFinished1, mulFinished2, mulFinished3;
wire        [2*N-1:0] m1, m2, m3;  // multiplier outputs
wire        [N-1:0] q;

DIVIDER    divider (resetDivider,clock,g,b,  q,,dividerFinished); // q= iquo(g/b)
MULTIPLIER mul1    (resetMul,clock,q,b,c1,   m1,mulFinished1); // m1= (a1*b1+c1)
MULTIPLIER mul2    (resetMul,clock,q,u,c2,   m2,mulFinished2); // m2= (a2*b2+c2)
MULTIPLIER mul3    (resetMul,clock,q,v,c3,   m3,mulFinished3); // m3= (a3*b3+c3)

always @(posedge clock)
begin
if (reset==1)
    begin
    finished<=0;
    resetDivider<=0;   // ensure these start off low
    resetMul<=0;
    g<= p1[N-1]? -p1: p1;   // g= abs(p1)
    b<= p2[N-1]? -p2: p2;   // b= abs(p2)
    t<= p1[N-1]?  -1:  1;   // t= sign(p1)
    s<= 0;
    u<= 0;
    v<= p2[N-1]?  -1:  1;   // v= sign(p2)
    resetMul<= 0;
    resetDivider<= 0;
    state<=S0;
    //$display("IGCDEX 10: RESET p1=%X, p2=%X\n", p1,p2);
    end
else
    begin
    case (state)
        S0: begin
            //$display("IGCDEX 20: g=%d, b=%d, t=%X, v=%d\n", g,b,t,v);
            if (g < b)
                begin
                g<= b;    // swap operands so that g>=b
                b<= g;
                t<= v;
                v<= t;
                end
            state<= S1;
            end

        S1: begin          // while b<>0
            if (|b)     // if b!=0 continue loop, else exit
                begin
                resetDivider<= 1;  // enable divider, q= iquo(g/b)
                state<= S2;
                end
            else
                state<= S4;  // finished
            end

        S2: begin
            resetDivider<= 0;
            if (dividerFinished)
                begin
                //$display("IGCDEX 30: q=%X, g=%d, b=%d (q= g/b)\n", q,g,b);
                c1<= -g;
                c2<= -t;
                c3<= -s;
                resetMul<= 1;  // enable multipliers
                state<= S3;
                end
            end

        S3: begin
            resetMul<= 0;
            if (mulFinished1 & mulFinished2 & mulFinished3)
                begin
                if (m1[N]!=m1[N-1] | m2[N]!=m2[N-1] | m3[N]!=m3[N-1])
                    begin          // overflow
                    //$display("\nIGCDEX xx: OVERFLOW!\n");
                    //$display("\tm1=%X, m2=%X, m3=%X\n", m1, m2, m3);
                    //$display("\nIGCDEX Inputs: p1=%X, p2=%X\n", p1,p2);
                    $stop();
                    end
                g<= b;
                b<= -m1[N-1:0];
                t<= u;
                u<= -m2[N-1:0]; // -m2 = x
                s<= v;
                v<= -m3[N-1:0];
                state<= S1;  // end of while loop
                //$display("IGCDEX 30: Looping Back; q=%d, g=%d, b=%d\n\n\n", q,g,b);
                end
            end

        S4: begin
            finished<= 1;
            end
    endcase
    end
end
endmodule                  // IGCDEX
You may be wondering why I use modules (separate algorithms) to do multiplication rather than simply using the built-in Verilog "*" multiplication operator (the same as in any other language). The answer is that Verilog will by default create a combinatorial multiplier that explodes the gate count exponentially as you increase the bus-width; whereas, the gate count for the multiplier algorithm I used increases linearly with the bus-width, but it is very slow - requiring one clock cycle per data bit. Also, for a wide bus the interconnects eat you alive. About 87% of the propagation delay in one of my wide bus-width compilations was used simply for internal interconnects between modules inside the FPGA! Incidentally, I always parameterize the bus-width so that I can change it throughout the entire design simply by changing a single number in a header file.

This was my first "real" Verilog program of any size so please do not think the IGCDEX module above is a sterling example of Verilog. I'm still learning, and I'm sure there may be much better ways to do the same things I've come up with - and in fact I'll probably continue trying to improve my code as I learn more about Verilog.

Regards,

Ron
rdotson is offline   Reply With Quote
Old 2005-09-24, 05:45   #9
rdotson
 
rdotson's Avatar
 
Jul 2005

23×5 Posts
Default

Quote:
Originally Posted by jasonp
My vote for something cool to implement is a number-theoretic FFT using modular math. ... Actually I think somebody has already implemented a first draft of this on a really big Xilinx.

jasonp
That sounds like it might be an excellent application for an FPGA Jason, and in fact I believe it was my original question that I started this thread with. If you happen to recall who the "somebody" was who has already implemented a first draft of it, please contact me because I would love to see what's already been done before I spend to much time on it. I have no desire to waste my time reinventing the wheel.

In any case however, if you or anyone could point me to an example of a working program that implements an FFT using modular math that's written in a conventional programming language, then I'll be happy to give it a try provided it looks do-able to me. I need a conventional program to start with so that I have a baseline against which I can test my Verilog code output, and to generate test cases, and to generate intermediate results for debugging.

Thanks for your suggestion,

Ron
rdotson is offline   Reply With Quote
Old 2005-09-24, 07:24   #10
rdotson
 
rdotson's Avatar
 
Jul 2005

23×5 Posts
Lightbulb

I just had a funny idea. I may be bordering on the verge of getting booted off this forum, but it occurred to me that the only way I'm going to convince the mods to start a separate subforum for Hardware Design Languages (Verilog, VHDL, SystemC) as I mentioned in another thread somewhere, is to turn ALL of you into VLSI designers.

Here's how I'm going to do it. First, go get a copy of the free Icarus Verilog simulator from http://www.icarus.com/eda/verilog/ Next, copy the "igcdex.v" file I posted above and the two I'm posting with this message (Ecm.h and testIgcdex.v) into a directory on your computer somewhere.

Install the Icarus simulator, and then run my test program (testIgcdex.v) using the simulator. Remove the "//" comment marks from the $display lines if you like (ie; change them from //$display to just $display) so that you can see the progress of the program in clock cycle time. Tinker around with it. Change the bus width and other parameters, and play with it until you're hooked and start writing your own Verilog code!

-- Ron


Ecm.h
Code:
// Ecm.h
`define L 32  // Bus Width
`define K 31  // Bus Width-1
testIgcdex.v
Code:
// testIgcdex.v, test IGCDEX module
// (c) July 13, 2005, Ron Dotson

`include "ECM.h"

module test;
parameter N= `L; // Bus Width
reg  reset, clock;
reg  [N-1:0] in1, in2;
reg  [2*N-1:0] expected;
wire [2*N-1:0] answer;
wire ready;
integer clocks;

reg  [N-1:0] x, y;
reg  [N-1:0] g_expected, a_expected, b_expected;
wire [N-1:0] g, a, b;
wire finished;

IGCDEX igcdex1(reset,clock,x,y,    a,b,g,finished);  // g=GCD(x,y), g= a*x + b*y

initial
begin
clock=0;
clocks=0;
reset=0;

// Test Case 1
//x= 34011;    // setup inputs (0x84DB)
//y= 24867;    // setup inputs (0x6123) v
//g_expected= 9;             // g:=GCD(x,y)
//a_expected= -1270;
//b_expected= 1737;

// Test Case 2
//x= 6;    // setup inputs
//y= 52961;
//g_expected= 1;          // g:=GCD(x,y)
//a_expected= -1;         // g= a*x + b*y
//b_expected= 8827;

// Test Case 3
//x= -14722;    // setup inputs
//y= 52961;
//g_expected= 1;          // g:=GCD(x,y)
//a_expected= 4933;         // g= a*x + b*y
//b_expected= 17746;

// Test Case 4
x= 0;    // setup inputs
y= 52961;
g_expected= 52961;          // g:=GCD(x,y)
a_expected= 0;         // g= a*x + b*y
b_expected= 1;


$display("\ntestIgcdex: Starting Test, x=%X, y=%X\n",x,y);
#10 reset=1;
#10 clock=1;          // latch inputs
#10 reset=0;
#10 clock=0;
while (~finished)
    begin
    #10 clock=1;
    #10 clock=0;
    clocks=clocks+1;
    if (clocks > 100000)
        $stop();
    end

if (g===g_expected & a===a_expected & b===b_expected)
    begin
    $display("\ntestIgcdex: PASS\n");
    end
else
    begin
    $display("\ntestIgcdex: FAIL");
    $display("\tInputs to IGCDEX: x=%d, y=%d", x,y);
    $display("\tOutputs:  g=%d, a=%d, b=%X hex, Where: g=IGCDEX(x,y,'a,'b')", g, a, b);
    $display("\tExpected: g=%d, a=%d, b=%X hex\n", g_expected, a_expected, b_expected);
    end
end
endmodule
rdotson is offline   Reply With Quote
Old 2005-09-24, 07:42   #11
rdotson
 
rdotson's Avatar
 
Jul 2005

1010002 Posts
Default Oops, two more files you'll need:

multiplier.v
Code:
//==================================================================
// multiplier.v,  Compute: out= in1*in2 + in3;
// (c) Sept 17, 2005, Ron Dotson
//==================================================================

`include "ECM.h"

module MULTIPLIER(reset,clock,in1,in2,in3,  accum,ready);
parameter N= `L; // Bus Width
parameter S0=0,S1=1,S2=2,S3=3, S4=4,S5=5,S6=6,S7=7;
input reset, clock;
input [N-1:0] in1, in2, in3;
output reg [2*N-1:0] accum;
output reg ready;

reg accumNegative;  // of "accum" output
reg [2:0] state;
reg [N-1:0]   a, i, in2Minus, allOnes;
reg [2*N-1:0] c, bShifted;  // bShifted= in2*2^i

always @(posedge clock)
begin
if (reset==1)
    begin
    ready<=0;
    in2Minus<= -in2;
    accum<= 0;
    allOnes<= -1;
    state<= S0;
    end       // "if (reset)"
else
    begin
    case (state)
        S0: begin
            if ((|in1[N-1:0]) & (|in2[N-1:0]))     // Neither in1 nor in2 = zero
                begin
                i<=0;
                accumNegative<= in1[N-1] ^ in2[N-1];
                state<= S1;
                end
            else                 // either in1==0 or in2==0
                begin
                state<= S2;
                end
            end

        S1: begin      // neither in1 nor in2 = zero
            state<= S3;
            if (in1[N-1])                     // if in1 negative
                begin
                a<= -in1; // make positive
                end
            else
                begin
                a<= in1;
                end

            if (in2[N-1])                     // if in2 negative
                begin
                bShifted[2*N-1:0]<= {`L'd0,in2Minus};  // make positive
                end
            else
                begin
                bShifted[2*N-1:0]<= {`L'd0,in2[N-1:0]};
                end

            if (in3[N-1]) // negative (NOTE THAT "c" RETAINS IT'S SIGN)
                begin
                c[2*N-1:0]<= {allOnes[N-1:0],in3[N-1:0]}; // extend to double precision
                end
            else                               // else positive
                begin
                c[2*N-1:0]<= {`L'd0,in3[N-1:0]}; // extend to double precision
                end
            end

        S2: begin   // either in1==0 or in2==0
            if (in3[N-1])    // if in3 negative
                begin
                accum<= {allOnes[N-1:0],in3[N-1:0]};
                end
            else
                begin
                accum<= {`L'd0,in3[N-1:0]};
                end
            state<=S4;        // Finished!
            end

        S3: begin
            if (i < N)
                begin
                state<= S3;        // Loop
                //$display("\tOperands: in1=%X, in2=%X, in3=%X\n\taccum=%d, i=%d\n", in1,in3,in3,accum,i);
                bShifted<= {bShifted[2*N-2:0],1'b0}; // shift left one
                if (a[i])
                    accum<= accum + bShifted;
                i<= i+1;
                end
            else    // accum= in1*in2
                begin
                state<= S4;        // Finished!
                if (accumNegative)
                    accum<= c - accum;
                else
                    accum<= accum + c;
                end
            end

        S4: begin
            ready<=1;        // Finished!
            end
    endcase
    end
end          // "always @(posedge clock)"
endmodule    // Multiplier
divider.v
Code:
// divider.v, return: dividend= op1/op2, remainder= op1 % op2;
// Note: I wrote this division routine because I needed a quick and dirty way
// to do a divide. Eventually I'll replace it with something more efficient
// but it's performance is not too shabby. It basically takes advantage
// of the equivalence between binary powering and shifting. One of the operands
// ("Little_ShiftedLeft" in this case) is shifted left by one bit position per clock,
// and then right by one bit position per clock
// (c) July 02, 2005, Ron Dotson
//`include "ECM.h"

`include "ECM.h"

module DIVIDER(reset,clock,op1,op2,   dividend,remainder,finished);
parameter N= `L; // Bus Width
input      reset, clock;
input      [N-1:0] op1, op2;
output reg [N-1:0] dividend, remainder;
output reg finished;

reg        [N-1:0] Big, Little_ShiftedLeft, shiftMask;
reg        shiftDir;           // shift direction: 0=Left, 1=Right

always @(posedge clock)
if (reset==1)
    begin
    Little_ShiftedLeft= 0;
    shiftMask=1;               // shiftMask= 2^x, Where x= #Bit positions "Little" has been shifted left
    shiftDir=0;                // Note that "Little_ShiftedLeft" = shiftMask*Little
    finished=0;
    dividend=0;
    remainder=0;
    Big= op1;
    Little_ShiftedLeft= op2;
    end
else
    begin
    if (~finished)
        begin                 // on *both* posedge and negedge of clock
        if (~shiftDir)        // if shifting Left
            begin
            if (Big >= Little_ShiftedLeft)
                begin
                //$display("DIVIDER 1: Big=0x%X, Little_ShiftedLeft=0x%X", Big,Little_ShiftedLeft);
                Big= Big-Little_ShiftedLeft; // subtractions while shifting left are for "free," so may as well do it
                dividend= dividend + shiftMask;
                Little_ShiftedLeft= {Little_ShiftedLeft[N-2:0],1'b0};        // shift "Little_ShiftedLeft" Left one
                shiftMask= {shiftMask[N-2:0],1'b0};                          // shift "shiftMask" Left one
                //$display("DIVIDER 2: Big=0x%X, Little_ShiftedLeft=0x%X,\n\t   shiftMask=%d, dividend=0x%X", Big,Little_ShiftedLeft,shiftMask,dividend);
                end
            else              // Big<Little_ShiftedLeft; so start shifting Right
                begin
                shiftDir=1;                      // shift Right
                Little_ShiftedLeft= {1'b0,Little_ShiftedLeft[N-1:1]};        // shift "Little_ShiftedLeft" Right one
                shiftMask= {1'b0,shiftMask[N-1:1]};                          // shift "shiftMask" Right one
                //$display("DIVIDER 3: Big=0x%X, Little_ShiftedLeft=0x%X,\n\t   shiftMask=%d, dividend=0x%X", Big,Little_ShiftedLeft,shiftMask,dividend);
                end
            end               // end shifting Left
        else                  // shifting Right
            begin
            if (|shiftMask)  // if shiftMask!=0 and shifting Right
                begin
                if (Big >= Little_ShiftedLeft)
                    begin
                    Big= Big-Little_ShiftedLeft;
                    dividend= dividend + shiftMask;
                    //$display("DIVIDER 4: Big=0x%X, Little_ShiftedLeft=0x%X,\n\t   shiftMask=%d, dividend=0x%X", Big,Little_ShiftedLeft,shiftMask,dividend);
                    end
                Little_ShiftedLeft= {1'b0,Little_ShiftedLeft[N-1:1]};        // shift "Little_ShiftedLeft" Right one
                shiftMask= {1'b0,shiftMask[N-1:1]};                          // shift "shiftMask" Right one
                //$display("DIVIDER 5: Big=0x%X, Little_ShiftedLeft=0x%X,\n\t   shiftMask=%d, dividend=0x%X", Big,Little_ShiftedLeft,shiftMask,dividend);
                end
            else                  // shifting Right and shiftMask=0 => Finished
                begin
                remainder= Big;
                finished= 1;
                //$display("\nDIVIDER 6: Big=0x%X, Little_ShiftedLeft=0x%X,\n\t   shiftMask=%d, dividend=0x%X", Big,Little_ShiftedLeft,shiftMask,dividend);
                //$display("DIVIDER 7: dividend=%d (0x%X), remainder=%d (0x%X)", dividend,dividend,remainder,remainder);
                end
            end                   // shifting Right
        end                       // "~finished"
    end
endmodule
rdotson is offline   Reply With Quote
Reply

Thread Tools


Similar Threads
Thread Thread Starter Forum Replies Last Post
Basic Number Theory 7: idempotents and Euler's function Nick Number Theory Discussion Group 17 2016-12-01 14:27
Suggestions on good number theory books??? Erasmus Other Mathematical Topics 12 2010-04-13 23:33
Number Theoretic Transforms paulunderwood Software 7 2006-09-19 15:56
ECM/FPGA Implementation rdotson Hardware 12 2006-03-26 22:58
Search for a number theoretic function related to "prime divisor sums" juergen Math 2 2004-07-10 23:01

All times are UTC. The time now is 21:51.

Tue Nov 24 21:51:21 UTC 2020 up 75 days, 19:02, 4 users, load averages: 3.07, 3.09, 3.12

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2020, Jelsoft Enterprises Ltd.

This forum has received and complied with 0 (zero) government requests for information.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.