mersenneforum.org  

Go Back   mersenneforum.org > Great Internet Mersenne Prime Search > Hardware

Closed Thread
 
Thread Tools
Old 2008-12-01, 09:41   #89
henryzz
Just call me Henry
 
henryzz's Avatar
 
"David"
Sep 2007
Cambridge (GMT/BST)

2×33×109 Posts
Default

i thought George said somewhere that prime95 uses all integer arithmetic or something like that
henryzz is online now  
Old 2008-12-01, 15:42   #90
axn
 
axn's Avatar
 
Jun 2003

2·3·7·112 Posts
Default

Quote:
Originally Posted by cheesehead View Post
As you correctly point out, the claim in post #1 that "... any project that works by multiplying large numbers (LLR, GIMPS) must use fast Fourier transforms (FFTs) implemented in double precision (DP) floating point" needs to be amended. It is, of course, quite possible to use integers entirely in the Fourier transforms. The statement as it stands is especially misleading in a FAQ about computations performed by hardware other than general-purpose CPUs.
Quote:
Originally Posted by jasonp View Post
Q: What is this 'Number Theoretic Transform' that the threads I dutifully read above mention?

A: After explaining at great length why you have to have double precision floating point to do this FFT stuff, I'll backpedal and say that floating point arithmetic is not needed to compute the equivalent of an FFT multiply. You can instead use modular math in the finite field of elements modulo a prime integer. This sounds great, but instead of floating point multiplication you must instead use modular multiplication of integers. There are many tricks available to make this fast, but they all require 32-bit or 64-bit integer multiplication, that produces a 64-bit or 128-bit product. This unfortunately is not an operation needed in graphics cards or most other special processors, so it's not implemented in dedicated hardware, so it's not fast. In fact, we use floating point because most processors, general-purpose or special-purpose, don't care much about integer multiply throughput but care a great deal about making floating point fast. We go where the hardware is, and NTTs are not at present a good substitute for the way things are done now.
The OP is rather more comprehensive than you realize.
axn is offline  
Old 2008-12-01, 15:45   #91
axn
 
axn's Avatar
 
Jun 2003

2·3·7·112 Posts
Default

Quote:
Originally Posted by henryzz View Post
i thought George said somewhere that prime95 uses all integer arithmetic or something like that
Only for Trial Factoring. LL (and P-1 and ECM) use DP floating point FFT. However, Prime95's world-beating speed isn't just about usage of floating point. It took years of handtuning assembly code to optimize memory prefetch and what nots before the code got to where it is now.

It would be really interesting to benchmark and compare _all_ the versions of P95 on all the CPUs (past and present)

Last fiddled with by axn on 2008-12-01 at 15:48
axn is offline  
Old 2008-12-01, 19:48   #92
Prime95
P90 years forever!
 
Prime95's Avatar
 
Aug 2002
Yeehaw, FL

5×11×137 Posts
Default

Quote:
Originally Posted by henryzz View Post
i thought George said somewhere that prime95 uses all integer arithmetic or something like that
The very, very first version in 1995 - never released to the public - used all integer arithmetic. Then I learned about IBDWT and have been using floating point ever since.
Prime95 is offline  
Old 2008-12-01, 20:16   #93
henryzz
Just call me Henry
 
henryzz's Avatar
 
"David"
Sep 2007
Cambridge (GMT/BST)

2·33·109 Posts
Default

i wonder what it was i was thinking about then?
henryzz is online now  
Old 2008-12-01, 23:28   #94
Batalov
 
Batalov's Avatar
 
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2

100101000101102 Posts
Default

There are parts in George's code that are all integer and are still used, e.g. for Generalized Fermat PRP in openpfgw. If you try
> pfgw -f0 -q"10^8192+1"
you will see the message
Switching to special GFN DWT logic
and the calculations are done in GWIntegers. I haven't read too deep into the source though. Note that this acts as a kludge, and most people never used this part of the source, - because they would enter 10^(2^23)+1 and then gwPRP_GFN is not called (the code expects TWO numbers, no additional parsing). It is also not faster than other methods, unless you stretch the trick and realize that you can change the base until gwPRP_GFN() is actually faster (at least within the framework of pfgw). Here's how I actually checked the GFN(10,23) --
> pfgw -q"10000^2097152+1"
Geoff Reynolds used LR (afaik) for a similar task (GFN(12,22) which I DC'd) and the speeds were compatible. (But only if you use a raised base, 12^4, in this case.)

Anyway, the point is just that some things can be computed in all integers not just theoretically (and not in spite of common sense). For me it was fun, because I could run it on some old PC which I wouldnt use for anything else - it ran over a month.

Here's a homework for someone - implement integer PRP GFN in GPU and test GFN(10,24) and GFN(12,24), all right? Good luck!

--Serge
Batalov is offline  
Old 2008-12-02, 11:33   #95
cheesehead
 
cheesehead's Avatar
 
"Richard B. Woods"
Aug 2002
Wisconsin USA

22·3·641 Posts
Default

Quote:
Originally Posted by axn View Post
The OP is rather more comprehensive than you realize.
... and. therefore, if the not-necessarily-DP-FP information is not to be moved up into the offending paragraph to replace the misleadingly-unconditional declarative statement, then at least there should be a correction and note added, such as:

"... any project that works by multiplying large numbers (LLR, GIMPS) must will usually use fast Fourier transforms (FFTs) implemented in double precision (DP) floating point, but for some alternatives see the question "What is this 'Number Theoretic Transform' that the threads I dutifully read above mention?" below."

so that the same realization will not escape others. :-)

Last fiddled with by cheesehead on 2008-12-02 at 11:36
cheesehead is offline  
Old 2008-12-02, 15:30   #96
jasonp
Tribal Bullet
 
jasonp's Avatar
 
Oct 2004

3·1,181 Posts
Default

Quote:
Originally Posted by cheesehead View Post
at least there should be a correction and note added, such as:
Note added.

Sheesh.

PS: Henry, in the early days there was a lot of dicussion on the GIMPS mailing lists about using NTTs to perform LL tests; perhaps you read that discussion in an archive somewhere?

Last fiddled with by jasonp on 2008-12-02 at 15:35
jasonp is offline  
Old 2008-12-02, 16:59   #97
henryzz
Just call me Henry
 
henryzz's Avatar
 
"David"
Sep 2007
Cambridge (GMT/BST)

2·33·109 Posts
Default

Quote:
Originally Posted by jasonp View Post
Note added.

Sheesh.

PS: Henry, in the early days there was a lot of dicussion on the GIMPS mailing lists about using NTTs to perform LL tests; perhaps you read that discussion in an archive somewhere?
no it wasnt that
it was something i read in the last couple of months
possibly in this thread or a similar thread
henryzz is online now  
Old 2008-12-15, 01:14   #98
jasong
 
jasong's Avatar
 
"Jason Goatcher"
Mar 2005

3·7·167 Posts
Default

Quote:
Q: What is this 'Number Theoretic Transform' that the threads I dutifully read above mention?

A: After explaining at great length why you have to have double precision floating point to do this FFT stuff, I'll backpedal and say that floating point arithmetic is not needed to compute the equivalent of an FFT multiply. You can instead use modular math in the finite field of elements modulo a prime integer. This sounds great, but instead of floating point multiplication you must instead use modular multiplication of integers. There are many tricks available to make this fast, but they all require 32-bit or 64-bit integer multiplication, that produces a 64-bit or 128-bit product. This unfortunately is not an operation needed in graphics cards or most other special processors, so it's not implemented in dedicated hardware, so it's not fast. In fact, we use floating point because most processors, general-purpose or special-purpose, don't care much about integer multiply throughput but care a great deal about making floating point fast. We go where the hardware is, and NTTs are not at present a good substitute for the way things are done now.
I'm not sure how long this "answer" has been here, but it's simply wrong, the AMD based LLR program I mentioned an hour or so ago in another thread uses this method. Saying it isn't fast based on opinion is even more of a religious belief than many Christians belief that Christ died and rose again. There's NO evidence that the "integer transform is slower" belief is true. It's simply people parroting each other, though I'll admit that they're parroting for the same reason I'm parroting, the reputation of the person they got the information from.
jasong is offline  
Old 2008-12-15, 01:56   #99
Batalov
 
Batalov's Avatar
 
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2

2·47·101 Posts
Default

Quote:
Originally Posted by jasong View Post
It's simply people parroting each other, though I'll admit that they're parroting for the same reason I'm parroting, the reputation of the person they got the information from.
This is a fine example of a belief that you have, based on nothing other than your own experience.
Did you ever try Scientific_method? Or you can only retell your friend's stories?
Attached Thumbnails
Click image for larger version

Name:	stand_back.png
Views:	71
Size:	16.8 KB
ID:	3022  
Batalov is offline  
Closed Thread



Similar Threads
Thread Thread Starter Forum Replies Last Post
New PC dedicated to Mersenne Prime Search Taiy Hardware 12 2018-01-02 15:54
The prime-crunching on dedicated hardware FAQ (II) jasonp Hardware 46 2016-07-18 16:41
How would you design a CPU/GPU for prime number crunching? emily Hardware 4 2012-02-20 18:46
DSP hardware for number crunching? ixfd64 Hardware 15 2011-08-09 01:11
Optimal Hardware for Dedicated Crunching Computer Angular Hardware 5 2004-01-16 12:37

All times are UTC. The time now is 21:19.


Sun Aug 1 21:19:32 UTC 2021 up 9 days, 15:48, 0 users, load averages: 1.77, 1.67, 1.60

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.

This forum has received and complied with 0 (zero) government requests for information.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.