20030301, 22:58  #1 
Mar 2003
Yucaipa, CA, USA
2^{3} Posts 
Using Motorola 7410s to factor numbers or test for primality
Hi,
I was interested in factoring large Mersenne prime canidates using the Motorola 7410 AltiVec processor. I suspect they would not be too good running Lucas (prime testing), since the device is only 32 bits single precision. If I'm wrong, let me know. The core processor is 64 bit but the real power lies in the AltiVec vector ALUs, which are 32 bits, integer or floating point. If there is a ANSI C or C++ version of a factoring program that uses the AltiVec hardware on the PPC 7410 I'd like to compile it and use it. Any suggestions? What is the best ANSI C program out there that factors large, (2^45,000,000)  1, numbers for any processor. Thank You, Nick 
20030302, 02:22  #2 
Aug 2002
2^{6}×3^{3}×5 Posts 
I've wondered for a while if this paper had any significance for LL testing...
http://developer.apple.com/hardware/ve/pdf/g4fft.pdf 
20030303, 01:53  #3  
Aug 2002
223 Posts 
I posted at the Glucas site on the link Xyzzy a while back:
Glucas Sourceforge Tracker On the topic of Altivec, at the bottom of the above link: Quote:


20030303, 02:04  #4 
Aug 2002
223_{10} Posts 
I just read my old link that I posed on Sourceforge, and they have this doc on there:
Octupleprecision floatingpoint on Apple G4 Abscract: We describe herein a G4 Velocity Engine (Altivec) implementation of "octprecision," i.e. 256bit floatingpoint operations. (We speak of 32bit exponents and 224bit mantissas.) We present performance benchmarks in comparison to an existing C++ library. The basic result is that Altivecbased octprecision can run about 4x faster than a scalar implementation of the same precision; a 500 Mhz.. G4 can therefore perform at 510 Mocts (million octops per second). http://developer.apple.com/hardware/ve/pdf/oct3a.pdf If Altivec isn't good for double precision, would it be worth going octprecision? :D :D :D 
20030303, 02:16  #5 
Oct 2002
Lost in the hills of Iowa
2^{6}×7 Posts 
Isn't trial factoring an integer operation?
I'd think AltiVec code should handle THAT quite well  there are good reasons why the AltiVec G4s are *mosterously* faster than anything else on RC5 crunching. 
20030303, 21:53  #6  
∂^{2}ω=0
Sep 2002
República de California
2^{2}×2,939 Posts 
Quote:
Quote:
Now let's look at PPC and AltiVec (which Klaus Kastens is currently helping me port my Cbased TF code to). PPC (at least the more recent models, e.g. the 7450) is pretty good at 32bit integer math  for instance, 32bit integer multiply needs 2 cycles (pipelined). But, PPC needs two separate multiply operations to get a 64bit product of 32bit inputs; one to get the lower 32 bits of the result, one to get the upper 32. That means (assuming perfect pipelining) 4 cycles to get the 64bit product; 4 of these need 16 cycles. Assembling the resulting pieces is also nontrivial, since 64bit integer adds must also be emulated using 32bit hardware operations, and the carry from the lower 32bit sum into the upper half serializes the code, making it difficult to keep all of the multiple integer units of the CPU busy. But let's assume we can code things perfectly and actually get a 128bit product in just 16 cycles. That is about as fast as we can do the same on a P3 or Alpha 21164 or MIPS R10000 (not sure if the R15000 is any better than the R10K at integer multiply, specifically the doublewide DMULTU operation), but is a factor of 8 slower than Alpha 21264, which needs just 2 pipelined cycles to get a 128bit product. Whether the AltiVec can help out is still an open question. On the one hand its multiply functionality is even more limited than that of the PPC core  it can only get the lower half of a 64bit product of 32bit ints. On the other hand (IIRC) it can do 4 of these at a time. OTOOH (I'm running out of hands here :)), I believe it can only do the 4way multiply if we have 4 32bit inputs all getting multiplied by the same 32bit number, thus it may be difficult to take advantage of the 4way SIMD capability for the purposes of factoring. I still hold out hope that it could help in some way, but even in my most wildly optimistic dreams we're talking perhaps a 2X speedup over the PPC core alone. 

20030304, 15:41  #7  
Aug 2002
223 Posts 
Quote:
Thanks so much for the reply, it's will go in my bag of linkage for future reference (since this question keeps poping up every once and a while). I guess those of us that farm P4's but use PPC's will just have to deal with the fact that the G4's are optimized for media, not crunching. Maybe the 970 will be a boon, assuming a bunch of assumptions actually happen between IBM and Apple. :D 

20030304, 16:08  #8 
Aug 2002
2^{6}×3^{3}×5 Posts 
Those of us with the G3 (Mine is a 750FX) are really up a creek...

Thread Tools  
Similar Threads  
Thread  Thread Starter  Forum  Replies  Last Post 
Conjectured Primality Test for Specific Class of Mersenne Numbers  primus  Miscellaneous Math  1  20141012 09:25 
Pretty Fast Primality Test (for numbers = 3 mod 4)  tapion64  Miscellaneous Math  40  20140420 05:43 
Proof of Primality Test for Fermat Numbers  princeps  Math  15  20120402 21:49 
The fastest primality test for Fermat numbers.  Arkadiusz  Math  6  20110405 19:39 
A primality test for Fermat numbers faster than Pépin's test ?  T.Rex  Math  0  20041026 21:37 