#1 |
Mar 2003
Yucaipa, CA, USA
23 Posts |
Hi,
I am interested in factoring large Mersenne prime candidates using the Motorola 7410 AltiVec processor. I suspect it would not be very good at running Lucas (prime testing), since the device is only 32 bits single precision. If I'm wrong, let me know. The core processor is 64 bit, but the real power lies in the AltiVec vector ALUs, which are 32 bits, integer or floating point.
If there is an ANSI C or C++ version of a factoring program that uses the AltiVec hardware on the PPC 7410, I'd like to compile it and use it. Any suggestions? What is the best ANSI C program out there, for any processor, that factors large numbers such as (2^45,000,000) - 1?
Thank you, Nick |
#2 |
Aug 2002
2^6×3^3×5 Posts |
I've wondered for a while if this paper had any significance for LL testing...
http://developer.apple.com/hardware/ve/pdf/g4fft.pdf |
#3 | |
Aug 2002
223 Posts |
I posted at the Glucas site on the link Xyzzy a while back:
Glucas Sourceforge Tracker
On the topic of Altivec, at the bottom of the above link:
Quote:
#4 |
Aug 2002
223 Posts |
I just read my old link that I posted on Sourceforge, and they have this doc on there:
Octuple-precision floating-point on Apple G4
Abstract: We describe herein a G4 Velocity Engine (Altivec) implementation of "oct-precision," i.e. 256-bit floating-point operations. (We speak of 32-bit exponents and 224-bit mantissas.) We present performance benchmarks in comparison to an existing C++ library. The basic result is that Altivec-based oct-precision can run about 4x faster than a scalar implementation of the same precision; a 500 MHz G4 can therefore perform at 5-10 Mocts (million oct-ops per second).
http://developer.apple.com/hardware/ve/pdf/oct3a.pdf
If Altivec isn't good for double precision, would it be worth going oct-precision? :D :D :D |
#5 |
Oct 2002
Lost in the hills of Iowa
2^6×7 Posts |
Isn't trial factoring an integer operation?
I'd think AltiVec code should handle THAT quite well - there are good reasons why the AltiVec G4s are *monstrously* faster than anything else on RC5 crunching. |
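To make the "integer operation" point concrete, here is a minimal sketch of trial factoring a Mersenne number 2^p - 1 for an odd prime p. It relies on two standard facts: any factor q has the form 2kp + 1 with q = 1 or 7 (mod 8), and q divides 2^p - 1 exactly when 2^p mod q = 1. The names `pow2_mod` and `trial_factor` are hypothetical, and the sketch assumes q < 2^32 so that every product fits in 64 bits; code for exponents near 45,000,000 would need much wider factor candidates and a sieve.

```c
#include <stdint.h>

/* Compute 2^p mod q by left-to-right binary exponentiation.
   Assumption: q < 2^32, so r*r always fits in a 64-bit integer. */
static uint32_t pow2_mod(uint32_t p, uint32_t q)
{
    uint64_t r = 1;
    int bit;

    for (bit = 31; bit >= 0; bit--) {
        r = (r * r) % q;            /* square for every exponent bit */
        if ((p >> bit) & 1u)
            r = (2 * r) % q;        /* multiply by the base 2 on a set bit */
    }
    return (uint32_t)r;
}

/* Return the smallest q = 2kp + 1 (q <= limit) that divides 2^p - 1,
   or 0 if no such q is found. p is assumed to be an odd prime. */
static uint32_t trial_factor(uint32_t p, uint32_t limit)
{
    uint64_t q;

    for (q = 2ull * p + 1; q <= limit; q += 2ull * p) {
        uint32_t m = (uint32_t)(q % 8);

        if (m != 1 && m != 7)
            continue;               /* factors must be +-1 mod 8 */
        if (pow2_mod(p, (uint32_t)q) == 1)
            return (uint32_t)q;     /* 2^p = 1 (mod q), so q | 2^p - 1 */
    }
    return 0;
}
```

For example, `trial_factor(11, 1000)` finds the factor 23 of 2^11 - 1 = 2047. Everything here is multiply, add, shift and compare, which is why the width and speed of the integer multiplier dominate trial-factoring performance.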
#6 | ||
∂2ω=0
Sep 2002
República de California
2^2×2,939 Posts |
Quote:
Quote:
Now let's look at PPC and AltiVec (which Klaus Kastens is currently helping me port my C-based TF code to). PPC (at least the more recent models, e.g. the 7450) is pretty good at 32-bit integer math - for instance, 32-bit integer multiply needs 2 cycles (pipelined). But PPC needs two separate multiply operations to get a 64-bit product of 32-bit inputs: one to get the lower 32 bits of the result, one to get the upper 32. That means (assuming perfect pipelining) 4 cycles to get the 64-bit product. A 128-bit product of two 64-bit inputs needs 4 of these 64-bit products, hence 16 cycles.
Assembling the resulting pieces is also nontrivial, since 64-bit integer adds must also be emulated using 32-bit hardware operations, and the carry from the lower 32-bit sum into the upper half serializes the code, making it difficult to keep all of the multiple integer units of the CPU busy. But let's assume we can code things perfectly and actually get a 128-bit product in just 16 cycles. That is about as fast as we can do the same on a P3 or Alpha 21164 or MIPS R10000 (not sure if the R15000 is any better than the R10K at integer multiply, specifically the double-wide DMULTU operation), but is a factor of 8 slower than Alpha 21264, which needs just 2 pipelined cycles to get a 128-bit product.
Whether the AltiVec can help out is still an open question. On the one hand its multiply functionality is even more limited than that of the PPC core - it can only get the lower half of a 64-bit product of 32-bit ints. On the other hand (IIRC) it can do 4 of these at a time. OTOOH (I'm running out of hands here :)), I believe it can only do the 4-way multiply if we have 4 32-bit inputs all getting multiplied by the same 32-bit number, thus it may be difficult to take advantage of the 4-way SIMD capability for the purposes of factoring. I still hold out hope that it could help in some way, but even in my most wildly optimistic dreams we're talking perhaps a 2X speedup over the PPC core alone. |
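The multiply scheme described above can be sketched in portable C. This is only an illustration under stated assumptions, not the TF code mentioned in the post: `mul32x32` stands in for the pair of PPC instructions that return the low and high words of a 32x32 product (`mullw` and `mulhwu`), and `u128_t` and `mul64x64` are hypothetical names. The 64-bit accumulator `t` is a convenience of the sketch; on 32-bit hardware each of those additions is itself a carry-propagating pair of 32-bit adds.

```c
#include <stdint.h>

/* 128-bit result as four 32-bit words, w[0] least significant. */
typedef struct { uint32_t w[4]; } u128_t;

/* One 32x32 -> 64 product, returned as two halves to mirror the two
   separate hardware multiplies (low word, high word). */
static void mul32x32(uint32_t a, uint32_t b, uint32_t *lo, uint32_t *hi)
{
    uint64_t p = (uint64_t)a * b;

    *lo = (uint32_t)p;
    *hi = (uint32_t)(p >> 32);
}

/* 64x64 -> 128 product built from four 32x32 partial products. */
static u128_t mul64x64(uint64_t x, uint64_t y)
{
    uint32_t x0 = (uint32_t)x, x1 = (uint32_t)(x >> 32);
    uint32_t y0 = (uint32_t)y, y1 = (uint32_t)(y >> 32);
    uint32_t l00, h00, l01, h01, l10, h10, l11, h11;
    uint64_t t;
    u128_t r;

    /* The four partial products are independent and can be pipelined. */
    mul32x32(x0, y0, &l00, &h00);
    mul32x32(x0, y1, &l01, &h01);
    mul32x32(x1, y0, &l10, &h10);
    mul32x32(x1, y1, &l11, &h11);

    /* Column sums: each step needs the carry from the previous one,
       which is the serialization the post refers to. */
    r.w[0] = l00;
    t = (uint64_t)h00 + l01 + l10;
    r.w[1] = (uint32_t)t;
    t = (t >> 32) + h01 + h10 + l11;
    r.w[2] = (uint32_t)t;
    t = (t >> 32) + h11;
    r.w[3] = (uint32_t)t;
    return r;
}
```

The four calls to `mul32x32` correspond to the eight hardware multiplies counted in the post (two per 64-bit partial product), while the three column sums form the dependent carry chain that is hard to spread across multiple integer units.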
#7 | |
Aug 2002
223 Posts |
Quote:
Thanks so much for the reply, it will go in my bag of linkage for future reference (since this question keeps popping up every once in a while). I guess those of us that farm P4's but use PPC's will just have to deal with the fact that the G4's are optimized for media, not crunching. Maybe the 970 will be a boon, assuming a bunch of assumptions actually happen between IBM and Apple. :D |
#8 |
Aug 2002
2^6×3^3×5 Posts |
Those of us with the G3 (Mine is a 750FX) are really up a creek...