#12 | |
|
Apr 2003
Berlin, Germany
19² Posts |
Quote:
I see the most difficulty in making the code run in parallel, as on an SMT-capable CPU. Loops, function calls and the like would have to be combined carefully. SMT-capable CPUs could work well in this case, but they could also make things worse, because the threads can't track each other's progress to avoid cache thrashing and other side effects of such huge datasets. Did you already write, or try to write, an implementation of such a "hybrid" transform?
Matthias |
#13 | |
|
∂²ω=0
Sep 2002
República de California
11639₁₀ Posts |
Quote:
#14 | |
|
Apr 2003
Berlin, Germany
101101001₂ Posts |
Quote:
I created an open source project for optimizing the code of other open source projects for AMD64 platforms. Maybe by providing a specially optimized client we could win over some of the enthusiasts who will buy such a platform. I don't know many facts about the distributed RSA cracking effort, but I'm sure many people joined just because they knew their machine was good at the task. And maybe it's possible to create a fast integer convolution that could help in other projects. I think the AMD Core Math Library is optimized to the max, but not for some special cases (if the FPU code is perfectly optimized, nobody checks whether the integer unit could help).
Matthias |
#15 | ||
|
∂²ω=0
Sep 2002
República de California
103·113 Posts |
Quote:
1) I ran on a 21164, which is 8x slower at integer mul than the 21264;
2) I expect a C compiler to do a much better job with the integer instructions, since Fortran compiler technology (at least in the past 20-30 years) has been driven mainly by scientific computing, which is dominated by floating-point math. And C at least provides a reasonable way to include ASM macros, so as a last resort one can bypass the compiler that way if it's doing a poor job with crucial parts of the computation.
Quote:
http://www.mail-archive.com/mersenne%40base.com/msg07322.html
http://www.mail-archive.com/mersenne%40base.com/msg07320.html |
#16 | |||
|
Apr 2003
Berlin, Germany
169₁₆ Posts |
Quote:
Quote:
Quote:
#17 |
|
Apr 2003
Berlin, Germany
19² Posts |
Now I have a starting point (a C implementation) for the FGT. I will try to add weighting to it according to Crandall's papers. Then some optimized mod operations will follow.
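To make "optimized mod operations" concrete, here is a rough sketch of the kind of arithmetic an FGT needs: reduction modulo a Mersenne prime by shift-and-add, and a three-multiply product of Gaussian integers a + bi over that prime. This is not the implementation mentioned above. The modulus 2^61 - 1 and all names (`M61`, `m61_mul`, `gfp2`, `gfp2_mul`) are assumptions chosen for illustration, and `unsigned __int128` is a GCC/Clang extension rather than standard C.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: p = 2^61 - 1 is only an example Mersenne prime;
   the thread does not say which modulus the FGT code uses. */
#define M61 ((UINT64_C(1) << 61) - 1)

/* Fold a 64-bit value into [0, p). Since 2^61 == 1 (mod p), the bits
   above position 61 can simply be added back onto the low 61 bits. */
static uint64_t m61_reduce(uint64_t x)
{
    x = (x & M61) + (x >> 61);
    return x >= M61 ? x - M61 : x;
}

static uint64_t m61_add(uint64_t a, uint64_t b)
{
    return m61_reduce(a + b);            /* a, b < p, so no 64-bit overflow */
}

static uint64_t m61_sub(uint64_t a, uint64_t b)
{
    return m61_reduce(a + M61 - b);      /* add p first to stay non-negative */
}

/* Modular product via the double-width (128-bit) result, with no division. */
static uint64_t m61_mul(uint64_t a, uint64_t b)
{
    assert(a < M61 && b < M61);
    unsigned __int128 t = (unsigned __int128)a * b;   /* t < 2^122 */
    uint64_t lo = (uint64_t)t & M61;                  /* low 61 bits */
    uint64_t hi = (uint64_t)(t >> 61);                /* remaining bits, < 2^61 */
    return m61_reduce(lo + hi);
}

/* Element re + im*i of GF(p^2), with i*i = -1. */
typedef struct { uint64_t re, im; } gfp2;

/* (a+bi)(c+di) = (ac - bd) + ((a+b)(c+d) - ac - bd)i: three multiplies. */
static gfp2 gfp2_mul(gfp2 x, gfp2 y)
{
    uint64_t ac = m61_mul(x.re, y.re);
    uint64_t bd = m61_mul(x.im, y.im);
    uint64_t s  = m61_mul(m61_add(x.re, x.im), m61_add(y.re, y.im));
    gfp2 r = { m61_sub(ac, bd), m61_sub(m61_sub(s, ac), bd) };
    return r;
}
```

The three-multiply form trades one multiplication for a few extra additions, which may or may not pay off depending on how expensive the integer multiplier is on the target CPU.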
BTW, one drawback of the x86 architecture is that imul with double-width results has fixed target registers (edx:eax or rdx:rax). I hope the out-of-order (OOO) execution allows a dense imul block with some instructions to get the results "out of the way". |
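The fixed-register point can be seen in a small sketch using GCC-style extended inline assembly. The one-operand `mul` (like one-operand `imul`) always leaves the high half in rdx and the low half in rax, so the `"=d"` and `"=a"` constraints below are forced, and the compiler has to copy the results elsewhere before the next multiply. The function names `mul_wide` and `mul_wide_selfcheck` are made up for this example, the asm path assumes GCC or Clang on x86-64, and the fallback relies on the `unsigned __int128` extension.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned 64x64 -> 128-bit multiply.
   Returns the low 64 bits and stores the high 64 bits in *hi. */
static uint64_t mul_wide(uint64_t a, uint64_t b, uint64_t *hi)
{
#if defined(__GNUC__) && defined(__x86_64__)
    uint64_t lo, h;
    /* mulq multiplies rax by the operand; the result lands in rdx:rax. */
    __asm__("mulq %3"
            : "=a"(lo), "=d"(h)      /* outputs: fixed to rax and rdx */
            : "a"(a), "rm"(b)        /* inputs: a must already be in rax */
            : "cc");
    *hi = h;
    return lo;
#else
    /* Portable fallback (assumption: compiler supports unsigned __int128). */
    unsigned __int128 t = (unsigned __int128)a * b;
    *hi = (uint64_t)(t >> 64);
    return (uint64_t)t;
#endif
}

/* Two back-to-back multiplies: each result pair has to be moved out of
   rdx:rax (here into separate variables) before the next mulq can issue. */
static void mul_wide_selfcheck(void)
{
    uint64_t hi0, hi1;
    uint64_t lo0 = mul_wide(UINT64_C(1) << 32, UINT64_C(1) << 32, &hi0);
    uint64_t lo1 = mul_wide(UINT64_MAX, UINT64_MAX, &hi1);
    assert(lo0 == 0 && hi0 == 1);                    /* 2^32 * 2^32 = 2^64 */
    assert(lo1 == 1 && hi1 == UINT64_MAX - 1);       /* (2^64 - 1)^2 */
}
```

Whether a dense block of such multiplies really overlaps well depends on the core's register renaming and multiplier latency, so this would have to be measured rather than assumed.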
#18 | |
|
∂²ω=0
Sep 2002
República de California
103·113 Posts |
Quote: