|
|
#12 |
|
Apr 2007
4₁₀ Posts |
Hi! I'm the aforementioned student! My name is Tom Harper and I'll be working on some of the software for GIMPS (is that an acceptable acronym?). You can read my progress at http://summerofsolaris.aftereternity.co.uk/
I anticipate a lot of communication with you all, as the knowledge required for this sort of thing seems rather esoteric. In the meantime, if you have any ideas for Glucas or Mlucas optimisation, let me know at rtharper@aftereternity.co.uk! |
|
|
|
|
|
#13 |
|
Sep 2002
23×37 Posts |
Can't really track it if you don't update your blog.
|
|
|
|
|
|
#14 |
|
Apr 2007
2² Posts |
Patience! It's been less than a week since GSoC officially started, and I only got a little bit of a head start (finals, graduation, etc.). A post about the first week is up there. You can expect more frequent (i.e. daily or more often) posts from now on (Rob admonished me that you all would be suitably interested to the extent that I owe it to you to document every excruciating detail!).
|
|
|
|
|
|
#15 | |
|
Apr 2006
Down Under
89 Posts |
Quote:
I suggest that Mac OS X users stay posted: since Tom's primary machine is a Core 2 Duo Mac, he is likely to get out some new Glucas / Mlucas builds for these in the coming weeks (even though this is officially an OpenSolaris-mentored project). |
|
|
|
|
|
|
#16 |
|
∂²ω=0
Sep 2002
República de California
2·3²·647 Posts |
For my part, the operating mantra is "inline assembly code can be fun!"
(cue Rod Serling voiceover) "Consider if you will, some simple trial-factoring 64-bit modmul code running on x86/ia32. In high-level C code, letting the compiler do the 64-bit integer emulation:

Code:
Starting Trial-factoring Pass 0...
Trial-factoring Pass 0: time = 00:01:25.983
Starting Trial-factoring Pass 1...
M18018467 has a factor: 195863445150291847. Program: E3.0x
Trial-factoring Pass 1: time = 00:01:24.912
Starting Trial-factoring Pass 2...

With a whiff of inline ASM, no serious effort at optimization and no use of SSE2:

Code:
Starting Trial-factoring Pass 0...
Trial-factoring Pass 0: time = 00:00:37.554
Starting Trial-factoring Pass 1...
M18018467 has a factor: 195863445150291847. Program: E3.0x
Trial-factoring Pass 1: time = 00:00:36.963
Starting Trial-factoring Pass 2...

More than twice as fast as high-level code using an optimizing compiler, ladies and gentlemen. An effect this profound would cause a person to question their sanity, unless they were writing inline assembler in ... the Twilight Zone." |
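For readers who want to see what a trial-factoring check actually computes, here is a minimal sketch in portable C. It is not the code from the post: the function names (`mulmod64`, `pow2mod`, `divides_mersenne`) are hypothetical, the modmul is a plain double-and-add rather than inline assembly, and it assumes the candidate factor q is below 2^63. It relies only on the standard fact that q divides M(p) = 2^p - 1 exactly when 2^p ≡ 1 (mod q). The `mulmod64` routine is the kind of inner step where a hardware wide multiply, reached through inline ASM, could stand in for compiler-emulated 64-bit arithmetic.

```c
#include <stdint.h>

/* (a * b) mod m without a 128-bit type, by binary double-and-add.
   Assumption: m < 2^63, so the additions below cannot overflow. */
static uint64_t mulmod64(uint64_t a, uint64_t b, uint64_t m)
{
    uint64_t r = 0;
    a %= m;
    b %= m;
    while (b) {
        if (b & 1) {            /* add the current multiple of a */
            r += a;
            if (r >= m) r -= m;
        }
        a += a;                 /* double a, reduced mod m */
        if (a >= m) a -= m;
        b >>= 1;
    }
    return r;
}

/* 2^p mod q by left-to-right binary exponentiation:
   square for every bit of p, double when the bit is set. */
static uint64_t pow2mod(uint64_t p, uint64_t q)
{
    uint64_t r = 1 % q;
    for (int i = 63; i >= 0; --i) {
        r = mulmod64(r, r, q);
        if ((p >> i) & 1) {
            r += r;
            if (r >= q) r -= q;
        }
    }
    return r;
}

/* Returns 1 if q divides M(p) = 2^p - 1, else 0. */
static int divides_mersenne(uint64_t p, uint64_t q)
{
    return q > 1 && pow2mod(p, q) == 1;
}
```

As a usage example, `divides_mersenne(11, 23)` returns 1 because M11 = 2047 = 23 × 89, and `divides_mersenne(67, 761838257287ULL)` returns 1 by Cole's factorization of M67. The factor quoted in the log above could be checked the same way with `divides_mersenne(18018467, 195863445150291847ULL)`.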
|
|
|
|
|
#17 |
|
Apr 2006
Down Under
59₁₆ Posts |
Just a quick update since GSoC has passed the halfway point.
A HEAP of work has been done on the Mlucas 3.x code over the last 8 weeks. Tom Harper has parallelized the FFT routines, while Ernst Mayer has done the same to the carry routines. We are now seeing quite high levels of parallelism when using 2-8 concurrent threads. I don't want anyone to get too excited at this stage, as over the next couple of weeks there needs to be some rigorous testing and much further fine-tuning. At the moment we are limited to 8 threads but should be able to reach 16 very soon, at which time a direct comparison can be made between Glucas & Mlucas performance at 16 threads (though I'm putting my money on Mlucas).

We have been testing the performance on the following boxes:

I'm not going to release any specific timings at this stage, but I will say that we have in some circumstances seen scaling that is better than this, which bodes well for a fast verification of the yet-to-be-found M45.

Cheers, Rob. |
|
|
|
|
|
#18 |
|
Jun 2003
Ottawa, Canada
1173₁₀ Posts |
If you want any help testing that Mlucas code in Linux, I can try it out on the large Itanium2 beast I have been using for M44/43 verification (128 CPUs). Using 16 cores was the best bang for the buck with Glucas.
|
|
|
|
|
|
#19 | ||
|
∂²ω=0
Sep 2002
República de California
2·3²·647 Posts |
Quote:
I've been doing most of my timing tests on a 16-core Itanium 2 system hosted on the HP testdrive program. 16-way ||ism is all I plan to code for in the near future, since the particular || structure of my FFT implementation lends itself best to the 2-16 core range. The Sun folks [Rob and Tom Duell] have some nice multicore Sparc 6 and Opteron/Solaris systems, so we continually monitor and compare the benchmarks on 3 different systems.

My brief take or "executive summary" of where things are:

- Nearly all of the basic || FFT code - in particular the modified-to-be-thread-friendly data access scheme - was already in place, I did most of that work during a hiatus from work in 2005. Tom Harper's key contribution was tracking down the source of a subtle OpenMP loop-handling issue which was causing the || code to go haywire in unpredictable, nonrepeatable ways, which I didn't have the debug tools or MT experience to solve on my own. Once we had that solved, progress has been very rapid.

- Compared to [say] Glucas, the Mlucas MT approach has several distinct advantages. For starters, no performance hit in going from unthreaded to threaded. For instance, here are numbers from Glucas timing tests on a multicore Itanium system [In fact, I believe almost identical to the one I'm using], posted to the "Perpetual benchmark Thread" by Tony Reix:

Quote:
Code:
#thread  sec/iter  speedup
-------  --------  -------
   -       .134      1.00
   1       .134      1.00
   2       .064      2.09
   4       .033      4.06
   8       .021      6.38

[We're currently investigating the sudden performance drop in going above 4 threads.]

Cheers,
-E

[BTW, I never forgot your e-mail of last September, asking about a || Mlucas -- but I didn't want to reply with either an excuse or a vague "I'm working on it", instead I thought it better to use as a carrot to actually get something working -- though Rob G. has been a more-than-adequate niggler in that regard, as well. ;) I was actually going to e-mail you later this week to let you know how things were shaping up, but you just saved me the work.]

Last fiddled with by ewmayer on 2007-07-25 at 17:03 |
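The speedup column in the table above is just the unthreaded time per iteration divided by the threaded one. A small sketch of that arithmetic in C follows, together with the per-thread efficiency (speedup divided by thread count), which makes the drop above 4 threads easier to see. The helper names `speedup` and `efficiency` are hypothetical and are not taken from Mlucas or Glucas.

```c
/* Speedup of a threaded run relative to the unthreaded baseline,
   both given as seconds per iteration. */
static double speedup(double base_sec, double threaded_sec)
{
    return base_sec / threaded_sec;
}

/* Parallel efficiency: the fraction of ideal linear scaling achieved.
   A value of 1.0 means perfectly linear scaling. */
static double efficiency(double base_sec, double threaded_sec, int nthreads)
{
    return speedup(base_sec, threaded_sec) / nthreads;
}
```

With the figures from the table, `speedup(0.134, 0.064)` is about 2.09 and `speedup(0.134, 0.021)` is about 6.38, matching the posted column. The efficiency is slightly above 1.0 at 4 threads (about 1.02) and falls to roughly 0.80 at 8 threads.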
||
|
|
|
|
|
#20 |
|
Apr 2006
Down Under
89 Posts |
|
|
|
|
|
|
#21 |
|
Jun 2003
Ottawa, Canada
3·17·23 Posts |
Nice. I'm glad you have a bunch of machines for testing, just thought I would offer in case you wanted someone else to try and break things.
|
|
|
|
|
|
#22 |
|
∂²ω=0
Sep 2002
República de California
2×3²×647 Posts |
If and when we get the ~linear ||ism ratcheted up to 16 threads, you will be welcome to run 8 copies of that [each doing a different exponent] on your Beast. In winter, that would probably heat a small office building. ;)
|
|
|
|