![]() |
|
|
#1 |
|
Apr 2014
Marlow, UK
3816 Posts |
I've managed to find a very small change to the source code that speeds up the factoring of an 80 digit number by about 23% and a 90 digit one by nearer 30% (I'm currently testing with a 100 digit number - should know this evening). How do I go about sharing this change?
I'm running on a Windows VM with i7 processor, using gcc and Cygwin - I can't tell whether the speed-up will apply to other architectures... |
|
|
|
|
|
#2 | |
|
Bamboozled!
"πΊππ·π·π"
May 2003
Down not across
29×3×7 Posts |
Quote:
The original author reads this forum but recently posted that Real Life (tm) is taking up most of his time and he won't be able to much, if any, development work. FWIW, many folk here are interested in msieve to factor rather larger numbers, those in the 120-170 digit range by GNFS or 180-250 by SNFS. Your optimizations are less likely to be useful to them because they tend to use the ggnfs siever in conjuction with the msieve toolkit for everything else. Paul |
|
|
|
|
|
|
#3 |
|
Apr 2010
Over the rainbow
23×52×13 Posts |
If the speed improvement is that large (claiming 23-30% on a C100) that would be interesting, even if it just for giving idea to others. If thoses numbers hold, it will be a noticeable change in ECM.
|
|
|
|
|
|
#4 |
|
Aug 2006
597910 Posts |
Definitely interesting (though I usually use yafu for numbers of that size).
|
|
|
|
|
|
#5 |
|
"Ben"
Feb 2007
3·1,171 Posts |
Although it has not been under active development for quite some time, msieve's QS has undergone significant optimization. I'm kinda surprised that a small change could result in such large improvements. I'd very much like to see your updates.
|
|
|
|
|
|
#6 |
|
Apr 2014
Marlow, UK
23·7 Posts |
Yes, it is impressively optimised. I was rather surprised by the result, and before I post the changes I'll wait for RSA-100 to finish (this took 11 hours before, so should finish in the next few hours). If there is a similar improvement with this I'll post the change, if not I'll hang my head in shame and apologise
|
|
|
|
|
|
#7 | |
|
Bamboozled!
"πΊππ·π·π"
May 2003
Down not across
29·3·7 Posts |
Quote:
Even if a C100 is no worse than before, the improvements for smaller numbers are well worth including. Paul |
|
|
|
|
|
|
#8 | |
|
Nov 2003
11101001001002 Posts |
Quote:
was involved? |
|
|
|
|
|
|
#9 |
|
Tribal Bullet
Oct 2004
3,541 Posts |
I'd be happy to fold it in; post the changes here or PM me.
|
|
|
|
|
|
#10 |
|
Apr 2014
Marlow, UK
23·7 Posts |
(Struggling to type due to shamed head-hanging...)
It would appear that rumours of a significant speed-up have been greatly exaggerated... Factoring RSA-100, which took almost exactly 11 hours before on my (slow!) VM (at work) took 10h46 after the change - 2% improvement - disappointing. Hard to do timings when the 'tin' is shared by other VMs, I guess, but it's the only place I have a modern Cygwin. The change is in the function add_to_hashtable in sieve_core.c (possibly pplicable to other sieve_core_xxx.c files?). This is very much a hot-spot, and has clearly been carefully optimised, including pre-fetching to maximise cache usage. What I did was this: Instead of allocating a new bucket_entry_t (struct) on the stack, setting its contents then copying to the entry's list, avoid this copy by setting the list's entry's fields directly. This is probably in the cache due to the PREFETCH. The body of the function is as follows, with the mods conditionally compiled on the __MDF__ flag: Code:
static void add_to_hashtable(bucket_t *entry,
uint32 sieve_offset,
uint32 mask,
uint32 prime_index,
uint32 logprime) {
/* add a 'sieve update' to the hashtable bin pointed
to by 'entry'. Hashing is based on the top few
bits of sieve_offset, and the range of sieve values
represented by one hash bin is given by 'mask'.
Note that after the first polynomial it's unlikely
that any hash bin will need to grow in size. */
#if defined(__MDF__)
uint32 i = entry->num_used++;
#else
uint32 i = entry->num_used;
bucket_entry_t new_entry;
#endif
/* always prefetch; we can't rely on the processor
automatically knowing when to do this */
if (!(i % 8))
PREFETCH(entry->list + i + 8);
if (i == entry->num_alloc) {
entry->num_alloc = 2 * i;
entry->list = (bucket_entry_t *)xrealloc(entry->list,
2 * i * sizeof(bucket_entry_t));
}
#if defined(__MDF__)
bucket_entry_t *new_entry = entry->list + i;
new_entry->logprime = logprime;
new_entry->prime_index = prime_index;
new_entry->sieve_offset = sieve_offset & mask;
#else
new_entry.logprime = logprime;
new_entry.prime_index = prime_index;
new_entry.sieve_offset = sieve_offset & mask;
entry->list[i] = new_entry;
entry->num_used++;
#endif
}
It would also be also possible to combine the sieve_offset and mask parameters into a single masked_sieve_offset (by using & at the call sites), to reduce the number of parameters, but I suspect this would make little or no difference with clever optimisers. I would be interested to know what difference, if any, this change makes for other people. Apologies for any time wasted, I'll adopt a test-before-post approach from now on! |
|
|
|
|
|
#11 |
|
Tribal Bullet
Oct 2004
3,541 Posts |
It's a pretty nice patch, and on my Core 2 it does drop the sieving time by a few seconds at 80 digits, so I committed it.
Thanks! |
|
|
|
![]() |
Similar Threads
|
||||
| Thread | Thread Starter | Forum | Replies | Last Post |
| Source code to mprime 289 out there somewhere? | graysky | Linux | 6 | 2016-04-25 23:03 |
| Source Code for msieve ? | mohamed | Msieve | 8 | 2013-12-14 01:04 |
| Source code for my program | Sam Kennedy | Factoring | 7 | 2012-11-22 18:24 |
| llrnet - source code? | reezer | Prime Sierpinski Project | 11 | 2009-09-11 10:47 |
| Support for other OSs on x86/source code | reezer | Software | 1 | 2007-02-08 12:57 |