sr1sieve 1.0.14, sr5sieve 1.4.32
I have changed the 32-bit x86 binaries so that they will run on any Pentium, but use the SSE2 optimisations if SSE2 is available when started.
There are now two binaries: one for Intel machines, tuned for Pentium2/Pentium3 machines with the SSE2 code tuned for Pentium4; the other for AMD machines, tuned for Athlon with the SSE2 code tuned for Athlon64 (in 32-bit mode). It would be good to have some feedback on whether SSE2 is being detected properly, and on whether the AMD binary is actually any faster on AMD machines than the Intel binary and vice versa. (If not, it would be nice to distribute just one binary.) I would also like to know whether anyone finds the new binaries any slower than the version 1.4.30 binaries. If the `-v --verbose' switch is used, a message like this will be printed if SSE2 is detected: [code]
sr1sieve 1.0.14 -- A sieve for one sequence k*b^n+/-1.
Using SSE2 code path, L1 data cache 16Kb (detected), L2 cache 256Kb (detected).
[/code] Thanks. |
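As a rough illustration of the kind of startup check described above (a sketch only, not the actual sr1sieve/sr5sieve code), a C program built with GCC or Clang can ask the CPU whether SSE2 is present and pick a code path accordingly. The helper names have_sse2 and code_path_name are made up for this example; __builtin_cpu_init and __builtin_cpu_supports are GCC/Clang builtins, and on other compilers or non-x86 targets the sketch simply reports no SSE2.

```c
/* Hypothetical helper: returns 1 if the CPU reports SSE2, 0 otherwise.
   Assumes GCC or Clang on x86; anywhere else it conservatively returns 0. */
static int have_sse2(void)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __builtin_cpu_init();                        /* set up the CPU feature data */
    return __builtin_cpu_supports("sse2") ? 1 : 0;
#else
    return 0;
#endif
}

/* Hypothetical helper: name of the code path chosen for a detection result. */
static const char *code_path_name(int sse2)
{
    return sse2 ? "SSE2" : "generic x86";
}
```

A program would call have_sse2() once at startup and store the result, so the check costs nothing inside the sieving loops.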
sr1sieve-amd.exe
When I enter "-l 64 -L 512" the program tries to run with L1 of 64Kb but L2 of 524288Kb! (The program stops after a couple of seconds.)
It chooses 16Kb and 256Kb when I do not enter cache values and runs fine. (Athlon XP, barton core.) |
[QUOTE=Flatlander;100891]When I enter "-l 64 -L 512" the program tries to run with L1 of 64Kb but L2 of 524288Kb! (The program stops after a couple of seconds.)
[/quote] Thanks, that is fixed in sr1sieve version 1.0.15. (It didn't affect the other sieves). [quote] It chooses 16Kb and 256Kb when I do not enter cache values and runs fine. (Athlon XP, barton core.)[/QUOTE] Are those sizes correct? edit: Does it say (detected) or (default) after the cache size? |
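The figure in the report is consistent with a unit slip: 512 × 1024 = 524288, so the value given in Kb appears to have been scaled by 1024 one time too many. The actual fix in 1.0.15 is not shown in this thread; the following is only a hedged sketch of parsing such an option so that the Kb-to-bytes conversion happens exactly once. The function name cache_kb_to_bytes and the 1 GiB sanity limit are assumptions made for the example.

```c
#include <stdlib.h>

/* Hypothetical helper: parse a cache size given in Kb (as with -l / -L)
   and return it in bytes. Returns 0 if the argument is not a plain
   positive number or is implausibly large (over 1 GiB, an assumed limit). */
static unsigned long cache_kb_to_bytes(const char *arg)
{
    char *end;
    unsigned long kb = strtoul(arg, &end, 10);

    if (end == arg || *end != '\0')       /* no digits, or trailing junk */
        return 0;
    if (kb == 0 || kb > 1024UL * 1024UL)  /* reject zero and sizes over 1 GiB */
        return 0;

    return kb * 1024UL;                   /* the one and only Kb -> bytes scaling */
}
```

Keeping the stored value in bytes and dividing by 1024 only when printing avoids mixing the two units.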
It says default. (I believe 64 and 512 are the correct sizes.)
|
[QUOTE=Flatlander;101106]It says default.[/QUOTE]
It looks like the AMD CPU detection code is not working yet then. Could someone running an Athlon64 in 32-bit mode check whether SSE2 is being detected? If it isn't, you can use the --sse2 or --no-sse2 switches to override the detection in the meantime. |
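The override described above amounts to a three-way choice: trust the detection, force SSE2 on, or force it off. A minimal sketch of that logic follows; it is not the sieve's own option handling. The switch names --sse2 and --no-sse2 come from the post, while the enum and function names are invented for the example.

```c
#include <string.h>

/* Hypothetical tri-state: what the user asked for on the command line. */
enum sse2_mode { SSE2_AUTO, SSE2_FORCE_ON, SSE2_FORCE_OFF };

/* Scan the arguments for --sse2 / --no-sse2; the last one given wins. */
static enum sse2_mode parse_sse2_switch(int argc, char **argv)
{
    enum sse2_mode mode = SSE2_AUTO;
    int i;

    for (i = 1; i < argc; i++) {
        if (strcmp(argv[i], "--sse2") == 0)
            mode = SSE2_FORCE_ON;
        else if (strcmp(argv[i], "--no-sse2") == 0)
            mode = SSE2_FORCE_OFF;
    }
    return mode;
}

/* Combine the user's choice with the detection result (nonzero = SSE2 found). */
static int use_sse2(enum sse2_mode mode, int detected)
{
    if (mode == SSE2_FORCE_ON)
        return 1;               /* user override: detection is ignored */
    if (mode == SSE2_FORCE_OFF)
        return 0;
    return detected != 0;       /* no override: follow the detection */
}
```

Note that in this sketch forcing SSE2 on skips the detection entirely, so on a CPU without SSE2 the program would then execute instructions the CPU does not support.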
[quote=Flatlander;100891]When I enter "-l 64 -L 512" the program tries to run with L1 of 64Kb but L2 of 524288Kb! (The program stops after a couple of seconds.)
It chooses 16Kb and 256Kb when I do not enter cache values and runs fine. (Athlon XP, barton core.)[/quote] What a wally! I just remembered my Athlon (Barton core) is busted. I'm sitting at a Sempron 2600+ which, I think, has 128k L2. (I pinched the kids' PC :smile:.) |
I'm using Barton too, and it offers me the same defaults.
For Barton they actually should be: [CODE]L1-Cache: 64 + 64 KiB (Data + Instructions)
L2-Cache: 512 KiB, fullspeed SSE enabled[/CODE] When I added --sse2 to the commandline, it threw me an error: [CODE]Illegal instruction[/CODE] [CODE]-l 64 -L 512[/CODE] worked for me. |
sr1sieve 1.0.16, sr5sieve 1.4.34
These versions should fix the cache detection problems on AMD machines.
[QUOTE=kuratkull;101159]When I added --sse2 to the commandline, it threw me an error: [CODE]Illegal instruction[/CODE] [/QUOTE] You need SSE2 instructions for --sse2 to work, not just SSE. I haven't been able to find any way to make use of SSE (or SSE3 for that matter). |
What about SSSE3 (Core2 family)?
|
[QUOTE=Cruelty;101266]What about SSSE3 (Core2 family)?[/QUOTE]
No help either. The two things that would be a big help if they existed would be:
1. A 64-bit SIMD multiply. At the moment it is necessary to multiply 32 bits at a time, so on 64-bit machines it is faster to use the general purpose registers for multiplication instead, but most of the gains are lost because of the need to move the results between general purpose registers and SIMD registers.
2. A way to convert between integers larger than 32 bits and double precision floating point in SIMD registers without going through the FPU. This would be the SSE2 equivalent of the FPU instructions fildll and fistpll. |
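To make the first point concrete: without a 64-bit multiply, a 64-bit product has to be assembled from 32-bit pieces. The scalar C sketch below shows that decomposition for the low 64 bits of a product. It is only an illustration of the arithmetic, not code from the sieves, and the function name mul64_lo_from_32 is made up. With SSE2 each 32×32→64 partial product would map onto the PMULUDQ instruction (the _mm_mul_epu32 intrinsic), which is why three multiplies plus shifts and adds are needed where a 64-bit machine needs one instruction in a general purpose register.

```c
#include <stdint.h>

/* Hypothetical helper: low 64 bits of a*b, built only from 32x32->64
   multiplies, the operation SSE2 provides per lane. */
static uint64_t mul64_lo_from_32(uint64_t a, uint64_t b)
{
    uint64_t a_lo = (uint32_t)a, a_hi = a >> 32;
    uint64_t b_lo = (uint32_t)b, b_hi = b >> 32;

    uint64_t low   = a_lo * b_lo;                /* contributes all 64 bits */
    uint64_t cross = a_lo * b_hi + a_hi * b_lo;  /* only its low 32 bits survive the shift */

    /* a_hi * b_hi is shifted out entirely (it starts at bit 64). */
    return low + (cross << 32);
}
```

A full 128-bit product would need the fourth partial product and carry handling as well, which makes the SIMD route costlier still.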
sr1sieve 1.0.17, sr5sieve 1.4.35
These versions fix some problems that would have caused difficulties for those trying to compile from source. The binary archives are not affected.
|