mersenneforum.org

mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Software (https://www.mersenneforum.org/forumdisplay.php?f=10)
-   -   Prime95 version 26.5 (https://www.mersenneforum.org/showthread.php?t=15224)

Prime95 2011-02-05 23:57

Prime95 version 26.5
 
Version 26.5 build 5 is ready for testing. This version fixes some bugs that were reported for 26.4.

Download links:
Windows: [url]ftp://mersenne.org/gimps/p95v265.zip[/url]
Windows 64-bit: [url]ftp://mersenne.org/gimps/p64v265.zip[/url]
Mac OS X: [url]ftp://mersenne.org/gimps/Prime95-MacOSX-265.zip[/url]
Linux: [url]ftp://mersenne.org/gimps/mprime265.tar.gz[/url]
Linux 64-bit: [url]ftp://mersenne.org/gimps/mprime265-linux64.tar.gz[/url]
FreeBSD: [url]ftp://mersenne.org/gimps/mprime265-FreeBSD.tar.gz[/url]
FreeBSD 64-bit: [url]ftp://mersenne.org/gimps/mprime265-FreeBSD64.tar.gz[/url]
Windows NT service: [url]ftp://mersenne.org/gimps/winnt265.zip[/url]
Windows NT service 64-bit: [url]ftp://mersenne.org/gimps/win64nt265.zip[/url]
Source: [url]ftp://mersenne.org/gimps/source265.zip[/url]

Bug fixes are described here: [url]http://mersenneforum.org/showpost.php?p=237200&postcount=2[/url]

Prime95 2011-02-05 23:58

1) Occasionally, P-1 stage 2 would report 100% complete before stage 2 completed. This bug has been around forever. I've finally found the cause! Fixed in 26.5 build 2.
2) Worktodo.txt was not updated properly when P-1 completed for a PRP test. Fixed in 26.5 build 3.
3) In 26.5 build 3, when benchmarking or measuring CPU speed, the program keeps all cores busy by launching auxiliary threads that loop indefinitely. This should prevent Intel Turbo Boost from kicking in, and thus avoid cases where the CPU speed is reported as 1.87 GHz yet a benchmark shows very fast timings because the single active core was boosted to 3.2 GHz.
4) In 26.5 build 3, the OS's mapping of hyperthreaded logical CPU numbers to physical CPUs is determined automatically at startup. The AffinityScramble setting (see undoc.txt) is no longer supported. It is replaced by the AffinityScramble2 setting.
5) In 26.5 build 3, the message "set affinity to run on any cpu" instead listed all the CPU numbers, as well as CPU #31. Fixed in build 4.
6) In 26.5 build 3 and 4, benchmarking a dual-core non-SSE2 machine will crash. Fixed in build 5.
7) The KeepPminus1SaveFiles=0 option described in undoc.txt did not work when a factor was found. Fixed in 26.6.
8) The 32K, 64K, and 80K FFT lengths with sumout checking for Pentium 4s with 256K or less cache were not implemented. The symptom is a torture test failure. Fixed in 26.6.
9) Prime95 does not load on Sandy Bridge CPUs running Win XP. I've got a trial fix in 26.6.
10) Prime95 chooses very inefficient FFT implementations for Core 2 Celerons (they have 512K of L2 cache). Working on a fix for 26.6.
11) Logical CPU numbers were sometimes output zero-based and sometimes one-based. In 26.6, they are all one-based.

ixfd64 2011-02-06 00:39

whatsnew.txt needs to be updated.

ET_ 2011-02-06 11:53

Waiting for mprime and mprime_x86_64 :smile:

Luigi

tichy 2011-02-06 13:47

The r4delay3_p4tp.o file is missing from the source archive, making it impossible to build mprime yourself.

This is my error message:
[CODE]wmigda@gentoo /home/scr2/000000000/cmake $ make
-- Build type: RELEASE
-- System Info:
SYSTEM_NAME: Linux
SYSTEM_PROCESSOR: i686
SYSTEM_VERSION: 2.6.34-gentoo-r6
-- Compiling for 32-bit system
-- Configuring done
-- Generating done
-- Build files have been written to: /home/scr2/000000000/cmake
[ 93%] Built target gwnum
Linking CXX executable mprime
gwnum/libgwnum.a(mult.o):(_GWDATA+0x2b7c): undefined reference to `xfft_r4delay_720K_2304_2_P4TP'
gwnum/libgwnum.a(mult.o):(_GWDATA+0x2e3c): undefined reference to `xfft_r4delay_960K_3072_2_P4TP'
gwnum/libgwnum.a(mult.o):(_GWDATA+0x2e48): undefined reference to `xfft_r4delay_960K_3072_2_P4TP'
gwnum/libgwnum.a(mult.o):(_GWDATA+0x3020): undefined reference to `xfft_r4delay_1152K_4608_4_P4TP'
gwnum/libgwnum.a(mult.o):(_GWDATA+0x3050): undefined reference to `xfft_r4delay_1152K_3072_2_P4TP'
gwnum/libgwnum.a(mult.o):(_GWDATA+0x3200): undefined reference to `xfft_r4delay_1344K_3072_2_P4TP'
gwnum/libgwnum.a(mult.o):(_GWDATA+0x320c): undefined reference to `xfft_r4delay_1344K_3072_2_P4TP'
gwnum/libgwnum.a(mult.o):(_GWDATA+0x32b8): undefined reference to `xfft_r4delay_1440K_4608_2_P4TP'
gwnum/libgwnum.a(mult.o):(_GWDATA+0x32c4): undefined reference to `xfft_r4delay_1440K_4608_2_P4TP'
gwnum/libgwnum.a(mult.o):(_GWDATA+0x3358): undefined reference to `xfft_r4delay_1536K_3072_2_P4TP'
gwnum/libgwnum.a(mult.o):(_GWDATA+0x33ec): undefined reference to `xfft_r4delay_1600K_5120_4_P4TP'
gwnum/libgwnum.a(mult.o):(_GWDATA+0x367c): undefined reference to `xfft_r4delay_2240K_5120_2_P4TP'
gwnum/libgwnum.a(mult.o):(_GWDATA+0x3704): undefined reference to `xfft_r4delay_2304K_4608_2_P4TP'
gwnum/libgwnum.a(mult.o):(_GWDATA+0x38b4): undefined reference to `xfft_r4delay_2688K_3072_2_P4TP'
gwnum/libgwnum.a(mult.o):(_GWDATA+0x3970): undefined reference to `xfft_r4delay_2880K_4608_2_P4TP'
gwnum/libgwnum.a(mult.o):(_GWDATA+0x39d4): undefined reference to `xfft_r4delay_3M_6144_2_P4TP'
gwnum/libgwnum.a(mult.o):(_GWDATA+0x3a98): undefined reference to `xfft_r4delay_3200K_5120_2_P4TP'
gwnum/libgwnum.a(mult.o):(_GWDATA+0x3c34): undefined reference to `xfft_r4delay_3840K_6144_4_P4TP'
gwnum/libgwnum.a(mult.o):(_GWDATA+0x5b6c): undefined reference to `xfft_r4delay_1440K_ac_3840_2_P4TP'
gwnum/libgwnum.a(mult.o):(_GWDATA+0x5d70): undefined reference to `xfft_r4delay_2304K_ac_4608_2_P4TP'
gwnum/libgwnum.a(mult.o):(_GWDATA+0x5e7c): undefined reference to `xfft_r4delay_3M_ac_6144_4_P4TP'
gwnum/libgwnum.a(mult.o):(_GWDATA+0x5f78): undefined reference to `xfft_r4delay_3840K_ac_6144_4_P4TP'
collect2: ld returned 1 exit status
make[2]: *** [mprime] Błąd 1
make[1]: *** [CMakeFiles/mprime.dir/all] Błąd 2
make: *** [all] Błąd 2
[/CODE]EDIT: I see an improvement being made in the Linux makefiles: "$(shell pkg-config --static --libs libcurl)" :) So, from now on, will the Linux chores become nonexistent?

EDIT2: it looks like more '*tp.o' objects are required and missing, e.g. r4delay5_p4tp.o

Prime95 2011-02-06 14:43

[QUOTE=ET_;251516]Waiting for mprime and mprime_x86_64 [/QUOTE]

I'm working on that. I retired my Linux build boxes. I do have an Ubuntu box, but when I used it for 26.4 someone complained about a GLIBC_2.11(?) error.

The question is, should I go find older Linux versions to install under VirtualBox so that fewer users face library issues, or should I assume most users upgrade frequently and GLIBC_2.11 is now commonplace?
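For reference, a quick way to check which glibc version a given box actually ships. Neither command is guaranteed to exist on every distro, so treat this as a sketch:

```shell
# Two common ways to report the installed glibc version; output format
# varies between distros (e.g. "ldd (GNU libc) 2.11.1" or "glibc 2.11").
ldd --version | head -n 1
getconf GNU_LIBC_VERSION 2>/dev/null || true
```

An mprime binary will only run if the system's glibc is at least as new as the one it was linked against, which is why building on the oldest supported distro is the usual workaround.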

tichy 2011-02-06 15:07

[QUOTE=Prime95;251528]The question is, should I go find older Linux versions to install under VirtualBox so that fewer users face library issues, or should I assume most users upgrade frequently and GLIBC_2.11 is now commonplace?[/QUOTE]

Maybe two versions, just like for Win and WinNT? If not, one can always do their own Linux build for an older glibc/kernel setup.

Prime95 2011-02-07 15:53

The Linux 64-bit version using GLIBC 2.11 is available.

I tried installing 32-bit Debian and soon descended into libcurl dependency hell. I'll wipe it clean and try installing an older version of Ubuntu. If that succeeds I'll create a VirtualBox with an older 64-bit Ubuntu to link to an older GLIBC.

Thanks for your patience.

NBtarheel_33 2011-02-07 17:11

What does "fixed in next release" mean? Fixed in v26.5 or not until v27?

And when will you be implementing the option for worktodo of "test only exponents guaranteed to yield Mersenne primes"? :P

tichy 2011-02-07 17:22

[QUOTE=Prime95;251664]I tried installing 32-bit Debian and soon descended into libcurl dependency hell.[/QUOTE]

So, what is the output of [FONT=Courier New]pkg-config --static --libs libcurl[/FONT], and what specifically fails when attempting the build?

henryzz 2011-02-07 19:44

Wasn't a static version available for 24.14? Surely that would solve the problem for most people.

tichy 2011-02-07 20:24

In the meantime I converted the PE/COFF files into missing ELF objects:

[CODE]objcopy -I pe-i386 r4delay_p4tp.obj -O elf32-i386 r4delay_p4tp.o[/CODE]also for [FONT=Courier New]r4delay5_p4tp.obj[/FONT] and [FONT=Courier New]r4delay3_p4tp.obj[/FONT].
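The same conversion can be scripted for all three objects. A minimal sketch, assuming GNU binutils objcopy is installed and that it is run from the gwnum directory:

```shell
# Convert each PE/COFF object shipped in the source archive to a 32-bit
# ELF object, skipping any file that isn't present so the loop is safe
# to re-run.
for obj in r4delay_p4tp r4delay3_p4tp r4delay5_p4tp; do
  if [ -f "${obj}.obj" ]; then
    objcopy -I pe-i386 "${obj}.obj" -O elf32-i386 "${obj}.o"
  else
    echo "skipping ${obj}.obj (not found)" >&2
  fi
done
```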
Now the 32-bit mprime compiles just fine:

[CODE]-- Build type: RELEASE
-- System Info:
SYSTEM_NAME: Linux
SYSTEM_PROCESSOR: i686
SYSTEM_VERSION: 2.6.34-gentoo-r6
-- Compiling for 32-bit system
-- Configuring done
-- Generating done
-- Build files have been written to: /home/scr2/000000000/cmake
[ 1%] Generating /home/scr2/000000000/gwnum/r4dwpn_p4tp.o
[ 2%] Generating /home/scr2/000000000/gwnum/cpuidhlp.o
[ 3%] Generating /home/scr2/000000000/gwnum/gianthlp.o
[ 5%] Generating /home/scr2/000000000/gwnum/hg_blend.o
[ 6%] Generating /home/scr2/000000000/gwnum/hg_core.o
[ 7%] Generating /home/scr2/000000000/gwnum/hg_k10.o
[ 8%] Generating /home/scr2/000000000/gwnum/hg_k8.o
[ 10%] Generating /home/scr2/000000000/gwnum/hg_p4.o
[ 11%] Generating /home/scr2/000000000/gwnum/mult.o
[ 12%] Generating /home/scr2/000000000/gwnum/r4_core.o
[ 13%] Generating /home/scr2/000000000/gwnum/r4_k10.o
[ 15%] Generating /home/scr2/000000000/gwnum/r4_k8.o
[ 16%] Generating /home/scr2/000000000/gwnum/r4_p4.o
[ 17%] Generating /home/scr2/000000000/gwnum/r4delay3_co.o
[ 18%] Generating /home/scr2/000000000/gwnum/r4delay3_k1.o
[ 20%] Generating /home/scr2/000000000/gwnum/r4delay3_k8.o
[ 21%] Generating /home/scr2/000000000/gwnum/r4delay3_p4.o
[ 22%] Generating /home/scr2/000000000/gwnum/r4delay3_p4tp.o
[ 24%] Generating /home/scr2/000000000/gwnum/r4delay5_co.o
[ 25%] Generating /home/scr2/000000000/gwnum/r4delay5_k1.o
[ 26%] Generating /home/scr2/000000000/gwnum/r4delay5_k8.o
[ 27%] Generating /home/scr2/000000000/gwnum/r4delay5_p4.o
[ 29%] Generating /home/scr2/000000000/gwnum/r4delay5_p4tp.o
[ 30%] Generating /home/scr2/000000000/gwnum/r4delay_cor.o
[ 31%] Generating /home/scr2/000000000/gwnum/r4delay_k10.o
[ 32%] Generating /home/scr2/000000000/gwnum/r4delay_k8.o
[ 34%] Generating /home/scr2/000000000/gwnum/r4delay_p4.o
[ 35%] Generating /home/scr2/000000000/gwnum/r4delay_p4tp.o
[ 36%] Generating /home/scr2/000000000/gwnum/r4dwpn3_cor.o
[ 37%] Generating /home/scr2/000000000/gwnum/r4dwpn3_k10.o
[ 39%] Generating /home/scr2/000000000/gwnum/r4dwpn3_k8.o
[ 40%] Generating /home/scr2/000000000/gwnum/r4dwpn3_p4.o
[ 41%] Generating /home/scr2/000000000/gwnum/r4dwpn5_cor.o
[ 43%] Generating /home/scr2/000000000/gwnum/r4dwpn5_k10.o
[ 44%] Generating /home/scr2/000000000/gwnum/r4dwpn5_k8.o
[ 45%] Generating /home/scr2/000000000/gwnum/r4dwpn5_p4.o
[ 46%] Generating /home/scr2/000000000/gwnum/r4dwpn_core.o
[ 48%] Generating /home/scr2/000000000/gwnum/r4dwpn_k10.o
[ 49%] Generating /home/scr2/000000000/gwnum/r4dwpn_k8.o
[ 50%] Generating /home/scr2/000000000/gwnum/r4dwpn_p4.o
[ 51%] Generating /home/scr2/000000000/gwnum/timeit.o
[ 53%] Generating /home/scr2/000000000/gwnum/xmult1ax.o
[ 54%] Generating /home/scr2/000000000/gwnum/xmult2.o
[ 55%] Generating /home/scr2/000000000/gwnum/xmult2a_cor.o
[ 56%] Generating /home/scr2/000000000/gwnum/xmult2a_k8.o
[ 58%] Generating /home/scr2/000000000/gwnum/xmult2ax.o
[ 59%] Generating /home/scr2/000000000/gwnum/xmult3.o
[ 60%] Generating /home/scr2/000000000/gwnum/xmult3a_cor.o
[ 62%] Generating /home/scr2/000000000/gwnum/xmult3a_k8.o
[ 63%] Generating /home/scr2/000000000/gwnum/xmult3ax.o
[ 64%] Generating /home/scr2/000000000/gwnum/hg_p4tp.o
[ 65%] Generating /home/scr2/000000000/gwnum/mult1.o
[ 67%] Generating /home/scr2/000000000/gwnum/mult1aux.o
[ 68%] Generating /home/scr2/000000000/gwnum/mult2.o
[ 69%] Generating /home/scr2/000000000/gwnum/mult2a.o
[ 70%] Generating /home/scr2/000000000/gwnum/mult2aux.o
[ 72%] Generating /home/scr2/000000000/gwnum/mult2p.o
[ 73%] Generating /home/scr2/000000000/gwnum/mult3.o
[ 74%] Generating /home/scr2/000000000/gwnum/mult3a.o
[ 75%] Generating /home/scr2/000000000/gwnum/mult3ap.o
[ 77%] Generating /home/scr2/000000000/gwnum/mult3p.o
[ 78%] Generating /home/scr2/000000000/gwnum/mult4.o
[ 79%] Generating /home/scr2/000000000/gwnum/mult4p.o
[ 81%] Generating /home/scr2/000000000/gwnum/r4_p4tp.o
[ 82%] Generating /home/scr2/000000000/gwnum/r4delay_p4t.o
[ 83%] Generating /home/scr2/000000000/gwnum/r4dwpn3_p4t.o
[ 84%] Generating /home/scr2/000000000/gwnum/r4dwpn5_p4t.o
Linking CXX static library libgwnum.a
[ 93%] Built target gwnum
Linking CXX executable mprime
[100%] Built target mprime[/CODE]I'm dynamically linking against [FONT=Courier New]libcurl[/FONT], so the output from [FONT=Courier New]ldd[/FONT] is
[CODE] linux-gate.so.1 => (0xb7744000)
libstdc++.so.6 => /usr/lib/gcc/i686-pc-linux-gnu/4.4.3/libstdc++.so.6 (0xb7631000)
libcurl.so.4 => /usr/lib/libcurl.so.4 (0xb75e5000)
libm.so.6 => /lib/libm.so.6 (0xb75c0000)
libgcc_s.so.1 => /usr/lib/gcc/i686-pc-linux-gnu/4.4.3/libgcc_s.so.1 (0xb75a4000)
libc.so.6 => /lib/libc.so.6 (0xb745d000)
libpthread.so.0 => /lib/libpthread.so.0 (0xb7444000)
/lib/ld-linux.so.2 (0xb7745000)
libldap-2.3.so.0 => /usr/lib/libldap-2.3.so.0 (0xb7411000)
librt.so.1 => /lib/librt.so.1 (0xb7408000)
libgssapi_krb5.so.2 => /usr/lib/libgssapi_krb5.so.2 (0xb73d8000)
libkrb5.so.3 => /usr/lib/libkrb5.so.3 (0xb732a000)
libk5crypto.so.3 => /usr/lib/libk5crypto.so.3 (0xb7305000)
libcom_err.so.2 => /lib/libcom_err.so.2 (0xb7301000)
libresolv.so.2 => /lib/libresolv.so.2 (0xb72ec000)
libssl.so.0.9.8 => /usr/lib/libssl.so.0.9.8 (0xb72a7000)
libcrypto.so.0.9.8 => /usr/lib/libcrypto.so.0.9.8 (0xb7167000)
libdl.so.2 => /lib/libdl.so.2 (0xb7163000)
libz.so.1 => /lib/libz.so.1 (0xb7150000)
liblber-2.3.so.0 => /usr/lib/liblber-2.3.so.0 (0xb7143000)
libkrb5support.so.0 => /usr/lib/libkrb5support.so.0 (0xb713b000)[/CODE]

Prime95 2011-02-07 23:11

[QUOTE=NBtarheel_33;251672]What does "fixed in next release" mean? Fixed in v26.5 or not until v27?

And when will you be implementing the option for worktodo of "test only exponents guaranteed to yield Mersenne primes"? :P[/QUOTE]

In this case it means fixed in 26.5.

I'm working on your second suggestion, but it will require several years of testing on my own computers.

James Heinrich 2011-02-07 23:12

Works much better with regard to my complaints about workers not starting while [i]worktodo[/i] is locked and/or communicating with the server. Thanks!

Prime95 2011-02-07 23:13

[QUOTE=tichy;251674]So, what is the output of [FONT=Courier New]pkg-config --static --libs libcurl[/FONT] and what specificaly fails when attempting the build ?[/QUOTE]

pkg-config gives a big long list. Specifically, the linker complains that it cannot find libgssapi_krb5.

Prime95 2011-02-07 23:15

[QUOTE=henryzz;251693]Wasn't a static version version available for 24.14? Surely that would solve the problem for most people.[/QUOTE]

That sure would be nice, but Linux has decided some of the routines that are called can only be dynamically linked in. I'm sure there was a good reason, but I can't fathom what it might be.

Rhyled 2011-02-08 02:07

Mildly interesting benchmark report error
 
I ran a couple of benchmarks with the new 26.5 64-bit Windows version today. When reviewing my benchmarks, they are reported as version 26.4 build 1:
[CODE]
Rhyled 2011-02-07 Intel Core i7 920 @ 2.67GHz Windows64,Prime95,v26.4,build 1 3620 11.05 14.39 17.57 21.29[/CODE]
Although my Results file says 26.5 as expected
[CODE]
Intel(R) Core(TM) i7 CPU 920 @ 2.67GHz
CPU speed: 3801.14 MHz, 4 cores
CPU features: Prefetch, MMX, SSE, SSE2, SSE4
L1 cache size: 32 KB
L2 cache size: 256 KB, L3 cache size: 8 MB
L1 cache line size: 64 bytes
L2 cache line size: 64 bytes
TLBS: 64
Prime95 64-bit version 26.5, RdtscTiming=1
Best time for 768K FFT length: 8.07 ms., avg: 8.16 ms.
Best time for 896K FFT length: 9.96 ms., avg: 10.08 ms.
Best time for 1024K FFT length: 10.90 ms., avg: 11.05 ms.
Best time for 1280K FFT length: 14.31 ms., avg: 14.39 ms.
Best time for 1536K FFT length: 17.35 ms., avg: 17.57 ms.
Best time for 1792K FFT length: 21.18 ms., avg: 21.29 ms.
[/CODE]

It has never reported the actual clock speed, at least since 25.11, but I've gotten used to that. It assumes I'm using 20x 180 rather than the actual 21x 180.

sdbardwick 2011-02-08 02:24

I'm confused; 180*21=3780, which is very close to the 3800 reported by Prime95.

Rhyled 2011-02-08 02:59

[QUOTE=sdbardwick;251750]I'm confused; 180*21=3780, which is very close to the 3800 reported by Prime95.[/QUOTE]

The value in my Results.txt file is correct (my error above: it's actually 21x 181 = 3801 MHz). Somehow, that shows up as 3620 on the benchmark page (scroll my first code window to the right) and also as 3620 when I look at my CPU statistics.
[CODE]
Name CPU Model / Software GHz
[URL="http://www.mersenne.org/editcpu/?g=c87a4a4b48005bbdc12e6c1accac4af4"]Rhyled[/URL] Intel Core i7 920 @ 2.67GHz Windows64,v26.4,build 1 3.620
[/CODE]

The reporting was accurate when I used a 20x 200 multiplier, but I couldn't keep it stable at acceptable temperatures much longer than a benchmark run.

No biggie - it doesn't affect my GHz-day count one way or the other. My guess is that the calculation sees that I have Turbo Boost enabled on the processor and deducts 1 from the multiplier, even though CPU-Z reports the full 3801 MHz while Prime95 is running.
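That guess checks out arithmetically. A quick sanity check in shell, using the corrected 181 MHz base clock from the post above:

```shell
# If the reporting drops the turbo multiplier (21 -> 20) but keeps the
# 181 MHz base clock, both observed figures fall out of the arithmetic.
echo "21 x 181 = $((21 * 181)) MHz"   # the speed CPU-Z reports
echo "20 x 181 = $((20 * 181)) MHz"   # the speed the benchmark page shows
```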

Christenson 2011-02-08 04:36

Hi Prime95...
Elsewhere, you suggested I try 26.4 when my six cores weren't automagically recognized... I haven't been able to do that just yet. Should I just jump to 26.5 and try that?

Meanwhile, six cores crunch away...real LL results in a month or so....patience, patience...

Commaster 2011-02-08 05:13

[B]Rhyled[/B], my Core i7 920 (rev. D0) runs rock-solid at 20x200 here (it crashes instantly if I enable Turbo Boost and it switches on). I'm using a Thermalright Ultra-120 air cooler.
Although, some reviews suggest that raising the base clock from 185 to 200 increases power consumption by about 50%...

Prime95 2011-02-08 05:30

[QUOTE=Christenson;251760]Elsewhere, you suggested I try 26.4 for not getting six cores automagically recognized...I haven't been able to do that just yet...should I just jump to 26.5 and try that?[/QUOTE]

Go straight to 26.5 unless you're using Linux 32-bit. I hope to get that built tomorrow.

Prime95 2011-02-08 15:00

[QUOTE=Prime95;251664]I tried installing 32-bit Debian and soon descended into libcurl dependency hell.[/QUOTE]

I found my notes about how to get out of dependency hell. The 32-bit Linux version is now available!

The 32-bit version was built on Ubuntu 10.04, the 64-bit version was built on Ubuntu 10.10. If needed I can install 64-bit Ubuntu 10.04 and rebuild the 64-bit version. Just let me know if you run into glibc problems.

James Heinrich 2011-02-08 18:45

I'll repeat my minor feature request for the next version:
When doing P-1 work as [i]Pfactor=[/i], it reports like this:[code][Feb 8 13:37] Optimal P-1 factoring of M58020713 using up to 4608MB of memory.
[Feb 8 13:37] Assuming no factors below 2^69 and 2 primality tests saved if a factor is found.
[Feb 8 13:37] Optimal bounds are B1=685000, B2=18837500
[Feb 8 13:37] Chance of finding a factor is an estimated 6.38%[/code]Which is good. But if I specify the bounds with [i]Pminus1=[/i], it does not give the probability:[code][Feb 8 13:37] P-1 on M334000013 with B1=8250000, B2=250000000[/code]That's all it says. Can you please add the line that says "Chance of finding a factor is an estimated 7.39%"?

tichy 2011-02-08 19:09

[SPOILER]Shouldn't the gwnum.a library contain the gwnum.o object ? I tried the official makefiles and it seems that its absence is causing ld to fail. OTOH, when I use the cmake build system, where gwnum.c is compiled in the process, everything links ok.[/SPOILER]
My bad - forgot about the makefile in the gwnum folder.

tichy 2011-02-08 20:25

Maybe the libcurl static-linking dependency problems could be mitigated by distributing the mprime sources together with the libcurl sources as a single archive. Many projects employ such a scheme, so maybe mprime could follow that approach too. Building mprime would then involve creating a static libcurl along the way.

Prime95 2011-02-08 22:52

[QUOTE=James Heinrich;251823]I'll repeat my minor feature request for the next version. Can you please add the line that says "Chance of finding a factor is an estimated 7.39%"?[/QUOTE]

That would involve changing the P-1 worktodo line. I need to know how far the number has been factored to give an accurate probability.

James Heinrich 2011-02-08 23:08

[QUOTE=Prime95;251850]That would involve changing the P-1 worktodo line. I need to know how far the number has been factored to give an accurate probability.[/QUOTE]Any harm in adding that as an optional 8th parameter? (Pminus1=[id,]k,b,n,c,B1,B2[,tf])
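For illustration, using the bounds from the Pfactor example earlier in the thread, the suggested syntax would turn the first line below into the second. The appended 69 is the hypothetical trial-factoring depth (the "assuming no factors below 2^69" figure); k=1, b=2, c=-1 describe a Mersenne number:

```
Pminus1=1,2,58020713,-1,685000,18837500
Pminus1=1,2,58020713,-1,685000,18837500,69
```

With the optional eighth parameter present, the program would have everything it needs to print the "Chance of finding a factor" estimate.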

Christenson 2011-02-09 13:25

[QUOTE=Christenson;251760]Hi Prime95...
Elsewhere, you suggested I try 26.4 when my six cores weren't automagically recognized... I haven't been able to do that just yet. Should I just jump to 26.5 and try that?

Meanwhile, six cores crunch away...real LL results in a month or so....patience, patience...[/QUOTE]

[QUOTE=Prime95;251763]Go straight to 26.5 unless you're using Linux 32-bit. I hope to get that built tomorrow.[/QUOTE]

On a six-core, 64-bit machine, I need help understanding why I might want to fool with 32-bit... except as a developer of 32-bit things.:batalov:
:busy: :cmd:

James Heinrich 2011-02-09 13:33

[QUOTE=Christenson;251929]On a six-core, 64 bit machine, I need help understanding why I might want to fool with 32 bit[/QUOTE]You don't. 32-bit is only for ancient beasts that can't run 64-bit.

Wizzard 2011-02-09 15:21

Stupid question, but, why is 26.5 size 25 MB and 25.11 only 5 MB? :)

Brain 2011-02-09 17:14

[QUOTE=Wizzard;251938]Stupid question, but, why is 26.5 size 25 MB and 25.11 only 5 MB? :)[/QUOTE]
More optimizations. This question has been answered in more detail elsewhere in this forum.

James Heinrich 2011-02-09 17:32

[QUOTE=Wizzard;251938]Stupid question, but, why is 26.5 size 25 MB and 25.11 only 5 MB? :)[/QUOTE]Previously discussed in other v25.x threads, such as here:
[url]http://www.mersenneforum.org/showthread.php?p=238089#post238089[/url]

ET_ 2011-02-09 18:25

[QUOTE=Prime95;251804]I found my notes about how to get out of dependency hell. The 32-bit Linux version is now available!

The 32-bit version was built on Ubuntu 10.04, the 64-bit version was built on Ubuntu 10.10. If needed I can install 64-bit Ubuntu 10.04 and rebuild the 64-bit version. Just let me know if you run into glibc problems.[/QUOTE]


Ubuntu 9.10 64-bit

[code]
./mprime: /lib/libc.so.6: version `GLIBC_2.11' not found (required by ./mprime)
[/code]

:no:

Luigi

James Heinrich 2011-02-09 18:38

Completion percentage is certainly not right:[code][Feb 9 10:00] Waiting 5 seconds to stagger worker starts.
[Feb 9 10:00] Worker starting
[Feb 9 10:00] Setting affinity to run worker on logical CPUs 2,3
[Feb 9 10:00] Optimal P-1 factoring of M58021673 using up to 4608MB of memory.
[Feb 9 10:00] Assuming no factors below 2^70 and 2 primality tests saved if a factor is found.
[Feb 9 10:00] Optimal bounds are B1=660000, B2=16830000
[Feb 9 10:00] Chance of finding a factor is an estimated 5.62%
[Feb 9 10:00] Using Core2 type-3 FFT length 3M, Pass1=1K, Pass2=3K, 2 threads
[Feb 9 10:00] Setting affinity to run helper thread 1 on logical CPUs 2,3
[Feb 9 10:00] Available memory is 1494MB.
[Feb 9 10:00] Using 1472MB of memory. Processing 51 relative primes (384 of 480 already processed).
[Feb 9 10:05] M58021673 stage 2 is 91.868590% complete.
[Feb 9 10:19] M58021673 stage 2 is 92.737674% complete. Time: 867.268 sec.
[Feb 9 10:35] M58021673 stage 2 is 93.600987% complete. Time: 938.293 sec.
[Feb 9 10:50] M58021673 stage 2 is 94.456527% complete. Time: 900.866 sec.
[Feb 9 11:05] M58021673 stage 2 is 95.374253% complete. Time: 911.726 sec.
[Feb 9 11:20] M58021673 stage 2 is 96.330139% complete. Time: 912.869 sec.
[Feb 9 11:35] M58021673 stage 2 is 97.283906% complete. Time: 926.810 sec.
[Feb 9 11:51] M58021673 stage 2 is 98.237672% complete. Time: 945.193 sec.
[Feb 9 12:06] M58021673 stage 2 is 99.200625% complete. Time: 883.005 sec.
[Feb 9 12:21] M58021673 stage 2 is 100.000000% complete. Time: 883.203 sec.
[Feb 9 12:35] M58021673 stage 2 is 100.000000% complete. Time: 872.055 sec.
[Feb 9 12:47] Available memory is 1476MB.
[Feb 9 12:47] Using 1326MB of memory. Processing 45 relative primes (435 of 480 already processed).
[Feb 9 12:49] M58021673 stage 2 is 100.000000% complete. Time: 844.224 sec.
[Feb 9 13:04] M58021673 stage 2 is 100.000000% complete. Time: 882.639 sec.
[Feb 9 13:19] M58021673 stage 2 is 100.000000% complete. Time: 873.327 sec.
[Feb 9 13:33] M58021673 stage 2 is 100.000000% complete. Time: 880.313 sec.[/code]

Prime95 2011-02-09 19:02

[QUOTE=ET_;251963]
[code]
./mprime: /lib/libc.so.6: version `GLIBC_2.11' not found (required by ./mprime)
[/code][/QUOTE]

Do you think if I download Ubuntu 10.04 and build 64-bit mprime that it will run properly on your Ubuntu 9.10 system?

Linux folks, help me out here! Why is it so hard to build one mprime executable that runs on all Linux variants? Or is it simply that I adopted the bleeding edge 10.10 release too soon?

tichy 2011-02-09 19:33

[QUOTE=Prime95;251970]Do you think if I download Ubuntu 10.04 and build 64-bit mprime that it will run properly on your Ubuntu 9.10 system?

Linux folks, help me out here! Why is it so hard to build one mprime executable that runs on all Linux variants? Or is it simply that I adopted the bleeding edge 10.10 release too soon?[/QUOTE]

Ad 1. If the glibc versions are different and not backward compatible, then probably not.

Ad 2. I don't know if it is hard in a general sense - I'd rather say it is hard the way you are doing it. If I had to do it, that is, prepare a statically linked binary, I'd separate the build process from the actual Linux distribution used. I think this can be achieved by preparing a cross-target build environment employing a custom toolchain based on a selected glibc/gcc/(kernel?) combo. Of course, once you decide on the oldest glibc version supported (which would be the baseline for your custom toolchain), that implies a clear cut for users with older installations, but they can always make their own builds.
I think googling for topics on cross-compilation would bring more details - there is plenty of material, given the popularity of embedded-device programming environments.
I have some printouts in the cabinet at work (and some practical knowledge regarding cross-compiling), so if you don't mind waiting till tomorrow I can check them for you.
Of course I may be wrong about all the above, but then I hope someone can jump in here and correct me.

HTH

[EDIT] If the above is valid, then an added benefit would be having both the 32- and 64-bit build environments on a single Linux box.

Prime95 2011-02-09 20:33

[QUOTE=ET_;251963]
./mprime: /lib/libc.so.6: version `GLIBC_2.11' not found (required by ./mprime)
[/QUOTE]

I had an old Ubuntu 9.04 64-bit CD lying around. I installed that and built a new 64-bit mprime. Download it and give it a try.

Batalov 2011-02-09 20:47

[QUOTE=Prime95;251970]Linux folks, help me out here! Why is it so hard to build one mprime executable that runs on all Linux variants? Or is it simply that I adopted the bleeding edge 10.10 release too soon?[/QUOTE]
Because some Linux distros are the new Windows. :-) (In more ways than one. There was a time when kernels broke compatibility even faster than they do now.)

Idea: you could 'exec' an external curl binary for the server calls and cut that Gordian knot. Users would then be instructed to have a working curl binary on their system. [Some programs do that with gzip instead of linking to zlib.]
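A minimal sketch of that idea in shell, assuming only that some curl binary is on the PATH. The function name and error handling are illustrative, not mprime's actual code:

```shell
# Fetch a URL by shelling out to an external curl rather than linking
# libcurl; fails cleanly when no curl binary is installed.
fetch() {
  url="$1"; out="$2"
  if command -v curl >/dev/null 2>&1; then
    curl -fsS -o "$out" "$url"
  else
    echo "no curl binary found on PATH" >&2
    return 1
  fi
}
```

An mprime built this way would carry no libcurl (or krb5/gssapi) link-time dependencies at all; the cost is a documented runtime requirement on curl.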

ET_ 2011-02-10 12:20

[QUOTE=Prime95;251979]I had an old Ubuntu 9.04 64-bit CD lying around. I installed that and built a new 64-bit mprime. Download it and give it a try.[/QUOTE]

Thanks, I'll test it when I get home tonight - if it's the actual Linux 64-bit link in post #1!

Luigi

Rhyled 2011-02-10 14:15

Thanks for the quick fix
 
[QUOTE=Rhyled;251745]I ran a couple of benchmarks against the new 26.5 - 64 bit Windows version today. When reviewing my benchmarks, they are reported as version 26.4 build 1
[/QUOTE]

When checking my latest benchmark, I noticed that both the Prime95 version and my clock speed are now being reported correctly. Thanks for the quick work.

tichy 2011-02-10 18:35

glibc or not glibc
 
Is glibc a must? If one would like to statically link the C library into the binary (one of the options that could be considered for mprime to avoid the problems George encountered), there are [URL="http://lists.busybox.net/pipermail/uclibc/2010-May/044057.html"]reports[/URL] that glibc is not necessarily the best choice. Instead, [URL="http://www.uclibc.org/"]uClibc[/URL] (or [URL="http://www.uclibc.org/other_libs.html"]other alternatives[/URL]) could be used as the C library and [URL="http://cxx.uclibc.org/"]uClibc++[/URL] as the C++ library.

James Heinrich 2011-02-11 01:04

1 Attachment(s)
The RAM usage shown in the worker window titles doesn't match what the worker is actually doing:

Prime95 2011-02-11 04:21

Build 2 is available.

tichy 2011-02-12 14:06

Missing object files
 
1 Attachment(s)
For all those who need/want to compile on Linux - attached missing object files.

James Heinrich 2011-02-12 14:35

v26.5 build 2 [Win64] seems to have (partially) broken [i]LowMemWhileRunning[/i]. It only notices some of the running programs, not all of them. I tried build 1 again to make sure, and that worked fine.[quote]LowMemWhileRunning=[b]photoshop[/b],[b]ptgui[/b],bsplayer.exe,googleearth.exe,StaxRip[/quote](bold is broken; the rest works as expected). The only common trait I see is that both undetected programs begin with "P". Are you perhaps enumerating running processes and then handling ones starting with "P" (like "Prime95") in some special way that no longer includes the LowMemWhileRunning check?

Prime95 2011-02-12 16:40

[QUOTE=James Heinrich;252251]v26.5 build 2 [Win64] seems to have (partially) broken [i]LowMemWhileRunning[/i]. [/QUOTE]

Odd. That code hasn't changed in ages.

James Heinrich 2011-02-12 18:52

[QUOTE=Prime95;252264]Odd. That code hasn't changed in ages.[/QUOTE]Maybe the code pixies have been flipping bits when you weren't looking :smile:
Works perfectly fine in build 1, works imperfectly in build 2.
Let me know if you want me to test anything related to this.

James Heinrich 2011-02-12 19:22

I'm not sure if it's related, but I see a bunch of "Memory allocation error. Trying again using less memory" messages overnight in results.txt from running build 2. I never had that problem with build 1.

ixfd64 2011-02-12 20:37

It could be some sort of software regression.

Christenson 2011-02-13 06:43

After running 25.11, I upgraded to 26.5 on my AMD Phenom II six-core beast running Xubuntu 10.10.

25.11 is in a directory called ~/prime95 and running happily, about 25% into 3 separate LL tests (except that ./mprime -dm doesn't seem to work properly; I get status but not a menu). 26.5 is in ~/Downloads/prime95_26_5.

When I run 26.5 from the 25.11 directory without specifying a working directory, it uses the directory of the mprime image instead of the current working directory, so I have a bit of cleanup to do (I've been too sick this weekend to do much). That is,
~/prime95$ ../Downloads/prime95_26_5/mprime -md
finds the six cores no problem, but it doesn't find the existing worktodo.txt in ~/prime95.

Not sure if this is a bug, but I do find the behavior counterintuitive.

engracio 2011-02-13 20:38

Issue with SOB P-1 factoring using 26.5 build 1.

Last night I picked up a new work unit for SoB using 26.5. It ran P-1 and completed with this residue: [Sat Feb 12 23:42:47 2011]
UID: engracio, 55459*2^19307926+1 completed P-1, B1=55000, B2=440000, We1: F3AE675D, AID: 00000000000000000000000000100CF8

I checked my worktodo file and it had this.
[Worker #2]
PRP=00000000000000000000000000100CF8,55459,2,19307926,1,56,2.1
PRP=00000000000000000000000000100BDB,21181,2,19471412,1

It completed 6% overnight. This morning when the computer rebooted, the client started to run P-1 again. I let it run/complete and got this result:
[Sun Feb 13 14:07:54 2011]
UID: engracio, 55459*2^19307926+1 completed P-1, B1=55000, B2=440000, We1: F3AE675D, AID: 00000000000000000000000000100CF8

Again it still had the 56,2.1 in the worktodo file.
PRP=00000000000000000000000000100CF8,55459,2,19307926,1,56,2.1

When I restarted the client just poking around, P-1 started again.

[Feb 13 14:01] 55459*2^19307926+1 stage 2 is 72.94% complete. Time: 481.353 sec.
[Feb 13 14:07] 55459*2^19307926+1 stage 2 complete. 54462 transforms. Time: 1429.160 sec.
[Feb 13 14:07] Starting stage 2 GCD - please be patient.
[Feb 13 14:07] Stage 2 GCD complete. Time: 37.997 sec.
[Feb 13 14:07] 55459*2^19307926+1 completed P-1, B1=55000, B2=440000, We1: F3AE675D
[Feb 13 14:07] Setting affinity to run helper thread 1 on any logical CPU.
[Feb 13 14:07] Resuming PRP test of 55459*2^19307926+1 using all-complex AMD K10 type-3 FFT length 1728K, Pass1=384, Pass2=4608, 2 threads
[Feb 13 14:07] Iteration: 1236008 / 19307941 [6.40%].
[Feb 13 14:09] Iteration: 1240000 / 19307941 [6.42%]. Per iteration time: 0.024 sec.
[Feb 13 14:10] Stopping PRP test of 55459*2^19307926+1 at iteration 1243193 [6.43%]
[Feb 13 14:10] Worker stopped.
[Feb 13 14:11] Waiting 5 seconds to stagger worker starts.
[Feb 13 14:11] Worker starting
[Feb 13 14:11] Setting affinity to run worker on any logical CPU.
[Feb 13 14:11] Optimal P-1 factoring of 55459*2^19307926+1 using up to 1536MB of memory.
[Feb 13 14:11] Assuming no factors below 2^56 and 2.1 primality tests saved if a factor is found.
[Feb 13 14:11] Optimal bounds are B1=55000, B2=440000
[Feb 13 14:11] Chance of finding a factor is an estimated 0.349%
[Feb 13 14:11] Using all-complex AMD K10 type-3 FFT length 1728K, Pass1=384, Pass2=4608, 2 threads
[Feb 13 14:11] Setting affinity to run helper thread 1 on any logical CPU.
[Feb 13 14:15] 55459*2^19307926+1 stage 1 is 12.61% complete. Time: 246.959 sec.
[Feb 13 14:15] Worker stopped.

An Mxxxxx_xxxxxx residue file is also in the work folder. I moved the wu down the list until ideas are suggested. Any ideas?
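For reference, the trailing "56,2.1" fields on the PRP= line correspond to the log output above ("Assuming no factors below 2^56 and 2.1 primality tests saved if a factor is found"). A hedged sketch of a parser for such lines, with the field layout inferred from this post:

```python
def parse_prp_line(line):
    """Parse a worktodo.txt PRP= entry like the ones quoted above.

    Field layout inferred from this post's log output (a sketch, not
    the authoritative format definition):
    PRP=<AID>,k,b,n,c[,how_far_factored,tests_saved]
    The two optional trailing fields are the "56,2.1": trial factoring
    done to 2^56, and 2.1 primality tests saved if a P-1 factor is
    found.
    """
    fields = line.split("=", 1)[1].split(",")
    entry = {
        "aid": fields[0],
        "k": int(fields[1]),
        "b": int(fields[2]),
        "n": int(fields[3]),
        "c": int(fields[4]),
    }
    if len(fields) >= 7:
        entry["factored_to"] = int(fields[5])
        entry["tests_saved"] = float(fields[6])
    return entry

e = parse_prp_line("PRP=00000000000000000000000000100CF8,55459,2,19307926,1,56,2.1")
print("%d*%d^%d+%d" % (e["k"], e["b"], e["n"], e["c"]))  # 55459*2^19307926+1
print(e["factored_to"], e["tests_saved"])                # 56 2.1
```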

Prime95 2011-02-13 22:23

[QUOTE=engracio;252383]Issue with SOB P-1 factoring using 26.5 build 1. Any ideas?[/QUOTE]

This is a side effect of a bug fix in 26.4. I've coded up a fix for 26.5 build 3.

engracio 2011-02-13 23:08

[QUOTE=Prime95;252396]This is a side effect of a bug fix in 26.4. I've coded up a fix for 26.5 build 3.[/QUOTE]

Thanks George. I was able to reproduce the issue on a different computer, except I ran 26.4 first with no issue, then 26.5, where 56,2.1 was added and not removed after completion of P-1. The wu did not restart P-1 when I manually deleted the 56,2.1 add-on.:smile:

rogue 2011-02-14 20:48

gwnum v26.5 in PFGW gives 885176830*3^5-1 as PRP, when it can be factored as 101207*2125327. There are a couple more if you care, but I suspect that a small change in gwnum could fix all three of them.

nuggetprime 2011-02-14 21:56

[QUOTE=rogue;252499]gwnum v26.5 in PFGW gives 885176830*3^5-1 as PRP, when it can be factored as 101207*2125327. There are a couple more if you care, but I suspect that a small change in gwnum could fix all three of them.[/QUOTE]
The number you mention is a base-3 pseudoprime; I validated that using PARI-GP. Test with -b5 and you'll see it's composite.

Nugget
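nuggetprime's observation can be reproduced with a plain Fermat probable-prime test in a few lines (a sketch of the underlying check, not PFGW's actual PRP machinery):

```python
# The number from the post: 885176830*3^5-1, composite with the
# claimed factorization 101207*2125327, yet a base-3 Fermat test
# reports it as a probable prime. Base 5 (PFGW's -b5) exposes it.
N = 885176830 * 3**5 - 1

def fermat_prp(n, base):
    """Fermat probable-prime test: base^(n-1) == 1 (mod n)."""
    return pow(base, n - 1, n) == 1

print(N == 101207 * 2125327)  # True: the claimed factorization
print(fermat_prp(N, 3))       # True: N is a base-3 pseudoprime
print(fermat_prp(N, 5))       # False: base 5 shows N composite
```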

S34960zz 2011-02-15 12:04

est completion time Win7x64 v25.11/v26.5b2 i7 Q840
 
Win7 Pro x64, Prime95 x64
From the Benchmarks page: Intel Core i7 Q 840 @ 1.87GHz [correct, with 16GB RAM, 10GB allowed] Windows64,Prime95,v25.11,build 2 (p64v2511.zip) Windows64,Prime95,v26.5,build 2 (p64v265[b2].zip)
The estimated completion times (Test, Status) seem to be underestimated significantly for this machine, compared with the per-iteration timings.

I have been watching the soon-approaching estimated completion date for a double-check, and it keeps moving further into the future, disproportionate to the amount of time the process is not running (machine off or Prime95 turned off).

Prime95 gets almost 100% of this machine; except I suppress two of the workers (one core) when running VMware for real work. Also, v26.5b2 also reports a benchmark Speed (MHz) that is incorrect.

+++++
I took the following measurements this morning.

For a double-check M26xxxxxx, I have about 4900000 = 4.9e6 iterations remaining.
v25.11b2 reports approx. 0.088 sec/iter, measured, in the status window.
v26.5b2 reports approx. 0.077 sec/iter (14% faster!), measured, in the status window.

Running 24h/day, v25.11b2 reports a completion time:
(Fri. 18-Feb-2011 02:30) - (Tue. 15-Feb-2011 04:02) = 70.47 hours = 253700 sec.
Measured: 4.9e6 iter * 0.088 sec/iter = 431200 sec. = 119.78 hours
and then: (est)/(meas) = 0.59

Running 24h/day, v26.5b2 reports a completion time:
(Wed. 16-Feb-2011 22:14) - (Tue. 15-Feb-2011 05:00) = 41.23 hours = 148,400 sec.
Measured: 4.9e6 iter * 0.077 sec/iter = 377300 sec. = 104.81 hours
and then: (est)/(meas) = 0.39
+++++

The benchmark timings reported appear consistent with the ~14% FFT timing improvement. However, v26.5b2 mis-reports the processor speed at 3192 MHz; v25.11b2 gets it right at 1862 MHz.
+++++

rogue 2011-02-15 13:51

[QUOTE=rogue;252499]gwnum v26.5 in PFGW gives 885176830*3^5-1 as PRP, when it can be factored as 101207*2125327. There are a couple more if you care, but I suspect that a small change in gwnum could fix all three of them.[/QUOTE]

Nevermind. These were tested as PRP with GMP not gwnum.

S34960zz 2011-02-15 15:11

Apparently something ate all the linebreaks in my post #57 above. And there's no EDIT button alongside for some reason (though the box down below says I _may_ edit my posts), so I can't fix it. Apologies.

Though, this quick-reply offers me an EDIT button. Interesting.

James Heinrich 2011-02-15 16:27

I believe you can edit your own posts for about an hour or so, after that they're locked.

Prime95 2011-02-15 16:27

[QUOTE=S34960zz;252560]v26.5b2 mis-reports the processor speed at 3192 MHz; v25.11b2 gets it right at 1862 MHz.[/QUOTE]

Do you know if your CPU runs faster if just one core is busy? If so, at what speed does that one core run?

S34960zz 2011-02-15 17:51

[QUOTE=Prime95;252576]Do you know if your CPU runs faster if just one core is busy? If so, at what speed does that one core run?[/QUOTE]

v25.11b2, if I stop all 8 workers and re-start just Worker #7, M26xxxxxx double-check, the resource monitor shows CPU 6 at 100% usage and 113-114% of Maximum Frequency.

v26.5b2, same thing.

So, perhaps that accounts for the difference between v25.11b2 and v26.5b2 benchmark times. But it should not affect the per-iteration times, as those were examined with all 8 cores running a Worker. Just re-verified. With all 8 cores running a worker, 100% CPU usage, 99% Maximum Frequency, the v26.5b2 M26xxxxxx double-check is running 0.080 to 0.082 sec/iteration, compared to v25.11b2 running 0.087 to 0.088 sec/iteration.

Corrected from earlier mangled post:
+++++

I took the following measurements this morning.
For a double-check M26xxxxxx, I have about 4900000 = 4.9e6 iterations remaining.

v25.11b2 reports approx. 0.088 sec/iter, measured, in the status window.
v26.5b2 reports approx. 0.082 sec/iter (7% faster!), measured, in the status window. [updated]

Running 24h/day, v25.11b2 reports a completion time:
(Fri. 18-Feb-2011 02:30) - (Tue. 15-Feb-2011 04:02) = 70.47 hours = 253700 sec.
Measured: 4.9e6 iter * 0.088 sec/iter = 431200 sec. = 119.78 hours
and then: (est)/(meas) = 0.59

Running 24h/day, v26.5b2 reports a completion time:
(Wed. 16-Feb-2011 22:14) - (Tue. 15-Feb-2011 05:00) = 41.23 hours = 148,400 sec.
Measured: 4.9e6 iter * 0.082 sec/iter = 401800 sec. = 111.61 hours [updated per-iter time]
and then: (est)/(meas) = 0.37 [updated ratio]

+++++
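The arithmetic above can be packaged as a quick sanity check (same numbers as the post, 24h/day operation assumed):

```python
def eta_ratio(iters_remaining, sec_per_iter, reported_hours):
    """Compare a reported completion estimate against the time implied
    by a measured per-iteration rate (assuming 24h/day operation)."""
    measured_hours = iters_remaining * sec_per_iter / 3600.0
    return measured_hours, reported_hours / measured_hours

# v25.11b2: reported 70.47 h vs measured 0.088 s/iter
m, r = eta_ratio(4.9e6, 0.088, 70.47)
print(round(m, 2), round(r, 2))  # 119.78 0.59

# v26.5b2: reported 41.23 h vs measured 0.082 s/iter
m, r = eta_ratio(4.9e6, 0.082, 41.23)
print(round(m, 2), round(r, 2))  # 111.61 0.37
```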

xilman 2011-02-15 18:42

[QUOTE=S34960zz;252567]Apparently something ate all the linebreaks in my post #57 above. And there's no EDIT button alongside for some reason (though the box down below says I _may_ edit my posts), so I can't fix it. Apologies.

Though, this quick-reply offers me an EDIT button. Interesting.[/QUOTE]I added a few line breaks and hope that they are approximately where you wanted them.

Prime95 2011-02-15 23:07

[QUOTE=S34960zz;252581] (est)/(meas) = 0.37 [/QUOTE]

I'm not too worried about this. When 26.5 detected a new CPU speed, it reset the RollingAverage. After a week or two, the rolling average should rise so that your time estimates are more accurate.

In 26.5b3 I've added code to not reset the rolling average. I've also disabled, for your CPU, the new CPU speed detection code that was designed primarily for Sandy Bridge CPUs.
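A back-of-the-envelope check, under the assumption (mine, not a statement about Prime95's internals) that the completion estimate scales inversely with the detected CPU speed:

```python
# Assumption for illustration: completion estimates scale inversely
# with the detected CPU speed. Misdetecting 3192 MHz on a 1862 MHz
# part would then shrink the estimate by 1862/3192.
speed_factor = 1862.0 / 3192.0
print(round(speed_factor, 2))  # 0.58

# Observed extra shrinkage between the two versions' (est)/(meas)
# ratios reported earlier in the thread: 0.37 for v26.5b2 vs 0.59
# for v25.11b2.
observed_factor = 0.37 / 0.59
print(round(observed_factor, 2))  # 0.63
# Same ballpark, consistent with the speed-detection reset being the
# main cause of the shrunken estimates.
```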

S34960zz 2011-02-16 00:49

[QUOTE=Prime95;252622]I'm not too worried about this. When 26.5 detected a new CPU speed, it reset the RollingAverage. After a week or two, the rolling average should rise so that your time estimates are more accurate.

In 26.5b3 I've added code to not reset the rolling average. I've also disabled, for your CPU, the new CPU speed detection code that was designed primarily for Sandy Bridge CPUs.[/QUOTE]

Sounds good. [In fact, the CPU speed reported during the Benchmark may have happened because I ran the benchmark immediately after starting the new version. While looking at local.txt for "second comment" (next), I note that the cpu speed does get reset to the 1862 MHz value quite soon.]

A second comment, regarding first startup when moving from v25.11b2 to v26.5b2. If the following is expected behavior, that's fine.

It appears that the v26.5b2 version makes new assumptions on the best CPU affinity/allocation, different from what might have been happening previously (no special settings in place, no "NumCPUs=" line). This results in a message about the worktodo.txt file.

I copied the v25.11b2 Prime95 working directory and, in the copy, replaced the v25.11b2 executable with the v26.5b2 prime95.exe. When the new program is run (with no changes at all to the old parameter and data files), I get:
[code]
[Main thread Feb 15 18:46] Mersenne number primality test program version 26.5
[Main thread Feb 15 18:46] Too many sections in worktodo.txt. Line #13
[Main thread Feb 15 18:46] Optimizing for CPU architecture: Core i3/i5/i7, L2 cache size: 256 KB, L3 cache size: 8 MB
[Comm thread Feb 15 18:46] Exchanging program options with server
[Main thread Feb 15 18:46] Starting workers.
[Main thread Feb 15 18:46] Too many sections in worktodo.txt. Line #13
[Comm thread Feb 15 18:46] Done communicating with server.
[/code]In the worktodo.txt file, there are "[Worker #n]" lines, n = 1 to 8 sequentially, each followed by a "Test=" line and a blank line (for n = 1-6 and 8), or by two "DoubleCheck=" lines and a blank line (for n = 7). Line 13 was:
[Worker #5]

So, the new version preferred to use only 4 workers, each assigned to logical CPUs j and j+1, and when it sees the (first) extra worker line it prints a warning.
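The "Line #13" in the warning is consistent with the layout just described. A sketch (not Prime95's actual parsing code) of counting worker sections against the configured worker count:

```python
def first_excess_section_line(lines, worker_threads):
    """Find the first "[Worker #n]" section header beyond the number
    of configured workers -- an illustrative sketch of the check
    behind the "Too many sections in worktodo.txt" message, not
    Prime95's actual code. Returns a 1-based line number, or None."""
    sections = 0
    for i, line in enumerate(lines, start=1):
        if line.startswith("[Worker #"):
            sections += 1
            if sections > worker_threads:
                return i
    return None

# Reconstruct the layout described above: workers 1-6 and 8 have one
# Test= line, worker 7 has two DoubleCheck= lines, blank line after
# each section. ("..." stands in for the real assignment fields.)
lines = []
for n in range(1, 9):
    lines.append("[Worker #%d]" % n)
    if n == 7:
        lines += ["DoubleCheck=...", "DoubleCheck=..."]
    else:
        lines.append("Test=...")
    lines.append("")

print(first_excess_section_line(lines, 4))  # 13, matching the warning
```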

The original local.txt file contained:
...
Affinity=100
ThreadsPerTest=1
...
SrvrP09=8
SrvrP00=2
...

Changing "Test, Worker Windows ..., Number of worker windows to run:" to 8 results in local.txt containing:
...
SrvrP09=4
SrvrP00=3
...
WorkerThreads=8

I did have to Stop then Continue for all 8 threads to begin computing.

Prime95 2011-02-16 00:56

[QUOTE=S34960zz;252629]
So, the new version preferred to use only 4 workers, each assigned to logical CPUs j and j+1[/QUOTE]

Prime95 will probably get better throughput using this configuration. You may want to consider merging your worktodo.txt down to 4 worker threads. It's your choice.

Prime95 2011-02-16 01:31

[QUOTE=S34960zz;252560]However, v26.5b2 mis-reports the processor speed at 3192 MHz; v25.11b2 gets it right at 1862 MHz. +++++[/QUOTE]

According to [url]http://ark.intel.com/Product.aspx?id=43125[/url] your max CPU speed is 3.2 GHz with one core running. So the new speed detection code worked - kinda.

Prime95 2011-02-18 22:49

Build 3 is now available. If you have a multi-core hyperthreaded machine, please test this version out. Let me know whether it properly detects which logical CPUs constitute a physical CPU. A message is output at startup.

If it isn't working, add the line "DebugAffinityScramble=1" to prime.txt and post the startup output.
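The startup messages reported below ("Logical CPUs 0,1 form one physical CPU.") amount to grouping logical CPUs that share a core. A toy illustration (the core-id source is an assumption here; on Linux it could come from /sys/devices/system/cpu/cpu*/topology/core_id, and Prime95's detection is its own):

```python
def group_physical_cpus(core_ids):
    """Group logical CPU numbers that share a core id.

    core_ids maps logical CPU number -> physical core id; here it is
    supplied directly for illustration rather than read from the OS.
    """
    groups = {}
    for logical, core in sorted(core_ids.items()):
        groups.setdefault(core, []).append(logical)
    return [cpus for _, cpus in sorted(groups.items())]

# An i7-920-style layout: adjacent logical CPUs share a core.
core_ids = {0: 0, 1: 0, 2: 1, 3: 1, 4: 2, 5: 2, 6: 3, 7: 3}
for cpus in group_physical_cpus(core_ids):
    print("Logical CPUs %s form one physical CPU." %
          ",".join(str(c) for c in cpus))
```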

James Heinrich 2011-02-18 23:26

Seems to work properly for an i7-920:[quote]Optimizing for CPU architecture: Core i3/i5/i7, L2 cache size: 256 KB, L3 cache size: 8 MB
Logical CPUs 0,1 form one physical CPU.
Logical CPUs 2,3 form one physical CPU.
Logical CPUs 4,5 form one physical CPU.
Logical CPUs 6,7 form one physical CPU.[/quote]

And the LowMemWhileRunning issue is fixed! :smile:
[quote]Entering a period of low memory usage because C:\PROGRAM FILES\PTGUI\PTGUI.EXE is running.
Entering a period of low memory usage because C:\PROGRAM FILES\ADOBE\ADOBE PHOTOSHOP CS5 (64 BIT)\PHOTOSHOP.EXE is running.[/quote]

drh 2011-02-18 23:54

Works on my Quad Core X5355

Optimizing for CPU architecture: Core 2, L2 cache size: 4 MB
Logical CPUs 0,1 form one physical CPU.
Logical CPUs 2,3 form one physical CPU.
Logical CPUs 4,5 form one physical CPU.
Logical CPUs 6,7 form one physical CPU.

drh 2011-02-19 00:10

Also works on my i7 Q720 Quad Core

Optimizing for CPU architecture: Core i3/i5/i7, L2 cache size: 256 KB, L3 cache size: 6 MB
Logical CPUs 0,1 form one physical CPU.
Logical CPUs 2,3 form one physical CPU.
Logical CPUs 4,5 form one physical CPU.
Logical CPUs 6,7 form one physical CPU.

drh 2011-02-19 00:19

On my i7 machine, I just noticed my worker windows have the following lines in them ...

[Feb 18 19:05] Setting affinity to run worker on logical CPUs 0,1,2,3,4,5,6,7,31
[Feb 18 19:05] Setting affinity to run helper thread 1 on logical CPUs 0,1,2,3,4,5,6,7,31
[Feb 18 19:05] Resuming primality test of M49025983 using Core2 type-3 FFT length 2560K, Pass1=640, Pass2=4K, 2 threads
[Feb 18 19:05] Iteration: 23493594 / 49025983 [47.92%].

On my X5355 machine, I'm running 8 worker windows, and the CPU's match the worker windows.

Prime95 2011-02-19 00:42

[QUOTE=James Heinrich;252977]And the LowMemWhileRunning issue is fixed! :smile:[/QUOTE]

Alas, I did nothing to fix this. I could not reproduce the problem. It is a subtle bug, sure to reveal itself again at a later date.

engracio 2011-02-19 00:50

[QUOTE=Prime95;252396]This is a side effect of a bug fix in 26.4. I've coded up a fix for 26.5 build 3.[/QUOTE]

Upgraded to 26.5b3 and downloaded new wu. Worktodo.txt was updated properly when P-1 completed for a PRP test.

Thanks.:smile:

e

Prime95 2011-02-19 01:42

[QUOTE=drh;252983]On my i7 machine, I just noticed my worker windows have the following lines in them ...

[Feb 18 19:05] Setting affinity to run worker on logical CPUs 0,1,2,3,4,5,6,7,31
[Feb 18 19:05] Setting affinity to run helper thread 1 on logical CPUs 0,1,2,3,4,5,6,7,31[/QUOTE]

It looks like you have 8 worker windows, each with a helper thread (16 threads total). This is very non-optimal. You're likely better off just running 4 worker windows with no helper threads.

I'll investigate the mysterious CPU 31.

drh 2011-02-19 02:10

Sorry, more specifically: on my i7, I'm running 4 worker windows, each with a helper (8 threads total), not 8 windows with 8 helpers, as can be seen in my post #71.

Prime95 2011-02-19 03:12

[QUOTE=drh;252990]Sorry, more specifically, on my i7, I'm running 4 worker windows, each with a helper, (8 threads total), not 8 windows with 8 helpers. This can be seen by my post #71.[/QUOTE]

Did you set the affinity for each thread to "run on any cpu"? That is the only explanation I have for the output you are seeing. (I've coded up a fix for the CPU #31 bug).

Two things you should do to try and get better throughput:

1) See if you get better timings with smart-affinity rather than run-on-any-CPU
2) See if you get better timings by not using a helper thread for each worker.

drh 2011-02-19 03:34

[QUOTE=Prime95;252993]Did you set the affinity for each thread to "run on any cpu"? That is the only explanation I have for the output you are seeing. (I've coded up a fix for the CPU #31 bug).[/QUOTE]

I checked my settings, and for some reason, Worker #1 is set to "run on any CPU", and Workers #2-4 are set to "Smart Assignment." Unintentional, but should they be the same? I'll modify this first, if need be, prior to testing without the helpers. I'm getting the same timings on all three LL windows.

I'm running LL's on #1,3 and 4, and P-1 on #2.

S34960zz 2011-02-19 15:12

[QUOTE=Prime95;252633]According to [URL]http://ark.intel.com/Product.aspx?id=43125[/URL] your max CPU speed is 3.2 GHz with one core running. So the new speed detection code worked - kinda.[/QUOTE]

Thanks for the link! I've looked up some of the Intel docs previously, but hadn't seen that page. I also have not (knowingly) encountered the full max CPU speed, but I'm not sure what enables that feature, or whether the manufacturer inhibits it for thermal management (Dell Precision M6500 laptop).

Re: workers and threads. I have experimented with different numbers of threads and workers (I did a small parametric study yesterday, needed to wait for the weekend for time to play). That's actually a topic for not-this-thread, so no more.

I'll load up the v26.5b3 release soon, give it a test drive. (I'm still using v25.11 for "production" work; my M26xxxxxx double-check finished last night but didn't match the original LL *frown*). Thanks for all the efforts on Prime95 over the years --- it is a magnum opus.

S34960zz 2011-02-19 16:36

[QUOTE]I'll load up the v26.5b3 release soon, give it a test drive. (I'm still using v25.11 for "production" work; my M26xxxxxx double-check finished last night but didn't match the original LL *frown*).[/QUOTE]Win7 Pro x64
p64v265_build3.zip (note: File version says "26.5.1.0", should that value [have] change[d] for beta 3, etc.?)

From v25.11 benchmark:
Intel(R) Core(TM) i7 CPU Q 840 @ 1.87GHz
CPU speed: 1862.13 MHz, 8 cores
CPU features: RDTSC, CMOV, Prefetch, MMX, SSE, SSE2, SSE4
L1 cache size: 32 KB
L2 cache size: 256 KB, L3 cache size: 8 MB
L1 cache line size: 64 bytes
L2 cache line size: 64 bytes
TLBS: 64

Experimenting with the M26 that did not match for worktodo.txt, off-line (no PrimeNet), and settings:
# worker windows: 1,
CPU affinity: run on any CPU,
CPUs to use: 4,
then V26.5 beta 3 says:
[CODE]
[Main thread Feb 19 11:00] Mersenne number primality test program version 26.5
[Main thread Feb 19 11:00] Optimizing for CPU architecture: Core i3/i5/i7, L2 cache size: 256 KB, L3 cache size: 8 MB
[Main thread Feb 19 11:00] Logical CPUs 0,1 form one physical CPU.
[Main thread Feb 19 11:00] Logical CPUs 2,3 form one physical CPU.
[Main thread Feb 19 11:00] Logical CPUs 4,5 form one physical CPU.
[Main thread Feb 19 11:00] Logical CPUs 6,7 form one physical CPU.
[Main thread Feb 19 11:04] Starting worker.
[Main thread Feb 19 11:11] Stopping all worker threads.
[Main thread Feb 19 11:11] Execution halted.
[Main thread Feb 19 11:11] Choose Test/Continue to restart.
and
[Feb 19 11:04] Worker starting
[Feb 19 11:04] Setting affinity to run worker on logical CPUs 0,1,2,3,4,5,6,7,31
[Feb 19 11:04] Setting affinity to run helper thread 1 on logical CPUs 0,1,2,3,4,5,6,7,31
[Feb 19 11:04] Setting affinity to run helper thread 2 on logical CPUs 0,1,2,3,4,5,6,7,31
[Feb 19 11:04] Setting affinity to run helper thread 3 on logical CPUs 0,1,2,3,4,5,6,7,31
[Feb 19 11:04] Starting primality test of M26505287 using Core2 type-3 FFT length 1440K, Pass1=320, Pass2=4608, 4 threads
[Feb 19 11:05] Iteration: 10000 / 26505287 [0.03%]. Per iteration time: 0.009 sec.
[Feb 19 11:07] Iteration: 20000 / 26505287 [0.07%]. Per iteration time: 0.010 sec.
[Feb 19 11:08] Iteration: 30000 / 26505287 [0.11%]. Per iteration time: 0.010 sec.
[Feb 19 11:10] Iteration: 40000 / 26505287 [0.15%]. Per iteration time: 0.009 sec.
[Feb 19 11:11] Stopping primality test of M26505287 at iteration 46438 [0.17%]
[Feb 19 11:11] Worker stopped.
[/CODE]So, the same "logical CPU 31" issue mentioned previously.

And it seems to be faster, too: more-or-less 9.5 msec/iteration for v26.5b3, compared to 12.0 msec/iteration for v25.11, at 99% of Maximum Frequency with 1 worker, 4 threads.

clarke 2011-02-20 12:34

[code]
Intel Xeon CPU 3.00GHz
CPU Speed: 2992.81 MHz, 2 hyperthreaded cores
CPU features: Prefetch, MMX, SSE, SSE2
L1 cache size: 16 KB
L2 cache size: 2 MB
[/code]
[code]
Affinity=100
WorkerThreads=1
ThreadsPerTest=2
[/code]
[code]
[Main thread Feb 20 15:19] Mersenne number primality test program version 26.5
[Main thread Feb 20 15:19] Optimizing for CPU architecture: Pentium 4, L2 cache size: 2 MB
[Main thread Feb 20 15:19] Logical CPUs 0,1 form one physical CPU.
[Main thread Feb 20 15:19] Logical CPUs 2,3 form one physical CPU.
[Main thread Feb 20 15:19] Starting worker.
[Work thread Feb 20 15:19] Worker starting
[Work thread Feb 20 15:19] Setting affinity to run worker on logical CPUs 0,1,2,3,31
[Work thread Feb 20 15:19] Setting affinity to run helper thread 1 on logical CPUs 0,1,2,3,31
[Work thread Feb 20 15:19] Resuming primality test of M27148801 using Pentium4 type-3 FFT length 1440K, Pass1=320, Pass2=4608, 2 threads
[Work thread Feb 20 15:19] Iteration: 20013229 / 27148801 [73.71%].
[Work thread Feb 20 15:22] Iteration: 20020000 / 27148801 [73.74%]. Per iteration time: 0.030 sec.
[Work thread Feb 20 15:27] Iteration: 20030000 / 27148801 [73.77%]. Per iteration time: 0.030 sec.
[Main thread Feb 20 15:37] Restarting all worker threads.
[Worker #1 Feb 20 15:37] Stopping primality test of M27148801 at iteration 20048464 [73.84%]
[Worker #1 Feb 20 15:37] Worker stopped.
[Main thread Feb 20 15:37] Restarting all worker threads using new settings.
[Worker #1 Feb 20 15:37] Worker starting
[Worker #1 Feb 20 15:37] Setting affinity to run worker on logical CPUs 0,1
[Worker #2 Feb 20 15:37] Waiting 5 seconds to stagger worker starts.
[Worker #1 Feb 20 15:37] Resuming primality test of M27148801 using Pentium4 type-3 FFT length 1440K, Pass1=320, Pass2=4608
[Worker #1 Feb 20 15:37] Iteration: 20048465 / 27148801 [73.84%].
[Worker #2 Feb 20 15:37] Worker starting
[Worker #2 Feb 20 15:37] Setting affinity to run worker on logical CPUs 2,3
[Worker #2 Feb 20 15:37] Stopping worker at user request.
[Worker #1 Feb 20 15:38] Iteration: 20050000 / 27148801 [73.85%]. Per iteration time: 0.048 sec.
[Worker #1 Feb 20 15:46] Iteration: 20060000 / 27148801 [73.88%]. Per iteration time: 0.048 sec.
[/code]
Prime95 x64 utilizes exactly 50% of CPU resources in this 2-physical-core, hyperthreaded setup. I've configured it to use 1 worker window with 2 CPUs (multithreading). Switching to 1 worker/1 thread slows the iteration time from 0.030 to 0.048; CPU utilization decreased to 25%. After switching to 2 workers/1 thread, the second worker hadn't received work to do automatically; after manual communication the second worker was stuck in the "Starting" state. Restarting Prime95 resolved this and the 2nd worker started.
[code]
[Main thread Feb 20 15:52] Mersenne number primality test program version 26.5
[Main thread Feb 20 15:52] Optimizing for CPU architecture: Pentium 4, L2 cache size: 2 MB
[Main thread Feb 20 15:52] Logical CPUs 0,1 form one physical CPU.
[Main thread Feb 20 15:52] Logical CPUs 2,3 form one physical CPU.
[Main thread Feb 20 15:52] Starting workers.
[Worker #1 Feb 20 15:52] Worker starting
[Worker #1 Feb 20 15:52] Setting affinity to run worker on logical CPUs 0,1
[Worker #2 Feb 20 15:52] Waiting 5 seconds to stagger worker starts.
[Worker #1 Feb 20 15:52] Resuming primality test of M27148801 using Pentium4 type-3 FFT length 1440K, Pass1=320, Pass2=4608
[Worker #1 Feb 20 15:52] Iteration: 20067929 / 27148801 [73.91%].
[Worker #2 Feb 20 15:52] Worker starting
[Worker #2 Feb 20 15:52] Setting affinity to run worker on logical CPUs 2,3
[Worker #2 Feb 20 15:52] Trying 1000 iterations for exponent 26111857 using 1344K FFT.
[Worker #2 Feb 20 15:52] If average roundoff error is above 0.2422, then a larger FFT will be used.
[Worker #2 Feb 20 15:52] After 100 iterations average roundoff error is 0.21373.
[Worker #2 Feb 20 15:53] After 200 iterations average roundoff error is 0.214.
[Worker #2 Feb 20 15:53] After 300 iterations average roundoff error is 0.21413.
[Worker #2 Feb 20 15:53] After 400 iterations average roundoff error is 0.21405.
[Worker #2 Feb 20 15:53] After 500 iterations average roundoff error is 0.21405.
[Worker #2 Feb 20 15:53] After 600 iterations average roundoff error is 0.2138.
[Worker #2 Feb 20 15:53] After 700 iterations average roundoff error is 0.21379.
[Worker #2 Feb 20 15:53] After 800 iterations average roundoff error is 0.21396.
[Worker #2 Feb 20 15:53] After 900 iterations average roundoff error is 0.21391.
[Worker #2 Feb 20 15:53] Final average roundoff error is 0.21391, using 1344K FFT for exponent 26111857.
[Worker #2 Feb 20 15:53] Starting primality test of M26111857 using Pentium4 type-3 FFT length 1344K, Pass1=896, Pass2=1536
[Worker #1 Feb 20 15:54] Iteration: 20070000 / 27148801 [73.92%]. Per iteration time: 0.050 sec.
[Worker #2 Feb 20 16:01] Iteration: 10000 / 26111857 [0.03%]. Per iteration time: 0.049 sec.
[Worker #1 Feb 20 16:02] Iteration: 20080000 / 27148801 [73.96%]. Per iteration time: 0.050 sec.
[Worker #2 Feb 20 16:10] Iteration: 20000 / 26111857 [0.07%]. Per iteration time: 0.049 sec.
[/code]
On another computer, with a hyperthreaded Core i3 CPU, after switching from 1 worker/2 threads to 2 workers/1 thread, the added worker couldn't start after restarting all workers. Of course, there was no work for it in worktodo.txt yet.
[code]
[Main thread Feb 20 15:36] Mersenne number primality test program version 26.5
[Main thread Feb 20 15:36] Optimizing for CPU architecture: Core i3/i5/i7, L2 cache size: 256 KB, L3 cache size: 4 MB
[Main thread Feb 20 15:36] Starting worker.
[Work thread Feb 20 15:36] Worker starting
[Work thread Feb 20 15:36] Setting affinity to run worker on any logical CPU.
[Work thread Feb 20 15:36] Setting affinity to run helper thread 1 on any logical CPU.
[Work thread Feb 20 15:36] Resuming primality test of M25901143 using Core2 type-3 FFT length 1344K, Pass1=896, Pass2=1536, 2 threads
[Work thread Feb 20 15:36] Iteration: 25205235 / 25901143 [97.31%].
[Work thread Feb 20 15:37] Iteration: 25210000 / 25901143 [97.33%]. Per iteration time: 0.014 sec.
...
[Work thread Feb 20 15:59] Iteration: 25310000 / 25901143 [97.71%]. Per iteration time: 0.013 sec.
[Main thread Feb 20 16:00] Restarting all worker threads.
[Worker #1 Feb 20 16:00] Stopping primality test of M25901143 at iteration 25313798 [97.73%]
[Worker #1 Feb 20 16:00] Worker stopped.
[Main thread Feb 20 16:00] Restarting all worker threads using new settings.
[Worker #1 Feb 20 16:00] Worker starting
[Worker #1 Feb 20 16:00] Setting affinity to run worker on any logical CPU.
[Worker #2 Feb 20 16:00] Waiting 5 seconds to stagger worker starts.
[Worker #1 Feb 20 16:00] Resuming primality test of M25901143 using Core2 type-3 FFT length 1344K, Pass1=896, Pass2=1536
[Worker #1 Feb 20 16:00] Iteration: 25313799 / 25901143 [97.73%].
[Worker #2 Feb 20 16:00] Worker starting
[Worker #2 Feb 20 16:00] Setting affinity to run worker on any logical CPU.
[Worker #2 Feb 20 16:00] Stopping worker at user request.
[/code]
After restarting Prime95, all workers started working.
[code]
[Main thread Feb 20 16:04] Mersenne number primality test program version 26.5
[Main thread Feb 20 16:04] Optimizing for CPU architecture: Core i3/i5/i7, L2 cache size: 256 KB, L3 cache size: 4 MB
[Comm thread Feb 20 16:04] Exchanging program options with server
[Main thread Feb 20 16:04] Starting workers.
[Worker #1 Feb 20 16:04] Worker starting
[Worker #1 Feb 20 16:04] Setting affinity to run worker on any logical CPU.
[Worker #2 Feb 20 16:04] Waiting 5 seconds to stagger worker starts.
[Comm thread Feb 20 16:04] Getting assignment from server
[Worker #1 Feb 20 16:04] Resuming primality test of M25901143 using Core2 type-3 FFT length 1344K, Pass1=896, Pass2=1536
[Worker #1 Feb 20 16:04] Iteration: 25318443 / 25901143 [97.75%].
[Comm thread Feb 20 16:04] PrimeNet success code with additional info:
[Comm thread Feb 20 16:04] Server assigned Lucas Lehmer primality double-check work.
[Comm thread Feb 20 16:04] Got assignment A329F888B017E080133C56D9C4BEBF05: Double check M26151659
[Comm thread Feb 20 16:04] Sending expected completion date for M26151659: Feb 28 2011
[Comm thread Feb 20 16:04] Done communicating with server.
[Worker #2 Feb 20 16:04] Worker starting
[Worker #2 Feb 20 16:04] Setting affinity to run worker on any logical CPU.
[Worker #2 Feb 20 16:04] Trying 1000 iterations for exponent 26151659 using 1344K FFT.
[Worker #2 Feb 20 16:04] If average roundoff error is above 0.2422, then a larger FFT will be used.
[Worker #2 Feb 20 16:04] After 100 iterations average roundoff error is 0.22142.
[Worker #2 Feb 20 16:04] After 200 iterations average roundoff error is 0.22202.
[Worker #2 Feb 20 16:04] After 300 iterations average roundoff error is 0.2221.
[Worker #2 Feb 20 16:04] After 400 iterations average roundoff error is 0.22187.
[Worker #2 Feb 20 16:04] After 500 iterations average roundoff error is 0.22176.
[Worker #2 Feb 20 16:04] After 600 iterations average roundoff error is 0.22206.
[Worker #2 Feb 20 16:04] After 700 iterations average roundoff error is 0.22209.
[Worker #2 Feb 20 16:04] After 800 iterations average roundoff error is 0.22212.
[Worker #2 Feb 20 16:04] After 900 iterations average roundoff error is 0.22202.
[Worker #2 Feb 20 16:04] Final average roundoff error is 0.22188, using 1344K FFT for exponent 26151659.
[Worker #2 Feb 20 16:04] Starting primality test of M26151659 using Core2 type-3 FFT length 1344K, Pass1=896, Pass2=1536
[Worker #1 Feb 20 16:05] Iteration: 25320000 / 25901143 [97.75%]. Per iteration time: 0.035 sec.
[Worker #2 Feb 20 16:10] Iteration: 10000 / 26151659 [0.03%]. Per iteration time: 0.036 sec.
[/code]
In that case the iteration time increased from 0.013 to 0.035. So 1 worker/2 threads on the Core i3 gives more throughput than 2 workers/1 thread, the opposite of what the old Xeon does.

clarke 2011-02-20 16:32

I concluded too fast. The iteration time on the Core i3 settled to a steady 0.025
[code]
[Worker #1 Feb 20 16:10] Iteration: 25330000 / 25901143 [97.79%]. Per iteration time: 0.035 sec.
[Worker #1 Feb 20 16:16] Iteration: 25340000 / 25901143 [97.83%]. Per iteration time: 0.036 sec.
[Worker #2 Feb 20 16:16] Iteration: 20000 / 26151659 [0.07%]. Per iteration time: 0.037 sec.
[Worker #1 Feb 20 16:22] Iteration: 25350000 / 25901143 [97.87%]. Per iteration time: 0.035 sec.
[Worker #2 Feb 20 16:22] Iteration: 30000 / 26151659 [0.11%]. Per iteration time: 0.036 sec.
[Worker #1 Feb 20 16:27] Iteration: 25360000 / 25901143 [97.91%]. Per iteration time: 0.030 sec.
[Worker #2 Feb 20 16:28] Iteration: 40000 / 26151659 [0.15%]. Per iteration time: 0.031 sec.
[Worker #1 Feb 20 16:32] Iteration: 25370000 / 25901143 [97.94%]. Per iteration time: 0.026 sec.
[Worker #2 Feb 20 16:32] Iteration: 50000 / 26151659 [0.19%]. Per iteration time: 0.027 sec.
[Worker #1 Feb 20 16:36] Iteration: 25380000 / 25901143 [97.98%]. Per iteration time: 0.026 sec.
[Worker #2 Feb 20 16:36] Iteration: 60000 / 26151659 [0.22%]. Per iteration time: 0.026 sec.
[Worker #1 Feb 20 16:40] Iteration: 25390000 / 25901143 [98.02%]. Per iteration time: 0.026 sec.
[Worker #2 Feb 20 16:41] Iteration: 70000 / 26151659 [0.26%]. Per iteration time: 0.026 sec.
[Worker #1 Feb 20 16:44] Iteration: 25400000 / 25901143 [98.06%]. Per iteration time: 0.025 sec.
[Worker #2 Feb 20 16:45] Iteration: 80000 / 26151659 [0.30%]. Per iteration time: 0.026 sec.
[/code]
It seems the timing of 1 worker/2 threads is very similar to 2 workers/1 thread on the Core i3. The Xeon does slightly better with the latter.
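The comparison is easier to see as aggregate throughput (iterations per second summed over workers), using the approximate timings reported above:

```python
def throughput(workers, sec_per_iter):
    """Aggregate iterations/sec across identical workers, each at the
    given per-iteration time."""
    return workers / sec_per_iter

# Xeon: 1 worker/2 threads at 0.030 s/iter
# vs 2 workers/1 thread at ~0.050 s/iter each.
print(round(throughput(1, 0.030), 1))  # 33.3 iter/s
print(round(throughput(2, 0.050), 1))  # 40.0 iter/s, the latter wins

# Core i3: 1 worker/2 threads at 0.013 s/iter
# vs 2 workers/1 thread at ~0.026 s/iter each.
print(round(throughput(1, 0.013), 1))  # 76.9 iter/s
print(round(throughput(2, 0.026), 1))  # 76.9 iter/s, essentially a tie
```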

S34960zz 2011-02-21 04:15

[QUOTE=Prime95;252630]Prime95 will probably get better throughput using this configuration. You may want to consider merging your worktodo.txt down to 4 worker threads. It's your choice.[/QUOTE]
[QUOTE=Prime95;252988]You likely better off just running 4 worker windows with no helper threads.[/QUOTE]
[QUOTE=Prime95;252993]Two things you should do to try and get better throughput:

1) See if you get better timings with smart-affinity rather than run-on-any-CPU
2) See if you get better timings by not using a helper thread for each worker.[/QUOTE]

No surprise: these suggestions diplomatically point the best way.

OS: Win7 Pro x64

Using: Prime95 v26.5 beta3 x64 (p64v265_build3.zip)

From v25.11 benchmark:
Intel(R) Core(TM) i7 CPU Q 840 @ 1.87GHz
CPU speed: 1862.13 MHz, 8 cores
CPU features: RDTSC, CMOV, Prefetch, MMX, SSE, SSE2, SSE4
L1 cache size: 32 KB, L2 cache size: 256 KB, L3 cache size: 8 MB
L1 cache line size: 64 bytes, L2 cache line size: 64 bytes, TLBS: 64

Experiment using a previous assignment (same for all workers**):
Double-check M26505287
Core2 type-3 FFT length 1440K, Pass1=320, Pass2=4608
[CODE]
+---------+---------------------------------------+
| | Worker2 #thread |
| Worker1 |-------+-------+-------+-------+-------+
| #thread | 0 | 1 | 2 | 3 | 4 |
+---------+-------+-------+-------+-------+-------+
| 1 | --- | 0.026 | | | T_w2 |
| | 0.020 | 0.026 | symm. | symm. | T_w1 |
| | 115% | 108% | | | %MxFrq|
| | 50.0 | 76.9 | | | WU/s |
+---------+-------+-------+-------+-------+-------+
| 2 | --- | 0.032 | 0.018 | | |
| | 0.014 | 0.017 | 0.018 | symm. | symm. |
| | 110% | 103% | 99% | | |
| | 71.4 | 90.1 | 111.1 | | |
+---------+-------+-------+-------+-------+-------+
| 3 | --- | 0.036 | 0.022 | 0.023 | |
| | 0.011 | 0.013 | 0.022 | 0.023 | symm. |
| | 103% | 99% | 99% | 99% | |
| | 90.9 | 104.7 | 90.9 | 87.0 | |
+---------+-------+-------+-------+-------+-------+
| 4 | --- | 0.044 | 0.031 | 0.025 | 0.021 |
| | 0.009 | 0.016 | 0.018 | 0.019 | 0.021 |
| | 99% | 99% | 99% | 99% | 99% |
| | 111.1 | 85.2 | 87.8 | 92.6 | 95.2 |
+---------+-------+-------+-------+-------+-------+
| 5 | --- | 0.057 | 0.036 | 0.027 | |
| | 0.012 | 0.014 | 0.015 | 0.017 | n/a |
| | 100% | 99% | 99% | 99% | |
| | 83.3 | 89.0 | 94.4 | 95.9 | |
+---------+-------+-------+-------+-------+-------+
| 6 | --- | 0.062 | 0.040 | | |
| | 0.012 | 0.013 | 0.014 | n/a | n/a |
| | 99% | 99% | 99% | | |
| | 83.3 | 93.1 | 96.4 | | |
+---------+-------+-------+-------+-------+-------+
| 7 | --- | 0.075 | | | |
| | 0.011 | 0.013 | n/a | n/a | n/a |
| | 99% | 99% | | | |
| | 90.9 | 90.3 | | | |
+---------+-------+-------+-------+-------+-------+
| 8 | --- | | | | T_w2 |
| | 0.011 | n/a | n/a | n/a | T_w1 |
| | 99% | | | | %MxFrq|
| | 90.9 | | | | WU/s |
+---------+-------+-------+-------+-------+-------+
Table: Running 1 or 2 workers, #threads as shown.
CPU Affinity: Smart Assignment for 2 workers, Run Any for 1 worker.
T_w1, T_w2: time (seconds) per iteration (in this case, M26505287).
%MxFrq: %Maximum Frequency from Windows Task Manager Resource Monitor
WU/s: Work Units/second = ( 1/T_w1 + 1/T_w2 ), where
1 WU = 1 iteration (of M26505287, for this experiment)**

**Because both workers have the same assignment for this experiment
(not the usual case!), the time-per-iteration is comparable and the
inverses may be added. With different exponents, the time-per-iteration
for one of the workers would need to be adjusted to be comparable to
the other.

4 workers, 1 thread each, Smart Assign: T_w = 0.034 s/iter, throughput = 4 / T_w = 117.6 WU/s
4 workers, 2 thread each, Smart Assign: T_w = 0.041 s/iter, throughput = 4 / T_w = 97.6 WU/s

8 workers, 1 thread each, Smart Assign: T_w = 0.080 s/iter, throughput = 8 / T_w = 100.0 WU/s
8 workers, 1 thread each, Run on Any CPU: Same results as Smart Assignment.[/CODE]Conclusions for an i7 with 4 cores, 2 logical CPUs/core:
========================================================
4 workers with 1 thread each: 117.6 WU/s = best throughput (overall, 4 assignments)
1 worker with 4 threads: 111.1 WU/s = best throughput (single assignment)
2 workers with 2 threads each: 111.1 WU/s = best throughput (two assignments)
========================================================
These results seem reasonable. The L1 and L2 caches are per-core; when there is more than one thread per core, the threads compete for cache space.

In addition to providing the best throughput, the CPU core temperatures are significantly lower with the one-thread-per-core choices than with the (4 workers, 2 threads each) or (8 workers, 1 thread each) choices. Faster, with less CPU load and thermal stress; what's not to love?
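The WU/s arithmetic in the table can be reproduced with a short sketch (a hypothetical helper, using iteration times from the table; the sum of inverses is only valid here because both workers run the same exponent):

```python
def throughput(iter_times):
    """Total throughput in WU/s: the sum of 1/T over all workers.
    Only valid when every worker runs the same exponent, so that
    one iteration is the same amount of work for each worker."""
    return sum(1.0 / t for t in iter_times)

# Values (seconds per iteration) from the table above:
print(round(throughput([0.034] * 4), 1))     # 4 workers, 1 thread each -> 117.6
print(round(throughput([0.018, 0.018]), 1))  # 2 workers, 2 threads each -> 111.1
```

With different exponents, each worker's per-iteration time would first have to be rescaled to a common work unit before the inverses could be added.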

Prime95 2011-02-21 05:07

[QUOTE=S34960zz;253230]
OS: Win7 Pro x64

Using: Prime95 v26.5 beta3 x64 (p64v265_build3.zip)

From v25.11 benchmark:
Intel(R) Core(TM) i7 CPU Q 840 @ 1.87GHz
CPU speed: 1862.13 MHz, 8 cores[/Quote]

Crap, crap, crap. The CPU is not properly identified as hyperthreaded.

Hyperthreading has been nothing but one gigantic pain in the neck

S34960zz 2011-02-21 10:13

[QUOTE=Prime95;253233]Crap, crap, crap. The CPU is not properly identified as hyperthreaded.

Hyperthreading has been nothing but one gigantic pain in the neck[/QUOTE]
No, I think that v26.5b3 has it right**. Please note that the *CPU description* in that post came from v25.11 (b2) benchmark results, but the *experiment* was run using v26.5b3. I apologize for the confusion. (** The measured/reported CPU speed is higher than the nominal 1.87GHz, but that has been discussed previously in this thread as an effect of the "self-overclocking" %MaximumFrequency feature.)

Note also that v25.11 lists "CPU features: RDTSC, CMOV" and v26.5b3 does not (but both versions say "RdtscTiming=1").

Same machine, benchmark using v26.5b3:
[CODE]
[Mon Feb 21 04:51:36 2011]
Compare your results to other computers at http://www.mersenne.org/report_benchmarks
Intel(R) Core(TM) i7 CPU Q 840 @ 1.87GHz
CPU speed: 1995.18 MHz, 4 hyperthreaded cores
CPU features: Prefetch, MMX, SSE, SSE2, SSE4
L1 cache size: 32 KB
L2 cache size: 256 KB, L3 cache size: 8 MB
L1 cache line size: 64 bytes
L2 cache line size: 64 bytes
TLBS: 64
Prime95 64-bit version 26.5, RdtscTiming=1
Best time for 768K FFT length: 15.397 ms., avg: 15.622 ms.
Best time for 896K FFT length: 18.341 ms., avg: 18.531 ms.
Best time for 1024K FFT length: 20.803 ms., avg: 20.887 ms.
Best time for 1280K FFT length: 27.036 ms., avg: 27.136 ms.
Best time for 1536K FFT length: 33.111 ms., avg: 33.477 ms.
Best time for 1792K FFT length: 39.186 ms., avg: 39.339 ms.
Best time for 2048K FFT length: 44.259 ms., avg: 45.179 ms.
Best time for 2560K FFT length: 56.100 ms., avg: 56.599 ms.
Best time for 3072K FFT length: 69.216 ms., avg: 70.447 ms.
Best time for 3584K FFT length: 81.815 ms., avg: 82.935 ms.
Best time for 4096K FFT length: 92.389 ms., avg: 93.616 ms.
Best time for 5120K FFT length: 118.915 ms., avg: 119.814 ms.
Best time for 6144K FFT length: 148.842 ms., avg: 150.269 ms.
Best time for 7168K FFT length: 177.632 ms., avg: 179.913 ms.
Best time for 8192K FFT length: 200.319 ms., avg: 201.995 ms.
Timing FFTs using 2 threads on 1 physical CPUs.
Best time for 768K FFT length: 18.353 ms., avg: 18.642 ms.
Best time for 896K FFT length: 19.112 ms., avg: 19.219 ms.
Best time for 1024K FFT length: 25.302 ms., avg: 25.550 ms.
Best time for 1280K FFT length: 31.674 ms., avg: 32.059 ms.
Best time for 1536K FFT length: 39.112 ms., avg: 39.331 ms.
Best time for 1792K FFT length: 43.132 ms., avg: 43.374 ms.
Best time for 2048K FFT length: 51.286 ms., avg: 51.812 ms.
Best time for 2560K FFT length: 68.074 ms., avg: 68.423 ms.
Best time for 3072K FFT length: 81.658 ms., avg: 81.769 ms.
Best time for 3584K FFT length: 94.266 ms., avg: 94.512 ms.
Best time for 4096K FFT length: 111.718 ms., avg: 112.401 ms.
Best time for 5120K FFT length: 143.131 ms., avg: 143.769 ms.
Best time for 6144K FFT length: 166.248 ms., avg: 168.507 ms.
Best time for 7168K FFT length: 191.382 ms., avg: 192.504 ms.
Best time for 8192K FFT length: 227.229 ms., avg: 228.413 ms.
Timing FFTs using 4 threads on 2 physical CPUs.
Best time for 768K FFT length: 9.561 ms., avg: 10.046 ms.
Best time for 896K FFT length: 9.933 ms., avg: 10.732 ms.
Best time for 1024K FFT length: 13.227 ms., avg: 13.287 ms.
Best time for 1280K FFT length: 16.470 ms., avg: 17.173 ms.
Best time for 1536K FFT length: 20.197 ms., avg: 21.040 ms.
Best time for 1792K FFT length: 23.074 ms., avg: 23.854 ms.
Best time for 2048K FFT length: 27.300 ms., avg: 27.446 ms.
Best time for 2560K FFT length: 35.759 ms., avg: 36.440 ms.
Best time for 3072K FFT length: 42.766 ms., avg: 43.726 ms.
Best time for 3584K FFT length: 49.614 ms., avg: 50.185 ms.
Best time for 4096K FFT length: 58.419 ms., avg: 58.523 ms.
Best time for 5120K FFT length: 74.971 ms., avg: 75.388 ms.
Best time for 6144K FFT length: 88.798 ms., avg: 89.183 ms.
Best time for 7168K FFT length: 101.399 ms., avg: 102.508 ms.
Best time for 8192K FFT length: 120.146 ms., avg: 121.852 ms.
Timing FFTs using 6 threads on 3 physical CPUs.
Best time for 768K FFT length: 6.954 ms., avg: 7.351 ms.
Best time for 896K FFT length: 7.260 ms., avg: 7.390 ms.
Best time for 1024K FFT length: 9.615 ms., avg: 9.822 ms.
Best time for 1280K FFT length: 11.815 ms., avg: 13.397 ms.
Best time for 1536K FFT length: 14.396 ms., avg: 15.312 ms.
Best time for 1792K FFT length: 16.323 ms., avg: 17.016 ms.
Best time for 2048K FFT length: 19.372 ms., avg: 19.523 ms.
Best time for 2560K FFT length: 25.121 ms., avg: 26.262 ms.
Best time for 3072K FFT length: 30.062 ms., avg: 30.239 ms.
Best time for 3584K FFT length: 34.665 ms., avg: 34.981 ms.
Best time for 4096K FFT length: 41.161 ms., avg: 42.443 ms.
Best time for 5120K FFT length: 52.592 ms., avg: 53.964 ms.
Best time for 6144K FFT length: 61.433 ms., avg: 63.074 ms.
Best time for 7168K FFT length: 72.798 ms., avg: 73.608 ms.
Best time for 8192K FFT length: 83.950 ms., avg: 86.793 ms.
Timing FFTs using 8 threads on 4 physical CPUs.
Best time for 768K FFT length: 8.415 ms., avg: 8.790 ms.
Best time for 896K FFT length: 5.782 ms., avg: 6.397 ms.
Best time for 1024K FFT length: 11.511 ms., avg: 12.147 ms.
Best time for 1280K FFT length: 12.716 ms., avg: 14.675 ms.
Best time for 1536K FFT length: 11.510 ms., avg: 12.826 ms.
Best time for 1792K FFT length: 13.006 ms., avg: 14.319 ms.
Best time for 2048K FFT length: 15.768 ms., avg: 17.353 ms.
Best time for 2560K FFT length: 19.431 ms., avg: 20.929 ms.
Best time for 3072K FFT length: 24.715 ms., avg: 24.772 ms.
Best time for 3584K FFT length: 27.547 ms., avg: 27.996 ms.
Best time for 4096K FFT length: 33.542 ms., avg: 34.651 ms.
Best time for 5120K FFT length: 42.907 ms., avg: 43.321 ms.
[Mon Feb 21 04:56:37 2011]
Best time for 6144K FFT length: 51.813 ms., avg: 53.090 ms.
Best time for 7168K FFT length: 64.447 ms., avg: 65.585 ms.
Best time for 8192K FFT length: 69.297 ms., avg: 70.542 ms.
Best time for 58 bit trial factors: 3.884 ms.
Best time for 59 bit trial factors: 3.933 ms.
Best time for 60 bit trial factors: 3.930 ms.
Best time for 61 bit trial factors: 4.381 ms.
Best time for 62 bit trial factors: 4.581 ms.
Best time for 63 bit trial factors: 5.405 ms.
Best time for 64 bit trial factors: 6.481 ms.
Best time for 65 bit trial factors: 7.053 ms.
Best time for 66 bit trial factors: 7.387 ms.
Best time for 67 bit trial factors: 7.327 ms.
[/CODE]

Prime95 2011-02-21 13:03

[QUOTE=S34960zz;253253]No, I think that v26.5b3 has it right**.[/quote]

Whew, you gave me a bit of a scare!

[quote]
Note also that v25.11 lists "CPU features: RDTSC, CMOV" and v26.5b3 does not [/QUOTE]

This is OK. Prime95 detects but no longer outputs those features. I needed to make room for new features like AVX and FMA.

Prime95 2011-02-22 00:35

Build 4 now available.

If all goes well, this will end the beta of version 26.

James Heinrich 2011-02-22 01:36

Whereas 26.5.3 made no complaints, v26.5.4 now says (for an i7-920):[quote]Optimizing for CPU architecture: Core i3/i5/i7, L2 cache size: 256 KB, L3 cache size: 8 MB
Unable to detect some of the hyperthreaded logical CPUs.
See AffinityScramble2 in undoc.txt.[/quote]Also, there's a typo "assigni" in undoc.txt (AffinityScramble2 section).

Prime95 2011-02-22 01:53

[QUOTE=James Heinrich;253328]Whereas 26.5.3 made no complaints, v26.5.4 now says (for an i7-920):Also, there's a typo "assigni" in undoc.txt (AffinityScramble2 section).[/QUOTE]

Set DebugAffinityScramble=1 in prime.txt and let me know what happens

James Heinrich 2011-02-22 01:59

[QUOTE=Prime95;253331]Set DebugAffinityScramble=1 in prime.txt and let me know what happens[/QUOTE][code]Mersenne number primality test program version 26.5
Optimizing for CPU architecture: Core i3/i5/i7, L2 cache size: 256 KB, L3 cache size: 8 MB
Test clocks: 95248
Logical CPU 1 clocks: 98237
Logical CPU 2 clocks: 95248
Logical CPU 3 clocks: 95248
Logical CPU 4 clocks: 95265
Logical CPU 5 clocks: 95283
Logical CPU 6 clocks: 95248
Logical CPU 7 clocks: 95248
Test clocks: 95248
Logical CPU 2 clocks: 95248
Logical CPU 3 clocks: 95248
Logical CPU 4 clocks: 95245
Logical CPU 5 clocks: 95248
Logical CPU 6 clocks: 95265
Logical CPU 7 clocks: 95248
Test clocks: 95265
Logical CPU 3 clocks: 182924
Test clocks: 95248
Logical CPU 5 clocks: 98163
Logical CPU 6 clocks: 95248
Logical CPU 7 clocks: 95248
Unable to detect some of the hyperthreaded logical CPUs.
See AffinityScramble2 in undoc.txt.
Starting workers.[/code]

Prime95 2011-02-22 02:29

Interesting. You're in luck. Prime95's default is to assume logicals 0 and 1 are one physical CPU, likewise 2 and 3, 4 and 5, and 6 and 7. From your output, 2 and 3 clearly are one physical CPU (CPU 3's loop took roughly twice the test clocks). Thus you don't need to mess with AffinityScramble2.

My i7-860 is detected properly on Linux and Windows XP 64-bit. What's weird is that Linux numbers the logical CPUs differently than Windows.

v26.5b3 was not counting clocks properly on Windows; it was always getting a count of 0. When it tested the next logical processor, it also got a count of zero, which the "twice as slow" test accepted, so it concluded the two logicals made up one physical CPU.

I don't know how to proceed. For now, I'll just gather more data unless a brainstorm hits me.
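The detection idea described above can be sketched as follows (a simplified model, not Prime95's actual code; the function shape and the 1.5x threshold are assumptions for illustration):

```python
def find_hyperthread_sibling(test_clocks, cpu_clocks, slow_factor=1.5):
    """Model of the startup detection: a timing loop is run on a test
    logical CPU, then re-run on each candidate logical CPU while the
    test CPU stays busy. A candidate that takes roughly twice as many
    clocks is sharing the test CPU's physical core.

    test_clocks: clocks for the loop running alone
    cpu_clocks:  {logical_cpu_id: clocks} for each candidate
    Returns the sibling's id, or None if no candidate slowed down enough."""
    if test_clocks == 0:
        # Guard against the build-3 Windows bug, where the count was
        # always 0 and the comparison degenerated.
        return None
    for cpu, clocks in cpu_clocks.items():
        if clocks > slow_factor * test_clocks:
            return cpu
    return None

# Modeled on the log above: logical CPU 3 took ~2x the test clocks,
# so logicals 2 and 3 form one physical CPU.
print(find_hyperthread_sibling(95265, {3: 182924}))  # -> 3
```

On the figures from James Heinrich's log, CPU 3's 182924 clocks against 95265 test clocks is the only pairing that trips the threshold, which matches the default 2/3 grouping.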

KingKurly 2011-02-22 02:55

Running a benchmark seems to crash ("Illegal instruction") on the Linux 32-bit build. I recall seeing the crash in builds 3 and 4; I have not retested earlier builds, but I do know that the machine has run benchmarks before. The benchmark does work on Linux 64-bit and Windows 32-bit.

The machine in question is a dual PIII 1.4GHz, and it is using Slackware 13.1.

Let me know if you need additional information or for me to run with special settings on that machine.

Prime95 2011-02-22 03:24

[QUOTE=KingKurly;253334]Running a benchmark seems to crash ("Illegal instruction") on the Linux 32-bit build. [/QUOTE]

Can you tell me the last text output prior to the seg fault?

KingKurly 2011-02-22 05:45

[QUOTE=Prime95;253336]Can you tell me the last text output prior to the seg fault?[/QUOTE]
Sure, here it is. Let me know if there's anything else I can do.

[CODE]kurly@slice:~$ mprime/mprime -m
[Main thread Feb 22 00:38] Mersenne number primality test program version 26.5
[Main thread Feb 22 00:38] Optimizing for CPU architecture: Pre-SSE2, L2 cache size: 512 KB
Main Menu

1. Test/Primenet
2. Test/Worker threads
3. Test/Status
4. Test/Continue
5. Test/Exit
6. Advanced/Test
7. Advanced/Time
8. Advanced/P-1
9. Advanced/ECM
10. Advanced/Manual Communication
11. Advanced/Unreserve Exponent
12. Advanced/Quit Gimps
13. Options/CPU
14. Options/Preferences
15. Options/Torture Test
16. Options/Benchmark
17. Help/About
18. Help/About PrimeNet Server
Your choice: 16


Hit enter to continue: [Main thread Feb 22 00:38] Starting worker.
[Worker #1 Feb 22 00:38] Worker starting
[Worker #1 Feb 22 00:38] Your timings will be written to the results.txt file.
[Worker #1 Feb 22 00:38] Compare your results to other computers at http://www.mersenne.org/report_benchmarks
Illegal instruction[/CODE]

Prime95 2011-02-22 07:07

Build 5 is available. It fixes the dual-core non-SSE2 benchmark crash.

oilwarzonedotco 2011-02-22 10:17

I am trying to use the new build and register a new computer using the 26.5 version, and all I am getting is the following:

[Main thread Feb 22 10:03] Mersenne number primality test program version 26.5
[Main thread Feb 22 10:03] Optimizing for CPU architecture: Core i3/i5/i7, L2 cache size: 256 KB, L3 cache size: 3 MB
[Main thread Feb 22 10:03] Logical CPUs 0,1 form one physical CPU.
[Main thread Feb 22 10:03] Logical CPUs 2,3 form one physical CPU.
[Main thread Feb 22 10:04] Starting workers.
[Comm thread Feb 22 10:04] Updating computer information on the server
[Comm thread Feb 22 10:04] PrimeNet error 9: Access denied
[Comm thread Feb 22 10:04] Untrusted program versions currently excluded by PrimeNet
[Comm thread Feb 22 10:04] Visit [URL]http://mersenneforum.org[/URL] for help.
[Comm thread Feb 22 10:04] Will try contacting server again in 70 minutes.

Has the new software been activated to work on the servers?

Jeff Gilchrist 2011-02-22 12:17

Hi George,

You had mentioned previously that for 26.5 you would update Prime95 to delete the P-1 save and backup files when they are finished being processed (or have a switch to do that). Was that completed?

The 26.5b3 client seems to be leaving them around still. Is there a switch I should be using or did you not have time to implement that?

Thanks,
Jeff.

