#1 |
Mar 2009
2²·5 Posts |
Hi,
I have a fresh install of mprime on a Linux x64 (Ubuntu 22.04) machine. I have it testing an exponent (ECM) and it's running really, really slowly. On a computer with almost identical hardware running Windows 10, ECM on almost the same exponent runs about 2 times faster. I thought mprime would run a self-benchmark (autobench) after a day or two of running? I suspect the Linux mprime is using a non-optimized FFT, or the FFT size is too big. I have explicitly set AutoBench=1 in prime.txt on the mprime machine, but I still haven't seen the program trigger the autobench. Where is the optimized FFT data stored, so that maybe I can copy the FFT knowledge from one machine to the other? |
#2 |
"Matthew Anderson"
Dec 2010
Oregon, USA
2·11·53 Posts |
Welcome to mersenneforum.org !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
|
#3 |
6809 > 6502
"""""""""""""""""""
Aug 2003
101×103 Posts
298E₁₆ Posts |
On the slower machine, are all the memory modules the same speed and from the same manufacturer? And how are the banks populated? An improper memory setup can slow a machine down dramatically.
|
#4 | |
Mar 2009
2²·5 Posts |
Quote:
Again, the slower machine is 2 times slower, and I have no idea why. I'm trying to get the slower machine to run an autobench, but it does not do so for whatever reason. The faster one does. |
|
#5 | |
Mar 2009
14₁₆ Posts |
Quote:
Faster machine = Windows 10
4 sticks DDR4-2133 ECC RDIMM, quad channel
Intel Xeon E5-1607 v3 @ 3.1 GHz
Does ECM with 4 threads in the ~M1000000 range (B1=1000000): stage 1 = 1450 sec, stage 2 = 850 sec, total = ~2300 sec

Slower machine = Ubuntu 22.04
4 sticks DDR4-2400 ECC RDIMM, quad channel
Intel Xeon E5-2680 v4 @ 2.9 GHz
Does ECM with 6 threads in the ~M1000000 range (B1=1000000): stage 1 = 3400 sec, stage 2 = 1100 sec, total = ~4500 sec

Faster RAM, more threads, and 2 times slower? Again, I'm trying to trigger an autobench so the program can choose the best FFT algorithm. Surely the Linux box can do a curve faster than 4500 sec. |
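As a rough check of the figures above, here is a minimal Python sketch that computes the slowdown per stage. Only the timings come from the post; the names `FAST`, `SLOW` and `slowdown` are made up for illustration.

```python
# Per-curve ECM timings in seconds, copied from the post above.
FAST = {"stage1": 1450, "stage2": 850}   # Xeon E5-1607 v3, 4 threads
SLOW = {"stage1": 3400, "stage2": 1100}  # Xeon E5-2680 v4, 6 threads


def slowdown(fast, slow):
    """Return slow/fast time ratios per stage and for the whole curve."""
    ratios = {stage: slow[stage] / fast[stage] for stage in fast}
    ratios["total"] = sum(slow.values()) / sum(fast.values())
    return ratios


for name, ratio in slowdown(FAST, SLOW).items():
    print(f"{name}: {ratio:.2f}x slower")
```

With these figures most of the gap is in stage 1 (about 2.3x), while stage 2 is only about 1.3x slower, for roughly 2x overall.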
|
#6 |
Jun 2003
5,387 Posts |
It would be more helpful if you posted the screen output from both systems.
BTW, according to Intel ARK, the E5-2680 v4 is a 14-core CPU, so you should be able to run 14 threads at the same time. |
#7 | |
Mar 2009
2²×5 Posts |
Quote:
Is 8 threads the max for any worker? I am setting 1 worker with 14 cores, and the most I see for any worker is 8 threads. I would expect 14, unless 8 is some upper limit?
I don't know what you are expecting to see in the output logs. They look like the usual ECM logs, except that each curve is taking a very long time to finish.
How does mprime select the fastest FFT implementation? Does manually starting a benchmark help?
Last fiddled with by timbit on 2022-06-27 at 16:19 Reason: I can obtain logs later on. Both machines are in different locations. |
|
#8 |
Jun 2003
5,387 Posts |
The output will show the FFT selected, the worker configuration, affinities, etc. Let us understand the problem first before attempting a solution.
The ECM work you're mentioning (M1000000) is very small and does not multithread. You'll most likely get the best throughput by running 14 workers. |
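To illustrate the throughput argument, here is a small Python sketch comparing curves per day for one multithreaded worker against many single-core workers. The 4500 s figure is the one reported earlier in the thread; the 9000 s per-curve time for a single-core worker is purely hypothetical, and the model assumes workers do not slow each other down (in practice memory bandwidth and stage 2 memory limits will reduce the gain).

```python
SECONDS_PER_DAY = 86_400


def curves_per_day(workers, seconds_per_curve):
    """Curves completed per day if every worker finishes one curve in the given time."""
    return workers * SECONDS_PER_DAY / seconds_per_curve


# Reported in this thread: one multithreaded worker, about 4500 s per curve.
one_worker = curves_per_day(1, 4500)

# HYPOTHETICAL: 14 single-core workers, each assumed to need 9000 s per curve.
# The 9000 s figure is invented for illustration and is not a measurement.
many_workers = curves_per_day(14, 9000)

print(f"1 worker:   {one_worker:.1f} curves/day")
print(f"14 workers: {many_workers:.1f} curves/day")
```

Replacing the hypothetical 9000 s with a measured single-core curve time shows whether 14 workers actually beat the current setup.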
#9 |
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
6,673 Posts |
No. I've benchmarked up to 68 cores/worker on a Xeon Phi 7250.
The number of cores (threads) that prime95 supports is 512 or 1024: https://mersenneforum.org/showpost.p...&postcount=202
Yes, you can manually trigger a benchmark, and optionally specify which range of FFT sizes is benchmarked, which list or range of core counts per worker, whether HT is tried or not, etc. Start by experimenting with a few FFT sizes, to keep the experiments quick.
See also https://www.mersenneforum.org/showpo...4&postcount=11 and its attachments.
Last fiddled with by kriesel on 2022-06-27 at 17:07 |
#10 |
Mar 2009
2²·5 Posts |
OK, I've deleted the results.bench.txt and gwnum.txt files.
I've manually run the throughput tests on the same FFT size. Now it's running the ECM again. I'll let this go for a day or two and see if the autobench runs again.
Regardless of the results in results.bench.txt or gwnum.txt, it never seems to select the FFT with the most throughput. Odd.
Also, I still cannot get more than 8 threads on a worker.
Last fiddled with by timbit on 2022-06-28 at 05:17 Reason: I forgot something |
#11 |
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
6,673 Posts |
As axn wrote, the number of useful cores per worker is a function of FFT size, which is a function of the exponent.
I don't run 1M ECM, but I do run DC and first-time primality test wavefront PRP, and up to 1G P-1: big exponents, big FFTs, higher core counts.
Your 1M exponent at ~20 bits/word gives an FFT size of ~50K. As axn wrote, that only needs and uses one core; it is not multithreaded. Run lots of workers, one core each. The downside is that this will multiply the demand for main memory.
How many GB of RAM do you have installed per system? How much memory did you allow in the prime95 settings, increased from the very low default, for daytime and nighttime P-1, P+1, and ECM stage 2?
Last fiddled with by kriesel on 2022-06-28 at 05:58 |
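The rule of thumb in the post above (exponent divided by about 20 bits per word) can be sketched in Python as follows. The function name and the default of 20 bits per word are illustrative; the real bits per word varies with FFT size, and prime95 selects from its own set of FFT lengths, so treat the result as an order-of-magnitude estimate only.

```python
def approx_fft_size(exponent, bits_per_word=20.0):
    """Rough FFT length in words: the exponent's bits spread over ~bits_per_word each."""
    return exponent / bits_per_word


words = approx_fft_size(1_000_000)
print(f"M1000000: ~{words:,.0f} words")  # about 50K, as estimated in the post
```

An FFT this small gives each thread very little work, which fits the advice to run many single-core workers instead of one multithreaded worker.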