mersenneforum.org

Lasse 2021-03-15 07:53

Trying to build dedicated hardware for LL testing - Poor performance
 
Hi All

I’m trying to assemble dedicated hardware for LL testing.
Currently I have a few I7-9700K (8 cores @ 3.6GHz) processors with a single 4GB ram stick but performance is not anywhere near what I expected.

Testing a 103M exponent with one worker on all 8 cores takes around 30 days. (25ms / iteration)
In comparison, I have a laptop doing LL testing with an E3-1575M (4 cores @ 3GHz) processor and 4 x 16GB memory, and that takes 8-9 days (6-7ms / iteration). Since the I7-9700K should be faster, I was expecting it to take less time.
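
As a rough sanity check on these figures: an LL test of exponent p needs about p squaring iterations, so the run time is roughly p times the per-iteration time. A minimal Python sketch of that arithmetic follows. The function name is made up for illustration and is not part of Prime95, and 6.5 ms is simply the midpoint of the laptop's 6-7 ms range.

```python
def ll_days(exponent: int, ms_per_iter: float) -> float:
    """Estimate wall-clock days for an LL test (about `exponent` iterations)."""
    seconds = exponent * ms_per_iter / 1000.0
    return seconds / 86400.0  # 86400 seconds per day


# Desktop: 103M exponent at 25 ms per iteration
print(f"desktop: {ll_days(103_000_000, 25.0):.1f} days")
# Laptop: same exponent at roughly 6.5 ms per iteration
print(f"laptop:  {ll_days(103_000_000, 6.5):.1f} days")
```

The desktop figure comes out close to the 30 days quoted above, which suggests the per-iteration timing is the number to watch when changing hardware.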

When I look at the memory usage the entire system is using less than 500MB so 4GB memory should be enough. I’m wondering if there is a bottleneck on the memory bandwidth?
Would it help if I installed 2 or even 4 ram sticks to give more bandwidth?

I have been trying to find a way to see how busy the memory is with no luck.


I’m using Fedora with Linux64,Prime95,v30.3,build 6.


Any pointers would be highly appreciated.
Also, if someone has a recipe for a 2021 bang-for-the-buck hardware list, I would very much like to hear about it.


Thanks.

axn 2021-03-15 07:58

[QUOTE=Lasse;573740]I’m wondering if there is a bottleneck on the memory bandwidth?
Would it help if I installed 2 or even 4 ram sticks to give more bandwidth? [/QUOTE]
Yes and yes. You are severely bottlenecked on memory bandwidth.

Either 2x16GB or 4x8GB of the fastest RAM that you can get will give you the most bandwidth. Performance will scale (near) linearly with RAM bandwidth.

Lasse 2021-03-15 08:08

Thanks a lot for the fast reply. I will order 4 modules of the fastest supported RAM I can find and test with 2 and 4 modules.

Is there any way I can check the system to see if I'm maxing out the memory bandwidth?

axn 2021-03-15 08:39

[QUOTE=Lasse;573742]Is there any way i can check the system to see if i'm maxing out the memory bandwidth?[/QUOTE]

If you're memory bottlenecked, downclocking the CPU will cause no (or virtually no) reduction in performance, and neither will overclocking give any increase in performance. Once you've sufficient memory bandwidth, you'll see performance being better correlated with CPU clockspeed. That's an indirect way of verifying this.
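
This indirect check can be put into numbers. The sketch below is only an illustration (the function name and the sample timings are hypothetical, not measurements from this thread): time the same exponent at two CPU clock speeds and see how much of the clock change shows up in the iteration time.

```python
import math


def clock_sensitivity(clock_a_ghz: float, ms_a: float,
                      clock_b_ghz: float, ms_b: float) -> float:
    """Fraction of a CPU clock change that shows up as a throughput change.

    Near 1.0: throughput follows the clock (CPU bound).
    Near 0.0: throughput ignores the clock (likely memory bound).
    """
    if clock_a_ghz == clock_b_ghz:
        raise ValueError("the two runs must use different clock speeds")
    throughput_ratio = ms_a / ms_b            # throughput_b / throughput_a
    clock_ratio = clock_b_ghz / clock_a_ghz
    return math.log(throughput_ratio) / math.log(clock_ratio)


# Hypothetical timings: downclocking from 3.6 GHz to 3.0 GHz barely changes ms/iter
print(round(clock_sensitivity(3.6, 25.0, 3.0, 25.2), 2))
```

A result close to zero would point at the memory bottleneck described above; a result close to one would mean the CPU is the limit.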

I don't know how to monitor the memory bandwidth usage directly, sorry.

Lasse 2021-03-15 09:14

Thanks for the clarification. I have just tested with a single core and I’m getting the same speed as when I’m running on all 8 cores.

For reference the current setup is as follows:
Motherboard: TUF H370-PRO GAMING
CPU: I7-9700K
RAM: 1 x Kingston ValueRAM - DDR4 - 4 GB - DIMM 288-PIN - 2400 MHz / PC4-19200 - CL17 - 1.2 V
PSU: 1650W (Totally overkill)


I have now ordered:
2 x HyperX Predator - DDR4 8 GB - DIMM 288-PIN - 2666 MHz / PC4-21300 - CL13 - 1.35 V
2 x HyperX FURY - DDR4 4GB - DIMM 288-PIN - 2666 MHz / PC4-21300 - CL16 - 1.2 V

When the RAM is delivered and I’m getting some time to play around with it I will post results here.


If anyone else has any input, please do not hesitate to post :)

mackerel 2021-03-15 10:56

Faster RAM always helps in this scenario, but with an H370 mobo I believe you're limited to whatever speed is officially supported by the CPU. A Z370/Z390 mobo with faster RAM would have cost more but performed better.

If you keep the current mobo, then the best you can do is to put in 4 modules. Capacity doesn't matter. This helps in two ways: firstly, you get dual channel, which already doubles what you had. Secondly, you get more than one rank per channel, which lets you make more effective use of the dual-channel bandwidth. 4x4GB might be the most economical if you can still find modules that small. You can try testing with the mismatched pairs already ordered.
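
To put rough numbers on the dual-channel point, theoretical peak bandwidth is the transfer rate times the 8-byte (64-bit) channel width times the number of channels. A small Python sketch under that assumption (the function name is illustrative; real-world throughput is lower, and adding ranks improves how much of the peak is usable rather than the peak itself):

```python
def peak_bandwidth_gbs(mega_transfers_per_s: int, channels: int,
                       bus_bytes: int = 8) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (64-bit bus per channel)."""
    return mega_transfers_per_s * 1e6 * bus_bytes * channels / 1e9


# One DDR4-2400 module on one channel (the original setup)
print(peak_bandwidth_gbs(2400, channels=1))
# DDR4-2666 in dual channel
print(peak_bandwidth_gbs(2666, channels=2))
# DDR4-3600 in dual channel, if the board can actually run that speed
print(peak_bandwidth_gbs(3600, channels=2))
```

The single-channel DDR4-2400 figure matches the PC4-19200 rating on the original module, since that label is the peak rate in MB/s.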

4GB modules probably were always single rank.
8GB modules might have been dual rank back when they were still relatively new, around 2015, but they've been single rank for a long time.
16GB modules were all dual rank, but I understand newer ones coming out now are single rank.
So generally speaking if you don't need capacity, the cheapest way to rank up is to use 4 modules on a dual channel system.

Even on a quad core 6700k dual channel ram is wholly inadequate. It was a while ago, but from memory going 2xSR to 4xSR or equivalently 2xDR at 3000 speed gave around 20-25% speedup.

CL rating doesn't seem to make much difference in my testing so I wouldn't pay extra for it.

Lasse 2021-03-15 12:29

Thanks. That was very helpful.


I have now ordered the following:
Gigabyte Z390 M Micro-ATX LGA1151 Intel Z390

4 x CORSAIR Vengeance DDR4 8GB 3600MHz CL18





Will update once I get everything tested. :)

Uncwilly 2021-03-15 12:45

Please don't do LL tests, unless they are double checks. PRP is now the preferred test type for first-time tests. It has superior error checking built into it, so errors that slip through are very few and far between. Also, using the latest version of either Prime95 or GpuOwL will produce a file that allows the run to be verified quickly on another machine. This saves 95% of the effort that a traditional double check would take.
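
For readers unfamiliar with the two test types, here is a toy Python sketch of the underlying math for small exponents. It is an illustration only, not how Prime95 or GpuOwL implement it: they use FFT-based multiplication on huge numbers, and the error checking and proof file mentioned above are not shown. The base of 3 for the PRP test is an assumption chosen for the example.

```python
def mersenne(p: int) -> int:
    """Return the Mersenne number 2^p - 1."""
    return (1 << p) - 1


def lucas_lehmer(p: int) -> bool:
    """LL test for an odd prime exponent p: True exactly when 2^p - 1 is prime."""
    m = mersenne(p)
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0


def fermat_prp(p: int, base: int = 3) -> bool:
    """Fermat probable-prime test of 2^p - 1 to the given base."""
    m = mersenne(p)
    return pow(base, m - 1, m) == 1


for p in (3, 5, 7, 11, 13):
    print(p, lucas_lehmer(p), fermat_prp(p))
```

Both tests cost about p modular squarings, which is why their run times are so similar; the difference is in what can be checked along the way.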

Also, consider adding a GPU. You will get more throughput from a good GPU than the CPU.

Lasse 2021-03-15 13:00

Thanks for your advice. I have only briefly looked into PRP and need to look into it more, but I will definitely use PRP going forward.


Regarding GPUs, I have picked up 10 used Nvidia Tesla K80 cards that I'm hoping to get up and running soon. I'm just waiting for some power adapters to arrive. Long delivery time.

From what I could see in benchmark reports, the K80 and GPUs in general are far faster at factoring than at LL testing. Maybe this is not the case with PRP?

For that reason, my plan was to get some fast CPUs to do LL/PRP testing and use the GPUs for factoring.



Please correct me if I am mistaken :)

Uncwilly 2021-03-15 13:42

There are some GPUs that are better than others at PRP. This chart can be an approximate guide: [url]https://www.mersenne.ca/cudalucas.php[/url]
This one covers factoring: [url]https://www.mersenne.ca/mfaktc.php[/url]

You can punch in different cards and compare them. PRP and LL speeds are comparable, so the LL chart is a fair guide for PRP as well.

DrobinsonPE 2021-03-15 14:16

1 Attachment(s)
[QUOTE=Lasse;573740]Hi All

I’m trying to assemble dedicated hardware for LL testing.
Currently I have a few I7-9700K (8 cores @ 3.6GHz) processors with a single 4GB ram stick but performance is not anywhere near what I expected. [/QUOTE]

I am currently running a similar setup

Gigabyte B365M DS3H, I7-9700K, four sticks of DDR4-2666 4GB RAM. The motherboard limits the RAM to 2666.

Attached is a picture of my testing data for the CPU. It might be useful to you. You can save a lot of energy by decreasing the CPU clock without decreasing mprime throughput.


All times are UTC.