#1

"Kieren"
Jul 2011
In My Own Galaxy!

2×3³×5×37 Posts
There are two different situations, with two different machines. Here's the first.

Yesterday, I started to get our newest computer more operational. First, I replaced the crappy power supply that came with it. It is below Bronze efficiency rating. Worse, its fan runs constantly at max, regardless of load. A loud fan is the last thing we need for a music workstation. I was able to pick up a refurb Corsair Platinum supply for $100 to fix that.

Having run many hours of torture testing, I decided to try some DCs. I initially set up with one worker running all eight cores of the 9700K (see attached specs). I was surprised that the performance wasn't as good as my 6700K machine with similar basic specs, such as memory. I then tried it with 2 workers, 4 cores each. If anything, these results were worse. I have gone through various BIOS settings trying to replicate the 6700K settings, but this was no help. (For instance, I turned off all power saving functions, and tried running with all cores locked at 4.5 GHz.) See the attached jpg, "P95-Music Stock 1-2 wrkrs". Note that the image shows both one worker running and the second worker added later.

I then ran a stripped down benchmark which I could compare to a similar run on the 6700K. Benchmarks attached. The first result for each machine kind of says it all. The 6700K running 4 cores locked at 4.3 GHz beats the tar out of the 9700K running 8 cores, mostly at Turbo 4.5 GHz.

The memory in these systems is very similar on the face of it. Both are DDR4-3200 dual channel with similar timings. The 6700K does have Micron dual rank RAM. The 9700K has single rank Hynix. That's the major difference I can put a finger on. Do the benchmarks reflect memory strangulation on the new machine, relative to the old one? I am out of ideas here.

After messing with the 9700K for some hours I returned to the 6700K to find that the (rare) first time LL at just over 50% completion had failed Jacobi testing, and had failed all attempts to backtrack to interim files.
As noted, I run very few first time tests, and this failure comes amid constant good DCLL results. Previous first time runs had no declared errors. I have put the broken test on hold and returned to DCs for the moment. It is in a "Confidence is fair" state right now. I used to run with frequent interim results saved, but I guess I let that practice lapse. I'd like to roll this back further rather than turn in a flawed result. If I can't do that I am tempted to just ditch that assignment. This is a really puzzling event, given the long DC history for this machine.

Last fiddled with by kladner on 2020-01-09 at 17:01 Reason: Extra automatically inserted line breaks removed.

2020-01-09, 17:12
#4
masser

Jul 2003
wear a mask

10110100010₂ Posts

Are you adequately cooling both CPUs? I imagine 8 cores at 4.5 Ghz might require significantly more cooling than 4 cores at 4.3 Ghz.

2020-01-09, 17:34
#5
kladner

"Kieren"
Jul 2011
In My Own Galaxy!

2·3³·5·37 Posts

Quote:
 Originally Posted by axn I'm guessing you're somehow running in single channel mode. As soon as the second worker started, your first worker timing just tanked. You should consult your MB documentation to see if you've correctly populated the DIMMs to enable dual channel operation. Perhaps check if XMP is enabled as well in BIOS.
Thanks for the tips. I'll go have a look at those factors.

So..... The DIMM placement should be correct. They are in slots 2 and 4 out of 4 total slots. CPU-z says dual channel, single rank.
It appears to be running XMP, as 3200 is the XMP speed. Off for a look in the BIOS.....

All looks the same as what CPU-z reported. Timings are 16-18-18-36. The 6700K is actually running at 17-18-18-36, and still kicks ass. It is also running well beyond the rated XMP-2666.

I'll try running memtest86, for what that might reveal. Interesting. Memtest reports the processor caches 1 and 2 as being x4 instead of x8, which is the correct number according to CPU-z.

EDIT: However, memtest86 reported no problems in 3 full sets of tests.

Last fiddled with by kladner on 2020-01-10 at 00:40

2020-01-09, 18:06
#6
kladner

"Kieren"
Jul 2011
In My Own Galaxy!

2·3³·5·37 Posts

Quote:
 Originally Posted by masser Are you adequately cooling both CPUs? I imagine 8 cores at 4.5 Ghz might require significantly more cooling than 4 cores at 4.3 Ghz.
It runs pretty cool. The cooler looks like a Cooler Master 212. Running the 2 DCLL tests puts it in the mid 60s C, max. This is a bit cooler than the 6700K, which has 240mm liquid cooling. The latter keeps the temps lower when running Small FFT torture test, however.

EDIT: I do expect the older chip to draw more power per core than the newer, FWIW.

Last fiddled with by kladner on 2020-01-09 at 18:15

2020-01-09, 18:57
#7
mackerel

Feb 2016
UK

389 Posts

What FFT size are the tests? Assuming they're big enough to be ram bandwidth limited, the rank is the factor.

When I originally built a 6700k system I got some performance and thought nothing of it. When I decided to get a 6600k to increase my throughput, no matter what I did the performance of it was much lower. Initially I blamed cache, as I had tried overclocking to make the running speed the same. No, the difference was the ram. Dual rank gave something of the order of 30% difference from memory at the same speed. Cache size didn't matter in any detectable way.

If you can't find dual rank ram, the alternative is to fit two modules per channel. So for dual channel, use 4 single rank sticks.
As far as I can test, (single rank) 2DPC works the same as 2 rank modules.

2020-01-09, 19:40
#8
kladner

"Kieren"
Jul 2011
In My Own Galaxy!

2·3³·5·37 Posts

FFTs are 2880. I remember going to some lengths to get dual rank. It's not always listed, and seems fairly rare in the consumer realm. It might be easier to go with Plan B. I can get matching sticks for about $70.
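
The memory-bandwidth argument above can be checked with a rough estimate. The sketch below is only a back-of-the-envelope illustration. It assumes "2880" means a 2880K FFT stored as 8-byte doubles, and the L3 cache sizes are taken from Intel's published specs rather than from this thread. Prime95 keeps additional tables in memory, so the real working set is larger than this figure.

```python
# Rough estimate only: FFT data size versus L3 cache, plus peak DRAM bandwidth.
# Assumptions (not from the thread): 8-byte doubles, L3 sizes from Intel specs.
BYTES_PER_DOUBLE = 8


def fft_working_set_bytes(fft_len_k):
    """Bytes needed to hold fft_len_k * 1024 double-precision values."""
    return fft_len_k * 1024 * BYTES_PER_DOUBLE


def peak_bandwidth_gb_s(mt_per_s, channels, bus_bytes=8):
    """Theoretical peak DRAM bandwidth in GB/s (64-bit bus per channel)."""
    return mt_per_s * 1e6 * bus_bytes * channels / 1e9


# Assumed L3 cache sizes in bytes.
L3_BYTES = {"i7-6700K": 8 * 1024**2, "i7-9700K": 12 * 1024**2}

ws = fft_working_set_bytes(2880)
print(f"2880K FFT data: {ws / 1024**2:.1f} MiB")
for cpu, l3 in L3_BYTES.items():
    print(f"{cpu}: L3 {l3 / 1024**2:.0f} MiB, fits in cache: {ws <= l3}")
print(f"DDR4-3200 dual channel peak: {peak_bandwidth_gb_s(3200, 2):.1f} GB/s")
```

Under these assumptions the FFT data alone does not fit in either chip's L3, so every iteration streams through main memory. That is consistent with RAM configuration mattering more than core count here.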

2020-01-09, 20:31
#9
PhilF

Feb 2005
Colorado

2×263 Posts

I remember having to play with the affinity settings in the worker settings in order to get it right. Either I got it wrong or Prime95 did, but at first I was getting miserable performance on a i7-8700.

My best settings have always been to run half as many tests concurrently as I have physical cores for. Each test is given 2 workers, and the affinity is paid close attention to, and adjusted if necessary to make sure each worker AND each helper has its own physical core. With a hyperthreaded CPU, that leaves a little wiggle room for the OS too, so the system remains quite responsive while at the same time giving great iteration time.

So on a CPU with 8 physical cores, for example, I would try 4 tests, each one assigned to 1 worker, with each worker having 1 helper, all spread out among the 8 cores.

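The layout described in this post can be sketched as a simple partition of physical cores. This is only an illustration of the idea. `assign_cores` is a hypothetical helper, not Prime95's configuration syntax, and it assumes hyperthreading is off so that core numbers map one-to-one to physical cores.

```python
def assign_cores(physical_cores, threads_per_test):
    """Split physical cores into disjoint groups, one group per test.

    Hypothetical helper: each group holds one worker thread plus its helper
    threads, and no core is shared between tests.
    """
    if threads_per_test <= 0 or physical_cores % threads_per_test != 0:
        raise ValueError("threads_per_test must evenly divide physical_cores")
    return [
        list(range(start, start + threads_per_test))
        for start in range(0, physical_cores, threads_per_test)
    ]


# 8 physical cores, 4 tests, each with 1 worker + 1 helper (2 threads).
for test_no, cores in enumerate(assign_cores(8, 2), start=1):
    print(f"test {test_no}: worker on core {cores[0]}, helper on core {cores[1]}")
```

With hyperthreading enabled, logical CPU numbering depends on the OS, so the mapping would need to be checked against what the program reports at startup.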
#10
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2×3×13×59 Posts

Quote:
 Originally Posted by PhilF I remember having to play with the affinity settings in the worker settings in order to get it right. ... So on a CPU with 8 physical cores, for example, I would try 4 tests, each one assigned to 1 worker, with each worker having 1 helper, all spread out among the 8 cores.
What's optimal throughput seems to depend on system details and fft length. See https://www.mersenneforum.org/showpo...18&postcount=4 and https://www.mersenneforum.org/showpost.php?p=504219&postcount=5 for some exhaustive tests.
Another factor is latency. Running the optimal throughput configuration may not be the best choice, if assignments expire before they're finished. Move up to more cores/worker generally for that, as long as it doesn't cross chip packages.
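
The throughput-versus-latency trade-off can be put in numbers. In the sketch below the per-iteration timings and the exponent are made-up illustrative values, not measurements from this thread, and an LL test is approximated as needing about as many iterations as the exponent.

```python
SECONDS_PER_DAY = 86_400


def throughput_iters_per_s(ms_per_iter_by_worker):
    """Total iterations per second summed over all workers."""
    return sum(1000.0 / ms for ms in ms_per_iter_by_worker)


def days_to_finish(exponent, ms_per_iter):
    """Approximate days for one LL test (about `exponent` iterations)."""
    return exponent * ms_per_iter / 1000.0 / SECONDS_PER_DAY


# Hypothetical timings: 1 worker on 8 cores vs 4 workers on 2 cores each.
configs = {"1 worker x 8 cores": [2.0], "4 workers x 2 cores": [6.0] * 4}
EXPONENT = 50_000_000  # illustrative exponent size

for name, timings in configs.items():
    print(
        f"{name}: {throughput_iters_per_s(timings):.0f} iter/s total, "
        f"{days_to_finish(EXPONENT, timings[0]):.2f} days per test"
    )
```

With these invented numbers the four-worker setup gets more total work done, but each assignment takes three times as long to complete. That longer completion time is the risk when assignments can expire.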

#11

"Kieren"
Jul 2011
In My Own Galaxy!

2706₁₆ Posts

Quote:
 Originally Posted by PhilF I remember having to play with the affinity settings in the worker settings in order to get it right. Either I got it wrong or Prime95 did, but at first I was getting miserable performance on a i7-8700. My best settings have always been to run half as many tests concurrently as I have physical cores for. .....
That sounds like a good formula (among others). I do tend to favor more cores per worker, but I am kind of thrashing about trying to get better performance. The affinity advice is truly important, especially with more workers. This 9700K is not running hyper threaded, as far as I know. I am still figuring out the BIOS on this board. P95 reports the right core assignments when it starts up, though it refers to the cores as 1-8 instead of 0-7.

I do run hyper threading on the 6700K, for the reason you give: it seems to let other processes operate more smoothly. The latest model of P95 doesn't allow HT for LL tests. That is greyed out. I should track it down, though, and enable it for the system.

I mentioned in another thread that I have followed user mackerel's advice to add another DIMM to each channel. I have the matching RAM, but the machine is tied up doing something that precludes a shutdown. I'll report results if the "Mackerel Cure®" works.

