Oh Brother, What betid to mine Haswell 4770?
Here are the specs from the Computer Properties screen.
[CODE]Software Version   Windows64,Prime95,v28.5,build 2
Model              Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
Features           4 core, hyperthreaded, Prefetch,SSE,SSE2,SSE4,AVX,AVX2,FMA
Speed              3.435 GHz (21.359 GHz P4 effective equivalent)
L1/L2 Cache        32 / 256 KB
Computer Memory    4008 MB
configured usage   800 MB day / 800 MB night[/CODE]
And here is a summary of the work and timings of the 4 workers over the last week or so. I noted the times when one or more workers' iteration times changed a lot.
[CODE]Date/Time         #1 TF and DC   #2 TF        #3 DC        #4 LL
18/09/2014 8:19   TF 490 Sec     TF Unknown   36.2M 17 Ms  67.8M 33 Ms
19/09/2014 0:12   TF 480 Sec     TF Unknown   33.2M 16 Ms  67.8M 33 Ms
19/09/2014 17:54  TF 475 Sec     TF Unknown   33.2M 16 Ms  67.8M 52 Ms
21/09/2014 20:21  35.7M 26 Ms    TF 470 Sec   33.2M 24 Ms  67.8M 80 Ms
22/09/2014 17:55  35.7M 24 Ms    TF 500 Sec   33.2M 23 Ms  67.8M 48 Ms
23/09/2014 17:56  35.7M 25 Ms    TF 490 Sec   33.2M 23 Ms  67.8M 50 Ms
25/09/2014 17:57  35.7M 24 Ms    TF 495 Sec   33.2M 30 Ms  67.8M 48 Ms[/CODE]
The questions that come to mind, in no specific order of importance:

1. Almost every time, all the workers stop before they send new end dates... well, almost: not on the 24th. Why are they stopping?
2. What would cause such a drastic increase in iteration times in workers #3 and #4 when worker #1 changes from TF to DC (Sept 21)? I thought Haswell (as with Ivy and Sandy and all the i-series) was much better at channel capacity and worker independence.
3. When worker #4 changed from 33 to 52 Ms there were NO changes in the work on the other 3 workers. I might just chalk that one up to external forces on the PC, though it seemed to increase to the iteration time it sits at consistently now.
4. Granted, benchmarks are "perfect" situations... that being said, my times are WAY WAY above the benchmark I ran only a few weeks ago: about 10 Ms for the 35M DC and 20 Ms for the 68M LL.
5. Could slower RAM make SUCH a big difference? Considering the TF times changed very little, that suggests to me RAM is NOT the issue... I may be wrong.
6. In a few weeks worker #2 will also be doing LL... are they all going to get SLOWER yet?

Or simply give me some hints of where to start looking.... |
Have you turned off hyperthreading in the BIOS? I suspect two LL tests are getting assigned to the same physical core.
|
So setting 1 core per worker is not enough for Haswell?
Not sure I can change the BIOS. It is a "borg". Any other ways around it? Thanks |
With HT and Windows, I always end up playing around with AffinityScramble2 to make sure threads don't share physical cores.
For example, I had to set my 2600K running 4 workers to [CODE]AffinityScramble2=02461357[/CODE] |
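For anyone trying this, here is a minimal sketch of where the setting goes in local.txt. This is my assumption about the layout (the exact surrounding options vary per machine): AffinityScramble2 is a global option, so it should sit above any [Worker #n] section rather than inside one.

```
AffinityScramble2=02461357

[Worker #1]
...
```

This placement is consistent with the later report in this thread that a copy placed at the end of the [Worker #4] section appeared not to take effect.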
[QUOTE=petrw1;383942]So setting 1 core per worker is not enough for Haswell?[/QUOTE]
It should be, but prime95's hyperthread detection does not always work. Can you run task manager and see if two workers are running on one core? |
[QUOTE=sdbardwick;383944]With HT and Windows, I always end up playing around with AffinityScramble2 to make sure threads don't share physical cores.
For example, I had to set my 2600K running 4 workers to [CODE]AffinityScramble2=02461357[/CODE][/QUOTE] You may wish to use 13570246 instead. The first CPU core in x86 usually handles more interrupts, so having it free to handle those is an advantage. |
[QUOTE=Mark Rose;383967]You may wish to use 13570246 instead. The first CPU core in x86 usually handles more interrupts, so having it free to handle those is an advantage.[/QUOTE]
So this didn't make a difference... I suspect I still have 2 tests running on the same physical core; I still need to verify this. Could it be that their cores (physical and HT) are numbered differently? For example (completely made-up guess)... maybe physical core 0's HT partner is 7 (1 is 6, etc.). How could I find out? Or could it even be that Haswell has in some way randomized how it numbers them based on the workload? |
[QUOTE=petrw1;384080]
Could it be their Cores (Physical and HT) are numbered different? For example (completely made up guess) ... maybe Physical Core 0's HT partner is 7 ( 1 is 6, etc) ... How could I find out?[/QUOTE] Set DebugAffinityScramble=1 in prime.txt. At startup, prime95 will output its calculations trying to determine logical/physical CPUs. Prime95 does this by running some code it thinks should take 100K clock cycles. It then puts a logical CPU in a busy loop and times this 100K code on the other 7 logical CPUs. The theory is that 6 logical CPUs will time at 100K and one will time at 200K. Then the busy loop logical CPU and the 200K logical CPU are on one physical core. |
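The detection logic George describes can be sketched in code. This is a toy illustration of the idea only (my own function names and made-up timing numbers, not Prime95's actual source): while one logical CPU spins in a busy loop, the probe that takes roughly 2x the baseline identifies that CPU's hyperthread sibling.

```python
# Illustrative sketch of Prime95's HT-sibling detection idea:
# pin a busy loop on one logical CPU, time a fixed workload on every
# other logical CPU, and the one that takes ~2x the baseline shares
# a physical core with the busy-looping CPU.

def find_sibling(busy_cpu, timings, baseline=100_000):
    """timings: {logical_cpu: clocks measured while busy_cpu spins}.
    Returns the logical CPU whose timing roughly doubled."""
    slowed = [cpu for cpu, clocks in timings.items()
              if cpu != busy_cpu and clocks > 1.5 * baseline]
    if len(slowed) != 1:
        raise ValueError("expected exactly one ~2x outlier, got %r" % slowed)
    return slowed[0]

def pair_up(all_timings, baseline=100_000):
    """all_timings: {busy_cpu: {other_cpu: clocks}} per probe round.
    Returns (cpu, sibling) pairs, one per physical core."""
    pairs, seen = [], set()
    for busy_cpu, timings in all_timings.items():
        if busy_cpu in seen:
            continue
        sib = find_sibling(busy_cpu, timings, baseline)
        pairs.append((busy_cpu, sib))
        seen.update((busy_cpu, sib))
    return pairs

# Hypothetical measurements shaped like the log later in this thread:
# CPUs (1,2), (3,4), (5,6), (7,8) share physical cores.
measurements = {
    1: {2: 200_000, 3: 101_000, 4: 100_500, 5: 100_800, 6: 100_200, 7: 100_900, 8: 101_100},
    3: {1: 100_700, 2: 100_400, 4: 199_000, 5: 100_600, 6: 101_300, 7: 100_100, 8: 100_300},
    5: {1: 100_200, 2: 100_800, 3: 100_500, 4: 100_900, 6: 201_000, 7: 101_000, 8: 100_400},
    7: {1: 100_600, 2: 100_300, 3: 101_200, 4: 100_700, 5: 100_500, 6: 100_800, 8: 198_000},
}
print(pair_up(measurements))  # -> [(1, 2), (3, 4), (5, 6), (7, 8)]
```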
[QUOTE=Prime95;384083]Set DebugAffinityScramble=1 in prime.txt. At startup, prime95 will output its calculations trying to determine logical/physical CPUs.
Prime95 does this by running some code it thinks should take 100K clock cycles. It then puts a logical CPU in a busy loop and times this 100K code on the other 7 logical CPUs. The theory is that 6 logical CPUs will time at 100K and one will time at 200K. Then the busy loop logical CPU and the 200K logical CPU are on one physical core.[/QUOTE] OK, will do... I need to get someone else to do this... not as geeky. Will it simply output this to results.txt, or do I need to look at the actual window that runs it?

AND... just so I get it right: once I know the pairs, what is the proper way to record AffinityScramble2?
A) In physical/logical pairs, or
B) All the physical then all the logical?
i.e. if physical 0 is with logical 4, 1 with 5, 2 with 6, and 3 with 7, do I code AffinityScramble2=04152637 (this is my guess) or AffinityScramble2=01234567? |
[CODE][Main thread Sep 30 11:23] Test clocks on logical CPU #1: 214592
[Main thread Sep 30 11:23] Logical CPU 2 clocks: 407000
[Main thread Sep 30 11:23] Logical CPU 3 clocks: 214576
[Main thread Sep 30 11:23] Logical CPU 4 clocks: 214720
[Main thread Sep 30 11:23] Logical CPU 5 clocks: 214576
[Main thread Sep 30 11:23] Logical CPU 6 clocks: 214712
[Main thread Sep 30 11:23] Logical CPU 7 clocks: 214608
[Main thread Sep 30 11:23] Logical CPU 8 clocks: 214856
[Main thread Sep 30 11:23] Test clocks on logical CPU #3: 214576
[Main thread Sep 30 11:23] Logical CPU 4 clocks: 201806
[Main thread Sep 30 11:23] Logical CPU 5 clocks: 113962
[Main thread Sep 30 11:23] Logical CPU 6 clocks: 114196
[Main thread Sep 30 11:23] Logical CPU 7 clocks: 113964
[Main thread Sep 30 11:23] Logical CPU 8 clocks: 114040
[Main thread Sep 30 11:23] Test clocks on logical CPU #5: 114028
[Main thread Sep 30 11:23] Logical CPU 6 clocks: 177253
[Main thread Sep 30 11:23] Logical CPU 7 clocks: 93538
[Main thread Sep 30 11:23] Logical CPU 8 clocks: 93586
[Main thread Sep 30 11:23] Test clocks on logical CPU #7: 93583
[Main thread Sep 30 11:23] Logical CPU 8 clocks: 177235
[Main thread Sep 30 11:23] Logical CPUs 1,2 form one physical CPU.
[Main thread Sep 30 11:23] Logical CPUs 3,4 form one physical CPU.
[Main thread Sep 30 11:23] Logical CPUs 5,6 form one physical CPU.
[Main thread Sep 30 11:23] Logical CPUs 7,8 form one physical CPU.
[Main thread Sep 30 11:23] Starting workers.[/CODE]
So this tells me I want AffinityScramble2=02461357 (or 13570246). Correct???

Turns out the program wasn't completely stopped/started yesterday, so I don't believe the above changes actually took effect... stay tuned... |
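The step from detected pairs to a scramble string can be written out mechanically. This sketch rests on two assumptions of mine (my reading of this thread, not official documentation): the DebugAffinityScramble log numbers logical CPUs from 1 while AffinityScramble2 digits are zero-based, and the "02461357"-style layout lists one CPU per physical core first, then all the siblings. `scramble_from_pairs` is a hypothetical helper name.

```python
# Sketch (assumptions stated above) of building an AffinityScramble2
# string from the pairs DebugAffinityScramble reports.

def scramble_from_pairs(pairs):
    """pairs: (a, b) tuples of 1-based logical CPUs sharing a physical
    core. Returns a zero-based AffinityScramble2-style string."""
    firsts = [a - 1 for a, b in pairs]    # one CPU per physical core
    seconds = [b - 1 for a, b in pairs]   # their HT siblings
    return "".join(str(c) for c in firsts + seconds)

# The pairing reported in the log above: (1,2) (3,4) (5,6) (7,8)
print(scramble_from_pairs([(1, 2), (3, 4), (5, 6), (7, 8)]))  # -> 02461357
```

Note that under the same convention, a hypothetical machine pairing 1 with 5, 2 with 6, etc. would give the identity string 01234567.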
So here is where I am at....
My worker doing 67M LL is still getting iteration times of 50Ms. It was at 33Ms when only 1 other worker was doing DC and the rest were doing TF.
So I suspect I still don't have it right....

1. DebugAffinityScramble determined the following CPU pairings:
[CODE]Logical CPUs 1,2 form one physical CPU.
Logical CPUs 3,4 form one physical CPU.
Logical CPUs 5,6 form one physical CPU.
Logical CPUs 7,8 form one physical CPU.[/CODE]
Do I correctly assume that once it runs, it will use that knowledge to assign the correct CPUs to each worker so that each gets a separate physical core? Or is it strictly informational, and I use that knowledge as I see fit to set AffinityScramble2? What if I also have AffinityScramble2 set: which setting takes precedence?
2. I tried to set AffinityScramble2 but I think I screwed up. But is that discussion even relevant if DebugAffinityScramble forced the correct worker/CPU settings?
3. It turns out the AffinityScramble2 I had the person enter likely did NOT take effect, because Prime95 was not exited and restarted to pick up the new settings; it was only a stop-all/start-all of the workers. Am I correct that it did not take effect?
4. Furthermore, I suspect it was placed in the wrong place in local.txt. I incorrectly said it could go "anywhere" in that file; it was placed at the very end, within the [Worker #4] section. Can I assume it would have been ignored even if Prime95 had been completely exited/restarted?
5. I had it set as 13570246. Is this a correct setting based on what DebugAffinityScramble determined, or is it not? By putting the HT cores all first, will that cause Prime95 to assign the work to the HT cores instead of the physical cores?

BOTTOM LINE: Should I simply use the output from DebugAffinityScramble to set AffinityScramble2 correctly? What is correct: 02461357? 13570246? Something else? OR should I leave in DebugAffinityScramble and remove AffinityScramble2? |
[QUOTE=petrw1;384133]By putting the HT cores all first will that cause Prime95 to assign the work to the HT cores instead of the Physical cores?[/QUOTE]Sorry I can't answer the rest of your questions. But there is a misunderstanding in that question. Each logical core is indistinguishable from the other. One physical execution core with two logical cores feeding it instructions. It makes no difference to the CPU if you use either the left or the right logical instruction generator to feed the execution core.
|
[QUOTE=petrw1;384116][CODE][Main thread Sep 30 11:23] Test clocks on logical CPU #1: 214592
[Main thread Sep 30 11:23] Logical CPU 2 clocks: 407000
[Main thread Sep 30 11:23] Logical CPU 3 clocks: 214576
[Main thread Sep 30 11:23] Logical CPU 4 clocks: 214720
[Main thread Sep 30 11:23] Logical CPU 5 clocks: 214576
[Main thread Sep 30 11:23] Logical CPU 6 clocks: 214712
[Main thread Sep 30 11:23] Logical CPU 7 clocks: 214608
[Main thread Sep 30 11:23] Logical CPU 8 clocks: 214856[/CODE][/QUOTE] What a weird set of results you have. During this test my 100K clock routine took 200K on all logical CPUs!! Unless there is a bug in my code, some other process was running on that core as well (and that process was also CPU bound), or perhaps there is some clock throttling going on. |
[QUOTE=Prime95;384142]What a weird set of results you have.
During this test my 100K clock routine took 200K on all logical CPUs!! Unless there is a bug in my code, some other process was running on that core as well (and that process was also CPU bound), or perhaps there is some clock throttling going on.[/QUOTE] Similar odd results on 3/4, but 5-8 look normal. Could clock throttling affect only some cores? Or I guess there could have been something busy only on cores 1-4, though that is unlikely as this is an office computer used in a clerical role. Also, all workers including #3 and #4 are slow, yet their timings seemed normal in the test I ran. |
What are the CPU core temperatures when you run Prime95 on all cores? It might be a cooling problem causing thermal throttling.
|
[QUOTE=Mark Rose;384167]What are the CPU core temperatures when you run Prime95 on all cores? It might be a cooling problem causing thermal throttling.[/QUOTE]
Good thought. It would be a stock cooler but no overclock. |
[QUOTE=petrw1;384168]Good thought. It would be a stock cooler but no overclock.[/QUOTE]
Then you're almost certainly running into thermal throttling. The stock cooler can't keep up. There's also little point in running more than 3 cores, btw. There isn't enough memory bandwidth to get much if anything out of the 4th core. |
[QUOTE=Mark Rose;384170]Then you're almost certainly running into thermal throttling. The stock cooler can't keep up. There's also little point in running more than 3 cores, btw. There isn't enough memory bandwidth to get much if anything out of the 4th core.[/QUOTE]
You're probably right. @OP: Check the temperatures with something like SpeedFan(4.50). |
I understand how it could be temp except I would expect to see these effects during a long continuous heavy load.... and possibly gradual.
But what I am seeing from the worker windows is drastic, immediate changes whenever one of the other workers changed from TF to LL/DC. This to me is symptomatic of workers sharing physical cores.

Before I looked at the optional parameter AffinityScramble2, I noted the workers are sharing (in order) CPUs 1,2; 3,4; 5,6; 7,8 even though I have the worker windows set as: CPUs to use (multithreading) = 1.

Then I used DebugAffinityScramble=1 to have Prime95 run a test to determine which CPUs are HT pairs. It confirmed the above pairings... this tells me that even though it is ignoring the "CPUs to use = 1" setting and assigning 2 logical CPUs to each worker, it appears to be assigning them properly, so there should be no sharing of physical cores.

To Mark's comment of not enough memory bandwidth to support 4 cores: I found that to be painfully true on my first-generation 4-core (Q9550). But since then every i5 or i7 (Ivy, Sandy, etc.) has shown minimal degradation of throughput even with all 4 cores doing LL. |
Memory bandwidth bottleneck is back big-time with the Haswell -- especially with standard DDR3-1600.
If it is temp related throttling, then it will happen nearly immediately. |
[QUOTE=Prime95;384203]Memory bandwidth bottleneck is back big-time with the Haswell -- especially with standard DDR3-1600.[/quote]
Yep. The machine I'm typing this on is a 4770 with DDR3-1600. I only run three cores because the fourth is basically useless. :) |
[QUOTE=Prime95;384203]Memory bandwidth bottleneck is back big-time with the Haswell -- especially with standard DDR3-1600.
If it is temp related throttling, then it will happen nearly immediately.[/QUOTE] OK... so knowing that, 2 questions:

1. Without opening the box, how can I find out what kind of RAM there is? I don't see it in the Device Manager.
2. Assuming it is DDR3-1600, is there a better (less RAM intensive) worker setup I can choose? I probably do NOT want to do P-1 or ECM, correct? Would running 2 workers (2 pairs of CPUs each) help, or be worse?

My timings are currently way off my benchmarks (which I realize represent near-optimal conditions, with only 1 worker running at a time).
Benchmarks say 68M LL should be about 20 Ms... with 2 LL/DC workers I was getting 33 Ms; with 3 LL/DC workers it went to 48 Ms.
Benchmarks say 36M DC should be about 11 Ms... with 2 LL/DC workers I was getting 16 Ms; with 3 LL/DC workers it went to 33 Ms.

I now have 3 of 4 workers doing LL/DC... I suspect it could get worse yet in a few days when the 4th starts LL/DC. Mark may have it right... 4 LL/DC workers appears to be too many. |
[QUOTE=petrw1;384218]OK....so knowing that ... 2 questions:
1. Without opening the box how can I find out what kinds of RAM there is? I don't see it in the Device Manager.[/QUOTE] It's very unlikely to be anything faster. Officially only DDR3-1333/1600 are supported. Anything faster would be an overclock. On Linux, dmidecode gives the memory speed, so there should be some tool on Windows that can give you the same. Apparently CPU-Z can do it. |
[QUOTE]1. Without opening the box how can I find out what kinds of RAM there is? I don't see it in the Device Manager. [/QUOTE]CPUID CPU-Z will tell you for each stick.
EDIT: On my system, 1600 RAM shows a frequency of 800 (DDR transfers twice per clock, so 800 MHz corresponds to DDR3-1600). |
I found George's first Stock test from 12 June 2013
[url]http://www.mersenneforum.org/showpost.php?p=343172&postcount=98[/url]
[QUOTE]Haswell at stock CPU speed and stock memory speed DDR3-1600
Times for LL test on 77000003 (4M FFT):
1 worker: 23.4 ms.
2 workers: 24.1, 24.1 ms.
3 workers: 26.1, 26.5, 26.1 ms.
4 workers: 30.7, 30.7, 30.5, 30.8 ms.[/QUOTE] Definitely a slowdown, but nowhere near what I am seeing. |
Let me ask the "obvious". Are you running in dual-channel mode?
|
[QUOTE=axn;384263]Let me ask the "obvious". Are you running in dual-channel mode?[/QUOTE]
At the risk of showing my ignorance... how can I tell, non-invasively?

This is a "borged" work PC so I have no say in the setup. But since it was built by the company "experts" I have to assume they know enough to do so. I certainly cannot change the setup. In fact, in general I cannot even install extra software; at least nothing that requires any kind of Admin authority... i.e. nothing like CoreTemp or CPU-Z. |
Are you using the internal HD graphics or is there a dedicated GPU? An internal graphics might also use some mem bandwidth.
|
[QUOTE=VictordeHolland;384286]Are you using the internal HD graphics or is there a dedicated GPU? An internal graphics might also use some mem bandwidth.[/QUOTE]
I use the internal graphics while running three cores without problem. |
[QUOTE=VictordeHolland;384286]Are you using the internal HD graphics or is there a dedicated GPU? An internal graphics might also use some mem bandwidth.[/QUOTE]
Has to be internal Graphics; they would not install GPUs here. |
[QUOTE=petrw1;384275]At the risk of showing my ignorance....how can I tell? Non-invasively...
This is a "borged" work PC so I have no say in the setup. But since it was built by the company "experts" I have to assume they know enough to do so. I certainly cannot change the setup. In fact in general I cannot even install extra software; at least nothing that requires any kind of Admin authority... i.e. nothing like CoreTemp or CPU-Z.[/QUOTE] Actually, there's a very simple way to tell: open the case and count the memory sticks. If you only one stick it's single channel. If there are two sticks immediately beside each other (no empty slots), it's dual channel. |
[QUOTE]1. [U] Without[/U] opening the box how can I find out what kinds of RAM there is?[/QUOTE].
|
[QUOTE=kladner;384294].[/QUOTE]
:bangheadonwall: |
[QUOTE=Mark Rose;384289]Actually, there's a very simple way to tell: open the case and count the memory sticks. If you only one stick it's single channel. If there are two sticks immediately beside each other (no empty slots), it's dual channel.[/QUOTE]
I do know it only has 4G of RAM. It would not surprise me a lot if there was only one 4G stick. |
[QUOTE=petrw1;384301]I do know it only has 4G of RAM.
It would not surprise me a lot if there was only one 4G stick.[/QUOTE] That does seem low for a new machine. |
[QUOTE=Mark Rose;384305]That does seem low for a new machine.[/QUOTE]
IIRC, that is what the non-profit I work for is getting for the new Dell compact i5 desktops they've been ordering. A single DIMM would not surprise me, as 2 GB sticks are likely less cost effective than one 4 GB. |
So if I were to assume it has one 4G stick it will almost certainly be memory bound....and not be able to perform anywhere near its potential.
Should I:
- only run 2 workers: one on one of the first 4 logical cores and one on the second set?
- run 2 workers but let them each have 4 logical cores?
- run 4 workers and let them fight it out, knowing they will be slow but still produce more overall throughput than 2 workers? |
[QUOTE=petrw1;384317]So if I were to assume it has one 4G stick it will almost certainly be memory bound....and not be able to perform anywhere near its potential.
Should I:
- only run 2 workers: one on one of the first 4 logical cores and one on the second set?
- run 2 workers but let them each have 4 logical cores?
- run 4 workers and let them fight it out, knowing they will be slow but still produce more overall throughput than 2 workers?[/QUOTE] I have two sticks of 2400 MHz memory in dual channel... still bottlenecked. |
[QUOTE=petrw1;384317]So if I were to assume it has one 4G stick it will almost certainly be memory bound....and not be able to perform anywhere near its potential.
Should I:
- only run 2 workers: one on one of the first 4 logical cores and one on the second set?
- run 2 workers but let them each have 4 logical cores?
- run 4 workers and let them fight it out, knowing they will be slow but still produce more overall throughput than 2 workers?[/QUOTE] Since we haven't ruled out thermal throttling, fewer workers is probably best. And since you probably have half the memory bandwidth of most Haswells, you are unlikely to see much gain from more than two workers. Thus I'd run two workers; letting the OS figure out which logical CPUs to use may be best. Sad. |
[QUOTE=petrw1;384275]In fact in general I cannot even install extra software; at least nothing that requires any kind of Admin authority... i.e. nothing like CoreTemp or CPU-Z.[/QUOTE]
CPU-Z has a no-install (portable) version. Just download and run. However, while running, it may ask for admin clearance. Can you try it out? |
Can you open the PC case and clean up the dust of the cooler?
|
[QUOTE=pinhodecarlos;384355]Can you open the PC case and clean up the dust of the cooler?[/QUOTE]
PC is only a couple months old. No need to dust yet unless ... wink wink ... good opportunity to inspect what is there. |
Well with 4 workers doing LL .... IT REALLY SUCKS
Per iteration time almost exactly DOUBLED from when 2 workers were doing LL and 2 were doing TF.
I'm going to see what happens when I stop workers #1 and #3. Will #2 and #4 drop by half again... or even more? I'm going to guess about the same, and in the end I will be running TF on 2 workers and LL on the other 2.

Tune in tomorrow: same Bat-time, same Bat-channel |
If all 4 doing TF is slower per worker than 2 doing TF, you have a heat problem- probably a misaligned heatsink, maybe a non-functioning CPU fan. Testing all TF removes the memory bottleneck, I believe.
Two LLs could saturate single-stick memory throughput, but I would think TF's lighter memory use would allow LL to much-less-than-double when you add two TFs to 2 LLs; so I think you have a serious heat problem, rather than just a memory bottleneck. |
[QUOTE=VBCurtis;384528]If all 4 doing TF is slower per worker than 2 doing TF, you have a heat problem- probably a misaligned heatsink, maybe a non-functioning CPU fan. Testing all TF removes the memory bottleneck, I believe.
Two LLs could saturate single-stick memory throughput, but I would think TF's lighter memory use would allow LL to much-less-than-double when you add two TFs to 2 LLs; so I think you have a serious heat problem, rather than just a memory bottleneck.[/QUOTE] I think you missed the post title which says "Well with 4 workers doing LL .... IT REALLY SUCKS". |
[QUOTE=petrw1;384501]I'm am going to guess about the same and in the end I will be running TF on 2 workers and LL on the other 2.[/QUOTE]
Another (potential) option would be to run small exponent ECM on two of the cores (instead of TF), which would run out of cache and thus wouldn't contribute to the memory bottleneck. The reason I am recommending ECM over TF on a CPU is ... :smile: If you don't mind, could you try out that combination as well to see what is the performance hit to the LLs? |
[QUOTE=axn;384534]I think you missed the post title which says "Well with 4 workers doing LL .... IT REALLY SUCKS".[/QUOTE]
No, I was suggesting a way to possibly narrow down whether it's a memory bottleneck from single-channel memory or an instant-overheat problem. I agree that ECM is a smarter way to reduce memory contention- I forgot that's a part of GIMPS these days. |
Test #1 ... more extreme than expected.
With 4 workers doing LL/DC
Worker #2: 35M DC Ms/Iter: 34.34
Worker #4: 68M LL Ms/Iter: 66.02
(#1 is doing 35M DC; #3 is doing 55M LL, both with relatively similar iteration times)

With only workers #2 and #4 running:
Worker #2: 35M DC Ms/Iter: 16.30
Worker #4: 68M LL Ms/Iter: 32.42

In both cases LESS than half the time per iteration. The difference was immediate... as soon as workers #1 and #3 were stopped, the iteration times dropped; in fact they dropped to just OVER half the time per iteration initially, and then a few percent faster yet over the next few hours before they stabilized at the above posted times.

That being said, even with only 2 workers running they are still quite a bit over the benchmark I ran only a few weeks ago. Benchmark times are almost 40% faster yet:
2048K FFT as fast as 10.90 Ms (the DC above using a 1920K FFT)
3584K FFT as fast as 20.33 Ms

I would like to believe that, barring a REALLY bad installation, running 2 out of 4 cores would not be enough to overheat the CPU.

PS: There seems to be some native Turbo Boost (OC) going on. The chip is rated at 3.4 GHz but is running at 3.7 GHz.

SO... next I can add TF and/or ECM to workers #1 and #3. I know what will happen with TF, because I started this PC with workers #1 and #2 doing TF (inherited from its 2-core predecessor) and workers #3 and #4 doing LL/DC, with very close to the same iteration times as above with only 2 workers running. So I know I can add TF without impacting LL/DC.

As for ECM: axn is also suggesting it will NOT slow down the LL/DC. If that is the case, I think I will let them run ECM and leave TF for the GPUs. I have 800 MB allocated to Prime95 (out of 4 GB)... I believe that is more than enough for ECM (not ECM-Fermat). |
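A back-of-envelope calculation is consistent with the "two workers saturate one channel" picture above. To be clear about assumptions: the number of read+write passes over the FFT data per iteration is a guess of mine, while 12.8 GB/s is the theoretical peak of one DDR3-1600 channel (1600 MT/s x 8 bytes); real sustained bandwidth is lower.

```python
# Rough sketch (assumed pass count, not measured) of per-worker memory
# bandwidth demand for an LL test on a 68M exponent (3584K FFT),
# compared against one channel of DDR3-1600.

FFT_LEN = 3584 * 1024       # number of doubles in a 3584K FFT
BYTES_PER_DOUBLE = 8
PASSES_PER_ITER = 4         # ASSUMED read+write passes over the data per iteration
MS_PER_ITER = 20.0          # roughly the single-worker benchmark time in this thread

working_set_mb = FFT_LEN * BYTES_PER_DOUBLE / 1e6
traffic_per_iter_mb = working_set_mb * PASSES_PER_ITER
gb_per_sec_per_worker = traffic_per_iter_mb / MS_PER_ITER  # MB/ms == GB/s

DDR3_1600_CHANNEL = 12.8    # GB/s theoretical peak per channel (1600 MT/s x 8 bytes)

print(f"working set: {working_set_mb:.1f} MB")                       # ~29 MB, far beyond L3
print(f"demand per worker: ~{gb_per_sec_per_worker:.1f} GB/s")
print(f"workers one channel can feed: "
      f"~{DDR3_1600_CHANNEL / gb_per_sec_per_worker:.1f}")
```

Under these assumptions a single stick can feed only about two LL workers, which matches the observed halving of iteration times when two of the four workers were stopped.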
Assuming you have the latest Prime95, try the benchmark and post your results here? :smile:
|
1 Attachment(s)
[QUOTE=kracker;384679]Assuming you have the latest Prime95, try the benchmark and post your results here? :smile:[/QUOTE]
Benchmark attached... note both lines show v26.5 in the benchmark, but in fact it was upgraded to v28.5 for the second benchmark (hence the MUCH better numbers). Apparently there is some issue where the benchmark may not note the version change if it is upgraded and a benchmark is run right after?
[CODE]Last Activity 2014-10-07 23:59, Updated 2014-10-07 23:59, Registered 2014-08-25 17:48
GUID 63D1FAC485154FFF8132C68B36D70C9D
Software Version Windows64,Prime95,v28.5,build 2
Model Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
Features 4 core, hyperthreaded, Prefetch,SSE,SSE2,SSE4,AVX,AVX2,FMA
Speed 3.392 GHz (19.762 GHz P4 effective equivalent)
L1/L2 Cache 32 / 256 KB
Computer Memory 4008 MB
configured usage 800 MB day / 800 MB night[/CODE] |
Your CPU is dying....
|
[QUOTE=petrw1;384684]Benchmark attached...note both lines show 26.5 in the benchmark but in fact it was upgraded to 28.5 for the second benchmark (hence the MUCH better numbers). Apparently there is some issue with the benchmark software that may not note the version change if it is upgraded and a benchmark run right after???
[CODE]Last Activity 2014-10-07 23:59, Updated 2014-10-07 23:59, Registered 2014-08-25 17:48
GUID 63D1FAC485154FFF8132C68B36D70C9D
Software Version Windows64,Prime95,v28.5,build 2
Model Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
Features 4 core, hyperthreaded, Prefetch,SSE,SSE2,SSE4,AVX,AVX2,FMA
Speed 3.392 GHz (19.762 GHz P4 effective equivalent)
L1/L2 Cache 32 / 256 KB
Computer Memory 4008 MB
configured usage 800 MB day / 800 MB night[/CODE][/QUOTE] Hmm, I see. Well, what I really meant was for you to post the log of Prime95's benchmark (in results.txt, I think, after it's finished). Here's mine for example:
[code]Compare your results to other computers at http://www.mersenne.org/report_benchmarks
Intel(R) Core(TM) i5-4670K CPU @ 3.40GHz
CPU speed: 3534.93 MHz, 4 cores
CPU features: Prefetch, SSE, SSE2, SSE4, AVX, AVX2, FMA
L1 cache size: 32 KB
L2 cache size: 256 KB, L3 cache size: 6 MB
L1 cache line size: 64 bytes
L2 cache line size: 64 bytes
TLBS: 64
Prime95 64-bit version 28.5, RdtscTiming=1
Best time for 1024K FFT length: 3.806 ms., avg: 3.890 ms.
Best time for 1280K FFT length: 4.897 ms., avg: 5.268 ms.
Best time for 1536K FFT length: 5.942 ms., avg: 7.670 ms.
Best time for 1792K FFT length: 7.161 ms., avg: 9.124 ms.
Best time for 2048K FFT length: 8.152 ms., avg: 8.240 ms.
Best time for 2560K FFT length: 10.399 ms., avg: 12.288 ms.
Best time for 3072K FFT length: 12.516 ms., avg: 12.628 ms.
Best time for 3584K FFT length: 14.924 ms., avg: 15.013 ms.
Best time for 4096K FFT length: 17.032 ms., avg: 20.831 ms.
Best time for 5120K FFT length: 21.642 ms., avg: 23.980 ms.
Best time for 6144K FFT length: 26.061 ms., avg: 28.492 ms.
Best time for 7168K FFT length: 31.057 ms., avg: 32.707 ms.
Best time for 8192K FFT length: 35.937 ms., avg: 36.401 ms.
Timing FFTs using 2 threads.
Best time for 1024K FFT length: 2.057 ms., avg: 2.094 ms.
Best time for 1280K FFT length: 2.613 ms., avg: 4.898 ms.
Best time for 1536K FFT length: 3.181 ms., avg: 3.390 ms.
Best time for 1792K FFT length: 3.828 ms., avg: 4.099 ms.
Best time for 2048K FFT length: 4.351 ms., avg: 4.538 ms.
Best time for 2560K FFT length: 5.509 ms., avg: 5.555 ms.
Best time for 3072K FFT length: 6.687 ms., avg: 7.799 ms.
Best time for 3584K FFT length: 7.878 ms., avg: 10.341 ms.
Best time for 4096K FFT length: 9.130 ms., avg: 9.168 ms.
Best time for 5120K FFT length: 11.478 ms., avg: 17.221 ms.
Best time for 6144K FFT length: 13.682 ms., avg: 16.085 ms.
Best time for 7168K FFT length: 16.449 ms., avg: 16.585 ms.
Best time for 8192K FFT length: 19.097 ms., avg: 21.210 ms.
Timing FFTs using 3 threads.
Best time for 1024K FFT length: 1.455 ms., avg: 1.485 ms.
Best time for 1280K FFT length: 1.873 ms., avg: 1.907 ms.
Best time for 1536K FFT length: 2.274 ms., avg: 2.319 ms.
Best time for 1792K FFT length: 2.733 ms., avg: 2.774 ms.
Best time for 2048K FFT length: 3.198 ms., avg: 3.241 ms.
Best time for 2560K FFT length: 4.037 ms., avg: 4.079 ms.
Best time for 3072K FFT length: 4.846 ms., avg: 5.031 ms.
Best time for 3584K FFT length: 5.773 ms., avg: 5.833 ms.
Best time for 4096K FFT length: 6.659 ms., avg: 8.788 ms.
Best time for 5120K FFT length: 8.332 ms., avg: 8.451 ms.
Best time for 6144K FFT length: 10.070 ms., avg: 10.253 ms.
Best time for 7168K FFT length: 11.973 ms., avg: 12.058 ms.
Best time for 8192K FFT length: 13.976 ms., avg: 14.087 ms.
Timing FFTs using 4 threads.
Best time for 1024K FFT length: 1.226 ms., avg: 1.260 ms.
Best time for 1280K FFT length: 1.555 ms., avg: 1.600 ms.
Best time for 1536K FFT length: 1.937 ms., avg: 3.906 ms.
Best time for 1792K FFT length: 2.353 ms., avg: 2.400 ms.
Best time for 2048K FFT length: 2.795 ms., avg: 2.846 ms.
Best time for 2560K FFT length: 3.538 ms., avg: 3.576 ms.
Best time for 3072K FFT length: 4.157 ms., avg: 4.215 ms.
Best time for 3584K FFT length: 5.036 ms., avg: 7.363 ms.
Best time for 4096K FFT length: 5.831 ms., avg: 5.877 ms.
Best time for 5120K FFT length: 7.318 ms., avg: 9.553 ms.
Best time for 6144K FFT length: 8.738 ms., avg: 8.837 ms.
Best time for 7168K FFT length: 10.378 ms., avg: 11.727 ms.
Best time for 8192K FFT length: 12.161 ms., avg: 12.284 ms.
Timings for 1024K FFT length (1 cpu, 1 worker): 3.91 ms. Throughput: 255.67 iter/sec.
Timings for 1024K FFT length (2 cpus, 2 workers): 4.09, 4.08 ms. Throughput: 489.46 iter/sec.
Timings for 1024K FFT length (3 cpus, 3 workers): 4.91, 4.54, 4.47 ms. Throughput: 647.90 iter/sec.
Timings for 1024K FFT length (4 cpus, 4 workers): 5.76, 5.77, 5.41, 5.29 ms. Throughput: 720.83 iter/sec.
Timings for 1280K FFT length (1 cpu, 1 worker): 4.95 ms. Throughput: 201.94 iter/sec.
Timings for 1280K FFT length (2 cpus, 2 workers): 5.23, 5.21 ms. Throughput: 383.17 iter/sec.
Timings for 1280K FFT length (3 cpus, 3 workers): 6.14, 5.79, 5.72 ms. Throughput: 510.31 iter/sec.
Timings for 1280K FFT length (4 cpus, 4 workers): 7.29, 7.32, 6.76, 6.65 ms. Throughput: 571.91 iter/sec.
Timings for 1536K FFT length (1 cpu, 1 worker): 6.00 ms. Throughput: 166.56 iter/sec.
Timings for 1536K FFT length (2 cpus, 2 workers): 6.41, 6.30 ms. Throughput: 314.66 iter/sec.
Timings for 1536K FFT length (3 cpus, 3 workers): 7.43, 6.94, 6.86 ms. Throughput: 424.38 iter/sec.
Timings for 1536K FFT length (4 cpus, 4 workers): 8.94, 8.87, 8.17, 8.00 ms. Throughput: 471.97 iter/sec.
Timings for 1792K FFT length (1 cpu, 1 worker): 7.19 ms. Throughput: 139.09 iter/sec.
Timings for 1792K FFT length (2 cpus, 2 workers): 7.57, 7.52 ms. Throughput: 264.99 iter/sec.
Timings for 1792K FFT length (3 cpus, 3 workers): 8.95, 8.36, 8.23 ms. Throughput: 352.84 iter/sec.
Timings for 1792K FFT length (4 cpus, 4 workers): 10.40, 10.34, 9.72, 9.53 ms. Throughput: 400.69 iter/sec.
Timings for 2048K FFT length (1 cpu, 1 worker): 8.49 ms. Throughput: 117.84 iter/sec.
Timings for 2048K FFT length (2 cpus, 2 workers): 8.64, 8.70 ms. Throughput: 230.66 iter/sec.
Timings for 2048K FFT length (3 cpus, 3 workers): 10.46, 9.72, 9.34 ms. Throughput: 305.48 iter/sec.
Timings for 2048K FFT length (4 cpus, 4 workers): 12.02, 12.05, 11.31, 11.00 ms. Throughput: 345.52 iter/sec.
Timings for 2560K FFT length (1 cpu, 1 worker): 10.45 ms. Throughput: 95.66 iter/sec.
Timings for 2560K FFT length (2 cpus, 2 workers): 11.03, 11.00 ms. Throughput: 181.59 iter/sec.
Timings for 2560K FFT length (3 cpus, 3 workers): 13.08, 12.19, 12.03 ms. Throughput: 241.58 iter/sec.
Timings for 2560K FFT length (4 cpus, 4 workers): 14.99, 15.16, 14.06, 14.13 ms. Throughput: 274.58 iter/sec.
Timings for 3072K FFT length (1 cpu, 1 worker): 12.87 ms. Throughput: 77.68 iter/sec.
Timings for 3072K FFT length (2 cpus, 2 workers): 13.30, 13.28 ms. Throughput: 150.46 iter/sec.
Timings for 3072K FFT length (3 cpus, 3 workers): 15.58, 14.55, 14.42 ms. Throughput: 202.29 iter/sec.
Timings for 3072K FFT length (4 cpus, 4 workers): 18.44, 17.97, 16.73, 16.47 ms. Throughput: 230.37 iter/sec.
Timings for 3584K FFT length (1 cpu, 1 worker): 15.05 ms. Throughput: 66.42 iter/sec.
Timings for 3584K FFT length (2 cpus, 2 workers): 15.64, 15.58 ms. Throughput: 128.13 iter/sec.
Timings for 3584K FFT length (3 cpus, 3 workers): 18.51, 17.31, 17.03 ms. Throughput: 170.51 iter/sec.
Timings for 3584K FFT length (4 cpus, 4 workers): 21.54, 21.45, 19.83, 19.43 ms. Throughput: 194.95 iter/sec.
Timings for 4096K FFT length (1 cpu, 1 worker): 18.21 ms. Throughput: 54.92 iter/sec.
Timings for 4096K FFT length (2 cpus, 2 workers): 19.41, 18.67 ms. Throughput: 105.07 iter/sec.
[/code] |
[QUOTE=pinhodecarlos;384690]Your CPU is dying....[/QUOTE]
I hope not. It is a brand-new PC at a friend's workplace. I am prone to believe it is just a very poor setup. That is, akin to a Lamborghini CPU chip with Ford Fiesta RAM. |
Would you be so kind as to post some pictures of your BIOS settings? If you don't have anywhere to host them, just send me an email (my [email]nickname@yahoo.com[/email]) and I will host them for you.
Carlos |
[QUOTE=axn;384534]I think you missed the post title which says "Well with 4 workers doing LL .... IT REALLY SUCKS".[/QUOTE]
OK, it was my reading comprehension at fault here. I missed the "from" and thought iteration time doubled with TFx2 and LLx2 vs. LLx2 and idle. On the bright side, that may mean this is merely a single-channel memory problem, and not a defective heatsink mount. OP: if the heatsink is not mounted flush, it could work at less than 10% efficiency, and you could have overheating/throttling problems with even a single core at full blast. However, the turbo use on single-thread runs suggests heat is not the culprit, leaving you with memory as the problem (as your analogy indicates). Two ECMs and two LLs seem like the plan, then. |
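The single-channel-memory hypothesis can be given a rough numeric sanity check. A minimal sketch follows; the "one 8-byte double per FFT element" sizing and the pass count per iteration are assumptions for illustration, not documented Prime95 internals:

```python
# Back-of-envelope memory-traffic estimate for one worker on a 3584K FFT.
# ASSUMPTIONS (not from Prime95 docs): the main data array is one
# 8-byte double per FFT element, and each iteration streams over it
# roughly 4 times (forward transform, squaring, inverse transform, carry).
fft_bytes = 3584 * 1024 * 8          # ~29.4 MB of doubles
passes_per_iteration = 4             # assumed pass count
iteration_ms = 20.0                  # observed LL iteration time from the thread

# bytes/ms divided by 1e6 gives GB/s
per_worker_gb_s = fft_bytes * passes_per_iteration / iteration_ms / 1e6
four_workers_gb_s = 4 * per_worker_gb_s

print(round(per_worker_gb_s, 1))    # ~5.9 GB/s for one worker
print(round(four_workers_gb_s, 1))  # ~23.5 GB/s for four workers
```

Under these assumptions, one worker fits comfortably inside a single DDR3-1600 channel's ~12.8 GB/s theoretical peak, but four workers would demand roughly double it, which is consistent with the observed collapse in per-worker iteration times once all cores are loaded.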
[QUOTE=petrw1;384700]I hope not. It is a brand new PC at a friends work place.
I am prone to believe it is just a very poor setup. That is, akin to a Lamborghini CPU Chip with Ford Fiesta RAM.[/QUOTE] A Ford Fiesta ST is a nice car :) |
[QUOTE]Would you be kind to post some pictures of your BIOS settings? [/QUOTE]
I wonder if this is possible on a remote system. Can a UEFI system make BIOS settings accessible to the Borgmeister? |
[QUOTE=kladner;384716]I wonder if this is possible on a remote system. Can a UEFI system make BIOS settings accessible to the Borgmeister?[/QUOTE]
I didn't know it was a remote machine. Just ask your friend to take some pictures with a digital camera. Also take a snap of the Windows services manager; we want to see which services are on and off. On my laptop cruncher only 40 services are on; the rest are off, including Windows Update, Windows Search, .NET Framework, etc. Turn off the ones which consume even a little CPU. (List the services that are set to Automatic on startup.) |
Does NOT NOT NOT look good.
[QUOTE=kracker;384691]Well, what I really meant was for you to post the log of Prime95's benchmark(in results.txt I think after it's finished)
[/QUOTE] Very little gain running more than 1 core.... I cut out the Hyperthreaded lines to stay under 10000 characters. At the very bottom I am seeing timing for 3584K almost exactly what I was actually seeing. They come close to nullifying all cores after the first. [CODE]Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz CPU speed: 3392.39 MHz, 4 hyperthreaded cores CPU features: Prefetch, SSE, SSE2, SSE4, AVX, AVX2, FMA L1 cache size: 32 KB L2 cache size: 256 KB, L3 cache size: 8 MB L1 cache line size: 64 bytes L2 cache line size: 64 bytes TLBS: 64 Prime95 64-bit version 28.5, RdtscTiming=1 Best time for 1024K FFT length: 4.316 ms., avg: 4.365 ms. Best time for 1280K FFT length: 5.955 ms., avg: 7.817 ms. Best time for 1536K FFT length: 7.514 ms., avg: 7.606 ms. Best time for 1792K FFT length: 9.178 ms., avg: 9.232 ms. Best time for 2048K FFT length: 10.764 ms., avg: 10.803 ms. Best time for 2560K FFT length: 13.936 ms., avg: 14.004 ms. Best time for 3072K FFT length: 16.882 ms., avg: 16.940 ms. Best time for 3584K FFT length: 20.248 ms., avg: 20.325 ms. Best time for 4096K FFT length: 23.207 ms., avg: 23.296 ms. Best time for 5120K FFT length: 29.450 ms., avg: 31.504 ms. Best time for 6144K FFT length: 35.581 ms., avg: 35.686 ms. Best time for 7168K FFT length: 42.121 ms., avg: 43.570 ms. Best time for 8192K FFT length: 49.889 ms., avg: 53.093 ms. Timing FFTs using 2 threads on 1 physical CPU. Best time for 1024K FFT length: 4.619 ms., avg: 7.095 ms. Best time for 1280K FFT length: 6.279 ms., avg: 6.588 ms. Best time for 1536K FFT length: 7.950 ms., avg: 12.200 ms. Best time for 1792K FFT length: 9.690 ms., avg: 10.387 ms. Best time for 2048K FFT length: 11.176 ms., avg: 12.600 ms. Best time for 2560K FFT length: 14.717 ms., avg: 17.455 ms. Best time for 3072K FFT length: 17.774 ms., avg: 18.123 ms. Best time for 3584K FFT length: 20.664 ms., avg: 20.921 ms. Best time for 4096K FFT length: 24.266 ms., avg: 24.808 ms. 
Best time for 5120K FFT length: 30.842 ms., avg: 32.873 ms. Best time for 6144K FFT length: 37.197 ms., avg: 38.001 ms. Best time for 7168K FFT length: 43.322 ms., avg: 44.531 ms. Best time for 8192K FFT length: 51.229 ms., avg: 51.685 ms. Timing FFTs using 2 threads on 2 physical CPUs. Best time for 1024K FFT length: 2.561 ms., avg: 2.654 ms. Best time for 1280K FFT length: 4.102 ms., avg: 5.561 ms. Best time for 1536K FFT length: 5.307 ms., avg: 5.794 ms. Best time for 1792K FFT length: 6.667 ms., avg: 7.568 ms. Best time for 2048K FFT length: 8.151 ms., avg: 8.493 ms. Best time for 2560K FFT length: 10.508 ms., avg: 10.574 ms. Best time for 3072K FFT length: 13.018 ms., avg: 13.196 ms. Best time for 3584K FFT length: 15.140 ms., avg: 15.296 ms. Best time for 4096K FFT length: 17.679 ms., avg: 18.545 ms. Best time for 5120K FFT length: 22.282 ms., avg: 22.396 ms. Best time for 6144K FFT length: 26.889 ms., avg: 28.809 ms. Best time for 7168K FFT length: 32.132 ms., avg: 33.696 ms. Best time for 8192K FFT length: 37.879 ms., avg: 38.186 ms. Timing FFTs using 3 threads on 3 physical CPUs. Best time for 1024K FFT length: 2.235 ms., avg: 2.450 ms. Best time for 1280K FFT length: 3.710 ms., avg: 5.687 ms. Best time for 1536K FFT length: 5.045 ms., avg: 5.331 ms. Best time for 1792K FFT length: 6.458 ms., avg: 7.982 ms. Best time for 2048K FFT length: 7.901 ms., avg: 8.435 ms. Best time for 2560K FFT length: 10.202 ms., avg: 11.442 ms. Best time for 3072K FFT length: 13.207 ms., avg: 13.826 ms. Best time for 3584K FFT length: 14.804 ms., avg: 16.189 ms. Best time for 4096K FFT length: 17.347 ms., avg: 19.702 ms. Best time for 5120K FFT length: 21.842 ms., avg: 22.895 ms. Best time for 6144K FFT length: 26.374 ms., avg: 27.837 ms. Best time for 7168K FFT length: 30.957 ms., avg: 31.961 ms. Best time for 8192K FFT length: 37.411 ms., avg: 38.285 ms. Timing FFTs using 4 threads on 4 physical CPUs. Best time for 1024K FFT length: 1.876 ms., avg: 1.986 ms. 
Best time for 1280K FFT length: 3.710 ms., avg: 5.285 ms. Best time for 1536K FFT length: 5.165 ms., avg: 5.856 ms. Best time for 1792K FFT length: 6.548 ms., avg: 11.305 ms. Best time for 2048K FFT length: 7.919 ms., avg: 8.049 ms. Best time for 2560K FFT length: 10.321 ms., avg: 10.382 ms. Best time for 3072K FFT length: 12.952 ms., avg: 13.039 ms. Best time for 3584K FFT length: 15.076 ms., avg: 17.847 ms. Best time for 4096K FFT length: 17.652 ms., avg: 17.917 ms. Best time for 5120K FFT length: 22.523 ms., avg: 23.260 ms. Best time for 6144K FFT length: 26.981 ms., avg: 27.493 ms. Best time for 7168K FFT length: 31.531 ms., avg: 31.758 ms. Best time for 8192K FFT length: 37.925 ms., avg: 39.585 ms. Timing FFTs using 8 threads on 4 physical CPUs. Best time for 1024K FFT length: 2.405 ms., avg: 2.651 ms. Best time for 1280K FFT length: 3.778 ms., avg: 3.913 ms. Best time for 1536K FFT length: 5.284 ms., avg: 5.867 ms. Best time for 1792K FFT length: 6.808 ms., avg: 7.622 ms. Best time for 2048K FFT length: 8.152 ms., avg: 8.698 ms. Best time for 2560K FFT length: 10.570 ms., avg: 11.091 ms. Best time for 3072K FFT length: 13.206 ms., avg: 13.439 ms. Best time for 3584K FFT length: 15.440 ms., avg: 15.906 ms. Best time for 4096K FFT length: 18.105 ms., avg: 18.417 ms. Best time for 5120K FFT length: 22.407 ms., avg: 23.024 ms. Best time for 6144K FFT length: 27.319 ms., avg: 27.660 ms. Best time for 7168K FFT length: 31.856 ms., avg: 32.639 ms. Best time for 8192K FFT length: 38.538 ms., avg: 39.446 ms. Timings for 1024K FFT length (1 cpu, 1 worker): 4.42 ms. Throughput: 226.34 iter/sec. Timings for 1024K FFT length (2 cpus, 2 workers): 8.27, 8.17 ms. Throughput: 243.41 iter/sec. Timings for 1024K FFT length (3 cpus, 3 workers): 12.92, 12.75, 12.68 ms. Throughput: 234.71 iter/sec. Timings for 1024K FFT length (4 cpus, 4 workers): 18.66, 18.15, 17.97, 17.44 ms. Throughput: 221.68 iter/sec. Timings for 1280K FFT length (1 cpu, 1 worker): 6.12 ms. 
Throughput: 163.45 iter/sec. Timings for 1280K FFT length (2 cpus, 2 workers): 10.85, 10.72 ms. Throughput: 185.51 iter/sec. Timings for 1280K FFT length (3 cpus, 3 workers): 16.60, 16.23, 16.03 ms. Throughput: 184.22 iter/sec. Timings for 1280K FFT length (4 cpus, 4 workers): 22.92, 22.73, 22.49, 22.68 ms. Throughput: 176.18 iter/sec. Timings for 1536K FFT length (1 cpu, 1 worker): 7.86 ms. Throughput: 127.22 iter/sec. Timings for 1536K FFT length (2 cpus, 2 workers): 13.15, 13.07 ms. Throughput: 152.59 iter/sec. Timings for 1536K FFT length (3 cpus, 3 workers): 20.14, 19.87, 19.54 ms. Throughput: 151.17 iter/sec. Timings for 1536K FFT length (4 cpus, 4 workers): 32.81, 27.20, 27.11, 26.50 ms. Throughput: 141.87 iter/sec. Timings for 1792K FFT length (1 cpu, 1 worker): 9.30 ms. Throughput: 107.48 iter/sec. Timings for 1792K FFT length (2 cpus, 2 workers): 15.39, 15.27 ms. Throughput: 130.44 iter/sec. Timings for 1792K FFT length (3 cpus, 3 workers): 23.09, 22.71, 22.74 ms. Throughput: 131.31 iter/sec. Timings for 1792K FFT length (4 cpus, 4 workers): 35.02, 32.56, 32.56, 32.20 ms. Throughput: 121.04 iter/sec. Timings for 2048K FFT length (1 cpu, 1 worker): 11.00 ms. Throughput: 90.89 iter/sec. Timings for 2048K FFT length (2 cpus, 2 workers): 18.12, 17.90 ms. Throughput: 111.06 iter/sec. Timings for 2048K FFT length (3 cpus, 3 workers): 27.46, 26.76, 26.45 ms. Throughput: 111.59 iter/sec. Timings for 2048K FFT length (4 cpus, 4 workers): 43.68, 35.49, 35.52, 35.25 ms. Throughput: 107.60 iter/sec. Timings for 2560K FFT length (1 cpu, 1 worker): 14.16 ms. Throughput: 70.62 iter/sec. Timings for 2560K FFT length (2 cpus, 2 workers): 22.59, 22.30 ms. Throughput: 89.12 iter/sec. Timings for 2560K FFT length (3 cpus, 3 workers): 34.03, 33.28, 33.39 ms. Throughput: 89.39 iter/sec. Timings for 2560K FFT length (4 cpus, 4 workers): 45.87, 45.14, 44.72, 44.20 ms. Throughput: 88.94 iter/sec. Timings for 3072K FFT length (1 cpu, 1 worker): 17.13 ms. 
Throughput: 58.39 iter/sec. Timings for 3072K FFT length (2 cpus, 2 workers): 29.53, 27.82 ms. Throughput: 69.81 iter/sec. Timings for 3072K FFT length (3 cpus, 3 workers): 42.69, 41.27, 41.13 ms. Throughput: 71.97 iter/sec. Timings for 3072K FFT length (4 cpus, 4 workers): 59.33, 57.18, 57.36, 57.09 ms. Throughput: 69.30 iter/sec. Timings for 3584K FFT length (1 cpu, 1 worker): 20.60 ms. Throughput: 48.55 iter/sec. Timings for 3584K FFT length (2 cpus, 2 workers): 32.28, 31.86 ms. Throughput: 62.37 iter/sec. Timings for 3584K FFT length (3 cpus, 3 workers): 49.15, 47.99, 46.11 ms. Throughput: 62.87 iter/sec. Timings for 3584K FFT length (4 cpus, 4 workers): 68.90, 64.68, 65.81, 64.67 ms. Throughput: 60.63 iter/sec.[/CODE] |
Can you install CoreTemp and get temps for idle, 1 thread, 2 threads and so on?
|
[QUOTE=petrw1;384859]Very little gain running more than 1 core....
I cut out the Hyperthreaded lines to stay under 10000 characters. At the very bottom I am seeing timing for 3584K almost exactly what I was actually seeing. They come close to nullifying all cores after the first.[/QUOTE] Actually, looking at the numbers, there might be a way to greatly boost your throughput. Try running 1 LL test using 4 threads (or even 3 threads). That might actually give you more throughput than running 2 LLs x 1 thread each (hopefully, closer to 3 LLs). EDIT:- Not really. So based on actual numbers, [code] Best time for 3584K FFT length: 20.248 ms., avg: 20.325 ms. Timing FFTs using 2 threads on 1 physical CPU. Best time for 3584K FFT length: 20.664 ms., avg: 20.921 ms. Timing FFTs using 2 threads on 2 physical CPUs. Best time for 3584K FFT length: 15.140 ms., avg: [B]15.296 ms[/B]. Timing FFTs using 3 threads on 3 physical CPUs. Best time for 3584K FFT length: [B]14.804 ms[/B]., avg: 16.189 ms. Timing FFTs using 4 threads on 4 physical CPUs. Best time for 3584K FFT length: 15.076 ms., avg: 17.847 ms. Timing FFTs using 8 threads on 4 physical CPUs. Best time for 3584K FFT length: 15.440 ms., avg: 15.906 ms. [/code] 1 LL 2 threads gives a throughput of 1000/15.3 = 65.3 iter/sec, which is only marginally better than 2 LL x 1 thread (62.4 iter/sec). Nonetheless, that is probably the way to go. While 3 threads have better "best-case", it's average case is really bad. |
[QUOTE=garo;384861]Can you install CoreTemp and get temps for idle, 1 thread, 2 threads and so on?[/QUOTE]
[CODE]Timing FFTs using 4 threads on 4 physical CPUs. Best time for 1024K FFT length: 1.876 ms., avg: 1.986 ms. Timings for 1024K FFT length (4 cpus, 4 workers): 18.66, 18.15, 17.97, 17.44 ms. Throughput: 221.68 iter/sec.[/CODE] These three lines effectively demolish any temp-throttling hypothesis. When the entire FFT fits within the cache (or nearly so), you get a throughput of more than twice that of the 4-worker test case. |
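The cache argument above can be sanity-checked numerically. This is a sketch under one assumption (the main working set is one 8-byte double per FFT element; Prime95's real footprint is somewhat larger):

```python
L3_CACHE_BYTES = 8 * 1024 * 1024   # 8 MB L3 on the i7-4770 (from the specs above)

def fft_data_bytes(fft_length_k):
    """Assumed main-array size: fft_length_k * 1024 elements, 8 bytes each."""
    return fft_length_k * 1024 * 8

for fft_k in (1024, 3584):
    mb = fft_data_bytes(fft_k) / (1024 * 1024)
    verdict = "fits in L3" if fft_data_bytes(fft_k) <= L3_CACHE_BYTES else "spills to RAM"
    print(f"{fft_k}K FFT: {mb:.0f} MB -> {verdict}")
```

The 1024K FFT sits right at the 8 MB L3 boundary, so all four threads can hammer it from cache; the 3584K FFT (~28 MB) must stream from main memory every pass, which is exactly where slow or single-channel RAM would bite and heat would not.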
[QUOTE=axn;384866]Actually, looking at the numbers, there might be a way to greatly boost your throughput. Try running 1 LL test using 4 threads (or even 3 threads). That might actually give you more throughput than running 2 LLs x 1 thread each (hopefully, closer to 3 LLs).
EDIT:- Not really. So based on actual numbers, [code] Best time for 3584K FFT length: 20.248 ms., avg: 20.325 ms. Timing FFTs using 2 threads on 1 physical CPU. Best time for 3584K FFT length: 20.664 ms., avg: 20.921 ms. Timing FFTs using 2 threads on 2 physical CPUs. Best time for 3584K FFT length: 15.140 ms., avg: [B]15.296 ms[/B]. Timing FFTs using 3 threads on 3 physical CPUs. Best time for 3584K FFT length: [B]14.804 ms[/B]., avg: 16.189 ms. Timing FFTs using 4 threads on 4 physical CPUs. Best time for 3584K FFT length: 15.076 ms., avg: 17.847 ms. Timing FFTs using 8 threads on 4 physical CPUs. Best time for 3584K FFT length: 15.440 ms., avg: 15.906 ms. [/code] 1 LL with 2 threads gives a throughput of 1000/15.3 = 65.3 iter/sec, which is only marginally better than 2 LL x 1 thread (62.4 iter/sec). Nonetheless, that is probably the way to go. While 3 threads has a better best case, its average case is really bad.[/QUOTE] Or, sadly, is the best overall GIMPS benefit to run 1 LL at 20 ms for the 3584K FFT and TF or ECM on the other 3 cores? Currently testing with # LL and 2 ECM. |
It's either temps or memory by now...
|
[QUOTE=petrw1;384903]Or sadly is the best overall gimps benefit to run 1 LL at 20ms for 3548fft and tf or ecm on the other 3 cores. Currently testing with # LL and 2 ECM[/QUOTE]
Oops, that should say 2 LL and 2 ECM (Small). I already know that adding 2 TF does NOT impact the LL time by more than a percent or two. I was doing 50xxx TF to 61,62 bits on 2 cores when this PC first turned up. Those assignments were giving me 7 GhzDays / Day / Core; I find that TF assignments for bit levels below 65 are quite generous. So I know that if I let it do the current TF assignments that PrimeNet would hand out (66 - 72 bits) it would not be quite that productive, but it will be a LOT more than 2 ... see next point. I have noted so far that it completes 2 ECM (one each from core #1 and #3) in 4:30. That works out to 2 GhzDays / Day / Core. I have 800MB RAM of the 4000MB total allocated; I think this should be enough for ECM Small even if both are in Stage 2. I need a day or two to determine if this slows down the LL workers. If it does cause an LL slowdown, I'll try 1 TF and 1 ECM. |
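The ECM rate arithmetic in the post above checks out as follows. Note the implied per-assignment credit is derived purely from the post's own numbers, not an official PrimeNet figure:

```python
# From the post: two cores each complete one ECM (Small) assignment in 4 h 30 min.
hours_per_ecm = 4.5
ecm_per_core_per_day = 24.0 / hours_per_ecm           # assignments per core per day

reported_rate = 2.0                                    # GHz-days/day/core, from the post
implied_credit = reported_rate / ecm_per_core_per_day  # GHz-days per assignment

print(round(ecm_per_core_per_day, 2))  # 5.33
print(round(implied_credit, 3))        # 0.375
```

So the stated 2 GhzDays / Day / Core corresponds to roughly 0.375 GHz-days of credit per small ECM assignment, well below the ~7 GhzDays / Day / Core the low-bit-level TF work was earning.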
Have you checked the temperatures yet? :smile:
|
[QUOTE=kracker;384922]Have you checked the temperatures yet? :smile:[/QUOTE]
Not yet.... 1. Not sure it can be installed on that PC. 2. The friend is off this week. 3. axn, 3 posts below, suggested temps can be ruled out. |
[QUOTE=petrw1;384928]Not yet....
1. Not sure it can be installed on that PC 2. The friend is off this week. 3. axn 3 posts below suggested temp can be ruled out.[/QUOTE] CoreTemp doesn't need installation; click on the "More Downloads..." link and use the standalone version. Maybe temps aren't the problem, but it doesn't hurt to be 100% sure :smile: |
[QUOTE=kracker;384922]Have you checked the temperatures yet? :smile:[/QUOTE]
[STRIKE]Hasn't that been addressed? He can't install anything which requires Admin privileges.[/STRIKE] Oops. Crosspost. |
[QUOTE=kladner;384931][STRIKE]Hasn't that been addressed? He can't install anything which requires Admin privileges.[/STRIKE]
[/QUOTE] Whoops. :blush: |
[QUOTE=kracker;384929]click on the "More Downloads..." link and use the standalone.
[/QUOTE] I'll try this. |