mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Riesel Prime Search (https://www.mersenneforum.org/forumdisplay.php?f=59)
-   -   RPS benchmarks (https://www.mersenneforum.org/showthread.php?t=19050)

pinhodecarlos 2014-01-03 08:54

RPS benchmarks
 
Just for interest. Post your bench!!!

i7 3630QM at 3.2 GHz (28 watts according to the Core Temp software) with HT on (can't turn HT off on this laptop)
AVX FFT length 200K, Pass1=640, Pass2=320 on both instances
1.158 ms k=5 n=3801694
1.164 ms k=5 n=3812730

pinhodecarlos 2014-01-04 12:05

Paul's 4770K at 3.9 GHz, HT off.

0.975 ms k=5 n=4100030 AVX FFT length 224K
0.977 ms k=5 n=4200010 AVX FFT length 224K
1.050 ms k=5 n=4300000 AVX FFT length 240K
1.042 ms k=5 n=4400024 AVX FFT length 240K

Edit: Paul, correct me if I am wrong. Thank you.
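
For scale, the per-iteration timings above can be turned into a rough whole-test duration. This is only a sketch: it assumes about n iterations per test and ignores setup and final steps, and the function name is purely illustrative.

```python
# Rough duration of one LLR test, estimated from a per-iteration timing.
# Assumption: about n iterations per test; setup and final steps are ignored.

def estimated_test_seconds(ms_per_iteration: float, n: int) -> float:
    """Approximate wall-clock seconds for one test of k*2^n-1."""
    return ms_per_iteration * n / 1000.0


# (ms per iteration, exponent n) pairs taken from the 4770K figures above
timings = [
    (0.975, 4100030),
    (0.977, 4200010),
    (1.050, 4300000),
    (1.042, 4400024),
]

for ms, n in timings:
    seconds = estimated_test_seconds(ms, n)
    print(f"n={n}: about {seconds / 3600:.2f} hours per test")
```

With four cores loaded, four such tests would run side by side, so the box would finish roughly four tests in that time.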

diep 2014-01-06 14:28

Benchmarks are only useful with all cores busy; then divide the time by the number of cores you run to get total throughput.

On the old Xeon machines here a single test takes pretty long, yet on each box I can run 8 cores while total box consumption is 170 watts. I built those machines a few years ago for peanuts, about 200 euro each.

The chip inside, the L5420, has SSSE3 but not AVX.

All those new i7s have fast AVX, yet only 4 real cores. Where is the big progress in crunching there?

The L5420s I run here at 2.5 GHz, this is when they were produced:

[url]http://ark.intel.com/nl/products/33929/Intel-Xeon-Processor-L5420-12M-Cache-2_50-GHz-1333-MHz-FSB[/url]

So January 2008.

If we take the rule of thumb derived from Moore's Law, a doubling in speed every 18 months, then now in January 2014 we should have a new chip available that's:

6 years == 72 months => 72 / 18 = 4 doublings => 2^4 = 16 times faster than the chips I've got.

Only GPUs seem to speed up, though progress there is also slower now than a few years ago.
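
The arithmetic in this post can be checked with a few lines of Python. The sketch takes the post's own premise of one doubling in speed every 18 months at face value; the function name and its default are illustrative only.

```python
# Expected speedup under the post's premise: speed doubles every 18 months.

def expected_speedup(months_elapsed: float, doubling_period_months: float = 18.0) -> float:
    """Return 2 raised to the number of doubling periods that have elapsed."""
    doublings = months_elapsed / doubling_period_months
    return 2.0 ** doublings


months = (2014 - 2008) * 12  # January 2008 to January 2014
print(f"{months} months, {months / 18:.0f} doublings, "
      f"{expected_speedup(months):.0f}x expected speedup")
```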

pinhodecarlos 2014-01-06 15:34

[QUOTE=diep;363930]Benchmarks are only useful with all cores busy; then divide the time by the number of cores you run to get total throughput.

On the old Xeon machines here a single test takes pretty long, yet on each box I can run 8 cores while total box consumption is 170 watts. I built those machines a few years ago for peanuts, about 200 euro each.

The chip inside, the L5420, has SSSE3 but not AVX.

All those new i7s have fast AVX, yet only 4 real cores. Where is the big progress in crunching there?

The L5420s I run here at 2.5 GHz, this is when they were produced:

[URL]http://ark.intel.com/nl/products/33929/Intel-Xeon-Processor-L5420-12M-Cache-2_50-GHz-1333-MHz-FSB[/URL]

So January 2008.

If we take the rule of thumb derived from Moore's Law, a doubling in speed every 18 months, then now in January 2014 we should have a new chip available that's:

6 years == 72 months => 72 / 18 = 4 doublings => 2^4 = 16 times faster than the chips I've got.

Only GPUs seem to speed up, though progress there is also slower now than a few years ago.[/QUOTE]

You didn't answer the thread question.

diep 2014-01-06 15:41

You didn't write a question mark :)

pinhodecarlos 2014-01-06 15:45

[QUOTE=diep;363935]You didn't write a question mark :)[/QUOTE]

Post your k=69 LLR iteration speed of your machines.

paulunderwood 2014-01-06 16:00

[QUOTE=pinhodecarlos;363796]Paul's 4770K at 3.9 GHz, HT off.

0.975 ms k=5 n=4100030 AVX FFT length 224K
0.977 ms k=5 n=4200010 AVX FFT length 224K
1.050 ms k=5 n=4300000 AVX FFT length 240K
1.042 ms k=5 n=4400024 AVX FFT length 240K

Edit: Paul, correct me if I am wrong. Thank you.[/QUOTE]

Note that this is with all 4 cores loaded and with 2400MHz RAM. :smile:

diep 2014-01-06 16:12

I only have the completion time for all iterations.
The graphical LLR shows iteration times.
CLLR64 is considerably faster, however. It is text mode
and just prints the result to lresults.txt.

The machine was running only CLLR64, on all 8 cores, when it printed the result of the prime find below.

The time is pretty much the average for the n's around it that it tested.

69*2^2649939-1 is prime! Time : 5925.793 seconds.

So with a calculator that makes the iteration time 5925793 ms / 2649939 = 2.2362 ms.

Really great for an 8-core 2.5 GHz Xeon box from 2008.
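
The same division, written out as a small Python sketch. Dividing by 8 for whole-box throughput follows the "divide by number of cores" rule from earlier in the thread and assumes all 8 concurrent tests run at about the same per-iteration speed; the names are illustrative.

```python
# Per-iteration time recovered from a total test time, as in the post above.

def ms_per_iteration(total_seconds: float, n: int) -> float:
    """Average milliseconds per iteration, assuming about n iterations."""
    return total_seconds * 1000.0 / n


per_iter = ms_per_iteration(5925.793, 2649939)

# Assumption: 8 tests run concurrently at roughly the same speed, so the box
# as a whole completes one iteration every per_iter / 8 milliseconds.
cores = 8
effective = per_iter / cores

print(f"per test: {per_iter:.4f} ms/iteration")
print(f"whole box, {cores} cores: {effective:.4f} ms/iteration")
```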

pinhodecarlos 2014-05-04 10:11

Anyone here with some benches on AMD Opteron 6300 series? Does AVX work as it does on the Intel processors?

pepi37 2014-05-04 11:22

[QUOTE=pinhodecarlos;372605]Anyone here with some benches on AMD Opteron 6300 series? Does AVX work as it does on the Intel processors?[/QUOTE]

I think that so far there is no successful implementation of AVX on AMD (or, put another way, there is no speed gain from using AVX on any AMD chip).

pinhodecarlos 2014-05-04 11:39

I was in doubt because it is much cheaper to buy an AMD server with 4 processors than one from Intel. Thank you.

kracker 2014-05-04 16:16

Intel is much faster (right now) than AMD, especially with FMA3.

pinhodecarlos 2014-05-04 16:20

But Intel servers cost much more. I was hoping that AMD chips were capable of running LLR with AVX, but that is not the case. I will have to stick with a fleet of desktop CPUs.

Batalov 2014-05-04 20:59

[CODE]Starting Lucas Lehmer Riesel prime test of 179*2^1636808-1
Using AMD K8 FFT length 112K, Pass1=448, Pass2=256
179*2^1636808-1, iteration : 10000 / 1636808 [0.61%]. Time per iteration : 3.301 ms.
@vendor_id : AuthenticAMD
cpu family : 15
model : 65
model name : Dual-Core AMD Opteron(tm) Processor 8220 SE
stepping : 2
cpu MHz : 2800.305
cache size : 1024 KB

Starting Lucas Lehmer Riesel prime test of 179*2^1636808-1
Using Pentium4 FFT length 112K, Pass1=448, Pass2=256
179*2^1636808-1, iteration : 10000 / 1636808 [0.61%]. Time per iteration : 1.298 ms.
@vendor_id : GenuineIntel
cpu family : 6
model : 26
model name : Intel(R) Xeon(R) CPU X5570 @ 2.93GHz
stepping : 5
cpu MHz : 2926.102
cache size : 8192 KB[/CODE]
Opteron 8220 is a [B]very [/B]old series though (~2007).
[URL="http://www.cpubenchmark.net/multi_cpu.html"]X5570[/URL] is not new either (~2009).

P.S. We are talking about multi-CPU servers here.
[QUOTE=pinhodecarlos;372612]I was in doubt because it is much cheaper to buy an AMD server with 4 processors than one from Intel. Thank you.[/QUOTE]
Servers are conservatively clocked and quite expensive.
Tom's got some special deal at some point. Ask him.

kracker 2014-05-04 22:35

Intel i5 4670k at 4.0GHz

[code]
Starting Lucas Lehmer Riesel prime test of 179*2^1636808-1
Using FMA3 FFT length 100K, Pass1=320, Pass2=320
V1 = 4 ; Computing U0...
V1 = 4 ; Computing U0...done.
Starting Lucas-Lehmer loop...
179*2^1636808-1, iteration : 10000 / 1636808 [0.61%]. Time per iteration : 0.335 ms.
179*2^1636808-1, iteration : 20000 / 1636808 [1.22%]. Time per iteration : 0.318 ms.
179*2^1636808-1, iteration : 30000 / 1636808 [1.83%]. Time per iteration : 0.317 ms.
179*2^1636808-1, iteration : 40000 / 1636808 [2.44%]. Time per iteration : 0.317 ms.
179*2^1636808-1, iteration : 50000 / 1636808 [3.05%]. Time per iteration : 0.316 ms.
179*2^1636808-1, iteration : 60000 / 1636808 [3.66%]. Time per iteration : 0.317 ms.
[/code]
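
To compare runs like these without reading the numbers off by eye, the per-iteration times can be pulled out of the LLR output with a regular expression. This is a minimal sketch: the pattern is written only against the log lines shown in this thread, dropping the first sample as warm-up is an assumption, and the three runs used different FFT types and lengths, so the ratios are indicative at best.

```python
import re
from statistics import mean

# Log lines copied from the i5 4670k post above.
LOG = """\
179*2^1636808-1, iteration : 10000 / 1636808 [0.61%]. Time per iteration : 0.335 ms.
179*2^1636808-1, iteration : 20000 / 1636808 [1.22%]. Time per iteration : 0.318 ms.
179*2^1636808-1, iteration : 30000 / 1636808 [1.83%]. Time per iteration : 0.317 ms.
179*2^1636808-1, iteration : 40000 / 1636808 [2.44%]. Time per iteration : 0.317 ms.
179*2^1636808-1, iteration : 50000 / 1636808 [3.05%]. Time per iteration : 0.316 ms.
179*2^1636808-1, iteration : 60000 / 1636808 [3.66%]. Time per iteration : 0.317 ms.
"""

# Matches the "Time per iteration : 0.317 ms" part of each progress line.
PATTERN = re.compile(r"Time per iteration : ([0-9.]+) ms")


def iteration_times(log_text: str) -> list[float]:
    """Return every per-iteration time (in ms) found in the log text."""
    return [float(m.group(1)) for m in PATTERN.finditer(log_text)]


times = iteration_times(LOG)
steady = mean(times[1:])  # assumption: treat the first sample as warm-up

# Single-sample figures from the Opteron 8220 SE and Xeon X5570 post.
for name, ms in [("Opteron 8220 SE", 3.301), ("Xeon X5570", 1.298)]:
    print(f"{name}: {ms / steady:.1f}x the i5 4670k iteration time")
```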

fivemack 2014-05-05 07:49

The four-socket Opteron system I have cost me £4344 in April 2011

£1589 for the chassis and SSD
£1075 for sixteen 4G DIMMs
£1680 (special price) for four 1.9GHz Opteron 6168

For comparable performance on GNFS sieving, I'd now need two and a half Haswell 4770 boxes; I would have change from £2000 even if I bought three. And for linear algebra the i7-4930K box is no slower than running 48-way MPI on the Opteron, and much less disruptive (because I have to reboot the Opteron and lose all the state in the jobs on it to make the MPI work at full efficiency).

So I'd not really recommend large servers; you're paying more for the convenience than it's worth. Having 64G of memory available all at once is occasionally very useful, but you can now do that on a single-socket Socket 2011 platform.

pinhodecarlos 2014-05-05 11:58

I really think for the moment the best choice is to have a fleet of Intel's i5 4670k to run some LLR. I would consider AMD servers to help NFS@home in sieving but my priority right now is LLRing.
Thank you guys.

kracker 2014-05-05 14:24

[QUOTE=pinhodecarlos;372685]I really think for the moment the best choice is to have a fleet of Intel's i5 4670k to run some LLR. I would consider AMD servers to help NFS@home in sieving but my priority right now is LLRing.
Thank you guys.[/QUOTE]

If you don't overclock, get the 4670 (not the K).

pinhodecarlos 2014-05-05 16:42

1 Attachment(s)
Check this budget.

kracker 2014-05-05 16:49

[QUOTE=pinhodecarlos;372706]Check this budget.[/QUOTE]

Hmm.. I'm assuming it's not just for LLR alone.

Also, it will be AVX, not FMA3, since it is not a Haswell CPU. You may want to wait a few more months for the 8-core Haswells to come out, but it's your choice :smile:

pinhodecarlos 2014-05-05 18:04

Checking AMD 6380 specs it says it features:

FMA3 / 3-operand Fused Multiply-Add instructions
FMA4 / 4-operand Fused Multiply-Add instructions
AVX / Advanced Vector Extensions

I still don't understand why it is not worth running LLR on it.
Just asked for a budget for a quad 6380.

Carlos

kracker 2014-05-05 18:16

[QUOTE=pinhodecarlos;372711]Checking AMD 6380 specs it says it features:

FMA3 / 3-operand Fused Multiply-Add instructions
FMA4 / 4-operand Fused Multiply-Add instructions
AVX / Advanced Vector Extensions

I still don't understand why it is not worth running LLR on it.
Just asked for a budget for a quad 6380.

Carlos[/QUOTE]

AMD supports them all. But AMD CPUs suck at them, and are often faster without them.

fivemack 2014-05-05 22:34

[QUOTE=pinhodecarlos;372706]Check this budget.[/QUOTE]

I'm not sure what currency you're working in, and frankly I do not believe the budget. According to Intel's own site, the e5-4620 v2 is $1611 and the e5-4657L v2 is $4394, so the price difference between four of the first and four of the second is not going to be 285.99 of any currency!
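
A quick check of that objection, using only the two list prices quoted above and the four sockets of the server being discussed. The dictionary and function names are illustrative.

```python
# List prices in USD as quoted in the post above.
LIST_PRICE_USD = {"E5-4620 v2": 1611, "E5-4657L v2": 4394}
SOCKETS = 4  # four-processor server


def upgrade_cost(cheaper: str, dearer: str, sockets: int = SOCKETS) -> int:
    """Extra list-price cost of fitting the dearer chip in every socket."""
    return sockets * (LIST_PRICE_USD[dearer] - LIST_PRICE_USD[cheaper])


difference = upgrade_cost("E5-4620 v2", "E5-4657L v2")
print(f"Expected difference at list price: ${difference}")
```

At list price the gap comes to several thousand dollars, which is why a difference of 285.99 looks implausible.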

pinhodecarlos 2014-05-05 22:35

€ without taxes.

EDIT: Forgot to say that I agree with you. I sent the seller an email because I noticed that price discrepancy as well, but due to the time zone difference I will probably only have an answer tomorrow when I wake up.

pinhodecarlos 2014-05-06 14:15

The values in the budget are correct, but the way they are presented is unclear. The €10.645,20 is what I have to add to €10.359,21 if I want to upgrade the processor from e5-4620 v2 to e5-4657L v2.

VBCurtis 2014-12-01 07:30

Carlos PM'ed to ask about 5820k benchmarks, thought this thread the proper place to respond:
5820k @ stock settings 3.3Ghz, 4x4GB RAM at stock 2133mhz. Power readings from kill-a-watt at wall plug.
Tests are run on k=45 at 2M, 120k FFT size.
Idle 65W
1x LLR 93W, 0.433 ms/iteration
2x LLR 114W, 0.435 ms/iter
3 through 6 were all the same speed; perhaps a bit of turboclock for 1-2 cores used?
6x LLR 185W, 0.456 ms/iter

I then fired up 6 copies of ECM (B1 = 25M, 169 digit number). LLR slowed to 0.800 ms/iter. Half-speed would be 0.912, so I gain 12-13% total throughput with 6x LLR and 6x ECM. Power use dropped to 180W.

My core2quad at 3.3ghz drew 170W with the same video card (I didn't know 5820 had no onboard video, so grabbed the ancient card from the quad to get it up while I order a real one).

The memory is rated for 2666mhz, and the CPU has a water block, so I'll mess with overclocking and report results when available. I'll also get a n=4M run later, to see if memory saturates on bigger FFTs.
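
The power and timing figures in this post can be combined into throughput per watt. This is a sketch under stated assumptions: wall power is used as-is (so it includes idle draw and the video card), every instance is assumed to run at the quoted per-iteration time, and the class and method names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Run:
    instances: int      # concurrent LLR instances
    watts: float        # wall power quoted in the post
    ms_per_iter: float  # per-instance iteration time in ms

    def iters_per_second(self) -> float:
        """Total iterations per second across all instances."""
        return self.instances * 1000.0 / self.ms_per_iter

    def iters_per_joule(self) -> float:
        """Iterations completed per joule drawn at the wall."""
        return self.iters_per_second() / self.watts


runs = [Run(1, 93, 0.433), Run(2, 114, 0.435), Run(6, 185, 0.456)]
for run in runs:
    print(f"{run.instances}x LLR: {run.iters_per_second():.0f} iter/s, "
          f"{run.iters_per_joule():.1f} iter/J")

# Mixed load: 6x LLR plus 6x ECM, compared with a plain halving of LLR speed.
half_speed_ms = 2 * 0.456  # 0.912 ms, the half-speed figure from the post
shared_ms = 0.800          # measured LLR time with ECM running alongside
time_saved = 1 - shared_ms / half_speed_ms
print(f"LLR iteration time is {time_saved:.1%} below the half-speed figure")
```

The last figure is the gain measured in iteration time; expressed as a speed ratio (0.912 / 0.800) the same data gives about 1.14.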

pinhodecarlos 2014-12-01 08:10

Could you turn HT off and run those tests again? Thank you in advance. Carlos.

VBCurtis 2014-12-02 03:11

[QUOTE=pinhodecarlos;388778]Could you turn HT off and run those tests again? Thank you in advance. Carlos.[/QUOTE]

I did so, and confirmed that top shows 6 CPUs rather than 12. Timings were either identical or 0.001 ms faster. This is on Ubuntu.

Machine does not POST under any overclocking settings, even if I try to slow it down. I may have to wait a while for a BIOS update before posting more interesting timings, though I'll still test bigger FFTs sometime soon.

pinhodecarlos 2014-12-02 09:44

Thank you Curtis. Just keep this thread updated with your benchmarks. Next year I want to buy a new computer. I read somewhere else on this forum that the best option is not one 5820k but two i5-4690K. I still have to digest this (cost vs performance).

VBCurtis 2014-12-07 07:13

320k FFT timings on 5820k, still all stock settings for CPU/memory, tested on k=127 at n=4950k:

1x: 1.182 ms
2x: 1.201 ms
3x: 1.255 ms
4x: 1.255 ms
5x: 1.30 ms
6x: 1.36 ms

The 5 and 6 core tests had more variety among the timings, so I rounded to the hundredth.

Memory default is 2133 MHz; my set is rated and XMP'ed at 2666, but I can't POST with any alterations to any settings. Does anyone have ideas for some weird setting (Gigabyte X99-UD4 board) that interferes with even simple OCing (e.g. XMP profile on)? I would be pleased to find that timings do not increase for 5 and 6 cores at 5M once the memory is set to its XMP profile/rated speed.
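
How much those 320k timings cost in throughput as cores are added can be put in numbers. A minimal sketch, assuming every instance in an N-instance run takes the quoted per-iteration time; the function names are illustrative.

```python
# Per-iteration time (ms) by number of concurrent LLR instances, from the post above.
timings_ms = {1: 1.182, 2: 1.201, 3: 1.255, 4: 1.255, 5: 1.30, 6: 1.36}


def scaling_efficiency(instances: int) -> float:
    """Per-instance speed relative to a single instance running alone."""
    return timings_ms[1] / timings_ms[instances]


def relative_throughput(instances: int) -> float:
    """Whole-machine throughput relative to one instance running alone."""
    return instances * scaling_efficiency(instances)


for count in sorted(timings_ms):
    print(f"{count}x: {relative_throughput(count):.2f}x throughput, "
          f"{scaling_efficiency(count):.1%} per-instance efficiency")
```

If faster memory removes the slowdown at 5 and 6 instances, the efficiency column should move back toward 100%.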


All times are UTC.