mersenneforum.org  

2018-11-21, 07:26   #23
LaurV
Romulan Interpreter

Jun 2011
Thailand

22242₈ Posts

Hm... that may be so... in spite of the fact that lately I haven't done much GIMPS work either... (no time, no resources, real life catching up with me...)

2019-06-12, 09:14   #24
maxzor

Apr 2017

2²·5 Posts
Is Zen2 still lagging behind Intel for GIMPS?

Hello,
From the first architecture details:

- Still 4 cores per CCX, two CCXs per chiplet, and two core chiplets plus one IO die per CPU. Cores on different dies communicate via the IO die.
- Cores now support one AVX-256 op per cycle.
- Memory performance is somewhat capped: the Infinity Fabric frequency is decoupled from the RAM frequency and halved from 3733MT/s upward.
- Still dual-channel memory.


I had read that, on top of the AVX-256 emulation (each op then took two cycles), Zen 1/+ was badly starved for memory bandwidth when used for GIMPS.

Is that likely to remain the case?
What else have I missed that will make a difference relative to previous Zen generations and Intel's current position?
Core topology? I guess an 8/4 and a 6/6 3900X would not perform the same.


(Also slipping in a general question here: how do GPGPUs currently compare with high-end CPUs for GIMPS?)


Edit: damn, sorry, I should have bumped this thread instead, feel free to move this post...

Last fiddled with by maxzor on 2019-06-12 at 09:20

2019-06-12, 11:41   #25
mackerel

Feb 2016
UK

110100000₂ Posts

I'd expect a good uplift in performance, but it will depend on the size of test being run.

The beefed-up AVX performance should put it near enough at parity with Intel consumer CPUs, but still behind implementations with two AVX-512 units.

There are still two 4-core CCXs on each 8-core chiplet. Assuming the L3 cache is shared equally between the CCXs, I don't think that would have an impact. Cores don't need to talk to each other directly in this application?

The 32MB L3 cache per chiplet should help tests that can fit inside it. I have no idea what size GIMPS tests are at, but for other prime-finding work that fits in the cache, e.g. with LLR/PFGW, I hope it is practically unconstrained by RAM bandwidth and the cores can show their full potential. If a test exceeds the cache, the dual-channel RAM will probably be the limit. Indications are that these CPUs will support faster RAM more easily than before, but they really could use more channels.

The two-chiplet models (12 and 16 cores) will be interesting to test, though I have no plans to get one myself. As far as I can tell, each chiplet is connected to the IO die at 32B/cycle, tied to the base RAM clock. The overall effect is that bandwidth in each direction is comparable to the RAM bandwidth (assuming dual channel). If data has to pass between chiplets, that could be a choke point.
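
The comparison in the last paragraph can be checked with a few lines of arithmetic. This is only a sketch under the post's own assumptions: 32 bytes per cycle in each direction, the link clocked at the RAM base clock (half the MT/s rating), and 64-bit DDR4 channels. The function names are made up for illustration.

```python
# Back-of-envelope comparison of the chiplet-to-IO-die link and DRAM bandwidth.
# Assumptions (from the post, not confirmed): 32 bytes per cycle in each
# direction, link clocked at the RAM base clock (half the MT/s rating).


def ram_bandwidth_gb_s(mt_per_s, channels=2, bytes_per_transfer=8):
    """Peak DRAM bandwidth in decimal GB/s; each DDR4 channel is 64 bits wide."""
    return mt_per_s * 1e6 * bytes_per_transfer * channels / 1e9


def link_bandwidth_gb_s(mt_per_s, bytes_per_cycle=32):
    """Peak one-direction link bandwidth in decimal GB/s, assuming the link
    runs at the RAM base clock, i.e. half the transfer rate."""
    base_clock_hz = mt_per_s / 2 * 1e6
    return base_clock_hz * bytes_per_cycle / 1e9


if __name__ == "__main__":
    for speed in (2666, 3200, 3733):
        ram = ram_bandwidth_gb_s(speed)
        link = link_bandwidth_gb_s(speed)
        print(f"DDR4-{speed}: RAM {ram:.1f} GB/s, link {link:.1f} GB/s per direction")
```

Under these assumptions the two figures coincide at every speed, because 32 bytes at half the transfer rate equals 2 channels × 8 bytes at the full transfer rate. That is the sense in which the link is "comparable" to dual-channel RAM.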

2019-06-19, 18:10   #26
maxzor

Apr 2017

14₁₆ Posts

Starting from mackerel's numbers and doing some back-of-the-envelope arithmetic:

From this slide of AMD's presentation: there are indeed 32B transferred to/from RAM per cycle.
The clock speed of 3733MT/s DDR4 RAM is 18xx MHz.
(Note that it might be possible to set it higher manually, we will see...)
So that would be a 90GB/s max bandwidth.

One channel of DDR4-3733 memory should be around 45GB/s, so dual channel would not bottleneck much more than the quad channel of the upcoming Threadripper?

One iteration of a 4M FFT needs 133MB of bandwidth. That means we could theoretically reach ~650 iter/s with 8 cores in one chiplet running 1 worker?
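
The iteration estimate is just bandwidth divided by traffic per iteration. A minimal sketch, taking the 133MB per iteration and the 90GB/s figure from this post at face value and assuming the workload is purely bandwidth-bound (the function name is hypothetical):

```python
# Upper bound on iterations per second for a purely bandwidth-bound test.
# The 133 MB per 4M-FFT iteration and the bandwidth figures are the thread's
# numbers, taken as given; decimal units (1 GB = 1000 MB) are assumed.


def iterations_per_second(bandwidth_gb_s, traffic_mb_per_iter=133.0):
    """Iterations per second if memory traffic is the only limit."""
    return bandwidth_gb_s * 1e9 / (traffic_mb_per_iter * 1e6)


if __name__ == "__main__":
    for bandwidth in (90.0, 58.0):
        rate = iterations_per_second(bandwidth)
        print(f"{bandwidth:.0f} GB/s -> about {rate:.0f} iter/s")
```

With 90GB/s the bound lands a little above the ~650 iter/s quoted; with the roughly 58GB/s dual-channel figure given in the reply it drops to somewhere around 440 iter/s. Neither accounts for cache hits or compute limits.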

Last fiddled with by maxzor on 2019-06-19 at 18:16

2019-06-21, 09:14   #27
mackerel

Feb 2016
UK

416₁₀ Posts

I make dual channel 3733 ram about 58GB/s.

I got some conflicting info elsewhere that I haven't been able to verify. To my understanding, on Zen and Zen+ the Infinity Fabric is synced to the RAM base clock. I assumed this was the same for Zen 2, but the unconfirmed claim I saw was that it is now tied to the effective RAM clock, i.e. effectively double what it was before.

My main concern was that there is going to be an option to run IF at half or full ratio relative to RAM speed (to allow higher RAM speeds). If IF ran at half speed relative to the RAM base clock, it wouldn't have enough bandwidth even for the RAM. So it would make sense if it were referenced off the effective RAM speed.

We'll just have to wait to get confirmation. Just over 2 weeks to go...

2019-06-21, 11:52   #28
nomead

"Sam Laur"
Dec 2018
Turku, Finland

330₁₀ Posts

IF runs at full ratio up to 3733 MT/s RAM speed, and at half rate above that. See attached marketing slide.
[Attached image: 1560259834803.jpg]

2019-06-21, 12:03   #29
mackerel

Feb 2016
UK

2⁵·13 Posts

I have seen that, but it doesn't say what the actual speed is, only that it changes. Is the reference clock (for full speed) the RAM clock or the effective RAM speed?

2019-06-21, 12:15   #30
nomead

"Sam Laur"
Dec 2018
Turku, Finland

101001010₂ Posts

Should be the same as before, RAM clock, i.e. half the effective speed.

2019-06-21, 18:11   #31
mackerel

Feb 2016
UK

640₈ Posts

Unless I'm missing something or doing something very wrong, that would mean IF bandwidth is lower than ram once you have fast enough ram to hit the low ratio.
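
The shortfall can be put in numbers. This is a rough sketch, assuming the rule as described in this thread (fabric at the RAM clock up to DDR4-3733, at half the RAM clock above that) and the 32B/cycle per-direction link width mentioned earlier. None of this is confirmed by AMD here, and the names are illustrative only.

```python
# Sketch of the ratio rule discussed in the thread. Assumptions: the fabric
# follows the RAM clock (half the MT/s rating) up to DDR4-3733 and runs at
# half the RAM clock above that; the link moves 32 bytes per cycle per direction.

FULL_RATIO_LIMIT_MT_S = 3733   # threshold quoted from the marketing slide
LINK_BYTES_PER_CYCLE = 32      # assumed link width, per the earlier post


def fabric_clock_mhz(mt_per_s):
    """Fabric clock in MHz under the assumed full/half ratio rule."""
    ram_clock_mhz = mt_per_s / 2
    if mt_per_s <= FULL_RATIO_LIMIT_MT_S:
        return ram_clock_mhz
    return ram_clock_mhz / 2


def fabric_bandwidth_gb_s(mt_per_s):
    """One-direction link bandwidth in decimal GB/s."""
    return fabric_clock_mhz(mt_per_s) * 1e6 * LINK_BYTES_PER_CYCLE / 1e9


def dual_channel_ram_gb_s(mt_per_s):
    """Peak dual-channel DDR4 bandwidth in decimal GB/s (2 x 64-bit channels)."""
    return mt_per_s * 1e6 * 8 * 2 / 1e9


if __name__ == "__main__":
    for speed in (3600, 3733, 4000, 4400):
        fabric = fabric_bandwidth_gb_s(speed)
        ram = dual_channel_ram_gb_s(speed)
        print(f"DDR4-{speed}: fabric {fabric:.1f} GB/s, RAM {ram:.1f} GB/s")
```

Under these assumptions DDR4-4000 gives about 32GB/s on the link against 64GB/s from the RAM, so a single chiplet would see only half of what the memory can deliver. A two-chiplet CPU has one link per chiplet, which this sketch does not model.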

2019-06-22, 07:49   #32
M344587487

"Composite as Heck"
Oct 2017

3×263 Posts

That's the scuttlebutt, and it's one of the major things that could impact a niche like ours. It's plausibly true: it likely doesn't affect 99% of use cases, so it seems a reasonable lever for them to pull.

2019-06-22, 10:04   #33
henryzz
Just call me Henry

"David"
Sep 2007
Cambridge (GMT/BST)

2×29×101 Posts

Doesn't it defeat the point of having faster RAM though?

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.