mersenneforum.org 48-core chip - Intel

2009-12-03, 12:10   #1
hj47

Oct 2008

26 Posts
48-core chip - Intel

Quote:
 The 1.3-billion transistor processor, called Single-chip Cloud Computer (SCC) is successor generation to the 80-core "Polaris" processor that Intel's Tera-scale research project produced in 2007. Unlike that precursor, though, the second-generation model is able to run the standard software of Intel's x86 chips such as its Pentium and Core models.
http://news.cnet.com/8301-30685_3-10...?tag=rtcol;txt

This is nuts. I wonder how it would fare crunching a few LLs.

But then again, Nvidia's GT300 is just around the corner...

Last fiddled with by hj47 on 2009-12-03 at 12:13

2009-12-18, 20:14   #2
diep

Sep 2006
The Netherlands

2·337 Posts

Quote:
 Originally Posted by hj47 http://news.cnet.com/8301-30685_3-10...?tag=rtcol;txt This is nuts, I wonder how it would fare crunching a few LL's . But then again, nvidia GT300 is just around the corner...
Yes, for my chess program it's a kick-butt CPU. The parallel search is all manual parallelism anyway, so I need only a rather small rewrite to do manual cache coherency to get it to work. Well, it's a big rewrite, but really simple in theory.

Say just a month's work or so.

However, that's integers. AFAIK this chip is not strong in floating point. It shouldn't be, IMHO. There are GPUs for that.

The 3 Tflop single precision that today's GPUs deliver is never going to be beaten by such multicores. These multicores are interesting for stuff like chess, which is all 32-bit integers in the case of my program.

Also, for most factorisation (I should rather say SIEVING or trial factorisation) these chips are not so interesting: on x86 such code depends on the latency of a floating-point instruction, which these CPUs might not have at all.

So neither LLR nor Mersenne nor GMP nor GIMPS nor gwnum nor any code derived from Woltman will work on such a CPU, as it seems now. There is not much known however, so who knows?

Look to GPUs :)

Manycores are unbeatable in all this. It's just that writing code for them is real full-time work, and no one is funding me to do that. But on paper they are hands down 10x faster than a quad-core chip; here I mean the latest AMD GPUs.

Nvidia has only Tesla as an interesting card (forget the GPUs: too many limitations inside the hardware there to really figure out well).

Last fiddled with by diep on 2009-12-18 at 20:17

2009-12-19, 02:01   #3
lfm

Jul 2006
Calgary

5²·17 Posts

Quote:
 Originally Posted by diep However that's integers. AFAIK this chip is not strong in floating point. It shouldn't be IMHO. There is gpu's for that. 3 Tflop that todays gpu's deliver single precision is never going to get beaten by such multicores. These multicores are interesting for stuff like chess which is all 32 bits integers in case of my program.
Never say never. There is no reason why a vector instruction couldn't be run at full memory speed on any processor.

Quote:
 Also for most factorisation (i should rather say SIEVING or trial-factorisation) these chips are not so interesting as they depend upon a floating point instructions latency in case of x86 which these cpu's might not have at all.
Trial factoring doesn't use floats.
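
As a minimal sketch of why integer arithmetic is enough here: any factor q of a Mersenne number 2^p - 1 (p an odd prime) has the form q = 2kp + 1 with q ≡ ±1 (mod 8), and q divides 2^p - 1 exactly when 2^p mod q = 1. The function name `find_mersenne_factor` and the `max_k` bound are made up for illustration; this is not the code GIMPS actually runs.

```python
def find_mersenne_factor(p, max_k):
    """Return the smallest factor q = 2*k*p + 1 (k <= max_k) of 2**p - 1, or None.

    Assumes p is an odd prime. Uses integer arithmetic only.
    """
    for k in range(1, max_k + 1):
        q = 2 * k * p + 1
        # Any factor of 2**p - 1 must be congruent to 1 or 7 modulo 8.
        if q % 8 not in (1, 7):
            continue
        # q divides 2**p - 1 exactly when 2**p is congruent to 1 modulo q.
        if pow(2, p, q) == 1:
            return q
    return None


if __name__ == "__main__":
    print(find_mersenne_factor(11, 10))   # 2**11 - 1 = 2047 = 23 * 89
    print(find_mersenne_factor(13, 3))    # 2**13 - 1 = 8191 is prime
```

The whole test is one modular exponentiation per candidate, so it maps naturally onto integer-only cores.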

2009-12-19, 12:09   #4
diep

Sep 2006
The Netherlands

1010100010₂ Posts

Quote:
 Originally Posted by lfm Never say never. there is no reason why a vector instruction couldn't be run at full memory speed on any processor. Trial factoring doesn't use floats.
Of course I'm not a hardware engineer. For the past few years I have asked time and again why they aren't developing a 128- or 256-core multiprocessor, given how easily we see all those manycores scale further.

The experts all work somewhere, so they are not really allowed to speak, and the directors on my chat are bound by disclosure rules, as their companies are listed on Wall Street or other stock exchanges.

There seem to be a number of problems.

Problem A) cache coherency

Manycores do not suffer from this. This is a very big issue. It is one of the FEW reasons why the 8-core Xeon Nehalem processor (16 "logical" cores using hyperthreading) has been delayed for so many years. Cache snooping and cache coherency are a problem.

Manycores do not have this problem. Which cache coherency?
Note that this 48-core chip doesn't have automatic cache coherency;
you have to do that manually with messaging. OK for me.
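
To make "manual coherency with messaging" concrete, here is a toy single-process model: each core keeps a private copy of the data and only learns about another core's write when it explicitly drains its inbox. The `Core` class and its `write`/`sync` methods are hypothetical names for illustration; this is not the API of the 48-core chip, about which little is known.

```python
from collections import deque


class Core:
    """Toy model of a core with a private cache and a message inbox."""

    def __init__(self, core_id):
        self.core_id = core_id
        self.cache = {}          # private copy, never touched by other cores
        self.inbox = deque()     # pending update messages from other cores

    def write(self, key, value, peers):
        """Update the local copy and explicitly notify every other core."""
        self.cache[key] = value
        for peer in peers:
            if peer is not self:
                peer.inbox.append((key, value))

    def sync(self):
        """Apply pending messages; until this runs, the local copy may be stale."""
        while self.inbox:
            key, value = self.inbox.popleft()
            self.cache[key] = value


if __name__ == "__main__":
    cores = [Core(i) for i in range(4)]
    cores[0].write("best_score", 42, cores)
    print(cores[3].cache.get("best_score"))  # stale: message not yet applied
    cores[3].sync()
    print(cores[3].cache.get("best_score"))  # now up to date
```

The point of the sketch is that the programmer, not the hardware, decides when a stale copy gets refreshed.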

Yet I'm one of the world's biggest experts on massively parallelizing searches.
How many others are good enough to be prepared to do cache coherency manually?

Hopefully a lot more in the future.

Problem B) the yields issue

If I knew the yield rates I would need to keep them secret. Of course I do not know them. So I'll gamble that the yields of AMD and Intel for a common processor are quite normal; let's say X.

Now for multicores there is a big problem here. It is very difficult to produce those CPUs correctly. Let's take a random number.

Let's say some 4-core chip with, say, 1 billion transistors delivers 80% yields.
This 80% is a fictional number; it has no basis in reality whatsoever. I picked it at random.

Let's be happy if you get 5% yields.

A 48 core chip with 1 billion transistors then delivers maybe 5%?

A stupid math formula, totally irrelevant as reality is worse:
48 / 4 = 12, so compound the 4-core yield 12 times.

Think of: 0.8 ^ 12 ≈ 6.9%
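
A minimal sketch of that back-of-envelope estimate, treating the 48-core die as twelve 4-core blocks that must all come out defect-free. The function name `compound_yield` is made up here, and the 80% input is the same fictional figure as above, not a real yield.

```python
def compound_yield(block_yield, n_blocks):
    """Yield of a die made of n_blocks blocks that must ALL be defect-free."""
    return block_yield ** n_blocks


if __name__ == "__main__":
    blocks = 48 // 4                      # twelve 4-core blocks
    y = compound_yield(0.80, blocks)      # 0.80 is the fictional 4-core yield
    print(f"{y:.1%}")
```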

So what price do you need to sell it for then, including profit? $2k to $5k a CPU?
That's not realistic of course.

See, that is a SERIOUS problem. If you can't produce a product cheaply enough then you can't see it as a serious product, because the factories cost billions; in short you want to earn billions with a chip. So it HAS to have some sort of cheap version.

Manycores do not have this problem. Each core that doesn't work, you just turn it off.
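
A rough sketch of why switching off bad cores helps so much, assuming every core fails independently with the same probability (a simplification; real defects cluster). The name `yield_with_spares` and the choice of 40 required cores are illustrative assumptions, and the per-core yield is derived from the same fictional 80% figure for a 4-core block.

```python
from math import comb


def yield_with_spares(core_yield, n_cores, min_good):
    """Probability that at least min_good of n_cores cores are defect-free.

    Assumes each core fails independently with the same probability.
    """
    return sum(
        comb(n_cores, good) * core_yield**good * (1 - core_yield) ** (n_cores - good)
        for good in range(min_good, n_cores + 1)
    )


if __name__ == "__main__":
    core_yield = 0.80 ** (1 / 4)   # per-core yield implied by a fictional 80% 4-core yield
    print(yield_with_spares(core_yield, 48, 48))  # every core must work
    print(yield_with_spares(core_yield, 48, 40))  # up to 8 bad cores switched off
```

Requiring all 48 cores reproduces the compounded figure, while tolerating a handful of dead cores pushes the usable fraction far higher under these assumptions.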

Problem C) The Seymour Cray principle: "If you were plowing a field, which would you rather use? Two strong oxen or 1024 chickens?"

If you see my postings in some other forums cheering for this 48-core chip, you will notice that hardware engineers and some IT folks are less enthusiastic. A single core will obviously have a much lower IPC than a core of an i7 Nehalem, to give an example. The i7 Nehalem has just 4 cores, but it has a latency of 70 nanoseconds to fetch 8 bytes at random from a 2 GB memory buffer with all 4 cores doing this at the same time; that is what a benchmark I wrote indicates (in short, just the blocked read latency).

Compare that with a Phenom II using DDR3 at 100 ns, and you'll see that for most applications the main IPC difference between the 2 chips is basically the memory controller. Yet that already means that most see the i7 as 'faster' than the 6-core AMD.

6-core AMD: 6 x 2.8 GHz = 16.8 GHz
i7 965: 4 x 3.2 GHz = 12.8 GHz

Most software, when compiled well (that is, objectively, without business decisions to make it faster for one specific manufacturer), doesn't show a big IPC difference between Intel and AMD.

What's faster in most benchmarks, if I may ask you?

Shall I show you what's faster?

What do you guess the latency to RAM is for a single core of a 48-core chip, even though it has more RAM channels?

How many programs can run well in parallel, scale well, and get a good speedup?

May I remind you that if they benchmark GIMPS they will conclude it's a factor of 10 SLOWER on a 1024-core chip?

Because they test 1 number on 1 core and it is SLOWER.

This is a MacBook Pro laptop running OS X. It has OS X version 10.4.11, from June 2007. That is quite recent. My Linux machines and Windows machines have older Windows and Linux versions than that.

Yet not a single open source program, last time I checked, can play HD video very well for me, as it only works bug-free on 1 core and crashes instantly on more than 1 core.

QuickTime works, but doesn't play all MP4s; it plays only very few. It is very incompatible nowadays.

Now you'll argue about point C that you don't care. Well, you SHOULD. Because the odds that you will have a much better chip than what can be produced cheaply for billions of people on this planet are very tiny.

Mainstream software is simply not so exciting in parallel. It is totally wrong to see prime crunching as standard software.

Last fiddled with by diep on 2009-12-19 at 12:12

2009-12-19, 16:04   #5
diep

Sep 2006
The Netherlands

1010100010₂ Posts

Quote:
 Originally Posted by lfm Never say never. there is no reason why a vector instruction couldn't be run at full memory speed on any processor. Trial factoring doesn't use floats.
Actually, on the question of systematic sieving: most sieving codes use a handy x86 80-bit floating-point instruction to determine, via an inverse calculation, whether a candidate divides.

It is by far the fastest on AMD, so I use K7s and K8s to run trial factoring / sieving: the K7s don't have SSE2, and on the K8s SSE2 is rather slow compared to today's CPUs.
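
A hedged sketch of the general idea of a divisibility test through a floating-point inverse: multiply by a precomputed 1/d to estimate the quotient, then fix up the remainder with exact integers. Python floats are 64-bit doubles, not the 80-bit x87 format mentioned above, so the input range is restricted accordingly; the function name `divides_via_reciprocal` is made up, and this is not taken from any particular sieving code.

```python
def divides_via_reciprocal(n, d):
    """Return True if d divides n, using a precomputed floating-point inverse.

    Assumes 0 <= n < 2**50 and 1 <= d < 2**50 so the estimated quotient
    is off by at most one.
    """
    inv = 1.0 / d                 # in a sieve this is computed once per divisor
    q = int(n * inv)              # estimated quotient, possibly off by one
    r = n - q * d                 # exact integer remainder for that estimate
    if r < 0:                     # estimate was one too high
        r += d
    elif r >= d:                  # estimate was one too low
        r -= d
    return r == 0


if __name__ == "__main__":
    print(divides_via_reciprocal(2**29 - 1, 233))   # 233 is a factor of 2**29 - 1
    print(divides_via_reciprocal(2**29 - 1, 235))
```

The gain in a real sieve comes from reusing the inverse across many candidates, replacing a slow division with a multiply.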

Vincent

Last fiddled with by diep on 2009-12-19 at 16:09

2009-12-19, 16:11   #6
diep

Sep 2006
The Netherlands

2×337 Posts

Quote:
 Originally Posted by hj47 http://news.cnet.com/8301-30685_3-10...?tag=rtcol;txt This is nuts, I wonder how it would fare crunching a few LL's . But then again, nvidia GT300 is just around the corner...
Small note: there is a huge difference between these 2 CPUs.

The 80-core CPU you can throw away directly, as it is a floating-point chip and doesn't do integers. In short, it is trying to compete head-on with GPUs. The 48-core CPU has integer cores. It's not clear whether it even has any floating point, and it is also unclear whether the cores are 32-bit or 64-bit.

I hope it doesn't have any floating point in fact :)

2009-12-20, 13:59   #7
joblack

Oct 2008
n00bville

5²·29 Posts
I would really like a Nehalem EX or a double AMD Istanbul (2 x 6 cores). That's enough for the first time.
