
mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Software (https://www.mersenneforum.org/forumdisplay.php?f=10)
-   -   128-bit OS'es and GIMPS? (https://www.mersenneforum.org/showthread.php?t=16141)

ixfd64 2011-10-16 02:02

128-bit OS'es and GIMPS?
 
Some sources claim that Windows 8 or Windows 9 will have 128-bit editions. I do know that 64-bit versions of Windows are twice as fast as their 32-bit counterparts when it comes to TF (although not so in other areas). Can the same be said for 128-bit versions when compared to 64-bit versions?

kladner 2011-10-16 02:47

"Can the same be said for 128-bit versions when compared to 64-bit versions? "

I am a dilettante on the level of very simple batch files and shortcut start lines. But I thought that the big advantage of 64 bit over 32 bit is the vastly expanded memory address space. Are there significant commercial software interests (besides Adobe) which would benefit from that next level jump? (Which might drive the OS development?) And would Prime128 be able to add immediate advantage over Prime64? What would the programming/processing advantages be re: 128 v 64?

Christenson 2011-10-16 03:50

The big one would be the wider datapath and the more compact instruction stream...all these programs depend on accessing a lot of memory quickly...

LaurV 2011-10-17 04:03

A 128-bit OS can't do much without a suitable 128-bit processor. You can "emulate" 128-bit operations even on 8-bit microcontrollers like Atmel's (programming such controllers is what I do every day for my job), but that is not a real 128-bit "OS". First we need suitable 128-bit hardware, and I don't see much improvement for stuff like LL tests, where double-precision floats are more than enough (see the discussion in a parallel thread about using specialized hardware to find primes or factors; there is an argument there about single-precision versus double-precision FFTs which is very instructive). Where we would gain some speed is integer arithmetic, for example trial factoring: if the new hardware (OS?) provided 128-bit hardware multipliers, it could speed up trial factoring by a factor of 3 in the range above 64 bits (I assume that is currently done with a Karatsuba-like algorithm, squaring a number held in two 64-bit registers with the result in four 64-bit registers).
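LaurV's point about emulating wide arithmetic on narrower registers can be sketched in C: a 64x64 → 128-bit multiply built from 64-bit operations only, using the same limb-splitting idea he describes (the function name `mul64x64` and this particular sketch are mine, not from the thread):

```c
#include <assert.h>
#include <stdint.h>

/* Emulate a 128-bit product on 64-bit hardware: split each operand into
   32-bit halves, form the four partial products, and propagate carries.
   This is what software does when no wider hardware multiplier exists. */
static void mul64x64(uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo)
{
    uint64_t a_lo = (uint32_t)a, a_hi = a >> 32;
    uint64_t b_lo = (uint32_t)b, b_hi = b >> 32;

    uint64_t p0 = a_lo * b_lo;   /* low  x low  */
    uint64_t p1 = a_lo * b_hi;   /* low  x high */
    uint64_t p2 = a_hi * b_lo;   /* high x low  */
    uint64_t p3 = a_hi * b_hi;   /* high x high */

    /* middle column: upper half of p0 plus the low halves of p1 and p2 */
    uint64_t mid = (p0 >> 32) + (uint32_t)p1 + (uint32_t)p2;

    *lo = (mid << 32) | (uint32_t)p0;
    *hi = p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32);
}
```

A hardware 128-bit multiplier would replace all of this with one instruction, which is where the speedup LaurV estimates would come from.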

axn 2011-10-17 05:10

[QUOTE=ixfd64;274688]Some sources claim that Windows 8 or Windows 9 will have 128-bit editions.[/QUOTE]

The bit-ness of an OS/CPU is based on _memory_ addressability and not on computation word size. So, no, there won't be any 128-bit OSes for the foreseeable future. Not even the largest super computer requires the full capability of 64-bit addressing, let alone 128-bit.

Now, onto the computation word size. Native 128-bit integer and floating point data types can, in theory, speed up TF and LL (respectively). However, AFAIK, neither is planned for the x86 line.
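axn's point about 128-bit integers helping TF can be illustrated with GCC/Clang's `unsigned __int128` (a compiler-assisted type, not the native hardware the thread is asking about). The core TF test is whether a candidate factor q divides 2^p - 1, i.e. whether 2^p ≡ 1 (mod q); the sketch and the name `divides_mersenne` are mine:

```c
#include <assert.h>
#include <stdint.h>

/* Does q divide 2^p - 1?  Equivalent to checking 2^p mod q == 1.
   Assumes q is odd and q > 1. The unsigned __int128 casts give the
   64x64 -> 128 product that a native 128-bit multiplier would produce
   in a single instruction. */
static int divides_mersenne(uint64_t p, uint64_t q)
{
    uint64_t r = 1;
    /* left-to-right binary exponentiation of 2^p mod q */
    for (int i = 63; i >= 0; i--) {
        r = (uint64_t)(((unsigned __int128)r * r) % q);      /* square */
        if ((p >> i) & 1)
            r = (uint64_t)(((unsigned __int128)r << 1) % q); /* times 2 */
    }
    return r == 1;
}
```

For example, 2^11 - 1 = 2047 = 23 x 89, so `divides_mersenne(11, 23)` and `divides_mersenne(11, 89)` are both true.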

Jwb52z 2011-10-17 05:22

If 128-bit processing won't help, which is my meager and possibly wrong reading of this thread, is there any other way to speed up the computation besides "throwing" more cores and RAM at an exponent? I keep wondering whether there is a way to make the calculations themselves run faster on current hardware, but I suspect there isn't one other than what I mentioned.

Dubslow 2011-10-17 06:28

Errr... axn says [quote]Now, onto the computation word size. Native 128-bit integer and floating point data types can, in theory, speed up TF and LL (resp).[/quote] I believe that's what you mean by "128 bit processing". So 128-bit processing would help, but axn also says it isn't being implemented.

This is different from what the OP was asking about, which is 128-bit memory addressing. That is not going to happen for a long time (though, knowing computers, that might be only ~50 years). 64-bit addressing at its maximum (which we're nowhere near) allows 16 EB = 16,000,000 TB of memory to be addressed; my computer has more-than-average memory at 12 GB = 0.012 TB. Therefore, 128-bit OSes and processor architectures will not be around any time in the next half century (at least).
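The arithmetic behind those figures is easy to check: 2^64 bytes expressed in binary terabytes (TiB, 2^40 bytes) is 2^(64-40) = 2^24 ≈ 16.8 million TB. A trivial helper (my own, assuming at least 40 address bits) makes the point:

```c
#include <assert.h>
#include <stdint.h>

/* How many binary terabytes (2^40 bytes each) can an n-bit address
   space cover?  Assumes address_bits >= 40. */
static uint64_t addressable_tib(unsigned address_bits)
{
    return UINT64_C(1) << (address_bits - 40);
}
```

So full 64-bit addressing covers 16,777,216 TiB, over a billion times the 12 GB machine mentioned in the post.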

LaurV 2011-10-17 06:57

I don't believe the OP was asking about memory width. I took it to mean CPU register size, hence my previous post. It is not uncommon to have an X-bit external bus and n*X bits of internal processing power; the best example is the Intel 8088. When Intel made the 8086 (a full 16-bit part), it found that the new CPU was not compatible with all the existing buses, and people wanted compatibility first of all: I would buy a better CPU if I could swap out my old one without needing to throw away half of the things in the case. So they downgraded the 8086 to the 8088, which had an 8-bit external bus.

Maybe we will never go beyond a 64-bit address space; as said before, that is more than enough for how much memory one can have in one's box. But "[B]probably[/B]" we will go to higher bit counts to increase speed (see RAID arrays, and video cards with 256-bit memory buses already in production), and "[B]for sure[/B]" we will see 128-bit (and wider!) internal registers in the future. 128-bit integer ALUs are already implemented in some industrial systems (see the Vipa and Siemens processors; they use Virtex, and we had the opportunity to manufacture some cards for them).

Again, it is not uncommon to have a narrower external bus and a wider internal ALU. The 8088 is the best example.

The question is whether this would speed up the search for huge primes. My answer stands: with the current algorithms, most probably not, or not by much. A 3x or 4x speedup for TF means little; you could do one or two more bits. The real gain would come from discovering new algorithms, as somebody said in a [URL="http://www.mersenneforum.org/showpost.php?p=274344&postcount=25"]parallel thread[/URL].

Dubslow 2011-10-17 07:17

Well, even from a memory standpoint, I wouldn't be surprised if it became necessary sometime way off in the future. I'm sure someone in the late 40's/early 50's said "we'll never ever get anywhere close to 4 GB of memory, 32 bits is more than enough" and look at where we are now.

axn 2011-10-17 07:35

[QUOTE=LaurV;274828] The real gain would come from discovering new algorithms, as somebody said on a [URL="http://www.mersenneforum.org/showpost.php?p=274344&postcount=25"]parallel thread[/URL].[/QUOTE]
That one was for factoring. Primality proving is a different kettle of fish. There are good reasons to suspect that LL test will be _the_ fastest test possible for a Mersenne prime.

[QUOTE=LaurV;274828]The question is if this would speed up searching for huge primes. My answer stands: with the current algorithms, most probably not, or not so much. Having a 3x or 4x speed for TF means nothing, you could do one or two bits more.[/quote]
128-bit floating point implemented in hardware could speed up LL testing by 2x. Even implemented in microcode, it could potentially speed it up by 25%.

davieddy 2011-10-17 09:13

PC History
 
[QUOTE=Dubslow;274833]Well, even from a memory standpoint, I wouldn't be surprised if it became necessary sometime way off in the future. I'm sure someone in the late 40's/early 50's said "we'll never ever get anywhere close to 4 GB of memory, 32 bits is more than enough" and look at where we are now.[/QUOTE]

In 1980, 4 MHz and 64 KB were typical. Do you know about "segments"
on Intel machines, used to address 1MB by augmenting 16 bits to 20?
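The segment trick davieddy mentions is simple arithmetic: the 8086 computes a physical address as segment * 16 + offset, stretching two 16-bit registers into a 20-bit (1 MB) address space. A sketch (the helper name is mine):

```c
#include <assert.h>
#include <stdint.h>

/* 8086 real-mode addressing: two 16-bit registers combine into a
   20-bit physical address via  physical = segment * 16 + offset. */
static uint32_t real_mode_addr(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}
```

One side effect is aliasing: many segment:offset pairs name the same physical byte, e.g. F000:FFF0 and FFFF:0000 both resolve to 0xFFFF0.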

As for 1950 and 32 bits, Turing missed finding M521 because he only
had 1024 bits of memory!


David

LaurV 2011-10-17 09:22

[QUOTE=axn;274837]
128-bit floating point implemented in hardware could speed up LL testing by 2x. Even implemented in microcode, it could potentially speed it up by 25%.[/QUOTE]

You may be right, but I have not seen any 128-bit FP ALU in real life yet. Maybe some exist, but I don't know of any.

R.D. Silverman 2011-10-17 13:53

[QUOTE=LaurV;274828]I don't believe OP was asking about memory width. I took it like CPU register size, and since my previous post.

<snip>

Maybe we will never go to more then 64 bit addressing space, as said before, this is more then enough for how much memory one can have in its box,

<snip>

The question is if this would speed up searching for huge primes. My answer stands: with the current algorithms, most probably not, or not so much. Having a 3x or 4x speed for TF means nothing, you could do one or two bits more. The real gain would come from discovering new algorithms, as somebody said on a [URL="http://www.mersenneforum.org/showpost.php?p=274344&postcount=25"]parallel thread[/URL].[/QUOTE]

It is so refreshing to read common sense!!!! But I [b]strongly[/b] doubt
whether a faster algorithm than LL exists.

128-bit registers would help somewhat, but would only be a constant factor
speedup. And that can easily be obtained now by additional cores --> no
architectural changes required.

There is no [i]market[/i] today (well, almost no market) for a machine with
128-bit registers. Think about the economics of developing such a
processor.

OTOH, there is a small market for perhaps a 128-bit arithmetic coprocessor
implemented as (say) a PCI card. But if I were to develop such a device,
I certainly would not limit it to 128-bits. I'd build a board level coprocessor
with (say) 1024-bit registers. In fact, when I was at MITRE we designed
and built a VME-based board (done in prototype wire-wrap using TI DSP
chips) that implemented 1024-bit arithmetic as a coprocessor to a SPARC.
[It was just a research project].

I strongly doubt whether there is sufficient demand to make development
of a commercial MP processor economically viable.

GIMPS has no real practical importance. It will find the next prime when it
finds it. Patience!

chris2be8 2011-10-17 16:25

[QUOTE=Dubslow;274833]Well, even from a memory standpoint, I wouldn't be surprised if it became necessary sometime way off in the future. I'm sure someone in the late 40's/early 50's said "we'll never ever get anywhere close to 4 GB of memory, 32 bits is more than enough" and look at where we are now.[/QUOTE]

When IBM designed the System/360 in the 1960s they said 24 bits would be enough. As an ex-systems programmer on the platform's successors I can testify that was a *big* mistake. Addressing was subsequently extended, first to 31 bits (don't ask why not 32), then to 64 bits. But with lots of 24-bit-only code still around, the complications it caused were a big pain.

Chris K

chris2be8 2011-10-17 16:29

[QUOTE=R.D. Silverman;274870] There is a small market for perhaps a 128-bit arithmetic coprocessor
implemented as (say) a PCI card. But if I were to develop such a device,
I certainly would not limit it to 128-bits. I'd build a board level coprocessor
with (say) 1024-bit registers.[/QUOTE]

The most likely market would be for a public key accelerator doing 1024 or 2048 bit arithmetic. As long as it's not crippled to limit the max data length it would be helpful.

Chris K

R.D. Silverman 2011-10-17 16:41

[QUOTE=chris2be8;274890]The most likely market would be for a public key accelerator doing 1024 or 2048 bit arithmetic. As long as it's not crippled to limit the max data length it would be helpful.

Chris K[/QUOTE]

We already have such hardware. Banks use it. Current performance is more than adequate.

fivemack 2011-10-17 17:14

Moreover, the highest-end hardware security modules used by banks have about one-third the performance of an i7/2600 CPU when doing 1024-bit or 2048-bit RSA (I have a friend who works for a company that makes them).

R.D. Silverman 2011-10-17 17:39

[QUOTE=fivemack;274894]Moreover, the highest-end hardware security modules used by banks have about one-third the performance of an i7/2600 CPU when doing 1024-bit or 2048-bit RSA (I have a friend who works for a company that makes them).[/QUOTE]

"A company"? I presume that you mean IBM.

ixfd64 2011-10-17 17:42

This reminds me, AVX is supposed to support 512- and 1024-bit registers in the future. But whether we will actually have them remains to be seen.

E_tron 2011-10-28 13:24

A modern FPGA might be fast enough to implement the 1024-bit arithmetic coprocessor you're thinking about, R.D. Silverman. But perhaps we should stick with 64-bit commodity hardware instead :smile: . Commodity hardware is highly optimized and will probably outperform a custom FPGA design anyway.

fivemack 2011-10-28 14:56

[QUOTE=R.D. Silverman;274898]"A company"? I presume that you mean IBM.[/QUOTE]

nCipher (now Thales) of Cambridge; they make 'hardware security modules', maybe more for internet businesses than banking, but they also sell to (for example) the UK Passport Agency.

fivemack 2011-10-28 15:03

[QUOTE=E_tron;276108]A modern FPGA might be fast enough to implement the 1024-bit arithmetic coprocessor your thinking about R.D. Silverman. Perhaps we should stick with 64-bit commodity hardware instead :smile: . Commodity hardware is highly optimized and might/probably will outperform a custom design on an FPGA anyway.[/QUOTE]

I spent quite a while both in amateur and professional contexts contemplating large multipliers on FPGA, and could never get them to be remotely performance-per-pound competitive with Opterons. The individual multipliers are small and slow (25-bit x 18-bit, both signed, feeding a 48-bit accumulator, running at 638MHz), so you need about sixty of them to compete with one Opteron core, and the price of a single FPGA dev board gets you a complete off-the-shelf box with 48 Opteron cores.

Christenson 2011-10-31 22:23

These days, GPUs beat the tar out of FPGAs....
So I'd expect those to have the greatest performance per X....


All times are UTC.
