Hi Craig,
[QUOTE=nucleon;241859]Speaking of benchmarks... Is there any difference with say a GTX460 when it's connected at x4 or x8 PCIe vs x16? -- Craig[/QUOTE]

Each factor candidate (FC) needs 4 bytes transferred to the GPU; all other transfers are so small that we can safely ignore them (e.g. 128 bytes downloaded from the GPU [B]once per class[/B]). :smile: Some additional data is transferred for launching and querying the GPU kernels, but I assume that isn't much.

So a GPU capable of doing 250M FCs per second needs 1 GB/s transferred to the GPU. 250M/s is a typical value for a GTX 470. I haven't benched a GTX 460 for a while now; I would expect something like 150-180M/s for a [B]stock[/B]-clocked GTX 460, so 600-720 MB/s are needed. Even a PCIe x4 1.x slot [B]should[/B] be able to handle this, but there won't be much headroom.

[QUOTE=vsuite;241864]Thanks Oliver I'm considering a 460GTX but I don't know how much work it can do. How fast should I expect please?[/QUOTE]

As mentioned above, the raw GPU speed is expected to be 150-180M/s. The overall performance depends on the CPU, too. Assume e.g. 2/3 the performance of the benchmark I posted a few posts back for a 2.5+GHz Core 2 or Core iX plus a single GTX 460.

Oliver
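The back-of-the-envelope bandwidth estimate above can be written out as a tiny sketch. The 4-bytes-per-FC figure is from the post; the helper name and the "other transfers are negligible" simplification are assumptions for illustration only:

```python
# Assumption (from the post above): each factor candidate costs 4 bytes of
# upload to the GPU, and all other transfers are negligible.
BYTES_PER_FC = 4

def required_bandwidth_mb_per_s(fc_rate_millions):
    """PCIe upload bandwidth (MB/s) needed to sustain a given FC rate (in M FCs/s)."""
    return fc_rate_millions * BYTES_PER_FC

print(required_bandwidth_mb_per_s(250))  # GTX 470 class: 1000 MB/s, i.e. ~1 GB/s
print(required_bandwidth_mb_per_s(150))  # stock GTX 460, low estimate: 600 MB/s
print(required_bandwidth_mb_per_s(180))  # stock GTX 460, high estimate: 720 MB/s
```

Comparing 600-720 MB/s against the roughly 1 GB/s a PCIe 1.x x4 link provides shows why such a slot should cope, but without much headroom.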
Hi ckdo,
[QUOTE=ckdo;241950]Have a good laugh at this one: Ideas, anyone? :confused:[/QUOTE]

Are you sure that you're actually using the 32-bit CUDA runtime for the 64-bit binary? Perhaps a 64-bit CUDA runtime is already installed on your system. Rename the 32-bit runtime DLL and see if it makes any difference for the 64-bit binary.

Oliver
[QUOTE=TheJudger;241997]Even a PCIe x4 1.x [B]should[/B] be able to handle this but there won't be much headroom.[/QUOTE]Also, an AGP 4x slot should be able to handle this (1066 MB/s), and of course an AGP 8x slot (2133 MB/s), for anyone still running AGP cards.
Out of plain out-and-out curiosity, has anyone thought about taking a GPU and trying to push the last few exponents way down at the beginning up from 58 to 60?
Last I remember hearing, mfaktc could not go that low. Is that still the case? Oh, and I know that with the P-1 and ECM that have been done, this would likely be a waste.
Hi Uncwilly,
[QUOTE=Uncwilly;243011]Out of plain out and out curiosity, has anyone thought about taking a GPU and trying to get the last few exponents way down at the beginning up from 58 to 60?[/QUOTE]

How can I request those (on primenet)? Feel free to send me a PM with those exponents and the current TF progress! :smile:

[QUOTE=Uncwilly;243011]Last I remember hearing mfaktc could not go that low. Is it still the case?[/QUOTE]

*hmmm* was it ever the case? :grin: Of course it is not optimized for such low factor candidates, but they are possible. For those bitlevels one could write a kernel using 64/128-bit math (2x 32 bit / 4x 32 bit) instead of the current 96/192-bit math, but that is just a possible performance optimization. The selftest contains known factors as low as 30 bits for some 100M-digit numbers. :smile: There are some kernels which have an additional lower limit on the size of the factor candidates (while all kernels have an upper limit, of course).

The exponents are still limited to 1.000.000 < exp < 2^32, and I have no plans to change this. :yucky:

Oliver
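For readers unfamiliar with what the kernels above actually test: trial factoring a Mersenne number M(p) = 2^p - 1 checks candidates of the form q = 2kp + 1, and q divides M(p) exactly when 2^p ≡ 1 (mod q). This is a hedged pure-Python sketch of that check, not mfaktc's actual 96/192-bit CUDA code; the function name is made up for illustration:

```python
# Sketch only: mfaktc does this with multi-word 96/192-bit integer math on
# the GPU. Python's built-in modular exponentiation stands in for that here.
def is_factor(p, k):
    """Return True if q = 2*k*p + 1 divides the Mersenne number 2^p - 1."""
    q = 2 * k * p + 1
    return pow(2, p, q) == 1

# Classic small example: 2^11 - 1 = 2047 = 23 * 89, and 23 = 2*1*11 + 1.
print(is_factor(11, 1))  # True
print(is_factor(11, 2))  # False (q = 45 is not a factor)
```

Sieving on the CPU, as discussed elsewhere in the thread, simply removes k values whose q is divisible by a small prime before they ever reach this test.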
[QUOTE=TheJudger;243014]The exponents are still limited to 1.000.000 < exp < 2^32. And I have no plans to change this. :yucky:[/QUOTE]
They are [URL="http://v5www.mersenne.org/report_factoring_effort/?exp_lo=1&exp_hi=1000000&bits_lo=0&bits_hi=59&txt=1&exassigned=1&B1=Get+Data"]1700-6700[/URL]. That is way below 1*10[SUP]6[/SUP]. That is what I meant originally by "that low".
D'oh, my fault... whenever I read Uncwilly I think about 100M digits :blush:
Oliver
[QUOTE=TheJudger;241282]Hi vsuite,
want some benchmarks? Here we go! - stock GTX 470 - 3.5GHz Core i7 - Linux x86_64, CUDA 3.2[/QUOTE]

The GPU speed for my 8600 GT is ~12 M/sec. How fast would a GTX 570 be, approximately?
[QUOTE=moebius;243042]The GPU speed for my 8600 GT is ~12 M/sec.[/QUOTE]
Depending on the exponent and GPU kernel, of course.

[QUOTE=moebius;243042]How fast would be a GTX570 approximately?[/QUOTE]

The GTX 570 has the same architecture as the GTX 470, just more shaders and a higher clock rate. Raw GPU power scales very well with mfaktc! :smile:

GTX 470: 448 cores @ 1215 MHz
GTX 570: 480 cores @ 1464 MHz

(480 * 1464) / (448 * 1215) = ~1.29, so a GTX 570 should be ~29% faster than a GTX 470 in mfaktc (raw GPU power). Of course the CPU cores then need to feed 29% more candidates to the GPU in the same time:
- take a CPU which is 29% faster, or
- reduce SievePrimes (which means fewer candidates are eliminated on the CPU by sieving)

For a single instance of mfaktc on a CPU with the same speed as mine there should be no difference between a GTX 470 and a GTX 570, because it is totally CPU-limited. With two instances (and the same CPU speed) you might see something like 15-20% more throughput on a GTX 570 compared to my system.

Oliver
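The cores-times-clock scaling estimate above generalizes to any two cards of the same architecture. A minimal sketch (the function name is an assumption; the core counts and clocks are the figures quoted in the post):

```python
def relative_speed(cores_a, clock_mhz_a, cores_b, clock_mhz_b):
    """Estimated raw-throughput ratio of card B over card A, assuming the
    same GPU architecture, so that speed scales with cores * clock."""
    return (cores_b * clock_mhz_b) / (cores_a * clock_mhz_a)

# GTX 470: 448 cores @ 1215 MHz; GTX 570: 480 cores @ 1464 MHz
ratio = relative_speed(448, 1215, 480, 1464)
print(round(ratio, 2))  # ~1.29, i.e. ~29% faster
```

Note this is an upper bound on real-world gain: as the post explains, a single CPU-limited mfaktc instance sees none of it unless the CPU can also supply candidates 29% faster.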
[QUOTE=TheJudger;243087]
For a single instance of mfaktc on a CPU with the same speed as my CPU there should be no difference between a GTX 470 and a GTX 570 because it is limited by CPU totally. With two instances (and same CPU speed) you might see something like 15-20% more throughput on a GTX 570 compared to my system.[/QUOTE]

I meant, of course, 12 M/sec for 77M exponents from 2^65 to 2^66, with an Athlon64 @ 2.5 GHz as the CPU. And what kind of super-processor would a GTX 570 need, anyway?
[QUOTE=moebius;243089]I meant of course 12 M/sec for 77M exponents 2^65 to 2^66 with CPU Athlon64@2.5 GHz.
And what for a Super-processor is at all worthy of a GTX 570.[/QUOTE]

I'd go with an AMD X6 and an nvidia-chipset motherboard, and just throw more cores at the problem. -- Craig