[QUOTE=vsuite;240641]When I tested with GPU Caps Viewer, CUDA was said to be supported and some OpenCL etc. graphics demos ran, so I thought everything was ready for CUDA. It turns out I needed to install the CUDA driver (the normal graphics driver - my original drivers were graphics-only, with no compute or PhysX). Then, voila, it worked.
[/QUOTE] Thanks, it works well now with the GeForce/Ion Driver Release 260 (112 MB) and the Nvidia System Tools (89 MB).
The balance between the amount of work done on the CPU and the amount done on the GPU is controlled via SievePrimes in mfaktc.ini. Its value can vary between 5000 and 100000, with larger values doing more sieving on the CPU. However, the default settings in mfaktc.ini automatically vary SievePrimes until a good balance is achieved. Once a good value has been found, you can set SievePrimes to it directly. SievePrimesAdjust in mfaktc.ini controls whether this automatic adjustment is enabled.
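For example, to pin the sieve depth manually once the auto-adjustment has settled on a good balance, the relevant mfaktc.ini lines would look something like this (the value shown is illustrative, not a recommendation; check the comments in your own mfaktc.ini for the exact ranges):

[CODE]
SievePrimesAdjust=0
SievePrimes=25000
[/CODE]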
Hi!
[QUOTE=vsuite;240650]Is mfaktc faster on a 64-bit OS than a 32-bit OS and if so, why? Does CUDA operate faster?[/QUOTE]
The CPU part (preselection of factor candidates, aka sieving) runs faster in 64-bit mode, perhaps due to the larger number of registers. So for "slow" GPUs there should be no difference.

[QUOTE=ATH;240591]I couldn't compile it with Mingw64, but I succeeded with Visual Studio, and it passes all selftests. I tried to remove the line: [CODE]else if(!isprime(exp)) printf("WARNING: exponent is not prime! Ignoring this assignment!\n");[/CODE] but it crashes when I try to factor a composite exponent, specifically M(p[sup]2[/sup]).[/QUOTE]
ATH: I'll send you a PM soon.

[QUOTE=vsuite;240641]I bumped the Core and Shader up to 690/1380 (max, range 230-690/460-1380), and it ran mfaktc faster but then rebooted. When I decoupled the Core and Shader clocks, pushing the Core only to 480, and pushing the Shader to 1380, mfaktc ran seemingly proportionally faster. Memory and Core speed did not seem to affect mfaktc speed. Great work. Since mfaktc is GPU, not CPU bound, does the CPU spend a lot of time actively waiting on the slower GPUs? Can mfaktc perform some of the TF on the CPU, if a slower GPU is being used, to use the wait cycles more efficiently, or can the wait be made passive instead of active, so as to reduce the CPU load from the spin wait?[/QUOTE]
Be careful with overclocking. As mentioned already, the built-in selftest is designed as a software test, not as a hardware test...

I don't think I'll write code that does the TF computation on the CPU, too. Reason: for faster GPUs the CPU has no time to do anything other than preselection of candidates. This is only a "problem" for some "old" GPUs; a current entry-level GeForce GT 430 can do something like ~40M/sec (20 times faster than an 8400 GS), which easily keeps a single CPU core busy at higher SievePrimes values.

Oliver
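The "preselection of factor candidates aka sieving" Oliver describes can be sketched in a few lines. This is only a toy illustration of the idea, not mfaktc's actual sieve (which is heavily optimized and also restricts candidates to particular residue classes): any candidate factor q = 2kp+1 of M(p) that is divisible by a small prime can be thrown away on the CPU before the GPU ever has to test it.

```python
def small_primes(limit):
    """Primes below `limit`, by a plain sieve of Eratosthenes."""
    flags = bytearray([1]) * limit
    flags[0:2] = b"\x00\x00"
    for n in range(2, int(limit ** 0.5) + 1):
        if flags[n]:
            flags[n * n::n] = bytearray(len(flags[n * n::n]))
    return [n for n in range(limit) if flags[n]]

def presieve(p, k_lo, k_hi, prime_limit=100):
    """Toy CPU-side sieve: keep only k in [k_lo, k_hi) for which
    q = 2*k*p + 1 has no prime factor below prime_limit.
    (Assumes q itself is much larger than prime_limit.)"""
    primes = small_primes(prime_limit)
    survivors = []
    for k in range(k_lo, k_hi):
        q = 2 * k * p + 1
        if all(q % s for s in primes):
            survivors.append(k)
    return survivors
```

The survivors list is the work the GPU actually sees; everything else never leaves the CPU. Raising prime_limit (the analogue of SievePrimes) removes more candidates at the cost of more CPU time.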
[QUOTE=TheJudger;240934]Be careful with overclocking. As mentioned already, the built-in selftest is designed as a software test, not as a hardware test...[/QUOTE]
I understand. I was gratified that it reported passing all the tests at the max shader rate.

[QUOTE=TheJudger;240934]This is only a "problem" for some "old" GPUs; a current entry-level GeForce GT 430 can do something like ~40M/sec (20 times faster than an 8400 GS), which easily keeps a single CPU core busy at higher SievePrimes values.

Oliver[/QUOTE]
I will look for a couple of fast GPUs. >100M/s looks nice.

Meanwhile, on my 8400 GS (desktop) with a Q6600 (Core 2 quad-core) at stock 2.4GHz, mfaktc took M66362159 from 1 to 64 bits in 36m 7.160s at the stock shader rate (918MHz); the rate fluctuated around 3.05M/s. When running at a shader rate of 1380MHz, the rate fluctuated around 4.6M/s. mfaktc took M66362159 from 64 to 65 bits in 22m 1.464s at the max shader rate of 1380MHz, with a rate generally around 5.10M/s.
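The bit-level timings in this thread can be sanity-checked with a little arithmetic. Candidate factors of M(p) have the form q = 2kp + 1, so the number of k values in a bit level [2^b, 2^(b+1)) is about 2^b / (2p), doubling with each bit level - which is why each level takes roughly twice as long as the one below it. A quick sketch (the helper name is mine, not mfaktc's; the reported M/s figures are not directly this raw count, since sieving removes most candidates before the GPU tests them):

```python
def k_count(p, bits):
    """Number of k with 2^bits <= 2*k*p + 1 < 2^(bits+1)."""
    k_lo = ((1 << bits) - 1) // (2 * p) + 1    # first k with q >= 2^bits
    k_hi = ((1 << (bits + 1)) - 2) // (2 * p)  # last k with q < 2^(bits+1)
    return k_hi - k_lo + 1

p = 66362159
# Each bit level holds about twice as many candidate k values as the
# previous one: k_count(p, 65) is roughly 2 * k_count(p, 64), and so on.
```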
Have the GTX 460, 465, 470 and 480 been tested on the same version and similar CPUs?
Can users of these GPUs submit speed results with v0.13 (single-threaded mfaktc, and no Prime95) for the M66362159 and 100M-digit benchmarks, along with CPU and GPU speeds? Thanks.
Hi vsuite,
want some benchmarks? Here we go!

- stock GTX 470
- 3.5GHz Core i7
- Linux x86_64, CUDA 3.2

M66362159, single instance of mfaktc 0.13, defaults except SievePrimes=5000 and SievePrimesAdjust=0
[CODE]
2^1  to 2^64: 1m  7.489s (GPU load ~60%)
2^64 to 2^65: 1m  4.429s (GPU load ~60%)
2^65 to 2^66: 1m 44.854s (GPU load 60-70%)
2^66 to 2^67: 3m  6.242s (GPU load 70-75%)
[/CODE]
M66362159, [B]two[/B] instances of mfaktc 0.13, defaults except SievePrimes=13000 and SievePrimesAdjust=0
[CODE]
2^1  to 2^64: 1m 38.549s
2^64 to 2^65: 1m 25.648s
2^65 to 2^66: 2m 16.876s
2^66 to 2^67: 4m  8.370s
[/CODE]
M332192863, [B]two[/B] instances of mfaktc 0.13, defaults except SievePrimes=25000 and SievePrimesAdjust=0
[CODE]
2^65 to 2^66:     57.087s
2^66 to 2^67: 1m 23.543s
2^67 to 2^68: 2m  9.409s
...
2^70 to 2^71: 13m 28.317s
[/CODE]
When running two instances, each one does the full job!

Oliver
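Since each of the two instances does the full job, the fair comparison is time per completed assignment, not wall-clock time per instance. A back-of-the-envelope check (not anything mfaktc reports itself) using the 2^66 to 2^67 numbers for M66362159 above:

```python
single = 3 * 60 + 6.242    # one instance, one assignment: 186.242 s
dual = 4 * 60 + 8.370      # two instances running together: 248.370 s each
per_assignment = dual / 2  # ...but two assignments finish in that time
speedup = single / per_assignment
# speedup comes out around 1.5x: running two instances raises GPU
# utilization enough that total throughput goes up, even though each
# individual instance runs slower.
```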
Speaking of benchmarks...
Is there any difference with, say, a GTX 460 when it's connected at x4 or x8 PCIe vs. x16?

-- Craig
Thanks Oliver
I'm considering a GTX 460, but I don't know how much work it can do. How fast should I expect it to be, please?
Just to clarify any potential future misunderstanding, there are three versions of PCIe. I don't know of any motherboards with PCIe 3.0 on yet, but there could well be some.
A x16 slot for PCIe v1.x has the same bandwidth as an x8 slot for PCIe v2.x, which in turn has the same bandwidth as a x4 slot for PCIe v3.0. Basically: v3.0 = 2*v2.x = 4*v1.x

So it is not simply a question of knowing how many lanes a slot has (which can be complicated by the fact that some x16 physical slots are only x8 electrical), but also the PCIe version of both the motherboard slot AND the graphics card, and then taking the lowest value.

For reference:
x16 PCIe 1.x slot: total bandwidth of 4 GB/s in each direction (250 MB/s per lane per direction).
x16 PCIe 2.x slot: total bandwidth of 8 GB/s in each direction (500 MB/s per lane per direction).
x16 PCIe 3.0 slot: total bandwidth of 16 GB/s in each direction (1 GB/s per lane per direction).

Note that's giga[B]bytes[/B] per second, and the units use standard SI prefixes (10^6 and 10^9), for the slight amount it matters.
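The relationships above boil down to per-lane rate times lane count. A small sketch (the function name is mine) using the per-lane, per-direction figures quoted - 250 MB/s for v1.x, 500 MB/s for v2.x, 1 GB/s for v3.0:

```python
def pcie_bandwidth_gb_s(version, lanes):
    """Bandwidth in GB/s per direction for a PCIe link:
    per-lane rate (v1.x: 0.25, v2.x: 0.5, v3.0: 1.0) times lane count."""
    per_lane = {1: 0.25, 2: 0.5, 3: 1.0}[version]
    return per_lane * lanes

# x16 v1.x, x8 v2.x and x4 v3.0 all come out to the same 4 GB/s:
# pcie_bandwidth_gb_s(1, 16) == pcie_bandwidth_gb_s(2, 8) == pcie_bandwidth_gb_s(3, 4)
```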
64 bit system must use 32 bit dll?
Have a good laugh at this one:
[code]
Microsoft Windows [Version 6.1.7600]
Copyright (c) 2009 Microsoft Corporation. All rights reserved.

C:\Windows\system32>cd \cuda\mfaktc\0.13

C:\CUDA\mfaktc\0.13>mfaktc-win-64.exe
mfaktc v0.13-Win

Compiletime options
  THREADS_PER_GRID_MAX      1048576
  THREADS_PER_BLOCK         256
  SIEVE_SIZE_LIMIT          32kiB
  SIEVE_SIZE                193154bits
  SIEVE_SPLIT               250
  VERBOSE_TIMING            disabled
  MORE_CLASSES              enabled

Runtime options
  SievePrimes               100000
  SievePrimesAdjust         1
  NumStreams                3
  CPUStreams                3
  WorkFile                  worktodo.txt
  Checkpoints               enabled
  Stages                    enabled
  StopAfterFactor           bitlevel

CUDA device info
  name                      GeForce GT 220
  compute capability        1.2
  maximum threads per block 512
  number of multiprocessors 6 (48 shader cores)
  clock rate                1200MHz

CUDA version info
  binary compiled for CUDA  3.10
  CUDA driver version       3.10
  CUDA runtime version      3.10

Automatic parameters
  threads per grid          786432

[COLOR=Red]cudaStreamCreate() failed for stream 0[/COLOR]

C:\CUDA\mfaktc\0.13>mfaktc-win-32.exe
mfaktc v0.13p1-Win
<snip!>
running a simple selftest...
Selftest statistics
  number of tests           31
  successfull tests         31

selftest PASSED!
<snip!>

C:\CUDA\mfaktc\0.13>[COLOR=Red]ren cudart64_31_9.dll *.___[/COLOR]

C:\CUDA\mfaktc\0.13>mfaktc-win-64.exe
mfaktc v0.13-Win
<snip!>
running a simple selftest...
Selftest statistics
  number of tests           31
  successfull tests         31

[COLOR=Red]selftest PASSED![/COLOR]
<snip!>
[/code]Ideas, anyone? :confused:
[QUOTE=lavalamp;241873]Just to clarify any potential future misunderstanding, there are three versions of PCIe. I don't know of any motherboards with PCIe 3.0 on yet, but there could well be some.

A x16 slot for PCIe v1.x has the same bandwidth as an x8 slot for PCIe v2.x, which in turn has the same bandwidth as a x4 slot for PCIe v3.0. Basically: v3.0 = 2*v2.x = 4*v1.x

So it is not simply a question of knowing how many lanes a slot has (which can be complicated by the fact that some x16 physical slots are only x8 electrical), but also the PCIe version of both the motherboard slot AND the graphics card, and then taking the lowest value.

For reference:
x16 PCIe 1.x slot: total bandwidth of 4 GB/s in each direction (250 MB/s per lane per direction).
x16 PCIe 2.x slot: total bandwidth of 8 GB/s in each direction (500 MB/s per lane per direction).
x16 PCIe 3.0 slot: total bandwidth of 16 GB/s in each direction (1 GB/s per lane per direction).

Note that's giga[B]bytes[/B] per second, and the units use standard SI prefixes (10^6 and 10^9), for the slight amount it matters.[/QUOTE]
Yep. Understood. I was assuming current tech in my question, i.e. PCIe 2.x.

The original question still stands - is there any difference in speed for mfaktc between having the video card connected at x4, x8 or x16? I don't know enough about the inner workings to make a call.

-- Craig
Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.