#3202
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
152C₁₆ Posts
Quote:
Quote:
The ini file says GPUSieveSize is in units of megaBITS there. If the relevant parameter is maximum pitch, that is ~2047 MBytes. There's a daunting variety of CUDA parameters. Following is GPU-Z 2.25.0's Advanced > CUDA output for an older GPU: Code:
General
    CUDA Device Name                  Tesla C2075
    Compute Capability                2.0
    Processor Count                   14
    Cores per Processor               32
    GPU Clock Rate                    1147.0 MHz
    Memory Clock Rate                 1566.0 MHz
    Memory Bus Width                  384
    L2 Cache Size                     768 KB
    Global Memory Size                5316 MB
    Async Engines                     2
    SP to DP Ratio                    1:2
    ECC Supported                     Yes
    Using TCC Driver                  Yes
    Compute Mode                      Default
    Multi-GPU Board                   No (0)
    PCI ID                            Bus 6, Dev 0, Domain 0
    Threads per Multiprocessor        1536
    Max Shmem per Multiprocessor      48 KB
    Execute Multiple Kernels          Yes
    Preemption Supported              No
Memory
    Native Atomic Supported           No
    Unified Address Space             Yes
    Integrated w/ Host Memory         No
    Can map Host Memory               Yes
    Can allocate Managed Memory       No
    Pageable Memory Access            No
    Concurrent Managed Memory Access  No
    Can use Host Memory Pointers      No
    Supports Stream Priorities        No
    Can Cache Globals in L1           Yes
    Can Cache Locals in L1            Yes
    Max Block Size                    1024 x 1024 x 64
    Max # of Threads per Block        1024
    Max Shmem per Block               48 KB
    Max Grid Size                     65535 x 65535 x 65535
    Max Registers per Block           32768
    Total Constant Memory             64 KB
    Warp Size                         32 Threads
    Maximum Pitch                     2097151 KB
    Texture Alignment                 0 KB
    Surface Alignment                 512
    Texture Pitch Alignment           32
    GPU Overlap                       Yes
    Kernel Runtime Limit              No
Size Constraints
    1D Texture Size                   65536
    1D Layered Texture Size           16384 x 2048
    2D Texture Size                   65536 x 65535
    2D Layered Texture Size           16384 x 16384 x 2048
    2D Texture Size Gather            16384 x 16384
    3D Texture Size                   2048 x 2048 x 2048
    3D Texture Size Alt               0 x 0 x 0
    Cubemap Texture Size              16384 x 16384
    Layered Cubemap Texture Size      32768 x 32768 x 2046
    1D Surface Size                   65536
    1D Layered Surface Size           65536 x 2048
    2D Surface Size                   65536 x 32768
    2D Layered Surface Size           65536 x 65536 x 2048
    3D Surface Size                   65536 x 32768 x 2048
    Cubemap Surface Size              32768 x 32768
    Cubemap Layered Surface Size      32768 x 32768 x 2046
    1D Linear Texture Size            134217728
    2D Linear Texture Size            65000 x 65000
    2D Linear Texture Pitch           1048544
    1D Mipmapped Texture Size         16384
    2D Mipmapped Texture Size         16384 x 16384
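As a quick sanity check on the ~2047 MByte figure: GPU-Z reports Maximum Pitch in KB, so the conversion is plain division (this one-liner is just the arithmetic, nothing mfaktc-specific):

```shell
# GPU-Z's Maximum Pitch is 2097151 KB; divide by 1024 to get MB.
pitch_kb=2097151
echo "$((pitch_kb / 1024)) MB"   # prints "2047 MB"
```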
#3203
Bemusing Prompter
"Danny"
Dec 2002
California
5·479 Posts
I thought I'd share some scripts that users might find useful:

1. For those with multi-GPU systems, start-mfaktc.sh allows multiple mfaktc instances to be launched from a single copy of mfaktc. You'll first need to create a folder called device-<device number> for each device. For example, on a system with four GPUs you'll need to create the sub-folders device-0 through device-3 in your mfaktc folder. You can then run ./start-mfaktc.sh <device number> to launch an instance. If no device number is specified, the script simply calls mfaktc from its root folder.

This script works by creating symbolic links to your mfaktc executable and configuration file. It also uses a lock file to prevent the user from accidentally running more than one instance on the same device. Once the mfaktc instance terminates, the symbolic links and lock file are deleted. You can set the SHARED_SETTINGS variable to false if you want a separate mfaktc.ini file for each GPU, which is useful if you want each device to use specific settings. I'll try to port this script to Windows if I ever get access to a Windows system with multiple GPUs.

2. archive-mfaktc.sh backs up your mfaktc results and is intended for people who use remote systems. You can then use SCP to copy the .zip file to a local computer, etc.

Last fiddled with by ixfd64 on 2019-09-26 at 22:40
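The posted script itself isn't reproduced here, but the behavior described above (device folders, symlinks, a lock file, SHARED_SETTINGS) might be sketched roughly like this; the binary name, the -d device flag, and all internal details are assumptions, not the actual start-mfaktc.sh:

```shell
#!/bin/sh
# Hypothetical reconstruction of a per-device mfaktc launcher, following the
# description above. SHARED_SETTINGS and the device-<n> folder layout come
# from the post; everything else is guesswork.
SHARED_SETTINGS=true

start_mfaktc() {
    dev="$1"
    if [ -z "$dev" ]; then
        ./mfaktc                  # no device given: run from the root folder
        return $?
    fi
    dir="device-$dev"
    lock="$dir/.lock"
    [ -d "$dir" ] || { echo "create $dir first" >&2; return 1; }
    # The lock file stops two instances from sharing one device.
    [ -e "$lock" ] && { echo "device $dev already in use" >&2; return 1; }
    touch "$lock"
    ln -sf "$PWD/mfaktc" "$dir/mfaktc"
    # One shared mfaktc.ini, unless the user wants per-GPU settings.
    if [ "$SHARED_SETTINGS" = true ]; then
        ln -sf "$PWD/mfaktc.ini" "$dir/mfaktc.ini"
    fi
    ( cd "$dir" && ./mfaktc -d "$dev" )
    rc=$?
    # Delete the symlinks and the lock once the instance terminates.
    rm -f "$dir/mfaktc" "$dir/mfaktc.ini" "$lock"
    return $rc
}
```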
#3204
1976 Toyota Corona years forever!
"Wayne"
Nov 2006
Saskatchewan, Canada
4692₁₀ Posts
When I run my GTX 2080 Ti all alone doing mfaktc TF, it runs at 4,300 GHz-days/day. However, if I start up all 8 cores of the corresponding i5-7820x doing RAM-intensive P-1, the GPU drops to under 4,000. In other words, it drops by more than the entire GHz-days throughput of the CPU.

Is this normal, or is there a GPU setting I simply need to change? Thanks

Relevant GPU config parms: Code:
SieveOnGPU=1
GPUSievePrimes=82486
GPUSieveSize=128
GPUSieveProcessSize=16
#3205
"James Heinrich"
May 2004
ex-Northern Ontario
11×311 Posts
I see the same thing (with much smaller numbers) with my RX 480 if there's something taking up all cores. I normally let P-1 use 5 of 6 cores, leaving one free for mfakto and general system responsiveness. Even though mfakto takes up almost no CPU resources, if it doesn't get CPU time immediately when it wants it, throughput seems to suffer.
#3206
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
152C₁₆ Posts
Quote:
Looks like the only parameter you've changed from the initial mfaktc.ini defaults is GPUSieveSize, from 64 to 128. On a GTX 1080 Ti I found these improved throughput: GPUSievePrimes=90000, GPUSieveSize=128, GPUSieveProcessSize=32. Running 2 instances simultaneously on the GPU added a couple percent by itself; 3 instances was better than one, but gave less overall throughput than 2 instances. All of that was with prime95 saturating all CPU cores.

The effect of multiple instances is stronger the faster the GPU is, but it increased total throughput on a GTX 1050 Ti too. (Presumably a single instance waits while the result of a class is written to display and disk. When two instances are run, provided they are not synchronized, one can keep the GPU busy while the other waits.)

Others with RTX 20xx cards have found gains of several percent with recompiled versions of mfaktc using GPUSieveSize up to 2047.
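Spelled out with the parameter names as they appear in mfaktc.ini, the tuned values above would be (these are one user's GTX 1080 Ti tuning results, not guaranteed optima for other cards):

```ini
GPUSievePrimes=90000
GPUSieveSize=128
GPUSieveProcessSize=32
```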
#3207
1976 Toyota Corona years forever!
"Wayne"
Nov 2006
Saskatchewan, Canada
2²×3×17×23 Posts
I've only tried it with P-1 on the CPU.

I don't have the tools (maybe the mental tools) to recompile mfaktc. Larger GPUSievePrimes or GPUSieveProcessSize seem to have negligible impact for me. Thanks
#3208
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
12454₈ Posts
Quote:
Or for Linux: https://www.mersenneforum.org/showpo...postcount=3189

Last fiddled with by kriesel on 2019-09-29 at 20:54
#3209
May 2011
Orange Park, FL
3×5×59 Posts
Quote:
#3210
1976 Toyota Corona years forever!
"Wayne"
Nov 2006
Saskatchewan, Canada
2²×3×17×23 Posts
Wow, too bad I didn't see this months ago.

Throughput changed from 3,900 to 4,500 on my 2080 Ti doing TF to 74 bits in the 4xM ranges. 15% improvement.

When would I use the Less Classes version?
#3212
Einyen
Dec 2003
Denmark
3158₁₀ Posts
On my 2080, assignments that take 9.3 seconds with normal mfaktc take 8.8 seconds with Less Classes, so the cutoff is somewhere near that.

Last fiddled with by ATH on 2019-09-30 at 01:10