2019-01-13, 15:43   #1
kriesel

PrimeNet API

This is a reference thread. Post any comments in the reference discussion thread https://www.mersenneforum.org/showthread.php?t=23383, not here.

Posts in this thread:
  1. The PrimeNet API, with documentation notes (remainder of this post)
  2. Sample GPU device parameters https://www.mersenneforum.org/showpo...45&postcount=2
  3. Draft Extension to PrimeNet Server Web API to support GPU applications https://www.mersenneforum.org/showpo...54&postcount=3
  4. Constraints imposed by server hardware https://www.mersenneforum.org/showpo...21&postcount=4
(an earlier version of this post appeared at https://www.mersenneforum.org/showth...768#post505768)

The PrimeNet API is documented at http://v5.mersenne.org/v5design/v5webAPI_0.97.html
It is identified there as Kurowski/Woltman 11/14/2007 RELEASE CANDIDATE 0.97(c).
That predates the first GPU-based GIMPS software, which originated in 2009, and long predates the addition of the Jacobi check, the discovery of the Gerbicz check, and the rollout of PRP3 code to capitalize on it.

Per manual testing in prime95 v30.6b4, these characters are permitted unchanged in cpu names:
-._a-zA-Z0-9
(upper and lower case letters, digits 0 to 9, dash, period, underscore, and case is preserved)

These map to underscore: _ space / ? ; : , \ | < > ' ` ~ ! @ # $ % ^ & * ( ) = + { } [ ]

This is more restrictive than indicated in section 3.1, but slightly broader than what is given for strings in section 2.0:
"For non-message identifiers such as userIDs ('A'-'Z', 'a'-'z', '0'-'9', '_', '-') are considered valid characters;"

Several things I noticed, listed by section below, suggest the document could use an update to incorporate extensions that are already implemented:

5.6.5.1.1 GIMPS Request Parameters
The description of permitted values for the stage field does not include 'PRP'-related work types.

5.7.5.1.1 GIMPS Request Parameters
The error counts field is not described: which types of errors are counted and encoded, in which character positions, to what limits, etc.

6.0 PrimeNet API Result Error codes
The listing in prime95 v30.4b8's primenet.h is somewhat larger than what's in the API document.
Code:
/* Error codes returned to client */

#define PRIMENET_ERROR_OK        0    /* no error */
#define PRIMENET_ERROR_SERVER_BUSY    3    /* server is too busy now */
#define PRIMENET_ERROR_INVALID_VERSION    4
#define PRIMENET_ERROR_INVALID_TRANSACTION 5
#define PRIMENET_ERROR_INVALID_PARAMETER 7
#define PRIMENET_ERROR_ACCESS_DENIED    9
#define PRIMENET_ERROR_DATABASE_CORRUPT    11
#define PRIMENET_ERROR_DATABASE_FULL_OR_BROKEN    13
#define PRIMENET_ERROR_INVALID_USER    21
#define PRIMENET_ERROR_UNREGISTERED_CPU    30
#define PRIMENET_ERROR_OBSOLETE_CLIENT    31
#define PRIMENET_ERROR_STALE_CPU_INFO    32
#define PRIMENET_ERROR_CPU_IDENTITY_MISMATCH 33
#define PRIMENET_ERROR_CPU_CONFIGURATION_MISMATCH 34
#define PRIMENET_ERROR_NO_ASSIGNMENT    40
#define PRIMENET_ERROR_INVALID_ASSIGNMENT_KEY 43
#define PRIMENET_ERROR_INVALID_ASSIGNMENT_TYPE 44
#define PRIMENET_ERROR_INVALID_RESULT_TYPE 45
#define PRIMENET_ERROR_INVALID_WORK_TYPE 46
#define PRIMENET_ERROR_WORK_NO_LONGER_NEEDED 47

/* These error codes are not returned by the server but are generated */
/* by the client code that communicates with the server. */

#define PRIMENET_NO_ERROR        0
#define PRIMENET_FIRST_INTERNAL_ERROR    1000
#define PRIMENET_ERROR_CONNECT_FAILED    1000
#define PRIMENET_ERROR_SEND_FAILED    1001
#define PRIMENET_ERROR_RECV_FAILED    1002
#define PRIMENET_ERROR_SERVER_UNSPEC    1003
#define PRIMENET_ERROR_PNERRORRESULT    1004
#define PRIMENET_ERROR_PNERRORDETAIL    1005
#define PRIMENET_ERROR_CURL_INIT    1100
#define PRIMENET_ERROR_CURL_PERFORM    1101
#define PRIMENET_ERROR_MODEM_OFF    2000
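
For convenience while debugging client/server exchanges, here is a minimal sketch (mine, not part of prime95 or the API) mapping a few of the server-returned codes above to readable text:
Code:
#include <stdio.h>

/* A few of the server-returned codes from the listing above;
   values copied from prime95 v30.4b8's primenet.h. */
static const char *primenet_error_text(int code)
{
    switch (code) {
    case 0:  return "no error";
    case 3:  return "server is too busy now";
    case 4:  return "invalid version";
    case 5:  return "invalid transaction";
    case 7:  return "invalid parameter";
    case 9:  return "access denied";
    case 21: return "invalid user";
    case 30: return "unregistered cpu";
    case 40: return "no assignment";
    case 43: return "invalid assignment key";
    case 46: return "invalid work type";
    case 47: return "work no longer needed";
    default: return "unrecognized or client-side code";
    }
}

int main(void)
{
    printf("47 -> %s\n", primenet_error_text(47));
    return 0;
}
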
7.2 GIMPS Work Types
This section does not include the PRP-related work types PRP, PRP-DC, PRP-CF, PRP-CF-D, P+1, Cert, etc.
From prime95 v30.4b8's primenet.h:
Code:
/* Valid work_preference values */
#define PRIMENET_WP_WHATEVER        0    /* Whatever makes most sense */
#define PRIMENET_WP_FACTOR_LMH        1    /* Factor big numbers to low limits */
#define PRIMENET_WP_FACTOR        2    /* Trial factoring */
#define PRIMENET_WP_PMINUS1        3    /* P-1 of small Mersennes --- not supported */
#define PRIMENET_WP_PFACTOR        4    /* P-1 of large Mersennes */
#define PRIMENET_WP_ECM_SMALL        5    /* ECM of small Mersennes looking for first factors */
#define PRIMENET_WP_ECM_FERMAT        6    /* ECM of Fermat numbers */
#define PRIMENET_WP_ECM_CUNNINGHAM    7    /* ECM of Cunningham numbers --- not supported */
#define PRIMENET_WP_ECM_COFACTOR    8    /* ECM of Mersenne cofactors */
#define PRIMENET_WP_LL_FIRST        100    /* LL first time tests */
#define PRIMENET_WP_LL_DBLCHK        101    /* LL double checks */
#define PRIMENET_WP_LL_WORLD_RECORD    102    /* LL test of world record Mersenne */
#define PRIMENET_WP_LL_100M        104    /* LL 100 million digit */
#define PRIMENET_WP_PRP_FIRST        150    /* PRP test of big Mersennes */
#define PRIMENET_WP_PRP_DBLCHK        151    /* PRP double checks */
#define PRIMENET_WP_PRP_WORLD_RECORD    152    /* PRP test of world record Mersennes */
#define PRIMENET_WP_PRP_100M        153    /* PRP test of 100M digit Mersennes */
#define PRIMENET_WP_PRP_COFACTOR    160    /* PRP test of Mersenne cofactors */
#define PRIMENET_WP_PRP_COFACTOR_DBLCHK    161    /* PRP double check of Mersenne cofactors */
#define PRIMENET_WORK_TYPE_CERT        200   /* Certification of correct proof of PRP test */ (dropped in v30.6b4)

  /* Obsolete work preferences */
#define PRIMENET_WP_LL_10M        103    /* LL 10 million digit --- no longer supported */
#define PRIMENET_WP_LL_FIRST_NOFAC    105    /* LL first time tests, no trial factoring or P-1 factoring -- superfluous */
Per https://mersenneforum.org/showpost.p...4&postcount=45 and related posts, 155 is PRP & proof for LL DC candidates; it is not present in prime95/mprime v30.4b8 code, but is present in v30.6b4.
Code:
#define PRIMENET_WP_PRP_NO_PMINUS1    154    /* PRP test that if possible also needs P-1 */
#define PRIMENET_WP_PRP_DC_PROOF    155    /* PRP double-check where a proof will be produced */
Also from prime95 v30.6b4's primenet.h:
Code:
/* Valid work_types returned by ga */
#define PRIMENET_WORK_TYPE_FACTOR    2
#define PRIMENET_WORK_TYPE_PMINUS1    3
#define PRIMENET_WORK_TYPE_PFACTOR    4
#define PRIMENET_WORK_TYPE_ECM        5
#define PRIMENET_WORK_TYPE_PPLUS1    6        /* Not yet supported by the server */
#define PRIMENET_WORK_TYPE_FIRST_LL    100
#define PRIMENET_WORK_TYPE_DBLCHK    101
#define PRIMENET_WORK_TYPE_PRP        150
#define PRIMENET_WORK_TYPE_CERT        200
7.3 GIMPS Work Preferences
This section does not include PRP-related work preferences, etc.

8.2 GIMPS Result Types
This section does not include PRP-related result types, etc.
Again from prime95 v30.4b8's primenet.h:
Code:
/* This structure is passed for the ar - Assignment Result call */

#define PRIMENET_AR_NO_RESULT    0    /* No result, just sending done msg */
#define PRIMENET_AR_TF_FACTOR    1    /* Trial factoring, factor found */
#define PRIMENET_AR_P1_FACTOR    2    /* P-1, factor found */
#define PRIMENET_AR_ECM_FACTOR    3    /* ECM, factor found */
#define PRIMENET_AR_TF_NOFACTOR    4    /* Trial Factoring no factor found */
#define PRIMENET_AR_P1_NOFACTOR    5    /* P-1 Factoring no factor found */
#define PRIMENET_AR_ECM_NOFACTOR 6    /* ECM Factoring no factor found */
#define PRIMENET_AR_LL_RESULT    100    /* LL result, not prime */
#define PRIMENET_AR_LL_PRIME    101    /* LL result, Mersenne prime */
#define PRIMENET_AR_PRP_RESULT    150    /* PRP result, not prime */
#define PRIMENET_AR_PRP_PRIME    151    /* PRP result, probably prime */
#define PRIMENET_AR_CERT    200    /* Certification result */

// There are (at least) 5 PRP residue types for testing N=(k*b^n+c)/d:
#define    PRIMNET_PRP_TYPE_FERMAT        1    // Fermat PRP.  Calculate a^(N-1) mod N.  PRP if result = 1
#define    PRIMNET_PRP_TYPE_SPRP        2    // SPRP variant.  Calculate a^((N-1)/2) mod N.  PRP if result = +/-1
#define    PRIMNET_PRP_TYPE_FERMAT_VAR    3    // Type 1 variant,b=2,d=1. Calculate a^(N-c) mod N.  PRP if result = a^-(c-1)
#define    PRIMNET_PRP_TYPE_SPRP_VAR    4    // Type 2 variant,b=2,d=1. Calculate a^((N-c)/2) mod N.  PRP if result = +/-a^-((c-1)/2)
#define    PRIMNET_PRP_TYPE_COFACTOR    5    // Cofactor variant.  Calculate a^(N*d-1) mod N*d.  PRP if result = a^(d-1) mod N
// Primenet encourages programs to return type 1 PRP residues as that has been the standard for prime95, PFGW, LLR for many years.
V30.6b4's primenet.h also includes:
Code:
#define PRIMENET_AR_PP1_FACTOR    7    /* P+1, factor found */
#define PRIMENET_AR_PP1_NOFACTOR 8    /* P+1 factoring no factor found */
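
The residue-type comments above give explicit formulas. As a quick numeric sanity check (my own sketch, not GIMPS code), here are types 1 and 4 on the small Mersenne prime M7 = 127 with base a = 3, using plain modular exponentiation (the __uint128_t intermediate is a GCC/Clang extension):
Code:
#include <stdint.h>
#include <stdio.h>

/* Modular exponentiation, safe for 64-bit moduli via 128-bit intermediates. */
static uint64_t modpow(uint64_t a, uint64_t e, uint64_t n)
{
    uint64_t r = 1 % n;
    a %= n;
    while (e) {
        if (e & 1) r = (uint64_t)((__uint128_t) r * a % n);
        a = (uint64_t)((__uint128_t) a * a % n);
        e >>= 1;
    }
    return r;
}

int main(void)
{
    const uint64_t N = 127;   /* M7 = 2^7 - 1, i.e. k=1, b=2, n=7, c=-1, d=1 */
    const uint64_t a = 3;

    /* Type 1 (Fermat): a^(N-1) mod N, PRP if result = 1 */
    printf("type 1: %llu\n", (unsigned long long) modpow(a, N - 1, N));       /* 1 */

    /* Type 4 (SPRP variant, b=2, d=1): a^((N-c)/2) = a^((N+1)/2) mod N,
       PRP if result = +/- a^-((c-1)/2) = +/- a, i.e. 3 or 124 here */
    printf("type 4: %llu\n", (unsigned long long) modpow(a, (N + 1) / 2, N)); /* 124 = -3 mod 127 */

    return 0;
}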
Section 8.2 defines a result_type 0, so
"GIMPS result_type values include the range 1-256:" probably should say 0, not 1.

Assignment result_type values 257-1023 appear to be undefined; they are neither reserved for future purposes, nor excluded, nor part of the assignable result type range.

There's nothing in the online API document about the new longer 2048-bit residues either, nor about the CERT work type added in prime95 v30.

Is there some other document that contains errata/addenda or clarifications?

https://www.mersenne.org/worktypes/ has a list of work types but not the corresponding numerical codes.

Looking around some, I found:
https://www.mersenneforum.org/showth...048#post499204 (extending result text to 2048 characters; it's not clear without further digging what this means in detail)
https://www.mersenneforum.org/showpo...&postcount=609
https://www.mersenneforum.org/showth...net#post469202 Preda and Prime95 regarding extending gpuOwL to use the PrimeNet interface. Preda subsequently created a primenet.py script to automate the manual web communication for work assignment and result reporting, similar to most other GIMPS client management software.


Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

2019-01-14, 06:09   #2
kriesel

Sample GPU device parameters

GPU properties and the PrimeNet API's CPU-oriented hardware description are not a good match.

GPUs are designed to be massively parallel. The NVIDIA GTX 1070 specifications say it has 1920 cores (https://www.geforce.com/hardware/des...specifications). Following is what a GTX 1070's OpenCL parameters look like, as displayed by GPU-Z:
Code:
General
Platform Name    NVIDIA CUDA
Platform Vendor    NVIDIA Corporation
Platform Profile    FULL_PROFILE
Platform Version    OpenCL 1.2 CUDA 8.0.0
Vendor    NVIDIA Corporation
Device Name    GeForce GTX 1070
Version    OpenCL 1.2 CUDA
Driver Version    378.66
C Version    OpenCL C 1.2 
Profile    FULL_PROFILE
Global Memory Size    8192 MB
Clock Frequency    1708 MHz
Compute Units    15
Device Available    Yes
Compiler Available    Yes
Linker Available    Yes
Preferred Synchronization    Device
CMD Queue Properties    Out of Order, Profiling
SVM Capabilities    Coarse
DP Capability    Denorm, INF NAN, Round Nearest, Round Zero, Round INF, FMA
SP Capability    Denorm, INF NAN, Round Nearest, Round Zero, Round INF, FMA
Half FP Capability    None
Address Bits    64
Preferred On-Device Queue    256 KB
Global Memory Cache    240 KB (RW Cache)
Global Memory Cacheline    0 KB
Local Memory    Local (48 KB)
Memory Alignment    4096 bits
Built-in Kernels    
Little Endian    Yes
Error Correction    No
Execution Capability    Kernel
Unified Memory    No
Image Support    Yes

Limits
Max Device Events    2048
Max Device Queues    4
Max On-Device Queue    256 KB
Max Memory Allocation    2048 MB
Max Constant Buffer    64 KB
Max Constant Args    9
Max Read Image Args    256
Max Write Image Args    16
Max Samplers    32
Max Work Item Dims    3
Max Write Image Args    16

Native Vectors
Native Vector Width (CHAR)    1
Native Vector Width (SHORT)    1
Native Vector Width (INT)    1
Native Vector Width (LONG)    1
Native Vector Width (FLOAT)    1
Native Vector Width (DOUBLE)    1
Native Vector Width (HALF)    N/A
Preferred Vector Width (CHAR)    1
Preferred Vector Width (SHORT)    1
Preferred Vector Width (INT)    1
Preferred Vector Width (LONG)    1
Preferred Vector Width (FLOAT)    1
Preferred Vector Width (DOUBLE)    1
Preferred Vector Width (HALF)    N/A
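
Those OpenCL fields come from the standard clGetDeviceInfo query. A minimal sketch (assuming an OpenCL SDK is installed; link with -lOpenCL) that retrieves a few of the parameters GPU-Z displays:
Code:
#include <stdio.h>
#include <CL/cl.h>

int main(void)
{
    cl_platform_id plat;
    cl_device_id dev;
    char name[256];
    cl_ulong gmem;
    cl_uint cu, clock_mhz;

    /* Take the first platform and its first GPU device. */
    if (clGetPlatformIDs(1, &plat, NULL) != CL_SUCCESS) return 1;
    if (clGetDeviceIDs(plat, CL_DEVICE_TYPE_GPU, 1, &dev, NULL) != CL_SUCCESS) return 1;

    clGetDeviceInfo(dev, CL_DEVICE_NAME, sizeof name, name, NULL);
    clGetDeviceInfo(dev, CL_DEVICE_GLOBAL_MEM_SIZE, sizeof gmem, &gmem, NULL);
    clGetDeviceInfo(dev, CL_DEVICE_MAX_COMPUTE_UNITS, sizeof cu, &cu, NULL);
    clGetDeviceInfo(dev, CL_DEVICE_MAX_CLOCK_FREQUENCY, sizeof clock_mhz, &clock_mhz, NULL);

    printf("Device Name         %s\n", name);
    printf("Global Memory Size  %llu MB\n", (unsigned long long) (gmem >> 20));
    printf("Compute Units       %u\n", cu);
    printf("Clock Frequency     %u MHz\n", clock_mhz);
    return 0;
}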
Here's what CUDALucas lists in its info output:
Code:
------- DEVICE 0 -------
name                GeForce GTX 1070
UUID                **64-bit only on Windows**
ECC Support?        Disabled
Compatibility       6.1
clockRate (MHz)     1708
memClockRate (MHz)  4004
totalGlobalMem      4294967295
totalConstMem       65536
l2CacheSize         2097152
sharedMemPerBlock   49152
regsPerBlock        65536
warpSize            32
memPitch            2147483647
maxThreadsPerBlock  1024
maxThreadsPerMP     2048
multiProcessorCount 15
maxThreadsDim[3]    1024,1024,64
maxGridSize[3]      2147483647,65535,65535
textureAlignment    512
deviceOverlap       1
pciDeviceID         0
pciBusID            3
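
CUDALucas, CUDAPm1, and mfaktc derive such listings from the CUDA runtime's device-properties query. A minimal sketch (assuming a CUDA toolkit is installed; compile with nvcc) printing several of the same fields:
Code:
#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    int dev = 0;
    struct cudaDeviceProp prop;

    if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess) return 1;

    printf("name                %s\n", prop.name);
    printf("Compatibility       %d.%d\n", prop.major, prop.minor);
    printf("clockRate (MHz)     %d\n", prop.clockRate / 1000);       /* reported in kHz */
    printf("memClockRate (MHz)  %d\n", prop.memoryClockRate / 1000); /* reported in kHz */
    printf("totalGlobalMem      %zu\n", prop.totalGlobalMem);
    printf("l2CacheSize         %d\n", prop.l2CacheSize);
    printf("sharedMemPerBlock   %zu\n", prop.sharedMemPerBlock);
    printf("regsPerBlock        %d\n", prop.regsPerBlock);
    printf("warpSize            %d\n", prop.warpSize);
    printf("multiProcessorCount %d\n", prop.multiProcessorCount);
    return 0;
}
Incidentally, the CUDALucas listing above reports totalGlobalMem as 4294967295 for an 8 GB card, which looks like a 32-bit truncation when that build prints the value; the CUDAPm1 listing below shows the full 8589934592.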
Here's the CUDAPm1 equivalent, similar but not identical:
Code:
name                GeForce GTX 1070
Compatibility       6.1
clockRate (MHz)     1708
memClockRate (MHz)  4004
totalGlobalMem      8589934592
totalConstMem       65536
l2CacheSize         2097152
sharedMemPerBlock   49152
regsPerBlock        65536
warpSize            32
memPitch            2147483647
maxThreadsPerBlock  1024
maxThreadsPerMP     2048
multiProcessorCount 15
maxThreadsDim[3]    1024,1024,64
maxGridSize[3]      2147483647,65535,65535
textureAlignment    512
deviceOverlap       1
Here's the mfaktc equivalent:
Code:
CUDA device info
  name                      GeForce GTX 1070
  compute capability        6.1
  max threads per block     1024
  max shared memory per MP  98304 byte
  number of multiprocessors 15
  clock rate (CUDA cores)   1708MHz
  memory clock rate:        4004MHz
  memory bus width:         256 bit
Here's the CUDA-oriented output of GPU-Z for a GTX 1080 Ti. Note the 28 processors and 128 cores per processor.
Code:
General
CUDA Device Name    GeForce GTX 1080 Ti
Compute Capability    6.1
Processor Count    28
Cores per Processor    128
GPU Clock Rate    1620.0 MHz
Memory Clock Rate    5505.0 MHz
Memory Bus Width    352
L2 Cache Size    2816 KB
Global Memory Size    11264 MB
Async Engines    2
SP to DP Ratio    1:32
ECC Supported    No
Using TCC Driver    No
Compute Mode    Default
Multi-GPU Board    No (0)
PCI ID    Bus 3, Dev 0, Domain 0
Threads per Multiprocessor    2048
Max Shmem per Multiprocessor    96 KB
Execute Multiple Kernels    Yes
Preemption Supported    No

Memory
Native Atomic Supported    No
Unified Address Space    Yes
Integrated w/ Host Memory    No
Can map Host Memory    Yes
Can allocate Managed Memory    Yes
Pageable Memory Access    No
Concurrent Managed Memory Access    No
Can use Host Memory Pointers    No
Supports Stream Priorities    Yes
Can Cache Globals in L1    Yes
Can Cache Locals in L1    Yes
Max Block Size    1024 x 1024 x 64
Max # of Threads per Block    1024
Max Shmem per Block    48 KB
Max Grid Size    2147483647 x 65535 x 65535
Max Registers per Block    65536
Max Registers per Block    65536
Total Constant Memory    64 KB
Warp Size    32 Threads
Maximum Pitch    2097151 KB
Texture Alignment    0 KB
Surface Alignment    512
Texture Pitch Alignment    32
GPU Overlap    Yes
Kernel Runtime Limit    Yes

Size Constraints
1D Texture Size    131072
1D Layered Texture Size    32768 x 2048
2D Texture Size    131072 x 65536
2D Layered Texture Size    32768 x 32768 x 2048
2D Texture Size Gather    32768 x 32768
3D Texture Size    16384 x 16384 x 16384
3D Texture Size Alt    8192 x 8192 x 32768
Cubemap Texture Size    32768 x 32768
Layered Cubemap Texture Size    32768 x 32768 x 2046
1D Surface Size    32768
1D Layered Surface Size    32768 x 2048
2D Surface Size    131072 x 65536
2D Layered Texture Size    32768 x 32768 x 2048
3D Surface Size    16384 x 16384 x 16384
Cubemap Surface Size    32768 x 32768
Cubemap Layererd Surface Size    32768 x 32768 x 2046
1D Linear Texture Size    134217728
2D Linear Texture Size    131072 x 65000
2D Linear Texture Pitch    2097120
1D Mipmapped Texture Size    16384
2D Mipmapped Texture Size    32768 x 32768
The GTX1050Ti, as described by various sources:
Code:
GTX 1050 Ti
Cores 768
Base clock 1290
boost clock 1392
Vram 4096 MB
source: https://www.nvidia.com/en-us/geforce/products/10series/geforce-gtx-1050/

ROPs/TMUs 32/48
Shaders 768
base clock 1354
boost clock 1468

OpenCl
compute units: 6
max on-device queue 256 KB
max constant buffer 64 KB

CUDA
cores per processor 128
L2 cache size 1024 KB
global memory 4096 MB
SP to DP ratio 1:32
Threads per multiprocessor 2048
max shared mem per multiprocessor 96 KB
max shared mem per block 48 KB
total constant memory 64 KB
max registers per block 65536
source: gpu-z v2.16.0 output

name                GeForce GTX 1050 Ti
Compatibility       6.1
clockRate (MHz)     1468
memClockRate (MHz)  3504
totalGlobalMem      4294967296
totalConstMem       65536
l2CacheSize         1048576
sharedMemPerBlock   49152
regsPerBlock        65536
warpSize            32
memPitch            2147483647
maxThreadsPerBlock  1024
maxThreadsPerMP     2048
multiProcessorCount 6
maxThreadsDim[3]    1024,1024,64
maxGridSize[3]      2147483647,65535,65535
textureAlignment    512
deviceOverlap       1
source: cudapm1 -info at program start
At the low/old/slow end, the Quadro 2000 has "only" 192 cores.
Code:
General
CUDA Device Name    Quadro 2000
Compute Capability    2.1
Processor Count    4
Cores per Processor    48
GPU Clock Rate    1251.0 MHz
Memory Clock Rate    1304.0 MHz
Memory Bus Width    128
L2 Cache Size    256 KB
Global Memory Size    1024 MB
Async Engines    1
SP to DP Ratio    1:12
ECC Supported    No
Using TCC Driver    No
Compute Mode    Default
Multi-GPU Board    No (1)
PCI ID    Bus 2, Dev 0, Domain 0
Threads per Multiprocessor    1536
Max Shmem per Multiprocessor    48 KB
Execute Multiple Kernels    Yes
Preemption Supported    No

Memory
Native Atomic Supported    No
Unified Address Space    Yes
Integrated w/ Host Memory    No
Can map Host Memory    Yes
Can allocate Managed Memory    No
Pageable Memory Access    No
Concurrent Managed Memory Access    No
Can use Host Memory Pointers    No
Supports Stream Priorities    No
Can Cache Globals in L1    Yes
Can Cache Locals in L1    Yes
Max Block Size    1024 x 1024 x 64
Max # of Threads per Block    1024
Max Shmem per Block    48 KB
Max Grid Size    65535 x 65535 x 65535
Max Registers per Block    32768
Max Registers per Block    32768
Total Constant Memory    64 KB
Warp Size    32 Threads
Maximum Pitch    2097151 KB
Texture Alignment    0 KB
Surface Alignment    512
Texture Pitch Alignment    32
GPU Overlap    Yes
Kernel Runtime Limit    Yes

Size Constraints
1D Texture Size    65536
1D Layered Texture Size    16384 x 2048
2D Texture Size    65536 x 65535
2D Layered Texture Size    16384 x 16384 x 2048
2D Texture Size Gather    16384 x 16384
3D Texture Size    2048 x 2048 x 2048
3D Texture Size Alt    0 x 0 x 0
Cubemap Texture Size    16384 x 16384
Layered Cubemap Texture Size    32768 x 32768 x 2046
1D Surface Size    65536
1D Layered Surface Size    65536 x 2048
2D Surface Size    65536 x 32768
2D Layered Texture Size    65536 x 65536 x 2048
3D Surface Size    65536 x 32768 x 2048
Cubemap Surface Size    32768 x 32768
Cubemap Layered Surface Size    32768 x 32768 x 2046
1D Linear Texture Size    134217728
2D Linear Texture Size    65000 x 65000
2D Linear Texture Pitch    1048544
GPUs don't have the L1, L2, and L3 cache plus system memory model of conventional CPUs. (Nor do some very old CPUs have L3 cache.) GPU applications, their writers, and their compilers deal with two disparate models. In addition to the CPU side, there's a set of six types of GPU memory to consider (registers, shared, local, global, constant, and texture memory): http://www.eng.utah.edu/~cs5610/lect...ure%20CUDA.pdf
Also, the GPU and CPU/system realms are separated by a PCIe bottleneck.


Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

2019-02-23, 20:28   #3
kriesel

Draft Extension to PrimeNet Server Web API to support GPU applications

The current PrimeNet API specification does not address GIMPS GPU computing.
The specification at http://v5.mersenne.org/v5design/v5webAPI_0.97.html does not contain the text string "gpu"; it is currently CPU-oriented. That's fine; it has served the project well on the CPU side for many years.

The existing common parameters in PrimeNet API v5 are CPU-oriented, not GPU-oriented.
Many GPU-centric GIMPS applications use the CPU for certain essential operations, such as the Gerbicz check or GCD in Gpuowl PRP or PRP-1, the GCD in CUDAPm1, and, in all applications, launch, checkpoint saves, console or log file output, etc. In almost all cases, GPU applications are currently single-threaded, using a single CPU core.
In some of these operations, host system parameters such as CPU type, core count, clock rate, cache, and system RAM amount may determine performance or limit capability, so it is useful for GPU applications to have the CPU side characterized.

Therefore, GPU-related parameters probably should not be overlaid onto existing CPU-oriented parameters, but should constitute extensions of the PrimeNet API.

For backward compatibility with existing CPU-only applications, null entries or omission of GPU-related parameter names should be legal and accepted. (Examples: prime95, mprime, and other variants; potentially Mlucas; any other CPU-only application.)

This draft extension initially assumes applications employ a single GPU, so count and similarity are implied. (This assumption might be invalidated by some ambitious GPU programmer in the future!)

Note, all or almost all GPU programs in current use are open source and so would be "untrusted client programs".

Extension parameters by analogy to the existing CPU oriented parameters:

(Reserved words for future multiple-GPU expansion:
gpu-count: total number of GPUs employed by this application. Implied to be one if gpu-count is absent but gpu-model is present. Precedes all GPU description entries if present.
gpu-block n: n = 1...last; identifies the beginning of the description of GPU n. Expect gpu-model next. Omit when gpu-count is 1 or absent.
gpu-end n: identifies the end of the description block of GPU n. Omit when gpu-count is 1 or absent.
So, a 2-GPU description would consist of:
gpu-count 2
gpu-block 1
set of gpu 1 descriptors
gpu-end 1
gpu-block 2
set of gpu 2 descriptors
gpu-end 2)

gpu-model (mandatory for GPU applications)
GPU model string, with the same formatting rules as for CPU.
The PrimeNet server must accept manufacturers AMD, Intel, and NVIDIA.
The PrimeNet server must accept GPU models currently in production or in common use.
For example: the AMD Vega or Radeon VII; the Intel HD or UHD IGPs; the NVIDIA GTX 10x0 series with or without the Ti, FE, or 3GB suffix; the RTX 20x0 series.
The PrimeNet server should accept legacy NVIDIA models such as the Quadro series or early GTX (460, 480, etc.).
The PrimeNet server should accept Intel IGP models (HD4600, HD530, HD620, UHD 630, etc.).
The PrimeNet server should accept legacy AMD/ATI models such as the RX480, RX580, RX550.
The client application should submit only models legal for the specific application.
For example, AMD models are legal for gpuowl, clLucas, and mfakto, and not legal for the CUDA applications CUDALucas, CUDAPm1, and mfaktc.
Intel IGPs are legal for mfakto.
NVIDIA GPUs are legal for CUDA applications but not for clLucas or mfakto. Some NVIDIA GPUs are legal for some versions of gpuowl.
The PrimeNet server may reject input from GPU models not legal for the specific application.

gpu-regmem (optional for GPU applications)
units in count per thread

gpu-sharedmem (optional for GPU applications)
units in KB; roughly analogous to CPU L1 cache

gpu-localmem (optional for GPU applications)

gpu-globalmem (optional for GPU applications)

gpu-constmem (optional for GPU applications)

gpu-texturemem (optional for GPU applications)

gpu-vram (mandatory for GPU applications)
analogous to m, and in the same format

gpu-clock (optional for GPU applications)
GPU core clock in MHz

gpu-ncores (optional)
number of cores (Maybe this should be omitted and the entire GPU treated as a single computing resource, managed by the GPU application. We certainly do not want to mislead PrimeNet into thinking it could make hundreds or thousands of assignments, one to run on each GPU core of a single GPU.)

(Is a gpu-f, analogous to the f CPU features string, useful?) The proposed fields are collected in the sketch below.
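
For concreteness, a hypothetical client-side collection of the proposed gpu-* fields (names, types, and units here are mine, mirroring the draft text above, not any implemented API):
Code:
/* Hypothetical client-side container for the proposed gpu-* extension
   parameters described above; field names and units mirror the draft text. */
struct gpu_description {
    char     gpu_model[64];       /* mandatory: model string, same formatting rules as cpu names */
    unsigned gpu_vram;            /* mandatory: analogous to m, same format */
    unsigned gpu_clock;           /* optional: gpu core clock in MHz */
    unsigned gpu_ncores;          /* optional: number of cores (possibly better omitted) */
    unsigned gpu_regmem;          /* optional: registers, count per thread */
    unsigned gpu_sharedmem;       /* optional: KB, roughly analogous to cpu L1 cache */
    unsigned gpu_localmem;        /* optional */
    unsigned gpu_globalmem;       /* optional */
    unsigned gpu_constmem;        /* optional */
    unsigned gpu_texturemem;      /* optional */
};

/* For the reserved future multiple-GPU form (gpu-count, gpu-block n, gpu-end n): */
struct gpu_report {
    unsigned gpu_count;                  /* implied 1 if absent but gpu-model present */
    struct gpu_description gpu[8];       /* descriptors for gpu-block 1..gpu_count */
};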

Other things being equal, I would favor using the fewest of the optional parameters that gets the job at hand done.

All existing GPU applications known to me at this time are one worker, one GPU per executable instance.
They can select a GPU device number, chosen by the user from among the installed compatible GPUs in the system and set on the command line or in an ini file.
Instance naming should accommodate all of the following: a system name unique to the user account; the GPU model designation; and a unique GPU identifier within the given system. (One system might have multiple GPUs, a mix of models, and possibly more than one of the same model installed.)
For example, in system condorella, I had an RX480 and multiple RX550s. They could be uniquely identified within the GIMPS project as user kriesel, computer names condorella-RX480, condorella-RX550_1, condorella-RX550_2, etc.
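
A minimal sketch of composing such instance names (the host name, model, and index inputs are illustrative, not a defined API):
Code:
#include <stddef.h>
#include <stdio.h>

/* Compose a computer-name string like condorella-RX550_2 from a host name,
   GPU model, and per-model index; index 0 means only one of that model. */
static void make_instance_name(char *out, size_t outlen,
                               const char *host, const char *model, int index)
{
    if (index > 0)
        snprintf(out, outlen, "%s-%s_%d", host, model, index);
    else
        snprintf(out, outlen, "%s-%s", host, model);
}

int main(void)
{
    char name[64];
    make_instance_name(name, sizeof name, "condorella", "RX550", 2);
    printf("%s\n", name);   /* condorella-RX550_2 */
    return 0;
}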

PrimeNet shall support multiple GPU computing instances simultaneously per system, including disparate application and computation types running simultaneously on separate GPUs in the same system, with same or different application types on same or different GPU models.

Some of the PrimeNet API may need to remain unimplemented on the GPU side for extended periods. Benchmarking could prove problematic, since it is application-specific in some cases and not an explicit part of the program in mfaktX or gpuowl, but more a byproduct of normal running.

Work type preference in the API is really more of a work type requirement for most applications, which perform only a single type of computation: CUDALucas, LL testing; CUDAPm1, P-1 testing; mfaktX, TF; gpuowl, PRP or P-1 (or, in an old version of gpuowl, LL or TF). GPU applications should immediately reject and unreserve unsuitable assignments that are not within their capabilities. After a set limit of consecutive inappropriate assignments, they should probably halt or revert to manual operation.
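
A minimal sketch of that client-side policy (the helper functions are placeholders, not actual PrimeNet API calls):
Code:
#include <stdbool.h>
#include <stdio.h>

/* Placeholder hooks: a real client would issue the corresponding PrimeNet
   transactions (e.g. an unreserve) here. */
bool assignment_is_supported(int work_type);   /* e.g. a TF-only app accepts only TF */
void unreserve_assignment(const char *assignment_key);

/* Reject and unreserve unsuitable assignments; give up after a set limit
   of consecutive inappropriate ones, as suggested above. */
#define MAX_CONSECUTIVE_REJECTS 5

int handle_assignment(int work_type, const char *assignment_key)
{
    static int consecutive_rejects = 0;

    if (!assignment_is_supported(work_type)) {
        unreserve_assignment(assignment_key);
        if (++consecutive_rejects >= MAX_CONSECUTIVE_REJECTS) {
            fprintf(stderr, "too many unsuitable assignments; reverting to manual operation\n");
            return -1;          /* caller should halt automatic fetching */
        }
        return 1;               /* try fetching another assignment */
    }
    consecutive_rejects = 0;
    return 0;                   /* proceed with the work */
}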

Note that some assignments, while legal, are beyond the application's capabilities. CUDAPm1 is rife with these; while they are to some extent GPU-resource-dependent, even a GTX 1080 Ti with CUDAPm1 is not capable of running a broad range of exponents and bounds, due to application crashes or other errors.


Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

2021-08-28, 13:35   #4
kriesel

Constraints imposed by server hardware
Default Constraints imposed by server hardware

The PrimeNet API can handle the full 1G range of mersenne.org. (Section 5.7.5.1.1 specifies the exponent value may be any unsigned 32-bit integer: p < 2^32 = 4,294,967,296.)

The current PrimeNet server is unable to automatically handle incoming PRP proofs for exponents above ~595.8M due to its CPU instruction set. 642.6M failed; 485M has worked. George has completed a 642.6M verification on an AVX-512 laptop; it required significant manual intervention. He's probably hoping for high proof powers and a low quantity of primality test submissions for exponents >595.8M in the future. (Gpuowl v6.11-3xx max proof level is 10.) At some point large CERTs might become available as a separate work category for volunteers, similar to strategic double or triple checking. M843112609 is waiting patiently.

As I understand it:
The present server has the SSE2 (or maybe AVX) instruction set.
The proof handling code on the server uses mprime/prime95 code.
The maximum FFT length implemented for the server's instruction set is 32M.
The corresponding maximum exponent that can be handled by a 32M FFT is ~595.8M.
It's only an issue for PRP proof handling, not PRP results, LL, P-1, or TF.
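
A back-of-the-envelope sketch using only the figures above: the 32M FFT / ~595.8M exponent pair implies roughly 17.8 bits of exponent per FFT word, from which the reach of the hypothetical single 56M SSE2 FFT mentioned in the list below can be estimated. (The bits-per-word figure actually declines slowly as FFT length grows, so this is a slight overestimate for larger FFTs.)
Code:
#include <stdio.h>

int main(void)
{
    /* Known pair from the notes above: a 32M FFT handles exponents up to ~595.8e6. */
    const double fft32M_words  = 32.0 * 1048576.0;
    const double max_exp_32M   = 595.8e6;
    const double bits_per_word = max_exp_32M / fft32M_words;   /* ~17.76 */

    /* Rough estimate for the proposed single 56M SSE2 FFT. */
    const double fft56M_words  = 56.0 * 1048576.0;
    printf("bits/word          ~%.2f\n", bits_per_word);
    printf("56M FFT covers to  ~%.0fM (vs. the 1G mersenne.org limit)\n",
           fft56M_words * bits_per_word / 1e6);
    return 0;
}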

Some potential workarounds or solutions, which George has not implemented (I'm sure he has his reasons, including the likelihood that it's more complicated than I can guess, other priorities, and the relative rarity of PRP proof files for exponents >595.8M):
  • Server upgrade to AVX-512 hardware, which would raise the limit to >1G (or AVX2, which would raise it to ~920M)
  • Creation of a single 56M SSE2 FFT, which could handle the rapid proof file acceptance / cert task generation inefficiently but at perhaps acceptable compute time cost, from 596M up to 1G
  • Creation and use of a Karatsuba code path to leverage smaller existing gwnum SSE2 FFT sizes for larger exponents' proof file processing; possibly more efficient than a single 56M SSE2 FFT for some exponents
  • Use of GMP for operations on proof files for exponents above the current SSE2 size limit (probably slow/costly compared to the preceding two)
  • Creation of a path for automatically offloading the >595.8M proof file handling to an AVX-512 system
  • Manual handling of p > 595.8M proof files (the current method), and perhaps offering their corresponding longer cert work to volunteers in a forum thread

Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1
