mersenneforum.org  

Old 2021-06-30, 19:58   #1673
Viliam Furik
 
"Viliam Furík"
Jul 2018
Martin, Slovakia

2F1₁₆ Posts

Quote:
Originally Posted by birtwistlecaleb View Post
They both instantly close when they find that, so I can't see that. Am I supposed to manually add the file?
Yes. Add the file and fill it with work. If it's empty but present, the program will still quit on start.
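To make the point about a missing versus an empty file concrete, here is a minimal Python sketch of a pre-flight check you could run before starting mfakto or mfaktc. The function name `worktodo_status` is hypothetical, and treating any line that starts with `Factor=` as usable work is an assumption based on the readme excerpts quoted in this thread, not a description of how either program validates the file.

```python
from pathlib import Path


def worktodo_status(path):
    """Classify a worktodo file as 'missing', 'empty', or 'has work'."""
    p = Path(path)
    if not p.is_file():
        return "missing"
    lines = p.read_text(encoding="utf-8", errors="replace").splitlines()
    # Assumption: a usable trial-factoring assignment line starts with "Factor=".
    if any(line.strip().startswith("Factor=") for line in lines):
        return "has work"
    return "empty"


if __name__ == "__main__":
    # Checks the worktodo.txt in the current directory.
    print(worktodo_status("worktodo.txt"))
```

Anything other than "has work" matches the situation described above, where the program quits on start.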
Old 2021-06-30, 20:01   #1674
James Heinrich
 
"James Heinrich"
May 2004
ex-Northern Ontario

2×17×109 Posts

Quote:
Originally Posted by birtwistlecaleb View Post
They both instantly close when they find that
Especially when debugging, it's very useful to open a command prompt at the location of the file you're trying to run and launch it from there rather than double-clicking the executable. That way you can see what output it gives.
Old 2021-06-30, 20:24   #1675
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2⁶·101 Posts

Quote:
Originally Posted by birtwistlecaleb View Post
They both do not have a worktodo.txt, and it seems like they are broken because of that.
Something's malfunctioning, but it's clearly happening elsewhere than the software. Check between the furniture (chair or whatever) and the keyboard; do whatever maintenance is needed there.

Use the readme that comes included with the software. Read it. Understand it. Follow its plain directions.

Mfaktc:
Code:
################################
# 3.1 Running mfaktc (Windows) #
################################

Similar to Linux (read above!).
Open a command window and run 'mfaktc.exe -h'.


####################################################################
# 4 How to get work and report results from/to the Primenet server #
####################################################################

Getting work:
    Step 1) go to http://www.mersenne.org/ and login with your username and
            password
    Step 2) on the menu on the left click "Manual Testing" and then
            "Assignments"
    Step 3) choose the number of assignments by choosing
            "Number of CPUs (cores) you need assignments for (maximum 12)"
            and "Number of assignments you want for each core"
    Step 4) Change "Preferred work type" to "Trial factoring"
    Step 5) click the button "Get Assignments"
    Step 6) copy&paste the "Factor=..." lines directly into the worktodo.txt
             in your mfaktc directory
Mfakto is similar:
Code:
Open a terminal window and run 'mfakto -h' for possible parameters. You may
also want to check mfakto.ini for additional settings. mfakto typically fetches
work from worktodo.txt as specified in the INI file. See section 3 on how to
obtain assignments and report results.

A typical worktodo.txt file looks like this:
  -- begin example --
  Factor=[assignment ID],66362159,64,68
  Factor=[assignment ID],3321932899,76,77
  -- end example --
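As a rough illustration of the line format in the example above, the following Python sketch splits a `Factor=` line into its four comma-separated fields. The names `FactorAssignment` and `parse_factor_line` are hypothetical, and reading the fields as assignment ID, exponent, starting bit level, and ending bit level is an assumption taken from the readme example and the "got assignment: exp=... bit_min=... bit_max=..." output elsewhere in this thread. It is not the parser mfakto itself uses.

```python
from typing import NamedTuple, Optional


class FactorAssignment(NamedTuple):
    assignment_id: str
    exponent: int
    bit_min: int
    bit_max: int


def parse_factor_line(line: str) -> Optional[FactorAssignment]:
    """Parse 'Factor=<id>,<exponent>,<bit_min>,<bit_max>'; return None if malformed."""
    line = line.strip()
    if not line.startswith("Factor="):
        return None
    fields = line[len("Factor="):].split(",")
    if len(fields) != 4:
        # Assumption: only the four-field form shown in the readme is handled here.
        return None
    assignment_id, exponent, bit_min, bit_max = fields
    try:
        parsed = FactorAssignment(assignment_id, int(exponent), int(bit_min), int(bit_max))
    except ValueError:
        return None
    # A bit range only makes sense if it is non-empty.
    if parsed.bit_min >= parsed.bit_max:
        return None
    return parsed
```

For the first readme example line this yields exponent 66362159 with bit levels 64 to 68.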
Code:
########################################
# 3 Getting work and reporting results #
########################################

You must have a PrimeNet account to participate. Simply visit the GIMPS website
at https://mersenne.org to create one. Once you've signed up, you can get
assignments in several ways.

From the GIMPS website:
    Step 1) log in to the GIMPS website with your username and password
    Step 2) on the menu bar, select Manual Testing > Assignments
    Step 3) open the link to the manual GPU assignment request form
    Step 4) enter the number of assignments or GHz-days you want
    Step 5) click "Get Assignments"
...
Code:
    Once you have your assignments, copy the "Factor=..." lines directly into
    your worktodo.txt file. Start mfakto, sit back and let it do its job.
    Running mfakto is also a great way to stress test your GPU. ;-)

Use the mfakto or mfaktc reference info. "Create a worktodo file and put some assignments in there. Start with few, in case your gpu or igp does not work out. Get the type you plan to run the most. Get them from https://www.mersenne.org/manual_gpu_assignment/"
"Create a Windows batch file or Linux shell script with a short name.
Set the device number there.
Consider redirecting console output to a file or employing a good tee program."
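For the "redirect console output or use a tee program" advice, here is a minimal Python sketch of a tee-style launcher, offered as one possible stand-in for a batch file plus a separate tee tool. The function name `run_and_tee` and the log file name are hypothetical. No device-selection option is shown because the exact flag is not quoted in this thread; check the program's `-h` output and append it to the command list yourself.

```python
import subprocess
import sys


def run_and_tee(cmd, log_path):
    """Run cmd, echo each output line to the console, and append it to log_path.

    Returns the exit code of the child process.
    """
    with open(log_path, "a", encoding="utf-8") as log:
        proc = subprocess.Popen(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge error output into the same stream
            text=True,
        )
        for line in proc.stdout:
            sys.stdout.write(line)
            log.write(line)
            log.flush()  # keep the log current if the run is interrupted
        return proc.wait()


# Hypothetical usage, run from the directory that holds the executable:
#   run_and_tee(["mfakto"], "mfakto.log")
```

Because the log is opened in append mode, repeated runs accumulate in one file, which helps when comparing a failed start with a later successful one.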

Stop wasting other people's time, and apply some of your own intelligence and time.

Last fiddled with by kriesel on 2021-06-30 at 20:40
Old 2021-06-30, 21:10   #1676
Uncwilly
6809 > 6502
 
 
"""""""""""""""""""
Aug 2003
101×103 Posts

3·31·113 Posts

And MISFIT can handle all of the work filling a worktodo and submitting results.
https://www.mersenneforum.org/forumdisplay.php?f=103

Last fiddled with by Uncwilly on 2021-06-30 at 21:10
Old 2021-06-30, 21:23   #1677
birtwistlecaleb
 
Jun 2021

2³·3·5 Posts

Quote:
Originally Posted by Viliam Furik View Post
Yes. Add the file and fill it with work. If it's empty but present, the program will still quit on start.
Thanks! I got it working now.
Here is the output I got, in case you spot anything else that went wrong.
Code:
got assignment: exp=219633173 bit_min=73 bit_max=74 (8.71 GHz-days)
Starting trial factoring M219633173 from 2^73 to 2^74 (8.71Ghz-days)
Using GPU kernel "[Just to be safe, censored.]"
No further lines appeared for a couple of minutes, and none have appeared since.
Edit: No lines for around 30 minutes. Does it only print a line when it finishes a TF assignment or one bit level?

Last fiddled with by birtwistlecaleb on 2021-06-30 at 22:03
Old 2021-08-24, 04:44   #1678
DrobinsonPE
 
Aug 2020

87₁₆ Posts
Ryzen 5 5600G

Code:
MFAKTO, AMD Ryzen 5 5600G, ASROCK B450-HDV R4.0, DDR4-3600 RAM
----------------------------------------------------------------
Selftest statistics
  number of tests           34026
  successful tests          34026

selftest PASSED!


C:\Users\User\mfakto>mfakto
mfakto 0.15pre7-MGW (64bit build)


Runtime options
  Inifile                   mfakto.ini
  Verbosity                 1
  SieveOnGPU                yes
  MoreClasses               yes
  GPUSievePrimes            81157
  GPUSieveProcessSize       24 Kib
  GPUSieveSize              96 Mib
  FlushInterval             0
  WorkFile                  worktodo.txt
  ResultsFile               results.txt
  Checkpoints               enabled
  CheckpointDelay           300 s
  Stages                    enabled
  StopAfterFactor           class
  PrintMode                 compact
  V5UserID                  none
  ComputerID                none
  TimeStampInResults        yes
  VectorSize                2
  GPUType                   AUTO
  SmallExp                  no
  UseBinfile                mfakto_Kernels.elf
Compiletime options

Select device - Get device info:
WARNING: Unknown GPU name, assuming GCN. Please post the device name "gfx90c (Advanced Micro Devices, Inc.)" to http://www.mersenneforum.org/showthread.php?t=15646 to have it added to mfakto. Set GPUType in mfakto.ini to select a GPU type yourself to avoid this warning.

OpenCL device info
  name                      gfx90c (Advanced Micro Devices, Inc.)
  device (driver) version   OpenCL 2.0 AMD-APP (3276.6) (3276.6 (PAL,HSAIL))
  maximum threads per block 1024
  maximum threads per grid  1073741824
  number of multiprocessors 7 (448 compute elements)
  clock rate                1900 MHz

Automatic parameters
  threads per grid          0
  optimizing kernels for    GCN

Loading binary kernel file mfakto_Kernels.elf
Compiling kernels.
  GPUSievePrimes (adjusted) 81206
  GPUsieve minimum exponent 1037054
Started a simple selftest ...
Selftest statistics
  number of tests           30
  successful tests          30

selftest PASSED!

got assignment: exp=103399837 bit_min=75 bit_max=76 (74.00 GHz-days)
Starting trial factoring M103399837 from 2^75 to 2^76 (74.00 GHz-days)
Using GPU kernel "cl_barrett15_82_gs_2"
Date    Time | class   Pct |   time     ETA | GHz-d/day    Sieve     Wait
Aug 23 20:51 |   24   0.6% | 40.947  10h51m |    162.66    81206    0.00%
mfakto will exit once the current class is finished.
press ^C again to exit immediately
Aug 23 20:52 |   27   0.7% | 40.947  10h50m |    162.66    81206    0.00%
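The ETA column in the status lines above can be roughly reproduced from the percentage done and the time per class. The Python sketch below is an approximation under the assumption that 960 classes are tested per assignment (the figure the perftest output later in this thread prints as "960 times per test"). The function names are hypothetical, and mfakto's own ETA calculation may differ in detail.

```python
def eta_seconds(pct_done, secs_per_class, total_classes=960):
    """Estimate remaining run time in seconds from a status line.

    total_classes=960 is an assumption, not a value read from the program.
    """
    remaining_classes = total_classes * (1.0 - pct_done / 100.0)
    return remaining_classes * secs_per_class


def format_eta(seconds):
    """Format seconds as e.g. '10h51m', truncating to whole minutes."""
    hours, rem = divmod(int(seconds), 3600)
    return f"{hours}h{rem // 60:02d}m"


if __name__ == "__main__":
    # Values taken from the two status lines above.
    print(format_eta(eta_seconds(0.6, 40.947)))
    print(format_eta(eta_seconds(0.7, 40.947)))
```

With the values from the two status lines (0.6% and 0.7% done at 40.947 s per class) this gives 10h51m and 10h50m, in line with the ETA column.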
Old 2021-08-28, 22:57   #1679
kracker
 
"Mr. Meeseeks"
Jan 2012
California, USA

3²·241 Posts

Not sure if numbers for RDNA2 have been posted here... haven't kept up.

RX 6700 XT (probably thermal throttling; the owner/tester said parts of the card hit 99 °C)
Code:
Resulting speed for M78000071:
bit_min - bit_max  GHz-days/day  kernelname
     60 -      69      1573.192  cl_barrett15_69_gs 
     69 -      70      1491.949  cl_barrett15_71_gs 
     70 -      73      1311.655  cl_barrett15_73_gs 
     73 -      74      1283.831  cl_barrett15_74_gs 
     74 -      76      1254.737  cl_barrett32_76_gs 
     76 -      77      1251.052  cl_barrett32_77_gs 
     77 -      81      1180.985  cl_barrett15_82_gs 
     81 -      87      1105.831  cl_barrett32_87_gs 
     87 -      88      1100.490  cl_barrett32_88_gs 
     88 -      92       957.069  cl_barrett32_92_gs
RX 6600 XT
Code:
Resulting speed for M78000071:
bit_min - bit_max  GHz-days/day  kernelname
     60 -      69      1251.068  cl_barrett15_69_gs  
     69 -      70      1185.451  cl_barrett15_71_gs  
     70 -      73      1040.123  cl_barrett15_73_gs  
     73 -      74      1018.516  cl_barrett15_74_gs  
     74 -      76      1015.305  cl_barrett32_76_gs  
     76 -      77      1012.099  cl_barrett32_77_gs  
     77 -      81       939.008  cl_barrett15_82_gs  
     81 -      87       893.350  cl_barrett32_87_gs  
     87 -      88       891.557  cl_barrett32_88_gs  
     88 -      92       776.045  cl_barrett32_92_gs
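To compare two "Resulting speed" tables like the ones above, a small Python sketch can parse the rows and compute per-range speed ratios. The names `parse_speed_table` and `speed_ratios` are hypothetical, and the regular expression only assumes the column layout visible in these two listings.

```python
import re

# Matches rows such as "     60 -      69      1573.192  cl_barrett15_69_gs".
ROW = re.compile(r"^\s*(\d+)\s*-\s*(\d+)\s+([\d.]+)\s+(\S+)\s*$")


def parse_speed_table(text):
    """Return {(bit_min, bit_max): GHz-days/day} for every data row in text."""
    rows = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            key = (int(match.group(1)), int(match.group(2)))
            rows[key] = float(match.group(3))
    return rows


def speed_ratios(table_a, table_b):
    """Return table_a / table_b for every bit range present in both tables."""
    return {key: table_a[key] / table_b[key] for key in table_a if key in table_b}
```

Fed the 60 to 69 rows from the two tables, the ratio comes out at roughly 1.26 in favor of the 6700 XT, bearing in mind the throttling caveat mentioned for that card.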
Old 2021-08-28, 23:27   #1680
James Heinrich
 
"James Heinrich"
May 2004
ex-Northern Ontario

3706₁₀ Posts

Quote:
Originally Posted by kracker View Post
Not sure if numbers for RDNA2 have been posted here...
I have not seen a single mfakto benchmark for any RX 6xxx yet. If someone who has one (of any kind) would like to submit a benchmark I would be most grateful:
https://www.mersenne.ca/mfaktc.php#benchmark
Old 2021-09-13, 17:01   #1681
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2⁶×101 Posts
Has anyone gotten an Iris Xe IGP working with mfakto on Windows?

Not me despite several tries. Ideas?
Old 2022-01-21, 03:39   #1682
Ethan (EO)
 
"Ethan O'Connor"
Oct 2002
GIMPS since Jan 1996

142₈ Posts

1770 GHz-d/day from a Radeon VII with mfakto 0.14:

Code:
got assignment: exp=5340017 bit_min=69 bit_max=70 (22.39 GHz-days)
Starting trial factoring M5340017 from 2^69 to 2^70 (22.39GHz-days)
  k_min = 55270967334660 - k_max = 110541934671501
Using GPU kernel "cl_barrett32_77_gs_4"


Date    Time | class   Pct |   time     ETA | GHz-d/day    Sieve     Wait
Jan 20 18:13 | 2163  46.8% |  1.139   9m43s |   1769.20    30005    0.00%
Code:


  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                                                                    
  718 root      30  10  719240 292672   5736 S 100.3  1.9  29:39.14 mprime                                                                                                                     
22047 root      20   0 32.425g 134184  86248 S   1.0  0.9   0:02.11 mfakto-x64
All default parameters except:

Code:
GPUType = APU
GPUSievePrimes = 30000
GPUSieveSize = 9
FlushInterval = 0
FlushInterval = 0 and GPUSieveSize = 10 had major impacts on speed (~80% combined increase)

CLinfo:
Code:
Number of platforms:				 1
  Platform Profile:				 FULL_PROFILE
  Platform Version:				 OpenCL 2.1 AMD-APP (3180.7)
  Platform Name:				 AMD Accelerated Parallel Processing
  Platform Vendor:				 Advanced Micro Devices, Inc.
  Platform Extensions:				 cl_khr_icd cl_amd_event_callback cl_amd_offline_devices 


  Platform Name:				 AMD Accelerated Parallel Processing
Number of devices:				 1
  Device Type:					 CL_DEVICE_TYPE_GPU
  Vendor ID:					 1002h
  Board name:					 AMD Radeon VII
  Device Topology:				 PCI[ B#7, D#0, F#0 ]
  Max compute units:				 60
  Max work items dimensions:			 3
    Max work items[0]:				 1024
    Max work items[1]:				 1024
    Max work items[2]:				 1024
  Max work group size:				 256
  Preferred vector width char:			 4
  Preferred vector width short:			 2
  Preferred vector width int:			 1
  Preferred vector width long:			 1
  Preferred vector width float:			 1
  Preferred vector width double:		 1
  Native vector width char:			 4
  Native vector width short:			 2
  Native vector width int:			 1
  Native vector width long:			 1
  Native vector width float:			 1
  Native vector width double:			 1
  Max clock frequency:				 1840Mhz
  Address bits:					 64
  Max memory allocation:			 14360458035
  Image support:				 Yes
  Max number of images read arguments:		 128
  Max number of images write arguments:		 64
  Max image 2D width:				 16384
  Max image 2D height:				 16384
  Max image 3D width:				 2048
  Max image 3D height:				 2048
  Max image 3D depth:				 2048
  Max samplers within kernel:			 16
  Max size of kernel argument:			 1024
  Alignment (bits) of base address:		 2048
  Minimum alignment (bytes) for any datatype:	 128
  Single precision floating point capability
    Denorms:					 No
    Quiet NaNs:					 Yes
    Round to nearest even:			 Yes
    Round to zero:				 Yes
    Round to +ve and infinity:			 Yes
    IEEE754-2008 fused multiply-add:		 Yes
  Cache type:					 Read/Write
  Cache line size:				 64
  Cache size:					 16384
  Global memory size:				 17163091968
  Constant buffer size:				 14360458035
  Max number of constant args:			 8
  Local memory type:				 Scratchpad
  Local memory size:				 65536
  Max pipe arguments:				 16
  Max pipe active reservations:			 16
  Max pipe packet size:				 1475556147
  Max global variable size:			 12924412160
  Max global variable preferred total size:	 17163091968
  Max read/write image args:			 64
  Max on device events:				 1024
  Queue on device max size:			 8388608
  Max on device queues:				 1
  Queue on device preferred size:		 262144
  SVM capabilities:				 
    Coarse grain buffer:			 Yes
    Fine grain buffer:				 Yes
    Fine grain system:				 No
    Atomics:					 No
  Preferred platform atomic alignment:		 0
  Preferred global atomic alignment:		 0
  Preferred local atomic alignment:		 0
  Kernel Preferred work group size multiple:	 64
  Error correction support:			 0
  Unified memory for Host and Device:		 0
  Profiling timer resolution:			 1
  Device endianess:				 Little
  Available:					 Yes
  Compiler available:				 Yes
  Execution capabilities:				 
    Execute OpenCL kernels:			 Yes
    Execute native function:			 No
  Queue on Host properties:				 
    Out-of-Order:				 No
    Profiling :					 Yes
  Queue on Device properties:				 
    Out-of-Order:				 Yes
    Profiling :					 Yes
  Platform ID:					 0x7fae26339f30
  Name:						 gfx906
  Vendor:					 Advanced Micro Devices, Inc.
  Device OpenCL C version:			 OpenCL C 2.0 
  Driver version:				 3180.7 (PAL,HSAIL)
  Profile:					 FULL_PROFILE
  Version:					 OpenCL 2.0 AMD-APP (3180.7)
  Extensions:					 cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_khr_gl_depth_images cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_image2d_from_buffer cl_khr_subgroups cl_khr_gl_event cl_khr_depth_images cl_khr_mipmap_image cl_khr_mipmap_image_writes cl_amd_copy_buffer_p2p
amdinfo:
Code:
Thu Jan 20 18:21:28 PST 2022

=== GPU 0, 07:00.0 Radeon VII 16368 MB ===
  Bios: 113-D3600200-106, UUID: T2CY67L113160501
  Core: 1806 MHz 981mV, Mem: 1100 MHz, REF: 7500
  PerfCtrl: manual, Load: 100%, MemLoad: 0%, Power: 176.0 W, Cap: 275 W
  Core: 68°C, HotSpot: 93°C, Mem: 67°C, Fan: 26%, RPM: 1072
  Core state: 8, clocks: 700 808 1146 1394 1574 1717 1785 1810 1840*
  Mem  state: 2, clocks: 350 800 1100*
  SOC  state: 7, clocks: 309 523 566 618 680 755 850 971*
  DCEF state: 0, clocks: 357* 453 566 680 755 850 971 1133
  F    clocks: 550 610 690 760 870 960 1080 1225
  PCIE Link speed: GEN2 (5.0GT/s), PCIE Link width: x4
  Memory total: 16368.00 MB, used: 89.24 MB, free: 16278.76 MB, type: Hynix HBM2
  VDDGfx: 1000mV, VDDCR_SOC: 931mV, VDDCI_MEM: 850mV, VDDIO_MEM: 1218mV, VDDCR_HBM: 1218mV
--perftest:
Code:
Runtime options
  Inifile                   mfakto.ini
  Verbosity                 1
  SieveOnGPU                yes
  MoreClasses               yes
  GPUSievePrimes            30000
  GPUSieveProcessSize       24Ki bits
  GPUSieveSize              9Mi bits
  FlushInterval             0
  WorkFile                  worktodo.txt
  ResultsFile               results.txt
  Checkpoints               enabled
  CheckpointDelay           300s
  Stages                    enabled
  StopAfterFactor           class
  PrintMode                 compact
  V5UserID                  none
  ComputerID                none
  TimeStampInResults        yes
  VectorSize                4
  GPUType                   APU
  SmallExp                  no
  UseBinfile                mfakto_Kernels.elf
Select device - Get device info - Compiling kernels.


Perftest

Generate list of the first 1000000 primes: 1807.24 ms

Generate list of the first 1075766 primes for GPU sieving: 266.84 ms

1. CPU-Sieve-Init (once per class, 960 times per test, avg. for 3 iterations)
	Init_class(sieveprimes=   5000):     0.83 ms
	Init_class(sieveprimes=  20000):     3.50 ms
	Init_class(sieveprimes=  80000):    15.30 ms
	Init_class(sieveprimes= 200000):    40.62 ms
	Init_class(sieveprimes= 500000):   107.34 ms
	Init_class(sieveprimes=1000000):   224.55 ms

2. CPU-Sieve (output rate M/s)
Sieve size is fixed at compile time, cannot test with variable sizes. Just running 3 fixed tests.

SievePrimes:     254     396     611     945    1460    2257    3487    5389    8328   12871   19890   30738   47503   73411  113449  175323  270944  418716  647083 1000000
SieveSizeLimit
    36 kiB     480.4   432.8   390.9   354.5   320.1   291.8   266.6   242.9   220.2   198.9   175.7   158.6   132.9   106.5    85.5    66.8    55.8    44.7    34.6    25.7
    36 kiB     480.4   434.2   391.4   351.9   320.9   292.0   266.1   243.0   220.4   198.6   176.0   158.7   133.0   106.4    86.0    69.6    55.7    43.4    33.3    25.5
    36 kiB     480.5   433.4   391.8   353.8   320.3   292.1   264.3   242.5   220.7   198.7   175.7   158.3   133.0   106.4    86.2    69.7    55.9    44.6    34.5    25.3

Best SieveSizeLimit for
SievePrimes:     254     396     611     945    1460    2257    3487    5389    8328   12871   19890   30738   47503   73411  113449  175323  270944  418716  647083 1000000
at kiB:           36      36      36      36      36      36      36      36      36      36      36      36      36      36      36      36      36      36      36      36
max M/s:       480.5   434.2   391.8   354.5   320.9   292.1   266.6   243.0   220.7   198.9   176.0   158.7   133.0   106.5    86.2    69.7    55.9    44.7    34.6    25.7
Survivors:    36.41%  34.07%  32.06%  30.27%  28.69%  27.28%  26.00%  24.83%  23.78%  22.82%  21.93%  21.11%  20.37%  19.67%  19.01%  18.40%  17.83%  17.30%  16.78%  16.32%
removal rate   839.3   840.4   830.3   816.5   797.5   778.7   758.9   735.5   707.2   672.7   626.7   593.1   520.2   434.8   367.4   309.1   257.7   213.6   171.5   132.0


3. Memory copy to GPU (blocks of 8388608 bytes)

  Standard copy, standard queue:
     240 MB in    0.1 ms (3355443.2 MB/s) (real)

  Standard copy, profiled queue:
     240 MB in    0.1 ms (4125544.9 MB/s) (real)
     240 MB in    0.0 ms (   inf MB/s) (profiled data)
       8 MB in    0.0 ms (   inf MB/s) (profiled data, peak)

  Standard copy, two queues:
     240 MB in    0.2 ms (1553445.9 MB/s) (real)

Reinitializing with gpu_sieving enabled.
Select device - Get device info - Compiling kernels.

4. GPU sieve, 3 iterations each

 gpusieve_init: 16.524000 ms (CPU work)
 gpusieve_init_exponent: 1.352000 ms (CalcModularInverses)
 gpusieve_init_class: 0.345333 ms (CalcBitToClear)
 gpusieve: 11.457333 ms (SegSieve)
  tf: 11.666667 ms = 11324.620800 M/s (raw rate, cl_barrett15_69_gs)

 GPU sieve raw rate (input rate M/s)
SievePrimes:      54     396     611     945    1460    2257    3487    5389    8328   12871   19890   30738   47503   73411  113449  175323  270944  418716  647083 1075766
GPUSieveSize  
     4 MBit   67144.7 55897.9 46651.4 47567.2 47719.0 44864.2 44350.6 33783.1 31516.4 28207.1 30356.8 21151.7 16961.0 13197.5  7731.4  3728.7  1846.4   797.4   409.8   183.5
     5 MBit   80432.8 58118.3 56611.8 52614.5 57999.1 54424.4 47021.3 49629.2 45024.7 41184.3 35061.6 28766.1 20894.9 16010.0  7669.4  4100.9  2163.0   984.1   511.4   228.5
     6 MBit   94965.4 76790.8 66693.9 69571.6 67049.3 63493.1 61039.8 56956.1 52881.7 46405.5 41147.5 32349.9 24912.5 15195.0  8105.8  4596.3  2482.9  1166.9   607.3   274.5
     7 MBit   98523.9 76711.0 78596.4 69931.2 75540.6 71109.0 60317.2 61720.8 52180.3 52202.8 46387.4 38451.9 25545.4 13105.0  8407.8  4872.6  2713.3  1309.7   709.0   318.8
     8 MBit   87351.0 75656.7 74923.1 66174.6 70852.2 68212.4 64932.1 57345.4 54967.2 48345.1 40175.3 38532.2 26071.8 14798.5  9123.0  5256.2  2973.4  1499.4   799.4   362.2
     9 MBit   94576.8 78769.9 69504.6 73223.2 68832.2 62923.9 66840.1 57760.5 58839.4 52060.7 47742.9 31783.0 22332.1 13762.6  8640.8  5569.4  3192.6  1670.4   893.1   409.5
    10 MBit   79487.8 72676.1 71520.9 70812.2 63945.7 59375.8 57689.5 51111.7 49184.0 41028.0 36663.5 25067.8 19034.7 11769.2  8343.4  5551.4  3344.9  1813.0   990.5   452.5
    12 MBit   47313.1 42530.0 39160.7 39331.2 37001.8 36336.5 37946.6 34166.5 30534.9 27125.4 24789.0 20286.5 16061.6 12119.5  8604.0  5930.3  3745.8  2101.3  1176.1   537.8
    16 MBit   22587.5 21808.5 21509.8 21477.9 21030.7 20776.5 19966.1 19902.4 19072.8 17849.3 16723.7 14571.1 12884.5 10701.1  8758.7  6492.2  4458.7  2578.7  1412.5   649.9
    20 MBit   15453.9 15185.0 15132.2 15221.4 15066.6 14882.8 14787.3 14682.8 14482.5 13661.7 13035.0 12573.4 11687.9 10430.4  9021.7  6990.8  4982.8  3089.2  1744.9   807.9
    24 MBit   12942.5 12791.0 12866.7 12805.2 12851.6 12730.2 12594.8 12655.6 12564.2 12325.6 12147.8 11651.2 11087.9 10117.6  9182.0  7465.4  5375.6  3427.5  1894.0   855.3
    36 MBit   11300.3 11422.1 11510.5 11520.6 11526.4 11503.4 11558.1 11513.8 11577.9 11554.3 11500.8 11152.0 10877.9 10332.0  9545.8  8228.9  6584.6  4368.2  2543.6  1223.9
    48 MBit   11570.8 11544.3 11555.6 11530.2 11582.2 11596.1 11618.3 11600.9 11642.0 11562.4 11576.5 11385.3 11221.2 10702.4 10117.7  8969.0  7350.4  5160.2  3032.2  1505.7
    96 MBit   11680.3 11685.6 11701.2 11464.8 11719.0 11770.2 11788.6 11806.0 11853.3 11784.6 11820.7 11733.9 11665.5 11400.5 11138.8 10382.2  9262.4  7109.1  4370.9  2269.4
   101 MBit   11708.2 11692.3 11713.3 11682.9 11745.7 11755.0 11780.7 11730.7 11770.4 11825.7 11861.9 11757.6 11718.3 11432.5 11189.0 10466.5  9412.2  7270.0  4564.3  2382.8
   102 MBit   11678.8 11687.7 11711.7 11705.0 11763.0 11769.1 11789.8 11784.5 11828.4 11809.3 11857.6 11683.7 11520.7 11409.1 11110.6 10543.9  9416.0  7304.5  4596.0  2406.8
   103 MBit   11706.2 11687.2 11703.6 11679.3 11744.7 11768.8 11775.1 11816.2 11823.8 11820.9 11860.6 11716.9 11739.0 11450.1 11170.4 10546.0  9420.6  7284.3  4435.2  2422.8
   104 MBit   11714.7 11703.0 11718.9 11709.3 11747.5 11777.2 11738.6 11821.8 11855.0 11839.9 11856.1 11751.1 11746.2 11430.0 11191.3 10524.9  9413.5  7247.2  4945.0  2689.0
   105 MBit   11489.0 11682.7 11718.2 11706.5 11750.7 11754.8 11781.6 11790.3 11850.9 11811.9 11827.1 11745.4 11719.1 11417.7 11171.5 10490.0  9425.0  7249.4  4989.1  2712.0
   106 MBit   11708.1 11692.9 11661.2 11520.2 11743.3 11766.2 11786.9 11805.0 11832.2 11795.0 11831.3 11734.9 11719.8 11414.8 11158.9 10547.9  9429.5  7206.8  4800.5  2595.6
   120 MBit   11746.6 11727.7 11711.1 11723.0 11718.8 11664.1 11821.2 11839.7 11870.8 11816.1 11877.3 11794.6 11767.6 11433.1 11319.4 10693.6  9725.4  7620.0  5295.4  2887.6
   121 MBit   11752.6 11740.9 11719.6 11724.0 11777.3 11539.4 11787.7 11840.9 11874.5 11857.7 11866.9 11779.4 11793.1 11513.9 11317.9 10738.1  9695.5  7585.6  5118.6  2781.0
   123 MBit   11728.8 11728.1 11723.2 11713.4 11755.0 11741.9 11599.1 11843.5 11884.1 11846.1 11886.1 11769.9 11786.1 11535.4 11336.3 10748.9  9791.2  7664.4  5145.4  2821.8
   124 MBit   11730.7 11714.8 11723.3 11716.5 11749.9 11738.0 11551.4 11820.3 11874.5 11843.5 11882.5 11774.2 11798.5 11477.8 11331.5 10763.4  9794.0  7646.6  5189.6  2829.4
   125 MBit   11740.0 11713.1 11726.4 11717.8 11672.1 11742.7 11807.8 11847.5 11870.3 11841.3 11870.7 11781.5 11807.0 11490.3 11310.9 10786.8  9820.4  7656.5  5224.7  2861.0
   126 MBit   11725.1 11709.6 11717.2 11731.4 11705.2 11688.9 11819.9 11851.9 11878.1 11835.8 11879.2 11787.2 11807.5 11542.7 11299.6 10796.9  9820.8  7731.2  5254.1  2874.5
   127 MBit   11725.7 11725.4 11711.1 11705.3 11602.0 11797.7 11817.2 11845.8 11857.8 11847.2 11879.4 11795.2 11793.5 11531.3 11354.6 10811.1  9795.0  7701.9  5240.1  2883.2
   128 MBit   11739.6 11725.8 11733.2 11648.0 11703.5 11797.5 11832.7 11858.5 11885.6 11843.9 11911.3 11792.9 11800.4 11565.2 11393.2 10789.1  9871.8  7716.1  5325.8  2925.0

Best GPUSieveSize for
SievePrimes:      54     310    1078    1078    1846    2614    3382    5686    8502   13622   19766   31030   47414   74038  113206  175670  270902  419382  647734 1075766
at MiB:            7       9       7       9       7       7       9       7       9       7       9       8       8       5     128     127     128     126     128     128
max M/s:     98523.9 78769.9 78596.4 73223.2 75540.6 71109.0 66840.1 61720.8 58839.4 52202.8 47742.9 38532.2 26071.8 16010.0 11393.2 10811.1  9871.8  7731.2  5325.8  2925.0
Survivors:    48.30%  35.57%  30.05%  30.05%  28.19%  27.10%  26.36%  24.98%  24.00%  22.95%  22.19%  21.33%  20.58%  19.85%  19.21%  18.58%  18.00%  17.46%  16.96%  16.42%
removal rate
  average:   50941.7 50750.5 54978.7 51219.1 54243.3 51836.2 49222.0 46304.9 44719.3 40223.8 37150.6 30312.8 20705.0 12831.3  9204.8  8802.2  8094.6  6381.0  4422.4  2444.7
  incremental:   n/a 49988.1 1970775.4   -15.2 -44339.6 13211.7  8290.9 11134.5 12340.0  4863.2  4252.2  1707.2   601.9   303.2   255.5  1323.5   657.6   192.2    85.7    35.3

5. mfakto_cl_71 kernel
  soon
6. barrett_79 kernel
  soon
7. barrett_92 kernel
  soon
Old 2022-01-21, 06:39   #1683
Ethan (EO)
 
"Ethan O'Connor"
Oct 2002
GIMPS since Jan 1996

2×7² Posts

Quote:
Originally Posted by Ethan (EO) View Post
1770 GHz-d/day from a Radeon VII with mfakto 0.14:
Okay, took the time to build 0.15pre8 and it's much faster:

Code:
got assignment: exp=5340017 bit_min=69 bit_max=70 (22.39 GHz-days)
Starting trial factoring M5340017 from 2^69 to 2^70 (22.39 GHz-days)
Using GPU kernel "cl_barrett32_76_gs_2"

Date    Time | class   Pct |   time     ETA | GHz-d/day    Sieve     Wait
Jan 20 22:21 | 3780  82.0% |  0.795   2m18s |   2534.74    99894    0.00%
None of the mfakto.ini changes I made for 0.14 were needed; this is with the default mfakto.ini values in 0.15pre8.

With a bit of exponent cherry-picking and some overclocking, we can hit 3200 GHz-d/day:

Code:
got assignment: exp=1140863 bit_min=69 bit_max=70 (104.80 GHz-days)
Starting trial factoring M1140863 from 2^69 to 2^70 (104.80 GHz-days)
Using GPU kernel "cl_barrett32_76_gs_2"

Date    Time | class   Pct |   time     ETA | GHz-d/day    Sieve     Wait
Jan 20 22:46 |   61   1.5% |  2.937  46m18s |   3211.48    81206    0.00%
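The GHz-d/day column can be cross-checked from the assignment size and the time per class. The Python sketch below is a back-of-the-envelope calculation assuming 960 classes per assignment (the figure the perftest output in this thread prints). The function name is hypothetical and this is not necessarily how mfakto computes the value internally.

```python
SECONDS_PER_DAY = 86400


def ghz_days_per_day(assignment_ghz_days, secs_per_class, total_classes=960):
    """Estimate throughput from the assignment credit and the per-class time.

    total_classes=960 is an assumption, not a value read from the program.
    """
    total_seconds = secs_per_class * total_classes
    return assignment_ghz_days * SECONDS_PER_DAY / total_seconds


if __name__ == "__main__":
    # Values from the two runs above: (GHz-days, seconds per class).
    print(round(ghz_days_per_day(22.39, 0.795), 1))
    print(round(ghz_days_per_day(104.80, 2.937), 1))
```

For the two runs above this lands within about 0.1 GHz-d/day of the reported 2534.74 and 3211.48, so the small difference is most likely rounding in the displayed per-class time.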
perftest snippets [Default Clocks]:
Code:
Resulting speed for M2000093:
bit_min - bit_max  GHz-days/day  kernelname
     60 -      64      2424.561  cl_barrett15_69_gs  
     64 -      76      2779.709  cl_barrett32_76_gs  
     76 -      77      2575.103  cl_barrett32_77_gs  
     77 -      87      2454.118  cl_barrett32_87_gs  
     87 -      88      2271.072  cl_barrett32_88_gs  
     88 -      92      2152.565  cl_barrett32_92_gs 

Resulting speed for M39000037:
bit_min - bit_max  GHz-days/day  kernelname
     60 -      64      1918.657  cl_barrett15_69_gs  
     64 -      76      2226.954  cl_barrett32_76_gs  
     76 -      77      2054.627  cl_barrett32_77_gs  
     77 -      87      1952.592  cl_barrett32_87_gs  
     87 -      88      1799.562  cl_barrett32_88_gs  
     88 -      92      1695.062  cl_barrett32_92_gs 

Resulting speed for M66362159:
bit_min - bit_max  GHz-days/day  kernelname
     60 -      64      1919.665  cl_barrett15_69_gs  
     64 -      76      2227.070  cl_barrett32_76_gs  
     76 -      77      2018.324  cl_barrett32_77_gs  
     77 -      87      1952.919  cl_barrett32_87_gs  
     87 -      88      1800.070  cl_barrett32_88_gs  
     88 -      92      1695.632  cl_barrett32_92_gs 

Resulting speed for M74000077:
bit_min - bit_max  GHz-days/day  kernelname
     60 -      64      1858.822  cl_barrett15_69_gs  
     64 -      76      2150.142  cl_barrett32_76_gs  
     76 -      77      1977.225  cl_barrett32_77_gs  
     77 -      87      1882.914  cl_barrett32_87_gs  
     87 -      88      1731.277  cl_barrett32_88_gs  
     88 -      92      1632.379  cl_barrett32_92_gs 

Resulting speed for M78000071:
bit_min - bit_max  GHz-days/day  kernelname
     60 -      64      1845.433  cl_barrett15_69_gs  
     64 -      76      2143.972  cl_barrett32_76_gs  
     76 -      77      1977.783  cl_barrett32_77_gs  
     77 -      87      1878.918  cl_barrett32_87_gs  
     87 -      88      1732.445  cl_barrett32_88_gs  
     88 -      92      1629.175  cl_barrett32_92_gs  

Resulting speed for M332900047:
bit_min - bit_max  GHz-days/day  kernelname
     60 -      64      1706.641  cl_barrett15_69_gs  
     64 -      76      1990.833  cl_barrett32_76_gs  
     76 -      77      1835.323  cl_barrett32_77_gs  
     77 -      87      1740.026  cl_barrett32_87_gs  
     87 -      88      1603.641  cl_barrett32_88_gs  
     88 -      92      1505.604  cl_barrett32_92_gs  

Resulting speed for M999900079:
bit_min - bit_max  GHz-days/day  kernelname
     60 -      64      1662.079  cl_barrett15_69_gs  
     64 -      76      1928.562  cl_barrett32_76_gs  
     76 -      77      1771.161  cl_barrett32_77_gs  
     77 -      87      1684.428  cl_barrett32_87_gs  
     87 -      88      1546.434  cl_barrett32_88_gs  
     88 -      92      1454.844  cl_barrett32_92_gs 

Resulting speed for M2001862367:
bit_min - bit_max  GHz-days/day  kernelname
     60 -      64      1582.528  cl_barrett15_69_gs  
     64 -      76      1855.355  cl_barrett32_76_gs  
     76 -      77      1711.931  cl_barrett32_77_gs  
     77 -      87      1619.642  cl_barrett32_87_gs  
     87 -      88      1493.767  cl_barrett32_88_gs  
     88 -      92      1398.778  cl_barrett32_92_gs 

Resulting speed for M4201971233:
bit_min - bit_max  GHz-days/day  kernelname
     60 -      64      1554.085  cl_barrett15_69_gs  
     64 -      76      1805.688  cl_barrett32_76_gs  
     76 -      77      1656.103  cl_barrett32_77_gs  
     77 -      87      1574.842  cl_barrett32_87_gs  
     87 -      88      1444.049  cl_barrett32_88_gs  
     88 -      92      1357.261  cl_barrett32_92_gs

Last fiddled with by Ethan (EO) on 2022-01-21 at 06:48
All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
