mersenneforum.org

mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GPU Computing (https://www.mersenneforum.org/forumdisplay.php?f=92)
-   -   mfakto: an OpenCL program for Mersenne prefactoring (https://www.mersenneforum.org/showthread.php?t=15646)

DrobinsonPE 2020-10-07 02:51

[QUOTE=James Heinrich;559028]You can try running [c]mfakto --perftest[/c] and see if that happens to reveal anything interesting.[/QUOTE]

Thank you for the suggestion. I ran [c]mfakto --perftest[/c]. It produced a large amount of text; I went through it and did not see anything of interest. It first ran the sieve on the CPU, then recompiled to sieve on the GPU. There was a lot of raw data output, followed by a set of tables showing the speed of each kernel for a large range of exponents. I did not see any warnings or errors, so I can only assume that the program is working properly.

The CPU usage must have something to do with what LaurV said about the IGPU using the resources and bandwidth of the CPU or what you said about the older IGPU not having a needed hardware feature. It is still fun to see that mfakto can run and produce a little work on an old, half-broken, free laptop.

kracker 2020-10-09 23:35

My guess would be drivers. (mfakto on my i5-4670k also uses a full CPU core, while on an i3-8100 it uses almost no resources.)

James Heinrich 2020-10-11 19:20

[QUOTE=kracker;559378]mfakto on my ... i3-8100[/QUOTE]Thanks for posting that -- my "other" computer has an i3-8100 and I didn't know it could run mfakto. :redface:
Free extra 21 GHz-d/d, thanks! :smile:

Viliam Furik 2020-10-13 09:51

Radeon VII - poor performance against promised numbers
 
It was probably discussed earlier, but can anyone tell me, why can't I squeeze more than 1300 GHz-D/D (2.6 TFLOPS) from Radeon VII, despite it promising 13.8 TFLOPS (about 10 % more than RTX 2080Ti) of SP throughput?

Also in gpuOwl, only about 0.75 TFLOPS despite AMD promising 3.46 TFLOPS.

It seems to always be a fifth of what it should be according to the [URL="https://www.amd.com/en/products/graphics/amd-radeon-vii"]official AMD page[/URL].

James Heinrich 2020-10-13 10:42

[QUOTE=Viliam Furik;559728]It was probably discussed earlier, but can anyone tell me, why can't I squeeze more than 1300 GHz-D/D (2.6 TFLOPS) from Radeon VII, despite it promising 13.8 TFLOPS (about 10 % more than RTX 2080Ti) of SP throughput?[/QUOTE]I'm not sure how you're calculating FLOPS performance in mfakto/gpuowl?

The mfakto performance you quote seems to be in line with expected [url=https://www.mersenne.ca/mfaktc.php?filter=Radeon%20VII|RTX%202080%20Ti]mfakto performance[/url] (noting that my chart shows stock-clock performance). Also note AMD's somewhat deceptive practice of quoting "peak performance" numbers, which mean performance at the 1750 MHz boost clock instead of the 1400 MHz stock clock. The numbers on my pages are all stock-clock values.

kriesel 2020-10-13 12:58

[QUOTE=Viliam Furik;559728]It was probably discussed earlier, but can anyone tell me, why can't I squeeze more than 1300 GHz-D/D (2.6 TFLOPS) from Radeon VII, despite it promising 13.8 TFLOPS (about 10 % more than RTX 2080Ti) of SP throughput?

Also in gpuOwl, only about 0.75 TFLOPS despite AMD promising 3.46 TFLOPS.

It seems to always be a fifth of what it should be according to the [URL="https://www.amd.com/en/products/graphics/amd-radeon-vii"]official AMD page[/URL].[/QUOTE]If you're using mfakto as the measure of sp performance, note that mfakto & mfaktc use int32, not fp32.

Also, the AMD spec sheet says "peak". Gpuowl is memory-bandwidth constrained, per Mihai, which would make it difficult to sustain the peak rate.

[URL]https://www.amd.com/en/products/graphics/amd-radeon-vii[/URL]

[URL]https://www.techpowerup.com/gpu-specs/radeon-vii.c3358[/URL]

LaurV 2020-10-13 14:19

[QUOTE=kriesel;559734]If you're using mfakto as the measure of sp performance, note that mfakto & mfaktc use int32, not fp32[/QUOTE]
This!
Radeon VII has a poor integer performance.

axn 2020-10-13 14:21

[QUOTE=James Heinrich;559730]I'm not sure how you're calculating FLOPS performance in mfakto/gpuowl?[/QUOTE]
He's using standard Primenet conversion 1Gd/d = 2GFLOPS
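The arithmetic behind the question can be reproduced with a minimal sketch. It assumes the rule of thumb quoted above (1 GHz-d/d = 2 GFLOPS); the function and constant names are made up for illustration, and the input figures are the ones Viliam Furik posted. Whether the rule is meaningful for int32 trial-factoring kernels is a separate question.

```python
# Rule of thumb quoted in the thread; an assumption, not a measured value.
GFLOPS_PER_GHZ_DAY_PER_DAY = 2.0


def ghzd_per_day_to_tflops(ghzd_per_day: float) -> float:
    """Convert a GHz-days/day rate to TFLOPS using the 1 GHz-d/d = 2 GFLOPS rule."""
    return ghzd_per_day * GFLOPS_PER_GHZ_DAY_PER_DAY / 1000.0


# Figures as posted: 1300 GHz-d/d in mfakto vs. 13.8 TFLOPS advertised,
# and 0.75 TFLOPS in gpuOwl vs. 3.46 TFLOPS advertised.
mfakto_tflops = ghzd_per_day_to_tflops(1300)
mfakto_fraction = mfakto_tflops / 13.8
gpuowl_fraction = 0.75 / 3.46

print(f"mfakto: {mfakto_tflops:.2f} TFLOPS, {mfakto_fraction:.0%} of the advertised figure")
print(f"gpuOwl: {gpuowl_fraction:.0%} of the advertised figure")
```

Both fractions come out at roughly a fifth, which matches the observation in the original question.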

kriesel 2020-10-27 01:23

lsgpu
 
A small utility to list the OpenCL platforms and devices on them and a bit of description.
See [URL="https://www.mersenneforum.org/showpost.php?p=488474&postcount=6"]https://www.mersenneforum.org/showpo...74&postcount=6[/URL]

Could help the occasional mfakto installation startup.

DrobinsonPE 2020-12-06 16:08

After switching to Windows 10, I got mfakto to run on my ASRock Deskmini A300W, AMD A8-9600, 16GB DDR-4, SSD.

[CODE]Selftest statistics
number of tests 34026
successful tests 34026

selftest PASSED!

mfakto 0.15pre7-MGW (64bit build)
OpenCL device info
name Bristol Ridge (Advanced Micro Devices, Inc.)
device (driver) version OpenCL 2.0 AMD-APP (2841.19) (2841.19)
maximum threads per block 1024
maximum threads per grid 1073741824
number of multiprocessors 6 (384 compute elements)
clock rate 900 MHz

Automatic parameters
threads per grid 0
optimizing kernels for GCN

Loading binary kernel file mfakto_Kernels.elf
Compiling kernels.
GPUSievePrimes (adjusted) 81206
GPUsieve minimum exponent 1037054
Started a simple selftest ...
Selftest statistics
number of tests 30
successful tests 30

selftest PASSED!

got assignment: exp=212335483 bit_min=73 bit_max=74 (9.01 GHz-days)
Starting trial factoring M212335483 from 2^73 to 2^74 (9.01 GHz-days)
Using GPU kernel "cl_barrett15_74_gs_2"
Date Time | class Pct | time ETA | GHz-d/day Sieve Wait
Dec 05 23:13 | 4617 100.0% | 11.685 0m00s | 69.39 81206 0.00%
no factor for M212335483 from 2^73 to 2^74 [mfakto 0.15pre7-MGW cl_barrett15_74_gs_2]
tf(): total time spent: 3h 6m 46.416s (69.46 GHz-days / day)
[/CODE]

Next up is a little tuning to see if it can get above 70 GHz-days / day.
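As a cross-check, the GHz-days/day figure in the last log line can be recomputed from the assignment size and the total time. This is only a sketch: the helper names are hypothetical, and the inputs (9.01 GHz-days, 3h 6m 46.416s) are copied from the log above.

```python
import re


def parse_duration(text: str) -> float:
    """Convert a duration such as '3h 6m 46.416s' to seconds."""
    seconds_per_unit = {"h": 3600.0, "m": 60.0, "s": 1.0}
    return sum(
        float(value) * seconds_per_unit[unit]
        for value, unit in re.findall(r"([\d.]+)\s*([hms])", text)
    )


def ghz_days_per_day(assignment_ghz_days: float, elapsed_seconds: float) -> float:
    """Scale the credit for one assignment to a full day (86400 s)."""
    return assignment_ghz_days * 86400.0 / elapsed_seconds


elapsed = parse_duration("3h 6m 46.416s")
rate = ghz_days_per_day(9.01, elapsed)
print(f"{elapsed:.3f} s elapsed, {rate:.2f} GHz-days/day")
```

The result agrees with the 69.46 GHz-days/day reported by mfakto to within rounding of the 9.01 GHz-days input.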

DrobinsonPE 2020-12-07 09:01

Here are the results of mfakto tuning on the ASRock Deskmini A300W, AMD A8-9600, Radeon R7 IGPU, 16GB DDR-4, SSD, Windows 10, mfakto 0.15pre7, Prime95 v30.3 b6 system.

[CODE]mfakto tuning.
AMD A8-9600, Radeon R7 IGPU, 16GB ram, SSD, Windows 10, mfakto 0.15pre7
exp=103122301 bit 73 to 74
Initial settings and speed.
GPUSieveProcessSize=24, GPUSieveSize=96, GPUSievePrimes=81157, 67.97GHz-day
Final settings and speed.
GPUSieveProcessSize=32, GPUSieveSize=128, GPUSievePrimes=179766, 70.39GHz-day
Speed with prime95 also running a P-1 stage 2, 70.32GHz-day

Step 1, vary GPUSieveProcessSize
Possible values: 8, 16, 24, 32
# Also must divide GPUSieveSize * 1024
# Default: GPUSieveProcessSize=24

GPUSieveProcessSize=8 64.93GHz-day
GPUSieveProcessSize=16 67.10GHz-day
GPUSieveProcessSize=24 67.97GHz-day
GPUSieveProcessSize=32 68.68GHz-day *

Step 2: vary GPUSieveSize with GPUSieveProcessSize=32
# Minimum: GPUSieveSize=4
# Maximum: GPUSieveSize=128
# Default: GPUSieveSize=96
GPUSieveSize=32 68.36GHz-day
GPUSieveSize=64 68.54GHz-day
GPUSieveSize=96 68.68GHz-day
GPUSieveSize=128 68.70GHz-day *

Step 3: vary GPUSievePrimes with GPUSieveSize=128, GPUSieveProcessSize=32
# Minimum: GPUSievePrimes=54
# Maximum: GPUSievePrimes=1075766
# Default: GPUSievePrimes=81157
GPUSievePrimes=21814 62.88GHz-day
GPUSievePrimes=67894 68.35GHz-day
GPUSievePrimes=81157 68.70GHz-day
GPUSievePrimes=99894 69.10GHz-day
GPUSievePrimes=120374 69.52GHz-day
GPUSievePrimes=139830 69.84GHz-day
GPUSievePrimes=160310 70.14GHz-day
GPUSievePrimes=179766 70.39GHz-day*
GPUSievePrimes=200246 70.22GHz-day
[/CODE]
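The tuning steps above amount to a one-parameter-at-a-time sweep. A minimal sketch of the bookkeeping follows; the measurements are copied from Step 3, the divisibility rule is the one quoted in the comments of the listing, and the names (`sweep`, `process_size_is_valid`) are hypothetical.

```python
# GPUSievePrimes -> measured GHz-days/day, copied from Step 3 above.
sweep = {
    21814: 62.88, 67894: 68.35, 81157: 68.70, 99894: 69.10, 120374: 69.52,
    139830: 69.84, 160310: 70.14, 179766: 70.39, 200246: 70.22,
}

# Pick the setting with the highest measured throughput.
best_primes = max(sweep, key=sweep.get)

# Overall gain from the initial settings (67.97) to the final ones (70.39).
gain_percent = (70.39 - 67.97) / 67.97 * 100.0


def process_size_is_valid(process_size: int, sieve_size: int) -> bool:
    """GPUSieveProcessSize must be 8, 16, 24 or 32 and divide GPUSieveSize * 1024."""
    return process_size in (8, 16, 24, 32) and (sieve_size * 1024) % process_size == 0


print(f"best GPUSievePrimes: {best_primes} ({sweep[best_primes]} GHz-days/day)")
print(f"total gain from tuning: {gain_percent:.1f}%")
print("32/128 valid:", process_size_is_valid(32, 128))
```

On this hardware the whole sweep is worth only a few percent, so the defaults are already close to the best settings measured.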

Dylan14 2020-12-10 06:07

I have created a PKGBUILD for this software for use on Arch. You can find it here: [url]https://aur.archlinux.org/packages/mfakto/[/url]

kriesel 2020-12-10 15:13

[QUOTE=axn;559745]He's using standard Primenet conversion 1Gd/d = 2GFLOPS[/QUOTE]
Hmm, wouldn't that be based on Core2Duo instruction set and performance? By definition? [url]https://www.mersenneforum.org/showpost.php?p=533167&postcount=4[/url]

It might be considered standard, but it seems to me far from valid for application to a gpu with a completely different architecture and instruction set and its own performance constraints.

axn 2020-12-10 18:15

[QUOTE=kriesel;565865]It might be considered standard, but it seems to me far from valid for application to a gpu with a completely different architecture and instruction set and its own performance constraints.[/QUOTE]
As long as the algorithm used is same-ish (i.e. IBDWT), the architectural differences are irrelevant.

Given that the algorithm is the same, 1 iteration at a given FFT size takes x floating point operations and will get you y GHzD credit. This won't change because of the processor. The only thing that is affected by the processor is the time to complete that iteration; the faster a processor completes an iteration, the higher its GHzD/d rating, but also its GFLOPS rating. But the conversion factor between these two doesn't change and will remain independent of the processor.
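The argument can be written out as a short derivation. The symbols x and y follow the post; t (seconds per iteration on a given processor) is introduced here purely for illustration.

```latex
\[
\text{GFLOPS} = \frac{x}{10^{9}\, t},
\qquad
\text{GHzD/d} = \frac{86400\, y}{t},
\qquad
\frac{\text{GFLOPS}}{\text{GHzD/d}} = \frac{x}{86400 \cdot 10^{9}\, y}.
\]
```

The iteration time t cancels in the ratio, so the conversion factor depends only on the algorithm (through x and y) and not on the processor. This holds only as long as x and y describe the same kind of work.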

kriesel 2020-12-10 19:16

Mfakto is performing TF, by a variety of kernels, not FFT multiplication by IBDWT.
FFT almost always uses DP. TF kernels may use SP or int. Different hardware designs have different ratios among them. Gpus have vastly different DP/SP or I think DP/int32 ratios than cpus.

VBCurtis 2020-12-10 19:40

So what?
Each TF bit takes some specific amount of work. You're complaining about how we measure work done, but not suggesting some alternative.

axn 2020-12-11 02:45

[QUOTE=kriesel;565888]Mfakto is performing TF, by a variety of kernels, not FFT multiplication by IBDWT.
FFT almost always uses DP. TF kernels may use SP or int. Different hardware designs have different ratios among them. Gpus have vastly different DP/SP or I think DP/int32 ratios than cpus.[/QUOTE]

Well, whoops. I guess I don't remember the context of posts from two months ago that well :-(

All good points; you can inform OP to take those into consideration and adjust their calculations.

DrobinsonPE 2020-12-20 16:54

GB-BRi5H-8250, i5-8250U, UHD 620, 16GB DDR-4, SSD, Windows 10.

mfakto 0.15pre7

[CODE]C:\Users\user\mfakto\015pre7>mfakto -st

Selftest statistics
number of tests 34026
successful tests 33288
no factor found 738

selftest FAILED![/CODE]

I need to find what exponent levels it is failing on and see if there is just a range to avoid.

[CODE]C:\Users\user\mfakto\015pre7>mfakto
mfakto 0.15pre7-MGW (64bit build)
OpenCL device info
name Intel(R) UHD Graphics 620 (Intel(R) Corporation)
device (driver) version OpenCL 2.1 NEO (27.20.100.8681)
maximum threads per block 256
maximum threads per grid 16777216
number of multiprocessors 24 (24 compute elements)
clock rate 1100 MHz
Automatic parameters
threads per grid 0
optimizing kernels for INTEL
selftest PASSED!
got assignment: exp=115746439 bit_min=73 bit_max=74 (16.53 GHz-days)
Starting trial factoring M115746439 from 2^73 to 2^74 (16.53 GHz-days)
Using GPU kernel "cl_barrett32_76_gs_2"
Date Time | class Pct | time ETA | GHz-d/day Sieve Wait
Dec 20 08:18 | 0 0.1% | 60.505 16h07m | 24.58 81206 0.00%
Dec 20 08:19 | 5 0.2% | 60.985 16h13m | 24.39 81206 0.00%
Dec 20 08:20 | 9 0.3% | 61.054 16h13m | 24.36 81206 0.00%
Dec 20 08:21 | 12 0.4% | 61.219 16h15m | 24.30 81206 0.00%
Dec 20 08:22 | 17 0.5% | 61.204 16h14m | 24.30 81206 0.00%[/CODE]
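The status lines can be picked apart with a small parser, and the ETA column can be sanity-checked from the per-class time. This is a sketch, not part of mfakto: the regular expression and function names are hypothetical, and the default of 960 classes per assignment is an assumption taken from the --perftest output posted later in this thread ("once per class, 960 times per test").

```python
import re

# Matches "| class pct% | time ETA | GHz-d/day" in a status line.
STATUS = re.compile(
    r"\|\s*(\d+)\s+([\d.]+)%\s*\|\s*([\d.]+)\s+(\S+)\s*\|\s*([\d.]+)"
)


def parse_progress(line: str):
    """Return the fields of one status line, or None for headers and other text."""
    match = STATUS.search(line)
    if match is None:
        return None
    return {
        "class": int(match.group(1)),
        "percent": float(match.group(2)),
        "class_seconds": float(match.group(3)),
        "eta": match.group(4),
        "ghzd_per_day": float(match.group(5)),
    }


def remaining_hours(percent_done: float, class_seconds: float, classes: int = 960) -> float:
    """Estimate hours left, assuming `classes` classes of equal duration."""
    return (1.0 - percent_done / 100.0) * classes * class_seconds / 3600.0


row = parse_progress("Dec 20 08:18 | 0 0.1% | 60.505 16h07m | 24.58 81206 0.00%")
hours_left = remaining_hours(row["percent"], row["class_seconds"])
print(row)
print(f"estimated time left: {hours_left:.2f} h (mfakto printed {row['eta']})")
```

For the first status line the estimate lands within a minute of the 16h07m that mfakto printed.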

DrobinsonPE 2020-12-22 15:34

[QUOTE=DrobinsonPE;566771]GB-BRi5H-8250, i5-8250U, UHD 620, 16GB DDR-4, SSD, Windows 10.

mfakto 0.15pre7

[CODE]C:\Users\user\mfakto\015pre7>mfakto -st

Selftest statistics
number of tests 34026
successful tests 33288
no factor found 738

selftest FAILED![/CODE]

I need to find what exponent levels it is failing on and see if there is just a range to avoid.
[/QUOTE]

All 738 -st errors are in the 61-62, 62-63, and 63-64 ranges.
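To put the failure count in proportion, the selftest summary can be parsed and checked for consistency. A minimal sketch, assuming the summary has the three labelled lines shown in the quoted output; the function name and dictionary keys are hypothetical.

```python
import re

OUTPUT = """Selftest statistics
number of tests 34026
successful tests 33288
no factor found 738

selftest FAILED!"""


def parse_selftest(text: str) -> dict:
    """Extract the counters from an mfakto selftest summary (0 if a line is absent)."""
    labels = {
        "number of tests": "total",
        "successful tests": "passed",
        "no factor found": "missed",
    }
    stats = {}
    for label, key in labels.items():
        match = re.search(rf"{label}\s+(\d+)", text)
        stats[key] = int(match.group(1)) if match else 0
    return stats


stats = parse_selftest(OUTPUT)
failure_percent = stats["missed"] / stats["total"] * 100.0
print(stats, f"{failure_percent:.2f}% of tests missed a known factor")
```

The counters add up (33288 + 738 = 34026), and the missed factors are a little over 2% of all tests.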

DrobinsonPE 2020-12-22 15:38

I5-4570T with HD4600 Graphics

mfakto 0.15pre7

[CODE]C:\Users\user\mfakto>mfakto
mfakto 0.15pre7-MGW (64bit build)
Starting trial factoring M115801657 from 2^73 to 2^74 (16.52 GHz-days)
Using GPU kernel "cl_barrett32_76_gs_2"
Date Time | class Pct | time ETA | GHz-d/day Sieve Wait
Dec 22 06:48 | 12 0.4% | 89.923 23h52m | 16.53 81206 0.00%[/CODE]

birtwistlecaleb 2021-06-30 18:17

[QUOTE=James Heinrich;531269]Perhaps this one?
[url]https://download.mersenne.ca/mfakto/mfakto-0.12-hd4000[/url]

edit: Although based on what Ken says below, you're likely fine with the normal latest-version (0.15-pre6):
[url]https://download.mersenne.ca/mfakto/mfakto-0.15pre6[/url][/QUOTE]
They both do not have a worktodo.txt, and it seems like they are broken because of that.

James Heinrich 2021-06-30 18:25

[QUOTE=birtwistlecaleb;582324]They both do not have a worktodo.txt, and it seems like they are broken because of that.[/QUOTE]What do you mean? No program will come with [c]worktodo.txt[/c], that's what you supply with the assignments you're working on.
What error message(s) do you see when you run whichever version it is you're running?

birtwistlecaleb 2021-06-30 19:49

[QUOTE=James Heinrich;582326]What do you mean? No program will come with [c]worktodo.txt[/c], that's what you supply with the assignments you're working on.
What error message(s) do you see when you run whichever version it is you're running?[/QUOTE]
They both instantly close when they find that, so I can't see that. Am I supposed to manually add the file?

Viliam Furik 2021-06-30 19:58

[QUOTE=birtwistlecaleb;582335]They both instantly close when they find that, so I can't see that. Am I supposed to manually add the file?[/QUOTE]

Yes. Add the file and fill it with work. If it's empty but present, the program will still quit on start.
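A quick way to tell a missing, empty, or populated work file apart before starting the program is sketched below. It assumes the `Factor=[assignment ID],exponent,bit_min,bit_max` line format shown in the mfakto readme excerpt in this thread; the function name is hypothetical, and the sketch only reads the last three comma-separated fields.

```python
from pathlib import Path


def read_assignments(path: str) -> list:
    """Return (exponent, bit_min, bit_max) tuples from Factor= lines.

    A missing file and a file without usable Factor= lines both give [].
    """
    work_file = Path(path)
    if not work_file.is_file():
        return []
    assignments = []
    for line in work_file.read_text().splitlines():
        line = line.strip()
        if not line.startswith("Factor="):
            continue  # skip blank lines and anything that is not a TF assignment
        fields = line[len("Factor="):].split(",")
        try:
            exponent, bit_min, bit_max = (int(field) for field in fields[-3:])
        except ValueError:
            continue  # too few fields or non-numeric values
        assignments.append((exponent, bit_min, bit_max))
    return assignments


if __name__ == "__main__":
    jobs = read_assignments("worktodo.txt")
    print(f"{len(jobs)} trial-factoring assignment(s) found")
```

If this reports zero assignments, the program has nothing to do, which matches the "quits on start" behaviour described above.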

James Heinrich 2021-06-30 20:01

[QUOTE=birtwistlecaleb;582335]They both instantly close when they find that[/QUOTE]Especially when debugging, it's very useful to open a command prompt at the location of the file you're trying to run and run it from there, rather than just double-clicking the executable. That way you can see what output it is giving.

kriesel 2021-06-30 20:24

[QUOTE=birtwistlecaleb;582324]They both do not have a worktodo.txt, and it seems like [B]they are broken [/B]because of that.[/QUOTE]Something's malfunctioning, but it's clearly happening elsewhere than the software. Check between the furniture (chair or whatever) and the keyboard; do whatever maintenance needed there.

Use the readme that comes included with the software. Read it. Understand it. Follow its plain directions.

Mfaktc:[CODE]################################
# 3.1 Running mfaktc (Windows) #
################################

Similar to Linux (read above!).
[B]Open a command window[/B] and run 'mfaktc.exe -h'.


####################################################################
# 4 How to get work and report results from/to the Primenet server #
####################################################################

Getting work:
Step 1) go to http://www.mersenne.org/ and login with your username and
password
Step 2) on the menu on the left click "Manual Testing" and then
"Assignments"
Step 3) choose the number of assignments by choosing
"Number of CPUs (cores) you need assignments for (maximum 12)"
and "Number of assignments you want for each core"
Step 4) Change "Preferred work type" to "Trial factoring"
Step 5) click the button "Get Assignments"
Step 6) [B]copy&paste the "Factor=..." lines directly into the worktodo.txt
[/B][B] in your mfaktc directory[/B][/CODE]
Mfakto is similar:
[CODE]Open a terminal window and run 'mfakto -h' for possible parameters. You may
also want to check mfakto.ini for additional settings. mfakto typically fetches
work from worktodo.txt as specified in the INI file. See section 3 on how to
obtain assignments and report results.

A typical worktodo.txt file looks like this:
-- begin example --
Factor=[assignment ID],66362159,64,68
Factor=[assignment ID],3321932899,76,77
-- end example --[/CODE]

[CODE]
########################################
# 3 Getting work and reporting results #
########################################

You must have a PrimeNet account to participate. Simply visit the GIMPS website
at https://mersenne.org to create one. Once you've signed up, you can get
assignments in several ways.

From the GIMPS website:
Step 1) log in to the GIMPS website with your username and password
Step 2) on the menu bar, select Manual Testing > Assignments
Step 3) open the link to the manual GPU assignment request form
Step 4) enter the number of assignments or GHz-days you want
Step 5) click "Get Assignments"[/CODE]...[CODE] Once you have your assignments, copy the "Factor=..." lines directly into
your worktodo.txt file. Start mfakto, sit back and let it do its job.
Running mfakto is also a great way to stress test your GPU. ;-)[/CODE]


Use the [URL="https://www.mersenneforum.org/showthread.php?t=23394"]mfakto[/URL] or mfaktc [URL="https://www.mersenneforum.org/showpost.php?p=488518&postcount=1"]reference info[/URL]. "Create a worktodo file and put some assignments in there. Start with few, in case your gpu or igp does not work out. Get the type you plan to run the most. Get them from [URL]https://www.mersenne.org/manual_gpu_assignment/[/URL]"
"Create a Windows batch file or Linux shell script with a short name.
Set the device number there.
Consider redirecting console output to a file or employing a good tee program."
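Where no tee program is at hand, the same effect can be approximated with a few lines of Python. This is a generic sketch, not something shipped with mfakto or mfaktc: `run_and_tee` is a hypothetical helper that runs whatever command it is given, echoes its console output, and appends it to a log file.

```python
import subprocess
import sys


def run_and_tee(command: list, log_path: str) -> int:
    """Run `command`, copy its output to the console and to `log_path`.

    stderr is merged into stdout so that error messages end up in the log too.
    Returns the exit code of the child process.
    """
    with open(log_path, "a", encoding="utf-8") as log:
        process = subprocess.Popen(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
            bufsize=1,  # line-buffered, so output shows up as it is produced
        )
        for line in process.stdout:
            sys.stdout.write(line)
            log.write(line)
            log.flush()
        return process.wait()
```

It would be called with the program's command line as a list, for example `run_and_tee(["mfakto", "-st"], "mfakto.log")`, and it keeps the output on screen even if the window would otherwise close immediately.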

Stop wasting other people's time, and apply some of your own intelligence and time.

Uncwilly 2021-06-30 21:10

And [URL="https://www.mersenneforum.org/showthread.php?t=22029"]MISFIT[/URL] can handle all of the work filling a worktodo and submitting results.
[url]https://www.mersenneforum.org/forumdisplay.php?f=103[/url]

birtwistlecaleb 2021-06-30 21:23

[QUOTE=Viliam Furik;582338]Yes. Add the file and fill it with work. If it's empty but present, the program will still quit on start.[/QUOTE]
Thanks! I got it working now.:bow wave:
Here is the output I got, in case you spot anything else going on.
[CODE]got assignment: exp=219633173 bit_min=73 bit_max=74 (8.71 GHz-days)
Starting trial factoring M219633173 from 2^73 to 2^74 (8.71Ghz-days)
Using GPU kernel "[Just to be safe, censored.]"[/CODE]
There were no more lines for a couple of minutes, and there are none now.
Edit: No lines for around 30 minutes. Does it only print a line when it finishes a TF assignment or a whole bit level?

DrobinsonPE 2021-08-24 04:44

Ryzen 5 5600G
 
[CODE]MFAKTO, AMD Ryzen 5 5600G, ASROCK B450-HDV R4.0, DDR4-3600 RAM
----------------------------------------------------------------
Selftest statistics
number of tests 34026
successful tests 34026

selftest PASSED!


C:\Users\User\mfakto>mfakto
mfakto 0.15pre7-MGW (64bit build)


Runtime options
Inifile mfakto.ini
Verbosity 1
SieveOnGPU yes
MoreClasses yes
GPUSievePrimes 81157
GPUSieveProcessSize 24 Kib
GPUSieveSize 96 Mib
FlushInterval 0
WorkFile worktodo.txt
ResultsFile results.txt
Checkpoints enabled
CheckpointDelay 300 s
Stages enabled
StopAfterFactor class
PrintMode compact
V5UserID none
ComputerID none
TimeStampInResults yes
VectorSize 2
GPUType AUTO
SmallExp no
UseBinfile mfakto_Kernels.elf
Compiletime options

Select device - Get device info:
WARNING: Unknown GPU name, assuming GCN. Please post the device name "gfx90c (Advanced Micro Devices, Inc.)" to http://www.mersenneforum.org/showthread.php?t=15646 to have it added to mfakto. Set GPUType in mfakto.ini to select a GPU type yourself to avoid this warning.

OpenCL device info
name gfx90c (Advanced Micro Devices, Inc.)
device (driver) version OpenCL 2.0 AMD-APP (3276.6) (3276.6 (PAL,HSAIL))
maximum threads per block 1024
maximum threads per grid 1073741824
number of multiprocessors 7 (448 compute elements)
clock rate 1900 MHz

Automatic parameters
threads per grid 0
optimizing kernels for GCN

Loading binary kernel file mfakto_Kernels.elf
Compiling kernels.
GPUSievePrimes (adjusted) 81206
GPUsieve minimum exponent 1037054
Started a simple selftest ...
Selftest statistics
number of tests 30
successful tests 30

selftest PASSED!

got assignment: exp=103399837 bit_min=75 bit_max=76 (74.00 GHz-days)
Starting trial factoring M103399837 from 2^75 to 2^76 (74.00 GHz-days)
Using GPU kernel "cl_barrett15_82_gs_2"
Date Time | class Pct | time ETA | GHz-d/day Sieve Wait
Aug 23 20:51 | 24 0.6% | 40.947 10h51m | 162.66 81206 0.00%
mfakto will exit once the current class is finished.
press ^C again to exit immediately
Aug 23 20:52 | 27 0.7% | 40.947 10h50m | 162.66 81206 0.00%[/CODE]

kracker 2021-08-28 22:57

Not sure if numbers for RDNA2 have been posted here... haven't kept up.

RX 6700 XT (probably thermal throttling; the owner/tester said parts of the card hit 99 °C)
[code]
Resulting speed for M78000071:
bit_min - bit_max GHz-days/day kernelname
60 - 69 1573.192 cl_barrett15_69_gs
69 - 70 1491.949 cl_barrett15_71_gs
70 - 73 1311.655 cl_barrett15_73_gs
73 - 74 1283.831 cl_barrett15_74_gs
74 - 76 1254.737 cl_barrett32_76_gs
76 - 77 1251.052 cl_barrett32_77_gs
77 - 81 1180.985 cl_barrett15_82_gs
81 - 87 1105.831 cl_barrett32_87_gs
87 - 88 1100.490 cl_barrett32_88_gs
88 - 92 957.069 cl_barrett32_92_gs
[/code]

RX 6600 XT
[code]
Resulting speed for M78000071:
bit_min - bit_max GHz-days/day kernelname
60 - 69 1251.068 cl_barrett15_69_gs
69 - 70 1185.451 cl_barrett15_71_gs
70 - 73 1040.123 cl_barrett15_73_gs
73 - 74 1018.516 cl_barrett15_74_gs
74 - 76 1015.305 cl_barrett32_76_gs
76 - 77 1012.099 cl_barrett32_77_gs
77 - 81 939.008 cl_barrett15_82_gs
81 - 87 893.350 cl_barrett32_87_gs
87 - 88 891.557 cl_barrett32_88_gs
88 - 92 776.045 cl_barrett32_92_gs
[/code]
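The two tables are easier to compare as per-kernel ratios. A small sketch using the figures exactly as posted; the variable names are hypothetical, and the RX 6700 XT numbers may be depressed by the suspected thermal throttling.

```python
# GHz-days/day per kernel for M78000071, copied from the two tables above.
RX_6700_XT = {
    "cl_barrett15_69_gs": 1573.192, "cl_barrett15_71_gs": 1491.949,
    "cl_barrett15_73_gs": 1311.655, "cl_barrett15_74_gs": 1283.831,
    "cl_barrett32_76_gs": 1254.737, "cl_barrett32_77_gs": 1251.052,
    "cl_barrett15_82_gs": 1180.985, "cl_barrett32_87_gs": 1105.831,
    "cl_barrett32_88_gs": 1100.490, "cl_barrett32_92_gs": 957.069,
}
RX_6600_XT = {
    "cl_barrett15_69_gs": 1251.068, "cl_barrett15_71_gs": 1185.451,
    "cl_barrett15_73_gs": 1040.123, "cl_barrett15_74_gs": 1018.516,
    "cl_barrett32_76_gs": 1015.305, "cl_barrett32_77_gs": 1012.099,
    "cl_barrett15_82_gs": 939.008, "cl_barrett32_87_gs": 893.350,
    "cl_barrett32_88_gs": 891.557, "cl_barrett32_92_gs": 776.045,
}

# Ratio of the larger card to the smaller one, kernel by kernel.
ratios = {kernel: RX_6700_XT[kernel] / RX_6600_XT[kernel] for kernel in RX_6700_XT}

for kernel, ratio in ratios.items():
    print(f"{kernel:20s} {ratio:.3f}")
```

In these runs the RX 6700 XT is about 23% to 26% faster than the RX 6600 XT on every kernel.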

James Heinrich 2021-08-28 23:27

[QUOTE=kracker;586766]Not sure if numbers for RDNA2 have been posted here...[/QUOTE]I have not seen a single mfakto benchmark for any RX 6xxx yet. If someone who has one (of any kind) would like to submit a benchmark I would be most grateful:
[url]https://www.mersenne.ca/mfaktc.php#benchmark[/url]

kriesel 2021-09-13 17:01

Has anyone gotten an Iris Xe IGP working with mfakto on Windows?
 
Not me despite [URL="https://www.mersenneforum.org/showpost.php?p=587816&postcount=9"]several tries[/URL]. Ideas?

Ethan (EO) 2022-01-21 03:39

1770 GHz-d/day from a Radeon VII with mfakto 0.14:

[CODE]got assignment: exp=5340017 bit_min=69 bit_max=70 (22.39 GHz-days)
Starting trial factoring M5340017 from 2^69 to 2^70 (22.39GHz-days)
k_min = 55270967334660 - k_max = 110541934671501
Using GPU kernel "cl_barrett32_77_gs_4"


Date Time | class Pct | time ETA | GHz-d/day Sieve Wait
Jan 20 18:13 | 2163 46.8% | 1.139 9m43s | 1769.20 30005 0.00%[/CODE]

[CODE]
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
718 root 30 10 719240 292672 5736 S 100.3 1.9 29:39.14 mprime
22047 root 20 0 32.425g 134184 86248 S 1.0 0.9 0:02.11 mfakto-x64
[/CODE]

All default parameters except:

[CODE]
GPUType = APU
GPUSievePrimes = 30000
GPUSieveSize = 9
FlushInterval = 0
[/CODE]

FlushInterval = 0 and GPUSieveSize = 10 had major impacts on speed (~80% combined increase)
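The sensitivity to GPUSieveSize can also be read off the "GPU sieve raw rate" table in the --perftest output further down in this post. The sketch below copies the column closest to the configured GPUSievePrimes (30738) for the smaller sieve sizes and picks the fastest one. The names are hypothetical, and the raw sieve rate is only one component of overall throughput, so this is a starting point for tuning rather than a recommendation.

```python
# GPU sieve raw rate (M/s) at SievePrimes=30738, keyed by GPUSieveSize in MBit.
# Values copied from the perftest table in this post (rows 4 to 12 MBit only).
RAW_RATE_AT_30738 = {
    4: 21151.7, 5: 28766.1, 6: 32349.9, 7: 38451.9,
    8: 38532.2, 9: 31783.0, 10: 25067.8, 12: 20286.5,
}

best_size = max(RAW_RATE_AT_30738, key=RAW_RATE_AT_30738.get)
best_rate = RAW_RATE_AT_30738[best_size]

for size, rate in sorted(RAW_RATE_AT_30738.items()):
    print(f"GPUSieveSize={size:3d}  {rate:9.1f} M/s  ({rate / best_rate:.0%} of best)")
print("fastest raw sieve rate at GPUSieveSize =", best_size)
```

In this column the rate peaks around 7 to 8 MBit and falls off on either side, which is consistent with the setting having a large effect on speed.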

CLinfo:
[CODE]
Number of platforms: 1
Platform Profile: FULL_PROFILE
Platform Version: OpenCL 2.1 AMD-APP (3180.7)
Platform Name: AMD Accelerated Parallel Processing
Platform Vendor: Advanced Micro Devices, Inc.
Platform Extensions: cl_khr_icd cl_amd_event_callback cl_amd_offline_devices


Platform Name: AMD Accelerated Parallel Processing
Number of devices: 1
Device Type: CL_DEVICE_TYPE_GPU
Vendor ID: 1002h
Board name: AMD Radeon VII
Device Topology: PCI[ B#7, D#0, F#0 ]
Max compute units: 60
Max work items dimensions: 3
Max work items[0]: 1024
Max work items[1]: 1024
Max work items[2]: 1024
Max work group size: 256
Preferred vector width char: 4
Preferred vector width short: 2
Preferred vector width int: 1
Preferred vector width long: 1
Preferred vector width float: 1
Preferred vector width double: 1
Native vector width char: 4
Native vector width short: 2
Native vector width int: 1
Native vector width long: 1
Native vector width float: 1
Native vector width double: 1
Max clock frequency: 1840Mhz
Address bits: 64
Max memory allocation: 14360458035
Image support: Yes
Max number of images read arguments: 128
Max number of images write arguments: 64
Max image 2D width: 16384
Max image 2D height: 16384
Max image 3D width: 2048
Max image 3D height: 2048
Max image 3D depth: 2048
Max samplers within kernel: 16
Max size of kernel argument: 1024
Alignment (bits) of base address: 2048
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
Denorms: No
Quiet NaNs: Yes
Round to nearest even: Yes
Round to zero: Yes
Round to +ve and infinity: Yes
IEEE754-2008 fused multiply-add: Yes
Cache type: Read/Write
Cache line size: 64
Cache size: 16384
Global memory size: 17163091968
Constant buffer size: 14360458035
Max number of constant args: 8
Local memory type: Scratchpad
Local memory size: 65536
Max pipe arguments: 16
Max pipe active reservations: 16
Max pipe packet size: 1475556147
Max global variable size: 12924412160
Max global variable preferred total size: 17163091968
Max read/write image args: 64
Max on device events: 1024
Queue on device max size: 8388608
Max on device queues: 1
Queue on device preferred size: 262144
SVM capabilities:
Coarse grain buffer: Yes
Fine grain buffer: Yes
Fine grain system: No
Atomics: No
Preferred platform atomic alignment: 0
Preferred global atomic alignment: 0
Preferred local atomic alignment: 0
Kernel Preferred work group size multiple: 64
Error correction support: 0
Unified memory for Host and Device: 0
Profiling timer resolution: 1
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
Execute OpenCL kernels: Yes
Execute native function: No
Queue on Host properties:
Out-of-Order: No
Profiling : Yes
Queue on Device properties:
Out-of-Order: Yes
Profiling : Yes
Platform ID: 0x7fae26339f30
Name: gfx906
Vendor: Advanced Micro Devices, Inc.
Device OpenCL C version: OpenCL C 2.0
Driver version: 3180.7 (PAL,HSAIL)
Profile: FULL_PROFILE
Version: OpenCL 2.0 AMD-APP (3180.7)
Extensions: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_khr_gl_depth_images cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_image2d_from_buffer cl_khr_subgroups cl_khr_gl_event cl_khr_depth_images cl_khr_mipmap_image cl_khr_mipmap_image_writes cl_amd_copy_buffer_p2p
[/CODE]

amdinfo:
[CODE]
Thu Jan 20 18:21:28 PST 2022

=== GPU 0, 07:00.0 Radeon VII 16368 MB ===
Bios: 113-D3600200-106, UUID: T2CY67L113160501
Core: 1806 MHz 981mV, Mem: 1100 MHz, REF: 7500
PerfCtrl: manual, Load: 100%, MemLoad: 0%, Power: 176.0 W, Cap: 275 W
Core: 68°C, HotSpot: 93°C, Mem: 67°C, Fan: 26%, RPM: 1072
Core state: 8, clocks: 700 808 1146 1394 1574 1717 1785 1810 1840*
Mem state: 2, clocks: 350 800 1100*
SOC state: 7, clocks: 309 523 566 618 680 755 850 971*
DCEF state: 0, clocks: 357* 453 566 680 755 850 971 1133
F clocks: 550 610 690 760 870 960 1080 1225
PCIE Link speed: GEN2 (5.0GT/s), PCIE Link width: x4
Memory total: 16368.00 MB, used: 89.24 MB, free: 16278.76 MB, type: Hynix HBM2
VDDGfx: 1000mV, VDDCR_SOC: 931mV, VDDCI_MEM: 850mV, VDDIO_MEM: 1218mV, VDDCR_HBM: 1218mV
[/CODE]

--perftest:
[CODE]

Runtime options
Inifile mfakto.ini
Verbosity 1
SieveOnGPU yes
MoreClasses yes
GPUSievePrimes 30000
GPUSieveProcessSize 24Ki bits
GPUSieveSize 9Mi bits
FlushInterval 0
WorkFile worktodo.txt
ResultsFile results.txt
Checkpoints enabled
CheckpointDelay 300s
Stages enabled
StopAfterFactor class
PrintMode compact
V5UserID none
ComputerID none
TimeStampInResults yes
VectorSize 4
GPUType APU
SmallExp no
UseBinfile mfakto_Kernels.elf
Select device - Get device info - Compiling kernels.


Perftest

Generate list of the first 1000000 primes: 1807.24 ms

Generate list of the first 1075766 primes for GPU sieving: 266.84 ms

1. CPU-Sieve-Init (once per class, 960 times per test, avg. for 3 iterations)
Init_class(sieveprimes= 5000): 0.83 ms
Init_class(sieveprimes= 20000): 3.50 ms
Init_class(sieveprimes= 80000): 15.30 ms
Init_class(sieveprimes= 200000): 40.62 ms
Init_class(sieveprimes= 500000): 107.34 ms
Init_class(sieveprimes=1000000): 224.55 ms

2. CPU-Sieve (output rate M/s)
Sieve size is fixed at compile time, cannot test with variable sizes. Just running 3 fixed tests.

SievePrimes: 254 396 611 945 1460 2257 3487 5389 8328 12871 19890 30738 47503 73411 113449 175323 270944 418716 647083 1000000
SieveSizeLimit
36 kiB 480.4 432.8 390.9 354.5 320.1 291.8 266.6 242.9 220.2 198.9 175.7 158.6 132.9 106.5 85.5 66.8 55.8 44.7 34.6 25.7
36 kiB 480.4 434.2 391.4 351.9 320.9 292.0 266.1 243.0 220.4 198.6 176.0 158.7 133.0 106.4 86.0 69.6 55.7 43.4 33.3 25.5
36 kiB 480.5 433.4 391.8 353.8 320.3 292.1 264.3 242.5 220.7 198.7 175.7 158.3 133.0 106.4 86.2 69.7 55.9 44.6 34.5 25.3

Best SieveSizeLimit for
SievePrimes: 254 396 611 945 1460 2257 3487 5389 8328 12871 19890 30738 47503 73411 113449 175323 270944 418716 647083 1000000
at kiB: 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36
max M/s: 480.5 434.2 391.8 354.5 320.9 292.1 266.6 243.0 220.7 198.9 176.0 158.7 133.0 106.5 86.2 69.7 55.9 44.7 34.6 25.7
Survivors: 36.41% 34.07% 32.06% 30.27% 28.69% 27.28% 26.00% 24.83% 23.78% 22.82% 21.93% 21.11% 20.37% 19.67% 19.01% 18.40% 17.83% 17.30% 16.78% 16.32%
removal rate 839.3 840.4 830.3 816.5 797.5 778.7 758.9 735.5 707.2 672.7 626.7 593.1 520.2 434.8 367.4 309.1 257.7 213.6 171.5 132.0


3. Memory copy to GPU (blocks of 8388608 bytes)

Standard copy, standard queue:
240 MB in 0.1 ms (3355443.2 MB/s) (real)

Standard copy, profiled queue:
240 MB in 0.1 ms (4125544.9 MB/s) (real)
240 MB in 0.0 ms ( inf MB/s) (profiled data)
8 MB in 0.0 ms ( inf MB/s) (profiled data, peak)

Standard copy, two queues:
240 MB in 0.2 ms (1553445.9 MB/s) (real)

Reinitializing with gpu_sieving enabled.
Select device - Get device info - Compiling kernels.

4. GPU sieve, 3 iterations each

gpusieve_init: 16.524000 ms (CPU work)
gpusieve_init_exponent: 1.352000 ms (CalcModularInverses)
gpusieve_init_class: 0.345333 ms (CalcBitToClear)
gpusieve: 11.457333 ms (SegSieve)
tf: 11.666667 ms = 11324.620800 M/s (raw rate, cl_barrett15_69_gs)

GPU sieve raw rate (input rate M/s)
SievePrimes: 54 396 611 945 1460 2257 3487 5389 8328 12871 19890 30738 47503 73411 113449 175323 270944 418716 647083 1075766
GPUSieveSize
4 MBit 67144.7 55897.9 46651.4 47567.2 47719.0 44864.2 44350.6 33783.1 31516.4 28207.1 30356.8 21151.7 16961.0 13197.5 7731.4 3728.7 1846.4 797.4 409.8 183.5
5 MBit 80432.8 58118.3 56611.8 52614.5 57999.1 54424.4 47021.3 49629.2 45024.7 41184.3 35061.6 28766.1 20894.9 16010.0 7669.4 4100.9 2163.0 984.1 511.4 228.5
6 MBit 94965.4 76790.8 66693.9 69571.6 67049.3 63493.1 61039.8 56956.1 52881.7 46405.5 41147.5 32349.9 24912.5 15195.0 8105.8 4596.3 2482.9 1166.9 607.3 274.5
7 MBit 98523.9 76711.0 78596.4 69931.2 75540.6 71109.0 60317.2 61720.8 52180.3 52202.8 46387.4 38451.9 25545.4 13105.0 8407.8 4872.6 2713.3 1309.7 709.0 318.8
8 MBit 87351.0 75656.7 74923.1 66174.6 70852.2 68212.4 64932.1 57345.4 54967.2 48345.1 40175.3 38532.2 26071.8 14798.5 9123.0 5256.2 2973.4 1499.4 799.4 362.2
9 MBit 94576.8 78769.9 69504.6 73223.2 68832.2 62923.9 66840.1 57760.5 58839.4 52060.7 47742.9 31783.0 22332.1 13762.6 8640.8 5569.4 3192.6 1670.4 893.1 409.5
10 MBit 79487.8 72676.1 71520.9 70812.2 63945.7 59375.8 57689.5 51111.7 49184.0 41028.0 36663.5 25067.8 19034.7 11769.2 8343.4 5551.4 3344.9 1813.0 990.5 452.5
12 MBit 47313.1 42530.0 39160.7 39331.2 37001.8 36336.5 37946.6 34166.5 30534.9 27125.4 24789.0 20286.5 16061.6 12119.5 8604.0 5930.3 3745.8 2101.3 1176.1 537.8
16 MBit 22587.5 21808.5 21509.8 21477.9 21030.7 20776.5 19966.1 19902.4 19072.8 17849.3 16723.7 14571.1 12884.5 10701.1 8758.7 6492.2 4458.7 2578.7 1412.5 649.9
20 MBit 15453.9 15185.0 15132.2 15221.4 15066.6 14882.8 14787.3 14682.8 14482.5 13661.7 13035.0 12573.4 11687.9 10430.4 9021.7 6990.8 4982.8 3089.2 1744.9 807.9
24 MBit 12942.5 12791.0 12866.7 12805.2 12851.6 12730.2 12594.8 12655.6 12564.2 12325.6 12147.8 11651.2 11087.9 10117.6 9182.0 7465.4 5375.6 3427.5 1894.0 855.3
36 MBit 11300.3 11422.1 11510.5 11520.6 11526.4 11503.4 11558.1 11513.8 11577.9 11554.3 11500.8 11152.0 10877.9 10332.0 9545.8 8228.9 6584.6 4368.2 2543.6 1223.9
48 MBit 11570.8 11544.3 11555.6 11530.2 11582.2 11596.1 11618.3 11600.9 11642.0 11562.4 11576.5 11385.3 11221.2 10702.4 10117.7 8969.0 7350.4 5160.2 3032.2 1505.7
96 MBit 11680.3 11685.6 11701.2 11464.8 11719.0 11770.2 11788.6 11806.0 11853.3 11784.6 11820.7 11733.9 11665.5 11400.5 11138.8 10382.2 9262.4 7109.1 4370.9 2269.4
101 MBit 11708.2 11692.3 11713.3 11682.9 11745.7 11755.0 11780.7 11730.7 11770.4 11825.7 11861.9 11757.6 11718.3 11432.5 11189.0 10466.5 9412.2 7270.0 4564.3 2382.8
102 MBit 11678.8 11687.7 11711.7 11705.0 11763.0 11769.1 11789.8 11784.5 11828.4 11809.3 11857.6 11683.7 11520.7 11409.1 11110.6 10543.9 9416.0 7304.5 4596.0 2406.8
103 MBit 11706.2 11687.2 11703.6 11679.3 11744.7 11768.8 11775.1 11816.2 11823.8 11820.9 11860.6 11716.9 11739.0 11450.1 11170.4 10546.0 9420.6 7284.3 4435.2 2422.8
104 MBit 11714.7 11703.0 11718.9 11709.3 11747.5 11777.2 11738.6 11821.8 11855.0 11839.9 11856.1 11751.1 11746.2 11430.0 11191.3 10524.9 9413.5 7247.2 4945.0 2689.0
105 MBit 11489.0 11682.7 11718.2 11706.5 11750.7 11754.8 11781.6 11790.3 11850.9 11811.9 11827.1 11745.4 11719.1 11417.7 11171.5 10490.0 9425.0 7249.4 4989.1 2712.0
106 MBit 11708.1 11692.9 11661.2 11520.2 11743.3 11766.2 11786.9 11805.0 11832.2 11795.0 11831.3 11734.9 11719.8 11414.8 11158.9 10547.9 9429.5 7206.8 4800.5 2595.6
120 MBit 11746.6 11727.7 11711.1 11723.0 11718.8 11664.1 11821.2 11839.7 11870.8 11816.1 11877.3 11794.6 11767.6 11433.1 11319.4 10693.6 9725.4 7620.0 5295.4 2887.6
121 MBit 11752.6 11740.9 11719.6 11724.0 11777.3 11539.4 11787.7 11840.9 11874.5 11857.7 11866.9 11779.4 11793.1 11513.9 11317.9 10738.1 9695.5 7585.6 5118.6 2781.0
123 MBit 11728.8 11728.1 11723.2 11713.4 11755.0 11741.9 11599.1 11843.5 11884.1 11846.1 11886.1 11769.9 11786.1 11535.4 11336.3 10748.9 9791.2 7664.4 5145.4 2821.8
124 MBit 11730.7 11714.8 11723.3 11716.5 11749.9 11738.0 11551.4 11820.3 11874.5 11843.5 11882.5 11774.2 11798.5 11477.8 11331.5 10763.4 9794.0 7646.6 5189.6 2829.4
125 MBit 11740.0 11713.1 11726.4 11717.8 11672.1 11742.7 11807.8 11847.5 11870.3 11841.3 11870.7 11781.5 11807.0 11490.3 11310.9 10786.8 9820.4 7656.5 5224.7 2861.0
126 MBit 11725.1 11709.6 11717.2 11731.4 11705.2 11688.9 11819.9 11851.9 11878.1 11835.8 11879.2 11787.2 11807.5 11542.7 11299.6 10796.9 9820.8 7731.2 5254.1 2874.5
127 MBit 11725.7 11725.4 11711.1 11705.3 11602.0 11797.7 11817.2 11845.8 11857.8 11847.2 11879.4 11795.2 11793.5 11531.3 11354.6 10811.1 9795.0 7701.9 5240.1 2883.2
128 MBit 11739.6 11725.8 11733.2 11648.0 11703.5 11797.5 11832.7 11858.5 11885.6 11843.9 11911.3 11792.9 11800.4 11565.2 11393.2 10789.1 9871.8 7716.1 5325.8 2925.0

Best GPUSieveSize for
SievePrimes: 54 310 1078 1078 1846 2614 3382 5686 8502 13622 19766 31030 47414 74038 113206 175670 270902 419382 647734 1075766
at MiB: 7 9 7 9 7 7 9 7 9 7 9 8 8 5 128 127 128 126 128 128
max M/s: 98523.9 78769.9 78596.4 73223.2 75540.6 71109.0 66840.1 61720.8 58839.4 52202.8 47742.9 38532.2 26071.8 16010.0 11393.2 10811.1 9871.8 7731.2 5325.8 2925.0
Survivors: 48.30% 35.57% 30.05% 30.05% 28.19% 27.10% 26.36% 24.98% 24.00% 22.95% 22.19% 21.33% 20.58% 19.85% 19.21% 18.58% 18.00% 17.46% 16.96% 16.42%
removal rate
average: 50941.7 50750.5 54978.7 51219.1 54243.3 51836.2 49222.0 46304.9 44719.3 40223.8 37150.6 30312.8 20705.0 12831.3 9204.8 8802.2 8094.6 6381.0 4422.4 2444.7
incremental: n/a 49988.1 1970775.4 -15.2 -44339.6 13211.7 8290.9 11134.5 12340.0 4863.2 4252.2 1707.2 601.9 303.2 255.5 1323.5 657.6 192.2 85.7 35.3

5. mfakto_cl_71 kernel
soon
6. barrett_79 kernel
soon
7. barrett_92 kernel
soon
[/CODE]
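
For anyone reading the table above: the "Best GPUSieveSize" rows appear to be a column-wise maximum over the raw-rate table. A minimal sketch of that selection, using only four rows and four columns copied from the output (the dictionary layout and function name are illustrative, not mfakto's internals):

```python
# Subset of the perftest table above: raw sieve rate in M/s,
# keyed by GPUSieveSize (MBit), then by SievePrimes.
rates = {
    5:   {54: 80432.8, 396: 58118.3, 73411: 16010.0, 1075766: 228.5},
    7:   {54: 98523.9, 396: 76711.0, 73411: 13105.0, 1075766: 318.8},
    9:   {54: 94576.8, 396: 78769.9, 73411: 13762.6, 1075766: 409.5},
    128: {54: 11739.6, 396: 11725.8, 73411: 11565.2, 1075766: 2925.0},
}


def best_sieve_size(table):
    """Return {sieve_primes: (best_size_mbit, rate)} via a column-wise maximum."""
    best = {}
    for size, row in table.items():
        for primes, rate in row.items():
            if primes not in best or rate > best[primes][1]:
                best[primes] = (size, rate)
    return best


for primes, (size, rate) in sorted(best_sieve_size(rates).items()):
    print(f"SievePrimes {primes:>8}: {size:>3} MBit at {rate:.1f} M/s")
```

On this subset the picks agree with the summary rows: small sieves win at low SievePrimes and the 128 MBit sieve wins at the high end.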

Ethan (EO) 2022-01-21 06:39

[QUOTE=Ethan (EO);598463]1770 GHz-d/day from a Radeon VII with mfakto 0.14:
[/QUOTE]

Okay, took the time to build 0.15pre8 and it's much faster:

[CODE]
got assignment: exp=5340017 bit_min=69 bit_max=70 (22.39 GHz-days)
Starting trial factoring M5340017 from 2^69 to 2^70 (22.39 GHz-days)
Using GPU kernel "cl_barrett32_76_gs_2"

Date Time | class Pct | time ETA | GHz-d/day Sieve Wait
Jan 20 22:21 | 3780 82.0% | 0.795 2m18s | 2534.74 99894 0.00%[/CODE]

None of the mfakto.ini changes I made for 0.14 were needed - this is w/ default values for mfakto.ini in 0.15pre8.

With a bit of exponent cherry-picking and some overclocking, we can hit 3200 GHz-d/day:

[CODE]
got assignment: exp=1140863 bit_min=69 bit_max=70 (104.80 GHz-days)
Starting trial factoring M1140863 from 2^69 to 2^70 (104.80 GHz-days)
Using GPU kernel "cl_barrett32_76_gs_2"

Date Time | class Pct | time ETA | GHz-d/day Sieve Wait
Jan 20 22:46 | 61 1.5% | 2.937 46m18s | 3211.48 81206 0.00%
[/CODE]
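
The GHz-d/day column in the two runs above can be cross-checked from the assignment size and the per-class time. This is a rough sketch, assuming MoreClasses is on so that 960 of the 4620 classes are actually tested; that class count is an assumption here, not something printed in the log:

```python
SECONDS_PER_DAY = 86400
# Assumption: with MoreClasses enabled, 960 of 4620 classes get tested.
CLASSES_TESTED = 960


def ghz_days_per_day(assignment_ghz_days, seconds_per_class, classes=CLASSES_TESTED):
    """Estimate throughput from the assignment's credit and the time per class."""
    total_seconds = seconds_per_class * classes
    return assignment_ghz_days * SECONDS_PER_DAY / total_seconds


# Values taken from the two progress lines above.
print(round(ghz_days_per_day(22.39, 0.795), 1))   # M5340017, default clocks
print(round(ghz_days_per_day(104.80, 2.937), 1))  # M1140863, overclocked
```

Both estimates land within a fraction of a unit of the logged 2534.74 and 3211.48, so the displayed rate looks like plain credit divided by projected run time.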

perftest snippets [Default Clocks]:
[CODE]
Resulting speed for M2000093:
bit_min - bit_max GHz-days/day kernelname
60 - 64 2424.561 cl_barrett15_69_gs
64 - 76 2779.709 cl_barrett32_76_gs
76 - 77 2575.103 cl_barrett32_77_gs
77 - 87 2454.118 cl_barrett32_87_gs
87 - 88 2271.072 cl_barrett32_88_gs
88 - 92 2152.565 cl_barrett32_92_gs

Resulting speed for M39000037:
bit_min - bit_max GHz-days/day kernelname
60 - 64 1918.657 cl_barrett15_69_gs
64 - 76 2226.954 cl_barrett32_76_gs
76 - 77 2054.627 cl_barrett32_77_gs
77 - 87 1952.592 cl_barrett32_87_gs
87 - 88 1799.562 cl_barrett32_88_gs
88 - 92 1695.062 cl_barrett32_92_gs

Resulting speed for M66362159:
bit_min - bit_max GHz-days/day kernelname
60 - 64 1919.665 cl_barrett15_69_gs
64 - 76 2227.070 cl_barrett32_76_gs
76 - 77 2018.324 cl_barrett32_77_gs
77 - 87 1952.919 cl_barrett32_87_gs
87 - 88 1800.070 cl_barrett32_88_gs
88 - 92 1695.632 cl_barrett32_92_gs

Resulting speed for M74000077:
bit_min - bit_max GHz-days/day kernelname
60 - 64 1858.822 cl_barrett15_69_gs
64 - 76 2150.142 cl_barrett32_76_gs
76 - 77 1977.225 cl_barrett32_77_gs
77 - 87 1882.914 cl_barrett32_87_gs
87 - 88 1731.277 cl_barrett32_88_gs
88 - 92 1632.379 cl_barrett32_92_gs

Resulting speed for M78000071:
bit_min - bit_max GHz-days/day kernelname
60 - 64 1845.433 cl_barrett15_69_gs
64 - 76 2143.972 cl_barrett32_76_gs
76 - 77 1977.783 cl_barrett32_77_gs
77 - 87 1878.918 cl_barrett32_87_gs
87 - 88 1732.445 cl_barrett32_88_gs
88 - 92 1629.175 cl_barrett32_92_gs

Resulting speed for M332900047:
bit_min - bit_max GHz-days/day kernelname
60 - 64 1706.641 cl_barrett15_69_gs
64 - 76 1990.833 cl_barrett32_76_gs
76 - 77 1835.323 cl_barrett32_77_gs
77 - 87 1740.026 cl_barrett32_87_gs
87 - 88 1603.641 cl_barrett32_88_gs
88 - 92 1505.604 cl_barrett32_92_gs

Resulting speed for M999900079:
bit_min - bit_max GHz-days/day kernelname
60 - 64 1662.079 cl_barrett15_69_gs
64 - 76 1928.562 cl_barrett32_76_gs
76 - 77 1771.161 cl_barrett32_77_gs
77 - 87 1684.428 cl_barrett32_87_gs
87 - 88 1546.434 cl_barrett32_88_gs
88 - 92 1454.844 cl_barrett32_92_gs

Resulting speed for M2001862367:
bit_min - bit_max GHz-days/day kernelname
60 - 64 1582.528 cl_barrett15_69_gs
64 - 76 1855.355 cl_barrett32_76_gs
76 - 77 1711.931 cl_barrett32_77_gs
77 - 87 1619.642 cl_barrett32_87_gs
87 - 88 1493.767 cl_barrett32_88_gs
88 - 92 1398.778 cl_barrett32_92_gs

Resulting speed for M4201971233:
bit_min - bit_max GHz-days/day kernelname
60 - 64 1554.085 cl_barrett15_69_gs
64 - 76 1805.688 cl_barrett32_76_gs
76 - 77 1656.103 cl_barrett32_77_gs
77 - 87 1574.842 cl_barrett32_87_gs
87 - 88 1444.049 cl_barrett32_88_gs
88 - 92 1357.261 cl_barrett32_92_gs
[/CODE]
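
To compare kernels across exponents without reading every block by eye, the perftest output can be parsed with a few lines of Python. This is only a sketch; it assumes the "Resulting speed for M...:" layout shown above, and the sample text is trimmed to two exponents:

```python
import re

sample = """\
Resulting speed for M2000093:
bit_min - bit_max GHz-days/day kernelname
60 - 64 2424.561 cl_barrett15_69_gs
64 - 76 2779.709 cl_barrett32_76_gs
76 - 77 2575.103 cl_barrett32_77_gs

Resulting speed for M4201971233:
bit_min - bit_max GHz-days/day kernelname
60 - 64 1554.085 cl_barrett15_69_gs
64 - 76 1805.688 cl_barrett32_76_gs
76 - 77 1656.103 cl_barrett32_77_gs
"""

HEADER = re.compile(r"^Resulting speed for M(\d+):")
ROW = re.compile(r"^(\d+)\s*-\s*(\d+)\s+([\d.]+)\s+(\S+)$")


def fastest_kernels(text):
    """Return {exponent: (kernel_name, ghz_days_per_day)} for the fastest kernel."""
    best = {}
    exponent = None
    for line in text.splitlines():
        line = line.strip()
        header = HEADER.match(line)
        if header:
            exponent = int(header.group(1))
            continue
        row = ROW.match(line)
        if row and exponent is not None:
            rate, kernel = float(row.group(3)), row.group(4)
            if exponent not in best or rate > best[exponent][1]:
                best[exponent] = (kernel, rate)
    return best


for exp, (kernel, rate) in sorted(fastest_kernels(sample).items()):
    print(f"M{exp}: {kernel} at {rate:.3f} GHz-days/day")
```

In the snippets above, cl_barrett32_76_gs is the fastest kernel for every exponent listed, and throughput drops as the exponent grows.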

Ethan (EO) 2022-01-22 21:51

Self-Test Failure / Missed Factors with cl_barrett15_*_gs on Radeon VII (mfakto 0.15pre8)
 
I'm getting missed factors from cl_barrett15_*_gs on Radeon VII with mfakto 0.15pre8 (on linux).
0.14 st and st2 complete without error on the same machine. Let me know if any information other than what I'm including below is helpful.

[CODE]
Self-test statistics
number of tests 335250
successful tests 331150
no factor found 4100

self-test FAILED!
[/CODE]

[CODE]
mfakto# cat mfakto_radeonVII_st2.log | grep failed | grep barrett15 | wc
4100 24600 244885
[/CODE]

[CODE]
ERROR: self-test failed for M60008387 (cl_barrett15_71_gs)
ERROR: self-test failed for M60008387 (cl_barrett15_70_gs)
ERROR: self-test failed for M60008387 (cl_barrett15_69_gs)
ERROR: self-test failed for M60008387 (cl_barrett15_73_gs)
ERROR: self-test failed for M60005497 (cl_barrett15_74_gs)
ERROR: self-test failed for M60005497 (cl_barrett15_71_gs)
ERROR: self-test failed for M60005497 (cl_barrett15_70_gs)
ERROR: self-test failed for M60005497 (cl_barrett15_69_gs)
ERROR: self-test failed for M60005497 (cl_barrett15_73_gs)
ERROR: self-test failed for M332193203 (cl_barrett15_74_gs)
ERROR: self-test failed for M332193203 (cl_barrett15_71_gs)
ERROR: self-test failed for M332193203 (cl_barrett15_70_gs)
ERROR: self-test failed for M332193203 (cl_barrett15_69_gs)
ERROR: self-test failed for M332193203 (cl_barrett15_73_gs)
ERROR: self-test failed for M800007823 (cl_barrett15_74_gs)
ERROR: self-test failed for M800007823 (cl_barrett15_71_gs)
ERROR: self-test failed for M800007823 (cl_barrett15_70_gs)
ERROR: self-test failed for M800007823 (cl_barrett15_69_gs)
ERROR: self-test failed for M800007823 (cl_barrett15_73_gs)
ERROR: self-test failed for M800005699 (cl_barrett15_74_gs)
ERROR: self-test failed for M800005699 (cl_barrett15_71_gs)
ERROR: self-test failed for M800005699 (cl_barrett15_70_gs)
ERROR: self-test failed for M800005699 (cl_barrett15_69_gs)
ERROR: self-test failed for M800005699 (cl_barrett15_73_gs)
ERROR: self-test failed for M800003137 (cl_barrett15_74_gs)
ERROR: self-test failed for M800003137 (cl_barrett15_71_gs)
ERROR: self-test failed for M800003137 (cl_barrett15_70_gs)
ERROR: self-test failed for M800003137 (cl_barrett15_69_gs)
ERROR: self-test failed for M800003137 (cl_barrett15_73_gs)
ERROR: self-test failed for M800002757 (cl_barrett15_74_gs)
ERROR: self-test failed for M800002757 (cl_barrett15_71_gs)
ERROR: self-test failed for M800002757 (cl_barrett15_70_gs)
ERROR: self-test failed for M800002757 (cl_barrett15_69_gs)
ERROR: self-test failed for M800002757 (cl_barrett15_73_gs)
[snipped]
[/CODE]
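
The grep count above (4100 barrett15 failures) equals the "no factor found" total, so every failure comes from the cl_barrett15_*_gs family. A per-kernel breakdown can be pulled from the log with a short script. This is a sketch that assumes the "ERROR: self-test failed for M... (kernel)" line format shown above; the sample list holds just the first six lines:

```python
import re
from collections import Counter

FAILED = re.compile(r"ERROR: self-test failed for M(\d+) \((\w+)\)")


def failures_by_kernel(lines):
    """Count self-test failures per kernel name."""
    counts = Counter()
    for line in lines:
        match = FAILED.search(line)
        if match:
            counts[match.group(2)] += 1
    return counts


def failure_rate(failed, total):
    """Fraction of self-tests that missed their known factor."""
    return failed / total


sample = [
    "ERROR: self-test failed for M60008387 (cl_barrett15_71_gs)",
    "ERROR: self-test failed for M60008387 (cl_barrett15_70_gs)",
    "ERROR: self-test failed for M60008387 (cl_barrett15_69_gs)",
    "ERROR: self-test failed for M60008387 (cl_barrett15_73_gs)",
    "ERROR: self-test failed for M60005497 (cl_barrett15_74_gs)",
    "ERROR: self-test failed for M60005497 (cl_barrett15_71_gs)",
]

for kernel, count in failures_by_kernel(sample).most_common():
    print(kernel, count)
print(f"{failure_rate(4100, 335250):.2%} of self-tests failed")
```

With the statistics posted above, the failure rate works out to roughly 1.2% of all self-tests.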


[CODE]
mfakto 0.15pre8 (64-bit build)


Runtime options
INI file mfakto.ini
Verbosity 3
SieveOnGPU yes
MoreClasses yes
GPUSievePrimes 81157
GPUSieveProcessSize 24 Kib
WARNING: GPUSieveSize=128M must be a multiple of GPUSieveProcessSize=24k, adjusting GPUSieveSize to 126M
GPUSieveSize 126 Mib
FlushInterval 0
WorkFile worktodo.txt
ResultsFile results.txt
Checkpoints enabled
CheckpointDelay 300 s
Stages enabled
StopAfterFactor class
PrintMode compact
V5UserID EO
ComputerID Highland2017
ProgressHeader "Date Time | class Pct | time ETA | GHz-d/day Sieve Wait"
ProgressFormat "%d %T | %C %p%% | %t %e | %g %s %W%%"
TimeStampInResults yes
VectorSize 2
GPUType AUTO
SmallExp no
UseBinfile mfakto_Kernels.elf
Compile-time options

Select device - Get device info:
Device 1/1: gfx906 (Advanced Micro Devices, Inc.),
device version: OpenCL 2.0 AMD-APP (3180.7), driver version: 3180.7 (PAL,HSAIL)
Extensions: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_khr_gl_depth_images cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_image2d_from_buffer cl_khr_subgroups cl_khr_gl_event cl_khr_depth_images cl_khr_mipmap_image cl_khr_mipmap_image_writes cl_amd_copy_buffer_p2p
Global memory:17163091968, Global memory cache: 16384, local memory: 65536, workgroup size: 256, Work dimensions: 3[1024, 1024, 1024, 0, 0] , Max clock speed:1800, compute units:60

OpenCL device info
name gfx906 (Advanced Micro Devices, Inc.)
device (driver) version OpenCL 2.0 AMD-APP (3180.7) (3180.7 (PAL,HSAIL))
maximum threads per block 1024
maximum threads per grid 1073741824
number of multiprocessors 60 (3840 compute elements)
clock rate 1800 MHz

Automatic parameters
threads per grid 0
optimizing kernels for GCNF

Loading binary kernel file mfakto_Kernels.elf
Compiling kernels (build options: "-I. -DVECTOR_SIZE=2 -DGCNF -O3 -DMORE_CLASSES -DCL_GPU_SIEVE").
BUILD OUTPUT

END OF BUILD OUTPUT

GPUSievePrimes (adjusted) 81206
GPUsieve minimum exponent 1037054
[/CODE]
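
The warning in the log above (GPUSieveSize 128M adjusted to 126M for a 24k GPUSieveProcessSize) is consistent with stepping down to the largest whole-megabit size that divides evenly by the process size. The sketch below reproduces the logged adjustment; the function name is made up, and mfakto's source may well compute this differently:

```python
def adjust_sieve_size(sieve_size_mbit, process_size_kbit):
    """Largest whole-Mbit size <= the request that is a multiple of the process size.

    Hypothetical helper: it matches the adjustment printed in the log,
    but it is not taken from mfakto's code.
    """
    size = sieve_size_mbit
    while size > 0 and (size * 1024) % process_size_kbit != 0:
        size -= 1
    return size


print(adjust_sieve_size(128, 24))  # the case from the log
print(adjust_sieve_size(128, 32))  # already a multiple, so unchanged
```

Since 128 * 1024 and 127 * 1024 are not multiples of 24 but 126 * 1024 is, the first call returns 126. With a process size of 8, 16 or 32 the requested 128 would be kept.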

ixfd64 2022-04-19 20:25

Anyone know if mfakto works with Intel Arc GPUs in its current state?

James Heinrich 2022-10-18 13:59

[QUOTE=ixfd64;604292]Anyone know if mfakto works with Intel Arc GPUs in its current state?[/QUOTE]I'll re-ask this question since Intel Arc is now available in more flavours.

kriesel 2022-10-18 15:04

No first-hand data, but [URL]https://www.techpowerup.com/gpu-specs/arc-a750.c3929[/URL] indicates OpenCL 3.0, which may be an issue in gpuowl or mfakto since 3.0 is a subset of 2.0. FP32 & FP64 theoretical specs look similar to Radeon VII although memory bandwidth is half that of a Radeon VII. For the [URL="https://www.techpowerup.com/gpu-specs/arc-a770.c3914"]a770[/URL], FP32 & FP64 theoretical are a bit faster, but memory bandwidth & OpenCL unchanged. Prices are good, if you can find them in stock somewhere near MSRP.

M344587487 2022-10-18 16:25

[QUOTE=kriesel;615952]No first-hand data, but [URL]https://www.techpowerup.com/gpu-specs/arc-a750.c3929[/URL] indicates OpenCL 3.0, which may be an issue in gpuowl or mfakto since 3.0 is a subset of 2.0. FP32 & FP64 theoretical specs look similar to Radeon VII although memory bandwidth is half that of a Radeon VII. For the [URL="https://www.techpowerup.com/gpu-specs/arc-a770.c3914"]a770[/URL], FP32 & FP64 theoretical are a bit faster, but memory bandwidth & OpenCL unchanged. Prices are good, if you can find them in stock somewhere near MSRP.[/QUOTE]
I was under the impression that they didn't implement hardware FP64 on consumer cards, so am suspicious of the 1:4 FP64 statistic at least.

kriesel 2022-10-18 17:41

[QUOTE][URL="https://www.digitaltrends.com/computing/intels-arc-alchemist-gpu-requirements-are-raising-eyebrows/"]Intel says[/URL] the Arc A770 and A750 require a 10th-gen Intel CPU or AMD Ryzen 3000 CPU or newer. That’s because [URL="https://www.digitaltrends.com/computing/intel-arc-alchemist-specs-rumors-news-release-date/"]Arc Alchemist cards[/URL] benefit a lot from [URL="https://www.digitaltrends.com/computing/nvidia-resizable-bar-explained/"]Resizable BAR[/URL], which is only available on the last few generations of processors. The cards will work with older CPUs, but you’ll have much lower performance if ReBAR is turned off.[/QUOTE][url]https://www.digitaltrends.com/computing/intel-arc-a770-a750-review/[/url]

syjytg 2022-11-15 02:34

How to download mfakto for GPU trial factoring?

AlvinBunk 2022-11-15 02:51

mfakto download
 
You can download from here: [url]https://download.mersenne.ca/mfakto[/url]

aperson1 2022-12-07 23:50

I've just downloaded this for my Vega 20 built-in laptop GPU and it's a clear improvement above CPU factoring, but I've been having some minor issues I was curious if anyone knew any workarounds/optimizations for.

First of all, I'm getting some performance stuttering: Looking at the compute-1 process on my task manager, it goes from 100% usage to 0% intermittently, and running the program starts more efficient (~250-300 ghzd/day) before slowly getting less efficient over the next minute (down to ~190 ghzd/day) I would guess because of the stuttering.

In the .ini file, I lowered GPUSieveSize from the default 96 down to 16, which helped significantly but did not solve the problem, and I seemed to be getting diminishing returns rather than an actual "no stuttering" point.

---

Additionally, it seems to be significantly affecting my CPU performance running Prime95, bringing the speed it does both TF and P-1 work to ~55% (for some reason, Prime95 idly uses just ~30% of my CPU when there's 70% unused power, and using this program lowers that further to just ~15%).

I know lowering the CPU portion of GPU factoring lowers its efficiency, but what particular settings work best to minimize interference with Prime95's computing work, if any?

aperson1 2022-12-08 00:55

PS- I've solved the CPU utilization problem. For some reason, my single worker was set to only use 4 cores of my 8 core CPU. I've doubled up into two workers of 4 each and it works much more efficiently.

kriesel 2022-12-08 01:08

[QUOTE=aperson1;619201]I've just downloaded this for my Vega 20 built-in laptop GPU and it's a clear improvement above CPU factoring, but I've been having some minor issues I was curious if anyone knew any workarounds/optimizations for.

First of all, I'm getting some performance stuttering: Looking at the compute-1 process on my task manager, it goes from 100% usage to 0% intermittently, and running the program starts more efficient (~250-300 ghzd/day) before slowly getting less efficient over the next minute (down to ~190 ghzd/day) I would guess because of the stuttering.

In the .ini file, I lowered GPUSieveSize from the default 96 down to 16, which helped significantly but did not solve the problem, and I seemed to be getting diminishing returns rather than an actual "no stuttering" point.

---

Additionally, it seems to be significantly affecting my CPU performance running Prime95, bringing the speed it does both TF and P-1 work to ~55% (for some reason, Prime95 idly uses just ~30% of my CPU when there's 70% unused power, and using this program lowers that further to just ~15%).

I know lowering the CPU portion of GPU factoring lowers its efficiency, but what particular settings work best to minimize interference with Prime95's computing work, if any?[/QUOTE]
Sounds like you're using the IGP (low end GPU that is part of the CPU package). That will necessarily impair CPU throughput since they share the same wattage limit. They also share the same memory & bandwidth limit, but GPU TF does not use much memory. If the CPU supports hyperthreading, note that for P-1, PRP, LLDC, hyperthreading usually is not of benefit, so one thread per core will show 50% utilization in Task Manager by prime95. It should show almost no CPU usage for mfakto or other GPU apps; you want nearly all computation in a GPU app to be performed by the GPU, including sieving of candidate factors when possible. (Kernels ending in _gs)

[QUOTE=syjytg;617788]How to download mfakto for GPU trial factoring?[/QUOTE]
For how to get started, and optimize, see reference info [url]https://mersenneforum.org/showthread.php?t=24607[/url]
igp thread [url]https://www.mersenneforum.org/showthread.php?t=25717[/url]
mfakto thread [url]https://www.mersenneforum.org/showthread.php?t=23394[/url]

aperson1 2022-12-08 06:27

[QUOTE=kriesel;619205]Sounds like you're using the IGP (low end GPU that is part of the CPU package). That will necessarily impair CPU throughput since they share the same wattage limit. They also share the same memory & bandwidth limit, but GPU TF does not use much memory. If the CPU supports hyperthreading, note that for P-1, PRP, LLDC, hyperthreading usually is not of benefit, so one thread per core will show 50% utilization in Task Manager by prime95. It should show almost no CPU usage for mfakto or other GPU apps; you want nearly all computation in a GPU app to be performed by the GPU, including sieving of candidate factors when possible. (Kernels ending in _gs)[/QUOTE]

Ack! You're entirely right on all counts, I feel silly for not realizing that, I've really got no excuse not to have considered that, but that's what stupid questions are all about! :grin:

See my above post for solving the unusually low utilization, but everything else checks out with my situation.

28add11 2022-12-11 04:51

Either ARC Isn't Working or...
 
So I've been trying to get mfakto working on my ARC GPU so I can feed my growing GIMPS addiction, but it doesn't seem to be working whatsoever. That or I'm just stupid. The issue that I am running into more specifically is that, after turning mfakto on, my PC gets unbearably slow, and then after a while just shuts the display off and seemingly stops running the program, since the GPU cools down a lot. I am running an ARC A750 on the newest drivers, and have downloaded mfakto from the source in this thread a few posts back. I disabled the -O flag since it was bringing up issues during compilation. All other settings are default iirc.

Sorry if this is something that has already been answered or is a bad question, I'm new here!
Many thanks - 39

P.S. If anyone wants to try something on my GPU within reason I'm fine with it, Thanks!

rebirther 2022-12-11 18:19

[QUOTE=28add11;619447]So I've been trying to get mfakto working on my ARC GPU so I can feed my growing GIMPS addiction, but it doesn't seem to be working whatsoever. That or I'm just stupid. The issue that I am running into more specifically is that, after turning mfakto on, my PC gets unbearably slow, and then after a while just shuts the display off and seemingly stops running the program, since the GPU cools down a lot. I am running an ARC A750 on the newest drivers, and have downloaded mfakto from the source in this thread a few posts back. I disabled the -O flag since it was bringing up issues during compilation. All other settings are default iirc.

Sorry if this is something that has already been answered or is a bad question, I'm new here!
Many thanks - 39

P.S. If anyone wants to try something on my GPU within reason I'm fine with it, Thanks![/QUOTE]

I am also interested in Intel ARC GPU support. We have some users with these cards, and every bit of computing power counts.

kracker 2022-12-11 18:42

[QUOTE=28add11;619447]So I've been trying to get mfakto working on my ARC GPU so I can feed my growing GIMPS addiction, but it doesn't seem to be working whatsoever. That or I'm just stupid. The issue that I am running into more specifically is that, after turning mfakto on, my PC gets unbearably slow, and then after a while just shuts the display off and seemingly stops running the program, since the GPU cools down a lot. I am running an ARC A750 on the newest drivers, and have downloaded mfakto from the source in this thread a few posts back. I disabled the -O flag since it was bringing up issues during compilation. All other settings are default iirc.

Sorry if this is something that has already been answered or is a bad question, I'm new here!
Many thanks - 39

P.S. If anyone wants to try something on my GPU within reason I'm fine with it, Thanks![/QUOTE]

The output of --perftest will be useful, if it even works.
Also, I'm out of the loop but I hear Intel Arc drivers are really bad at this point in time.

28add11 2022-12-11 19:21

1 Attachment(s)
Well, I ran --perftest and everything went fine; logs are attached. After that, thinking that it might've just fixed itself, I ran mfakto and it happened again. This time even with an error code!
[CODE]
Date Time | class Pct | time ETA | GHz-d/day Sieve Wait
Error -5 (Out of resources): clEnqueueReadBuffer RES failed.1206 0.00%
ERROR from tf_class.
[/CODE]

Also I had notepad open, and it managed to turn the entire top bar black. ARC is fun.

kracker 2022-12-12 01:59

[QUOTE=28add11;619476]Well, I ran --perftest and everything went fine; logs are attached. After that, thinking that it might've just fixed itself, I ran mfakto and it happened again. This time even with an error code!
[CODE]
Date Time | class Pct | time ETA | GHz-d/day Sieve Wait
Error -5 (Out of resources): clEnqueueReadBuffer RES failed.1206 0.00%
ERROR from tf_class.
[/CODE]

Also I had notepad open, and it managed to turn the entire top bar black. ARC is fun.[/QUOTE]

I would set GPUType to INTEL with those results, and play around with GPUSievePrimes, GPUSieveProcessSize and GPUSieveSize(lowering them most likely).
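
As a rough illustration of that suggestion, here is a small helper that rewrites Key=Value lines in mfakto.ini text. The helper name is made up, and the numeric values are placeholders to experiment with, not settings verified on an Arc card; only GPUType=INTEL comes from the advice above:

```python
def set_ini_values(text, updates):
    """Return ini text with Key=Value lines replaced; keys not present are appended.

    Hypothetical helper for simple Key=Value files with '#' comments.
    """
    remaining = dict(updates)
    out = []
    for line in text.splitlines():
        stripped = line.strip()
        key = stripped.split("=", 1)[0].strip() if "=" in stripped else None
        if key in remaining and not stripped.startswith("#"):
            out.append(f"{key}={remaining.pop(key)}")
        else:
            out.append(line)
    out.extend(f"{k}={v}" for k, v in remaining.items())
    return "\n".join(out) + "\n"


original = "GPUType=AUTO\nGPUSieveSize=96\nGPUSievePrimes=81157\n"
# Placeholder values: lower than the starting values, to be tuned by trial.
tuned = set_ini_values(original, {
    "GPUType": "INTEL",
    "GPUSieveSize": 32,
    "GPUSieveProcessSize": 16,
})
print(tuned)
```

Changing one setting at a time and rerunning a short test makes it easier to see which one affects the crash.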

kriesel 2022-12-12 14:39

1 Attachment(s)
Perhaps set VectorSize to match what GPU-Z advanced tab, OpenCL category, native vector sizes, indicates, for the GPU in question, first. (Example screen shot is illustrative of navigating GPU-Z only, not for an ARC GPU, as I don't have one.)

Welcome, 28add11, to the forum and the hunt. You may find some of the [URL="https://mersenneforum.org/showthread.php?t=24607"]reference info collection[/URL] useful or interesting.

chris2be8 2022-12-12 16:23

How old are the PC and the GPU? If more than a few weeks old start by clearing dust etc out of the system.
Then install temperature monitoring software (or start it if already installed) and see how hot things get during a run.
Sorry I can't be more specific, I don't use Windows.

And welcome to the forum.

28add11 2022-12-13 00:55

[QUOTE=kriesel;619535]Perhaps set VectorSize to match what GPU-Z advanced tab, OpenCL category, native vector sizes, indicates, for the GPU in question, first. [/QUOTE]
Just judging by how things look in a quick test run, I think you're right! I'll test it out more and make sure everything is ok during the time when I have to go out later. We shall see!

28add11 2022-12-13 23:06

After doing some longer tests, it seems kinda hit or miss. Overnight it crashed with the same error, but while I was at class it worked fine. Probably an issue on Intel's end, but that'll hopefully be fixed soon with how fast they're working on drivers. Thanks for the help!

rebirther 2022-12-22 10:32

[QUOTE=kracker;619495]I would set GPUType to INTEL with those results, and play around with GPUSievePrimes, GPUSieveProcessSize and GPUSieveSize(lowering them most likely).[/QUOTE]

Any news on ARC support? I have asked on our project forum to collect some data.

M344587487 2022-12-23 08:34

Is this the latest version of mfakto that should be used? [url]https://github.com/Bdot42/mfakto[/url]

preda's repo is gone so I'm out of the loop.

rebirther 2022-12-23 09:20

[QUOTE=M344587487;620664]Is this the latest version of mfakto that should be used? [url]https://github.com/Bdot42/mfakto[/url]

preda's repo is gone so I'm out of the loop.[/QUOTE]

Yes, it is.

rebirther 2022-12-23 15:48

1 Attachment(s)
Here is an output file from a user with an Intel ARC 770.

rebirther 2022-12-23 15:59

[QUOTE=28add11;619476]Well, I ran --perftest and everything went fine; logs are attached. After that, thinking that it might've just fixed itself, I ran mfakto and it happened again. This time even with an error code!
[CODE]
Date Time | class Pct | time ETA | GHz-d/day Sieve Wait
Error -5 (Out of resources): clEnqueueReadBuffer RES failed.1206 0.00%
ERROR from tf_class.
[/CODE]

Also I had notepad open, and it managed to turn the entire top bar black. ARC is fun.[/QUOTE]

How did you run a successful perftest? Did you change something?

rebirther 2022-12-24 00:25

1 Attachment(s)
[QUOTE=rebirther;620689]How did you run a successful perftest? Did you change something?[/QUOTE]

We got it running. Perftest was successful, and so was a test run with worktodo.txt. GPUType must be changed to INTEL in the .ini file.

James Heinrich 2022-12-24 00:58

[QUOTE=rebirther;620744]We got it running. Perftest was successful, and so was a test run with worktodo.txt. GPUType must be changed to INTEL in the .ini file.[/QUOTE]I didn't see the results of your testrun in the attachment. I tried to extrapolate the needed numbers from what you did attach, but if you could [url=https://www.mersenne.ca/mfaktc.php#benchmark]submit a benchmark[/url] I would be very grateful.

rebirther 2022-12-24 07:13

[QUOTE=James Heinrich;620751]I didn't see the results of your testrun in the attachment. I tried to extrapolate the needed numbers from what you did attach, but if you could [url=https://www.mersenne.ca/mfaktc.php#benchmark]submit a benchmark[/url] I would be very grateful.[/QUOTE]

This was a local test; we still need to set up the app.

Factor=3740120ADE828BE00071573643F00001,128002051,74,75

15min27s on INTEL ARC 770

James Heinrich 2022-12-24 15:26

[QUOTE=rebirther;620841]Factor=3740120ADE828BE00071573643F00001,128002051,74,75
15min27s on INTEL ARC 770[/QUOTE]That gives a result 50% faster than what I saw in the perftest attachment... :unsure:
When you're running, do you see closer to 1800 or 2800 GHz-days/day?

A proper benchmark submission with all the requested fields would still be appreciated.
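
For reference, the throughput implied by a completed assignment can be estimated from its wall-clock time. The sketch below assumes the usual GHz-days credit formula for trial factoring with the constant 0.016968 for bit levels of 65 and above; that constant is an assumption, although it does reproduce the 22.39 GHz-days that mfakto printed for M5340017 (2^69 to 2^70) earlier in the thread:

```python
SECONDS_PER_DAY = 86400
# Assumption: credit constant for bit levels >= 65.
TIMING_65_PLUS = 0.016968


def tf_ghz_days(exponent, bit_from, bit_to):
    """Estimated GHz-days credit for trial factoring M(exponent) from 2^bit_from to 2^bit_to."""
    if bit_from < 64:
        raise ValueError("this sketch only covers bit levels of 65 and above")
    total = 0.0
    for bits in range(bit_from + 1, bit_to + 1):
        total += TIMING_65_PLUS * 2 ** (bits - 48) * 1680 / exponent
    return total


def throughput(exponent, bit_from, bit_to, seconds):
    """GHz-days/day implied by finishing the assignment in the given time."""
    return tf_ghz_days(exponent, bit_from, bit_to) * SECONDS_PER_DAY / seconds


print(round(tf_ghz_days(5340017, 69, 70), 2))                    # sanity check against the mfakto log
print(round(throughput(128002051, 74, 75, 15 * 60 + 27)))        # the 15min27s run quoted above
```

Under these assumptions, the 15min27s run comes out near the 2800 end of the range rather than the 1800 end.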

rebirther 2022-12-24 21:12

[QUOTE=James Heinrich;620855]That gives a result 50% faster than what I saw in the perftest attachment... :unsure:
When you're running, do you see closer to 1800 or 2800 GHz-days/day?

A proper benchmark submission with all the requested fields would still be appreciated.[/QUOTE]

This was a card from a user. I am trying to find some results, but no one has attached an Intel ARC host yet; the server setup still needs some tweaking.

ixfd64 2022-12-25 17:30

[QUOTE=M344587487;620664]Is this the latest version of mfakto that should be used? [url]https://github.com/Bdot42/mfakto[/url]

preda's repo is gone so I'm out of the loop.[/QUOTE]

All the development is being done on the main repository now: [url]https://github.com/Bdot42/mfakto[/url]

Magellan3s 2022-12-26 17:32

I am getting the following error:

[code]g++ sieve.o timer.o parse.o read_config.o mfaktc.o checkpoint.o signal_handler.o filelocking.o output.o mfakto.o gpusieve.o perftest.o menu.o kbhit.o -m64 -O3 -funroll-loops -ffast-math -finline-functions -frerun-loop-opt -fgcse-sm -fgcse-las -flto -L/opt/rocm/opencl/lib/x86_64 -lOpenCL -o ../mfakto
/usr/bin/ld: cannot find -lOpenCL
collect2: error: ld returned 1 exit status
make: *** [Makefile:94: ../mfakto] Error 1
jesus@Magellan:~/Mfacto/mfakto-master/src$
[/code]
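
The message means ld found no libOpenCL.so (or libOpenCL.a) in its search path, including the -L/opt/rocm/opencl/lib/x86_64 directory passed on the command line. A quick way to see where an OpenCL library actually lives is sketched below; apart from that -L path, the candidate directories are guesses and will differ between distributions and ROCm versions:

```python
import glob
import os


def find_opencl_libs(directories):
    """Return every libOpenCL.so* file found in the given directories."""
    hits = []
    for directory in directories:
        hits.extend(sorted(glob.glob(os.path.join(directory, "libOpenCL.so*"))))
    return hits


candidates = [
    "/opt/rocm/opencl/lib/x86_64",   # the -L path from the failing link command
    "/opt/rocm/lib",                 # assumption: alternative ROCm layout
    "/usr/lib/x86_64-linux-gnu",     # assumption: Debian/Ubuntu system path
    "/usr/lib64",                    # assumption: other distributions
]

found = find_opencl_libs(candidates)
for path in found:
    print(path)
if not found:
    print("no libOpenCL.so* found in the candidate directories")
```

Note that -lOpenCL needs a file named exactly libOpenCL.so; a versioned libOpenCL.so.1 alone is not enough for linking. If the library turns up in a different directory, pointing the Makefile's -L option there may be all that is needed.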

rebirther 2022-12-30 08:21

[QUOTE=rebirther;620876]This was a card from a user. I am trying to find some results, but no one has attached an Intel ARC host yet; the server setup still needs some tweaking.[/QUOTE]

[QUOTE]no factor for M131174377 from 2^74 to 2^75 [mfakto 0.15pre7-MGW cl_barrett32_76_gs_2]
tf(): total time spent: 24m 10.213s (1737.73 GHz-days / day)

08:56:56 (11896): mfakto.exe exited; CPU time 1443.187500[/QUOTE]

Result from an Intel ARC 770 on Linux on SRBase.

28add11 2022-12-30 19:23

I haven't checked the forum in a few days and it looks like you guys are really getting into the swing of things with ARC support. Since I have an A750 card if anyone would like me to run tests on it or help out I would be more than happy to help.
A small note: I downloaded mfakto from [URL="https://download.mersenne.ca/mfakto"]here[/URL] but by the looks of things there's a more recently updated repo, so if anyone could provide instructions on how to build that it would be much appreciated, Thanks!

kriesel 2023-01-16 15:09

Does there exist an mfaktc build that will work on Google Colab in either its Ubuntu 18.04 or 20.04 VM incarnations currently, which appear unpredictably, usually 18.04? I'm getting Cudart version discrepancies with 10.0, or glibc / libstdc issues with the cuda 12.0 for linux mmfsktc build, and updating 18.04 does not resolve that issue for mmff so likely won't for mfaktc.

Mark Rose 2023-01-16 15:51

[QUOTE=kriesel;622698]Does there exist an mfaktc build that will work on Google Colab in either its Ubuntu 18.04 or 20.04 VM incarnations currently, which appear unpredictably, usually 18.04? I'm getting Cudart version discrepancies with 10.0, or glibc / libstdc issues with the cuda 12.0 for linux mmfsktc build, and updating 18.04 does not resolve that issue for mmff so likely won't for mfaktc.[/QUOTE]

Wrong thread, but I've always had to recompile mfaktc for different versions of CUDA.

That being said, it's very quick to compile if nvidia-cuda-dev is installed: even on an old two core/two thread machine it takes less than 15 seconds.


All times are UTC.