mersenneforum.org

mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GPU Computing (https://www.mersenneforum.org/forumdisplay.php?f=92)
-   -   mfaktc: a CUDA program for Mersenne prefactoring (https://www.mersenneforum.org/showthread.php?t=12827)

kriesel 2019-01-03 16:57

[QUOTE=Neutron3529;504757]I use Imdisk to create a RAM drive.
The problem is that when worktodo.txt is rewritten, every byte in it must be changed, which causes a lot of waste.
I think it is so slow because I keep prime95 running, which may slow down rewriting the worktodo file.[/QUOTE]
prime95 by design runs at a quite low priority, to use otherwise-idle cpu cycles, without slowing user applications and console interactive response speed. (See the end of undoc.txt)

kriesel 2019-01-03 17:30

[QUOTE=Neutron3529;504756]I use
[URL]https://www.mersenne.org/report_factoring_effort/[/URL]
to get a worktodo.txt file
So it is quite easy to get a 0KB worktodo.txt or a ~2.7M worktodo.txt[/QUOTE]
Hmm, unless I'm mistaken, that's not for assignments, that's a report generator page.
It's my understanding that assignments are issued through Primenet automatic connections in prime95 or mprime, or manually through
[URL]https://www.mersenne.org/manual_assignment/[/URL] or [URL]https://www.mersenne.org/manual_gpu_assignment/[/URL] and include unique assignment IDs in the worktodo records.

Output of [URL]https://www.mersenne.org/report_factoring_effort/[/URL] for 92M to 93M just now includes exponents I have assigned TF work for.
Even if current assignments are excluded, such a report listing cannot exclude assignments made in the near future, either through automatic Primenet connections or through the manual assignment pages. It seems likely to me that the assignee and you will be duplicating work.

I did a quick test on [URL]https://www.mersenne.org/report_factoring_effort/[/URL] for 92000000 to 92002000. With the options to exclude currently assigned exponents and to use worktodo format checked, and 76 bits specified, the output does not include an assignment ID. It is a prime95-style manual assignment format record, and no resulting assignment shows up in my current assignments retrieved afterward. Its output was: [CODE]Factor=N/A,92000429,75,76[/CODE]Checking the status of [URL]https://www.mersenne.org/report_exponent/?exp_lo=92000429&exp_hi=[/URL] afterward indicates no active TF assignment. I therefore conclude that if I used this method to obtain a large list of exponents needing more factoring and factored them over time, without promptly taking other action to reserve them, many if not most of them would meanwhile be assigned to other people by the usual mechanisms listed above. The result would be wasteful duplication of TF effort on the same exponent and bit level.
The N/A in the record produced is the dead giveaway that an assignment was not issued. That's where the lengthy AID would normally appear (32 upper case hexadecimal characters).
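
As a rough illustration of that check, here is a minimal Python sketch that tells an unassigned record from one carrying an AID. It assumes the four-field layout of the single record shown above (AID or N/A, exponent, starting bit level, ending bit level); the function names are made up for this example and are not part of prime95 or mfaktc.

```python
import re

# Assumed layout, inferred only from the record shown above:
#   Factor=<AID or N/A>,<exponent>,<bit_lo>,<bit_hi>
# An AID is described as 32 upper case hexadecimal characters.
AID_RE = re.compile(r"[0-9A-F]{32}")


def parse_factor_line(line):
    """Return (aid, exponent, bit_lo, bit_hi); aid is None when no AID is present."""
    kind, _, rest = line.strip().partition("=")
    if kind != "Factor":
        raise ValueError(f"not a Factor record: {line!r}")
    fields = rest.split(",")
    if len(fields) != 4:
        raise ValueError(f"expected 4 comma-separated fields: {line!r}")
    aid_field, exponent, bit_lo, bit_hi = fields
    aid = aid_field if AID_RE.fullmatch(aid_field) else None
    return aid, int(exponent), int(bit_lo), int(bit_hi)


def is_assigned(line):
    """True only when the record carries something shaped like an AID."""
    return parse_factor_line(line)[0] is not None
```

Running `is_assigned("Factor=N/A,92000429,75,76")` on the record from the test above returns False, which is the case where duplicated effort is a risk.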

James Heinrich 2019-01-03 18:16

[QUOTE=Neutron3529;504756]So it is quite easy to get a 0KB worktodo.txt or a ~2.7M worktodo.txt[/QUOTE]Yes, but the question is, [b][i]why[/i][/b] are you generating a worktodo with ~100,000 entries? (I can't call them assignments because they're not assigned.)
It would be better throughput (both for you and for GIMPS) to get fewer assignments in the conventional way (ideally actually assigned to you so there's no risk of effort being duplicated).

James Heinrich 2019-01-03 18:21

[QUOTE=kriesel;504779]prime95 by design runs at a quite low priority, to use otherwise-idle cpu cycles, without slowing user applications and console interactive response speed[/QUOTE]Usually true, but it can still have an effect in the right circumstances. For example, I'm currently running several large backup compressions with 7zip, and if I also have Prime95 P-1 running in the background it slows down 7zip by about 300%. I suspect the limiting resource is memory bandwidth and not CPU power (quad-channel, but only DDR3-1333). [i]Generally[/i] Prime95 has little effect on user applications, but there are always exceptions :smile:

ixfd64 2019-01-03 21:23

[QUOTE=James Heinrich;504715]That's a Windows thing, nothing to do with mfaktc.
Any program running in a command window will be suspended while you're marking/selecting text. You can also suspend a program with the Pause/Break key, and resume by hitting any (other?) key.[/QUOTE]

I see. However, I've noticed the GPU fans are still running at full power when the program is paused. What is the GPU actually doing during this time?

Uncwilly 2019-01-03 21:30

[QUOTE=James Heinrich;504791]Usually true, but it can still have an effect in the right circumstances. For example, I'm currently running several large backup compressions with 7zip, and if I also have Prime95 P-1 running in the background it slows down 7zip by about 300%. I suspect the limiting resource is memory bandwidth and not CPU power (quad-channel, but only DDR3-1333). [i]Generally[/i] Prime95 has little effect on user applications, but there are always exceptions :smile:[/QUOTE]I have seen an Adobe install creeping along while Prime95 was running. I think it was trying to run at low priority. So both programs were "after you sir", "no, after you sir", "no, after you sir", etc.

ixfd64 2019-01-04 04:45

I noticed an oddity in the timestamp. On my older system running mfaktc compiled for CUDA 6.5, the day is prefixed with a zero when it is a single digit:

[CODE][Mon Jan 08 22:50:07 2018][/CODE]

But on my laptop running mfaktc for CUDA 10, there is no preceding zero:

[CODE][Wed Jan 2 09:40:29 2019][/CODE]

Is there a reason for this inconsistency?
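
For reference, both layouts are common conventions for formatting the day of the month, and they can be reproduced side by side. The Python sketch below only illustrates the two conventions; whether the mfaktc builds differ because of their C runtime or for some other reason is an assumption this thread does not confirm.

```python
import time

# Wed 2 Jan 2019 09:40:29 as a plain time tuple
# (weekday 2 = Wednesday, day of year 2, no DST).
stamp = time.struct_time((2019, 1, 2, 9, 40, 29, 2, 2, 0))

# %d always zero-pads the day of the month to two digits.
zero_padded = time.strftime("%a %b %d %H:%M:%S %Y", stamp)

# asctime() pads a single-digit day with a space instead of a zero.
space_padded = time.asctime(stamp)

# Collapsing runs of whitespace (as an HTML page does when rendering)
# leaves a single space before the day.
collapsed = " ".join(space_padded.split())

print(f"[{zero_padded}]")
print(f"[{space_padded}]")
print(f"[{collapsed}]")
```

Anything that parses these log lines is safer splitting on whitespace than relying on fixed column positions.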

LaurV 2019-01-04 07:33

[QUOTE=James Heinrich;504378]No, that's not a common usecase.[/QUOTE]
+1. Misfit does a very good job of keeping the worktodo small, doing all the reports, and stuff. In fact, I don't see any reason for having a large worktodo file either.

SELROC 2019-01-04 07:40

[QUOTE=LaurV;504881]+1. Misfit does a very good job of keeping the worktodo small, doing all the reports, and stuff. In fact, I don't see any reason for having a large worktodo file either.[/QUOTE]


I reserved a large number of exponents when doing manual testing, but that was before I set up a loop script. Now the script consumes the existing worktodo items and does not fetch new work until the worktodo file is down to 3 items.
3 items seems reasonable if you tune the fetch/submit timeout accordingly.
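
A minimal Python sketch of that low-water-mark check, assuming each non-blank line of the worktodo file is one work item. The function names and the `low_water` parameter are hypothetical, and the actual fetching and submitting are left out because the thread does not describe them.

```python
def pending_entries(worktodo_text):
    """Return the non-blank lines of a worktodo file's contents."""
    return [line.strip() for line in worktodo_text.splitlines() if line.strip()]


def should_fetch(worktodo_text, low_water=3):
    """True once the number of queued items has dropped to the low-water mark."""
    return len(pending_entries(worktodo_text)) <= low_water


# Usage inside a loop script (file name assumed):
#   with open("worktodo.txt") as fh:
#       if should_fetch(fh.read()):
#           ...  # request more work here
```

A small threshold keeps the queue short, so few exponents are held while idle, at the cost of needing the fetch interval to be short enough that the queue never runs dry.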

R. Gerbicz 2019-01-04 08:19

From mfakto thread:
[QUOTE=R. Gerbicz;504831]Hm, reading the code and still don't know where do you sieve by p>11; confirmed that it is in SegSieve in gpusieve.cl, but as I can see we don't even call this in the runs:
say replacing line 1296 by big_bit_array32[j * threadsPerBlock + get_local_id(0)]=0; (this would eliminate all k values) or placing Visual Studio's breakpoints in the first line in SegSieve. Or even using
[CODE]
#define TRACE_SIEVE_KERNEL 5
// If above tracing is on, only the thread with the ID below will trace
#define TRACE_SIEVE_TID 2
[/CODE]
results in us passing the selftest, with no break and no additional debug info. Furthermore, modifying the default GPUSievePrimes=81157 gives different times, suggesting that we really are sieving, but where?

One more thing, which I've also seen reported on this forum: at run time it displays the automatic parameter for threads per grid as 0:
[CODE]
OpenCL device info
name Intel(R) HD Graphics 530 (Intel(R) Corporation)
device (driver) version OpenCL 2.1 NEO (25.20.100.6471)
maximum threads per block 256
maximum threads per grid 16777216
number of multiprocessors 24 (24 compute elements)
clock rate 1150MHz

Automatic parameters
threads per grid 0
optimizing kernels for INTEL
[/CODE]
Still I've some ideas (not that many) to improve the current code but without understanding the basics of the code it is somewhat hard.[/QUOTE]

In mfaktc the corresponding code is at line 1079 and those #define's do not exist, but the question is the same.
It could be something trivial.

TheJudger 2019-01-04 15:25

Hi Robert,

Sieving of small primes starts right at the beginning of SegSieve() in src/gpusieve.cu.
At this point (no sieving done yet on shared memory) George used four 32-bit words (128 bits) of local variables (mask, mask2, mask3 and mask4) for sieving.
Don't be afraid of those [CODE]if (primesNotSieved == X)[/CODE] checks: primesNotSieved is const, so the compiler knows the correct code path at compile time.
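
To make the idea of sieving into a few local words concrete, here is a simplified Python sketch that strikes multiples of small primes in a 128-bit window held as four 32-bit words. It is not the code from gpusieve.cu: the choice of primes, the bit polarity (set bit = candidate eliminated) and the `first_hit` offsets are all assumptions made for illustration.

```python
WORD_BITS = 32   # width of one mask word
WORDS = 4        # four words, playing the role of mask, mask2, mask3, mask4


def sieve_window(first_hit):
    """Mark eliminated candidates in a 128-bit window.

    first_hit maps each small prime p to the offset (0 <= offset < p) of the
    first bit in this window that p eliminates. Returns four 32-bit integers;
    a set bit means the candidate at that position is eliminated.
    """
    masks = [0] * WORDS
    for p, start in first_hit.items():
        # Every p-th bit from the first hit onward is struck.
        for bit in range(start, WORD_BITS * WORDS, p):
            masks[bit // WORD_BITS] |= 1 << (bit % WORD_BITS)
    return masks


# Example with made-up offsets for three small primes.
mask, mask2, mask3, mask4 = sieve_window({3: 0, 5: 2, 7: 1})
```

Because the window is only 128 bits, the whole result fits in registers, and only the bits that survive need to be carried into the later shared-memory sieving stage.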

Oliver


All times are UTC.