#276
Serpentine Vermin Jar
Jul 2014
2×13×131 Posts
Quote:
#277
P90 years forever!
Aug 2002
Yeehaw, FL
17·487 Posts
It is the result check-in process that looks for the Gerbicz keyword. Depending on whether it is there, it sets the result_state column in a SQL table to either "composite by prp" or "composite by reliable prp". Aaron keys off the result state to print "reliable" in his report (not the prp_type).
Last fiddled with by Prime95 on 2018-05-06 at 15:55
#278
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2⁴·3·163 Posts
Quote:
Code:
void memtest(int s, int iter, int device)
{
    int i, j, k, m;
    int q = 60000091;
    int n = 3200 * 1024;
    int rand_int;
    int *i_data;
    double *d_data;
    double *dev_data1;
    int *d_compare;
    int h_compare;
    int total = 0;
    int total_iterations;
    int iterations_done = 0;
    float percent_done = 0.0f;
    timeval time0, time1;
    unsigned long long diff;
    unsigned long long diff1;
    unsigned long long diff2;
    unsigned long long ttime = 0;
    double total_bytes;
    size_t global_mem, free_mem;

    cudaMemGetInfo(&free_mem, &global_mem);
    printf("CUDA reports %lluM of %lluM GPU memory free.\n", (unsigned long long)free_mem/1024/1024, (unsigned long long)global_mem/1024/1024);
    if((size_t) s *1024 * 1024 * 25 > free_mem )
    {
        s = free_mem / 1024 / 1024 / 25;
        printf("Reducing size to %d\n", s);
    }
    printf("\nInitializing memory test using %0.0fMB of memory on device %d...\n", n / 1024.0 * s / 1024.0 * 8.0, device);

    i_data = (int *) malloc (sizeof (int) * n);
    d_data = (double *) malloc (sizeof (double) * n * 5);
    alloc_gpu_mem(n);
    write_gpu_data(q, n);
    srand(time(0));

    for (j = 0; j < n; j++) i_data[j] = 1;
    cudaMemcpy (g_xint, i_data, sizeof (int) * n, cudaMemcpyHostToDevice);
    apply_weights <<<n /1024,128 >>> (g_x, g_xint, g_ttp, g_s);
    cudaMemcpy (&d_data[0 * n], g_x, sizeof (double) * n, cudaMemcpyDeviceToHost);

    for (j = 0; j < n; j++)
    {
        rand_int = rand() % (1 << 18);
        rand_int -= (1 << 17);
        i_data[j] = rand_int;
    }
    cudaMemcpy (g_xint, i_data, sizeof (int) * n, cudaMemcpyHostToDevice);
    apply_weights <<<n /1024,128 >>> (g_x, g_xint, g_ttp, g_s);
    cudaMemcpy (&d_data[1 * n], g_x, sizeof (double) * n, cudaMemcpyDeviceToHost);
    cufftExecZ2Z (g_plan, (cufftDoubleComplex *) g_x, (cufftDoubleComplex *) g_x, CUFFT_INVERSE);
    cudaMemcpy (&d_data[2 * n], g_x, sizeof (double) * n, cudaMemcpyDeviceToHost);
    square <<< n / (4 * g_thr[0]), g_thr[0] >>> (n, g_x, g_ct);
    cudaMemcpy (&d_data[3 * n], g_x, sizeof (double) * n, cudaMemcpyDeviceToHost);
    cufftExecZ2Z (g_plan, (cufftDoubleComplex *) g_x, (cufftDoubleComplex *) g_x, CUFFT_INVERSE);
    cudaMemcpy (&d_data[4 * n], g_x, sizeof (double) * n, cudaMemcpyDeviceToHost);

    free(i_data);
    free(g_size1);
    free_gpu(1);

    cutilSafeCall (cudaMalloc ((void **) &d_compare, sizeof (int)));
    cutilSafeCall (cudaMemset (d_compare, 0, sizeof (int)));
    cutilSafeCall (cudaMalloc ((void **) &dev_data1, sizeof (double) * n * s));

    total_iterations = s * 5 * iter;
    iter *= 10000;
    printf("Beginning test.\n\n");
    fflush(NULL);
    gettimeofday (&time0, NULL);

    for(j = 0; j < s; j++)
    {
        m = (j + 1) % s;
        for(i = 0; i < 5; i++)
        {
            cutilSafeCall (cudaMemcpy (&dev_data1[j * n], &d_data[i * n], sizeof (double) * n, cudaMemcpyHostToDevice));
            for(k = 1; k <= iter; k++)
            {
                copy_kernel <<<n / 512, 512 >>> (dev_data1, n, j, m);
                compare_kernel<<<n / 512, 512>>> (&dev_data1[m * n], &dev_data1[j * n], d_compare);
                if(k%100 == 0) cutilSafeThreadSync();
                if(k%10000 == 0)
                {
                    cutilSafeCall (cudaMemcpy (&h_compare, d_compare, sizeof (int), cudaMemcpyDeviceToHost));
                    cutilSafeCall (cudaMemset (d_compare, 0, sizeof (int)));
                    total += h_compare;
                    iterations_done++;
                    percent_done = iterations_done * 100 / (float) total_iterations;
                    gettimeofday (&time1, NULL);
                    diff = time1.tv_sec - time0.tv_sec;
                    diff1 = 1000000 * diff + time1.tv_usec - time0.tv_usec;
                    time0.tv_sec = time1.tv_sec;
                    time0.tv_usec = time1.tv_usec;
                    ttime += diff1;
                    diff2 = ttime * (total_iterations - iterations_done) / iterations_done / 1000000;
                    total_bytes = 244140625 / (double) diff1;
                    printf("Position %d, Data Type %d, Iteration %d, Errors: %d, completed %2.2f%%, Read %0.2fGB/s, Write %0.2fGB/s, ETA ",
                           j, i, iterations_done * 100000, total, percent_done, 3.0 * total_bytes, total_bytes);
                    print_time_from_seconds ((unsigned) diff2, NULL, 0);
                    printf (")\n");
                    fflush(NULL);
                }
            }
        }
    }

    printf("Test complete. Total errors: %d.\n", total);
    fflush(NULL);
    cutilSafeCall (cudaFree ((char *) dev_data1));
    cutilSafeCall (cudaFree ((char *) d_compare));
    free((char*) d_data);
}
#279
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2⁴×3×163 Posts
Equivalently, one might ask, how do video RAM testers work?
https://www.raymond.cc/blog/having-p...st-its-memory/
These are about system memory:
https://docs.microsoft.com/en-us/win...ages-in-memory
https://stackoverflow.com/questions/...memory-so-fast
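As a rough illustration of the copy-and-compare idea used by the memtest routine quoted in the previous post, here is a minimal host-side sketch in Python. It is not the CUDA code and not how any particular video RAM tester is implemented: the function name `copy_compare_test`, the `fault` parameter, and the use of Python lists as stand-in "memory" are all assumptions made for the example. It writes a pattern into one block, repeatedly copies it to the neighbouring block, and counts words that differ.

```python
import random


def copy_compare_test(num_blocks, block_words, rounds, fault=None):
    """Count mismatched words seen while copying patterns between blocks.

    fault is a hypothetical (block_index, word_index) pair; every copy INTO
    that block gets one bit flipped at that word, simulating bad memory.
    """
    rng = random.Random(0)  # fixed seed so the sketch is reproducible
    patterns = [
        [1] * block_words,  # constant pattern
        # random signed values in [-2^17, 2^17), like the quoted routine
        [rng.randrange(-(1 << 17), 1 << 17) for _ in range(block_words)],
    ]
    mem = [[0] * block_words for _ in range(num_blocks)]
    errors = 0
    for j in range(num_blocks):
        m = (j + 1) % num_blocks          # neighbour block used as copy target
        for pattern in patterns:
            mem[j][:] = pattern           # load the pattern into block j
            for _ in range(rounds):
                mem[m][:] = mem[j]        # copy step
                if fault is not None and fault[0] == m:
                    mem[m][fault[1]] ^= 1  # simulated flipped bit
                # compare step: count words that differ after the copy
                errors += sum(a != b for a, b in zip(mem[m], mem[j]))
    return errors


print(copy_compare_test(3, 16, 4))                 # healthy "memory"
print(copy_compare_test(3, 16, 4, fault=(1, 5)))   # one faulty word in block 1
```

A real tester runs the same loop on the device with many more patterns and iterations, so that marginal cells have a chance to fail.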
#280
Romulan Interpreter
"name field"
Jun 2011
Thailand
41·251 Posts
Quote:
to wrap back to the last known, to the beginning, or give a message and fux out (skip) and let the user decide (however, this would complicate the thingies with erasing the lines in the worktodo file).
Last fiddled with by LaurV on 2018-06-05 at 08:45
#281
"Mihai Preda"
Apr 2015
2²·3·11² Posts
Usually, the PRP data-set is composed of two sets of bits, of equal size (N bits, N being the exponent), that I call "data" and "check". The "data" bits are the usual 3^(2^k) (k being the iteration) of the PRP algo, while the "check" bits are based on Robert's *cool* PRP error checking idea.
So, a PRP save-file contains the two sets (data + check). This is about double the size of an equivalent LL save-file (because LL has no "check" to save). An example: for a 300M exponent, a PRP save-file is about 75MB (i.e. 300M*2/8 bytes), and is not compressible.
I just realized that, in fact, there is a simple way to derive the "data" bits from the "check" bits. This halves the size of the save-file, as it's no longer required to save the "data"; the "check" alone is enough. The tradeoff is a bit of computation (equivalent to 2*L iterations, where L is the "Gerbicz error checking" block size) that needs to be done when loading the save-file, to reconstruct "data" from "check".
This size reduction, while nice for the disk, is more relevant if one wants to send "PRP data" over the network in the future.
(Basically, starting with C being the check bits, the sequence D = C; repeat(L-1): { D = D^2; D *= C; } D *= 3; reconstructs the data D.)
Last fiddled with by preda on 2018-06-06 at 04:52 Reason: de-dup newlines
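The reconstruction can be tried on small numbers. The following is a minimal Python sketch, not gpuOwl's actual code: the function names are made up for the example, and it assumes one particular bookkeeping convention, namely that "check" is the product of the "data" values at iterations 0, L, 2L, ... up to but not including the current block boundary. Under that assumption 3·C^(2^L − 1) equals the current "data", which is what the loop from the post computes with L−1 squarings and L−1 multiplications.

```python
def prp_state(n, block_size, num_blocks):
    """Run num_blocks blocks of PRP squarings mod n, starting from 3.

    Returns (data, check), where data = 3^(2^(num_blocks*block_size)) mod n
    and check is the product of the data values seen at the START of each
    block (assumed convention, see the lead-in).
    """
    data = 3 % n
    check = 1
    for _ in range(num_blocks):
        check = check * data % n          # fold block-start data into check
        for _ in range(block_size):
            data = data * data % n        # one PRP iteration
    return data, check


def data_from_check(check, n, block_size):
    """Rebuild data from check: D = C; repeat(L-1): {D = D^2; D *= C}; D *= 3."""
    d = check
    for _ in range(block_size - 1):
        d = d * d % n
        d = d * check % n
    return d * 3 % n


n = (1 << 127) - 1                        # small Mersenne modulus for the demo
L = 10                                    # hypothetical block size
data, check = prp_state(n, L, num_blocks=7)
print(data_from_check(check, n, L) == data)
```

The identity behind it is a geometric sum: the exponent of 3 in C is 1 + 2^L + ... + 2^(tL), and multiplying that by 2^L − 1 and adding 1 gives 2^((t+1)L).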
#282
Serpentine Vermin Jar
Jul 2014
6516₈ Posts
Just for general purpose... When the mod hilariously changes the title of the discussion threads, it really blows searchability down the line. While it may seem amusing, Google and the other crawlers actually do pay attention to things like that, and page titles (derived from the topic title on this forum) carry weight. Mixing up the words or using curious spellings makes it more difficult to find things you might be interested in.
#283
"Kieren"
Jul 2011
In My Own Galaxy!
23656₈ Posts
Quote:
#284
"Mihai Preda"
Apr 2015
2²·3·11² Posts
Quote:
My conclusion is that "variable offset" is not worthwhile as a method of FFT range extension. The extension achieved, ~0.5%, is too small compared to the overhead. Note: I did the offset change by incrementing the offset of the residue ("mul-2") rather than setting the offset to a random value, but IMO this does not affect the result (0.5%) in a significant way.
Last fiddled with by preda on 2018-06-30 at 14:35
#285
Oct 2019
5·19 Posts
What about doing LL iterations modulo k*Mp, where k is of a suitable size and is highly composite (for example, 2^32, 2^61, 3^41)? Would arithmetic modulo k*Mp be easily handled by the IBDWT?
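To make the proposal concrete, here is a minimal Python sketch of the arithmetic only. It uses plain big integers and says nothing about whether an IBDWT can handle the modulus k*Mp, which is the open question of the post. The function name `ll_mod_k_mp` and the `flip_at` error-injection parameter are made up for the example. The idea is that reducing the running value modulo Mp gives the ordinary LL residue, while reducing it modulo k must match a cheap side computation of the same recurrence modulo k.

```python
def ll_mod_k_mp(p, k, flip_at=None):
    """Lucas-Lehmer iterations modulo k*Mp with a consistency check modulo k.

    Returns (residue mod Mp, check_ok). flip_at is a hypothetical iteration
    index at which one bit of the running value is flipped to mimic an error.
    """
    mp = (1 << p) - 1
    n = k * mp
    s = 4 % n          # main sequence, carried modulo k*Mp
    c = 4 % k          # cheap side sequence, carried modulo k only
    for i in range(p - 2):
        s = (s * s - 2) % n
        if i == flip_at:
            s ^= 1     # simulated hardware error in the main computation
        c = (c * c - 2) % k
    return s % mp, s % k == c


print(ll_mod_k_mp(13, 1 << 32))             # M13 is prime
print(ll_mod_k_mp(11, 1 << 32))             # M11 = 23 * 89 is composite
print(ll_mod_k_mp(13, 1 << 32, flip_at=2))  # injected error
```

With k a power of two the side sequence carries very little information (all its terms are even, so only errors that survive into the low bits are caught), so this choice of k is purely illustrative.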
Last fiddled with by Fan Ming on 2020-02-25 at 16:04
#286
Oct 2019
95₁₀ Posts
Quote:
If we do LL iterations modulo (2^2)*(2^37-1) = 2^39-4, still with N=4, the bit-length of every word will be (according to Percival, C.'s paper http://www.ams.org/journals/mcom/200...02-01419-9.pdf): {10, 9, 10, 8}, and the weight vector a (ditto): {1, 2^(3/4), 2^(1/2), 2^(5/4)}. Is working modulo such composite numbers (2^39-4 in this example) with the IBDWT in fact allowed? If so, in this example we can check whether the result modulo 2^39-4 agrees with the same iteration computed modulo 2^2 after some number of iterations (this 4 is not a good choice, only an example; better candidates should exist). The same goes for an actual Mp multiplied by a suitable k, as described in the post above.
Last fiddled with by Fan Ming on 2020-02-26 at 07:18